Database Indexing 101: B-Trees vs. Hash Indexes

Indexing is the most critical tool for database performance. By transforming full-table scans into direct lookups, indexes ensure your application remains fast as data grows.

1. B-Tree Indexes (The All-Rounder)

B-Tree (Balanced Tree) is the default indexing method for almost all relational databases (PostgreSQL, MySQL, Oracle).

How it Works

A B-Tree stores data in a sorted, hierarchical structure.
- Balanced Structure: The distance from the root to any leaf node is always the same, ensuring predictable performance.
- Sorted Keys: Keys are stored in ascending order within nodes.
- Pointers: Internal nodes contain pointers to child nodes representing specific ranges of values.

Pros

Range Queries: Highly efficient for operators like <, >, BETWEEN, and LIKE 'prefix%'.
Sorting: Because data is naturally sorted, it helps speed up ORDER BY and GROUP BY operations.
Self-Balancing: Automatically adjusts as data is inserted or deleted, maintaining $O(\log n)$ performance.

Cons

Slightly Slower Lookups: Requires traversing multiple levels from root to leaf, which is $O(\log n)$.

2. Hash Indexes (The Specialist)

Hash indexes are specialized structures designed for extreme speed in specific scenarios.

How it Works

A Hash index uses a Hash Function to map a key directly to a specific "bucket" in a hash table.
- Direct Access: It doesn't traverse a tree; it calculates an address and jumps straight to it.
- Key-to-Bucket: Each entry is stored based on the result of the hash function.

Pros

Ultra-Fast Equality: Offers $O(1)$ constant-time lookups for exact matches (e.g., WHERE id = 500).
Compact: Often consumes less space than B-Trees because it doesn't store the hierarchical pointers.

Cons

No Range Queries: Since the hash of "1" is totally different from the hash of "2", you cannot search for a range of values.
No Sorting: Hash tables are inherently unordered.
Collisions: If two different keys produce the same hash, performance can degrade.

3. Comparison Summary

Feature	B-Tree Index	Hash Index
Best For	General purpose, Range queries	Exact match lookups
Search Speed	$O(\log n)$	$O(1)$ (Average)
Operators	`=`, `>`, `<`, `BETWEEN`, `LIKE`	Only `=` and `IN`
Storage Order	Sorted	Unordered
Default	Yes (in most RDBMS)	No

4. When to Use What?

Use B-Tree when:

You need to perform range searches (e.g., finding all users between age 20 and 30).
You want to retrieve data in a specific sorted order.
You are looking for a "safe" default that handles almost any query pattern.
You use partial matches like LIKE 'Smith%'.

Use Hash when:

You are 100% certain you will only ever use equality (=) checks on that column.
You have a massive table where a tiny speed gain in lookups is critical (e.g., looking up a Session ID or a unique UUID).
The table has very high cardinality (unique values), which minimizes hash collisions.

Rule of Thumb: Stick with B-Tree unless you have a specific, proven performance bottleneck that only a Hash index can solve.

Database Indexing 101: B-Trees vs. Hash Indexes

1. B-Tree Indexes (The All-Rounder)

How it Works

Pros

Cons

2. Hash Indexes (The Specialist)

How it Works

Pros

Cons

3. Comparison Summary

4. When to Use What?

Use B-Tree when:

Use Hash when:

Comments (0)