Database Indexing 101: B-Trees vs. Hash Indexes

Published on 2026-04-09 11:17 by Frugle Me (Last updated: 2026-04-09 11:19)

#B-Trees #Hash Indexes
Share:

Indexing is the most critical tool for database performance. By transforming full-table scans into direct lookups, indexes ensure your application remains fast as data grows.

1. B-Tree Indexes (The All-Rounder)

B-Tree (Balanced Tree) is the default indexing method for almost all relational databases (PostgreSQL, MySQL, Oracle).

How it Works

A B-Tree stores data in a sorted, hierarchical structure.
- Balanced Structure: The distance from the root to any leaf node is always the same, ensuring predictable performance.
- Sorted Keys: Keys are stored in ascending order within nodes.
- Pointers: Internal nodes contain pointers to child nodes representing specific ranges of values.

Pros

  • Range Queries: Highly efficient for operators like <, >, BETWEEN, and LIKE 'prefix%'.
  • Sorting: Because data is naturally sorted, it helps speed up ORDER BY and GROUP BY operations.
  • Self-Balancing: Automatically adjusts as data is inserted or deleted, maintaining $O(\log n)$ performance.

Cons

  • Slightly Slower Lookups: Requires traversing multiple levels from root to leaf, which is $O(\log n)$.

2. Hash Indexes (The Specialist)

Hash indexes are specialized structures designed for extreme speed in specific scenarios.

How it Works

A Hash index uses a Hash Function to map a key directly to a specific "bucket" in a hash table.
- Direct Access: It doesn't traverse a tree; it calculates an address and jumps straight to it.
- Key-to-Bucket: Each entry is stored based on the result of the hash function.

Pros

  • Ultra-Fast Equality: Offers $O(1)$ constant-time lookups for exact matches (e.g., WHERE id = 500).
  • Compact: Often consumes less space than B-Trees because it doesn't store the hierarchical pointers.

Cons

  • No Range Queries: Since the hash of "1" is totally different from the hash of "2", you cannot search for a range of values.
  • No Sorting: Hash tables are inherently unordered.
  • Collisions: If two different keys produce the same hash, performance can degrade.

3. Comparison Summary

Feature B-Tree Index Hash Index
Best For General purpose, Range queries Exact match lookups
Search Speed $O(\log n)$ $O(1)$ (Average)
Operators =, >, <, BETWEEN, LIKE Only = and IN
Storage Order Sorted Unordered
Default Yes (in most RDBMS) No

4. When to Use What?

Use B-Tree when:

  • You need to perform range searches (e.g., finding all users between age 20 and 30).
  • You want to retrieve data in a specific sorted order.
  • You are looking for a "safe" default that handles almost any query pattern.
  • You use partial matches like LIKE 'Smith%'.

Use Hash when:

  • You are 100% certain you will only ever use equality (=) checks on that column.
  • You have a massive table where a tiny speed gain in lookups is critical (e.g., looking up a Session ID or a unique UUID).
  • The table has very high cardinality (unique values), which minimizes hash collisions.

Rule of Thumb: Stick with B-Tree unless you have a specific, proven performance bottleneck that only a Hash index can solve.

Comments (0)

Want to join the conversation?

Please log in to add a comment.