Engineering for Sub-100ms Latency: Indexing and Query Optimization

Published on 2026-03-10 11:18 by Frugle Me (Last updated: 2026-03-10 11:18)

When a system requires sub-100ms response times, every millisecond spent on disk I/O, network hops, or CPU-bound work counts. To hit these targets consistently, you must move beyond generic "out-of-the-box" configurations toward data-locality and memory-first strategies.

1. Advanced Indexing Strategies

Standard B-Tree indexes are often insufficient for ultra-low latency. You need to reduce the number of lookups and disk seeks.

Covering Indexes

Ensure your index contains every column the query needs, in both the WHERE predicate and the SELECT list.
* The Benefit: The database engine retrieves everything from the index alone (an "index-only scan") and never touches the actual table (the "heap").
* Example: For SELECT email FROM users WHERE user_id = 123, an index on (user_id, email) is a covering index.
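
A minimal sketch in PostgreSQL syntax (the table and index names are illustrative); either form lets the planner use an index-only scan:

    -- Composite key: user_id is searchable, email rides along in the key
    CREATE INDEX idx_users_id_email ON users (user_id, email);

    -- PostgreSQL 11+ alternative: INCLUDE stores email in the leaf pages
    -- without making it part of the search key
    CREATE INDEX idx_users_id_incl_email ON users (user_id) INCLUDE (email);

Note that in PostgreSQL an index-only scan also depends on the visibility map being reasonably fresh, so a heavily updated table may need a VACUUM before you see the full benefit.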

Composite Index Ordering

When indexing multiple columns, the order is critical. Use the equality-sort-range rule of thumb (sometimes called the ESR rule), illustrated in the sketch after this list:
1. Equality columns first: Columns used with = in the WHERE clause.
2. Sort columns second: Columns used in ORDER BY.
3. Range columns last: Columns used with >, <, or BETWEEN.
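
For instance (a hypothetical orders table, PostgreSQL syntax), a query that mixes all three clause types is served by an index whose columns follow the same equality-sort-range order:

    -- Equality on customer_id, sort on created_at, range on total
    SELECT id, total
    FROM orders
    WHERE customer_id = 42
      AND total > 100
    ORDER BY created_at DESC
    LIMIT 20;

    -- Index columns in the same order: equality, sort, range
    CREATE INDEX idx_orders_cust_created_total
        ON orders (customer_id, created_at DESC, total);

With this ordering the engine can walk the matching customer's entries in already-sorted order, filter the range column from the index itself, and stop after 20 hits.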

Specialized Index Types

* Hash Indexes: If you only ever perform exact matches (e.g., looking up a session ID), hash indexes offer O(1) lookups, outperforming the O(log n) of B-Trees.
* Partial Indexes: Index only a subset of the rows (e.g., WHERE status = 'active'). This keeps the index small enough to stay resident in RAM, and for very small indexes even in CPU cache.
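
In PostgreSQL syntax (table and column names are illustrative), the two look like this:

    -- Hash index: exact-match lookups only; no range scans or ordering
    CREATE INDEX idx_sessions_token ON sessions USING HASH (token);

    -- Partial index: only rows with status = 'active' are indexed,
    -- keeping the structure small and cache-friendly
    CREATE INDEX idx_users_active_email ON users (email)
        WHERE status = 'active';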

2. Query Optimization Patterns

How you write the query is just as important as how you index the data.

Eliminate Joins via Denormalization

Joins are the primary enemy of low latency because they require multiple index lookups and data merging.
* Strategy: Duplicate data across tables to make your "hot path" queries hit only a single table.
* Trade-off: Faster reads at the expense of slightly slower, more complex writes.
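
As a sketch against a hypothetical orders/customers schema, the idea is to copy the needed column into orders at write time so the hot-path read stays on one table:

    -- Before: the hot path pays for a join on every read
    SELECT o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.id = 123;

    -- After: duplicate the name into orders (kept in sync on write)...
    ALTER TABLE orders ADD COLUMN customer_name text;

    -- ...so the read touches a single table and a single index
    SELECT id, customer_name FROM orders WHERE id = 123;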

Pagination via "Seek Method"

Never use OFFSET and LIMIT to page through large datasets. OFFSET 10000 forces the database to read and discard 10,000 rows before returning anything.
* The Fast Way: Use keyset pagination.
* Example: WHERE id > last_seen_id ORDER BY id LIMIT 20. The ORDER BY makes the page deterministic, and the index jumps directly to the starting point.
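
Side by side (a hypothetical posts table; :last_seen_id stands for the id of the last row on the previous page, bound by your driver):

    -- Slow: reads and discards the first 10,000 rows
    SELECT id, title FROM posts
    ORDER BY id
    LIMIT 20 OFFSET 10000;

    -- Fast: the index seeks straight to the first qualifying row
    SELECT id, title FROM posts
    WHERE id > :last_seen_id
    ORDER BY id
    LIMIT 20;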

Prepared Statements

Parsing, validating, and planning a query can take 5–10ms.
* Strategy: Use prepared statements to perform the "plan" phase once. Subsequent executions only send the parameters, reducing CPU overhead on the database server.
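
PostgreSQL exposes this at the SQL level with PREPARE/EXECUTE; most drivers offer the same mechanism through their own prepared-statement APIs:

    -- Parse, validate, and plan once per connection
    PREPARE get_email (bigint) AS
        SELECT email FROM users WHERE user_id = $1;

    -- Each subsequent execution ships only the parameter
    EXECUTE get_email(123);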


3. Architectural Tactics

Architecture often trumps code-level optimization.

In-Memory Layer

If the data fits in memory, it should be in memory.
* Redis/Valkey: Use as a look-aside cache to store the results of expensive queries.
* In-Memory Databases: For real-time applications, move the entire dataset to a system designed for RAM-first storage.
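
A look-aside cache lives in application code, but the same "keep hot data in RAM" idea can be applied inside PostgreSQL with the pg_prewarm extension, sketched here with the illustrative table and index names from above:

    -- Pull a hot table and its covering index into the buffer cache
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SELECT pg_prewarm('users');
    SELECT pg_prewarm('idx_users_id_email');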

Physical Locality (Sharding)

Large indexes become slow as they grow beyond the size of the buffer pool.
* Strategy: Shard your database by a partition key (e.g., tenant_id or geo_region). Smaller tables result in smaller, shallower index trees, leading to faster traversals.
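
True sharding spreads the pieces across machines, but PostgreSQL's declarative partitioning shows the index-depth effect even on a single node (hash-partitioning an illustrative events table by tenant_id):

    CREATE TABLE events (
        tenant_id bigint NOT NULL,
        id        bigint NOT NULL,
        payload   jsonb
    ) PARTITION BY HASH (tenant_id);

    -- Each partition holds roughly half the rows, so each
    -- partition's index is smaller and shallower
    CREATE TABLE events_p0 PARTITION OF events
        FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE events_p1 PARTITION OF events
        FOR VALUES WITH (MODULUS 2, REMAINDER 1);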

Connection Pooling

Establishing a TLS/TCP connection can take 30ms+.
* Action: Use a connection pooler (like PgBouncer for PostgreSQL) to keep a "warm" set of connections open, reducing the initial handshake latency to near zero.
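
A minimal pgbouncer.ini sketch (addresses and pool sizes are placeholders; transaction pooling is a common fit for short, frequent queries):

    [databases]
    appdb = host=127.0.0.1 port=5432 dbname=appdb

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    pool_mode = transaction
    default_pool_size = 20
    max_client_conn = 200

The application connects to port 6432 instead of 5432 and reuses already-open server connections, so the TCP/TLS handshake is paid once per pooled connection rather than once per request.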


Summary Checklist for <100ms Latency

* [ ] Is the query hitting a Covering Index?
* [ ] Are all Joins removed from the critical path?
* [ ] Is the "hot" dataset small enough to fit in RAM?
* [ ] Is Connection Pooling enabled?
* [ ] Are you using Keyset Pagination instead of Offset?
