SELECT DISTINCT vs. GROUP BY: Which Performs Better?

Published on 2026-04-15 11:06 by Frugle Me (Last updated: 2026-04-15 11:06)

#distinct #group by

When you need to retrieve a list of unique values from a database column, you generally have two choices in SQL: SELECT DISTINCT or SELECT ... GROUP BY. A common question among developers is: Which one is faster?


🚀 The Short Answer

In most modern database engines (PostgreSQL, MySQL, SQL Server, Oracle), there is usually no measurable performance difference between the two when they are logically equivalent, as in a plain list of unique values. The query optimizer generally treats them as the same logical operation and transforms both queries into the same execution plan, typically using either a Hash Aggregate or a Sort/Stream Aggregate.


๐Ÿ” Deep Dive: How the Database Sees Them

1. SELECT DISTINCT

Example: SELECT DISTINCT dept FROM emp;
- Intent: Tells the database you want the unique values of the dept column.
- Execution: The database scans the table and removes duplicates as it finds them.

2. GROUP BY

Example: SELECT dept FROM emp GROUP BY dept;
- Intent: Tells the database to organize the rows into buckets based on the dept value.
- Execution: Because there are no aggregate functions (like SUM or COUNT), the database simply returns the key for each bucket.
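A quick way to convince yourself that the two forms return the same values is to run both against a small table. The sketch below uses Python's built-in sqlite3 module only because it needs no setup; the emp rows are made-up sample data, and SQLite is standing in for whichever engine you actually use.

```python
import sqlite3

# Hypothetical sample data: the emp table and its rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp (name, dept) VALUES (?, ?)",
    [("Ana", "Sales"), ("Bo", "Sales"), ("Cy", "IT"), ("Di", "HR"), ("Ed", "IT")],
)

# Neither query has an ORDER BY, so row order is not guaranteed.
# Compare the results as sets rather than as lists.
distinct_depts = {row[0] for row in conn.execute("SELECT DISTINCT dept FROM emp")}
grouped_depts = {row[0] for row in conn.execute("SELECT dept FROM emp GROUP BY dept")}

print(sorted(distinct_depts))
print(distinct_depts == grouped_depts)
```

The comparison uses sets on purpose: without an ORDER BY, neither query promises any particular row order.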


🛠 Execution Strategies

The performance of both queries depends on how the database retrieves the data, not on which of the two syntaxes you choose.
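The most reliable way to check this on your own system is to ask the database for its plan. The sketch below does that with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, before and after creating an index. Treat it as an illustration only: the table, the rows, and the index name idx_emp_dept are assumptions, SQLite is not one of the engines listed above, and its plan text looks different from what EXPLAIN or EXPLAIN ANALYZE prints in PostgreSQL or MySQL.

```python
import sqlite3


def plan(conn, sql):
    """Return the human-readable steps SQLite reports for a query."""
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Hypothetical sample data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp (name, dept) VALUES (?, ?)",
    [("Ana", "Sales"), ("Bo", "Sales"), ("Cy", "IT"), ("Di", "HR"), ("Ed", "IT")],
)

queries = {
    "DISTINCT": "SELECT DISTINCT dept FROM emp",
    "GROUP BY": "SELECT dept FROM emp GROUP BY dept",
}

# Capture the plans first without an index, then with one on dept.
plans_without_index = {label: plan(conn, sql) for label, sql in queries.items()}
conn.execute("CREATE INDEX idx_emp_dept ON emp (dept)")  # index name is an assumption
plans_with_index = {label: plan(conn, sql) for label, sql in queries.items()}

for label in queries:
    print(label, "without index:", plans_without_index[label])
    print(label, "with index:   ", plans_with_index[label])
```

When reading the output, compare the access method each query gets (table scan versus index scan) rather than the exact wording, which varies by engine and version.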

Scenario A: No Index (Sequential Scan)

If the dept column is not indexed:
- The database must perform a Full Table Scan.
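- It builds a Hash Table in memory (or, in some plans, sorts the rows instead).
- As it reads each row, it checks whether the value already exists in the hash table and keeps only values it has not seen before.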
- Both queries will take roughly the same amount of time.
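The hash-based steps above can be sketched in a few lines of Python. This is only a toy model of the idea, not how any particular engine implements it; the function name hash_distinct and the sample rows are made up for illustration.

```python
def hash_distinct(rows, key):
    """Return the unique values of `key` in first-seen order, using one pass."""
    seen = set()   # plays the role of the in-memory hash table
    unique = []
    for row in rows:  # the "full table scan": every row is visited exactly once
        value = row[key]
        if value not in seen:  # average O(1) membership check
            seen.add(value)
            unique.append(value)
    return unique


# Hypothetical sample rows, invented for this sketch.
emp = [
    {"name": "Ana", "dept": "Sales"},
    {"name": "Bo", "dept": "Sales"},
    {"name": "Cy", "dept": "IT"},
    {"name": "Di", "dept": "HR"},
    {"name": "Ed", "dept": "IT"},
]

print(hash_distinct(emp, "dept"))  # ['Sales', 'IT', 'HR']
```

Whether the query was written with DISTINCT or GROUP BY, the work is the same: one pass over the rows plus one hash lookup per row.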

Scenario B: With an Index (Index Scan)

If the dept column is indexed:
- The database can perform an Index Only Scan.
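- It skips the actual data table and reads only the smaller index.
- Some databases go further. MySQL can use a Loose Index Scan, and Oracle has an Index Skip Scan, which jump from one unique value to the next without reading every single entry. PostgreSQL has historically not done this automatically for DISTINCT or GROUP BY, so the same effect is usually emulated there with a recursive query. Where it is available, it is extremely fast for both queries.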
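On engines or versions where the planner does not jump between unique values on its own, the idea can be emulated by hand with a recursive query: fetch the smallest value, then repeatedly fetch the smallest value greater than the previous one. The sketch below shows that pattern running on SQLite through Python's sqlite3 module. The table, rows, and index name are assumptions, and whether each lookup really becomes a cheap index probe depends on the engine, so check the plan before relying on it.

```python
import sqlite3

# Hypothetical sample data and index, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp (name, dept) VALUES (?, ?)",
    [("Ana", "Sales"), ("Bo", "Sales"), ("Cy", "IT"), ("Di", "HR"), ("Ed", "IT")],
)
conn.execute("CREATE INDEX idx_emp_dept ON emp (dept)")

# Each step looks up the next value after the previous one.
# When no larger value exists, the subquery yields NULL and the recursion stops.
# Note: this pattern skips NULL dept values, unlike DISTINCT and GROUP BY.
LOOSE_SCAN_SQL = """
WITH RECURSIVE t(dept) AS (
    SELECT (SELECT dept FROM emp WHERE dept IS NOT NULL ORDER BY dept LIMIT 1)
    UNION ALL
    SELECT (SELECT e.dept FROM emp AS e WHERE e.dept > t.dept ORDER BY e.dept LIMIT 1)
    FROM t
    WHERE t.dept IS NOT NULL
)
SELECT dept FROM t WHERE dept IS NOT NULL
"""

skipped = [row[0] for row in conn.execute(LOOSE_SCAN_SQL)]
plain = [row[0] for row in conn.execute("SELECT DISTINCT dept FROM emp")]

print(sorted(skipped))
print(sorted(skipped) == sorted(plain))
```

This only pays off when the column has few distinct values relative to the number of rows; with many distinct values, a regular index scan is usually the better choice.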


โš–๏ธ When to Use Which?

Since performance is usually the same, readability and intent should drive the choice.

Use SELECT DISTINCT when:

  • You simply want to remove duplicates from a result set.
  • The query is simple and focused only on unique values.
  • It makes the code easier for other humans to read and understand.

Use GROUP BY when:

  • You are performing aggregations (e.g., COUNT, SUM, AVG).
  • You are performing complex analysis on the groups.
  • You need more than the unique values themselves. Using it without an aggregate function is often considered "non-idiomatic" SQL and can confuse other developers.
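To illustrate the aggregation case, here is a per-department headcount, which needs GROUP BY because SELECT DISTINCT alone cannot count the rows in each group. As before, this is a minimal sketch using Python's sqlite3 module with made-up sample rows; the headcount alias is just an illustrative name.

```python
import sqlite3

# Hypothetical sample data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp (name, dept) VALUES (?, ?)",
    [("Ana", "Sales"), ("Bo", "Sales"), ("Cy", "IT"), ("Di", "HR"), ("Ed", "IT")],
)

# One output row per dept, with an aggregate computed over each group.
# ORDER BY makes the output order deterministic.
headcount = conn.execute(
    "SELECT dept, COUNT(*) AS headcount FROM emp GROUP BY dept ORDER BY dept"
).fetchall()

print(headcount)  # [('HR', 1), ('IT', 2), ('Sales', 2)]
```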

๐Ÿ“ Summary Table

Feature SELECT DISTINCT GROUP BY
Logic Remove duplicates Group rows into sets
Performance Same Same
Readability High (for unique lists) High (for aggregates)
Best Practice Use for uniqueness Use for math/grouping

✅ Conclusion

Don't stress over the performance difference between these two specific queries. Instead, focus on indexing. If your query is slow, adding an index on the dept column can make it dramatically faster on a large table, whereas switching from DISTINCT to GROUP BY will most likely change nothing.


💡 Next Steps

To optimize further, these topics are worth exploring:
- How to use EXPLAIN ANALYZE to see your specific execution plan.
- Why an Index is more important than the syntax used.
- How NULL values are handled by both queries.

