Counting the rows in a SQL Group By query
I recently needed count the number of rows in an SQL query that had a Group By clause. It looked something like this:
SELECT account_name
FROM events
WHERE created_at >= CURDATE() - INTERVAL 3 MONTH
GROUP BY account_id
This provides a list of account names (28 in my case), but if you try to count them using:
SELECT COUNT(account_name) as c
FROM events
WHERE created_at >= CURDATE() - INTERVAL 3 MONTH
GROUP BY account_id
You get a list where each row is the count of how many events there are for each account, which is not what I want.
To get the number of accounts, use a sub-select:
SELECT COUNT(account_name) as c
FROM (
SELECT account_name FROM events
WHERE created_at >= CURDATE() - INTERVAL 3 MONTH
GROUP BY account_id
) AS account_names
This give a single row which is the count of the number of rows returned by the sub-select.
As I don’t use sub-selects often, it took me a couple of goes to get this right as I initially forgot the final AS account_names which is needed to name the sub-select as a table which can be selected from.
Simple, once I got it working!
By using a sub-query you're forcing the engine to use a temporary table to hold the results that are then queried by the outer select, which may be inefficient depending on the data size. If the goal is to only know the number of distinct accounts in the matching events, I would start with the query in your middle example, drop the GROUP BY and add DISTINCT before 'account_name' in the COUNT. This should be quicker and allows the engine to optimise how it runs.