MySQL Standard Group By
Teaching SQL is interesting because folks try syntax that experience would tell you shouldn’t work. It was interesting when I discovered what should be broken from my perspective but was expected behavior in MySQL. It became clearer to me why it’s there as I did some experimenting with it enabled and disabled. While I’d still argue it’s broken, it’s the only way to get support for advanced aggregation concepts.
The reason that I find that it broken is a matter of perspective not standards. The fact that you can select a set of non-aggregated columns with an aggregated column, and exclude one, more than one, or all of the non-aggregated columns from the GROUP BY
clause seemed like a broken behavior. The MySQL behavior is explained in Chapter 11.12.3 of the MySQL 5.1 Reference. What wasn’t clear from the documentation when I read it was why the behavior is allowed. Gary’s comment on the original post made me look more deeply into the behavior.
If you check the referenced documentation, the process is called GROUP BY
hidden columns. That label didn’t immediately resonate with me. That’s probably of my background with IBM’s DB2 and Oracle. I’m so accustom to how they work, that sometimes I think they set the standards but they don’t. It turns out that the GROUP BY
clause in the ANSI SQL standards doesn’t allow for expressions.
The default configuration of MySQL allows for the expressions (functions) in the GROUP BY
clause. The documentation refers to hidden columns, which are in point of fact expressions in the GROUP BY
clause. MySQL does support expressions in the GROUP BY
by default, and requires, like Oracle, that when you use an expression in the SELECT
clause that you mirror it in the GROUP BY
clause. This means you group on the result of the expression, not a column in the table or view.
In the default configuration, you have three options. They are determinate results, indeterminate results, and hidden column results. The following cover these components.
Determinate Results
Determinate results are straight forward. They require that all non-aggregated columns in the SELECT
clause are mirrored in the GROUP BY
clause. This means that the non-aggregated column values are the key upon which results are aggregated.
An example of determinate results is:
1 2 3 4 5 6 | SELECT key_one , key_two , SUM(counter) FROM GROUPING GROUP BY key_one , key_two; |
Indeterminate Results
Indeterminate results isn’t straight forward but isn’t hard to grasp. An indeterminate result set is returned when one or more non-aggregated columns in a SELECT
clause aren’t listed in the GROUP BY
clause. The columns listed in the SELECT
clause but excluded from the GROUP BY
clause return meaningless values because they’re column values chosen indeterminately from all pre-aggregated rows.
The following query runs in a generically configured MySQL instance without an error. It returns a meaningless key_two
column value from the pre-aggregated row set. In the example, the aggregation column counts the unique key_one
column values. This behavior makes key_one
a determinate value, and key_two
an indeterminate value.
1 2 3 4 5 | SELECT key_one , key_two , SUM(counter) FROM GROUPING GROUP BY key_one; |
You can fix this mixed return set by adding the key_two
column to the GROUP BY
clause, which would return a determinate set. Alternatively, you can prevent the default behavior for the GROUP BY
clause by adding the ONLY_FULL_GROUP_BY
mode variable to your SQL_MODE
system variable.
After setting the SQL_MODE>
variable, a GROUP BY
must contain all non-aggregated columns. When you make this change to the SQL_MODE
system variable, you also disable any queries that use an expression in their GROUP BY
clause. Those queries with expressions in the group by will now raise an ERROR 1055
exception, like this:
ERROR 1055 (42000): 'sampledb.grouping.key_two' isn't in GROUP BY |
If you want to prevent indeterminate results and don’t use expressions in the GROUP BY
clause, you can add the ONLY_FULL_GROUP_BY
mode to your SQL_MODE
system variable. You can do that during a session with the following syntax:
SET SQL_MODE=(SELECT CONCAT(@@sql_mode,',ONLY_FULL_GROUP_BY')); |
Or, you can start the mysqld
with the following option:
mysqld --sql_mode="sql_mode1,sql_mode2, ... ,sql_mode(n+1)" |
A better alternative, is to add it to the my.cnf
configuration file on Linux, or my.ini
configuration file on Windows. You can add it to this line, which is done at the end of the line.
# Set the SQL mode to strict sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,ONLY_FULL_GROUP_BY" |
If you opt for changing it at the server configuration file, you must stop and restart the mysqld
process. You can do that on Windows from the command line, like this on Windows provide the service name is mysql
. If you’ve set the Windows service to mysql51
, then you need to substitute mysql51
for mysql
.
Hidden Column Results
A hidden column result, is a result generated by an expression in the GROUP BY
clause. The following is a query that lets you sum transactions by the month name. This is supported by the default behavior of hidden columns. The following is an approach that you might find in Oracle because they don’t support aggregation by only part of the non-aggregated columns in a query.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | SELECT il.month AS "MON-YEAR" , il.base AS "BASE" , il.plus10 AS "10_PLUS" , il.plus20 AS "20_PLUS" FROM (SELECT CONCAT(UPPER(SUBSTRING(MONTHNAME(t.transaction_date),1,3)),'-',EXTRACT(YEAR FROM t.transaction_date)) AS MONTH , MONTH(t.transaction_date) AS sortkey , LPAD(CONCAT('$',FORMAT(SUM(t.transaction_amount) * 1.0,2)),10,' ') AS base , LPAD(CONCAT('$',FORMAT(SUM(t.transaction_amount) * 1.1,2)),10,' ') AS plus10 , LPAD(CONCAT('$',FORMAT(SUM(t.transaction_amount) * 1.2,2)),10,' ') AS plus20 FROM TRANSACTION t WHERE EXTRACT(YEAR FROM t.transaction_date) = 2009 GROUP BY CONCAT(UPPER(SUBSTRING(MONTHNAME(t.transaction_date),1,3)),'-',EXTRACT(YEAR FROM t.transaction_date)) , MONTH(t.transaction_date)) il ORDER BY il.sortkey; |
The interesting thing about MySQL’s approach is that the Oracle equivalent of an inline view is unnecessary. You can simply switch the functions like the exmaple below.
1 2 3 4 5 6 7 8 9 10 | SELECT DATE_FORMAT(t.transaction_date,'%m-%Y') AS MONTH , MONTH(t.transaction_date) AS sortkey , LPAD(CONCAT('$',FORMAT(SUM(t.transaction_amount) * 1.0,2)),10,' ') AS base , LPAD(CONCAT('$',FORMAT(SUM(t.transaction_amount) * 1.1,2)),10,' ') AS plus10 , LPAD(CONCAT('$',FORMAT(SUM(t.transaction_amount) * 1.2,2)),10,' ') AS plus20 FROM TRANSACTION t WHERE EXTRACT(YEAR FROM t.transaction_date) = 2009 GROUP BY DATE_FORMAT(t.transaction_date,'%m-%Y') , MONTH(t.transaction_date) ORDER BY sortkey; |
When you add the ONLY_FULL_GROUP_BY
mode to your SQL_MODE
system variable, you disallow this type of behavior and the query would no longer work. It would return the following error:
ERROR 1055 (42000): 'sampledb.t.transaction_date' isn't in GROUP BY |
Complete Code Sample ↓
Expand this section to see the sample working code for indeterminate results.
This builds the GROUPING
table, inserts nine rows, and tests it before and after setting the variable in the session. The change reverts after you exit and re-enter the database.
-- Conditionally drop sample table. SELECT 'DROP TABLE IF EXISTS grouping' AS "Statement"; DROP TABLE IF EXISTS GROUPING; -- Create sample table. SELECT 'CREATE TABLE grouping' AS "Statement"; CREATE TABLE GROUPING ( key_one CHAR(5) , key_two CHAR(5) , counter INT UNSIGNED ); -- Insert nine rows into the sample table. SELECT 'INSERT INTO grouping' AS "Statement" INSERT INTO GROUPING VALUES ('One','Uno-1',1),('Two','Due-1',2),('Three','Tre-1',3) ('One','Uno-2',1),('Two','Due-2',2),('Three','Tre-2',3) ('One','Uno-3',1),('Two','Due-3',2),('Three','Tre-3',3); -- Query with hidden columns, no enforcement on non-aggregate columns. SELECT 'SELECT non-aggregates FROM grouping with hidden columns' AS "Statement"; SELECT key_one, key_two, SUM(counter) FROM GROUPING GROUP BY key_one; -- Check current system mode variable assignment. SELECT 'SELECT @@sql_mode' AS "Statement"; SELECT @@sql_mode; -- Append the mode to the existing system mode variable. SELECT 'SET SQL_MODE=(SELECT ...)' AS "Statement"; SET SQL_MODE=(SELECT CONCAT(@@sql_mode,',ONLY_FULL_GROUP_BY')); -- Check modified system mode variable assignment. SELECT 'SELECT @@sql_mode' AS "Statement"; SELECT @@sql_mode; |