GROUP BY Quirk
It’s always interesting to see how others teach SQL courses. It can be revealing as to whether they understand SQL or only understand a dialect of SQL. In this case, one of my old students was taking a graduate course in SQL and the teacher was using MySQL. The teacher made an issue of using ANSI SQL:1999 or SQL3 and asked the following question, which I suspect is a quiz bank question from a textbook:
“How would you get all students’ names and for each student the number of courses that the
student has registered for?”
They referenced the MySQL 5.7 documentation for the GROUP BY and SQL:1999 as if MySQL implemented the ANSI SQL:1999 specification defined the standard. I didn’t know whether to laugh or cry because they were referring to MySQL 5.7 when we’re all using MySQL 8 and anybody who’s worked in more than MySQL knows that the behavior for a GROUP BY in MySQL can work without listing the necessary non-aggregated columns in the SELECT-list.
For example, their working solution, which is from the instructor and the author of their MySQL textbook the correct perspective of ANSI:1999 behavior. It doesn’t matter that their solution is actually based on ANSI:1992 not ANSI:1999 because it will only succeed because of a quirk of MySQL:
SELECT a.studentname , COUNT(b.courseid) FROM students a INNER JOIN registeredcourses b ON a.studentid = b.studentid GROUP BY a.studentid; |
While it works in MySQL, it doesn’t work because it conforms to an ANSI standard. It works in MySQL, notwithstanding that standard because it violates the standard.
In Oracle, PostgreSQL, and SQL Server, it raises an exception. For example, Oracle raises the following exception:
SELECT a.studentname * ERROR at line 1: ORA-00979: not a GROUP BY expression |
The correct way to write the GROUP BY is:
SELECT a.studentname , COUNT(b.courseid) FROM students a INNER JOIN registeredcourses b ON a.studentid = b.studentid INNER JOIN courses c ON b.courseid = c.courseid GROUP BY a.studentname; |
Then, it would return:
Student Name Course IDs ------------------------------ ---------- Montgomery Scott 1 Leonard McCoy 2 James Tiberus Kirk 3 |
For reference, here’s a complete test case for MySQL:
/* Drop table conditionally. */ DROP TABLE IF EXISTS students; /* Create table. */ CREATE TABLE students ( studentID int unsigned primary key auto_increment , studentName varchar(30)); /* Drop table conditionally. */ DROP TABLE IF EXISTS courses; /* Create table. */ CREATE TABLE courses ( courseid int unsigned primary key auto_increment , coursename varchar(40)); /* Drop table conditionally. */ DROP TABLE IF EXISTS registeredcourses; /* Create table. */ CREATE TABLE registeredcourses ( courseid int unsigned , studentid int unsigned ); /* Insert into students. */ INSERT INTO students ( studentName ) VALUES ('James Tiberus Kirk') ,('Leonard McCoy') ,('Montgomery Scott'); /* Insert into courses. */ INSERT INTO courses ( coursename ) VALUES ('English Literature') ,('Physics') ,('English Composition') ,('Botany') ,('Mechanical Engineering'); /* Insert into registeredcourses. */ INSERT INTO registeredcourses ( studentid , courseid ) VALUES (1,1) ,(1,3) ,(1,4) ,(2,2) ,(2,5) ,(3,4); /* Check global sql_mode to ensure only_full_group_by is set. */ SELECT @@GLOBAL.SQL_MODE; /* Query with a column not found in the SELECT-list. */ SELECT a.studentname , COUNT(b.courseid) FROM students a INNER JOIN registeredcourses b ON a.studentid = b.studentid GROUP BY a.studentid; /* Query consistent with ANSI SQL:1992 */ SELECT a.studentname , COUNT(b.courseid) FROM students a INNER JOIN registeredcourses b ON a.studentid = b.studentid INNER JOIN courses c ON b.courseid = c.courseid GROUP BY a.studentname; |
and, another complete test case for Oracle:
/* Drop tabhe unconditionallly. */ DROP TABLE students; /* Create table. */ CREATE TABLE students ( studentID NUMBER PRIMARY KEY , studentName VARCHAR(30)); /* Drop table unconditionally. */ DROP TABLE courses; /* Create table. */ CREATE TABLE courses ( courseid NUMBER PRIMARY KEY , coursename VARCHAR(40)); /* Drop table unconditionally. */ DROP TABLE registeredcourses; /* Create table. */ CREATE TABLE registeredcourses ( courseid NUMBER , studentid NUMBER ); /* Insert values in student. */ INSERT INTO students ( studentid, studentName ) VALUES (1,'James Tiberus Kirk'); INSERT INTO students ( studentid, studentName ) VALUES (2,'Leonard McCoy'); INSERT INTO students ( studentid, studentName ) VALUES (3,'Montgomery Scott'); /* Insert values in courses. */ INSERT INTO courses ( courseid, coursename ) VALUES (1,'English Literature'); INSERT INTO courses ( courseid, coursename ) VALUES (2,'Physics'); INSERT INTO courses ( courseid, coursename ) VALUES (3,'English Composition'); INSERT INTO courses ( courseid, coursename ) VALUES (4,'Botany'); INSERT INTO courses ( courseid, coursename ) VALUES (5,'Mechanical Engineering'); /* Insert values into registeredcourses. */ INSERT INTO registeredcourses ( studentid, courseid ) VALUES (1,1); INSERT INTO registeredcourses ( studentid, courseid ) VALUES (1,3); INSERT INTO registeredcourses ( studentid, courseid ) VALUES (1,4); INSERT INTO registeredcourses ( studentid, courseid ) VALUES (2,2); INSERT INTO registeredcourses ( studentid, courseid ) VALUES (2,5); INSERT INTO registeredcourses ( studentid, courseid ) VALUES (3,4); /* Non-ANSI SQL GROUP BY statement. */ SELECT a.studentname , COUNT(b.courseid) FROM students a INNER JOIN registeredcourses b ON a.studentid = b.studentid GROUP BY a.studentid; /* ANSI SQL GROUP BY statement. */ SELECT a.studentname AS "Student Name" , COUNT(b.courseid) AS "Course IDs" FROM students a INNER JOIN registeredcourses b ON a.studentid = b.studentid INNER JOIN courses c ON b.courseid = c.courseid GROUP BY a.studentname; |
I hope this helps those learning the correct way to write SQL.