MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

4 – GROUP BY Clause

without comments

GROUP BY Clause

Learning Outcomes

  • Learn how to group non-aggregated and aggregated columns.
  • Learn how to use ordinary columns in the GROUP BY clause.
  • Learn how to use formula columns in the GROUP BY clause.

Lesson Materials

As explained in the SELECT statement page, the minimum SELECT statement has a GROUP BY clause e. The GROUP BY clause is a comma delimited set of literals, column names, calculations of literals and column values that match those non-aggregated values in the SELECT-list. The GROUP BY clause differs from the SELECT-list because the comma delimited values can’t have aliases.

SELECT-list

The SELECT-list may include non-aggregated and aggregated columns. The non-aggregated columns may be literals, calculations of literals and columns. Aggregated columns use functions like these:

  • AVG
  • COUNT
  • MAX
  • MIN
  • SUM
  • STD

Aggregating functions add the values together or apply mathematical formula across a series of rows. They typically reduce the number of rows to a smaller set of rows. The GROUP BY clause determines how the aggregating functions will reduce the number of rows.

Alternatively, you can have a SELECT-list that contains one or a comma delimited set of aggregating functions without any non-aggregating elements and it will return one row. You get one row because the aggregating functions work across the entire set of rows returned and return an aggregated result or results in the single row for all of the rows returned by the query.

SELECT-list with only aggregated functions

The most common approach to a SELECT statement that uses only one or more aggregated functions is a query that counts the number of rows in a table. This type of query doesn’t require a GROUP BY clause occurs when you want to count the number of rows in a table or result set from two or more tables. You can query the number of rows in the cereal table with this SELECT statement:

SELECT COUNT(*)
FROM   cereal;

It returns the following:

  COUNT(*)
----------
	70

SELECT-list with a GROUP BY clause

A SELECT statement that has a combination of non-aggregated and aggregated function columns requires that you add a GROUP BY clause. You should note that its possible a GROUP BY may allow a query to work while failing to reduce the number of rows returned.

A query fails to reduce the number of rows when the list of non-aggregated columns are unique. A unique set of non-aggregated columns can’t be reduced by an aggregating function because the reduction is set by the non-aggregated columns.

The following query uses a non-aggregated set of columns that are unique, which means that the aggregating function will not reduce the number of rows returned by the query. The query builds on concepts introduced in the SELECT clause web page and it uses the same cereal table:

1
2
3
4
5
6
7
8
9
COL brand          FORMAT A30  HEADING "Brand"
COL ounces         FORMAT A10  HEADING "Ounces"
COL avg_per_ounce  FORMAT A10  HEADING "Average|Cost|Ounces"
SELECT   brand
,        LPAD(TO_CHAR(ounces,'00.00'),10,' ') AS ounces
,        LPAD(TO_CHAR(AVG(suggested_retail/ounces),'$0.00'),10,' ') AS avg_per_ounce
FROM     cereal
GROUP BY brand
,        LPAD(TO_CHAR(ounces,'00.00'),10,' ');

It returns the following sixty-eight rows:

					  Average
					  Cost
Brand			       Ounces	  Ounces
------------------------------ ---------- ----------
Strawberry Toast Crunch 	    18.25      $0.27
Lucky Charm			    16.00      $0.25
Cinnamon Vanilla Lucky Charm	    11.50      $0.22
...
Blueberry Rice Chex		    12.00      $0.21
Blueberry Rice Chex		    18.00      $0.22
Raisin Brand Crunch		    38.25      $0.12
Cheerios			    12.00      $0.23

The next example removes one of the non-aggregated columns from the query, which makes the non-aggregated brand column non-unique and enables the aggregating function to reduce the number of rows:

1
2
3
4
5
6
COL brand          FORMAT A30  HEADING "Brand"
COL avg_per_ounce  FORMAT A10  HEADING "Average|Cost|Ounces"
SELECT   brand
,        LPAD(TO_CHAR(AVG(suggested_retail/ounces),'$0.00'),10,' ') AS avg_per_ounce
FROM     cereal
GROUP BY brand;

It returns the following thirty-six rows from the original sixty-eight rows:

			       Average
			       Cost
Brand			       Ounces
------------------------------ ----------
Blueberry Toast Crunch		    $0.27
Apple Jacks			    $0.16
Honey Nut Cheerios		    $0.21
...
Chocolate Frosted Flakes	    $0.20
Trix				    $0.24
Rice Chex			    $0.22
Blueberry Rice Chex		    $0.22

You have options that allow you to filter aggregated sets with the HAVING clause.

Written by maclochlainn

December 24th, 2017 at 10:27 am