MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

SQL Handling Nulls

without comments

Interesting questions always come via my students. For example, “Why does the selective aggregation sample return null values as totals from the SUM() function in MySQL?”

First, here’s the code to build the sample table for the problem:

DROP TABLE IF EXISTS transaction;
CREATE TABLE transaction
( transaction_id      int unsigned primary key auto_increment
, transaction_date    date
, transaction_amount  double );
 
INSERT INTO transaction
( transaction_date, transaction_amount )
VALUES
 ('2021-01-10', 56)
,('2021-02-14',23.02)
,('2021-03-31',31.06)
,('2021-01-01',.25)
,('2020-01-02', 52)
,('2020-02-08',22.02)
,('2020-03-26',32.06)
,('2020-01-12',.75);;

Now, here’s the selective aggregation query:

SELECT   EXTRACT(YEAR FROM transaction_date) AS "Year"
,        SUM(
           CASE
             WHEN EXTRACT(MONTH FROM transaction_date) = 1 THEN transaction_amount
            END) AS "Jan"
,        SUM(
           CASE
             WHEN EXTRACT(MONTH FROM transaction_date) = 2 THEN transaction_amount
           END) AS "Feb"
,        SUM(
           CASE
             WHEN EXTRACT(MONTH FROM transaction_date) = 3 THEN transaction_amount
           END) AS "Mar"
,        SUM(
           CASE
             WHEN EXTRACT(MONTH FROM transaction_date) IN (1,2,3) THEN transaction_amount
           END) AS "1FQ"
FROM     transaction
WHERE    transaction_date BETWEEN '2020-01-15' AND '2021-03-15'
GROUP BY EXTRACT(YEAR FROM transaction_date)
ORDER BY 1;

It produces the following correct results (at least in response to the query above):

+------+-------+-------+-------+-------+
| Year | Jan   | Feb   | Mar   | 1FQ   |
+------+-------+-------+-------+-------+
| 2020 |  NULL | 22.02 | 32.06 | 54.08 |
| 2021 | 56.25 | 23.02 |  NULL | 79.27 |
+------+-------+-------+-------+-------+
2 rows in set (0.02 sec)

Why do you get null values for January 2020 and March 2021? That’s because the query returns null values when the conditions in the SELECT-list aren’t met for a row return. This happens:

  • When a row is returned for the month of February or March a null is returned in the January column.
  • When a row is returned for the month of January or March a null is returned in the February column.
  • When a row is returned for the month of January or February a null is returned in the March column.

That means there needs to be an ELSE clause in each of the CASE statements that sets the return value to zero. For example, the following query includes the ELSE clause and some nice formatting tricks:

SELECT   EXTRACT(YEAR FROM transaction_date) AS "Year"
,        CONCAT('$',LPAD(FORMAT(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM transaction_date) = 1 THEN transaction_amount
             ELSE 0
            END),2),8,' ')) AS "Jan"
,        LPAD(FORMAT(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM transaction_date) = 2 THEN transaction_amount
             ELSE 0
           END),2),8,' ') AS "Feb"
,        LPAD(FORMAT(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM transaction_date) = 3 THEN transaction_amount
             ELSE 0
           END),2),8,' ') AS "Mar"
,        LPAD(FORMAT(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM transaction_date) IN (1,2,3) THEN transaction_amount
             ELSE 0
           END),2),8,' ') AS "1FQ"
FROM     transaction
WHERE    transaction_date BETWEEN '2020-01-15' AND '2021-03-15'
GROUP BY EXTRACT(YEAR FROM transaction_date)
ORDER BY 1;

It returns:

+------+-----------+----------+----------+----------+
| Year | Jan       | Feb      | Mar      | 1FQ      |
+------+-----------+----------+----------+----------+
| 2020 | $    0.00 |    22.02 |    32.06 |    54.08 |
| 2021 | $   56.25 |    23.02 |     0.00 |    79.27 |
+------+-----------+----------+----------+----------+
2 rows in set (0.01 sec)

As always, I hope this helps answer a question that somebody is trying to sort out.

Written by maclochlainn

June 23rd, 2021 at 12:23 pm