SQL Handling Nulls
Interesting questions always come via my students. For example, “Why does the selective aggregation sample return null values as totals from the SUM()
function in MySQL?”
First, here’s the code to build the sample table for the problem:
DROP TABLE IF EXISTS transaction; CREATE TABLE transaction ( transaction_id int unsigned primary key auto_increment , transaction_date date , transaction_amount double ); INSERT INTO transaction ( transaction_date, transaction_amount ) VALUES ('2021-01-10', 56) ,('2021-02-14',23.02) ,('2021-03-31',31.06) ,('2021-01-01',.25) ,('2020-01-02', 52) ,('2020-02-08',22.02) ,('2020-03-26',32.06) ,('2020-01-12',.75);; |
Now, here’s the selective aggregation query:
SELECT EXTRACT(YEAR FROM transaction_date) AS "Year" , SUM( CASE WHEN EXTRACT(MONTH FROM transaction_date) = 1 THEN transaction_amount END) AS "Jan" , SUM( CASE WHEN EXTRACT(MONTH FROM transaction_date) = 2 THEN transaction_amount END) AS "Feb" , SUM( CASE WHEN EXTRACT(MONTH FROM transaction_date) = 3 THEN transaction_amount END) AS "Mar" , SUM( CASE WHEN EXTRACT(MONTH FROM transaction_date) IN (1,2,3) THEN transaction_amount END) AS "1FQ" FROM transaction WHERE transaction_date BETWEEN '2020-01-15' AND '2021-03-15' GROUP BY EXTRACT(YEAR FROM transaction_date) ORDER BY 1; |
It produces the following correct results (at least in response to the query above):
+------+-------+-------+-------+-------+ | Year | Jan | Feb | Mar | 1FQ | +------+-------+-------+-------+-------+ | 2020 | NULL | 22.02 | 32.06 | 54.08 | | 2021 | 56.25 | 23.02 | NULL | 79.27 | +------+-------+-------+-------+-------+ 2 rows in set (0.02 sec) |
Why do you get null values for January 2020 and March 2021? That’s because the query returns null values when the conditions in the SELECT
-list aren’t met for a row return. This happens:
- When a row is returned for the month of February or March a null is returned in the January column.
- When a row is returned for the month of January or March a null is returned in the February column.
- When a row is returned for the month of January or February a null is returned in the March column.
That means there needs to be an ELSE
clause in each of the CASE
statements that sets the return value to zero. For example, the following query includes the ELSE
clause and some nice formatting tricks:
SELECT EXTRACT(YEAR FROM transaction_date) AS "Year" , CONCAT('$',LPAD(FORMAT(SUM( CASE WHEN EXTRACT(MONTH FROM transaction_date) = 1 THEN transaction_amount ELSE 0 END),2),8,' ')) AS "Jan" , LPAD(FORMAT(SUM( CASE WHEN EXTRACT(MONTH FROM transaction_date) = 2 THEN transaction_amount ELSE 0 END),2),8,' ') AS "Feb" , LPAD(FORMAT(SUM( CASE WHEN EXTRACT(MONTH FROM transaction_date) = 3 THEN transaction_amount ELSE 0 END),2),8,' ') AS "Mar" , LPAD(FORMAT(SUM( CASE WHEN EXTRACT(MONTH FROM transaction_date) IN (1,2,3) THEN transaction_amount ELSE 0 END),2),8,' ') AS "1FQ" FROM transaction WHERE transaction_date BETWEEN '2020-01-15' AND '2021-03-15' GROUP BY EXTRACT(YEAR FROM transaction_date) ORDER BY 1; |
It returns:
+------+-----------+----------+----------+----------+ | Year | Jan | Feb | Mar | 1FQ | +------+-----------+----------+----------+----------+ | 2020 | $ 0.00 | 22.02 | 32.06 | 54.08 | | 2021 | $ 56.25 | 23.02 | 0.00 | 79.27 | +------+-----------+----------+----------+----------+ 2 rows in set (0.01 sec) |
As always, I hope this helps answer a question that somebody is trying to sort out.