This how-to post will provide a tutorial on adding a subtotal row above your results using SQL with a Cartesian join and window functions.
Here is our data:
product | num_sold |
---|---|
a | 1 |
a | 3 |
a | 5 |
b | 2 |
b | 4 |
b | 8 |
Here is the output that we want:
product | num_sold |
---|---|
total for a | 9 |
a | 1 |
a | 3 |
a | 5 |
total for b | 14 |
b | 2 |
b | 4 |
b | 8 |
One way to do an insert in SQL would be to use table generating functions, like those built into Hive. However, many SQL databases don’t have any of these functions built in. Amazon Redshift is unlikely to ever support explode() or any similar functions.
The natural way to insert rows though, is to use standard SQL and a Cartesian join. Although all joins in SQL are Cartesian joins, the term is often reserved for a join that utilizes a Cartesian product to get a set that is larger than any of the sets being joined.
Often, a Cartesian join is unintended, and can cause duplicate rows if the join predicate matches multiple rows because the key being joined on is not unique.
To do a Cartesian join to add rows above your results, you would create a helper table like this:
uniq_id | row_count |
---|---|
0 | 1 |
1 | 1 |
Then use SQL like this:
SELECT CASE WHEN b.uniq_id = 0 THEN 'total for ' ||a.product ELSE a.product END AS product, CASE WHEN b.uniq_id = 0 THEN a.total ELSE a.num_sold END AS num_sold FROM (SELECT *, SUM(1) OVER (PARTITION BY product ORDER BY product ROWS UNBOUNDED PRECEDING) AS row_count, SUM(num_sold) OVER (PARTITION BY product ORDER BY product ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS total FROM product_sold_table) a LEFT JOIN helper_table b ON (a.row_count = b.row_count) ORDER BY a.product, COALESCE(b.uniq_id,1), a.num_sold