window function – BigDataMark

Redshift has a count() window function, but it doesn’t support counting distinct items. However, one can still count distinct items in a window by using another method.

The following table represents records with a measure of various items per day:

date	item	measure
2015-01-01	table	12
2015-01-01	chair	51
2015-01-01	lamp	8
2015-01-02	table	17
2015-01-02	chair	72
2015-01-02	lamp	23
2015-01-02	bed	1
2015-01-02	dresser	2
2015-01-03	bed	1

To get all of the data plus the number of distinct items measured per day, we would want to use this window function:

SELECT *
, COUNT(DISTINCT item) OVER(PARTITION BY DATE) AS distinct_count
FROM mytable;

However, this doesn’t work, because as of the time I’m writing this article, the Redshift documentation says “ALL is the default. DISTINCT is not supported.”

Instead, one will need to use the DENSE_RANK() function to get this count, with the item to be counted in the ORDER BY window clause.

SELECT *
, DENSE_RANK() OVER(PARTITION BY date ORDER BY item) AS distinct_count
FROM mytable;

date	item	measure	distinct_count
2015-01-01	table	12	3
2015-01-01	chair	51	3
2015-01-01	lamp	8	3
2015-01-02	table	17	5
2015-01-02	chair	72	5
2015-01-02	lamp	23	5
2015-01-02	bed	1	5
2015-01-02	dresser	2	5
2015-01-03	bed	1	1

Update: I’m told that this no longer works. It must have been an undocumented feature (bug). You now need to do another query on top of this result set.

Tag: window function

Count distinct window function in Redshift

Share this: