Count distinct window function in Redshift

Redshift has a count() window function, but it doesn’t support counting distinct items. However, one can still count distinct items in a window by using another method.

The following table represents records with a measure of various items per day:

date	item	measure
2015-01-01	table	12
2015-01-01	chair	51
2015-01-01	lamp	8
2015-01-02	table	17
2015-01-02	chair	72
2015-01-02	lamp	23
2015-01-02	bed	1
2015-01-02	dresser	2
2015-01-03	bed	1

To get all of the data plus the number of distinct items measured per day, we would want to use this window function:

SELECT *
, COUNT(DISTINCT item) OVER(PARTITION BY DATE) AS distinct_count
FROM mytable;

However, this doesn’t work, because as of the time I’m writing this article, the Redshift documentation says “ALL is the default. DISTINCT is not supported.”

Instead, one will need to use the DENSE_RANK() function to get this count, with the item to be counted in the ORDER BY window clause.

SELECT *
, DENSE_RANK() OVER(PARTITION BY date ORDER BY item) AS distinct_count
FROM mytable;

date	item	measure	distinct_count
2015-01-01	table	12	3
2015-01-01	chair	51	3
2015-01-01	lamp	8	3
2015-01-02	table	17	5
2015-01-02	chair	72	5
2015-01-02	lamp	23	5
2015-01-02	bed	1	5
2015-01-02	dresser	2	5
2015-01-03	bed	1	1

Update: I’m told that this no longer works. It must have been an undocumented feature (bug). You now need to do another query on top of this result set.

2 Replies to “Count distinct window function in Redshift”

Nice workaround. However I think you could use a better example by having duplicate items on a particular date. With the current example, regular count(*) window function would work as well.

Wow — thanks! I’ve been trying to figure out a workaround all day :-)

Share this:

2 Replies to “Count distinct window function in Redshift”

Leave a Reply