I recently had to fix someone else’s data before I could analyze it. They had a slowly changing dimension where they had duplicate and overlapping records, so a join using a single point in time resulted in an accidental cartesian join.
This is annoying but easy to fix. The following example query will merge the duplicate and overlapping records. Just run it continuously until it doesn’t return any results, as it only merges adjacent rows. If you have multiple records in a set that overlap, it’ll only handle the first one in each set.
This uses window functions and the OVERLAPS() function, so hopefully you’re using PostgreSQL.
BEGIN; CREATE TEMPORARY TABLE newrows AS SELECT DISTINCT LEAST(a.id, b.id) AS id , a.company_id , a.office_id , a.plan_type , LEAST(a.date_start, b.date_start) AS date_start , GREATEST(a.date_end, b.date_end) AS date_end , a.is_removed FROM subscriptions a , subscriptions b WHERE a.id <> b.id AND a.company_id = b.company_id AND a.office_id = b.office_id AND a.plan_type = b.plan_type AND a.is_removed = b.is_removed AND (a.date_start, a.date_end) OVERLAPS (b.date_start, b.date_end); DELETE FROM subscriptions a USING subscriptions b WHERE a.id <> b.id AND a.company_id = b.company_id AND a.office_id = b.office_id AND a.plan_type = b.plan_type AND a.is_removed = b.is_removed AND (a.date_start, a.date_end) OVERLAPS (b.date_start, b.date_end); INSERT INTO subscriptions SELECT * FROM newrows; SELECT * FROM newrows; DROP TABLE newrows; END;