sql - Finding duplicates takes a long time
I have the following table layout: 4 different tables, each containing around 10 to 15 million entries. Three string attributes are the same in each table (let's call them id, name1 and name2). I want to read all entries that have the same id but different (name1, name2) tuples. We estimate that less than 0.5% of the entries match.
We've created a view allentries (basically a union of the relevant attributes of the 4 tables), and our query looks like this:
    SELECT id
    FROM allentries
    GROUP BY id
    HAVING COUNT(DISTINCT name1) > 1 OR COUNT(DISTINCT name2) > 1
Executing this query on our test database, with 2 million entries in each table (i.e. 8 million entries in the view), takes around 2 to 3 minutes (on a nice server).
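For anyone wanting to reproduce the setup at small scale, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table names t1 to t4 and the sample rows are made up; only the column names id, name1, name2 and the view name allentries come from the question. The view is built with UNION ALL: since the query only counts distinct names per id, duplicate rows across tables don't change its result, and UNION ALL skips the duplicate-removal pass that a plain UNION performs over the whole combined row set. If the real view uses plain UNION, that difference may be worth measuring on the 8-million-row test data.

```python
import sqlite3


def build_db():
    """Four hypothetical tables t1..t4 plus an allentries view over them."""
    conn = sqlite3.connect(":memory:")
    data = {
        # id 'a' agrees everywhere; 'b' has two name1 values;
        # 'c' has two name2 values; 'd' occurs only once.
        "t1": [("a", "x", "y"), ("b", "p", "q"), ("c", "m", "n")],
        "t2": [("a", "x", "y"), ("b", "p2", "q")],
        "t3": [("c", "m", "n2")],
        "t4": [("d", "u", "v")],
    }
    for table, rows in data.items():
        conn.execute(f"CREATE TABLE {table} (id TEXT, name1 TEXT, name2 TEXT)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    # UNION ALL keeps exact duplicates (e.g. the two ('a','x','y') rows);
    # the COUNT(DISTINCT ...) checks below are unaffected by them.
    conn.execute(
        "CREATE VIEW allentries AS "
        + " UNION ALL ".join(f"SELECT id, name1, name2 FROM {t}" for t in data)
    )
    return conn


def conflicting_ids(conn):
    """Ids whose rows do not all share one (name1, name2) pair."""
    sql = """
        SELECT id
        FROM allentries
        GROUP BY id
        HAVING COUNT(DISTINCT name1) > 1 OR COUNT(DISTINCT name2) > 1
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    print(conflicting_ids(build_db()))  # ['b', 'c']
```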
Q: Is there any way to improve the performance of this query?
Try a CTE with ROW_NUMBER() instead of the traditional GROUP BY/HAVING approach:
    ;WITH ctedups AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY name1 ORDER BY id) AS rn1,
               ROW_NUMBER() OVER (PARTITION BY name2 ORDER BY id) AS rn2
        FROM allentries
    )
    SELECT * FROM ctedups WHERE rn1 > 1 OR rn2 > 1
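A quick way to see what the CTE flags is to run it on a few rows. The sketch below uses Python's sqlite3 module (SQLite 3.25 or newer is needed for window functions) with a plain table standing in for the allentries view; the sample rows are made up. Because the windows are partitioned by name1 and name2, rn1 > 1 or rn2 > 1 marks rows whose name value already occurred elsewhere, regardless of id, which is not the same condition as "same id, different names". For comparison, the sketch also includes a swapped-in variant that is not part of the answer: it partitions by id and uses DENSE_RANK(), so a rank above 1 within an id means that id carries at least two distinct name1 (or name2) values.

```python
import sqlite3

# Hypothetical rows (id, name1, name2); a plain table stands in for the view.
ROWS = [
    ("a", "x", "y"),
    ("a", "x", "y"),   # same id, same names: no conflict
    ("b", "p", "q"),
    ("b", "p2", "q"),  # same id, different name1: a conflict
    ("c", "x", "n"),   # only row for 'c', but reuses name1 'x' from id 'a'
]

# The answer's CTE (leading ';' dropped: sqlite3 executes a single statement),
# reduced to the distinct ids it flags.
ANSWER_SQL = """
WITH ctedups AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY name1 ORDER BY id) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY name2 ORDER BY id) AS rn2
    FROM allentries
)
SELECT DISTINCT id FROM ctedups WHERE rn1 > 1 OR rn2 > 1 ORDER BY id
"""

# Swapped-in variant (not from the answer): rank names within each id.
# A dense rank above 1 means the id has at least two distinct values.
# Caveat: NULL names get a rank, whereas COUNT(DISTINCT ...) ignores them.
PER_ID_SQL = """
WITH ranked AS (
    SELECT id,
           DENSE_RANK() OVER (PARTITION BY id ORDER BY name1) AS dr1,
           DENSE_RANK() OVER (PARTITION BY id ORDER BY name2) AS dr2
    FROM allentries
)
SELECT DISTINCT id FROM ranked WHERE dr1 > 1 OR dr2 > 1 ORDER BY id
"""

# The question's GROUP BY/HAVING query, for reference.
GROUP_BY_SQL = """
SELECT id FROM allentries
GROUP BY id
HAVING COUNT(DISTINCT name1) > 1 OR COUNT(DISTINCT name2) > 1
ORDER BY id
"""


def flagged_ids(sql):
    """Run sql against a fresh in-memory copy of ROWS; return the id column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE allentries (id TEXT, name1 TEXT, name2 TEXT)")
    conn.executemany("INSERT INTO allentries VALUES (?, ?, ?)", ROWS)
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    print(flagged_ids(ANSWER_SQL))    # ['a', 'b', 'c']
    print(flagged_ids(PER_ID_SQL))    # ['b']
    print(flagged_ids(GROUP_BY_SQL))  # ['b']
```

On this data the per-id variant agrees with the original GROUP BY/HAVING query, while the name-partitioned CTE also returns ids 'a' and 'c', whose rows never disagree with each other. Whether either window-function form is faster than GROUP BY/HAVING on the real 8 million rows would still need to be measured.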