sql - Find duplicates takes a long time

- June 15, 2015

I have the following table layout: 4 different tables, each containing around 10 to 15 million entries. 3 string attributes of each table are the same (let's call them id, name1, name2). We want to read the entries that have the same id column but different (name1, name2) tuples. We estimate that less than 0.5% of the entries match.

We've created a view allentries (basically a union of the relevant attributes of the 4 tables), and our query looks like this:

select id from allentries group by id having count(distinct name1) > 1 or count(distinct name2) > 1

Executing the query on our test database with 2 million entries in each table (i.e. 8 million entries in the view) takes around 2 to 3 minutes (on a nice server).

Q: Is there any way to improve the performance?

Try a CTE with row_number() instead of the traditional group by/having approach:
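To experiment with the setup at a small scale, it can be reproduced in miniature. The sketch below is only an illustration under assumptions: it uses SQLite through Python's sqlite3 module rather than the actual server, the table names t1 to t4 and all row values are hypothetical, and the view is built with UNION ALL (the post only says "basically a union").

```python
import sqlite3


def build_demo_db():
    """Create four tiny tables plus the allentries view (all data is made up)."""
    conn = sqlite3.connect(":memory:")
    for table in ("t1", "t2", "t3", "t4"):  # hypothetical table names
        conn.execute(f"CREATE TABLE {table} (id TEXT, name1 TEXT, name2 TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                     [("a", "x", "y"), ("b", "p", "q")])
    conn.executemany("INSERT INTO t2 VALUES (?, ?, ?)",
                     [("a", "x", "y"), ("b", "p", "CHANGED")])
    conn.executemany("INSERT INTO t3 VALUES (?, ?, ?)",
                     [("c", "m", "n")])
    conn.executemany("INSERT INTO t4 VALUES (?, ?, ?)",
                     [("c", "OTHER", "n"), ("d", "u", "v")])
    # UNION ALL keeps every row; the grouping query does the comparison.
    conn.execute("""
        CREATE VIEW allentries AS
        SELECT id, name1, name2 FROM t1
        UNION ALL SELECT id, name1, name2 FROM t2
        UNION ALL SELECT id, name1, name2 FROM t3
        UNION ALL SELECT id, name1, name2 FROM t4
    """)
    return conn


def conflicting_ids(conn):
    """Return ids that appear with more than one name1 or name2 value."""
    rows = conn.execute("""
        SELECT id
        FROM allentries
        GROUP BY id
        HAVING COUNT(DISTINCT name1) > 1
            OR COUNT(DISTINCT name2) > 1
        ORDER BY id
    """).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(conflicting_ids(build_demo_db()))  # prints ['b', 'c']
```

Here id a appears twice with identical names and id d appears once, so only b and c are reported. Timings on such a toy database say nothing about the 8 million row case; the sketch is only meant for checking query semantics.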

;with ctedups as
(
    select  *
            ,row_number() over(partition by name1 order by id) as rn1
            ,row_number() over(partition by name2 order by id) as rn2
    from    allentries
)
select  *
from    ctedups
where   rn1 > 1
    or  rn2 > 1
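The query above partitions by name1 and name2. If the goal is the one stated in the question (rows that share an id but differ in name1 or name2), a window-function variant could partition by id instead and compare the minimum and maximum of each name column. The following is a minimal sketch of that idea, not the answer's own method: it runs against SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer), allentries is a plain table here rather than a view, and the sample rows are invented. Whether it is faster than group by/having on the real data would have to be measured.

```python
import sqlite3

# If MIN and MAX of a column differ within one id, that id has
# at least two distinct values in that column.
QUERY = """
WITH flagged AS (
    SELECT id, name1, name2,
           MIN(name1) OVER (PARTITION BY id) AS min1,
           MAX(name1) OVER (PARTITION BY id) AS max1,
           MIN(name2) OVER (PARTITION BY id) AS min2,
           MAX(name2) OVER (PARTITION BY id) AS max2
    FROM allentries
)
SELECT id, name1, name2
FROM flagged
WHERE min1 <> max1
   OR min2 <> max2
ORDER BY id, name1, name2
"""


def conflicting_rows(rows):
    """Load rows into an in-memory allentries table and return the full
    rows whose id occurs with differing name1 or name2 values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE allentries (id TEXT, name1 TEXT, name2 TEXT)")
    conn.executemany("INSERT INTO allentries VALUES (?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    sample = [
        ("a", "x", "y"), ("a", "x", "y"),   # same id, identical names
        ("b", "p", "q"), ("b", "p", "z"),   # same id, name2 differs
        ("c", "m", "n"),                    # id occurs only once
    ]
    for row in conflicting_rows(sample):
        print(row)
```

Unlike the group by/having form, this returns the complete conflicting rows in one pass, which may be convenient when the differing names themselves are needed.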
