postgresql - How to make this SQL query reasonably fast -

- August 15, 2011

i optimize (read: make feasible @ all) sql query.

the following postgresql query retrieves records need. (believe i) confirm running query on small subset of actual db.

select b.*, a1.foo, a1.bar, a2.foo, a2.bar b left join a1 on b.n1_id = a1.n_id left join a2 on b.n2_id = a2.n_id b.l_id in (     select l_id c          l_id in (             select l_id b                  n1_id in (select n_id foo between foo_min , foo_max , bar between bar_min , bar_max)             union             select l_id b                  n2_id in (select n_id foo between foo_min , foo_max , bar between bar_min , bar_max)             )              , (property1 = 'y' or property2 = 'y')     )

the relevant part of db looks follows:

table a: n_id (pk); foo, int (indexed); bar, int (indexed);  table b: l_id (pk); n1_id (fk, indexed); n2_id (fk, (indexed);  table c: l_id (pk, fk); property1, char (indexed); property2, char (indexed);

explain tells me this:

"merge join  (cost=6590667.27..10067376.97 rows=453419 width=136)" "  merge cond: (a2.n_id = b.n2_id)" "  ->  index scan using pk_a on a2  (cost=0.57..3220265.29 rows=99883648 width=38)" "  ->  sort  (cost=6590613.72..6591747.27 rows=453419 width=98)" "        sort key: b.n2_id" "        ->  merge join  (cost=3071304.25..6548013.91 rows=453419 width=98)" "              merge cond: (a1.n_id = b.n1_id)" "              ->  index scan using pk_a on a1  (cost=0.57..3220265.29 rows=99883648 width=38)" "              ->  sort  (cost=3071250.74..3072384.28 rows=453419 width=60)" "                    sort key: b.n1_id" "                    ->  hash semi join  (cost=32475.31..3028650.92 rows=453419 width=60)" "                          hash cond: (b.l_id = c.l_id)" "                          ->  seq scan on b b  (cost=0.00..2575104.04 rows=122360504 width=60)" "                          ->  hash  (cost=26807.58..26807.58 rows=453419 width=16)" "                                ->  nested loop  (cost=10617.22..26807.58 rows=453419 width=16)" "                                      ->  hashaggregate  (cost=10616.65..10635.46 rows=1881 width=8)" "                                            ->  append  (cost=4081.76..10611.95 rows=1881 width=8)" "                                                  ->  nested loop  (cost=4081.76..5383.92 rows=1078 width=8)" "                                                        ->  bitmap heap scan on  (cost=4081.19..4304.85 rows=56 width=8)" "                                                              recheck cond: ((bar >= bar_min) , (bar <= bar_max) , (foo >= foo_min) , (foo <= foo_max))" "                                                              ->  bitmapand  (cost=4081.19..4081.19 rows=56 width=0)" "                                                                    ->  bitmap index scan on a_bar_idx  (cost=0.00..740.99 rows=35242 width=0)" "                                                                          index cond: ((bar >= bar_min) , (bar <= bar_max))" "                                                                    ->  bitmap index scan on a_foo_idx  (cost=0.00..3339.93 rows=159136 width=0)" "                                                                          index cond: ((foo >= foo_min) , (foo <= foo_max))" "                                                        ->  index scan using nx_b_n1 on b  (cost=0.57..19.08 rows=19 width=16)" "                                                              index cond: (n1_id = a.n_id)" "                                                  ->  nested loop  (cost=4081.76..5209.22 rows=803 width=8)" "                                                        ->  bitmap heap scan on a_1  (cost=4081.19..4304.85 rows=56 width=8)" "                                                              recheck cond: ((bar >= bar_min) , (bar <= bar_max) , (foo >= foo_min) , (foo <= foo_max))" "                                                              ->  bitmapand  (cost=4081.19..4081.19 rows=56 width=0)" "                                                                    ->  bitmap index scan on a_bar_idx  (cost=0.00..740.99 rows=35242 width=0)" "                                                                          index cond: ((bar >= bar_min) , (bar <= bar_max))" "                                                                    ->  bitmap index scan on a_foo_idx  (cost=0.00..3339.93 rows=159136 width=0)" "                                                                          index cond: ((foo >= foo_min) , (foo <= foo_max))" "                                                        ->  index scan using nx_b_n2 on b b_1  (cost=0.57..16.01 rows=14 width=16)" "                                                              index cond: (n2_id = a_1.n_id)" "                                      ->  index scan using pk_c on c  (cost=0.57..8.58 rows=1 width=8)" "                                            index cond: (l_id = b.l_id)" "                                            filter: ((property1 = 'y'::bpchar) or (property2 = 'y'::bpchar))"

all 3 tables have millions of rows. cannot change table definitions. where l_id in ( select l_id b...union...) restrictive , returns < 100 results.

what can make query execute in reasonable amount of time (a few seconds max)?

edit: forgot select 2 columns in outermost select. should change question though.

update seems difficult question, possibly due lack of information on part. love give more information, data base properietary , confidential.

i can retrieve rows of b conditions reasonably fast (0.1 s) following query:

with relevant_a (     select *                      foo between foo_min , foo_max              ,             bar between bar_min , bar_max ) relevant_c (     select * c         l_id in (             select l_id b                 n1_id in (                     select n_id relevant_a                 )             union             select l_id b                 n2_id in (                     select n_id relevant_a                 )         )         ,         (property1 = 'y' or property2= 'y') ), relevant_b (     select * b l_id in (         select l_id relevant_c     ) )  select * relevant_b

the join part becomes slow. query returns < 100 records, why join make slow?. have ideas how make simple join faster? should not costly add 4 columns of information table.

or this:

select b.*, a1.foo, a2.bar  b       left join a1 on b.n1_id = a1.n_id       left join a2 on b.n2_id = a2.n_id       inner join c on (c.l_id = b.l_id)       a1.foo between a1.foo_min , a1.foo_max ,       a2.bar between a2.bar_min , a2.bar_max ,      b.foo between b.foo_min , b.foo_max ,       b.bar between b.bar_min , bar_max   ,      (c.property1 = 'y' or c.property2 = 'y')

Search This Blog

Sp

postgresql - How to make this SQL query reasonably fast -

Comments

Post a Comment

Popular posts from this blog

java - WrongTypeOfReturnValue exception thrown when unit testing using mockito -

c++11 - Intel compiler and "cannot have an in-class initializer" when using constexpr -

rest - Spring boot: Request method 'PUT' not supported -