postgresql - How to make this SQL query reasonably fast -
i optimize (read: make feasible @ all) sql query.
the following postgresql query retrieves records need. (believe i) confirm running query on small subset of actual db.
select b.*, a1.foo, a1.bar, a2.foo, a2.bar b left join a1 on b.n1_id = a1.n_id left join a2 on b.n2_id = a2.n_id b.l_id in ( select l_id c l_id in ( select l_id b n1_id in (select n_id foo between foo_min , foo_max , bar between bar_min , bar_max) union select l_id b n2_id in (select n_id foo between foo_min , foo_max , bar between bar_min , bar_max) ) , (property1 = 'y' or property2 = 'y') )
the relevant part of db looks follows:
table a: n_id (pk); foo, int (indexed); bar, int (indexed); table b: l_id (pk); n1_id (fk, indexed); n2_id (fk, (indexed); table c: l_id (pk, fk); property1, char (indexed); property2, char (indexed);
explain
tells me this:
"merge join (cost=6590667.27..10067376.97 rows=453419 width=136)" " merge cond: (a2.n_id = b.n2_id)" " -> index scan using pk_a on a2 (cost=0.57..3220265.29 rows=99883648 width=38)" " -> sort (cost=6590613.72..6591747.27 rows=453419 width=98)" " sort key: b.n2_id" " -> merge join (cost=3071304.25..6548013.91 rows=453419 width=98)" " merge cond: (a1.n_id = b.n1_id)" " -> index scan using pk_a on a1 (cost=0.57..3220265.29 rows=99883648 width=38)" " -> sort (cost=3071250.74..3072384.28 rows=453419 width=60)" " sort key: b.n1_id" " -> hash semi join (cost=32475.31..3028650.92 rows=453419 width=60)" " hash cond: (b.l_id = c.l_id)" " -> seq scan on b b (cost=0.00..2575104.04 rows=122360504 width=60)" " -> hash (cost=26807.58..26807.58 rows=453419 width=16)" " -> nested loop (cost=10617.22..26807.58 rows=453419 width=16)" " -> hashaggregate (cost=10616.65..10635.46 rows=1881 width=8)" " -> append (cost=4081.76..10611.95 rows=1881 width=8)" " -> nested loop (cost=4081.76..5383.92 rows=1078 width=8)" " -> bitmap heap scan on (cost=4081.19..4304.85 rows=56 width=8)" " recheck cond: ((bar >= bar_min) , (bar <= bar_max) , (foo >= foo_min) , (foo <= foo_max))" " -> bitmapand (cost=4081.19..4081.19 rows=56 width=0)" " -> bitmap index scan on a_bar_idx (cost=0.00..740.99 rows=35242 width=0)" " index cond: ((bar >= bar_min) , (bar <= bar_max))" " -> bitmap index scan on a_foo_idx (cost=0.00..3339.93 rows=159136 width=0)" " index cond: ((foo >= foo_min) , (foo <= foo_max))" " -> index scan using nx_b_n1 on b (cost=0.57..19.08 rows=19 width=16)" " index cond: (n1_id = a.n_id)" " -> nested loop (cost=4081.76..5209.22 rows=803 width=8)" " -> bitmap heap scan on a_1 (cost=4081.19..4304.85 rows=56 width=8)" " recheck cond: ((bar >= bar_min) , (bar <= bar_max) , (foo >= foo_min) , (foo <= foo_max))" " -> bitmapand (cost=4081.19..4081.19 rows=56 width=0)" " -> bitmap index scan on a_bar_idx (cost=0.00..740.99 rows=35242 width=0)" " index cond: ((bar >= bar_min) , (bar <= bar_max))" " -> bitmap index scan on a_foo_idx (cost=0.00..3339.93 rows=159136 width=0)" " index cond: ((foo >= foo_min) , (foo <= foo_max))" " -> index scan using nx_b_n2 on b b_1 (cost=0.57..16.01 rows=14 width=16)" " index cond: (n2_id = a_1.n_id)" " -> index scan using pk_c on c (cost=0.57..8.58 rows=1 width=8)" " index cond: (l_id = b.l_id)" " filter: ((property1 = 'y'::bpchar) or (property2 = 'y'::bpchar))"
all 3 tables have millions of rows. cannot change table definitions. where l_id in ( select l_id b...union...)
restrictive , returns < 100 results.
what can make query execute in reasonable amount of time (a few seconds max)?
edit: forgot select 2 columns in outermost select
. should change question though.
update seems difficult question, possibly due lack of information on part. love give more information, data base properietary , confidential.
i can retrieve rows of b conditions reasonably fast (0.1 s) following query:
with relevant_a ( select * foo between foo_min , foo_max , bar between bar_min , bar_max ) relevant_c ( select * c l_id in ( select l_id b n1_id in ( select n_id relevant_a ) union select l_id b n2_id in ( select n_id relevant_a ) ) , (property1 = 'y' or property2= 'y') ), relevant_b ( select * b l_id in ( select l_id relevant_c ) ) select * relevant_b
the join part becomes slow. query returns < 100 records, why join make slow?. have ideas how make simple join faster? should not costly add 4 columns of information table.
or this:
select b.*, a1.foo, a2.bar b left join a1 on b.n1_id = a1.n_id left join a2 on b.n2_id = a2.n_id inner join c on (c.l_id = b.l_id) a1.foo between a1.foo_min , a1.foo_max , a2.bar between a2.bar_min , a2.bar_max , b.foo between b.foo_min , b.foo_max , b.bar between b.bar_min , bar_max , (c.property1 = 'y' or c.property2 = 'y')
Comments
Post a Comment