Интересный пример работы оптимизатора
Как обычно имеется 2 запроса(отличающиеся только входными параметрами) , с хорошим и не очень временем выполнения.
Запрос 1(быстрый):
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
SELECT "PC_NAME","CUSTOMER_PARTNUMBER","LOTID","REPORTED_VALUE","SHIPMENT_NAME",s."SHIPMENT_DATE", "REPORT_NAME",p."SOITECPARAM_NAME",
"CUSTOMERPARAM_NAME" || ' ' || "TYPE_CALCUL" as "CUSTOMERPARAM_NAME", "SOI_GROUP_PARAMETER", "LOWER_SPEC_LIMIT","UPPER_SPEC_LIMIT"
FROM lot_data l, shipment s, parameters p, shipment_pc sp, pc pc, report_parameters_groups gr
WHERE s."SHIPMENT_ID" = l."SHIPMENT_ID" and p."PARAMETER_ID" = l."PARAMETER_ID" and p."SOITECPARAM_NAME" = gr."SOITECPARAM_NAME" and
s."SHIPMENT_ID" = sp."SHIPMENT_ID" and pc."PC_ID" = sp."PC_ID" and s."SHIPMENT_DATE" >= '2018-11-16' and s."SHIPMENT_DATE" <= '2019-05-15' and pc."PC_NAME" in ('C1161-049-01')
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1270.30..96958.09 rows=8699 width=132) (actual time=1361.072..2096.656 rows=72267 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Join (cost=270.30..95062.82 rows=3625 width=132) (actual time=1358.845..2040.379 rows=24089 loops=3)
Hash Cond: ((p."SOITECPARAM_NAME")::text = (gr."SOITECPARAM_NAME")::text)
-> Nested Loop (cost=234.50..94999.35 rows=3625 width=104) (actual time=1357.937..2018.144 rows=24089 loops=3)
-> Hash Join (cost=234.09..93388.66 rows=3625 width=79) (actual time=1357.907..1907.946 rows=24089 loops=3)
Hash Cond: (l."SHIPMENT_ID" = s."SHIPMENT_ID")
-> Parallel Seq Scan on lot_data l (cost=0.00..83181.67 rows=2649567 width=33) (actual time=0.018..970.644 rows=2118060 loops=3)
-> Hash (cost=233.99..233.99 rows=8 width=58) (actual time=12.045..12.046 rows=22 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Nested Loop (cost=0.84..233.99 rows=8 width=58) (actual time=10.810..12.027 rows=22 loops=3)
-> Nested Loop (cost=0.55..200.09 rows=21 width=28) (actual time=0.461..0.609 rows=84 loops=3)
-> Index Scan using unique_pc on pc (cost=0.27..8.30 rows=1 width=28) (actual time=0.414..0.415 rows=1 loops=3)
Index Cond: (("PC_NAME")::text = 'C1161-049-01'::text)
-> Index Only Scan using shipmentpc_idx1 on shipment_pc sp (cost=0.28..191.58 rows=21 width=8) (actual time=0.045..0.152 rows=84 loops=3)
Index Cond: ("PC_ID" = pc."PC_ID")
Heap Fetches: 4
-> Index Scan using shipment_pkey on shipment s (cost=0.28..1.61 rows=1 width=30) (actual time=0.135..0.135 rows=0 loops=252)
Index Cond: ("SHIPMENT_ID" = sp."SHIPMENT_ID")
Filter: (("SHIPMENT_DATE" >= '2018-11-16 00:00:00'::timestamp without time zone) AND ("SHIPMENT_DATE" <= '2019-05-15 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Index Scan using parameters_idx1 on parameters p (cost=0.41..0.43 rows=1 width=33) (actual time=0.003..0.003 rows=1 loops=72267)
Index Cond: ("PARAMETER_ID" = l."PARAMETER_ID")
-> Hash (cost=20.91..20.91 rows=1191 width=28) (actual time=0.853..0.854 rows=1191 loops=3)
Buckets: 2048 Batches: 1 Memory Usage: 88kB
-> Seq Scan on report_parameters_groups gr (cost=0.00..20.91 rows=1191 width=28) (actual time=0.009..0.381 rows=1191 loops=3)
Planning time: 2.358 ms
Execution time: 2123.327 ms
Запрос 2 (медленный):
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
SELECT "PC_NAME","CUSTOMER_PARTNUMBER","LOTID","REPORTED_VALUE","SHIPMENT_NAME",s."SHIPMENT_DATE", "REPORT_NAME",p."SOITECPARAM_NAME",
"CUSTOMERPARAM_NAME" || ' ' || "TYPE_CALCUL" as "CUSTOMERPARAM_NAME", "SOI_GROUP_PARAMETER", "LOWER_SPEC_LIMIT","UPPER_SPEC_LIMIT"
FROM lot_data l, shipment s, parameters p, shipment_pc sp, pc pc, report_parameters_groups gr
WHERE s."SHIPMENT_ID" = l."SHIPMENT_ID" and p."PARAMETER_ID" = l."PARAMETER_ID" and p."SOITECPARAM_NAME" = gr."SOITECPARAM_NAME" and
s."SHIPMENT_ID" = sp."SHIPMENT_ID" and pc."PC_ID" = sp."PC_ID" and s."SHIPMENT_DATE" >= '2019-04-16' and s."SHIPMENT_DATE" <= '2019-05-16' and pc."PC_NAME" in ('C21264-024-03')
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=2.09..49060.25 rows=1328 width=132) (actual time=11022.561..11867.253 rows=4176 loops=1)
-> Nested Loop (cost=1.81..48659.05 rows=1328 width=104) (actual time=11022.541..11847.173 rows=4176 loops=1)
-> Nested Loop (cost=1.40..48068.98 rows=1328 width=79) (actual time=11022.513..11834.426 rows=4176 loops=1)
Join Filter: (s."SHIPMENT_ID" = l."SHIPMENT_ID")
-> Nested Loop (cost=0.84..233.99 rows=1 width=58) (actual time=1.447..1.535 rows=6 loops=1)
-> Nested Loop (cost=0.55..200.09 rows=21 width=28) (actual time=0.587..0.667 rows=20 loops=1)
-> Index Scan using unique_pc on pc (cost=0.27..8.30 rows=1 width=28) (actual time=0.519..0.520 rows=1 loops=1)
Index Cond: (("PC_NAME")::text = 'C21264-024-03'::text)
-> Index Only Scan using shipmentpc_idx1 on shipment_pc sp (cost=0.28..191.58 rows=21 width=8) (actual time=0.065..0.130 rows=20 loops=1)
Index Cond: ("PC_ID" = pc."PC_ID")
Heap Fetches: 2
-> Index Scan using shipment_pkey on shipment s (cost=0.28..1.61 rows=1 width=30) (actual time=0.042..0.042 rows=0 loops=20)
Index Cond: ("SHIPMENT_ID" = sp."SHIPMENT_ID")
Filter: (("SHIPMENT_DATE" >= '2019-04-16 00:00:00'::timestamp without time zone) AND ("SHIPMENT_DATE" <= '2019-05-16 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Index Scan using lot_data_pkey on lot_data l (cost=0.56..47796.17 rows=3106 width=33) (actual time=1961.536..1971.761 rows=696 loops=6)
Index Cond: ("SHIPMENT_ID" = sp."SHIPMENT_ID")
-> Index Scan using parameters_idx1 on parameters p (cost=0.41..0.43 rows=1 width=33) (actual time=0.002..0.002 rows=1 loops=4176)
Index Cond: ("PARAMETER_ID" = l."PARAMETER_ID")
-> Index Scan using report_parameters_groups_pkey on report_parameters_groups gr (cost=0.28..0.30 rows=1 width=28) (actual time=0.004..0.004 rows=1 loops=4176)
Index Cond: (("SOITECPARAM_NAME")::text = (p."SOITECPARAM_NAME")::text)
Planning time: 1.509 ms
Execution time: 11868.160 ms
Интересная особенность состоит в том, что в первом плане для доступа к таблице
lot_data используется :
1.
Parallel Seq Scan on lot_data l (cost=0.00..83181.67 rows=2649567 width=33) (actual time=0.018.. 970.644 rows=2118060 loops=3 )
во втором плане для доступа к таблице
lot_data используется :
Index Scan using lot_data_pkey on lot_data l (cost=0.56..47796.17 rows=3106 width=33) (actual time=1961.536.. 1971.761 rows=696 loops=6 )
В результате получается, что доступ по индексу оказывается более тяжелым , чем последовательное чтение.
Вопросы
1)можно ли в данной ситуации помочь оптимизатору ?
2) Есть подозрение, поподробнее посмотреть статистику
pg_stat для таблицы
lot_data , на ,что обратить внимание ?
3) Может быть дело вообще не в таблице
lot_data ? На , что еще обратить внимание ?