Оптимизация большого запроса возможна? / PostgreSQL

ReSQL.ru

Мобильная версия Контакт Правила FAQ Помощь

Гость

Войти | Регистрация | Профиль | Очистить

Новые сообщения | Избранное

Форумы | Пользователи | Статистика | Мод. лог | Поиск

Цитировать

Написать

Автор*:

Ввести пароль для входа

Тема*:

Сообщение

Данное сообщение тематическое

Сообщение содержит картинки или видео 18+

Автор:

ВНИМАНИЕ! На данном подфоруме действуют строгие правила. Удостоверьтесь, что ваше сообщение соответствует им!

Форум или тема закрыты для гостей. Необходима авторизация!

Загрузить последнюю сохраненную версию

Вложение:

Вставить как галерею

Максимальный размер вложений: 4,0 МБ, аудио/видео: 8,0 МБ. Картинки большего размера ужимаются, если возможно.

Введите код, изображенный на картинке. Если код нечитаемый, кликните картинку, чтобы загрузить другой вариант.

Отправляя сообщение, я выражаю свое согласие с правилами форума и принимаю пользовательское соглашение.

Доб. в избранное | Игнор. тему | Прикреп. тему | Пометить прочит. / непрочит. | Фильтр

Форумы / PostgreSQL [игнор отключен] [закрыт для гостей] / Оптимизация большого запроса возможна?

25 сообщений из 65, страница 1 из 3

все

Оптимизация большого запроса возможна?

#35373518

UKY

Гость

Доброго времени суток, многоуважаемые :)

Помогите, пожалуйста, оптимизировать запрос:

Код: plaintext

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.

--explain analyze
select 	servers.ip,
	services.nick,
	folders.path,
	files.name || '.' || files.type,
	files.size,
	tths.added,
	tths.tth,
	tths.id,
	tths.count AS count
from	servers
	join services	on (servers.id = services.server)
	join folders	on (folders.service = services.id)
	join files	on (files.path = folders.id)
	join tths	on (files.tth = tths.id)
	join
--explain analyze
	(select distinct on (tths.tth)
		tths.id	AS tth_id
		from	files join tths on files.tth = tths.id
		where	files.size >  0  and
			files.type is not null
			and lower(files.name) like '%'||'панда'||'%'
	) AS inq
	on (tths.id = inq.tth_id)
UNION
--explain analyze
select 	servers.ip,
	services.nick,
	folders.path,
	files.name,
	files.size,
	NULL,
	NULL,
	NULL,
	NULL AS count
from	servers
	join services	on (servers.id = services.server)
	join folders	on (folders.service = services.id)
	join files	on (files.path = folders.id)
where
	files.type is null and
	files.tth is null and
	lower(files.name) like '%'||'панда'||'%'
order 	by count desc NULLS LAST
limit	 100 ;

EXPLAIN ANALYZE выдает такое:

Код: plaintext

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
52.
53.

 Limit  (cost= 486668 . 68 .. 486668 . 93  rows= 100  width= 406 ) (actual time= 23663 . 131 .. 23663 . 277  rows= 29  loops= 1 )
   ->  Sort  (cost= 486668 . 68 .. 486669 . 10  rows= 166  width= 406 ) (actual time= 23663 . 115 .. 23663 . 163  rows= 29  loops= 1 )
         Sort Key: public.tths.count
         Sort Method:  quicksort  Memory: 32kB
         ->  Unique  (cost= 486658 . 41 .. 486662 . 56  rows= 166  width= 406 ) (actual time= 23662 . 787 .. 23662 . 965  rows= 29  loops= 1 )
               ->  Sort  (cost= 486658 . 41 .. 486658 . 83  rows= 166  width= 406 ) (actual time= 23662 . 781 .. 23662 . 834  rows= 29  loops= 1 )
                     Sort Key: public.servers.ip, public.services.nick, public.folders.path, ((((public.files.name)::text || '.'::text) || (public.files.type)::text)), public.files.size, public.tths.added, public.tths.tth, public.tths.id, public.tths.count
                     Sort Method:  quicksort  Memory: 32kB
                     ->  Append  (cost= 180382 . 35 .. 486652 . 29  rows= 166  width= 406 ) (actual time= 7940 . 199 .. 23662 . 361  rows= 29  loops= 1 )
                           ->  Hash Join  (cost= 180382 . 35 .. 349969 . 25  rows= 165  width= 406 ) (actual time= 7940 . 194 .. 22407 . 952  rows= 27  loops= 1 )
                                 Hash Cond: (public.services.server = public.servers.id)
                                 ->  Hash Join  (cost= 180375 . 35 .. 349958 . 75  rows= 165  width= 403 ) (actual time= 7939 . 021 .. 22406 . 235  rows= 27  loops= 1 )
                                       Hash Cond: (public.folders.service = public.services.id)
                                       ->  Nested Loop  (cost= 180367 . 29 .. 349948 . 42  rows= 165  width= 396 ) (actual time= 7937 . 755 .. 22404 . 789  rows= 27  loops= 1 )
                                             ->  Hash Join  (cost= 180367 . 29 .. 348704 . 12  rows= 165  width= 334 ) (actual time= 7937 . 612 .. 22402 . 311  rows= 27  loops= 1 )
                                                   Hash Cond: (public.files.tth = public.tths.id)
                                                   ->  Seq Scan on files  (cost= 0 . 00 .. 156340 . 94  rows= 3198894  width= 286 ) (actual time= 0 . 033 .. 7397 . 845  rows= 3198894  loops= 1 )
                                                   ->  Hash  (cost= 180367 . 27 .. 180367 . 27  rows= 2  width= 56 ) (actual time= 7933 . 336 .. 7933 . 336  rows= 10  loops= 1 )
                                                         ->  Nested Loop  (cost= 180349 . 95 .. 180367 . 27  rows= 2  width= 56 ) (actual time= 7933 . 054 .. 7933 . 301  rows= 10  loops= 1 )
                                                               ->  Unique  (cost= 180349 . 95 .. 180349 . 96  rows= 2  width= 44 ) (actual time= 7933 . 019 .. 7933 . 107  rows= 10  loops= 1 )
                                                                     ->  Sort  (cost= 180349 . 95 .. 180349 . 95  rows= 2  width= 44 ) (actual time= 7933 . 013 .. 7933 . 045  rows= 20  loops= 1 )
                                                                           Sort Key: public.tths.tth
                                                                           Sort Method:  quicksort  Memory: 26kB
                                                                           ->  Nested Loop  (cost= 0 . 00 .. 180349 . 94  rows= 2  width= 44 ) (actual time= 39 . 126 .. 7932 . 723  rows= 20  loops= 1 )
                                                                                 ->  Seq Scan on files  (cost= 0 . 00 .. 180332 . 65  rows= 2  width= 4 ) (actual time= 4 . 062 .. 7825 . 612  rows= 20  loops= 1 )
                                                                                       Filter: ((type IS NOT NULL) AND (size >  0 ) AND (lower((name)::text) ~~ '%панда%'::text))
                                                                                 ->  Index Scan using tths_pkey on tths  (cost= 0 . 00 .. 8 . 63  rows= 1  width= 44 ) (actual time= 5 . 337 .. 5 . 341  rows= 1  loops= 20 )
                                                                                       Index Cond: (public.tths.id = public.files.tth)
                                                               ->  Index Scan using tths_pkey on tths  (cost= 0 . 00 .. 8 . 63  rows= 1  width= 52 ) (actual time= 0 . 009 .. 0 . 012  rows= 1  loops= 10 )
                                                                     Index Cond: (public.tths.id = public.tths.id)
                                             ->  Index Scan using folders_pkey on folders  (cost= 0 . 00 .. 7 . 53  rows= 1  width= 74 ) (actual time= 0 . 074 .. 0 . 078  rows= 1  loops= 27 )
                                                   Index Cond: (public.folders.id = public.files.path)
                                       ->  Hash  (cost= 5 . 25 .. 5 . 25  rows= 225  width= 15 ) (actual time= 1 . 223 .. 1 . 223  rows= 225  loops= 1 )
                                             ->  Seq Scan on services  (cost= 0 . 00 .. 5 . 25  rows= 225  width= 15 ) (actual time= 0 . 069 .. 0 . 632  rows= 225  loops= 1 )
                                 ->  Hash  (cost= 4 . 22 .. 4 . 22  rows= 222  width= 11 ) (actual time= 1 . 083 .. 1 . 083  rows= 222  loops= 1 )
                                       ->  Seq Scan on servers  (cost= 0 . 00 .. 4 . 22  rows= 222  width= 11 ) (actual time= 0 . 018 .. 0 . 503  rows= 222  loops= 1 )
                           ->  Subquery Scan "*SELECT* 2"  (cost= 4019 . 77 .. 136681 . 39  rows= 1  width= 100 ) (actual time= 153 . 754 .. 1254 . 284  rows= 2  loops= 1 )
                                 ->  Nested Loop  (cost= 4019 . 77 .. 136681 . 38  rows= 1  width= 100 ) (actual time= 153 . 745 .. 1254 . 265  rows= 2  loops= 1 )
                                       ->  Nested Loop  (cost= 4019 . 77 .. 136677 . 08  rows= 1  width= 97 ) (actual time= 153 . 691 .. 1254 . 156  rows= 2  loops= 1 )
                                             ->  Nested Loop  (cost= 4019 . 77 .. 136672 . 13  rows= 1  width= 90 ) (actual time= 153 . 627 .. 1254 . 040  rows= 2  loops= 1 )
                                                   ->  Bitmap Heap Scan on files  (cost= 4019 . 77 .. 136663 . 83  rows= 1  width= 28 ) (actual time= 153 . 589 .. 1253 . 883  rows= 2  loops= 1 )
                                                         Recheck Cond: (type IS NULL)
                                                         Filter: ((tth IS NULL) AND (lower((name)::text) ~~ '%панда%'::text))
                                                         ->  Bitmap Index Scan on files_type_folder_index  (cost= 0 . 00 .. 4019 . 77  rows= 196199  width= 0 ) (actual time= 99 . 075 .. 99 . 075  rows= 199141  loops= 1 )
                                                               Index Cond: (type IS NULL)
                                                   ->  Index Scan using folders_pkey on folders  (cost= 0 . 00 .. 8 . 29  rows= 1  width= 74 ) (actual time= 0 . 058 .. 0 . 060  rows= 1  loops= 2 )
                                                         Index Cond: (public.folders.id = public.files.path)
                                             ->  Index Scan using services_pkey on services  (cost= 0 . 00 .. 4 . 93  rows= 1  width= 15 ) (actual time= 0 . 044 .. 0 . 046  rows= 1  loops= 2 )
                                                   Index Cond: (public.services.id = public.folders.service)
                                       ->  Index Scan using servers_pkey on servers  (cost= 0 . 00 .. 4 . 29  rows= 1  width= 11 ) (actual time= 0 . 038 .. 0 . 040  rows= 1  loops= 2 )
                                             Index Cond: (public.servers.id = public.services.server)
 Total runtime:  23664 . 705  ms
( 52  rows)

Пробовал создавать такие индексы:

Код: plaintext

1.
2.
3.
4.
5.
6.

--CREATE INDEX files_size_index ON files(size NULLS LAST);
--CREATE INDEX tths_count_index ON tths(count DESC NULLS LAST);
--CREATE INDEX tths_tth_index ON tths(tth);
--CREATE INDEX files_lower_name_index ON files(lower(name));
CREATE INDEX files_type_folder_index ON files(type) WHERE type is null;
--CREATE INDEX files_type_no_folder_index ON files(type) WHERE type is not null;

Но использовался только тот индекс, который не закомментирован.

Размерность таблиц:
files 3,2 mln
tths 1,6 mln
folders 0,2 mln
services 235
servers 222

VACUUM ANALYZE делал. Какие еще меры, помимо обновления железа (планируется в будущем) можно предпринять?

Структура таблиц приведена ниже:

Код: plaintext

CREATE TABLE servers
(
	id	serial		PRIMARY KEY,
	ip	inet,
	network	integer,
	name	varchar( 250 ),
	size	bigint,
	added	date		DEFAULT CURRENT_DATE
);

CREATE TABLE services
(
	id	serial		PRIMARY KEY,
	server	integer,
	type	integer,
	nick	varchar( 250 ),
	added	date		DEFAULT CURRENT_DATE,
	last	date,
	descr	varchar( 250 ),
	size	bigint,
	files	integer,
	folders	integer
);
ALTER TABLE services ADD CONSTRAINT fk_services_servers
	FOREIGN KEY (server) REFERENCES servers(id);
ALTER TABLE services ADD CONSTRAINT fk_services_service
	FOREIGN KEY (type) REFERENCES service(id);

CREATE TABLE folders
(
	id	serial		PRIMARY KEY,
	path	varchar( 250 ),
	service	integer,
	rubric	integer		DEFAULT NULL
);
ALTER TABLE folders ADD CONSTRAINT fk_folders_rubrics
	FOREIGN KEY (rubric) REFERENCES rubrics(id);
ALTER TABLE folders ADD CONSTRAINT fk_folders_services
	FOREIGN KEY (service) REFERENCES services(id);

CREATE TABLE tths
(
	id	serial		PRIMARY KEY,
	TTH	character( 39 )	UNIQUE,
	count	integer,
	content	varchar,
	vector	tsvector,
	added	date		DEFAULT CURRENT_DATE
);

CREATE TABLE files
(
	id	serial		PRIMARY KEY,
	path	integer,
	name	varchar( 250 )	NOT NULL,
	type	character( 250 ),
	size	bigint,
	tth	integer
);
ALTER TABLE files ADD CONSTRAINT fk_files_folders
	FOREIGN KEY (path) REFERENCES folders(id);
ALTER TABLE files ADD CONSTRAINT fk_files_tths
	FOREIGN KEY (tth) REFERENCES tths(id);

...

Рейтинг:

0 / 0

15.06.2008, 15:32

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35373602

Гостик

Гость

Возможно поможет кластерный индекс.

http://www.postgresql.org/docs/8.2/static/sql-cluster.html

Код: plaintext

CLUSTER
Name
CLUSTER -- cluster a table according to an index
Synopsis

CLUSTER indexname ON tablename
CLUSTER tablename
CLUSTER

Description

CLUSTER instructs PostgreSQL to cluster the table specified by tablename based on the index specified by indexname. The index must already have been defined on tablename.

When a table is clustered, it is physically reordered based on the index information. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order. If one wishes, one can periodically recluster by issuing the command again.

When a table is clustered, PostgreSQL remembers on which index it was clustered. The form CLUSTER tablename reclusters the table on the same index that it was clustered before.

CLUSTER without any parameter reclusters all the tables in the current database that the calling user owns, or all tables if called by a superuser. (Never-clustered tables are not included.) This form of CLUSTER cannot be executed inside a transaction block.

When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired on it. This prevents any other database operations (both reads and writes) from operating on the table until the CLUSTER is finished.
Parameters

indexname

    The name of an index. 
tablename

    The name (possibly schema-qualified) of a table. 

Notes

CLUSTER loses all visibility information of tuples, which makes the table look empty to any snapshot that was taken before the CLUSTER command finished. That makes CLUSTER unsuitable for applications where transactions that access the table being clustered are run concurrently with CLUSTER. This is most visible with serializable transactions, because they take only one snapshot at the beginning of the transaction, but read-committed transactions are also affected.

In cases where you are accessing single rows randomly within a table, the actual order of the data in the table is unimportant. However, if you tend to access some data more than others, and there is an index that groups them together, you will benefit from using CLUSTER. If you are requesting a range of indexed values from a table, or a single indexed value that has multiple rows that match, CLUSTER will help because once the index identifies the table page for the first row that matches, all other rows that match are probably already on the same table page, and so you save disk accesses and speed up the query.

During the cluster operation, a temporary copy of the table is created that contains the table data in the index order. Temporary copies of each index on the table are created as well. Therefore, you need free space on disk at least equal to the sum of the table size and the index sizes.

Because CLUSTER remembers the clustering information, one can cluster the tables one wants clustered manually the first time, and setup a timed event similar to VACUUM so that the tables are periodically reclustered.

Because the planner records statistics about the ordering of tables, it is advisable to run ANALYZE on the newly clustered table. Otherwise, the planner may make poor choices of query plans.

There is another way to cluster data. The CLUSTER command reorders the original table by scanning it using the index you specify. This can be slow on large tables because the rows are fetched from the table in index order, and if the table is disordered, the entries are on random pages, so there is one disk page retrieved for every row moved. (PostgreSQL has a cache, but the majority of a big table will not fit in the cache.) The other way to cluster a table is to use

CREATE TABLE newtable AS
    SELECT * FROM table ORDER BY columnlist;

which uses the PostgreSQL sorting code to produce the desired order; this is usually much faster than an index scan for disordered data. Then you drop the old table, use ALTER TABLE ... RENAME to rename newtable to the old name, and recreate the table's indexes. The big disadvantage of this approach is that it does not preserve OIDs, constraints, foreign key relationships, granted privileges, and other ancillary properties of the table — all such items must be manually recreated. Another disadvantage is that this way requires a sort temporary file about the same size as the table itself, so peak disk usage is about three times the table size instead of twice the table size.
Examples

Cluster the table employees on the basis of its index emp_ind:

CLUSTER emp_ind ON emp;

Cluster the employees table using the same index that was used before:

CLUSTER emp;

Cluster all tables in the database that have previously been clustered:

CLUSTER;

Compatibility

There is no CLUSTER statement in the SQL standard.

...

Рейтинг:

0 / 0

15.06.2008, 20:48

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35374014

LeXa NalBat

Участник

Откуда: Москва

Сообщения: 2 843

Рейтинг: 0 / 0

Код: plaintext

1.
2.
3.

->  Hash Join  (cost= 180367 . 29 .. 348704 . 12  rows= 165  width= 334 ) (actual time= 7937 . 612 .. 22402 . 311  rows= 27  loops= 1 )
      Hash Cond: (public.files.tth = public.tths.id)
      ->  Seq Scan on files  (cost= 0 . 00 .. 156340 . 94  rows= 3198894  width= 286 ) (actual time= 0 . 033 .. 7397 . 845  rows= 3198894  loops= 1 )
      ->  Hash  (cost= 180367 . 27 .. 180367 . 27  rows= 2  width= 56 ) (actual time= 7933 . 336 .. 7933 . 336  rows= 10  loops= 1 )

создать индекс по files(tth)

Код: plaintext

->  Seq Scan on files  (cost= 0 . 00 .. 180332 . 65  rows= 2  width= 4 ) (actual time= 4 . 062 .. 7825 . 612  rows= 20  loops= 1 )
      Filter: ((type IS NOT NULL) AND (size >  0 ) AND (lower((name)::text) ~~ '%панда%'::text))

Код: plaintext

1.
2.
3.
4.

->  Bitmap Heap Scan on files  (cost= 4019 . 77 .. 136663 . 83  rows= 1  width= 28 ) (actual time= 153 . 589 .. 1253 . 883  rows= 2  loops= 1 )
      Recheck Cond: (type IS NULL)
      Filter: ((tth IS NULL) AND (lower((name)::text) ~~ '%панда%'::text))
      ->  Bitmap Index Scan on files_type_folder_index  (cost= 0 . 00 .. 4019 . 77  rows= 196199  width= 0 ) (actual time= 99 . 075 .. 99 . 075  rows= 199141  loops= 1 )
            Index Cond: (type IS NULL)

использовать полнотекстовый поиск

...

Рейтинг:

0 / 0

16.06.2008, 11:00

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35374273

UKY

Гость

2Гостик:
Пытался делать кластерный индекс (ох, как это долго все происходило :) по files(name), но не помогло. Этот индекс не использовался.

2LeXa_NalBat:
А если я пишу

Код: plaintext

ALTER TABLE files ADD CONSTRAINT fk_files_tths FOREIGN KEY (tth) REFERENCES tths(id);

то индекс разве сам не создается? Во всяком случае создал такой индекс, но СУБД не хочет его использовать (EXPLAIN ANALYZE см. ниже).

Прикрутил полнотекстовый поиск, создал для него GIN индекс. Круто. Быстро. Ничего не скажешь :)

Код: plaintext

Sort  (cost= 784418 . 75 .. 785975 . 67  rows= 622765  width= 406 ) (actual time= 21129 . 302 .. 21129 . 382  rows= 47  loops= 1 )
  Sort Key: public.tths.count, public.tths.id
  Sort Method:  quicksort  Memory: 38kB
  ->  Unique  (cost= 593966 . 70 .. 609535 . 83  rows= 622765  width= 406 ) (actual time= 21128 . 882 .. 21129 . 151  rows= 47  loops= 1 )
        ->  Sort  (cost= 593966 . 70 .. 595523 . 61  rows= 622765  width= 406 ) (actual time= 21128 . 877 .. 21128 . 956  rows= 47  loops= 1 )
              Sort Key: public.servers.ip, public.services.nick, public.folders.path, (ts_headline('russian'::regconfig, (((public.files.name)::text || '.'::text) || (public.files.type)::text), to_tsquery('панда'::text))), public.files.size, public.tths.added, public.tths.tth, public.tths.id, public.tths.count
              Sort Method:  quicksort  Memory: 38kB
              ->  Append  (cost= 94892 . 51 .. 419083 . 78  rows= 622765  width= 406 ) (actual time= 5312 . 341 .. 21128 . 448  rows= 47  loops= 1 )
                    ->  Hash Join  (cost= 94892 . 51 .. 408468 . 11  rows= 622753  width= 406 ) (actual time= 5312 . 337 .. 21064 . 884  rows= 45  loops= 1 )
                          Hash Cond: (public.files.path = public.folders.id)
                          ->  Hash Join  (cost= 78007 . 13 .. 311386 . 38  rows= 622753  width= 334 ) (actual time= 2213 . 472 .. 17848 . 099  rows= 45  loops= 1 )
                                Hash Cond: (public.files.tth = public.tths.id)
                                ->  Seq Scan on files  (cost= 0 . 00 .. 202236 . 19  rows= 4142919  width= 286 ) (actual time= 0 . 038 .. 9504 . 883  rows= 4142919  loops= 1 )
                                ->  Hash  (cost= 77960 . 87 .. 77960 . 87  rows= 3701  width= 56 ) (actual time= 1 . 504 .. 1 . 504  rows= 15  loops= 1 )
                                      ->  Nested Loop  (cost= 47210 . 25 .. 77960 . 87  rows= 3701  width= 56 ) (actual time= 1 . 102 .. 1 . 462  rows= 15  loops= 1 )
                                            ->  Unique  (cost= 47210 . 25 .. 47228 . 75  rows= 3701  width= 44 ) (actual time= 1 . 087 .. 1 . 236  rows= 15  loops= 1 )
                                                  ->  Sort  (cost= 47210 . 25 .. 47219 . 50  rows= 3701  width= 44 ) (actual time= 1 . 082 .. 1 . 135  rows= 35  loops= 1 )
                                                        Sort Key: public.tths.tth
                                                        Sort Method:  quicksort  Memory: 27kB
                                                        ->  Nested Loop  (cost= 125 . 22 .. 46990 . 90  rows= 3701  width= 44 ) (actual time= 0 . 152 .. 0 . 785  rows= 35  loops= 1 )
                                                              ->  Bitmap Heap Scan on files  (cost= 125 . 22 .. 14614 . 53  rows= 3910  width= 4 ) (actual time= 0 . 122 .. 0 . 222  rows= 35  loops= 1 )
                                                                    Recheck Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))
                                                                    Filter: ((type IS NOT NULL) AND (size >  0 ))
                                                                    ->  Bitmap Index Scan on files_name_russian_vector_index  (cost= 0 . 00 .. 124 . 24  rows= 4143  width= 0 ) (actual time= 0 . 106 .. 0 . 106  rows= 37  loops= 1 )
                                                                          Index Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))
                                                              ->  Index Scan using tths_pkey on tths  (cost= 0 . 00 .. 8 . 27  rows= 1  width= 44 ) (actual time= 0 . 007 .. 0 . 009  rows= 1  loops= 35 )
                                                                    Index Cond: (public.tths.id = public.files.tth)
                                            ->  Index Scan using tths_pkey on tths  (cost= 0 . 00 .. 8 . 28  rows= 1  width= 52 ) (actual time= 0 . 006 .. 0 . 008  rows= 1  loops= 15 )
                                                  Index Cond: (public.tths.id = public.tths.id)
                          ->  Hash  (cost= 9871 . 73 .. 9871 . 73  rows= 267972  width= 84 ) (actual time= 3098 . 566 .. 3098 . 566  rows= 267972  loops= 1 )
                                ->  Hash Join  (cost= 26 . 39 .. 9871 . 73  rows= 267972  width= 84 ) (actual time= 7 . 002 .. 2080 . 252  rows= 267972  loops= 1 )
                                      Hash Cond: (public.folders.service = public.services.id)
                                      ->  Seq Scan on folders  (cost= 0 . 00 .. 6160 . 72  rows= 267972  width= 74 ) (actual time= 0 . 074 .. 653 . 960  rows= 267972  loops= 1 )
                                      ->  Hash  (cost= 22 . 46 .. 22 . 46  rows= 315  width= 18 ) (actual time= 6 . 883 .. 6 . 883  rows= 315  loops= 1 )
                                            ->  Hash Join  (cost= 9 . 97 .. 22 . 46  rows= 315  width= 18 ) (actual time= 1 . 538 .. 6 . 103  rows= 315  loops= 1 )
                                                  Hash Cond: (public.services.server = public.servers.id)
                                                  ->  Seq Scan on services  (cost= 0 . 00 .. 8 . 15  rows= 315  width= 15 ) (actual time= 0 . 011 .. 0 . 661  rows= 315  loops= 1 )
                                                  ->  Hash  (cost= 6 . 10 .. 6 . 10  rows= 310  width= 11 ) (actual time= 1 . 494 .. 1 . 494  rows= 310  loops= 1 )
                                                        ->  Seq Scan on servers  (cost= 0 . 00 .. 6 . 10  rows= 310  width= 11 ) (actual time= 0 . 013 .. 0 . 713  rows= 310  loops= 1 )
                    ->  Subquery Scan *SELECT*  2   (cost= 4240 . 01 .. 4388 . 14  rows= 12  width= 84 ) (actual time= 63 . 301 .. 63 . 388  rows= 2  loops= 1 )
                          ->  Hash Join  (cost= 4240 . 01 .. 4388 . 02  rows= 12  width= 84 ) (actual time= 63 . 292 .. 63 . 369  rows= 2  loops= 1 )
                                Hash Cond: (public.services.server = public.servers.id)
                                ->  Hash Join  (cost= 4230 . 04 .. 4377 . 82  rows= 12  width= 81 ) (actual time= 61 . 543 .. 61 . 585  rows= 2  loops= 1 )
                                      Hash Cond: (public.folders.service = public.services.id)
                                      ->  Nested Loop  (cost= 4217 . 95 .. 4365 . 56  rows= 12  width= 74 ) (actual time= 60 . 215 .. 60 . 249  rows= 2  loops= 1 )
                                            ->  Bitmap Heap Scan on files  (cost= 4217 . 95 .. 4265 . 85  rows= 12  width= 12 ) (actual time= 60 . 171 .. 60 . 175  rows= 2  loops= 1 )
                                                  Recheck Cond: ((to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text)) AND (tth IS NULL) AND (type IS NULL))
                                                  ->  BitmapAnd  (cost= 4217 . 95 .. 4217 . 95  rows= 12  width= 0 ) (actual time= 60 . 159 .. 60 . 159  rows= 0  loops= 1 )
                                                        ->  Bitmap Index Scan on files_name_russian_vector_index  (cost= 0 . 00 .. 124 . 24  rows= 4143  width= 0 ) (actual time= 0 . 069 .. 0 . 069  rows= 37  loops= 1 )
                                                              Index Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))
                                                        ->  Bitmap Index Scan on files_tth_index  (cost= 0 . 00 .. 4093 . 45  rows= 11784  width= 0 ) (actual time= 60 . 064 .. 60 . 064  rows= 267972  loops= 1 )
                                                              Index Cond: (tth IS NULL)
                                            ->  Index Scan using folders_pkey on folders  (cost= 0 . 00 .. 8 . 30  rows= 1  width= 74 ) (actual time= 0 . 018 .. 0 . 021  rows= 1  loops= 2 )
                                                  Index Cond: (public.folders.id = public.files.path)
                                      ->  Hash  (cost= 8 . 15 .. 8 . 15  rows= 315  width= 15 ) (actual time= 1 . 304 .. 1 . 304  rows= 315  loops= 1 )
                                            ->  Seq Scan on services  (cost= 0 . 00 .. 8 . 15  rows= 315  width= 15 ) (actual time= 0 . 011 .. 0 . 616  rows= 315  loops= 1 )
                                ->  Hash  (cost= 6 . 10 .. 6 . 10  rows= 310  width= 11 ) (actual time= 1 . 632 .. 1 . 632  rows= 310  loops= 1 )
                                      ->  Seq Scan on servers  (cost= 0 . 00 .. 6 . 10  rows= 310  width= 11 ) (actual time= 0 . 020 .. 0 . 595  rows= 310  loops= 1 )
Total runtime:  21129 . 891  ms

Вот только хотелось бы найти именно то слово, которое было задано в запросе. Т.е. если спросили "панда", то должно вернуться "Панда", "Пандамима", "Недопандас" и все в этом роде, но не "панды", "пандой"...
Это можно сделать при помощи полнотекстового поиска? В мануале чего-то такого не нашел...

...

Рейтинг:

0 / 0

16.06.2008, 12:43

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35374471

LeXa NalBat

Участник

Откуда: Москва

Сообщения: 2 843

Рейтинг: 0 / 0

UKYА если я пишу

Код: plaintext

ALTER TABLE files ADD CONSTRAINT fk_files_tths FOREIGN KEY (tth) REFERENCES tths(id);

то индекс разве сам не создается?не создается

UKYВо всяком случае создал такой индекс, но СУБД не хочет его использовать (EXPLAIN ANALYZE см. ниже).

Код: plaintext

1.
2.
3.
4.

->  Bitmap Heap Scan on files  (cost= 125 . 22 .. 14614 . 53  rows= 3910  width= 4 ) (actual time= 0 . 122 .. 0 . 222  rows= 35  loops= 1 )
  Recheck Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))
  Filter: ((type IS NOT NULL) AND (size >  0 ))
  ->  Bitmap Index Scan on files_name_russian_vector_index  (cost= 0 . 00 .. 124 . 24  rows= 4143  width= 0 ) (actual time= 0 . 106 .. 0 . 106  rows= 37  loops= 1 )
    Index Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))

думаю, что постгрес не использует при джоине этот индекс из-за того, что ошибается на два порядка в оценке кол-ва строк, удовлетворяющих условию полнотекстового поиска (rows=4143, actual rows=37). кажется, постгрес не умеет оценивать объем выборки по полнотекстовому индексу: тынц . можно попробовать запинать командами set enable_* to off|on, для начала set enable_hashjoin to off.

UKY

Код: plaintext

1.
2.
3.

->  Hash Join  (cost= 26 . 39 .. 9871 . 73  rows= 267972  width= 84 ) (actual time= 7 . 002 .. 2080 . 252  rows= 267972  loops= 1 )
  Hash Cond: (public.folders.service = public.services.id)
  ->  Seq Scan on folders  (cost= 0 . 00 .. 6160 . 72  rows= 267972  width= 74 ) (actual time= 0 . 074 .. 653 . 960  rows= 267972  loops= 1 )
  ->  Hash  (cost= 22 . 46 .. 22 . 46  rows= 315  width= 18 ) (actual time= 6 . 883 .. 6 . 883  rows= 315  loops= 1 )

создайте индекс по folders(service)

UKYВот только хотелось бы найти именно то слово, которое было задано в запросе. Т.е. если спросили "панда", то должно вернуться "Панда", "Пандамима", "Недопандас" и все в этом роде, но не "панды", "пандой"...
Это можно сделать при помощи полнотекстового поиска? В мануале чего-то такого не нашел...я с полнотекстовым поиском не работал. может быть можно в качестве разделителя "слов" считать "границу букв"? тогда каждую букву полнотекстовый поиск будет считать отдельным словом, и получите нужны вам поиск.

...

Рейтинг:

0 / 0

16.06.2008, 13:53

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35377393

LeXa NalBat

Участник

Откуда: Москва

Сообщения: 2 843

Рейтинг: 0 / 0

"постгрес не умеет оценивать объем выборки по полнотекстовому индексу"

может обернуть полнотекстовый поиск по files.name в функцию на языке sql, для которой при создании указать ROWS например равным 10, также можно указать COST.

...

Рейтинг:

0 / 0

17.06.2008, 15:03

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35381282

UKY

Гость

Попробовал

Код: plaintext

set enable_hashjoin to off

стало хуже :)

LeXa NalBatя с полнотекстовым поиском не работал. может быть можно в качестве разделителя "слов" считать "границу букв"? тогда каждую букву полнотекстовый поиск будет считать отдельным словом, и получите нужны вам поиск.
Это сделать реально, но там тогда придется не каждую букву, а все части слова выделять, а это ох сколько памяти и времени на постоение индексов занимать будет...
Пока думаю сделать полнотекстовый поиск и уже результаты выборки доколнительно проверять LIKE '%панда%'. Хотя тоже часть результатов теряться будет...

LeXa NalBat"постгрес не умеет оценивать объем выборки по полнотекстовому индексу"

может обернуть полнотекстовый поиск по files.name в функцию на языке sql, для которой при создании указать ROWS например равным 10, также можно указать COST.

Тут я уже Вас не понимаю :) Еще ни разу не писал функции на постгресе. Может быть поэтому...

Огромное Вам спасибо за помощь!

...

Рейтинг:

0 / 0

18.06.2008, 19:55

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35381542

Гость

UKY
Вот только хотелось бы найти именно то слово, которое было задано в запросе. Т.е. если спросили "панда", то должно вернуться "Панда", "Пандамима", "Недопандас" и все в этом роде, но не "панды", "пандой"...
Это можно сделать при помощи полнотекстового поиска? В мануале чего-то такого не нашел...

в 8.3 и младше нельзя, в 8.4 будет можно (уже в CVS даже есть)

...

Рейтинг:

0 / 0

19.06.2008, 02:21

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35382358

LeXa NalBat

Участник

Откуда: Москва

Сообщения: 2 843

Рейтинг: 0 / 0

UKY LeXa NalBatможет обернуть полнотекстовый поиск по files.name в функцию на языке sql, для которой при создании указать ROWS например равным 10, также можно указать COST.Тут я уже Вас не понимаю :) Еще ни разу не писал функции на постгресе. Может быть поэтому...думаю, что это ухищрение может ускорить запрос. сделать что-то типа

Код: plaintext

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.

create table files ( id integer, name text );
insert into files select * from generate_series(1,1000000);
create index files_name_russian_vector_index on files using gin(to_tsvector('russian', name));
analyze files;
create function files_name_full_text_search(text) returns setof files language 'sql'  rows 10  as
  'select * from files where to_tsvector(''russian'', name) @@ to_tsquery($1)';
explain select * from files where to_tsvector('russian', name) @@ to_tsquery('панда');
explain select * from files_name_full_text_search('панда');
drop function files_name_full_text_search(text);
drop table files;

                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on files  (cost=12.01..2409.87  rows=1000  width=36)
   Recheck Cond: (to_tsvector('russian'::regconfig, name) @@ to_tsquery('панда'::text))
   ->  Bitmap Index Scan on files_name_russian_vector_index  (cost=0.00..11.76 rows=1000 width=0)
         Index Cond: (to_tsvector('russian'::regconfig, name) @@ to_tsquery('панда'::text))
(4 rows)
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Function Scan on files_name_full_text_search  (cost=0.00..2.60  rows=10  width=36)
(1 row)

...

Рейтинг:

0 / 0

19.06.2008, 12:31

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35383823

UKY

Гость

iz UKY
Вот только хотелось бы найти именно то слово, которое было задано в запросе. Т.е. если спросили "панда", то должно вернуться "Панда", "Пандамима", "Недопандас" и все в этом роде, но не "панды", "пандой"...
Это можно сделать при помощи полнотекстового поиска? В мануале чего-то такого не нашел...

в 8.3 и младше нельзя, в 8.4 будет можно (уже в CVS даже есть)
Круто :) Надо будет как-нибудь попробовать этот патчик прикрутить...

2 LeXa NalBat:
Это ухищрение не сработало :) Постгрес не обманишь...

...

Рейтинг:

0 / 0

19.06.2008, 20:40

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35384058

stopor

Гость

До выхода версии 8.4 можно сделать поиск по триграммам - контриб pg_trgm. Он шустро работает, правда UTF-8 не понимает.

Принцип такой:
1) Сделать таблицу для хранения отдельных слов

Код: plaintext

CREATE TABLE words (parent_id int, word text);
CREATE INDEX i_words_trgm ON words USING gin (t gin_trgm_ops);

Если слова часто повторяются, то можно разбить на две таблицы: words(word_id, word)-уникальные слова, words_link(word_id,parent_id) -связь слова с исходным текстом.

2) Исходный текст разбить по словам. Можно с помощью ts_vector+ts_stat как в мануале, можно регуляркой

Код: plaintext

1.
2.
3.

INSERT INTO words 
SELECT id, w
FROM (SELECT id, regexp_split_to_table(textfield, E'\\W') as w FROM mydata) a
WHERE w > ''

3) Установить в сессии низкий порог поиска функцией set_limit(), в районе 0.1-0.2. Чем меньше порог, тем больше будет ложных совпадений и медленнее поиск, но зато и нужных слов попадет больше.

4) Поиск должен идти сначала оператором % , а потом из уже ограниченного набора строк надо отсечь мусор обычным like

Код: plaintext

1.
2.
3.
4.

SELECT *
FROM words
WHERE word % 'раствор'
  and word like '%раствор%'

У меня на базе из 50 тысяч слов такой запрос с set_limit(0.2) отрабатывает ~7мс. Чем длиннее искомое слово, тем медленнее, но в 20мс на кэшированнных данных стабильно укладывается.

...

Рейтинг:

0 / 0

20.06.2008, 05:38

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35384360

LeXa NalBat

Участник

Откуда: Москва

Сообщения: 2 843

Рейтинг: 0 / 0

UKY2 LeXa NalBat:
Это ухищрение не сработало :) Постгрес не обманишь...покажите explain analyze

...

Рейтинг:

0 / 0

20.06.2008, 10:33

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35386409

UKY

Гость

Оказывается, запрос по словам и не тормозит вовсе :)

Код: plaintext

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.

Unique  (cost= 60363 . 09 .. 60397 . 79  rows= 4627  width= 60 ) (actual time= 2 . 442 .. 2 . 705  rows= 24  loops= 1 )
  ->  Sort  (cost= 60363 . 09 .. 60374 . 66  rows= 4627  width= 60 ) (actual time= 2 . 436 .. 2 . 525  rows= 46  loops= 1 )
        Sort Key: tths.tth, files.name
        Sort Method:  quicksort  Memory: 31kB
        ->  Nested Loop  (cost= 158 . 34 .. 60081 . 40  rows= 4627  width= 60 ) (actual time= 0 . 184 .. 1 . 305  rows= 46  loops= 1 )
              ->  Bitmap Heap Scan on files  (cost= 158 . 34 .. 18685 . 88  rows= 4940  width= 20 ) (actual time= 0 . 152 .. 0 . 351  rows= 46  loops= 1 )
                    Recheck Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))
                    Filter: ((type IS NOT NULL) AND (size >  0 ))
                    ->  Bitmap Index Scan on files_name_russian_vector_index  (cost= 0 . 00 .. 157 . 11  rows= 5294  width= 0 ) (actual time= 0 . 128 .. 0 . 128  rows= 48  loops= 1 )
                          Index Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))
              ->  Index Scan using tths_pkey on tths  (cost= 0 . 00 .. 8 . 37  rows= 1  width= 44 ) (actual time= 0 . 010 .. 0 . 012  rows= 1  loops= 46 )
                    Index Cond: (tths.id = files.tth)
Total runtime:  2 . 886  ms

Все тормоза уходят в join. На построение всяких хэш таблиц и всего такого:

Код: plaintext

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.

--explain analyze
select 	distinct on (count, tths.id, files.name)
	servers.ip,
	services.nick,
	folders.path,
	ts_headline('russian', files.name || '.' || files.type, to_tsquery('панда')),
	files.size,
	tths.added,
	tths.tth,
	tths.id AS id,
	tths.count AS count
from	servers
	join services	on (servers.id = services.server)
	join folders	on (folders.service = services.id)
	join files	on (files.path = folders.id)
	join tths	on (files.tth = tths.id)
	join
--explain analyze
	(select distinct on (tths.tth, files.name)
		tths.id	AS tth_id,
		files.name AS name
		from	files join tths on files.tth = tths.id
		where	files.size >  0  and
			files.type is not null and
			to_tsvector('russian', files.name) @@ to_tsquery('панда')
	) AS inq
	on (tths.id = inq.tth_id)
UNION
--explain analyze
select 	servers.ip,
	services.nick,
	ts_headline('russian', folders.path, to_tsquery('панда')),
	NULL,
	files.size,
	NULL,
	NULL,
	NULL AS id,
	NULL AS count
from	servers
	join services	on (servers.id = services.server)
	join folders	on (folders.service = services.id)
	join files	on (files.path = folders.id)
where
	files.type is null and
	files.tth is null and
	to_tsvector('russian', files.name) @@ to_tsquery('панда')
order 	by count desc NULLS LAST, id;

Код: plaintext

Sort  (cost= 1417225 . 78 .. 1419631 . 87  rows= 962436  width= 408 ) (actual time= 57774 . 369 .. 57774 . 414  rows= 30  loops= 1 )
  Sort Key: *SELECT*  1 .count, *SELECT*  1 .id
  Sort Method:  quicksort  Memory: 33kB
  ->  Unique  (cost= 1119877 . 39 .. 1143938 . 29  rows= 962436  width= 408 ) (actual time= 57774 . 094 .. 57774 . 266  rows= 30  loops= 1 )
        ->  Sort  (cost= 1119877 . 39 .. 1122283 . 48  rows= 962436  width= 408 ) (actual time= 57774 . 090 .. 57774 . 140  rows= 30  loops= 1 )
              Sort Key: *SELECT*  1 .ip, *SELECT*  1 .nick, *SELECT*  1 .path, *SELECT*  1 .ts_headline, *SELECT*  1 .size, *SELECT*  1 .added, *SELECT*  1 .tth, *SELECT*  1 .id, *SELECT*  1 .count
              Sort Method:  quicksort  Memory: 33kB
              ->  Append  (cost= 820342 . 82 .. 846589 . 91  rows= 962436  width= 408 ) (actual time= 57682 . 535 .. 57773 . 923  rows= 30  loops= 1 )
                    ->  Subquery Scan *SELECT*  1   (cost= 820342 . 82 .. 839591 . 12  rows= 962415  width= 408 ) (actual time= 57682 . 531 .. 57683 . 179  rows= 28  loops= 1 )
                          ->  Unique  (cost= 820342 . 82 .. 829966 . 97  rows= 962415  width= 408 ) (actual time= 57682 . 525 .. 57683 . 076  rows= 28  loops= 1 )
                                ->  Sort  (cost= 820342 . 82 .. 822748 . 86  rows= 962415  width= 408 ) (actual time= 57682 . 520 .. 57682 . 742  rows= 130  loops= 1 )
                                      Sort Key: public.tths.count, public.tths.id, public.files.name
                                      Sort Method:  quicksort  Memory: 62kB
                                      ->  Hash Join  (cost= 121011 . 44 .. 547061 . 08  rows= 962415  width= 408 ) (actual time= 5438 . 197 .. 57680 . 783  rows= 130  loops= 1 )
                                            Hash Cond: (public.files.path = public.folders.id)
                                            ->  Hash Join  (cost= 99324 . 88 .. 406040 . 05  rows= 962415  width= 334 ) (actual time= 3 . 126 .. 53499 . 275  rows= 130  loops= 1 )
                                                  Hash Cond: (public.files.tth = public.tths.id)
                                                  ->  Seq Scan on files  (cost= 0 . 00 .. 260301 . 67  rows= 5293967  width= 286 ) (actual time= 0 . 035 .. 40929 . 466  rows= 5293967  loops= 1 )
                                                  ->  Hash  (cost= 99267 . 04 .. 99267 . 04  rows= 4627  width= 56 ) (actual time= 2 . 277 .. 2 . 277  rows= 24  loops= 1 )
                                                        ->  Nested Loop  (cost= 60363 . 09 .. 99267 . 04  rows= 4627  width= 56 ) (actual time= 1 . 659 .. 2 . 220  rows= 24  loops= 1 )
                                                              ->  Unique  (cost= 60363 . 09 .. 60397 . 79  rows= 4627  width= 60 ) (actual time= 1 . 644 .. 1 . 861  rows= 24  loops= 1 )
                                                                    ->  Sort  (cost= 60363 . 09 .. 60374 . 66  rows= 4627  width= 60 ) (actual time= 1 . 640 .. 1 . 722  rows= 46  loops= 1 )
                                                                          Sort Key: public.tths.tth, public.files.name
                                                                          Sort Method:  quicksort  Memory: 31kB
                                                                          ->  Nested Loop  (cost= 158 . 34 .. 60081 . 40  rows= 4627  width= 60 ) (actual time= 0 . 153 .. 0 . 986  rows= 46  loops= 1 )
                                                                                ->  Bitmap Heap Scan on files  (cost= 158 . 34 .. 18685 . 88  rows= 4940  width= 20 ) (actual time= 0 . 125 .. 0 . 255  rows= 46  loops= 1 )
                                                                                      Recheck Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))
                                                                                      Filter: ((type IS NOT NULL) AND (size >  0 ))
                                                                                      ->  Bitmap Index Scan on files_name_russian_vector_index  (cost= 0 . 00 .. 157 . 11  rows= 5294  width= 0 ) (actual time= 0 . 107 .. 0 . 107  rows= 48  loops= 1 )
                                                                                            Index Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))
                                                                                ->  Index Scan using tths_pkey on tths  (cost= 0 . 00 .. 8 . 37  rows= 1  width= 44 ) (actual time= 0 . 007 .. 0 . 009  rows= 1  loops= 46 )
                                                                                      Index Cond: (public.tths.id = public.files.tth)
                                                              ->  Index Scan using tths_pkey on tths  (cost= 0 . 00 .. 8 . 38  rows= 1  width= 52 ) (actual time= 0 . 005 .. 0 . 007  rows= 1  loops= 24 )
                                                                    Index Cond: (public.tths.id = public.tths.id)
                                            ->  Hash  (cost= 12744 . 96 .. 12744 . 96  rows= 341648  width= 86 ) (actual time= 3845 . 669 .. 3845 . 669  rows= 342931  loops= 1 )
                                                  ->  Hash Join  (cost= 32 . 82 .. 12744 . 96  rows= 341648  width= 86 ) (actual time= 29 . 308 .. 2802 . 386  rows= 342931  loops= 1 )
                                                        Hash Cond: (public.folders.service = public.services.id)
                                                        ->  Seq Scan on folders  (cost= 0 . 00 .. 8014 . 48  rows= 341648  width= 76 ) (actual time= 20 . 784 .. 1132 . 448  rows= 342931  loops= 1 )
                                                        ->  Hash  (cost= 27 . 69 .. 27 . 69  rows= 410  width= 18 ) (actual time= 8 . 456 .. 8 . 456  rows= 410  loops= 1 )
                                                              ->  Hash Join  (cost= 12 . 96 .. 27 . 69  rows= 410  width= 18 ) (actual time= 2 . 072 .. 7 . 211  rows= 410  loops= 1 )
                                                                    Hash Cond: (public.services.server = public.servers.id)
                                                                    ->  Seq Scan on services  (cost= 0 . 00 .. 9 . 10  rows= 410  width= 15 ) (actual time= 0 . 015 .. 0 . 874  rows= 410  loops= 1 )
                                                                    ->  Hash  (cost= 7 . 98 .. 7 . 98  rows= 398  width= 11 ) (actual time= 2 . 015 .. 2 . 015  rows= 398  loops= 1 )
                                                                          ->  Seq Scan on servers  (cost= 0 . 00 .. 7 . 98  rows= 398  width= 11 ) (actual time= 0 . 014 .. 0 . 960  rows= 398  loops= 1 )
                    ->  Subquery Scan *SELECT*  2   (cost= 6739 . 48 .. 6998 . 79  rows= 21  width= 86 ) (actual time= 90 . 483 .. 90 . 570  rows= 2  loops= 1 )
                          ->  Hash Join  (cost= 6739 . 48 .. 6998 . 58  rows= 21  width= 86 ) (actual time= 90 . 474 .. 90 . 553  rows= 2  loops= 1 )
                                Hash Cond: (public.services.server = public.servers.id)
                                ->  Hash Join  (cost= 6726 . 52 .. 6985 . 23  rows= 21  width= 83 ) (actual time= 88 . 354 .. 88 . 395  rows= 2  loops= 1 )
                                      Hash Cond: (public.folders.service = public.services.id)
                                      ->  Nested Loop  (cost= 6712 . 30 .. 6970 . 72  rows= 21  width= 76 ) (actual time= 86 . 714 .. 86 . 747  rows= 2  loops= 1 )
                                            ->  Bitmap Heap Scan on files  (cost= 6712 . 30 .. 6796 . 03  rows= 21  width= 12 ) (actual time= 86 . 672 .. 86 . 677  rows= 2  loops= 1 )
                                                  Recheck Cond: ((to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text)) AND (tth IS NULL) AND (type IS NULL))
                                                  ->  BitmapAnd  (cost= 6712 . 30 .. 6712 . 30  rows= 21  width= 0 ) (actual time= 86 . 659 .. 86 . 659  rows= 0  loops= 1 )
                                                        ->  Bitmap Index Scan on files_name_russian_vector_index  (cost= 0 . 00 .. 157 . 11  rows= 5294  width= 0 ) (actual time= 0 . 080 .. 0 . 080  rows= 48  loops= 1 )
                                                              Index Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))
                                                        ->  Bitmap Index Scan on files_tth_index  (cost= 0 . 00 .. 6554 . 93  rows= 21235  width= 0 ) (actual time= 86 . 547 .. 86 . 547  rows= 346939  loops= 1 )
                                                              Index Cond: (tth IS NULL)
                                            ->  Index Scan using folders_pkey on folders  (cost= 0 . 00 .. 8 . 31  rows= 1  width= 76 ) (actual time= 0 . 018 .. 0 . 020  rows= 1  loops= 2 )
                                                  Index Cond: (public.folders.id = public.files.path)
                                      ->  Hash  (cost= 9 . 10 .. 9 . 10  rows= 410  width= 15 ) (actual time= 1 . 617 .. 1 . 617  rows= 410  loops= 1 )
                                            ->  Seq Scan on services  (cost= 0 . 00 .. 9 . 10  rows= 410  width= 15 ) (actual time= 0 . 010 .. 0 . 788  rows= 410  loops= 1 )
                                ->  Hash  (cost= 7 . 98 .. 7 . 98  rows= 398  width= 11 ) (actual time= 1 . 528 .. 1 . 528  rows= 398  loops= 1 )
                                      ->  Seq Scan on servers  (cost= 0 . 00 .. 7 . 98  rows= 398  width= 11 ) (actual time= 0 . 020 .. 0 . 764  rows= 398  loops= 1 )
Total runtime:  57774 . 932  ms

А джоины кроме как использования денормализации можно ускорять? Или их можно как-нибудь убрать?
Еще раз пробовал "set enable_hashjoin to off". Запрос выполнялся 250 секунд и использовался "Merge Join".

P.S.: В понедельник уезжаю на военные сборы. Так что полтора месяца не буду вас всех доставать :)
P.P.S.: Огромное всем спасибо за предложенные варианты ускорения запроса. Сам бы до такого никогда бы не додумался :)

...

Рейтинг:

0 / 0

21.06.2008, 14:27

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35387802

LeXa NalBat

Участник

Откуда: Москва

Сообщения: 2 843

Рейтинг: 0 / 0

UKYВсе тормоза уходят в join. На построение всяких хэш таблиц и всего такогода. но для этого запроса возможен быстрый план. тормоза происходят из-за того, что постгрес выбирает неоптимальный порядок (или тип) джоинов. причина этой ошибки в том, что он неправильно оценивает кол-во строк удовлетворяющих условию полнотекстового поиска. попробуйте обернуть полнотекстовый поиск в функцию, созданную с опцией ROWS 10, как я писал, это может помочь. покажите explain analyze с использованием функции-обертки.

...

Рейтинг:

0 / 0

23.06.2008, 11:04

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35452659

UKY

Гость

Вернулся со сборов. Жив и здоров :)

LeXa NalBat UKYВсе тормоза уходят в join. На построение всяких хэш таблиц и всего такогода. но для этого запроса возможен быстрый план. тормоза происходят из-за того, что постгрес выбирает неоптимальный порядок (или тип) джоинов. причина этой ошибки в том, что он неправильно оценивает кол-во строк удовлетворяющих условию полнотекстового поиска. попробуйте обернуть полнотекстовый поиск в функцию, созданную с опцией ROWS 10, как я писал, это может помочь. покажите explain analyze с использованием функции-обертки.

Сделал так, как Вы мне порекомендовали:

Код: plaintext

create or replace function tth_search(query text) returns setof record as
$$
BEGIN
	return query
	select distinct on (tths.tth, files.name)
		tths.id,
		files.name
	from	files join tths on files.tth = tths.id
	where	files.size >  0  and
		files.type is not null and
		to_tsvector('russian', files.name) @@ to_tsquery(query);
END;
$$
language plpgsql rows  10 ;

explain analyze
select 	distinct on (count, tths.id, files.name)
	servers.ip,
	services.nick,
	folders.path,
	ts_headline('russian', files.name || '.' || files.type, to_tsquery('панда')),
	files.size,
	tths.added,
	tths.tth,
	tths.id AS id,
	tths.count AS count
from	servers
	join services	on (servers.id = services.server)
	join folders	on (folders.service = services.id)
	join files	on (files.path = folders.id)
	join tths	on (files.tth = tths.id)
	join
--explain analyze
	(
select * from tth_search('панда') AS (tth_id int, name varchar)
	) AS inq
	on (tths.id = inq.tth_id)
UNION
--explain analyze
select 	servers.ip,
	services.nick,
	ts_headline('russian', folders.path, to_tsquery('панда')),
	NULL,
	files.size,
	NULL,
	NULL,
	NULL AS id,
	NULL AS count
from	servers
	join services	on (servers.id = services.server)
	join folders	on (folders.service = services.id)
	join files	on (files.path = folders.id)
where
	files.type is null and
	files.tth is null and
	to_tsvector('russian', files.name) @@ to_tsquery('панда')
order 	by count desc NULLS LAST, id;

Получил это:

Код: plaintext

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
52.

Sort  (cost= 304038 . 58 .. 304043 . 83  rows= 2101  width= 408 ) (actual time= 57301 . 893 .. 57301 . 942  rows= 30  loops= 1 )
  Sort Key: "*SELECT* 1".count, "*SELECT* 1".id
  Sort Method:  quicksort  Memory: 33kB
  ->  Unique  (cost= 303870 . 11 .. 303922 . 64  rows= 2101  width= 408 ) (actual time= 57301 . 615 .. 57301 . 783  rows= 30  loops= 1 )
        ->  Sort  (cost= 303870 . 11 .. 303875 . 37  rows= 2101  width= 408 ) (actual time= 57301 . 610 .. 57301 . 657  rows= 30  loops= 1 )
              Sort Key: "*SELECT* 1".ip, "*SELECT* 1".nick, "*SELECT* 1".path, "*SELECT* 1".ts_headline, "*SELECT* 1".size, "*SELECT* 1".added, "*SELECT* 1".tth, "*SELECT* 1".id, "*SELECT* 1".count
              Sort Method:  quicksort  Memory: 33kB
              ->  Append  (cost= 296713 . 78 .. 303754 . 17  rows= 2101  width= 408 ) (actual time= 57210 . 019 .. 57301 . 431  rows= 30  loops= 1 )
                    ->  Subquery Scan "*SELECT* 1"  (cost= 296713 . 78 .. 296755 . 38  rows= 2080  width= 408 ) (actual time= 57210 . 015 .. 57210 . 652  rows= 28  loops= 1 )
                          ->  Unique  (cost= 296713 . 78 .. 296734 . 58  rows= 2080  width= 408 ) (actual time= 57210 . 008 .. 57210 . 536  rows= 28  loops= 1 )
                                ->  Sort  (cost= 296713 . 78 .. 296718 . 98  rows= 2080  width= 408 ) (actual time= 57210 . 003 .. 57210 . 207  rows= 130  loops= 1 )
                                      Sort Key: tths.count, tths.id, public.files.name
                                      Sort Method:  quicksort  Memory: 62kB
                                      ->  Hash Join  (cost= 117 . 72 .. 296599 . 15  rows= 2080  width= 408 ) (actual time= 648 . 109 .. 57208 . 194  rows= 130  loops= 1 )
                                            Hash Cond: (public.services.server = public.servers.id)
                                            ->  Hash Join  (cost= 104 . 76 .. 296531 . 60  rows= 2080  width= 405 ) (actual time= 646 . 112 .. 57198 . 953  rows= 130  loops= 1 )
                                                  Hash Cond: (public.folders.service = public.services.id)
                                                  ->  Nested Loop  (cost= 90 . 54 .. 296488 . 77  rows= 2080  width= 398 ) (actual time= 644 . 049 .. 57196 . 273  rows= 130  loops= 1 )
                                                        ->  Hash Join  (cost= 90 . 54 .. 280244 . 78  rows= 2080  width= 334 ) (actual time= 644 . 000 .. 57193 . 141  rows= 130  loops= 1 )
                                                              Hash Cond: (public.files.tth = tths.id)
                                                              ->  Seq Scan on files  (cost= 0 . 00 .. 260301 . 67  rows= 5293967  width= 286 ) (actual time= 0 . 007 .. 44122 . 366  rows= 5293967  loops= 1 )
                                                              ->  Hash  (cost= 90 . 41 .. 90 . 41  rows= 10  width= 56 ) (actual time= 3 . 629 .. 3 . 629  rows= 24  loops= 1 )
                                                                    ->  Nested Loop  (cost= 0 . 00 .. 90 . 41  rows= 10  width= 56 ) (actual time= 3 . 027 .. 3 . 553  rows= 24  loops= 1 )
                                                                          ->  Function Scan on tth_search  (cost= 0 . 00 .. 2 . 60  rows= 10  width= 4 ) (actual time= 3 . 003 .. 3 . 052  rows= 24  loops= 1 )
                                                                          ->  Index Scan using tths_pkey on tths  (cost= 0 . 00 .. 8 . 77  rows= 1  width= 52 ) (actual time= 0 . 009 .. 0 . 012  rows= 1  loops= 24 )
                                                                                Index Cond: (tths.id = tth_search.tth_id)
                                                        ->  Index Scan using folders_pkey on folders  (cost= 0 . 00 .. 7 . 80  rows= 1  width= 76 ) (actual time= 0 . 013 .. 0 . 015  rows= 1  loops= 130 )
                                                              Index Cond: (public.folders.id = public.files.path)
                                                  ->  Hash  (cost= 9 . 10 .. 9 . 10  rows= 410  width= 15 ) (actual time= 2 . 040 .. 2 . 040  rows= 410  loops= 1 )
                                                        ->  Seq Scan on services  (cost= 0 . 00 .. 9 . 10  rows= 410  width= 15 ) (actual time= 0 . 011 .. 0 . 967  rows= 410  loops= 1 )
                                            ->  Hash  (cost= 7 . 98 .. 7 . 98  rows= 398  width= 11 ) (actual time= 1 . 861 .. 1 . 861  rows= 398  loops= 1 )
                                                  ->  Seq Scan on servers  (cost= 0 . 00 .. 7 . 98  rows= 398  width= 11 ) (actual time= 0 . 019 .. 0 . 929  rows= 398  loops= 1 )
                    ->  Subquery Scan "*SELECT* 2"  (cost= 6739 . 48 .. 6998 . 79  rows= 21  width= 86 ) (actual time= 90 . 596 .. 90 . 683  rows= 2  loops= 1 )
                          ->  Hash Join  (cost= 6739 . 48 .. 6998 . 58  rows= 21  width= 86 ) (actual time= 90 . 588 .. 90 . 667  rows= 2  loops= 1 )
                                Hash Cond: (public.services.server = public.servers.id)
                                ->  Hash Join  (cost= 6726 . 52 .. 6985 . 23  rows= 21  width= 83 ) (actual time= 88 . 623 .. 88 . 665  rows= 2  loops= 1 )
                                      Hash Cond: (public.folders.service = public.services.id)
                                      ->  Nested Loop  (cost= 6712 . 30 .. 6970 . 72  rows= 21  width= 76 ) (actual time= 87 . 004 .. 87 . 037  rows= 2  loops= 1 )
                                            ->  Bitmap Heap Scan on files  (cost= 6712 . 30 .. 6796 . 03  rows= 21  width= 12 ) (actual time= 86 . 963 .. 86 . 967  rows= 2  loops= 1 )
                                                  Recheck Cond: ((to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text)) AND (tth IS NULL) AND (type IS NULL))
                                                  ->  BitmapAnd  (cost= 6712 . 30 .. 6712 . 30  rows= 21  width= 0 ) (actual time= 86 . 950 .. 86 . 950  rows= 0  loops= 1 )
                                                        ->  Bitmap Index Scan on files_name_russian_vector_index  (cost= 0 . 00 .. 157 . 11  rows= 5294  width= 0 ) (actual time= 0 . 096 .. 0 . 096  rows= 48  loops= 1 )
                                                              Index Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))
                                                        ->  Bitmap Index Scan on files_tth_index  (cost= 0 . 00 .. 6554 . 93  rows= 21235  width= 0 ) (actual time= 86 . 823 .. 86 . 823  rows= 346939  loops= 1 )
                                                              Index Cond: (tth IS NULL)
                                            ->  Index Scan using folders_pkey on folders  (cost= 0 . 00 .. 8 . 31  rows= 1  width= 76 ) (actual time= 0 . 018 .. 0 . 021  rows= 1  loops= 2 )
                                                  Index Cond: (public.folders.id = public.files.path)
                                      ->  Hash  (cost= 9 . 10 .. 9 . 10  rows= 410  width= 15 ) (actual time= 1 . 591 .. 1 . 591  rows= 410  loops= 1 )
                                            ->  Seq Scan on services  (cost= 0 . 00 .. 9 . 10  rows= 410  width= 15 ) (actual time= 0 . 009 .. 0 . 790  rows= 410  loops= 1 )
                                ->  Hash  (cost= 7 . 98 .. 7 . 98  rows= 398  width= 11 ) (actual time= 1 . 855 .. 1 . 855  rows= 398  loops= 1 )
                                      ->  Seq Scan on servers  (cost= 0 . 00 .. 7 . 98  rows= 398  width= 11 ) (actual time= 0 . 011 .. 0 . 754  rows= 398  loops= 1 )
Total runtime:  57302 . 475  ms

Индексы, которые есть в СУБД:

Код: plaintext

1.
2.
3.
4.
5.
6.
7.
8.
9.

CREATE INDEX folders_service_index ON folders USING btree (service)
CREATE INDEX tths_count_index ON tths USING btree (count DESC NULLS LAST)
CREATE INDEX files_path_index ON files USING btree (path)
CREATE INDEX services_server_index ON services USING btree (server)
CREATE INDEX services_type_index ON services USING btree (type)
CREATE INDEX folders_rubruc_index ON folders USING btree (rubric)
CREATE INDEX files_type_folder_index ON files USING btree (type) WHERE (type IS NULL)
CREATE INDEX files_name_russian_vector_index ON files USING gin (to_tsvector('russian'::regconfig, (name)::text))
CREATE INDEX files_tth_index ON files USING btree (tth) WHERE (type IS NULL)

Ну что-то ни в какую не хочет оптимизироваться запрос...

...

Рейтинг:

0 / 0

25.07.2008, 16:16

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35453077

tadmin

Участник

Сообщения: 1 189

Рейтинг: 0 / 0

- если поиск по %панда% возвращает немного записей, то можно сохранить их во временную таблицу, а уж потом джойнить с остальным хохяйством.
- выделить часть запроса во view, часто это позволяет загнать запрос в более оптимальный план.
Например подзапрос (select distinct on ...)

...

Рейтинг:

0 / 0

25.07.2008, 18:45

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35453381

UKY

Гость

tadmin- если поиск по %панда% возвращает немного записей, то можно сохранить их во временную таблицу, а уж потом джойнить с остальным хохяйством.
- выделить часть запроса во view, часто это позволяет загнать запрос в более оптимальный план.
Например подзапрос (select distinct on ...)

Сделал временную таблицу - не помогло. И вид тоже делал - та же ситуация.

P.S.: В основном нагрузка идет на процессор. Жесткий диск не так уж и пыжится.

...

Рейтинг:

0 / 0

26.07.2008, 06:26

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35453483

tadmin

Участник

Сообщения: 1 189

Рейтинг: 0 / 0

а если

Код: plaintext

1.
2.
3.
4.

create view files_view as 
select *
from files 
where type is null and tth is null

а потом так:

Код: plaintext

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.

select 	servers.ip,
	services.nick,
	ts_headline('russian', folders.path, to_tsquery('панда')),
	NULL,
	files.size,
	NULL,
	NULL,
	NULL AS id,
	NULL AS count
from	files_view 
	join folders on (files_view.path = folders.id)
	join services on (folders.service = services.id)
           join servers on (servers.id = services.server)
where	to_tsvector('russian', files_view.name) @@ to_tsquery('панда')

Кстати, я не понял из схемы данных, нельзя ли кое-где указать inner join ?

...

Рейтинг:

0 / 0

26.07.2008, 13:07

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35453498

UKY

Гость

tadminа если

Код: plaintext

1.
2.
3.
4.

create view files_view as 
select *
from files 
where type is null and tth is null

а потом так:

Код: plaintext

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.

select 	servers.ip,
	services.nick,
	ts_headline('russian', folders.path, to_tsquery('панда')),
	NULL,
	files.size,
	NULL,
	NULL,
	NULL AS id,
	NULL AS count
from	files_view 
	join folders on (files_view.path = folders.id)
	join services on (folders.service = services.id)
           join servers on (servers.id = services.server)
where	to_tsvector('russian', files_view.name) @@ to_tsquery('панда')

Кстати, я не понял из схемы данных, нельзя ли кое-где указать inner join ?

PostgreSQL
The words INNER and OUTER are optional in all forms. INNER is the default; LEFT, RIGHT, and FULL imply an outer join.

Так что здесь и так используется inner join ;)

Сделал так, как Вы мне посоветовали за исключением того, что убрал "where type is null and tth is null" из вида, т.к. логика страдает:
type - тип файла. Для директории = null
tth - контрольная сумма файла. Для директории = null
А мне надо найти и файлы, и папки.

Запрос

Код: plaintext

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.

select 	servers.ip,
	services.nick,
	ts_headline('russian', folders.path, to_tsquery('панда')),
	NULL,
	files.size,
	NULL,
	NULL,
	NULL AS id,
	NULL AS count
from	servers
	join services	on (servers.id = services.server)
	join folders	on (folders.service = services.id)
	join files	on (files.path = folders.id)
where
	files.type is null and
	files.tth is null and
	to_tsvector('russian', files.name) @@ to_tsquery('панда')

работает приемленно быстро.

Главная задача - оптимизация

Код: plaintext

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.

select 	distinct on (count, tths.id, files_view.name)
	servers.ip	AS ip,
	services.nick	AS nick,
	folders.path	AS path,
	ts_headline('russian', files_view.name || '.' || files_view.type, to_tsquery('панда'))
			AS summary,
	files_view.size	AS size,
	tths.added	AS added,
	tths.tth	AS tth,
	tths.id		AS id,
	tths.count	AS count
from	servers
	join services	on (servers.id = services.server)
	join folders	on (folders.service = services.id)
	join files_view	on (files_view.path = folders.id)
	join tths	on (files_view.tth = tths.id)
	join
	(
--explain analyze
	select	distinct on (tths.tth)
		tths.id AS tth_id
	from	files join tths on files.tth = tths.id
	where	files.size >  0  and
		files.type is not null and
		to_tsvector('russian', files.name) @@ to_tsquery('панда')
	) AS inq
	on (tths.id = inq.tth_id)

...

Рейтинг:

0 / 0

26.07.2008, 13:34

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35453541

tadmin

Участник

Сообщения: 1 189

Рейтинг: 0 / 0

вот этот

Код: plaintext

  Seq Scan on files  (cost= 0 . 00 .. 260301 . 67  rows= 5293967  width= 286 ) (actual time= 0 . 007 .. 44122 . 366  rows= 5293967  loops= 1 )

ведет себя довольно нагло. Даже интересно.
У меня был похожий случай, победил через view.
У вас большой pg_dump, можете выложить?

...

Рейтинг:

0 / 0

26.07.2008, 15:17

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35453703

UKY

Гость

tadminвот этот

Код: plaintext

  Seq Scan on files  (cost= 0 . 00 .. 260301 . 67  rows= 5293967  width= 286 ) (actual time= 0 . 007 .. 44122 . 366  rows= 5293967  loops= 1 )

ведет себя довольно нагло. Даже интересно.
У меня был похожий случай, победил через view.
У вас большой pg_dump, можете выложить?
До меня только сейчас дошло как узнать с какого места в запросе тормоза начинаются %)
pg_dump большеват. да и скорость интернета не очень большая.

сделал set enable_seqscan to off и запрос выполнился за 2,5 секунды:

Код: plaintext

Sort  (cost= 4276639 . 47 .. 4279045 . 56  rows= 962436  width= 408 ) (actual time= 2542 . 133 .. 2542 . 181  rows= 30  loops= 1 )
  Sort Key: "*SELECT* 1".count, "*SELECT* 1".id
  Sort Method:  quicksort  Memory: 33kB
  ->  Append  (cost= 3976023 . 10 .. 4003351 . 98  rows= 962436  width= 408 ) (actual time= 2443 . 632 .. 2541 . 988  rows= 30  loops= 1 )
        ->  Subquery Scan "*SELECT* 1"  (cost= 3976023 . 10 .. 3995271 . 40  rows= 962415  width= 408 ) (actual time= 2443 . 628 .. 2444 . 006  rows= 28  loops= 1 )
              ->  Unique  (cost= 3976023 . 10 .. 3985647 . 25  rows= 962415  width= 408 ) (actual time= 2443 . 622 .. 2443 . 895  rows= 28  loops= 1 )
                    ->  Sort  (cost= 3976023 . 10 .. 3978429 . 14  rows= 962415  width= 408 ) (actual time= 2443 . 617 .. 2443 . 710  rows= 58  loops= 1 )
                          Sort Key: public.tths.count, public.tths.id, public.files.name
                          Sort Method:  quicksort  Memory: 43kB
                          ->  Hash Join  (cost= 85993 . 27 .. 3702741 . 36  rows= 962415  width= 408 ) (actual time= 2312 . 161 .. 2443 . 063  rows= 58  loops= 1 )
                                Hash Cond: (public.folders.service = public.services.id)
                                ->  Hash Join  (cost= 85807 . 13 .. 3677291 . 83  rows= 962415  width= 398 ) (actual time= 2307 . 106 .. 2435 . 514  rows= 58  loops= 1 )
                                      Hash Cond: (public.files.path = public.folders.id)
                                      ->  Nested Loop  (cost= 60363 . 09 .. 3544876 . 51  rows= 962415  width= 334 ) (actual time= 2 . 448 .. 3 . 894  rows= 58  loops= 1 )
                                            ->  Nested Loop  (cost= 60363 . 09 .. 99255 . 48  rows= 4627  width= 56 ) (actual time= 2 . 383 .. 2 . 922  rows= 19  loops= 1 )
                                                  ->  Unique  (cost= 60363 . 09 .. 60386 . 22  rows= 4627  width= 44 ) (actual time= 2 . 365 .. 2 . 568  rows= 19  loops= 1 )
                                                        ->  Sort  (cost= 60363 . 09 .. 60374 . 66  rows= 4627  width= 44 ) (actual time= 2 . 361 .. 2 . 438  rows= 46  loops= 1 )
                                                              Sort Key: public.tths.tth
                                                              Sort Method:  quicksort  Memory: 28kB
                                                              ->  Nested Loop  (cost= 158 . 34 .. 60081 . 40  rows= 4627  width= 44 ) (actual time= 0 . 278 .. 1 . 926  rows= 46  loops= 1 )
                                                                    ->  Bitmap Heap Scan on files  (cost= 158 . 34 .. 18685 . 88  rows= 4940  width= 4 ) (actual time= 0 . 200 .. 0 . 637  rows= 46  loops= 1 )
                                                                          Recheck Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))
                                                                          Filter: ((type IS NOT NULL) AND (size >  0 ))
                                                                          ->  Bitmap Index Scan on files_name_russian_vector_index  (cost= 0 . 00 .. 157 . 11  rows= 5294  width= 0 ) (actual time= 0 . 161 .. 0 . 161  rows= 48  loops= 1 )
                                                                                Index Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))
                                                                    ->  Index Scan using tths_pkey on tths  (cost= 0 . 00 .. 8 . 37  rows= 1  width= 44 ) (actual time= 0 . 019 .. 0 . 021  rows= 1  loops= 46 )
                                                                          Index Cond: (public.tths.id = public.files.tth)
                                                  ->  Index Scan using tths_pkey on tths  (cost= 0 . 00 .. 8 . 38  rows= 1  width= 52 ) (actual time= 0 . 009 .. 0 . 011  rows= 1  loops= 19 )
                                                        Index Cond: (public.tths.id = public.tths.id)
                                            ->  Index Scan using files_tth_index on files  (cost= 0 . 00 .. 741 . 90  rows= 222  width= 286 ) (actual time= 0 . 025 .. 0 . 036  rows= 3  loops= 19 )
                                                  Index Cond: (public.files.tth = public.tths.id)
                                      ->  Hash  (cost= 16835 . 44 .. 16835 . 44  rows= 341648  width= 76 ) (actual time= 2304 . 389 .. 2304 . 389  rows= 342931  loops= 1 )
                                            ->  Index Scan using folders_pkey on folders  (cost= 0 . 00 .. 16835 . 44  rows= 341648  width= 76 ) (actual time= 0 . 049 .. 1008 . 137  rows= 342931  loops= 1 )
                                ->  Hash  (cost= 181 . 01 .. 181 . 01  rows= 410  width= 18 ) (actual time= 4 . 933 .. 4 . 933  rows= 410  loops= 1 )
                                      ->  Hash Join  (cost= 74 . 66 .. 181 . 01  rows= 410  width= 18 ) (actual time= 1 . 823 .. 4 . 139  rows= 410  loops= 1 )
                                            Hash Cond: (public.servers.id = public.services.server)
                                            ->  Index Scan using servers_pkey on servers  (cost= 0 . 00 .. 99 . 76  rows= 398  width= 11 ) (actual time= 0 . 026 .. 0 . 791  rows= 398  loops= 1 )
                                            ->  Hash  (cost= 69 . 54 .. 69 . 54  rows= 410  width= 15 ) (actual time= 1 . 780 .. 1 . 780  rows= 410  loops= 1 )
                                                  ->  Index Scan using services_pkey on services  (cost= 0 . 00 .. 69 . 54  rows= 410  width= 15 ) (actual time= 0 . 016 .. 0 . 962  rows= 410  loops= 1 )
        ->  Subquery Scan "*SELECT* 2"  (cost= 6352 . 84 .. 8080 . 58  rows= 21  width= 86 ) (actual time= 97 . 706 .. 97 . 881  rows= 2  loops= 1 )
              ->  Nested Loop  (cost= 6352 . 84 .. 8080 . 37  rows= 21  width= 86 ) (actual time= 97 . 698 .. 97 . 866  rows= 2  loops= 1 )
                    ->  Nested Loop  (cost= 6352 . 84 .. 7969 . 07  rows= 21  width= 83 ) (actual time= 97 . 569 .. 97 . 672  rows= 2  loops= 1 )
                          ->  Nested Loop  (cost= 6352 . 84 .. 7832 . 99  rows= 21  width= 76 ) (actual time= 97 . 502 .. 97 . 572  rows= 2  loops= 1 )
                                ->  Bitmap Heap Scan on files  (cost= 6352 . 84 .. 7658 . 31  rows= 21  width= 12 ) (actual time= 97 . 323 .. 97 . 328  rows= 2  loops= 1 )
                                      Recheck Cond: ((to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text)) AND (tth IS NULL))
                                      Filter: (type IS NULL)
                                      ->  BitmapAnd  (cost= 6352 . 84 .. 6352 . 84  rows= 335  width= 0 ) (actual time= 97 . 307 .. 97 . 307  rows= 0  loops= 1 )
                                            ->  Bitmap Index Scan on files_name_russian_vector_index  (cost= 0 . 00 .. 157 . 11  rows= 5294  width= 0 ) (actual time= 0 . 070 .. 0 . 070  rows= 48  loops= 1 )
                                                  Index Cond: (to_tsvector('russian'::regconfig, (name)::text) @@ to_tsquery('панда'::text))
                                            ->  Bitmap Index Scan on files_tth_index  (cost= 0 . 00 .. 6195 . 47  rows= 335285  width= 0 ) (actual time= 97 . 206 .. 97 . 206  rows= 342931  loops= 1 )
                                                  Index Cond: (tth IS NULL)
                                ->  Index Scan using folders_pkey on folders  (cost= 0 . 00 .. 8 . 31  rows= 1  width= 76 ) (actual time= 0 . 105 .. 0 . 107  rows= 1  loops= 2 )
                                      Index Cond: (public.folders.id = public.files.path)
                          ->  Index Scan using services_pkey on services  (cost= 0 . 00 .. 6 . 47  rows= 1  width= 15 ) (actual time= 0 . 037 .. 0 . 039  rows= 1  loops= 2 )
                                Index Cond: (public.services.id = public.folders.service)
                    ->  Index Scan using servers_pkey on servers  (cost= 0 . 00 .. 5 . 28  rows= 1  width= 11 ) (actual time= 0 . 026 .. 0 . 029  rows= 1  loops= 2 )
                          Index Cond: (public.servers.id = public.services.server)
Total runtime:  2543 . 823  ms

Это уже хорошо. Даже очень :)

Но можно ли как-нибудь без принудительной установки параметров оптимизатора этого добиться?

...

Рейтинг:

0 / 0

26.07.2008, 21:32

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35453863

tadmin

Участник

Сообщения: 1 189

Рейтинг: 0 / 0

UKY
До меня только сейчас дошло как узнать с какого места в запросе тормоза начинаются %)
pg_dump большеват. да и скорость интернета не очень большая.

Тормоза (по времени) у вас начинались с hash-join's в памяти, но вот вызвала их никчемная туча записей. Это относительно простой случай, потому что в итоге мало записей возвращается, а sec-scan - большой. Могло быть и так, что делается несколько выборок, которые медленно сужают число записей, делают им hash-join, в то время как в самом конце выполняется условие where, которое и так бы все это отрезало. Стоит заставить планировщик применить первым самое жестокое правило, как все начинает летать.

UKY
сделал set enable_seqscan to off и запрос выполнился за 2,5 секунды:
Но можно ли как-нибудь без принудительной установки параметров оптимизатора этого добиться?
При объединении 8-10 таблиц(подзапросов) с кучей условий объединения безумно растет количество возможных планов исполнения. Для 8-10 таблиц оно может измеряться в тысячах. Планировщик видит это и не выбирает аналитически оптимальный, а лишь делает "догадку". (Читать про GQO).

Я замечал (не надо мне верить, это догадки), что при большом числе таблиц даже порядок их перечисления в запросе может радикально повлиять на план исполнения, а небольшое изменение в статистике таблиц может внезапно "опрокинуть" до сих пор работавший план запроса.

В подобной ситуации я добился оптимального плана через комбинацию нескольких view.

Попробуйте разбить ваш запрос на относительно простые части и реализуйте их в виде view так, чтобы финальный запрос можно было построить на их комбинации. Если каждая view будет исполнятся с оптимальным планом, есть шанс на то же самое в целом.

...

Рейтинг:

0 / 0

27.07.2008, 09:52

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35453890

MBG

Гость

Попробуйте на набор запросов разбить и отлаживайте их по шагам. Если промежуточный запрос дает мало записей - сохраняйте во временную таблицу, много записей - делайте вид. См.

http://postgrestips.blogspot.com/2007/07/temp.html

...

Рейтинг:

0 / 0

27.07.2008, 11:28

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35453900

UKY

Гость

Странно. Сейчас еще раз попробовал сделать при помощи видов, как вы и советовали и все залетало. Вчера не летало.

Код: plaintext

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.

create or replace temp view find_tths AS
	select	distinct on (tths.tth)
		tths.id AS tth_id
	from	files join tths on files.tth = tths.id
	where	files.size >  0  and
		files.type is not null and
		to_tsvector('russian', files.name) @@ to_tsquery('панда');

explain analyze
select 	distinct on (count, tths.id, files.name)
	servers.ip	AS ip,
	services.nick	AS nick,
	folders.path	AS path,
	ts_headline('russian', files.name || '.' || files.type, to_tsquery('панда'))
			AS summary,
	files.size	AS size,
	tths.added	AS added,
	tths.tth	AS tth,
	tths.id		AS id,
	tths.count	AS count
from	servers
	join services	on (servers.id = services.server)
	join folders	on (folders.service = services.id)
	join files	on (files.path = folders.id)
	join tths	on (files.tth = tths.id)
	join
	(
--explain analyze
		select * from find_tths
	) AS inq
	on (tths.id = inq.tth_id)
UNION ALL
--explain analyze
select 	servers.ip,
	services.nick,
	ts_headline('russian', folders.path, to_tsquery('панда')),
	NULL,
	files.size,
	NULL,
	NULL,
	NULL AS id,
	NULL AS count
from	servers
	join services	on (servers.id = services.server)
	join folders	on (folders.service = services.id)
	join files	on (files.path = folders.id)
where
	files.type is null and
	files.tth is null and
	to_tsvector('russian', files.name) @@ to_tsquery('панда')
order 	by count desc NULLS LAST, id;

Только вот одна маленькая проблема: в вИды нельзя передавать параметры. А мне надо как-то передать строку "панда". Как лечить?

...

Рейтинг:

0 / 0

27.07.2008, 11:43

| Ответить | Цитировать | Написать

Оптимизация большого запроса возможна?

#35453926

MBG

Гость

Можно сделать функцию, которая принимает параметры и создает нужные таблицы и виды (временные или постоянные). Если вы смотрели указанную ссылку, там так и делается.

...

Рейтинг:

0 / 0

27.07.2008, 12:23

| Ответить | Цитировать | Написать

25 сообщений из 65, страница 1 из 3

все

Форумы / PostgreSQL [игнор отключен] [закрыт для гостей] / Оптимизация большого запроса возможна?

Читали тему (0):

Читали форум (0):

Пользователи онлайн (0):

start [/forum/topic.php?fid=53&msg=35381542&tid=1995161]:	0ms
get settings:	12ms
get forum list:	16ms
check forum access:	4ms
check topic access:	4ms
track hit:	63ms
get topic data:	12ms
get forum data:	3ms
get page messages:	72ms
get tp. blocked users:	2ms
others:	277ms

total:	465ms