Explaining the inexplicable. Part 2

conference Registration PG Day’16 in full swing, and we continue to publish translated articles Hubert Lubaczewski about and explain its major components.

/ > last time I wrote that shows the output of explain. Now I want to talk more about the different types of "nodes" / transactions, which you can see in the explain plans.

The basic operation is the sequential scan (Seq Scan).

It looks like this:

the

explain analyze select * from pg_class;
QUERY PLAN 
---------------------------------------------------------------------------------------------------------
Seq Scan on pg_class (cost=0.00..10.92 rows=292 width=202) (actual time=0.009..0.049 rows=295 loops=1)
Total runtime: 0.249 ms
(2 rows)

This is the most basic operation of PostgreSQL-open the file table, reads the rows one by one and returns it to the user or located above the node of the tree explain, for example, LIMIT how here:

the

explain analyze select * from pg_class limit 2;
QUERY PLAN 
-------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.07 rows=2 width=202) (actual time=0.014..0.014 rows=2 loops=1)
-> Seq Scan on pg_class (cost=0.00..10.92 rows=292 width=202) (actual time=0.009..0.009 rows=2 loops=1)
Total runtime: 0.132 ms
(3 rows)

It is important to understand that the order of the return lines is not certain. They return not "insertion order" or "last updated row – first", or something in the same spirit. Parallel querying, updating, deleting, cleaning (vacuums) can change the order of rows at any time.

Seq Scan can filter rows – that is, discard some when you return. This happens, for example, when you add the condition “WHERE":

the

explain analyze select * from pg_class where relname ~ 'a';
QUERY PLAN 
---------------------------------------------------------------------------------------------------------
Seq Scan on pg_class (cost=0.00..11.65 rows=227 width=202) (actual time=0.030..0.294 rows=229 loops=1)
Filter: (relname ~ 'a'::text)
Rows Removed by Filter: 66
Total runtime: 0.379 ms
(4 rows)

As you can see, now we have info “Filter:”. And since I have the version 9.2 or later, I also got the comment "Rows removed by filter" ("Rows removed by filter").

the Following node type of “Index Scan".

This scan seems very simple, and most people understand at least one case of its use:

the

explain analyze select * from pg_class where oid = 1247;
QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------------------
Index Scan using pg_class_oid_index on pg_class (cost=0.15..8.17 rows=1 width=202) (actual time=0.007..0.007 rows=1 loops=1)
Index Cond: (oid = 1247::oid)
Total runtime: 0.077 ms
(3 rows)

It's simple – we have an index that matches a condition, so that PostgreSQL:

the

opens the index;
in the index, if finds where (in table data) can be the rows that match this condition:
the
- opens the table;
- gets the string (s) specified (s) index;
if the row can be returned – that is, if they are visible in the current session – they come back.

Of course, you may ask: how the string can be invisible? This can happen with deleted rows still in the table (had not been dusted vacuum). Or with strings that have been updated. Or was inserted, but after the current transaction.

Index Scan is also used when you want to sort some data using sort order in the index, for example:

the

explain analyze select * from pg_class order by oid limit 10;
QUERY PLAN 
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.15..1.67 rows=10 width=206) (actual time=0.017..0.029 rows=10 loops=1)
-> Index Scan using pg_class_oid_index on pg_class (cost=0.15..44.53 rows=292 width=206) (actual time=0.014..0.026 rows=10 loops=1)
Total runtime: 0.145 ms

There is no condition, but we can easily add it like this:

the

explain analyze select * from pg_class where oid > 1247 order by oid limit 10;
QUERY PLAN 
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.15..4.03 rows=10 width=206) (actual time=0.021..0.035 rows=10 loops=1)
-> Index Scan using pg_class_oid_index on pg_class (cost=0.15..37.84 rows=97 width=206) (actual time=0.017..0.031 rows=10 loops=1)
Index Cond: (oid > 1247::oid)
Total runtime: 0.132 ms
(4 rows)

In these cases, PG finds the initial starting point in the index (or first string, which is older than 1247, or just the smallest value in the index), and then just returns the following rows/values until the Limit condition is satisfied.

There is a version of Index Scan, called “Index Scan Backward", which does the same thing, but is used to scan in descending order according to:

the

explain analyze select * from pg_class where oid < 1247 order by oid desc limit 10;
QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.15..4.03 rows=10 width=206) (actual time=0.012..0.026 rows=10 loops=1)
-> Index Scan Backward using pg_class_oid_index on pg_class (cost=0.15..37.84 rows=97 width=206) (actual time=0.009..0.022 rows=10 loops=1)
Index Cond: (oid < 1247::oid)
Total runtime: 0.119 ms
(4 rows)

This is the same type of surgery: open the index and, for each row referenced in the index, to retrieve data from the table. This is not just "smaller to larger" and "larger to smaller".

Another similar operation — “Index Only Scan".

Let's create a simple table:

the

create table  test  (id serial primary key int4 i);
CREATE TABLE

insert into test (i) select random() * 1000000000 from generate_series(1,100000);
INSERT 0 100000

vacuum analyze test;
VACUUM

This gives us a table like this:

the

select * from test limit 10;
id | i 
----+-----------
1 | 546119592
2 | 253476978
3 | 235791031
4 | 654694043
5 | 187647296
6 | 709050245
7 | 210316749
8 | 348927354
9 | 120463097
10 | 5611946
(10 rows)

Here I have index on id

the

\d test
Table "public.test"
Column | Type | Modifiers 
--------+---------+---------------------------------------------------
id | integer | not null default nextval('test_id_seq'::regclass)
i | integer | 
Indexes:
"test_pkey" PRIMARY KEY, btree (id)

So, if certain conditions are met (later tell you more about that), I can get here is the plan:

the

explain analyze select id from test order by id asc limit 10;
QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..0.55 rows=10 width=4) (actual time=0.039..0.042 rows=10 loops=1)
-> Index Only Scan using test_pkey on test (cost=0.29..2604.29 rows=100000 width=4) (actual time=0.036..0.038 rows=10 loops=1)
Heap Fetches: 0
Total runtime: 0.092 ms
(4 rows)

Note the word “Only" in the “Index Only Scan".

This means that Postgres realized that I choose only data (columns) from the index. And maybe he doesn't need to check anything in the files table. So it will return the data directly from the index.

These scans were a big change in PostgreSQL 9.2, as they can work much faster than index scan, because they do not need to check anything in the table data.

The difficulty is that to work correctly, the Index should contain information about what these lines are on pages that were not changed "recently". That is, to use Index Only Scans your table must be well cleaned out using the vacuum. But, running autovacuum shouldn't be a problem.

Last scan of the table – the so-called Bitmap Index Scan. It looks like this:

the

explain analyze select * from test where i < 100000;
QUERY PLAN 
-------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=4.37..39.99 rows=10 width=8) (actual time=0.025..0.110 rows=13 loops=1)
Recheck Cond: (i < 100000)
-> Bitmap Index Scan on i1 (cost=0.00..4.37 rows=10 width=0) (actual time=0.013..0.013 rows=13 loops=1)

Total runtime: 0.154 ms
(5 rows)

(if you read carefully, you noticed that he uses the index, about which I had not previously spoken. It's easy to do: create index i1 on test (i);).

Bitmap Scans always consist of at least two nodes. First (lower level) there is a Bitmap Index Scan and then Bitmap Heap Scan.

How does it work?

For example, in your table of 100,000 pages (about 780MB). Bitmap Index Scan would create a bitmap, where each page your table will correspond to one bit. So, in this case we get a block of memory for 100,000 bits ~ 12.5 KB. All these bits will be set to 0. Then, the Bitmap Index Scan will set some bits to 1, depending on which page of the table can be a string to return.

This part generally does not affect the data in the table. After this is done – that is, when all pages where are the lines that need to return will be "flagged" – this bitmap will move to a higher level, to the node Bitmap Heap Scan, which reads them in a more consistent manner.

What is the meaning of such an operation? Regular Index Scans cause accidental operation of the I/o pages from the disk are loaded in random order. And it slowly. At least on the rotating disk.

Sequential scan is faster when you need to one page, but on the other hand, you don't always need all pages.

Bitmap Index Scans combines both cases: when you need a lot of rows from a table, but not all, and when the strings that you return, are not in the same block (which would have been justified if I had done “... where id < ..."). The scans are bitmaps, there's another interesting property – they can combine two operations or two indices, as in this example:

the

explain analyze select * from test where i < 5000000 or i > 950000000;
QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=107.36..630.60 rows=5323 width=8) (actual time=1.023..4.353 rows=5386 loops=1)
Recheck Cond: ((i < 5000000) OR (i > 950000000))
-> BitmapOr (cost=107.36 107.36..rows=5349 width=0) (actual time=0.922..0.922 rows=0 loops=1)
-> Bitmap Index Scan on i1 (cost=0.00..12.25 rows=527 width=0) (actual time=0.120..0.120 rows=491 loops=1)
Index Cond: (i < 5000000)
-> Bitmap Index Scan on i1 (cost=0.00..92.46 rows=4822 width=0) (actual time=0.799..0.799 rows=4895 loops=1)
Index Cond: (i > 950000000)
Total runtime: 4.765 ms
(8 rows)

Here we see two Bitmap Index scans (there may be more), which then merged (but the operation “JOIN" in SQL!) using BitmapOr.

As you may remember, the output of Bitmap Index Scan is a bitmap (a block of memory with ones and zeros). Having a few of these bitmaps, you can easily make a logical operations: Or, And or Not.

Here we see that two such bit maps were combined using the Or operator, and the resulting bitmap was passed to Bitmap Heap Scan, which loaded the appropriate rows from the table.

Although the two index scans use the same index, not always. For example, let's quickly add a few columns:

the

alter table test add column j int4 default random() * 1000000000;
ALTER TABLE
alter table test add column h int4 default random() * 1000000000;
ALTER TABLE
create index i2 on test (j);
CREATE INDEX
create index i3 on test (h);
CREATE INDEX

And here's what happens:

the

explain analyze select * from test where j < 50000000 and i < 50000000 and h > 950000000;
QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=280.76..323.61 rows=12 width=16) (actual time=2.295..2.352 rows=11 loops=1)
Recheck Cond: ((h > 950000000) AND (j < 50000000) AND (i < 50000000))
-> BitmapAnd (cost=280.76 280.76..rows=12 width=0) (actual time=2.278..2.278 rows=0 loops=1)
-> Bitmap Index Scan on i3 (cost=0.00..92.53 rows=4832 width=0) (actual time=0.546..0.546 rows=4938 loops=1)
Index Cond: (h > 950000000)
-> Bitmap Index Scan on i2 (cost=0.00..93.76 rows=4996 width=0) (actual time=0.783..0.783 rows=5021 loops=1)
Index Cond: (j < 50000000)
-> Bitmap Index Scan on i1 (cost=0.00..93.96 rows=5022 width=0) (actual time=0.798..0.798 rows=4998 loops=1)
Index Cond: (i < 50000000)
Total runtime: 2.428 ms
(10 rows)

Three scan, Bitmap Index Scan, each of which uses its own index, bitmaps are combined by using the bitwise operation “and", and the result is fed to a Bitmap Heap Scan.
If you are wondering why BitmapAnd shows “Actual rows = 0", the answer is simple: this site is not at all dealing with strings (only the bit map of disk pages). So he can't return the rows.

That's all for now. It's all possible table scan – the ways in which you will receive the data from the disk. Next time I'll talk about merging the different data sources and other types of plans.

Article based on information from habrahabr.ru

Поиск по этому блогу

computer express