Projection? No thanks


Under the cut is a small tip about applying a spatial index
based on the zcurve to index point data situated on the field. And bencmark for PostgreSQL and compare with the same (but very different)
index to R-tree.

The development of the theme (1, 2, 3, 4, 5, 6).
Actually, return back to the beginning the idea to index geographic coordinates, placing them on the surface of a sphere. Conventional indexing pairs of latitude/longitude leads to distortions away from the equator, working with projections is not universal. So the idea is to translate the geographic coordinates into three-dimensional space looks very elegant.

The idea itself is not new, similar works, such as extending PostgreSQL — PGSphere, which is used for indexing 3-dimensional R-tree. With him we will compare.

the

data Preparation.


PGSphere


the
    the
  • To start, you will have to download, build and install the extension (the author used the current version 1.1.5)

    the
    gmake USE_PGXS=1 PG_CONFIG=/usr/bin/pg_config
    sudo gmake USE_PGXS=1 PG_CONFIG=/usr/bin/pg_config install
    
  • the
  • Then download it (psql)

    the
    CREATE EXTENSION pg_sphere;
  • the
  • will Create a table for test data

    the
    CREATE TABLE spoint_data (sp spoint);
  • the
  • We need source of random data.
    The first program parameter is the radius, and the second number of results.
    The only trick is the data is uniformly distributed inside the sphere with the given radius, otherwise it will not turn a uniform distribution on the sphere
  • the
  • Random data passed through an awk script to transform the geographic coordinates

    the
    # --- gendata.awk ------
    BEGIN{
    pi=3.1415926535897932;
    degra=pi/180.0;
    rad=180.0/pi;
    Grad = 1000000.;
    }
    {
    x = $1; y = $2 z = $3;
    r3 = sqrt(x*x + y*y + z*z);
    x *= Grad / r3;
    y *= Grad / r3;
    z *= Grad / r3;
    
    r2 = sqrt(x*x + y*y);
    lat = atan2(z, r2) * rad;
    lon = 180. + atan2(y, x) * rad;
    
    printf ("(%14.10 fd %14.10 fd)\n", lon, lat);
    }
  • the Actual creation of the data, here the radius is not important, it is important that pgsphere and zcurve received the same data. Sort it is highly desirable to accelerate the indexing.

    the

    ./random 1000000 100000000 | gawk -f gendata.awk | sort > pgsphere.txt
    the
  • Fill the data in our table

    the
    COPY spoint_data (sp) FROM '/home/.../pgsphere.txt';
  • the
  • Index

    the
    CREATE INDEX sp_idx ON spoint_data USING gist (sp);

ZORDER


the
    the
  • To start, you will have to download, build and install extension

    the
    gmake USE_PGXS=1 PG_CONFIG=/usr/bin/pg_config
    sudo gmake USE_PGXS=1 PG_CONFIG=/usr/bin/pg_config install
    
  • the
  • will Create a table for test data

    the
    create table test_points_3d (x integer,y integer, z integer);
  • the
  • We need the same source of random data.
  • the
  • Random data passed through an awk script to place them inside a cube with sides of 2 000 000

    the
    #--- gendata2.awk ------
    BEGIN{
    pi=3.1415926535897932;
    degra=pi/180.0;
    rad=180.0/pi;
    Grad = 1000000.;
    }
    {
    x = $1; y = $2 z = $3;
    r3 = sqrt(x*x + y*y + z*z);
    x *= Grad / r3;
    y *= Grad / r3;
    z *= Grad / r3;
    
    ix = int(x+0.5+Grad);
    iy = int(y+0.5+Grad);
    iz = int(z+0.5+Grad);
    print ix"\t"iy"\t";
    }
  • the Actual creation of the data, here the radius is important. Sorting is not required.

    the

    ./random 1000000 100000000 | gawk -f gendata2.awk > zcurve.txt
    the
  • Fill the data in our table

    the
    COPY test_points_3d FROM '/home/.../zcurve.txt';
  • the
  • Index

    the
    create index zcurve_test_points_3d on test_points_3d(zcurve_num_from_xyz(x,y,z));

the

test Preparation


PGSphere


For testing you will need here is a awk script

the
#--- gentest.awk -------
BEGIN{
pi=3.1415926535897932;
degra=pi/180.0;
rad=180.0/pi;
Grad = 1000000.;
}
{
x = $1; y = $2 z = $3;
r3 = sqrt(x*x + y*y + z*z);
x *= Grad / r3;
y *= Grad / r3;
z *= Grad / r3;

r2 = sqrt(x*x + y*y);

lat = atan2(z, r2) * rad;
lon = 180. + atan2(y, x) * rad;

# EXPLAIN (ANALYZE,BUFFERS) 
printf ("select count(1) from spoint_data where sp @' < (%14.10 fd%14.10 fd).316d>'::laukitis;\n", lon, lat);
}

This script is quite symmetric to that by which we were preparing the data. You should pay attention to the number of .316 is the radius of the sphere with center at the computed random point in which we are looking for data

The preparation of a series of queries is done as follows:

the
./random 1000000 100 1023 | gawk -f gentest.awk >tests1.sql

Here 100 — test series 1023 — seed the randomizer.

ZCURVE


For testing also need awk script

the
#--- gentest2.awk -------
BEGIN{
pi=3.1415926535897932;
degra=pi/180.0;
rad=180.0/pi;
Grad = 1000000.;
}
{
x = $1; y = $2 z = $3;
r3 = sqrt(x*x + y*y + z*z);
x *= Grad / r3;
y *= Grad / r3;
z *= Grad / r3;

ix = int(x+0.5+Grad);
iy = int(y+0.5+Grad);
iz = int(z+0.5+Grad);
# EXPLAIN (ANALYZE,BUFFERS) 
lrad = int(0.5 + Grad * sin(.316 * degra));
print "select count(1) from zcurve_3d_lookup_tidonly('zcurve_test_points_3d', "ix-lrad","iy-lrad","iz-lrad","ix+lrad","iy+lrad","iz+lrad");";
}

This script is quite symmetric to that by which we were preparing the data. Again, pay attention to the number of .316, which is half the side of the cube with center at the computed random point in which we are looking for data.

The preparation of a series of queries is done as follows:

the
./random 1000000 100 1023 | gawk -f gentest2.awk >tests2.sql

Here 100 — test series 1023 — seed the randomizer, it's better if it coincides with the add-on from pgsphere.

the

Benchmark


As before, measurements were conducted on a modest virtual machine with two cores and 4 GB of RAM, so the times are not absolute values, but the numbers of pages read is still possible to trust.

The times shown in the second runs, on a heated server and virtual machine. The number of read buffers — into the fresh-raised server.
the the the the the the
Radius AVG NPoints Nreq Type Time(ms) Reads Hits
.01° 1.17
0.7631
(0.7615)
10 000 zcurve
rtree
.37
.46
1.4397
2.1165
9.5647
3.087
.0316° 11.6
7.6392
(7.6045)
10 000 zcurve
rtree
.39
.67
2.0466
3.0944
20.9707
2.7769
.1° 115.22
76.193
(76.15)
1 000 zcurve
rtree
.44
2.75 *
4.4184
6.073
82.8572
2.469
.316° 1145.3
758.37
(760.45)
1 000 zcurve
rtree
.59
18.3 *
15.2719
21.706
401.791
1.62
1.° 11310
7602
(7615)
100 zcurve
rtree
7.2
94.5 *
74.9544
132.15
1651.45
1.12
where
Radius — the size of the search region in degrees
Npoints — the average number of points in the results, in parentheses are the theoretically expected number of
(in the field 41252.96 square degrees, 100 000 000 points ~2424 dots per square degree)

Nreq — the number of queries in the series
Type
‘zcurve’ — it is
’rtree’- PGSphere
Time(ms) — average time of query execution
Reads — the average number of readings by request
Hits — the number of accesses to the buffers

* at some point, the performance of R-tree begins abruptly
sink is connected with the fact, this tree read significantly more
pages and its working set ceases to fit in the cache (apparently).

Note that the zcurve finds more data, which is logical because he is looking inside the cube, not sphere PGSphere. Therefore, the required post-filtering in the spirit of HAVERSINE. But here we compare only the performance of the indexes.

Try to evaluate the difference. In General, the task is to find the area of the piece of the sphere that fall inside this cube is non-trivial. Try to make an assessment.

the
    the
  • Assume we have a cube in average cuts out from the sphere of the same area as the sphere of equal volume
  • the
  • the Volume of the unit sphere is 1.33*3.14=4.19
  • the
  • the Volume of a cube of side 2 = 8.
  • the
  • Then the root of the third degree of 8/4.19 = 1.24 is the ratio of the radii of the imaginary sphere to the present
  • the
  • area ratio of the imaginary sphere to the present 1.24*1.24=1.54
  • the
  • are from the experimental data 1.17/0.7631= 1.5332
    Bingo!
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Wikia Search — first impressions

Emulator data from GNSS receiver NMEA

mSearch: search + filter for MODX Revolution