The establishment of a directory the address information with blackjack and API

Part 1. Tragic. “Why me?!”


We are faced with the necessity of entering correct information on the location (registration, registration) users, and that this problem is not solved quite so easily as we would like. First we tried KLADR, in its free incarnation. Not that we downright absolutely everything is not much, but there was at least one very annoying thing — some addresses were missing in the directory. For example, 10 is, and 10к1 — sorry, did not deliver. Generally KLADR was attractive because it has a simple API, and plugins (jQuery in particular), which can be easily integrated into the application, but pushed content. We thought that if no such resource contains a full and most current address information with the API and plugins, the only way to create such a resource themselves.

the

Part 2. Iskatelia. “Where's the beef?!”


And we heard about FIAS. About how he is infinitely complete and perfect. And this way! On the FIAS website has a database, and a young and growing information system! However, the database size is over 4 Gb, but Oh well, it's all Russia! The database is updated regularly, so there is where carousing. The case for small — to deploy a base, screwed API and plugins. We have to mention that there are several relevant and important articles. For example, the loop here these articles, which initially helped a lot.

the

Part 3. How it works


The problems started with the fact that the database FIAS is delivered in format .dbf data from which it was necessary to migrate to PostgreSQL, which it was decided to use as a database.

It should be noted that in addition to the format .dbf, the database is represented in xml format. As you know, the person you want all at once. Turned on and running. But to deal with the fact that FIAS gives is quite problematic.

So, at fias.nalog.ru in the update section includes the following:

the
    the
  • Information about the updates: according to official figures — out at least once a week. Specifics the following. Sometimes they come out broken updates. Their size is about 1.2 Mb and when downloading is given for an archive, so take the time to tune in to only the latest update is causing issues.

  • the
  • Updates in the form of fias_delta_. — in formats .dbf .xml.
  • the
  • Itself database fias_. — in formats .dbf .xml.
  • the
  • Database format KLADR. The name of the file “Base.*” arj or 7z is missing a field (tax data, etc.), the search is performed by Klatovska id. Initially there is a temptation to use this format, as it is easier to understand and then implement. But a more careful study of the portal, you will find that this version will be supported only until the end of 2017, and then will be deprecated.

note: the .dbf files, you must install third-party extension to php_dbase.dll.

So what's all the same to select the version to download .xml or .dbf? The differences are as follows:

the
    the
  • xml — each file in the archive is a separate table. What is the problem. For example, the size of the file “HOUSEXX” where are home is more than 20 Gb in uncompressed form. How to parse? On the one hand, if you have more than 30 Gb RAM, it will not be a problem. Also, if you are a connoisseur of competent libraries such as the SAX Parser for Java, which do not load the entire file into memory and parse the parts, then maybe something happens.

  • the
  • dbf — file are separated not only as tables, but are divided by region. Will be a more viable option, if you don't need to address vast country, and only one region (as in our case).

So, unzipped files. What information do we need? See the following set of files:

the
    the
  • ADDROBХХ.DBF, where XX is the number of region — includes information about the region, Autonomous districts, cities and other settlements and streets.
  • the
  • NORDOCХХ.DBF, where XX is the number of region — contains information about the causes of changes in different records.
  • the
  • ROOMХХ.DBF, where XX is the number of region — contains information about the premises.
  • the
  • STEADХХ.DBF, where XX is the number of region — contains information about the land.
  • the
  • SOCRBASE.DBF — contains information about the reductions.
  • the
  • STRSTAT.DBF — contains information about the type structure.

In addition to these tables there are a number of other service tables, which contain information about reductions in other tables.

In most cases it is sufficient to form an address up to the house. Although, if anyone should, it is possible to go further.

Thus, we will create 2 tables in a postgresql database.

Table addresses:

the
CREATE TABLE addrs
(
"ACTSTATUS" integer,
"AOGUID" character varying(36) COLLATE pg_catalog."default",
"AOID" character varying(36) COLLATE pg_catalog."default",
"AOLEVEL" integer,
"AREACODE" integer,
"AUTOCODE" integer,
"CENTSTATUS" integer,
"CITYCODE" integer,
"CODE" character varying(20) COLLATE pg_catalog."default",
"CURRSTATUS" integer,
"ENDDATE" timestamp,
"FORMALNAME" character varying(120) COLLATE pg_catalog."default",
"IFNSFL" integer,
"IFNSUL" integer,
"NEXTID" character varying(36) COLLATE pg_catalog."default",
"OFFNAME" character varying(120) COLLATE pg_catalog."default",
"OKATO" VARCHAR(11),
"OKTMO" VARCHAR(11),
"OPERSTATUS" integer,
"PARENTGUID" character varying(36) COLLATE pg_catalog."default",
"PLACECODE" integer,
"PLAINCODE" character varying(20) COLLATE pg_catalog."default",
"POSTALCODE" integer,
"PREVID" character varying(36) COLLATE pg_catalog."default",
"REGIONCODE" integer,
"SHORTNAME" character varying(15) COLLATE pg_catalog."default",
"STARTDATE" timestamp,
"STREETCODE" integer,
"TERRIFNSFL" integer,
"TERRIFNSUL" integer,
"UPDATEDATE" timestamp,
"CTARCODE" integer,
"EXTRCODE" integer,
"SEXTCODE" integer,
"LIVESTATUS" integer,
"NORMDOC" character varying(36) COLLATE pg_catalog."default",
"PLANCODE" integer,
"CADNUM" integer,
"DIVTYPE" integer
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;

ALTER TABLE address
OWNER to postgres;

Table numbers:

the
CREATE TABLE hous
(
"AOGUID" character varying(36) COLLATE pg_catalog."default",
"BUILDNUM" character varying(10) COLLATE pg_catalog."default",
"ENDDATE" timestamp,
"ESTSTATUS" integer,
"HOUSEGUID" character varying(36) COLLATE pg_catalog."default",
"HOUSEID" character varying(36) COLLATE pg_catalog."default",
"HOUSENUM" character varying(15) COLLATE pg_catalog."default",
"STATSTATUS" integer,
"IFNSFL" integer,
"IFNSUL" integer,
"OKATO" VARCHAR(11),
"OKTMO" VARCHAR(11),
"POSTALCODE" integer,
"STARTDATE" timestamp,
"STRUCNUM" VARCHAR(15),
"STRSTATUS" integer,
"TERRIFNSFL" integer,
"TERRIFNSUL" integer,
"UPDATEDATE" timestamp,
"NORMDOC" character varying(36) COLLATE pg_catalog."default",
"COUNTER" integer,
"CADNUM" VARCHAR(50),
"DIVTYPE" integer
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;

Data import is carried out in a simple way. Open the file in Excel and save them as csv. Additionally, it is recommended to change the encoding, as opposed to xml files, which are represented in utf-8, dbf files encoded in win-866. Open the file in the editor (for this purpose you can use notepad++) and convert to utf-8.

Import table with the addresses:

the
COPY addrs FROM 'PathToTheFile\ADDROB01.csv' DELIMITER ';' CSV;

Import the table with the houses:

the
COPY addrs FROM 'PathToTheFile\HOUSE30.csv' DELIMITER ';' CSV;

what made the table ADDROBXX?

Despite the abundance of fields that need only some of them.

the
    the
  • ACTSTATUS — the field contains values 0 and 1. A careful viewing will show that the table a lot of streets with the same name. The fact that the tax base contains a history of objects, i.e. the changes that have occurred with any of the address elements. For example, of Lenin can be 10 pieces. 9 of them will be irrelevant. They can be renamed or changed in another way. ACTSTATUS — 0 — out-of-date addresses, and 1 is the only relevant.
  • the
  • AOGUID — ID record. Different historical records will be the same AOGUID. Vary they will be AOID on the field.
  • the
  • AOLEVEL — object level. 1 — subject of the Federation, 4 — city, 7 — a street, etc.
  • the
  • CENTSTATUS status of the center. For example, cities in the regional center.
  • the
  • FORMALNAME — the name of the object.
  • the
  • PARENTGUID — ID of the parent element. When you search for a city, which owns the street. PARENTGUID of the street will coincide with AOGUID street.
  • the
  • POSTALCODE — postal code.
  • the
  • REGIONCODE code of the Federation.

note: fias-nalog.ru you can find a full description of all fields.

Important in table HOUSEXX:

the
    the
  • AOGUID — ID ADDROBXX in the street in which the house is located.
  • the
  • HOUSEGUID — ID.
  • the Difficulty lies in the name of the house. It consists of 4 fields: the

  • HOUSENUM — room house.
  • the
  • STRUCNUM — room structure.
  • the
  • STRSTATUS is the symptom of structure (from 0 to 4, where 0 — no, 1 — structure, 2 — construction, 3 — letter).

Fields can contain similar or identical information and get out of the situation we have experimentally. In the class QuerryController (link to the repository below) has a method: chooseBuilding in which we tried to solve this problem. May be someone can do better. The DB queries are the same.

Further, after the base has been imported into Postgres — we've started creating an API and plugins for our system.

For the API, so as not izgalyatsya, used Laravel. Request schema was fairly simple. The object hierarchy looks like the following:

the
    the
  • Federation
  • the
  • Region of the Federation
  • the
  • Locality
  • the
  • Street
  • the
  • Home

By the way, testing the system, we are faced with the fact that not all settlements have streets and not all streets have houses that we are surprised and puzzled. This circumstance does not give rest because if you give the user the ability to enter information, then ponapishut such that involuntarily the question arises — “These people do at school learned?!”.
It was therefore decided not to give the user any opportunity for the Amateur and for such “missing” addresses to provide the opportunity to apply for inclusion of the missing address in the help system.

Request schema looks:

the
For the last item, the list of houses with the letters, enclosures, buildings, and other things.
After all the ordeal made a plugin for vue.js to work with the directory and the alternative for jQuery.

The region and district can be removed as unnecessary, as they are tightened together with the cities.
When you enter vsplyvayut autocomplete-tips, as in KLADR. However the difference is that KLADR-plug-in intended for completion, and here is only a valid address selected from the clues.

In the folder ASPUDcomponent is VueJs component to your database.
The source code is available in our repositories.

the

Part 4. How it to update?


Updates the following situation: you first need SOAP to get version updates. To see how this is done in the class UpdateController (method: filesVersions()).

Note: the version that is listed last in the resulting list is not necessarily the same as the one you can download on the main page. But do not rush to download the latest version, as it may be “bat”. Happened such precedents. Next, download the archive with latest version and unpacked. For work it is necessary to use extensions for php (php_rar.dll).

Well, and then selects a desired file of the region (or optionally for all files) to update the database.
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Wikia Search — first impressions

Emulator data from GNSS receiver NMEA

mSearch: search + filter for MODX Revolution