Yes, from Oracle to Postgres

It just so happens that our organization has a long and close relationship with Oracle. I first met Oracle Database back in version 6 and have never experienced any discomfort with it since. But market realities intervened. Recently we began to notice that customers look much more favourably on our projects when they use a free DBMS. This is the story of porting one of those projects...

Choosing a free DBMS was a long and difficult matter for me, but in the end it boiled down to two well-known alternatives. PostgreSQL attracted with its rich (and still growing) functionality, while MySQL lured with performance and "zero" administration. Since Oracle had spoiled us, and we all knew and loved SQL, the numerous fashionable NoSQL options were eliminated in the semi-finals.

It is hard to say which DBMS I would have settled on in the end if I had not had the chance to get a "live" feel for both PostgreSQL and MySQL, compare them without haste, and make what I consider a well-founded decision. Besides functionality, I of course compared performance as well. I will spare you the details, but one of the crucial points for our "side project" was the ability to insert a large number of records into the database quickly and reliably (ACID, yes). A test was performed on exactly this subject:


The ordinate axis shows the number of fixed-length records stored in the database per second. The number in the legend indicates the transaction size. Note that MySQL was measured "as is", while PostgreSQL was measured with some additions that provide my usual partitioned tables and materialized views. Since we were talking about "reliable" storage, the MyISAM line is shown only for completeness, as the "theoretical maximum" of achievable performance on the hardware used.

Since the testing was done quite a while ago, and the available hardware had not even a whiff of an SSD, the absolute values on the chart should not be taken as dogma. Of course, data can be stored even faster, but I was interested in the relative performance of the DBMSs working under (almost) identical conditions. It was a surprise to me that PostgreSQL, even weighed down with partitioning triggers, runs almost as fast as MySQL with InnoDB, and on large transactions (1000 records or more) starts to catch up with MyISAM!

As you can guess, the chart shown above finally convinced me to go with PostgreSQL. Re-creating the tables, with column types remapped (number to numeric and integer, varchar2 to varchar and text, ...), was a trivial matter. XML and XSLT helped with the data migration.

About the benefits of XML
Strictly speaking, XML had already proved helpful earlier. One of the features of our product is storing descriptions of business entities in the database as tabular data (I don't claim this is very original). Comparing such "metadata" across two different schemas used to be a real headache, until I wrote a small package that converts it into an XML description. Sorting the tags inside the descriptions made it possible to compare them as plain text files. XSLT completed the picture by automatically generating SQL scripts from the description files.
That left only the SQL code written for Oracle. Most of the queries worked as-is; some worked after minor changes. First of all, I created a dual table:

create table dual (
x varchar(1) not null
);

insert into dual(x) values('x');

Not that it was impossible to do without it, but it was used in our queries so often that rewriting them all was simply impractical. To make PostgreSQL "happy", I had to add some strictness to many queries:

Oracle version
select b.id id, b.name name
from ( select list_value
from acme_obj_list_value
group by list_value ), acme_list_value b
where b.id = list_value


PostgreSQL version
select b.id id, b.name as name
from ( select list_value
from acme_obj_list_value
group by list_value ) a, acme_list_value b
where b.id = a.list_value


Every inline view must be named, and before a column alias it is highly desirable to use the keyword 'as'. For most columns it can be omitted, but with names like 'name' or 'value' omitting it leads to an error. The next step was replacing platform-specific code with constructs supported by both Oracle and PostgreSQL. This concerns nvl and decode, as well as the obsolete outer join syntax. The first two are easily replaced by the standard (and more flexible) coalesce and case; in the case of an outer join, the query has to be rewritten:

Oracle version
select ot.name, mv.str_value
from acme_object o, acme_meta_value mv, acme_obj_type ot
where o.id = :object_id
and ot.id = o.obj_type_id
and mv.owner_id(+) = ot.id
and mv.param_id(+) = 9520


PostgreSQL version
select ot.name, mv.str_value
from acme_object o
inner join acme_obj_type ot on (ot.id = o.obj_type_id)
left join acme_meta_value mv on (mv.owner_id = ot.id and mv.param_id = 9520)
where o.id = :object_id


ANSI joins have been supported in Oracle since version 9 and are, in my opinion, more convenient (though less concise) than the legacy (+) syntax. Do not try to mix different join styles in one SQL query: if an outer join is used, it is logical to write the inner joins with inner join as well, rather than listing tables in the from clause separated by commas.
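The nvl and decode replacements mentioned above are mechanical. A minimal sketch, assuming hypothetical columns name and state on acme_object:

```sql
-- Oracle-only (hypothetical columns):
--   select nvl(t.name, '-'), decode(t.state, 1, 'on', 0, 'off', '?') from acme_object t;
-- The standard equivalents work in both Oracle and PostgreSQL:
select coalesce(t.name, '-') as name,
       case t.state when 1 then 'on' when 0 then 'off' else '?' end as state
from acme_object t;
```

Unlike decode, case is also available in a searched form (case when ... then ... end) that accepts arbitrary conditions, which is what makes it more flexible.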

Most of the SQL migration work involved rewriting hierarchical queries: connect by, of course, is not supported in PostgreSQL. Meanwhile, there was a large stock of queries of the following form:

Example of using a hierarchical query
select t.id as value
from acme_object t, acme_obj_ref_value rv
where rv.object_id = t.id
and rv.attr_id = 220102
and rv.ref_value = :object_id
and t.obj_type_id in ( select ot.id 
from acme_obj_type ot 
connect by prior ot.id = ot.parent_id 
start with ot.id = 200335 )


A simple rewrite of such queries using a CTE would not bring them to a platform-independent form. Although Oracle does support recursive queries (since version 11.2), their syntax differs from the one used in PostgreSQL. In particular, PostgreSQL requires the keyword recursive, which Oracle "does not understand". Fortunately, in most cases the hierarchical part of the query could be "hidden" in a view:
Oracle version
create or replace view acme_arm(id) as
select ot.id 
from acme_obj_type ot 
connect by prior ot.id = ot.parent_id 
start with ot.id = 200335


PostgreSQL version
create or replace view acme_arm(id) as
with recursive t(id) as (
select id
from acme_obj_type
where id = 200335
union all
select a.id
from acme_obj_type a
inner join t on (t.id = a.parent_id)
)
select id from t


Rewriting merge statements was more "puzzling" (fortunately, they were not used as often as hierarchical queries). PostgreSQL does not support this statement, but it does support the from and returning clauses of the update statement; the latter returns a full result set (like a select), which allows the update to be used inside a with clause. I'll just leave this here:

Oracle version
merge into acme_obj_value d
using ( select object_id
from acme_state_tmp
) s
on (d.object_id = s.object_id)
when matched then
update set d.date_value = least(l_dt, d.date_value)
when not matched then
insert (d.id, d.object_id, d.date_value)
values (acme_param_sequence.nextval, s.object_id, l_dt)


PostgreSQL version
with s as (
select object_id
from acme_state_tmp
),
upd as (
update acme_obj_value
set date_value = least(l_dt, acme_obj_value.date_value)
from s
where acme_obj_value.object_id = s.object_id
returning acme_obj_value.object_id
)
insert into acme_obj_value(id, object_id, date_value)
select nextval('acme_param_sequence'), s.object_id, l_dt
from s
where s.object_id not in (select object_id from upd)


In this example you can also see that working with sequences in PostgreSQL differs from the approach taken in Oracle. Of course, one could define in Oracle a function similar to the one that retrieves sequence values in PostgreSQL, but I wanted to avoid rewriting the Oracle code (as well as the Java code). Besides, this approach could incur additional overhead.
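The difference boils down to syntax: in Oracle a sequence is referenced as a pseudo-column, while in PostgreSQL it is accessed through built-in functions (sequence name taken from the example above):

```sql
-- Oracle: sequence as a pseudo-column
--   select acme_param_sequence.nextval from dual;
-- PostgreSQL: built-in functions
select nextval('acme_param_sequence');  -- advances the sequence and returns the next value
select currval('acme_param_sequence');  -- last value obtained in this session
```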

Working with dates and times brought a lot of "joy". The thing is, the date type so widely used in Oracle accustoms you to a certain carelessness in handling its values. You can treat such a value as a number whose integer part is the number of days since some "magic" date and whose fractional part is the time, accurate to the second. After some getting used to (like most things in Oracle), this is quite convenient, but PostgreSQL is much stricter about data types.

date '2001-09-28' + interval '1 hour'

This adds a constant interval to a date, but what if you need to add a variable value? The expression found by searching is far from obvious:

date '2001-09-28' + (to_char(p_hours, '99') || ' hour')::interval

The space in the string before 'hour' is required! Note also that PostgreSQL's strictness extends to converting numeric values to strings (and vice versa, of course). The format mask is mandatory, even if it consists only of nines. The implicit conversions so familiar after Oracle simply do not work here.
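As a side note, PostgreSQL also allows multiplying an interval by a number, which avoids both the string concatenation and the mask altogether (a sketch, assuming p_hours is a numeric variable):

```sql
-- Equivalent to the expression above, without going through text:
select date '2001-09-28' + p_hours * interval '1 hour';
```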

The remaining queries underwent less radical changes. All string-handling code needed revision, simply because the corresponding functions look different in Oracle and PostgreSQL. The rownum column, where it still remained, had to be replaced with the window function row_number(). In cases where a condition on rownum was used to limit the number of returned rows, the queries were rewritten using the limit clause.
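These rownum rewrites can be sketched as follows (assuming a hypothetical limit of 10 rows on acme_object):

```sql
-- Oracle: restricting output with rownum
--   select * from acme_object where rownum <= 10;
-- PostgreSQL: the limit clause
select * from acme_object limit 10;

-- Where the row number itself was needed, the window function replaces rownum:
select row_number() over (order by t.id) as rn, t.id
from acme_object t;
```

Note that row_number() over (order by ...) is tied to an explicit ordering, whereas Oracle assigns rownum before order by is applied, so the two are not always interchangeable mechanically.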

Table functions deserve a separate mention. Both Oracle and PostgreSQL have them. The implementations differ, of course, but calling them from an SQL query looks similar. Unfortunately, as with recursive CTEs, everything is spoiled by a single keyword:
Oracle version
select * from table(acme_table_fuction(...))


PostgreSQL version
select * from acme_table_fuction(...)


It remained to deal with packages. PostgreSQL has no such concept, but on closer examination it turns out not to be terribly necessary. Indeed, what are packages for in Oracle? Leaving aside global variables and initialization code (which we don't use), the main advantage of packages is that they break dependency chains. When database objects are modified, only the bodies of dependent packages are invalidated, not their headers. The ability to make recursive calls within a package is one consequence of this fact.

PostgreSQL does not implement a dependency mechanism at all, and recursive calls to stored functions (PostgreSQL has no procedures) work just fine. To keep changes to the client code to a minimum, it is enough to create the mere appearance that we are still working with packages. PostgreSQL schemas suit this purpose perfectly. Of course, such a "package" cannot have "private" functions, but that is not a big problem. Here is what the code looks like:

Emulation packages in PostgreSQL
drop function acme_utils.get_str_res(numeric);
drop function acme_utils.c_str_res_ot();
drop function acme_utils.c_str_res_id_attr();

drop schema acme_utils;

create schema acme_utils;

create or replace function acme_utils.c_str_res_ot()
returns numeric
as $$
begin
return 20069;
end;
$$ language plpgsql IMMUTABLE;

create or replace function acme_utils.c_str_res_id_attr()
returns numeric
as $$
begin
return 20070;
end;
$$ language plpgsql IMMUTABLE;

create or replace function acme_utils.get_str_res(in p_res_id numeric)
returns text
as $$
declare
res text;
begin
select o.name
into strict res
from acme_object o
inner join acme_obj_value rid on (rid.object_id = o.id and rid.attr_id = acme_utils.c_str_res_id_attr())
where o.obj_type_id = acme_utils.c_str_res_ot()
and rid.num_value = p_res_id;
return res;
end;
$$ language plpgsql STABLE;


Having to drop all the items before "re-creating" a schema is a bit tiring, but one can live with it. You may notice the unfamiliar word 'strict' in the code: it provides the usual Oracle behaviour of raising an error when a fetch returns zero or more than one record. Another memorable moment is the peculiar way of obtaining the number of rows affected by the last query:

Oracle version
insert into acme_resource_tmp(id, device_id, profile_id, owner_id, type_id, res_num, name)
select acme_main_sequence.nextval, t.device_id, t.profile_id, r.id, p.type_id, t.num, t.value
from acme_state_tmp t
inner join acme_profile_detail p on (p.profile_id = t.profile_id and p.param_id = t.param_id)
inner join acme_resource r on (r.device_id = t.device_id and r.owner_id is null);
l_ic := sql%rowcount;


PostgreSQL version
insert into acme_resource_tmp(id, device_id, profile_id, owner_id, type_id, res_num, name)
select nextval('acme_main_sequence'), t.device_id, t.profile_id, r.id, p.type_id, t.num, t.value
from acme_state_tmp t
inner join acme_profile_detail p on (p.profile_id = t.profile_id and p.param_id = t.param_id)
inner join acme_resource r on (r.device_id = t.device_id and r.owner_id is null);
get diagnostics l_ic = row_count;


Of course, I had to rewrite the implementations of all the packages, but there were not that many of them. From my story so far you can see that all the rewritten SQL code falls into three categories:

  1. Queries that a slight rewrite could bring to a platform-independent form
  2. Queries whose platform-specific fragments could be hidden in views
  3. Platform-specific code proper

The first two categories presented no difficulty. The last one can cause problems when platform-specific constructs appear in queries generated by the client. The point is that we did not want to rewrite the Java code, and were even less willing to fork the source into two versions for the different DBMSs. Unfortunately, it proved impossible to completely eliminate platform-specific constructs from the client code. For the most part the obstacle was the keyword table used to query table functions; there were also references to sequences and hierarchical queries.
It was decided to store all platform-specific queries in the database itself, loading them into an application cache on first access. Initially each database was supposed to store its own version of each query, but it turned out to be more convenient to store the queries for all DBMSs at once. In Oracle a CLOB field was used to store the query text, in PostgreSQL — text. To unify access, a CLOB-to-varchar2 conversion was used, limiting the maximum query size to 4000 characters (one query still exceeded this size, but since it was intended for the PostgreSQL version, it did not need to be squeezed in). The to_char conversion itself had to be hidden behind a view:

Oracle version
create or replace view acme_query(name, sql) as
select a.name, to_char(c.clob_value)
from acme_object a
inner join acme_obj_list_value b on (b.object_id = a.id and b.attr_id = 10061)
inner join acme_obj_value c on (c.object_id = a.id and c.attr_id = 10062)
where a.obj_type_id = 10004
and b.list_value = 10061;


PostgreSQL version
create or replace view acme_query(name, sql) as
select a.name, c.clob_value
from acme_object a
inner join acme_obj_list_value b on (b.object_id = a.id and b.attr_id = 10061)
inner join acme_obj_value c on (c.object_id = a.id and c.attr_id = 10062)
where a.obj_type_id = 10004
and b.list_value = 10062;


In summary, I can say that the work was not as terrible as it seemed at the beginning. Most of it involved rewriting the hierarchical queries and Oracle packages, while most of the problems stemmed from PostgreSQL's stricter SQL syntax and the lack of the familiar implicit conversions. The scope of work could have been smaller if we had written more rigorous and platform-independent code in Oracle from the start.

Article based on information from habrahabr.ru
