Memo Evangelist PostgreSQL: MySQL criticize intelligently

Hi, Habr! This publication is an attempt to dispel some popular myths and legends about MySQL. I was not wrong with the hub as the reason for writing was the publication varanio PostgreSQL that are not in MySQL, and Vice versa from here. The publication of the criticism in MySQL, though not perfect, but quite correct, but the comments to it leads to sad thoughts.
Generally speaking, I was going to write a post about the features that are not implemented or implemented in PostgreSQL worse. But in order not to interfere with a lot of topics in one publication, and considering the rather difficult work than that I know very well (MySQL) that I know very bad (PostgreSQL), such publication I decided to procrastinate and to begin to respond immediately to many of the comments from a publication varanio.
Why do I need it? Well, first, again, the Internet someone is not right. That is itself a small problem, but unfortunately many of these cool stories I have to meet not only online, but also to hear the various Russian conferences, including elite conferences elite speakers. Especially strange to hear not quite correct statements about MySQL from representatives of the company Postgres Professional, which I take with great sympathy and wish her every success and prosperity. Therefore, in the cultural exchange and to improve the technical level of discussion about the advantages and disadvantages of PostgreSQL and MySQL, I would for starters not only wanted to dismantle common misconceptions and provide correct wording, and describe the really serious (in my opinion) problems in MySQL at the moment.
So, school young joliverie. Lesson one: "MySQL Criticize properly". Zing!
the
To not to criticize MySQL?
Let's start with the most basic myths: kalotaritissa.
the
"MySQL is a legacy"
I will just give you a brief and is not a complete list of well-known companies, where MySQL plays a key role in the infrastructure and business: Github, Wikipedia, Google, Facebook, Twitter, LinkedIn, Alibaba, Taobao, Booking.com, AirBnB, Dropbox, Pinterest, GroupOn, Yelp.
I know that there are big and famous projects to PostgreSQL. But, first, few can compare in scale with these projects. And secondly, this list is just useful to remember, not to lose touch with reality, when someone gives something profound in the style of "MySQL is not needed."
the
"MySQL does not Have a community!"
Criteria for the assessment of the extent and activity of the community, it is possible to argue long, but the criteria in http://db-engines.com/en/ranking seem quite reasonable.
On the other hand in Russia, the popularity of PostgreSQL and the community do slightly above "average" trends. Don't quite understand what caused it, but I don't see anything wrong with that.
the
the "MySQL has a community, but it is fragmented!"
Never could figure out what it is. It sounds as if the users MySQL Oracle need to be completely retrained to work with MariaDB. Or DBA working with MariaDB knows absolutely nothing about Percona Server. In reality 99% of skills, books, articles, utilities, obtained somewhere councils, etc. can be safely used on any version of MySQL.
the
"many MySQL forks and they disorder"
For the first time this strange assertion, I heard a few years ago on one of Russian conferences. Since I often hear this from people who are obviously poorly represent what they are saying. Including in the above post varanio. I've never been able to obtain clear answers to two simple questions:
the
- How many "forks" of MySQL you can count?
- writing data to files the
- entry in transaction journal the
- entry in binary log the
- Yes, there is some doublewrite buffer, which judging by the title two more times writes!
what is the "confusion"? the
Failed to get clear answers this time, but I have not really hoped for. In the development of a free and open source "forks" are very frequent. It is usually considered that it is good for it is free and open to the public. In addition, the "forks" and "branches" MySQL even less than in many other well-known projects. Less than really PostgreSQL, for that matter.
By the way, this argument is strikingly similar to FUD from Microsoft. So Microsoft is touting its superiority over Linux in the beginning of this century:

the
"MySQL is owned by Oracle (and PostgreSQL to anyone)"
This is certainly true, but some important conclusions can be made from this, I don't really understand, and the flight of thought of the critics usually stops.
Theoretically, Oracle is entitled to roll the project / close the source to make it pay. Is it really — everyone can decide for himself. For me it is absolutely unreal — it was feared at the time of purchase of Sun Microsystems by Oracle, but it's been 5 years (Carl!) and during this time MySQL has evolved faster than ever before purchase. In addition, Oracle people aren't dumb and there are well aware that shutting down MySQL would hit the Oracle.
the
is "Subjective in PostgreSQL less bugs"
That could be it, but how would that compare objectively in the absence of a bug tracker in PostgreSQL?
This a non-technical myths end and move on to more meaningful.
the
"PostgreSQL is much faster than MySQL"
I have been optimizing code and MySQL, as a result, often spend load tests, though mostly between different version and variants of MySQL. Here's what I can say about such claims: I am absolutely sure that there are load/requests/configurations, where PostgreSQL will work at times faster than MySQL, as I am sure and selling — there are loads/requests/configurations where MySQL will be much faster PostgreSQL. In the case of MySQL there is another important criterion — used engine (storage engine), which also may change the situation not only in times, but by orders of magnitude.
Every time someone complains about performance (no matter MySQL or PostgreSQL), it is asked to show the schema/queries/explain/configuration. It is not surprising — all the talk about the performance can be conducted only in such a context. General allegations are not listed.
the
is "Replication in PostgreSQL is done in the mind. And in MySQL there is no"
In MySQL implements logical replication (regardless of statement-based or row-based formats). In PostgreSQL replication is physical (not "binary", namely physical).
Each approach has its pros and cons. Don't even need to be a specialist in one or another DBMS to understand it. Users of MySQL have extensive experience in bypass or mitigate the negative effects of logical replication, but most likely ever will and physical, because do we need one and the other. I suspect that PostgreSQL is a similar process, but in the opposite direction.
This topic is very extensive and raises so many questions that I'm thinking about a separate post.
the
is "non-strict MySQL works with data"
Options: "MySQL quietly divides by zero!"
Technically correct, but boring version of this statement reads: "In MySQL versions < 5.7 must not forget to include the correct SQL modes"
5.7 settings "rigor" is quite a reasonable default. The reason why this took so long to do it, is simple: as with any popular project, the main competitor of MySQL is its own the old version. I have to pull backward compatibility with older poorly written apps. 5.7 with that decided to kill that as well.
the
"In MySQL is non-transactional engines MyISAM type"
Yes, but it's not very clear what the criticism is. Talking about MyISAM in 2015 year only to support legacy applications. Honestly, the last time a live user MyISAM I met five years ago.
Though there is a non-transactional engine which requires no transactioncost. For example CSV. To make it easier to understand the PostgreSQL user, the CSV engine is about as file_fdw. It is rather to exchange data with other applications, not for actual data storage.
the
"In MySQL, transactions are screwed at"
Under this everyone understands something different. Someone thinks so, because you cannot put DDL in a transaction. Which of course is true, but the problem is growing from the "transactional side" and the DDL of the curves (more on this below in the section "How to criticize MySQL").
Boring, technically correct version: "MySQL non-transactional DDL".
Someone says that the error is inside a transaction does not lead to automatic rollback. Yes, the errors are different (for example, lock wait timeout) and the application has the ability to repeat the last statement, not roll back the entire transaction. As far as I know, Oracle and SQL Server by default behave the same — it is not a violation of any standards. In PostgreSQL, to be able to repeat the last statement with the error would have to wrap every statement within the transaction to SAVEPOINT. What, for example, and has resulted in ON_ERROR_ROLLBACK for psql. Ie, as usual, the approaches are different, with their pros and cons.
the
"In MySQL is very expensive DDL"
This refers to re-creating the table in certain operations, for example deleting a constraint. Already in 5.6 you can do the ALTER without recreating the table almost all transactions . In particular, the constraint can be removed without reconstruction. 5.7 the list in-place operations even wider.
I don't know how it is in PostgreSQL (a quick search shows that some problems still exist). But for MySQL there is a utility pt-online-schema-changethat allows you to bypass many restrictions on the ALTER TABLE in MySQL and about which often forget or don't know.
the
"In MySQL, some incorrect MVCC"
The mechanism is MVCC in InnoDB is implemented well. From the user's point of view, it is almost identical to implementing MVCC in Oracle and is very similar to the implementation in PostgreSQL. But the differences in the behavior of many surprise that gives rise to similar myths. The fact that the standard is very vaguely defines many delicate moments in different levels of isolation. As a result, each DBMS interpreterpath these "gaps" in their own way. A good comparison MVCC in Oracle, PostgreSQL and MySQL/InnoDB, you can read here. But even there not include all of the nuances.
Over the last 10 years of optimization and scalability in InnoDB MVCC had spent a huge amount of effort. I also applied to it.
updated 02.04.2017: a more formal attempt to understand the differences between the implementation of MVCC in different DBMS, including PostgreSQL and MySQL.
the
"because Of the connected engines MySQL writes data to the disk 2, 3, 4, and in time of war and on 5 times"
Incredibly naive statement. The logic is simple:
the
-
the
Here we need to talk about the characteristic of "write amplification". I couldn't find a good term in the Russian language, but in fact it is the ratio of the total amount of data written to disk to the total amount of data transferred by the client. View what it includes.
First, the entry in the data files only happens when you reset the updated pages to the disk. How long the page may be updated in memory before flushed to disk (and therefore how much the change write amplification) depends on many parameters: size of the shared buffer (the buffer pool in InnoDB), the size of the transaction log, the algorithm controlling the reset pages, server settings and of course the type of load. It's all applicable to both MySQL and PostgreSQL.
Secondly the transaction log contains not only updates records in a table, but all physical changes in the data files. For example, InnoDB is the management of the index trees (separating pages when you are filling the Union when you remove, rebuild, tree, etc.), deleting old versions of records (purge operation), change operation on the buffer and other internal accounting. This is also applicable for MySQL and PostgreSQL.
Thirdly, the binary log contains only the logical changes to the data: he did not "know" about the format of the data files and their entire internal accounting Department. Write amplification for the binary log depends on many parameters (statement/row-based, binlog_row_image and other settings). In addition a binary log can be disabled — you need it often, but not always.
Well, finally doublewrite buffer is activated also only when flushing the page to disk, and not on every INSERT/DELETE/UPDATE. Absolutely the same mechanism, only in the profile in PostgreSQL is called "full page write". Here and there you can disable it under certain conditions.
You should also consider redundancy of data formats (i.e., overhead for service information), page size (in InnoDB you can specify it when creating the base), compression and whatnot.
I hope from all this it is clear that to calculate the write amplification characteristic is impossible — too many parameters. It can be measured for a specific load, a specific configuration and (in the case of MySQL) specific engine. For example, for optimized recording engine type TokuDB or MyRocks this feature will be much lower than in InnoDB, because they were created.
All that can be said is that binary log (when enabled) leads to an additional overhead for recording. How high costs and whether the total write amplification more than the same features in PostgreSQL that we cannot say, even approximately without measuring on specific workloads and configurations.
Such is the boring truth.
the
the "MySQL too many magazines"
To paint all types of magazines, I will not assign them all more or less clear. The questions mainly causes the binary log. Binary log is a "fee" for the opportunity to have a plug-in engines. "Connected" not in the sense that they can connect dynamically as plug-ins (before such possibility was not), but in the sense that they generally can be several, and the server works with different engines through the API.
Because the physical representation of the data on the drive may vary depending on used engines as a single serialized representation of all of the changes necessary for replication, such a representation is written to abstract from the physical data, the logical form. As I wrote, the logical log does not necessarily duplicate records in the physical log when you update data.
With logical log and logical replication, you can do all sorts of interesting things. In particular, it is widely used in the technology of Galera. I'm going to do a brief overview of the entire economy in the next post.
the
"bad console MySQL client"
As I understand it, this refers to the lack of context-sensitive auto-completion. I agree, it can be convenient. But in order to feel this, you need to spend much time in the console client. As a developer, I spend more time in the editor. I think most of the work is also running DBA scripts, not manually in the console client.
But for those who really need it, recently there was a project mycli with an intelligent auto-completion and even syntax highlighting. And for lovers of beautiful GUI have MySQL Workbench, where is my all. Never used neither one nor the other.
updated 05.10.2017: In beta MySQL Shell, a new command line client from Oracle, there is support for completion for SQL.
In any case, it would be better to say "in the console MySQL client no auto-completion context", because all the criticism it looks like this and comes down.
the
To could to criticise MySQL but already outdated?
We would like to highlight the statements that in General are true, but lose their relevance in light of the impending release of MySQL 5.7.
the
"MySQL does not support JSON"
Limited support was previously in JSON UDF from the svetasmirnova. In 5.7 there is native support. How it is comparable to PostgreSQL, I can not say, but in any case the statement in this form is obsolete.
the
"In MySQL there is no functional indexes"
5.7 functional indices are implemented as indexed virtual/generated columns. In MariaDB they have for a long time, but indexing is only possible for a materialized virtual columns.
the
To need to criticize MySQL?
Go to probably the most interesting part. Real and serious the problem really is, but that's about it, just in halevanah say very rarely. In fact, judging by the blogs and the reports on the conference, MySQL users are more interested in not the absence of, say, window functions, and issues of horizontal scaling, sharding, clustering, high availability, cloud platforms and related automation, monitoring, information protection, and more. As a consequence, in these areas sosredotochena the efforts of developers.
But since we are talking about criticism from the PostgreSQL community, but of all these issues, it generally does not affect, here I will list just, in my opinion, criticisms of what he heard or read before, but worded technically correct.
the
Lack of transaction data dictionary
In MySQL, a "data dictionary" is a set of non-transactional files (.frm .par, etc.) It is a heavy legacy from the most ancient times and it creates a lot of problems.
It is not only the lack of transactional DDL in which the dictionary data is a prerequisite. This problem with expensive queries against the INFORMATION_SCHEMA, the problems with extensibility of the metadata format, the problem with physical backups, which are forced to block the server that is basically only to ensure the consistency of transactional data with non-transactional vocabulary, and other.
Work on this everybody Nakatabut results in 5.7 we already precisely will not see.
the
For Oracle, nothing but InnoDB does not exist
Unfortunately, Oracle is not particularly concerned about the fate of the third-party engines for obvious reasons. I don't think the concept of "plug-in" engines will be phased out in the near future, but also to take care of the support of the other engines they are clearly not going to.
As a result, many functionality is implemented only in the InnoDB. Examples: foreign keys, full text search, spatial indexes, virtual columns. It is unlikely that all these functions we will see in a short time in TokuDB or MyRocks. And if they are implemented, they will not be compatible in functionality with the fact that in InnoDB.
the
Imperfect, the query optimizer
The optimizer has always been one of the weaknesses of MySQL. Although I suspect that if you start with detailed questions, most critics will begin to talk about the issues that are fixed 5-10 years ago. Useful improvements in each new version, but the progress is still slower than many would like. There are historical reasons, although it is not so much the concept of plug-in engines, as in very imperfect architecture of the optimizer. It requires a serious review and refactoring, and work with Oracle in this direction, but these things can be done.
the
Weak support "advanced" SQL
This problem is partly related to the previous one. But yeah, no window functions, CTE and many more things. How important it is for users of MySQL is an open question. As I wrote, in blogs and conference papers particularly not to say that some of the functionality of SQL well is not enough. Judging by the changes in release of MySQL, Oracle is also not the priority.

updated 05.10.2017: Support for CTE and window functions appeared in MySQL 8.0. Interestingly, the results of testing are known PostgreSQL Evangelist Marcus Wynand, MySQL currently has the best support for CTE among all popular databases, including PostgreSQL.
the
Weak support GIS
Too true criticism, but in my opinion in Oracle, this is important. 5.7 GIS has been rewritten from scratch, added InnoDB spatial indexes, use the Boost.Geometry is samopisnogo code and started work on the standards. I suspect that the next versions will be much interesting.
updated 05.10.2017:Support GIS has been greatly enhanced in the MySQL 8.0. Of the key points — support for non-Cartesian SRS and extended set of functions for processing of spatial data. Details can be found in this presentation. It would also be interesting to compare this with PostGIS. According to my communication with expertise in the subject people, in some moments PostGIS is already inferior to the capabilities of the built-in support for GIS in MySQL 8.0.
the
Conclusion
I'm sure I missed something in each section, but I'm sure in the comments advanced. A single publication to cover all is impossible, but if someone this publication will make a thoughtful skeptical criticism of MySQL on habré and conferences, I pushed buttons in vain. Write separately about the logical/physical replication and MySQL features that are not implemented in PostgreSQL are implemented or worse, I would be interested, and I probably would venture, unless, of course, I'm not zaminusuyut completely.
Комментарии
Отправить комментарий