Usually in this blog I’m writing about various fun topics around PostgreSQL – like perhaps new cool features, some tricky configuration parameters, performance of particular features or on some “life hacks” to ease the life of DBA-s or developers. This post will be quite different though – inspired by an article I stumbled upon recently, I thought I’d also try my hand at writing about some less flattering sides of PostgreSQL 🙂 And yes there are some for sure – as with any major piece of software.
In the referenced article though, quite some points are debatable from a technology viewpoint, but in general I enjoyed the idea of exploring also the dark side of the moon once in a while. Especially in our relational database domain, where problems tend to be quite complex – as there’s a bunch of legacy code out there (PostgreSQL is around 25 years old already!) and there are the official SQL standards to be adhered to and there are a gazillion different database / OS / hardware combinations out there in the wild. So yeah, one definitely shouldn’t get blinded and praise any product X univocally…especially given that the software world has gotten so complex and is moving so fast nowadays, that my guess is that nobody actually has the time to try out and learn all those competing technologies on a sufficiently good level. Well maybe only in academia, where one is not expected to directly start generating $$, the faster the better.
To love or hate, that is the question
About the title – although it was kind of tempting to also use the word “hate” (more clicks probably), I found it too harsh in the end. In general – PostgreSQL is a truly well written and maintained software project, with a healthy Open Source community around it and it is also my default database recommendation for most applications. Its price – performance ratio is just yet to be beaten in the RDBMS sphere I believe. And there are far more benefits than shortcomings I can think of – with the most important ones, for me being the lightweightedness and simplicity for DBA operations (building a replication server is a one-liner!). From the development side it is maybe the fact that Postgres is at the very top when it comes to implementing the ISO SQL standard, so that once PostgreSQL really becomes a bottleneck for you. You could migrate to something else with a reasonable amount of work. In this area of course a lot of commercial competitors have no real interest in such qualities also – would you want to make migrating away from your product easy, if it’s generating you tens of thousands of dollars per instance per year?
So in short, as concluded in the referenced post, you can’t go wrong with PostgreSQL for most applications. But anyways, here are some of my points that I think could be improved to make PostgreSQL even better. Most of them are rather minor complaints you’ll find, or just looking at the clouds and wishing for more.
#1 Rigid query planning
A bit related to the “no hints dogma” from the referenced blog post (not to forget about the “pg_hint_plan” 3rd party extension though!) one of my biggest gripes with Postgres, when it comes to complex queries – is the mostly static query planner. Mostly here means that except some extra logic on choosing plans for prepared statements and PL/pgSQL stored procedures (which basically also boil down to prepared statements) – the planner / execution doesn’t gather any feedback from the actual runs! It pretty much cold-heartedly looks at the query, the pre-calculated statistics of tables mentioned in there, and then selects a plan and sticks to whatever happens – even if the row estimate for the first executed node is a million times off and some millions of rows are now going to be passed higher via a nested loop join, an algorithm that’s rather meant for handling small to medium datasets. I do understand that employing some adaptive query planning strategies is a huge task and would need decent resources to tackle, but some simple heuristics like automatic re-analyze based on seen data or trying some other join types in the presented example should be probably doable as “low hanging fruits”. And it seems some people are already trying out such ideas already also – there’s for example the AQO project. But for real progress some wider cooperation would be probably needed to keep up with the big commercial products, who already throw around buzzwords like “AI-based query optimization”. Not sure if just marketing though or they really have something, not much stuff on that on the interwebs.
#2 Some levels of automatic tuning
A bit connected to the first point, is the idea of PostgreSQL automatically selecting some better values for the default parameters, based on statistics gathered over time or just looking at available OS resources for example. Currently Postgres actually tries hard on purpose not to know too much about the OS (to support almost all platforms out there)…but for some reasonable subset of platforms and configuration settings it could be done theoretically. Not talking about complex stuff even, but just the main things like optimizing the checkpoint settings (afters some safety time + warnings) if checkpoints are occurring too often, or increasing Autovacuum aggressiveness if there’s lots of data changes coming in, or looking at memory availability and increasing work_mem to get rid of some temp files or enabling wal_compression if CPU is sitting idle. Currently I think the only parameter that is automatically tuned / set is the “wal_buffers” parameter.
Well, in general the topic of tuning is not a real problem for seasoned DBA-s of course, actually the opposite – it’s our bread 🙂 But it would surely benefit most developers out there and this is again what the competition is already touting and in the long run this would benefit the project hugely as DBA-s are kind of a scarce resource nowadays and extensive tuning could be off-putting for a lot of developers.
#3 Lost statistics during pg_upgrade
I’m sure most people have not heard or seen this issue so it can’t be probably described as a major problem. It has to do with the fact that old table statistics are not carried over when migrating to a newer Postgres version using the pg_upgrade utility – which is the fastest way to migrate, by the way. The original post is talking about some hours of downtime…but with the “–link” flag it’s usually less than a minute! But as said, it’s not a biggie for most users that have ample downtime for upgrades or have only normal simple queries, selecting or updating a couple of rows over an index…but for advanced 24/7 shops with lots of data it can be quite annoying – as it’s hard to predict the kind of spikes you’re going to get during those first critical minutes after an upgrade, when statistics are still being rebuilt. So here would be actually really nice if the statistics would not be “just deleted“ with every release, but only when there are really some incompatibilities. For some versions I even looked at the pertinent structures (pg_statistic group of tables) and couldn’t see any differences, so I think it’s just a corner case and hasn’t gotten enough attention. As there are also alternatives – if you really want to avoid such “iffy” moments, one could use Logical Replication (v10 and above) instead of the good old “in place” upgrade method. Some details on such upgrades can for example be found here.
#4 Autovacuum and XID Wraparound
Also “featured” in the original post – historically the XID Wraparound and the seemingly randomly operating Autovacuum background process have definitely been the number one problem for those who are not so up to date on how Postgres MVCC row versioning works and haven’t tuned accordingly. At default settings, after some years of operation, given that transaction counts are also increasing steadily, it’s indeed very possible to have some downtime to take care of the gathered “debt”. But still I’d say it’s not so tragic for most people as depicted, as it affects only very busy databases. And if you’re running with version v9.6 of PostgreSQL or higher then there’s a good chance that you’ll never get any Wraparound related downtime as the algorithms have been made a lot smarter. And it has improved some more as of v12 where Autovacuum is much more aggressive by default and one can do quick vacuums by skipping index maintenance!
And to defend Postgres a bit – one can enable early warning signals for such situations (log_autovacuum_min_duration) and there are many internal statistics available…one just needs to use this information and take actions – the tools are also there. But there’s definitely room for more improvements. For example, one cool idea (from my colleague Ants) would be to allow explicit declaration of “tables of interest” for long running snapshots. Currently many Autovacuum / Wraparound problems are caused by the fact that long running transactions sadly block pretty much all autovacuum cleanup activities within the affected database, even if we’re selecting a table that is completely static…Such declarations could probably help on the replica side also with reducing recovery conflict errors, making Postgres more load-balancing friendly.
FYI – in this area there are also some very promising developments happening with the zHeap project, that aims to provide an alternative to the MVCC row storage model, reducing bloat and thereby surprises from the background processes like the Autovacuum.
#5 Disk footprint and bloat
This is another well acknowledged problem area for PostgreSQL – it’s “on disk” footprint is mostly a bit higher than that of its competitors. Normally this doesn’t show in real life as most of your “working data set” should be cached anyways, but it can gradually slow things down, especially if again running at default configs or not doing any manual maintenance time to time. But luckily – some things are already happening in this area also, like the already mentioned zHeap project and also a new hybrid store (row + column storage) called ZedStore is emerging that will compress the data size considerably and will make Postgres more suitable for ultra large data warehouses – so let’s hope it makes it into Postgres v14.
One could throw in here also the full compression topic…but in my opinion it’s not really that important for OLTP databases at least, as enterprise SSD disks are fast as hell nowadays and you have the options to dabble on the File System level. And for these rare use cases where 10TB+ data warehouses with near real-time analytical query expectations are needed, I’d maybe not recommend Postgres as default anyways. There are better specialized systems employing some hybrid and columnar storage approaches for such needs – just automatic block compression would not save us there. Not to say that automatic compression is a bad idea though – for example the latest versions of TimescaleDB (a 3rd party PostgreSQL extension for time-series data) employing such tricks, can reduce the “on disk” size dramatically and thereby speed up heavier queries. There’s also another extension called cstore_fdw extension for Data Warehouse use cases, so there are options out there already for Postgres.
#6 Out-of-the-box vulnerability against brute force password attacks
Although there’s some thought put into it as there’s a “contrib” (bundled with Postgres) extension named “auth_delay” for exactly this purpose – out of the box there’s no limit on password login attempts! Also, by default users can open up to $max_connections (minus some superuser reserved connections) sessions in parallel…this could mean trouble if you have some mean gremlins in your network. Note though that under “password login” I mean “md5” and “scram-256” auth methods, meaning if you forward your authentication to LDAP or AD you should be safe – but be sure to validate that. Or enable the “auth_delay” extension anyways, better safe than sorry as it’s really easy to crack weak passwords if you know the username – basically a one-liner! So it would be really nice to see some kind of soft limits kicking in automatically. NB! This doesn’t mean that you don’t get notified of the failed login attempts, but it could be too late if you’re not monitoring the log files actively! PS there’s also an extension dealing with the same “problem space” called pg_auth_mon and we (Cybertec) also have a patch for support customers to disable such attacked accounts automatically after X failed attempts.
Don’t forget though – by default access config Postgres only listens to connections from the localhost, so the potential threat only occurs if some IP range is made explicitly accessible!
#7 No query cache
This is one of those “wishing for more” or “nice to have” items on my list. The general idea being that it’s quite wasteful to execute exactly the same deterministic query (fixed parameters, immutable functions etc) and scanning some large tables fully every time, returning exactly the same results – if the data is not changing! Of course there are some non-obvious technical pitfalls probably lurking around here…but some databases have managed to implement something like that. MySQL now decided to retire it in v8.0 due to some unfortunate implementation details…so the stuff is not easy. But as performance is the main thing that developers (especially junior ones) are worried about, and given that Postgres materialized views are also not always useful, something like that would be an awesome addition.
That should do
Phew…that was tiring already. Seems it’s not really that easy to find 10 things to improve about PostgreSQL…as it’s a solid database engine already 🙂 So let’s stop at 7 this time, but if you have some additional suggestions, please leave a comment. Hope it broadened your (database) horizons and do remember to decide on the grand total of pluses and minuses.