by Kaarel Moppel
So, you’re building the next unicorn startup and are thinking feverishly about a future-proof PostgreSQL architecture to house your bytes? My advice here, having seen dozens of hopelessly over-engineered / oversized solutions as a database consultant over the last 5 years, is short and blunt: Don’t overthink, and keep it simple on the database side! Instead of getting fancy with the database, focus on your application. Turn your microscope to the database only when the need actually arises, m'kay! When that day comes, first of all, try all the common vertical scale-up approaches and tricks. Avoid derivative Postgres products, distributed approaches, and home-brewed sharding for as long as you possibly can - say, until you have less than a year of breathing room left.
Wow, what kind of advice is that for 2021? I’m talking about a simple, single-node approach in the age of Big Data and hyper-scalability...I surely must be a Luddite or just still dizzy from too much New Year’s Eve champagne. Well, perhaps so, but let’s start from a bit further back...
Over the holidays, I finally had a bit of time to catch up on my tech reading / watching TODO-list (still dozens of items left though, arghh)...and one pretty good talk was on the past and current state of distributed MySQL architectures by Peter Zaitsev of Percona. Oh, MySQL??? No no, we haven’t changed “horses” suddenly, PostgreSQL is still our main focus 🙂 It’s just that in many key points pertaining to scaling, the same constraints actually also apply to PostgreSQL. After all, they’re both designed as single-node relational database management engines.
In short, I’m summarizing some ideas out of the talk, plus adding some of my own. I would like to provide some food for thought to those who are overly worried about database performance - thus prematurely latching onto some overly complex architectures. In doing so, the “worriers” sacrifice some other good properties of single-node databases - like usability, and being bullet-proof.
If you’re new to this realm, just trust me on the above, OK? There’s a bunch of abandoned or stale projects which have tried to offer some fully or semi-automatically scalable, highly available, easy-to-use and easy-to-manage DBMS...and failed! It’s not an utterly bad thing to try though, since we can learn from it. Actually, some products are getting pretty close to the Holy Grail of distributed SQL databases (CockroachDB comes to mind first). However, I’m afraid we still have to live with the CAP theorem. Also, remember that to go from covering 99.9% of corner cases of complex architectures to covering 99.99% is not a matter of linear complexity/cost, but rather exponential complexity/cost!
Although after a certain amount of time a company like Facebook surely needs some kind of horizontal scaling, maybe you’re not there yet, and maybe stock Postgres can still provide you some years of stress-free cohabitation. Consider: Do you even have a runway for that long?
On my average workstation, I can squeeze out around 25k simple read transactions per CPU core on an "in memory" pgbench dataset with the default Postgres v13 configuration. After some tuning, that rose to ~32k TPS per core, which puts a high-end server at roughly 1 million short read transactions per second. Read replicas can multiply that by ~10 again, although query routing then needs to be managed - for example with the newer LibPQ multi-host connection string syntax (target_session_attrs). Postgres itself doesn't limit the number of replicas, and with cascading replication you could likely run dozens of them without significant issues.
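For reference, here's a minimal sketch of how such a read-only test and the multi-host connection string could look. The scale factor, client counts and host names are illustrative assumptions, not the exact commands behind the numbers above:

```bash
# Initialize a dataset small enough to stay fully cached (scale 100 ≈ 1.5 GB)
pgbench -i -s 100 pgbench

# Built-in SELECT-only script ("-S"), prepared statements, one client per core
pgbench -S -M prepared -c 12 -j 12 -T 60 pgbench

# LibPQ multi-host syntax: send writes to whichever listed node is the primary
# (target_session_attrs=read-write works from v10; more values arrived in v14)
psql "host=node1,node2,node3 dbname=app target_session_attrs=read-write"
```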
On my humble workstation with 6 cores (12 logical CPUs) and NVMe SSD storage, the default, very write-heavy (3 UPDATEs, 1 INSERT, 1 SELECT per transaction) "pgbench" test greets me with around 45k TPS after some checkpoint tuning - and there are even more tuning tricks available.
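Below is a rough sketch of such a run, plus the checkpoint-related settings one would typically look at first. The values are illustrative starting points, not the exact configuration behind the 45k number:

```bash
# Default TPC-B-like mix: 3 UPDATEs, 1 INSERT, 1 SELECT per transaction
pgbench -M prepared -c 12 -j 12 -T 300 pgbench

# A typical first round of checkpoint tuning (adjust to your RAM / disk speed)
psql -c "ALTER SYSTEM SET max_wal_size = '8GB'"
psql -c "ALTER SYSTEM SET checkpoint_timeout = '30min'"
psql -c "ALTER SYSTEM SET checkpoint_completion_target = 0.9"
psql -c "SELECT pg_reload_conf()"
```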
Given that you have separated your "hot" and "cold" data sets, and have put some thought into indexing, etc., a single Postgres instance can cope with quite a lot of data. Backups, standby server provisioning, etc. will become a pain at some point, since you'll eventually meet some physical limits even on the finest hardware - but these issues are common to all database systems. From the query performance side, there's no reason why things should suddenly slow down!
Given that 1) you declare your constraints correctly, 2) you don't fool around with the "fsync" or asynchronous commit settings, and 3) your disks don't explode, a single-node instance provides rock-solid data consistency. Again, the last item applies to any data storage, so hopefully you have "some" backups somewhere...
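As a small illustration of the first two points - the table names here are hypothetical, and the SHOW statements just verify that the safe durability defaults are still in place:

```sql
-- Let the database, not the application, guarantee integrity
CREATE TABLE account (
    account_id bigint  GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email      text    NOT NULL UNIQUE,
    balance    numeric NOT NULL CHECK (balance >= 0)
);

CREATE TABLE transfer (
    transfer_id bigint  GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    account_id  bigint  NOT NULL REFERENCES account,
    amount      numeric NOT NULL
);

-- Durability settings should stay at their safe defaults
SHOW fsync;               -- expected: on
SHOW synchronous_commit;  -- expected: on
```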
And even if something very bad happens and the primary node goes down, the worst outcome is that your application is temporarily unavailable. Once you've done your recovery magic (or better, let some bot like Patroni take care of it), you're exactly where you were before. Now compare that with some partial failure scenarios or data hashing errors in a distributed world! Believe me, when working with critical data, in a lot of cases it's better to have a short downtime than to spend days or weeks sorting out some runaway datasets, confusing both yourself and your customers.
At the beginning of this post, I said that when starting out, you shouldn't worry too much about scaling from the architectural side. That doesn't mean you should ignore common best practices in case scaling could be required later. Some of them might be:
This might be the most important thing on the list - with modern real hardware (or bare-metal cloud instances) and the full power of config and filesystem tuning plus extensions, you'll typically do just fine on a single node for years. Remember that if you get tired of running your own setup, nowadays you can always migrate to some cloud provider - with minimal downtime - via Logical Replication! If you want to know how, see here. Note that I specifically said "real" hardware above, due to the common misconception that a single cloud vCPU is pretty much equal to a real CPU core...the reality is of course far from that - my own impression over the years has been that there's around a 2-3x performance difference, depending on the provider / region / luck factor in question.
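For the curious, such a logical replication based move boils down to something like the following. Server names and credentials are placeholders, and it assumes wal_level = 'logical' on the source and the schema pre-created on the target:

```sql
-- On the current (source) instance
CREATE PUBLICATION migration_pub FOR ALL TABLES;

-- On the new (e.g. cloud) instance, after restoring the schema there
-- with pg_dump --schema-only | psql
CREATE SUBSCRIPTION migration_sub
    CONNECTION 'host=old-server dbname=app user=repl_user password=changeme'
    PUBLICATION migration_pub;

-- Monitor the catch-up progress, then switch the application over
SELECT * FROM pg_stat_subscription;
```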
You'd be surprised how often we see single tables that have grown unmanageably huge...so slice and dice early, or set up some partitioning. Partitioning also does a lot of good for the long-term health of the database, since it allows multiple autovacuum workers to run on the same logical table, and it can speed up IO considerably on enterprise storage. And if IO does become a bottleneck at some point, you can employ Postgres' native remote partitions, so that some older data lives on another node - as sketched below.
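A minimal sketch of such a layout, with made-up table names: a locally partitioned table where one older range is pushed out to another node as a postgres_fdw foreign table partition.

```sql
CREATE TABLE measurement (
    ts      timestamptz NOT NULL,
    payload jsonb
) PARTITION BY RANGE (ts);

-- "Hot" data stays local
CREATE TABLE measurement_2021 PARTITION OF measurement
    FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');

-- "Cold" data can live on a separate archive node
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER archive_node FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'archive-host', dbname 'app');
CREATE USER MAPPING FOR CURRENT_USER SERVER archive_node
    OPTIONS (user 'app', password 'changeme');

-- A matching measurement_2020 table must already exist on the archive node
CREATE FOREIGN TABLE measurement_2020 PARTITION OF measurement
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01')
    SERVER archive_node OPTIONS (table_name 'measurement_2020');
```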
Initially, the data can just reside on a single physical node. If your data model revolves around the “millions of independent clients” concept for example, then it might even be best to start with many “sharded” databases with identical schemas, so that transferring out the shards to separate hardware nodes will be a piece of cake in the future.
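One possible way to lay that out from day one - tenant names here are hypothetical - is one schema per client with an identical layout, so that a busy tenant can later be dumped and moved to its own node:

```sql
-- A "template" table defining the common layout
CREATE TABLE public.orders (
    order_id bigint  GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    item     text    NOT NULL,
    total    numeric NOT NULL
);

-- Identical per-tenant copies
CREATE SCHEMA client_0001;
CREATE TABLE client_0001.orders (LIKE public.orders INCLUDING ALL);

CREATE SCHEMA client_0002;
CREATE TABLE client_0002.orders (LIKE public.orders INCLUDING ALL);

-- Moving a single tenant out later is then roughly just:
--   pg_dump --schema=client_0002 app | psql -h new-node app
```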
There are benefits to systems that can scale 1000x from day one...but in many cases, there’s also an unreasonable (and costly) desire to be ready for scaling. I get it, it’s very human - I’m also tempted to buy a nice BMW convertible with a maximum speed of 250 kilometers per hour...only to discover that the maximum allowed speed in my country is 110, and even that during the summer months.
The thing from the YouTube talk that resonated with me the most was that there's a definite downside to such theoretical scaling capability - it throttles development velocity and operational management efficiency in the early stages! Having a plain, rock-solid database that you know well, and which also actually performs well - if you know how to use it - is most often a great place to start.
By the way, here's another good link on a similar note from a nice GitHub collection, and also one pretty detailed overview here of how an Alexa Top 250 company managed to get by with a single database for 12 years before needing drastic scaling action!
To sum it all up, this is probably a good place to quote the classics: premature optimization is the root of all evil...
I like that you compared against another DB engine for other data approaches.
Big key point here: "premature optimization is the root of all evil."
Yes, and the simplicity of a single instance DB allows development and maintenance resources to be focused on the application and business needs, rather than whiz-bang tech. The whiz-bang tech often is brittle and actually can reduce availability of consistent data.
" the default very write-heavy (3 UPD, 1 INS, 1 SEL) “pgbench” test greets me with a number of around 45k TPS"
- what was your pgbench command? I'd like to be able to see what I get, and want to make sure I run the same test.
Me too, this will be a good comparison.
It was a very ad-hoc command line from my side at the time, so I believe all defaults - except --protocol=prepared. That might be the key here if you're seeing much lower numbers. Also note that on the server, the lighter-weight unix sockets are used for connecting by default, which is of course not possible for real-life remote clients, so this could be another reason. One should probably not use the unix socket for such TPS tests in the future...it does give some artificial boost indeed.
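For those who want to reproduce it, a plausible reconstruction (not the exact original invocation - client counts and duration were not recorded) would be along these lines:

```bash
# Built-in TPC-B-like script with prepared statements; "-h localhost" forces TCP
# instead of the unix socket, which is closer to what remote clients would see
pgbench --protocol=prepared -h localhost -c 12 -j 12 -T 60 pgbench
```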
Excellent article - something I have always wished to say.
Many of those who are coming from commercial database systems are already victims of over-engineering.
Glad you liked it, Jobin! And given that there was quite some feedback on this article, it seems things go in cycles - people are now slowly acknowledging the problem of the extra complexity that was once thought of as cool and necessary, and are ready for simplification again.