Two weeks ago a new PostgreSQL major version with the number 10 was released (note the new numbering scheme!). I already covered my favorite parts of the official release notes from a DBA’s point of view here, so take a look there if you are interested in the operational side as well. In this blog post, I would like to look at the list of changes from a developer's angle (disclaimer: I'm not doing too much development these days though, just keeping track of things). Of course, some features are shared with the DBA part, but there is still a lot of new stuff – enjoy!
The long-awaited feature enables easy setup and minimal performance penalties for application scenarios where you only want to replicate a single table, a subset of tables or all tables, enabling totally new application scenarios and also zero-downtime upgrades for the following major versions! The top feature of version 10 for me.
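A minimal sketch of what a setup could look like (the table, database and connection details are hypothetical; the table structure must already exist on the subscriber):

-- on the publishing (source) database
CREATE PUBLICATION my_pub FOR TABLE my_table;

-- on the subscribing (target) database
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=source_host dbname=mydb user=replicator'
    PUBLICATION my_pub;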
The old way of managing partitions via inheritance and creating triggers to re-route inserts to the correct tables was bothersome to say the least, not to mention the performance impact. Currently supported are "range" and "list" partitioning schemes. If someone is missing the "hash" partitioning available in some DB engines, one could use "list" partitioning with expressions to achieve the same.
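A minimal sketch of the new declarative syntax (table and column names are made up for illustration):

-- the parent table only defines the partitioning scheme, it holds no data itself
CREATE TABLE measurements (
    city_id  int  NOT NULL,
    logdate  date NOT NULL,
    peaktemp int
) PARTITION BY RANGE (logdate);

-- one partition per year, inserts are routed automatically
CREATE TABLE measurements_2017 PARTITION OF measurements
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');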
Libpq will connect to the first responsive server in the list. This helps greatly in providing some basic high availability to applications without any extra software! FYI – you need to update the Postgres client libraries (and possibly also your database driver of choice) to make use of the feature.
Now there’s a new libpq connect parameter called target_session_attrs, with possible values “read-write” and “any”. By listing replicas first in your connect string (see the previous list item) and choosing “any”, one can do simple read balancing. For writes one needs another connection (separate connection pools recommended) with “read-write”. That is not really transparent (a transparent setup is possible with pgpool2, for example) but might be just good enough.
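To illustrate both of the above items, a sketch of such connect strings (host names are hypothetical):

# connect to the first responsive node, e.g. for reads (replicas listed first)
psql "host=replica1,replica2,master dbname=mydb target_session_attrs=any"

# only accept a node that allows writes
psql "host=replica1,replica2,master dbname=mydb target_session_attrs=read-write"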
For highly critical systems it is now possible to check whether the last transaction was committed, given that the txid was stored locally. It can, for example, happen that the transaction was committed just fine and only the acknowledgement didn’t make it over the network back to the client.
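This is enabled by the new txid_status() function; a rough sketch of the flow (the accounts table and the txid value are hypothetical):

BEGIN;
SELECT txid_current();   -- store this value (e.g. 123456) on the client side
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;                  -- assume the acknowledgement got lost on the network

-- later, from a new connection, check what actually happened
SELECT txid_status(123456);   -- 'committed', 'aborted' or 'in progress'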
For highly critical systems it’s now possible to require that, in addition to the connected server, the commit has to be propagated to any given number of other servers. Previously the only option was a priority list, which made handling server failures more difficult.
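This is the new quorum-based synchronous replication; a configuration sketch (standby names are hypothetical):

# postgresql.conf on the primary – commit must be confirmed by any 2 of the 3 standbys
synchronous_standby_names = 'ANY 2 (node_a, node_b, node_c)'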
Such stats need to be created manually on a set of columns of a table to point out that the values are actually somehow dependent on each other. This makes it possible to counter slow-query problems where the planner thinks very little data will be returned (multiplication of probabilities usually yields very small numbers) and, for example, chooses a "nested loop" join, which usually does not perform too well on bigger amounts of data.
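A sketch of how such extended statistics could be created (table and column names are made up; a city and its zip code are obviously correlated):

CREATE STATISTICS stat_city_zip (dependencies) ON city, zip FROM addresses;
ANALYZE addresses;   -- the statistics are gathered during the next ANALYZE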
Hash indexes are now WAL-logged, thus crash-safe and ready for wider use. They have also received some performance improvements for simple searches, so that they should actually be faster than standard B-tree indexes for bigger amounts of data. The downsides: a bigger index size, and they are only suitable for equality searches.
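Creating one is as simple as (table and column names hypothetical):

CREATE INDEX ON my_table USING hash (some_column);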
Quite an awesome feature – it is now possible to look at both the old and the new data of the affected rows in a statement-level trigger. This can be used, for example, to calculate some aggregates, or to reject an update if suspiciously many rows were changed.
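A sketch of such a statement-level trigger using the new transition tables (the accounts table and the 1000-row threshold are made-up examples):

CREATE FUNCTION check_mass_update() RETURNS trigger AS $$
BEGIN
    -- new_rows / old_rows are the transition tables declared in the trigger below
    IF (SELECT count(*) FROM new_rows) > 1000 THEN
        RAISE EXCEPTION 'suspiciously many rows changed, aborting';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_check_mass_update
    AFTER UPDATE ON accounts
    REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
    FOR EACH STATEMENT EXECUTE PROCEDURE check_mass_update();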
Now one can drop a function based on its name only. This is one of those small things that add up if you work a lot with stored procedures, as 99% of the time the names are indeed unique. Less typing is good 🙂
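For example (function name hypothetical; this works only when the name is unambiguous):

DROP FUNCTION my_unique_function;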
Previously this could only be done by hacking the system catalogs... which could bite you if you weren't careful.
In addition to some smaller JSONB improvements, this addition once again assures us that JSONB is a first-class citizen in Postgres, and mixed (NoSQL + SQL) applications have even more options.
Postgres has long had the ability to treat simple files as virtual tables, but this feature could be a huge win for “warm data” / “cold data” scenarios where archive data grows too big but is occasionally still needed. By the way, when compressing Postgres data one can easily win 10-30x on disk size. A sample from the Postgres Wiki:
CREATE FOREIGN TABLE test (a int, b text)
    SERVER csv
    OPTIONS (program 'gunzip -c /tmp/data.csv.gz');
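For completeness – the "csv" foreign server referenced above has to exist first; assuming the sample relies on the file_fdw extension (as in the Wiki example), the setup would roughly be:

CREATE EXTENSION file_fdw;
CREATE SERVER csv FOREIGN DATA WRAPPER file_fdw;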
That was that – stop by next year for Postgres 11!
Last week a new PostgreSQL major version with the number 10 was released! The announcement, release notes and the "What's new" overview can be found here, here and here – highly recommended reading, so check them out. As usual, there have already been quite a few blog posts covering all the new stuff, but I guess everyone has their own angle on what is important, so as with version 9.6 I'm again throwing in my impressions on the most interesting/relevant features here.
As always, users who upgrade or initialize a fresh cluster will enjoy huge performance wins (e.g. better parallelization with parallel index scans, merge joins and uncorrelated sub-queries, faster aggregations, smarter joins and aggregates on remote servers) out of the box without doing anything, but here I would like to look more at the things you won't get out of the box – where you actually need to take some steps to start benefiting from them. The list of highlighted features below is compiled from a DBA's viewpoint; a post on changes from a developer's point of view will follow soon.
First, some hints on upgrading from an existing setup – this time there are some small things that could cause problems when migrating from 9.6 or even older versions, so before the real deal one should definitely test the upgrade on a separate replica and go through the full list of possible troublemakers in the release notes. The most likely pitfalls to watch out for:
The latter naming could be confused with normal server logs, so it's a "just in case" change. If you use any 3rd party backup/replication/HA tools, check that they are all at their latest versions.
Make sure to verify that your log parsing/grepping scripts (if you have any) still work.
If you use the default PostgreSQL 10 postgresql.conf settings on a machine with a low number of CPUs, you may see resource usage spikes, as parallel processing is now enabled by default – which is a good thing though, as it should mean faster queries. Set max_parallel_workers_per_gather to 0 if the old behaviour is needed.
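A quick way to do that without editing the config file by hand (requires superuser rights):

ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
SELECT pg_reload_conf();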
To ease testing etc., localhost and local Unix socket replication connections are now enabled in "trust" mode (without password) in pg_hba.conf! So if other non-DBA users also have access to real production machines, make sure you change the config.
The long-awaited feature enables easy setup and minimal performance penalties for application scenarios where you only want to replicate a single table, a subset of tables or all tables, which also means zero-downtime upgrades for the following major versions! Historically (Postgres 9.4+ required) this could only be achieved with a 3rd party extension or slowish trigger-based solutions. The top feature of version 10 for me.
The old way of managing partitions via inheritance and creating triggers to re-route inserts to the correct tables was bothersome to say the least, not to mention the performance impact. Currently supported are "range" and "list" partitioning schemes. If someone is missing the "hash" partitioning available in some DB engines, one could use "list" partitioning with expressions to achieve the same.
Hash indexes are now WAL-logged, thus crash-safe, and received some performance improvements, so that for simple searches they're actually faster than standard B-tree indexes for bigger amounts of data. Bigger index size though, too.
Such stats need to be created manually on a set of columns of a table to point out that the values are actually somehow dependent on each other. This makes it possible to counter slow-query problems where the planner thinks very little data will be returned (multiplication of probabilities usually yields very small numbers) and, for example, chooses a "nested loop" join that does not perform well on bigger amounts of data.
Now one can use the pg_dump tool to speed up backups on standby servers enormously by using multiple processes (the --jobs flag).
See the max_parallel_workers and min_parallel_table_scan_size / min_parallel_index_scan_size parameters. I would recommend increasing the default values of the latter two (8MB and 512KB respectively) a bit though.
New roles pg_monitor, pg_read_all_settings, pg_read_all_stats, and pg_stat_scan_tables make life a lot easier for all kinds of monitoring tasks - previously one had to use superuser accounts or some SECURITY DEFINER wrapper functions.
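For example, to give a (hypothetical) dedicated monitoring account access to all stats views and settings:

GRANT pg_monitor TO monitoring_user;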
It does a couple of smart checks to discover structural inconsistencies and stuff not covered by page-level checksums. I hope to check it out more deeply in the near future.
This would, for example, enable having a single maintenance/monitoring script with version-specific branching (different column names for pg_stat* views etc.) instead of many version-specific scripts.
SELECT :'VERSION_NAME' = '10.0' AS is_v10 \gset
\if :is_v10
SELECT 'yippee' AS msg;
\else
SELECT 'time to upgrade!' AS msg;
\endif
That's it for this time! A lot of other stuff didn't get listed of course, so for full-time DBAs I'd definitely suggest looking at the release notes more thoroughly. And a big thanks to those 300+ people who contributed their effort to this particularly exciting release!
If you need further information, please feel free to contact us directly.
BY Kaarel Moppel - Although Postgres is a great all-around product that can’t be beaten in price-performance ratio, there’s one area where things could be a bit better: the validation of replicas, or maybe even more broadly, of whole clusters.
What's the problem? After building a new replica there’s actually no way to tell in a “pushbutton” way if everything is absolutely fine with the new replica. The replica building tool pg_basebackup (there are of course some other tools/approaches but I suspect that the pg_basebackup tool has 90% of the “market”) just streams over the datafiles and transaction logs and writes them to disk and that’s it – done. After that, you usually modify the configs if needed and start your server and then see if you can connect to it and maybe also issue some “select count(*) from my_favourite_table” to verify that things are more or less plausible, before grinning in satisfaction and grabbing a coffee to tackle other burning tasks... but to what extent did we actually verify that the cluster is intact?
So what could go wrong when building up a new replica? If pg_basebackup reports success then not too much, actually – it is very unlikely that something is wrong. But a couple of things still bug me... though I might be a little paranoid 🙂 Just read on to see what I mean.
First the hard disks: in theory, disk errors should be already a thing of the past. Enterprise-grade disk systems should definitely have some built-in error detection. The thing is, nowadays everything is so abstracted and virtualized. You actually hardly know what’s down there. Maybe it’s some network storage? It for sure requires some drivers on the Operating System side, and also on the hardware side. As we all know, every software sadly has bugs. Google says the industry average is about 15 - 50 errors per 1000 lines of delivered code. So one cannot 100% exclude the chance of silent data corruption on writing datafiles. That's especially true because there will typically be some new untested hardware for replicas.
Luckily, Postgres already has some remedy against such cases. The “data-checksums” flag is there for exactly this purpose. However, it needs to be set when initializing the cluster, and it also has a small performance impact, so it’s not always used.
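As a reminder, checksums can only be enabled at initdb time, roughly like this (the data directory path is just an example):

initdb --data-checksums -D /var/lib/postgresql/10/main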
What else? Postgres allows building up replicas across different platforms, for example a master on Ubuntu and a replica on CentOS – which of course is a great feature in itself. Plus, there is also the fact that minor versions can differ, e.g. master on 9.6.2, replica by some accident on 9.6.0. Even the replication code itself can fail; I can remember at least one occasion some years ago when committed rows were marked uncommitted on replicas. This is very rare, but could technically happen again.
In short, there are some “if”-s. The question is "What can be done here?" As I said, sadly there’s currently no tool from the Postgres project (I’m not sure that there should even be one or if it can even be done) to do full verification. The only 100% reliable option is to do a SQL dump/restore. That could take hours and hours for 1TB+ databases. It's not really an option if you need to switch over the master in a hush-hush manner.
So a good compromise, in order to have a better feeling before making the switchover, is to just dump the database! This will at least verify that there’s no silent data corruption. However, it could take hours, depending on your DB-size/hardware. There are some tricks to make it faster though - coming right up.
1. One does not have to actually store the SQL dump – we just want to verify that the data can be read out, i.e. using “/dev/null” is an excellent performance booster. When using pg_dumpall or pg_dump in plain SQL mode (the default) it’s very easy to do:
pg_dumpall -h /var/run/postgresql >/dev/null
2. Make sure you use the Unix socket for connecting to the local server instead of “localhost” TCP stack – just that simple adjustment should already give some 10-15% speedup.
3. Use multiple processes to dump the data! This is a real life-saver for bigger databases when time is of the essence. One should just figure out a reasonable value for the “jobs” flag so that the IO-subsystem is pushed to its limit. NB! Multiple jobs absolutely require the “directory” output format (-Fd/--format=directory + -f/--file=dirname) for pg_dump, and sadly that also brings some issues of its own (see point 4 below and the sketch that follows).
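A sketch of such an invocation (database name and target directory are made up):

pg_dump -h /var/run/postgresql --format=directory --jobs=4 --file=/tmp/dump_dir mydb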
4. Throw together a simple custom script that handles the negative side-effects of approach no. 3, so you don’t have to think about them. One such script that I use myself can, for example, be found here. It’s a simple Python script that just spawns a given number of worker processes (half of the CPUs by default) and then dumps all tables to /dev/null, starting from the biggest ones – one just needs to specify the Unix socket as the host, and the Postgres “bindir” is also required.
NB! Before kicking off the dumping process on the replica, one should make sure that the query conflict situation has been addressed – otherwise the dump will soon fail if there’s some activity on the master! Options are configuring hot_standby_feedback or allowing the replica to “fall behind” by setting max_standby_streaming_delay to -1 or some big value. More on that here.
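A configuration sketch for the replica – pick one of the two approaches:

# postgresql.conf on the replica
hot_standby_feedback = on
# ...or alternatively let the replica lag behind for the duration of the dump
max_standby_streaming_delay = -1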
That’s it – hope you made it through, and that it made you think a bit. Feedback appreciated as usual!
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter.
PostgreSQL 10 will provide end users with countless new features. One of those features is related to “Foreign Data Wrappers” and is generally known as “aggregate pushdown”. To be honest, this is one of my favorite new features of PostgreSQL 10, and therefore it might be worth sharing this piece of information with a broader audience. So if you are interested in remote aggregation, distributed queries, distributed aggregates and aggregate pushdown in PostgreSQL, keep reading.
To show what the optimizer is already capable of, we need two databases:
iMac:~ hs$ createdb db01
iMac:~ hs$ createdb db02
Then we can deploy some simple test data in db02:
BEGIN;

CREATE TABLE t_test (id serial, name text);

INSERT INTO t_test (name)
    SELECT 'dummy' FROM generate_series(1, 1000000);

ANALYZE;

COMMIT;
The script generates 1 million rows and just a single name (= “dummy”).
For many years now, PostgreSQL has provided the means to access remote data sources using “Foreign Data Wrappers” (FDWs). The following script connects db01 to db02:
CREATE EXTENSION postgres_fdw;

CREATE SERVER pgserver FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (dbname 'db02', host 'localhost');

CREATE USER MAPPING FOR CURRENT_USER SERVER pgserver
    OPTIONS (user 'hs');

IMPORT FOREIGN SCHEMA public FROM SERVER pgserver INTO public;
The script shown here loads the postgres_fdw extension, which allows us to connect to a remote PostgreSQL database. Then a virtual server pointing to db02 is created in db01. Finally, a user mapping is created and the foreign schema is imported. All tables in the remote database that can be found in the “public” schema will be linked and become visible in db01.
Once the test data is in place, we can give PostgreSQL a try and see how it behaves in the case of aggregates. Here is an example:
db01=# explain (analyze, verbose) SELECT name, count(*) FROM t_test GROUP BY 1;
                                   QUERY PLAN
------------------------------------------------------------------------------
 Foreign Scan  (cost=107.31..137.93 rows=200 width=40) (actual time=192.244..192.245 rows=1 loops=1)
   Output: name, (count(*))
   Relations: Aggregate on (public.t_test)
   Remote SQL: SELECT name, count(*) FROM public.t_test GROUP BY name
 Planning time: 0.063 ms
 Execution time: 192.581 ms
(6 rows)
The most important observation here is that PostgreSQL is able to push over the complete aggregate. As you can see, the remote SQL is basically the same as the local query. The main advantage is that by pushing over the aggregate, PostgreSQL can drastically reduce the load on your local machine as well as the amount of data that has to be sent over the network.
However, at this point it is necessary to issue a word of caution: yes, aggregates can be pushed down to a remote server. The thing is: joins happen before the aggregate. In other words, PostgreSQL has to transfer all the data from the remote host in this case:
db01=# explain (analyze, verbose) SELECT a.name, count(*)
       FROM t_test AS a, t_test AS b
       WHERE a.id = b.id
       GROUP BY 1
       ORDER BY 2, 1;
                                   QUERY PLAN
-------------------------------------------------------------------------------------------------
 Sort  (cost=1003.90..1004.40 rows=200 width=40) (actual time=4012.290..4012.290 rows=1 loops=1)
   Output: a.name, (count(*))
   Sort Key: (count(*)), a.name
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=994.26..996.26 rows=200 width=40) (actual time=4012.283..4012.283 rows=1 loops=1)
         Output: a.name, count(*)
         Group Key: a.name
         ->  Merge Join  (cost=588.18..894.45 rows=19963 width=32) (actual time=3382.674..3848.202 rows=1000000 loops=1)
               Output: a.name
               Merge Cond: (a.id = b.id)
               ->  Sort  (cost=222.03..225.44 rows=1365 width=36) (actual time=1691.089..1788.210 rows=1000000 loops=1)
                     Output: a.name, a.id
                     Sort Key: a.id
                     Sort Method: external sort  Disk: 21528kB
                     ->  Foreign Scan on public.t_test a  (cost=100.00..150.95 rows=1365 width=36) (actual time=0.311..1232.045 rows=1000000 loops=1)
                           Output: a.name, a.id
                           Remote SQL: SELECT id, name FROM public.t_test
               ->  Sort  (cost=366.15..373.46 rows=2925 width=4) (actual time=1691.579..1785.666 rows=1000000 loops=1)
                     Output: b.id
                     Sort Key: b.id
                     Sort Method: external sort  Disk: 17616kB
                     ->  Foreign Scan on public.t_test b  (cost=100.00..197.75 rows=2925 width=4) (actual time=0.400..1203.757 rows=1000000 loops=1)
                           Output: b.id
                           Remote SQL: SELECT id FROM public.t_test
 Planning time: 0.105 ms
 Execution time: 4071.736 ms
For PostgreSQL 11.0 we are working on a patch which will hopefully make it into core. It allows PostgreSQL to perform many aggregations before the join happens, which makes joining cheaper because less data ends up in the join. Many more improvements are possible and may be added to the planner in the near future.
However, as of PostgreSQL 10 a large step forward has been made already to allow PostgreSQL to dynamically distribute queries in a cluster.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.