Some time ago, I posted some information about zheap, a storage engine for PostgreSQL. The idea behind zheap is to make
UPDATE statements run faster in order to keep table bloat under control. Currently, PostgreSQL copies data on
UPDATE and puts the copy of the row into the same data file. While this isn’t a bad strategy, it can in some cases lead to table bloat. The idea behind zheap undo is to manage old copies of rows, which is similar to Oracle. Until now, the problem was that this “zheap undo” was not cleaned up. That’s why we implemented a “discard worker” to handle the cleanup. By discarding zheap UNDO logs, the worker keeps
UPDATE-related table bloat to a minimum.
Getting PostgreSQL and zheap
Before we dig into what the discard worker does, you need to install a zheap-enabled version of PostgreSQL. To make it easier for everybody, we created a container to allow users to give zheap a try out-of-the-box.
The original idea was to have zheap as an extension. However, it’s too invasive, therefore you can’t just install it on top of PostgreSQL, as you can with other extensions. zheap does need core patches.
Also: I want to point out that this is not a production-ready version, but rather a tech preview. We’re happy to share it with the community to get feedback and maybe gather some bug reports.
Here is the freely available docker container for you to try out. Installation is simple and very straightforward:
[hs@fedora ~]$ docker run --name zHeap -p 5432:5432 \ -d cybertecpostgresql/zheap
To connect to the newly created instance, just run psql as you normally would. The default username and password are both postgres.
[hs@fedora ~]$ psql -h localhost -p 5432 -U postgres UNDO: Cleaning up using the discard worker
Testing zheap UNDO in PostgreSQL
Once your zheap-enabled version of PostgreSQL is up and running, we can inspect the process table:
[hs@fedora ~]$ ps axf | grep post ... 2403084 ? Ss 0:00 \_ /home/hs/pgzheap/bin/postgres -D ./dbzheap 2403086 ? Ss 0:00 \_ postgres: checkpointer 2403087 ? Ss 0:00 \_ postgres: background writer 2403088 ? Ss 0:00 \_ postgres: walwriter 2403089 ? Ss 0:00 \_ postgres: autovacuum launcher 2403090 ? Ss 0:00 \_ postgres: stats collector 2403091 ? Ss 0:00 \_ postgres: undo discard worker 2403093 ? Ss 0:00 \_ postgres: logical replication launcher ...
The discard worker is launched automatically. Let’s see what it does, and how the undo log is stored.
Before we create a table and fill it with data, we’ve got to adjust a variable:
test=# SHOW default_table_access_method; default_table_access_method ----------------------------- heap (1 row) test=# SET default_table_access_method TO zheap; SET
This setting tells PostgreSQL which storage engine you want to use by default when creating a new table. Usually we want heaps. However, in this case zheap is the desired choice. We can add the zheap option to each CREATE TABLE. In many cases you might want to set it for the entire session as I just did:
test=# CREATE TABLE t_large (id serial, name text); CREATE TABLE
The layout of the table doesn’t really matter. We can use any layout.
Testing zheap UNDO with INSERT load
Let’s load some data:
test=# BEGIN; BEGIN test=*# INSERT INTO t_large (name) SELECT 'dummy' FROM generate_series(1, 10000000); INSERT 0 10000000 test=*# \d+ List of relations Schema | Name | Type | Owner | Persistence | Access method | Size | Description --------+----------------+----------+----------+-------------+---------------+------------+------------- public | t_large | table | hs | permanent | zheap | 193 MB | public | t_large_id_seq | sequence | hs | permanent | | 8192 bytes | (2 rows)
The important thing is that zheap is able to rollback in case of error. Therefore, a lot of undo has to be written: Let’s take a look at the undo directories before we commit the transaction:
[hs@fedora dbzheap]$ ls -l ./pg_undo/ && ls -l ./base/undo/ total 8 -rw-------. 1 hs hs 52 12. Nov 11:47 00000001A84012A0 -rw-------. 1 hs hs 52 12. Nov 11:47 00000001C9F48970 total 432128 -rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.0065A00000 -rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.0065B00000 -rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.0065C00000 ... -rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.007FD00000 -rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.007FE00000 -rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.007FF00000
You can see that we’ve created a fairly large number of files in our
base/undo directory. Let’s commit and see what happens:
test=*# COMMIT; COMMIT
PostgreSQL will recycle (= delete) these logs when the new discard worker kicks in. Let’s take a look and see:
[hs@fedora dbzheap]$ ls -l ./pg_undo/ && ls -l ./base/undo/ total 8 -rw-------. 1 hs hs 52 12. Nov 11:47 00000001A84012A0 -rw-------. 1 hs hs 52 12. Nov 11:47 00000001C9F48970 total 2048 -rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.007FF00000 -rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.0080000000
The files are gone, which is what we expected.
Testing zheap UNDO with UPDATE load
The same will happen when we run
test=# BEGIN; BEGIN test=*# UPDATE t_large SET id = id - 1; UPDATE 10000000
The important aspect is that there is actually a lot of WAL created. As you can see, we are talking about around 600 MB:
[hs@fedora dbzheap]$ cd base/undo/ [hs@fedora undo]$ du -h 603M
Now, let’s commit the transaction:
test=*# COMMIT; COMMIT
What’s interesting is what happens immediately after the commit statement:
[hs@fedora undo]$ date && du -h Fr 12. Nov 11:55:50 CET 2021 603M . [hs@fedora undo]$ date && du -h Fr 12. Nov 11:55:57 CET 2021 2,0M .
So that’s it – the zheap UNDO logs are gone. The cleanup is NOT part of a commit, but is actually done by the discard worker which is in charge of making sure that log is not killed too early. Keep in mind: You might not be the only transaction at work here, so the cleanup still has to be asynchronous (just like a vacuum is).
Check out our Table Bloat Archive for more important information about zheap!
New GIS Site!
We also recently released a website that offers ready-to-use GIS data. If you use PostGIS and OSM data, this is the place to go. We provide you with ready-made database dumps featuring the entire planet. Check out our new GIS site.