pg_stat_io and PostgreSQL 16 performance

10.2023 / Category: / Tags: | |

Learn about pg_stat_io's debugging power: PostgreSQL 16 blesses users around the world with many features which ensure an even better user experience. One of those features is a system view called pg_stat_io. It offers deep insights into the I/O behavior of your PostgreSQL database server. From PostgreSQL 16 onward, it will make it far easier to debug and assess performance-related problems.

So, let us waste no time and dive into this wonderful feature.

pg_stat_io: Debugging I/O in PostgreSQL

The following listing shows us what the pg_stat_io view looks like:

It contains a few important columns with valuable information:

backend_type: The source of information

In this field, PostgreSQL describes from which type of backend the information is coming. The following backend types exist as of PostgreSQL 16:

The deal here is: PostgreSQL is based on a multi-process architecture - it is not multi-threaded as other database engines are. Each database connection is a process, and all the background operations are also isolated in separate processes as well. Those types of processes are what we call “backend type” in this context. At this point in time, we have got 9 different backend types, but it is actually likely that there will be more in the future.

object: Type of storage

In this field, we find one of two values:

  • relation
  • temp relation

In PostgreSQL, we distinguish between temporary and permanent relations. Permanent relations are entities such as tables, indexes and so on. In short, things which are made to last. Temporary relations are not meant to exist forever. PostgreSQL will give us information about both types of storage.

context: How did we do I/O?

In PostgreSQL there are 4 contexts in which I/O can occur:

  • normal
  • vacuum
  • bulkread
  • bulkwrite

Normal operations are when we interact with shared_buffers which is the I/O cache of PostgreSQL. So if a block is read from disk and stored in cache, or if a block is written out, this is considered to be “normal” I/O.
In contrast to that there is “vacuum”. This type of operation is VACUUM or ANALYZE -related and happens outside the normal I/O. Vacuum goes through the shared_buffers cache machinery.

Finally there is bulkread and and bulkwrite. These are related to large operations outside shared_buffers. How can that happen? If you happen to read a 15 TB table using a sequential scan (maybe SELECT count ( * ) FROM large_table) PostgreSQL will not ask its own I/O cache for each and every 8k block, because the odds of having a cache hit are dire anyway. To learn more about this, consider checking out synchronized sequential scans.

reads and writes: Number of reads and writes

This is the number of I/O operations. Note that this is not expressed in bytes, but in operations. If you want to calculate the amount in bytes you have to multiply this number by op_bytes (see later column). That is true for read and writes alike.

read_time and write_time: Measuring I/O time

The amount of data is one indicator pointing to possible I/O issues. However, what is even more relevant is the amount of time PostgreSQL has spent on actually fetching and writing this data. What is important to note here is that the track_io_timing setting has to be turned on to fill this column.

writebacks: Sending data from the kernel to the OS

Writing to disk is not as simple as people might think. If you write to a file it does not mean that data immediately goes to the underlying storage device (“permanent storage”). Instead data is sent straight to the file system cache managed by the kernel first. Writeback means that we force the kernel to actually send data to permanent storage. This is again measured in “number of requests” which has to be multiplied by op_bytes to receive a number of bytes.

writeback_time: Measuring writeback time

The number of writebacks is one thing. However, again we might be interested in finding out how long it actually took to write data to disk. In case track_io_timing is on, this column will contain a useful value.

extends: Growing PostgreSQL data files

When a transaction writes data there are two situations which can occur: There might be space inside a table / index / etc which can be reallocated or we might have to grow the data files to provide space for the data we want to write. Reallocation often happens after a VACUUM process or in case of HOT-Updates. However, in case there is no suitable space PostgreSQL grows the data file. An extend is again a unit of op_bytes (= in this case the desired block size).

extend_time: Time need to grow files

In case extends are written, we can again measure the time needed for that. By default, time is only measured when track_io_timing is turned on.

In general, I recommend turning on track_io_timing as you will otherwise not have the information you need to track down performance problems.

This is not only true for pg_stat_io but also for many other PostgreSQL statistics.

op_bytes: I/O chunking

On a modern system, data is not sent to disk (or read from disk) byte by byte. Instead, I/O happens in chunks. Usually the size of a single I/O chunk is 8k, but it might be different on some systems.

hits: Monitoring the cache

When I/O happens through the PostgreSQL cache (shared_buffers) we can face cache hits or cache misses. This value will tell us the number of hits we had to avoid asking the operating system for I/O.

evictions: Cleaning shared buffers

When blocks are removed from the PostgreSQL cache to make space for new ones, they get counted as evicted. This value gives us an idea of how much was in there.

reuses: Reusing I/O

This variable might be confusing for some people. The documentation says: “The number of times an existing buffer in a size-limited ring buffer outside of shared buffers was reused as part of an I/O operation in the bulkread, bulkwrite, or vacuum contexts.”. Now, this explanation might be pretty obscure to most people out there. What does it actually mean? Not all operations use the PostgreSQL I/O machinery. Suppose we sequentially scan a 10 TB table. As stated before, this scan will bypass the shared buffers entirely. Instead, a little ring buffer outside shared buffers is used to keep some blocks around. If one of these is reused, it will be counted in this variable.

fsyncs and fsync_time: Flushing to disk

When data is written to disk we have to distinguish between two cases: Either we want to rely on the fact that data is on disk or we don’t care much. The question which naturally arises is: Hey? This is a database system! Why would anybody not care if data is written to disk or not? Well, consider sorting on disk. If the data we want to sort is lost, we have to restart the sort process after a reboot anyway. Or: If we die in the middle of a transaction, the data should be gone anyway. What this means is that if we want data to survive no matter what, we sync it to disk using a flush. Otherwise we do not. This column will count the number of flushes and the time we needed to flush.

stats_reset: Reset time

Finally, PostgreSQL will tell us as of when data has been accumulated, and when the statistics have been reset.

pg_stat_io: Inspecting the data

After this rather theoretical introduction to the topic, we can take a look at some sample data.

Here is some autovacuum data:

For each context, you'll see one row. Most of the I/O happened in the “vacuum” context (which is no surprise if you happen to be an autovacuum worker process).

What happens within a normal database connection (= “client backend”)? Here is an example of such a pg_stat_io query:

The I/O happened during the creation of the table. Other than that, the table wasn't used. How did we figure that out? Well, the entire I/O was in a bulk operation and not during normal operations. This tells us that the table wasn't used at all since the time of its creation.

Finally …

Inspecting I/O is important; it gives you valuable information about what is really going on inside your database engine. Another data warehousing tool to improve performance are synchronized sequential scans. Find out about them in my blog.
Or, learn more basics about how to spot and manage performance problems in this blog about how to quickly detect slow queries.
We try to cover information to support PostgreSQL and to help people using their systems. If you want to know more about PostgreSQL support and PostgreSQL consulting - simply reach out to us.


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

0 0 votes
Article Rating
Notify of
1 Comment
Newest Most Voted
Inline Feedbacks
View all comments
Lucio Chiessi
Lucio Chiessi
7 months ago

Thanks a lot for this article Hans-Jürgen!! Awesome!

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf

+43 (0) 2622 93022-0

Get the newest PostgreSQL Info & Tools

    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    CYBERTEC PostgreSQL International GmbH
    Would love your thoughts, please comment.x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram