pg_stat_io and PostgreSQL 16 performance

10.2023 | Category: Performance | Tags: administration, statistics

Learn about pg_stat_io's debugging power: PostgreSQL 16 blesses users around the world with many features which ensure an even better user experience. One of those features is a system view called pg_stat_io. It offers deep insights into the I/O behavior of your PostgreSQL database server. From PostgreSQL 16 onward, it will make it far easier to debug and assess performance-related problems.

So, let us waste no time and dive into this wonderful feature.

`pg_stat_io`: Debugging I/O in PostgreSQL

The following listing shows us what the pg_stat_io view looks like:

blog=# \d pg_stat_io
                        View "pg_catalog.pg_stat_io"
     Column     |           Type           | Collation | Nullable | Default 
----------------+--------------------------+-----------+----------+---------
 backend_type   | text                     |           |          | 
 object         | text                     |           |          | 
 context        | text                     |           |          | 
 reads          | bigint                   |           |          | 
 read_time      | double precision         |           |          | 
 writes         | bigint                   |           |          | 
 write_time     | double precision         |           |          | 
 writebacks     | bigint                   |           |          | 
 writeback_time | double precision         |           |          | 
 extends        | bigint                   |           |          | 
 extend_time    | double precision         |           |          | 
 op_bytes       | bigint                   |           |          | 
 hits           | bigint                   |           |          | 
 evictions      | bigint                   |           |          | 
 reuses         | bigint                   |           |          | 
 fsyncs         | bigint                   |           |          | 
 fsync_time     | double precision         |           |          | 
 stats_reset    | timestamp with time zone |           |          |

It contains a few important columns with valuable information:

`backend_type`: The source of information

In this field, PostgreSQL describes from which type of backend the information is coming. The following backend types exist as of PostgreSQL 16:

blog=# SELECT DISTINCT backend_type
       FROM   pg_stat_io
       ORDER  BY 1;
    backend_type     
---------------------
 autovacuum launcher
 autovacuum worker
 background worker
 background writer
 checkpointer
 client backend
 standalone backend
 startup
 walsender
(9 rows)

The deal here is: PostgreSQL is based on a multi-process architecture; it is not multi-threaded as some other database engines are. Each database connection is a process, and all background operations are isolated in separate processes as well. These kinds of processes are what we call “backend type” in this context. At this point in time, there are 9 different backend types, but it is likely that there will be more in the future.
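
To see which backend types are running on your own server right now, one option is to look at pg_stat_activity, which also has a backend_type column. This is only a small sketch: the names it returns can differ from those listed in pg_stat_io, and short-lived processes may not show up at all.

```sql
-- Count the currently running server processes per backend type.
SELECT   backend_type,
         count(*) AS processes
FROM     pg_stat_activity
GROUP BY backend_type
ORDER BY backend_type;
```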

`object`: Type of storage

In this field, we find one of two values:

relation
temp relation

In PostgreSQL, we distinguish between temporary and permanent relations. Permanent relations are entities such as tables, indexes and so on. In short, things which are made to last. Temporary relations are not meant to exist forever. PostgreSQL will give us information about both types of storage.

`context`: How did we do I/O?

In PostgreSQL there are 4 contexts in which I/O can occur:

normal
vacuum
bulkread
bulkwrite

Normal operations are those that interact with shared_buffers, which is the I/O cache of PostgreSQL. So if a block is read from disk and stored in the cache, or if a block is written out, this is considered to be “normal” I/O.
In contrast to that, there is “vacuum”. This type of operation is related to VACUUM or ANALYZE and is counted separately from normal I/O, even though vacuum still goes through the shared_buffers cache machinery.

Finally, there are bulkread and bulkwrite. These are related to large operations outside shared_buffers. How can that happen? If you read a 15 TB table using a sequential scan (maybe SELECT count(*) FROM large_table), PostgreSQL will not ask its own I/O cache for each and every 8k block, because the odds of a cache hit are slim anyway. To learn more about this, consider checking out synchronized sequential scans.

`reads` and `writes`: Number of reads and writes

This is the number of I/O operations. Note that this is not expressed in bytes, but in operations. If you want to calculate the amount in bytes, you have to multiply this number by op_bytes (described further below). That is true for reads and writes alike.
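
As a rough illustration of that multiplication, the following sketch converts the counters into readable sizes. The column aliases are made up for this example. With the sample data shown further down, 70838 reads at 8192 bytes each come to 580,304,896 bytes, or roughly 553 MB.

```sql
-- Convert operation counts into bytes by multiplying them by op_bytes.
-- Rows where a counter does not apply contain NULL and are filtered out.
SELECT backend_type,
       object,
       context,
       reads,
       pg_size_pretty(reads * op_bytes)  AS read_volume,
       writes,
       pg_size_pretty(writes * op_bytes) AS write_volume
FROM   pg_stat_io
WHERE  reads > 0 OR writes > 0
ORDER  BY reads * op_bytes DESC NULLS LAST;
```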

`read_time` and `write_time`: Measuring I/O time

The amount of data is one indicator pointing to possible I/O issues. However, what is even more relevant is the amount of time PostgreSQL has spent on actually fetching and writing this data. What is important to note here is that the track_io_timing setting has to be turned on to fill this column.

`writebacks`: Sending data from the kernel to disk

Writing to disk is not as simple as people might think. If you write to a file, it does not mean that data immediately goes to the underlying storage device (“permanent storage”). Instead, data first goes to the file system cache managed by the kernel. Writeback means that we ask the kernel to actually send data to permanent storage. This is again measured in “number of requests”, which has to be multiplied by op_bytes to obtain a number of bytes.

`writeback_time`: Measuring writeback time

The number of writebacks is one thing. However, again we might be interested in finding out how long it actually took to write data to disk. In case track_io_timing is on, this column will contain a useful value.

`extends`: Growing PostgreSQL data files

When a transaction writes data, two situations can occur: there might be space inside a table / index / etc. which can be reused, or we might have to grow the data files to provide space for the data we want to write. Reuse often happens after a VACUUM process or in case of HOT updates. However, if there is no suitable space, PostgreSQL grows the data file. An extend is again a unit of op_bytes (= in this case the desired block size).

`extend_time`: Time needed to grow files

When data files are extended, we can again measure the time needed for that. As with the other timing columns, time is only measured when track_io_timing is turned on.

In general, I recommend turning on track_io_timing as you will otherwise not have the information you need to track down performance problems.

This is not only true for pg_stat_io but also for many other PostgreSQL statistics.
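
A minimal sketch of how this could look in practice, assuming you have superuser privileges and can accept the overhead of timing calls on your platform (the pg_test_timing tool can help to judge that). The aliases avg_read_ms and avg_write_ms are only illustrative names.

```sql
-- Check the current setting.
SHOW track_io_timing;

-- Turn it on for the whole instance and reload the configuration.
-- A restart is not required for this parameter.
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();

-- Average time per read and per write in milliseconds.
-- NULLIF avoids a division by zero for rows without any operations.
SELECT backend_type,
       object,
       context,
       round((read_time  / NULLIF(reads, 0))::numeric, 3)  AS avg_read_ms,
       round((write_time / NULLIF(writes, 0))::numeric, 3) AS avg_write_ms
FROM   pg_stat_io
WHERE  reads > 0 OR writes > 0;
```

The timing columns only cover I/O that happened while track_io_timing was on, so averages taken right after enabling it can be misleading until the statistics have been reset or enough new activity has accumulated.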

`op_bytes`: I/O chunking

On a modern system, data is not sent to disk (or read from disk) byte by byte. Instead, I/O happens in chunks. Usually the size of a single I/O chunk is 8k, but it might be different on some systems.

`hits`: Monitoring the cache

When I/O happens through the PostgreSQL cache (shared_buffers), we can face cache hits or cache misses. This value tells us how often a requested block was already in the cache, so that PostgreSQL did not have to ask the operating system for I/O.
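
One way to turn hits and reads into a cache hit ratio is sketched below. It is restricted to permanent relations in the “normal” context, and hit_pct is just an illustrative alias. For the client backend sample shown later in this post (7291 hits, 180 reads), the result would be about 97.6 percent.

```sql
-- Share of block requests served from shared_buffers, per backend type.
-- NULLIF avoids a division by zero for backends without any activity.
SELECT   backend_type,
         sum(hits)  AS hits,
         sum(reads) AS reads,
         round(100.0 * sum(hits) / NULLIF(sum(hits) + sum(reads), 0), 2) AS hit_pct
FROM     pg_stat_io
WHERE    object = 'relation'
AND      context = 'normal'
GROUP BY backend_type
ORDER BY backend_type;
```

Keep in mind that a read counted here may still have been served from the file system cache of the operating system, so a miss in shared_buffers is not necessarily a physical disk read.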

`evictions`: Cleaning shared buffers

When blocks are removed from the PostgreSQL cache to make space for new ones, they are counted as evictions. This value gives us an idea of how often blocks had to leave the cache to make room for others.

`reuses`: Reusing I/O

This column might be confusing for some people. The documentation says: “The number of times an existing buffer in a size-limited ring buffer outside of shared buffers was reused as part of an I/O operation in the bulkread, bulkwrite, or vacuum contexts.” Now, this explanation might be pretty obscure to most people out there. What does it actually mean? Not all operations use the PostgreSQL I/O machinery. Suppose we sequentially scan a 10 TB table. As stated before, this scan will bypass the shared buffers entirely. Instead, a little ring buffer outside shared buffers is used to keep some blocks around. If one of these is reused, it will be counted in this column.
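
To look at ring buffer activity in isolation, a query along the following lines can be used. It is a sketch that simply filters on the three contexts named in the documentation quote.

```sql
-- Ring buffer activity in the contexts that use a size-limited ring buffer.
SELECT backend_type,
       object,
       context,
       reads,
       writes,
       reuses,
       evictions
FROM   pg_stat_io
WHERE  context IN ('bulkread', 'bulkwrite', 'vacuum')
AND    (reuses > 0 OR evictions > 0)
ORDER  BY reuses DESC;
```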

`fsyncs` and `fsync_time`: Flushing to disk

When data is written to disk we have to distinguish between two cases: Either we want to rely on the fact that data is on disk or we don’t care much. The question which naturally arises is: Hey? This is a database system! Why would anybody not care if data is written to disk or not? Well, consider sorting on disk. If the data we want to sort is lost, we have to restart the sort process after a reboot anyway. Or: If we die in the middle of a transaction, the data should be gone anyway. What this means is that if we want data to survive no matter what, we sync it to disk using a flush. Otherwise we do not. This column will count the number of flushes and the time we needed to flush.
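
The following sketch shows which backend types issue flushes and how long a flush takes on average. The alias avg_fsync_ms is made up for this example, and fsync_time only contains useful values if track_io_timing is on.

```sql
-- Number of flushes and average flush duration in milliseconds.
SELECT backend_type,
       object,
       context,
       fsyncs,
       fsync_time,
       round((fsync_time / NULLIF(fsyncs, 0))::numeric, 3) AS avg_fsync_ms
FROM   pg_stat_io
WHERE  fsyncs > 0
ORDER  BY fsyncs DESC;
```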

`stats_reset`: Reset time

Finally, PostgreSQL tells us when the statistics were last reset, which is the point in time since when the counters have been accumulating.
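
If you want to start measuring from a clean slate, for example before a benchmark, the counters of pg_stat_io can be reset. This is a short sketch; by default the reset function needs superuser privileges.

```sql
-- When were the I/O statistics last reset?
SELECT DISTINCT stats_reset
FROM   pg_stat_io;

-- Reset all counters shown in pg_stat_io.
SELECT pg_stat_reset_shared('io');
```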

`pg_stat_io`: Inspecting the data

After this rather theoretical introduction to the topic, we can take a look at some sample data.

Here is some autovacuum data:

blog=# SELECT * FROM pg_stat_io WHERE backend_type = 'autovacuum worker';
-[ RECORD 1 ]--+------------------------------
backend_type   | autovacuum worker
object         | relation
context        | bulkread
reads          | 0
read_time      | 0
writes         | 0
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        | 
extend_time    | 
op_bytes       | 8192
hits           | 0
evictions      | 0
reuses         | 0
fsyncs         | 
fsync_time     | 
stats_reset    | 2023-08-24 11:03:59.786026+02
-[ RECORD 2 ]--+------------------------------
backend_type   | autovacuum worker
object         | relation
context        | normal
reads          | 29
read_time      | 0
writes         | 0
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        | 24
extend_time    | 0
op_bytes       | 8192
hits           | 56682
evictions      | 0
reuses         | 
fsyncs         | 0
fsync_time     | 0
stats_reset    | 2023-08-24 11:03:59.786026+02
-[ RECORD 3 ]--+------------------------------
backend_type   | autovacuum worker
object         | relation
context        | vacuum
reads          | 70838
read_time      | 0
writes         | 42200
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        | 0
extend_time    | 0
op_bytes       | 8192
hits           | 3654
evictions      | 0
reuses         | 70799
fsyncs         | 
fsync_time     | 
stats_reset    | 2023-08-24 11:03:59.786026+02

For each context, you'll see one row. Most of the I/O happened in the “vacuum” context (which is no surprise if you happen to be an autovacuum worker process).

What happens within a normal database connection (= “client backend”)? Here is an example of such a pg_stat_io query:

blog=# SELECT * FROM pg_stat_io WHERE backend_type = 'client backend';
-[ RECORD 1 ]--+------------------------------
backend_type   | client backend
object         | relation
context        | bulkread
reads          | 894
read_time      | 0
writes         | 0
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        | 
extend_time    | 
op_bytes       | 8192
hits           | 14
evictions      | 0
reuses         | 131
fsyncs         | 
fsync_time     | 
stats_reset    | 2023-08-24 11:03:59.786026+02
-[ RECORD 2 ]--+------------------------------
backend_type   | client backend
object         | relation
context        | bulkwrite
reads          | 0
read_time      | 0
writes         | 42200
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        | 44248
extend_time    | 0
op_bytes       | 8192
hits           | 0
evictions      | 0
reuses         | 42200
fsyncs         | 
fsync_time     | 
stats_reset    | 2023-08-24 11:03:59.786026+02
-[ RECORD 3 ]--+------------------------------
backend_type   | client backend
object         | relation
context        | normal
reads          | 180
read_time      | 0
writes         | 0
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        | 0
extend_time    | 0
op_bytes       | 8192
hits           | 7291
evictions      | 0
reuses         | 
fsyncs         | 0
fsync_time     | 0
stats_reset    | 2023-08-24 11:03:59.786026+02
-[ RECORD 4 ]--+------------------------------
backend_type   | client backend
object         | relation
context        | vacuum
reads          | 2
read_time      | 0
writes         | 0
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        | 0
extend_time    | 0
op_bytes       | 8192
hits           | 179
evictions      | 0
reuses         | 0
fsyncs         | 
fsync_time     | 
stats_reset    | 2023-08-24 11:03:59.786026+02
-[ RECORD 5 ]--+------------------------------
backend_type   | client backend
object         | temp relation
context        | normal
reads          | 0
read_time      | 0
writes         | 0
write_time     | 0
writebacks     | 
writeback_time | 
extends        | 0
extend_time    | 0
op_bytes       | 8192
hits           | 0
evictions      | 0
reuses         | 
fsyncs         | 
fsync_time     | 
stats_reset    | 2023-08-24 11:03:59.786026+02

The write I/O happened during the creation of the table. Other than that, the table wasn't used. How did we figure that out? Well, all of the write I/O happened in the bulkwrite context and not during normal operations. This tells us that the table wasn't written to at all since the time of its creation.
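
On a busy system, many rows of the view contain nothing but zeros. A compact overview that hides those rows could look like the sketch below; the choice of columns is merely a suggestion.

```sql
-- Show only those rows of pg_stat_io which have seen any activity.
-- coalesce() treats counters that do not apply (NULL) as zero.
SELECT backend_type,
       object,
       context,
       reads,
       writes,
       extends,
       hits
FROM   pg_stat_io
WHERE  coalesce(reads, 0) + coalesce(writes, 0)
     + coalesce(extends, 0) + coalesce(hits, 0) > 0
ORDER  BY backend_type, object, context;
```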

Finally …

Inspecting I/O is important; it gives you valuable information about what is really going on inside your database engine. Another tool that improves performance in data warehousing is synchronized sequential scans. Find out about them in my blog.
Or learn more about the basics of spotting and managing performance problems in this blog about how to quickly detect slow queries.


Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

pg_stat_io and PostgreSQL 16 performance

pg_stat_io: Debugging I/O in PostgreSQL

backend_type: The source of information

object: Type of storage

context: How did we do I/O?

reads and writes: Number of reads and writes

read_time and write_time: Measuring I/O time

writebacks: Sending data from the kernel to the OS

writeback_time: Measuring writeback time

extends: Growing PostgreSQL data files

extend_time: Time need to grow files

op_bytes: I/O chunking

hits: Monitoring the cache

evictions: Cleaning shared buffers

reuses: Reusing I/O

fsyncs and fsync_time: Flushing to disk

stats_reset: Reset time

pg_stat_io: Inspecting the data

Finally …

One response to “pg_stat_io and PostgreSQL 16 performance”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

`pg_stat_io`: Debugging I/O in PostgreSQL

`backend_type`: The source of information

`object`: Type of storage

`context`: How did we do I/O?

`reads` and `writes`: Number of reads and writes

`read_time` and `write_time`: Measuring I/O time

`writebacks`: Sending data from the kernel to the OS

`writeback_time`: Measuring writeback time

`extends`: Growing PostgreSQL data files

`extend_time`: Time need to grow files

`op_bytes`: I/O chunking

`evictions`: Cleaning shared buffers

`reuses`: Reusing I/O

`fsyncs` and `fsync_time`: Flushing to disk

`stats_reset`: Reset time

`pg_stat_io`: Inspecting the data