What is fillfactor and how does it affect PostgreSQL performance?

07.2020 | Category: Performance | Tags: performance

By Kaarel Moppel

Recently I was asked if there’s a rule of thumb / best practice for setting the fillfactor in Postgres - and the only answer I could give was to decrease it “a bit” if you do lots and lots of updates on some table. Good advice? Well, it could be better - this kind of coarse quantization leaves a lot of room for interpretation and possible adverse effects. So to have a nicer answer ready for the next time, I thought it would be nice to get some real approximate numbers. Time to conjure up some kind of a test! If you already know what fillfactor does, then feel free to skip to the bottom sections. There you'll find some numbers and a rough recommendation principle.

What is fillfactor?

But first a bit of theory for the newcomers - so what does fillfactor do and how do you configure it? Simon says:

The fillfactor for a table is a percentage between 10 and 100. 100 (complete packing) is the default. When you specify a smaller fillfactor, INSERT operations pack table pages only to the indicated percentage. The remaining space on each page is reserved for updating rows on that page. This gives UPDATE a chance to place the updated copy of a row on the same page as the original, which is more efficient than placing it on a different page. For a table whose entries are never updated, complete packing is the best choice. However, in heavily updated tables smaller fillfactors are appropriate. This parameter cannot be set for TOAST tables.

Per table parameter

In short, it’s a per table (or index!) parameter that directs Postgres to initially leave some extra disk space on data pages unused. Later UPDATE-s could use it, touching ultimately only one data block, thus speeding up such operations. And besides normal updates it also potentially (depending on if updated columns are indexed or not) enables another special, and even more beneficial, type of update, known as HOT updates. The acronym means Heap Only Tuples, i.e. indexes are not touched.
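To see whether updates on a table are actually taking the HOT path, the cumulative statistics views can be queried. This is a minimal sketch, assuming a table named pgbench_accounts from the default pgbench schema; the hot_update_pct alias is just an illustrative name.

```sql
-- Share of updates that were HOT (heap-only) since the statistics were last reset.
SELECT relname,
       n_tup_upd,                                   -- all row updates
       n_tup_hot_upd,                               -- updates that did not touch any index
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables
WHERE relname = 'pgbench_accounts';
```

A low percentage on a heavily updated table, where the updated columns are not indexed, hints that pages have no free space left for the new row versions.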

There's quite a simple base concept: how densely should we initially “pack” the rows? (Many other database systems also offer something similar.) Not much more to it - but how to set it? Sadly (or probably luckily) there’s no global parameter for that and we need to change it “per table”. Via some SQL, like this:

-- leave 10% of block space unused when inserting data
ALTER TABLE pgbench_accounts SET (fillfactor = 90);
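A few follow-up statements that may be useful next to the one above. This is only a sketch, assuming the default pgbench schema (pgbench_accounts_pkey is the primary key index name that pgbench creates) and an illustrative index fillfactor of 80. Keep in mind that changing fillfactor only affects pages written afterwards; existing pages are repacked only when the table is rewritten.

```sql
-- Check which storage parameters are currently set (NULL means all defaults).
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'pgbench_accounts';

-- Optionally rewrite the table so that existing pages honor the new fillfactor.
-- Caution: VACUUM FULL holds an ACCESS EXCLUSIVE lock while it runs.
VACUUM FULL pgbench_accounts;

-- Indexes have their own fillfactor (the B-tree default is 90).
-- A REINDEX is needed before existing index pages reflect the change.
ALTER INDEX pgbench_accounts_pkey SET (fillfactor = 80);
REINDEX INDEX pgbench_accounts_pkey;
```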

Effects of fillfactor

As the documentation mentions, for heavily updated tables we can gain on transaction performance by reducing the fillfactor (FF). But in what range should we adjust it, and by how much? The documentation doesn’t take a risk with any numbers here. Based on my personal experience, small transaction performance improvements can usually be observed after slight FF reductions. Not immediately, but over some longer period of time (days, weeks). And as I haven’t witnessed any cases where it severely harms performance, you can certainly try such FF experiments for busy tables. But in order to get some ballpark numbers, I guess the only way is to set up a test...

Test setup

As per usual, I modified some test scripts I had lying around that use the default pgbench schema and transactions, which should embody a typical simple OLTP transaction with lots of UPDATE-s...so exactly what we want. The most important parameter (relative to hardware, especially memory) for pgbench transaction performance is the “scale factor”, so here I chose different values covering 3 cases - the initial active data set almost fits into RAM, half of it fits, and only a fraction (ca 10%) fits. Tested fillfactor values were 70, 80, 90 and 100.

Test host: 4 CPU i5-6600, 16GB RAM, SATA SSD

PostgreSQL: v12.2, all defaults except: shared_buffers set to 25% of RAM, i.e. 4GB, checkpoint_completion_target = 0.9, track_io_timing = on, wal_compression = on, shared_preload_libraries = 'pg_stat_statements'
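The post does not say how these settings were applied. One possible way, sketched here with ALTER SYSTEM instead of editing postgresql.conf by hand, would be:

```sql
-- Non-default settings from the test setup, written to postgresql.auto.conf.
ALTER SYSTEM SET shared_buffers = '4GB';              -- 25% of 16GB RAM; needs a restart
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET track_io_timing = on;
ALTER SYSTEM SET wal_compression = on;
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';  -- needs a restart

-- After the restart, make the statistics view available in the benchmark database.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```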

Pgbench: scales 1000/2500/10000, 12h runtime for each scale / FF combination, 2 concurrent sessions.

Query latencies were measured directly in the database using the pg_stat_statements extension, so they should be accurate.
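A query along the following lines could be used to read such latencies. It is a sketch rather than the exact query behind the numbers in this post, and it assumes the extension has been created in the benchmark database.

```sql
-- Mean latency (ms) of the main-table UPDATE as recorded by pg_stat_statements.
-- On PostgreSQL 12 the column is mean_time; from v13 onwards it is mean_exec_time.
SELECT query,
       calls,
       round(mean_time::numeric, 2) AS mean_ms
FROM pg_stat_statements
WHERE query LIKE 'UPDATE pgbench_accounts%'
ORDER BY calls DESC;
```

Calling pg_stat_statements_reset() before each run keeps the numbers of different scale / FF combinations from mixing.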

By the way, if you wonder why I’m not using the latest Postgres version v12.3 for my testing (which is normally the way to go) - all these combinations took a week to run through and although a new release appeared during that time I thought it’s not worth it as I didn’t see anything relevant from the release notes.

Test results

Performance has many aspects, and even for a somewhat simplistic pgbench test we could measure many things - probably most important for us in this fillfactor context are the frequent updates on our biggest table, which we want to optimize. But let’s also not forget about the effect of our fillfactor changes on the Postgres background processes, worse caching rates etc., so for completeness let’s also look at the total Transactions per Second (TPS) numbers. Remember - 1 pgbench default transaction includes 3 updates (2 mini-tables + 1 main), 1 insert into the write-only history table + 1 select from the main table by PK.
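For reference, the built-in “TPC-B (sort of)” script from the pgbench documentation looks roughly like this. The lines starting with a backslash are pgbench meta-commands, not plain SQL, so the script is meant to be run by pgbench and not pasted into psql.

```sql
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
-- update on the main (biggest) table: the statement whose latency we track
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
-- select from the main table by primary key
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
-- updates on the two mini-tables
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
-- insert into the write-only history table
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
    VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
```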

So here are the effects on the pgbench_accounts UPDATE mean time, as measured via pg_stat_statements, in milliseconds, with global TPS in parentheses:

Data scale	FF=100	FF=90	FF=80	FF=70
Mem.	5.01 (374)	4.80 (389)	4.89 (382)	4.98 (376)
2x Mem.	5.31 (363)	5.33 (359)	5.38 (357)	5.42 (353)
10x Mem.	6.45 (249)	5.67 (282)	5.61 (284)	5.72 (279)

Deriving a rule of thumb

What can we learn from the test data? Although it seems that there was some slight randomness in the tests (as 2x Mem test actually made things minimally slower), on the whole it seems that decreasing FF a bit also improves performance “a bit”! My general hunch has something to it, even 🙂 On average a 10% boost, when decreasing FF by 10 or 20%. It's not game changing, but it could make a visible difference.
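To put percentages on the UPDATE latencies, the result table can be fed back into Postgres. This is a small sketch; the CTE and column names are made up for illustration, and the values are copied from the table of results.

```sql
-- Relative change of the UPDATE mean time versus FF=100 (positive = faster).
WITH results(data_scale, ff100, ff90, ff80, ff70) AS (
    VALUES ('Mem.',     5.01, 4.80, 4.89, 4.98),
           ('2x Mem.',  5.31, 5.33, 5.38, 5.42),
           ('10x Mem.', 6.45, 5.67, 5.61, 5.72)
)
SELECT data_scale,
       round(100 * (ff100 - ff90) / ff100, 1) AS ff90_gain_pct,
       round(100 * (ff100 - ff80) / ff100, 1) AS ff80_gain_pct,
       round(100 * (ff100 - ff70) / ff100, 1) AS ff70_gain_pct
FROM results;
```

The double-digit gains (roughly 11 to 13%) show up only at the 10x Mem. scale; at the Mem. scale they stay below 5%, and at 2x Mem. they are slightly negative.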

And the second learning - don’t overdo it! FF 70% gave slower updates than FF 80% or 90% at every scale factor, so reducing it further takes back part of the gain instead of adding to it.

My try at a rule of thumb - when your active / hot data set is a lot bigger than the amount of RAM, going with fillfactor 90% seems to be a good idea. Don’t forget - there are still no free lunches out there. When optimizing for fast updates on table X, we pay some of the winnings back with global TPS numbers. Our tables will be somewhat bigger, we lose a bit on cache hit ratios, and also background workers like the autovacuum daemon have more scanning to do. So for smaller data vs RAM ratios, the benefits are not particularly visible. You might even lose slightly in global TPS numbers.

Some more thoughts - I’m pretty sure that if I had tuned autovacuum to be more aggressive, we would have seen some more performance improvement. In spite of decreasing the FF, the pgbench_accounts table was still growing considerably during the testing (~35% at biggest scale) even at FF 70%. And the same for old spinning disks - the potential for additional winnings with fillfactor tuning is even bigger there compared to SSD-s, because fillfactor helps to reduce exactly the Achilles heel of rotating disks - the costly random access.
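What “more aggressive” could look like for a single table is sketched below. The values are illustrative assumptions only, not the configuration used in these tests, and they should be validated against your own workload.

```sql
-- Illustrative per-table autovacuum settings (assumed values, not the tested config).
ALTER TABLE pgbench_accounts SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum once ~1% of rows are dead (default 0.2)
    autovacuum_vacuum_cost_delay   = 0      -- do not throttle autovacuum on this table
);

-- Keep an eye on table growth and vacuum activity during a run.
SELECT pg_size_pretty(pg_table_size('pgbench_accounts')) AS table_size,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'pgbench_accounts';
```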

8 responses to “What is fillfactor and how does it affect PostgreSQL performance?”

Sebastian Dressler says:

November 3, 2020 at 4:07 pm

Hi Kaarel, thanks a lot for this! Could you maybe share your autovacuum config and what you specifically mean by "more aggressive" in that context?

- laurenz says:
  
  November 4, 2020 at 8:19 am
  
  You may want to read my blog for a discussion of that.
  
Reka putra says:

December 14, 2020 at 5:28 am

Hi Kaarel, does fillfactor affect how much data each page holds in the shared_buffers cache? That is, whether the table is bloated or not, does the empty space also end up in the cache?

- laurenz says:
  
  December 14, 2020 at 8:04 am
  
  Yes, setting a fillfactor of less than 100 will bloat your table, and that bloat will also affect cache efficiency, since you will cache more empty space.
  That is the price you are paying for more HOT updates.
  
  - Reka putra says:
    
    December 15, 2020 at 12:42 am
    
    OK then, it's the same as fillfactor in other RDBMSs for a table with fillfactor specified. The empty space is only restored after a table reorganization / rebuild / vacuum in Postgres; after many DML operations the pages will be full again until the next vacuum.
    Great article with detailed explanation and research. It helped me understand Postgres internals and saved me time researching fillfactor.
    Thanks.
    Reka
    
Sulivan Oliveira says:

May 3, 2021 at 11:58 pm

Hi Kaarel, thank you very much for this post!

Dave Myron says:

August 17, 2022 at 5:10 pm

If the updates-to-inserts ratio for a single table is 3:1, would that suggest that a fillfactor=25 be a good idea?

- laurenz says:
  
  August 18, 2022 at 8:33 am
  
  No, that is way too low.
  Unless you are bulk updating major parts of the table, it is probably good enough to have space for a few more rows in the block.
  Depending on the size of your rows, a fillfactor between 70 and 90 is usually good.
  
