Binary data performance in PostgreSQL

| Storage method | 350 MB data | 4.5 MB data |
|----------------|-------------|-------------|
| file system    | 46 ms       | 1 ms        |
| Large Object   | 950 ms      | 8 ms        |
| `bytea`        | 590 ms      | 6 ms        |
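
To put the table in relative terms, the following sketch divides each database timing by the file system timing for the same data size. The dictionary layout and the `slowdown` helper are names chosen for this illustration and are not part of the benchmark code; the numbers are copied from the table. Because the small-file timings are whole milliseconds, the ratios in that column are only rough.

```python
# Timings copied from the table above, in milliseconds.
timings_ms = {
    "file system": {"350 MB": 46, "4.5 MB": 1},
    "Large Object": {"350 MB": 950, "4.5 MB": 8},
    "bytea": {"350 MB": 590, "4.5 MB": 6},
}


def slowdown(method: str, size: str) -> float:
    """Return how many times slower `method` is than the file system."""
    return timings_ms[method][size] / timings_ms["file system"][size]


for method in ("Large Object", "bytea"):
    for size in ("350 MB", "4.5 MB"):
        print(f"{method:12s} {size:>7s}: {slowdown(method, size):.1f}x")
```

For the 350 MB case this works out to roughly 21 times slower for Large Objects and roughly 13 times slower for `bytea`.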

15 responses to “Binary data performance in PostgreSQL”

Pavel Luzanov says:

May 14, 2020 at 9:52 am

Thank you for the article, but one question remains.
What if we change the storage strategy of the pg_largeobject.data column to EXTERNAL?

Jürgen Strobel says:

May 14, 2020 at 1:59 pm

Streaming writes to bytea (from a client perspective) are theoretically
possible using UPDATE and concatenation, though I guess with a lot of write
amplification.

- laurenz says:
  
  May 14, 2020 at 2:15 pm
  
  From a client perspective it looks that way, but really it isn't. If you run
  UPDATE tab SET bincol = bincol || '\xdeadbeef';
  the complete value will be read from and written to storage, and you may run out of memory on the server.
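
A back-of-the-envelope sketch of the write amplification this implies. It assumes, as stated above, that every append rewrites the complete value, and it ignores TOAST chunking, WAL, and dead row versions, so real numbers will differ. The function name, the 1 MB chunk size, and the 350-chunk total are illustrative assumptions.

```python
def bytes_written_by_appending(chunk_size: int, chunks: int) -> int:
    """Total bytes written if each append rewrites the whole value."""
    total_written = 0
    current_size = 0
    for _ in range(chunks):
        current_size += chunk_size     # value grows by one chunk
        total_written += current_size  # and is written out in full again
    return total_written


chunk = 1024 * 1024  # assumed append size: 1 MB
n = 350              # assumed number of appends (350 MB in total)

total = bytes_written_by_appending(chunk, n)
payload = chunk * n
amplification = total / payload
print(f"payload {payload} bytes, written {total} bytes, factor {amplification}")
```

Under these assumptions the amount written grows quadratically with the number of appends: the factor is (n + 1) / 2, which is 175.5 for 350 appends.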
  
Pavel Stěhule says:

May 14, 2020 at 7:25 pm

Maybe LO is slow because native protocol support is not used. Using lo_export should probably be much faster.

- laurenz says:
  
  May 15, 2020 at 5:30 am
  
  I am not sure what you mean by the native protocol.
  The JDBC driver uses the fast-path API to call the large object server functions.
  
  - Pavel Stěhule says:
    
    May 15, 2020 at 6:36 am
    
    libpq provides the lo_import function: https://www.postgresql.org/docs/9.0/lo-interfaces.html. But the JDBC driver probably doesn't use libpq. Another question is the impact of the Java implementation. Can you check the performance of lo_import from psql? Looking at your examples, I don't see many reasons why LO should be so much slower than bytea.
    
    There is another argument for LO: importing and exporting a LO typically needs significantly less RAM (client side and server side) than bytea. I don't know what the PHP limits are today, but 15 years ago this was an important factor.
    
xedsdsss says:

May 18, 2020 at 1:30 pm

dag, i luv this article on all things BLOB in PG

Pól Ua Laoínecháin says:

August 31, 2021 at 4:58 am

Hi,

You say above:

The performance of reading a file directly from the file system must be better. After all, the database is also stored on files, and there must be a certain overhead.

However, there is this from SQLite:

SQLite reads and writes small blobs (for example, thumbnail images) 35% faster¹ than the same blobs can be read from or written to individual files on disk using fread() or fwrite().

Now, I know that these are different systems and that tipping points (if any) may be (very) different, but your statement may not hold globally?

Has this been benchmarked at all? Would be an interesting one?

Pól...

Shiyao Jin says:

September 26, 2021 at 10:58 pm

Really good article!
One question: if I change the TOAST strategy from EXTENDED to EXTERNAL on an existing column in an existing table by using "ALTER TABLE bins ALTER COLUMN data SET STORAGE EXTERNAL", what will happen to the existing data in that column? Will the existing data be decompressed and then stored out of line?

- laurenz says:
  
  September 27, 2021 at 8:01 am
  
  No, existing data won't be affected.
  But if they are compressed binary data, PostgreSQL won't have compressed them anyway (it will have attempted compression and then realized that the data grew, as described in my article).
  
  - Shiyao Jin says:
    
    September 27, 2021 at 6:34 pm
    
    Thank you for your prompt response.
    The tricky part in my case is that my data are binary (from binary files) but not compressed. I'm not sure whether PostgreSQL has compressed them, or only some of them: its compression algorithm might shrink the binary data in some rows but not in others, so some rows may hold compressed data while others do not. I'm concerned that if I change to "EXTERNAL", there will be compatibility issues, e.g. PostgreSQL might be unable to read the existing compressed data.
    
    - laurenz says:
      
      September 28, 2021 at 5:28 am
      
      I see.
      You don't have to worry about this, because each stored value carries a flag in its header that says whether the data are compressed or not. So regardless of the column setting, PostgreSQL will decompress data that were stored compressed.
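
A toy model of why mixed rows stay readable. It is not PostgreSQL's actual storage format: the one-byte flags and the `store` and `load` helpers are hypothetical, and `zlib` stands in for PostgreSQL's own compression. The point it illustrates is that compression is kept only when it shrinks the value, and that reading relies on the per-value flag rather than on the current column setting.

```python
import random
import zlib

RAW, COMPRESSED = b"\x00", b"\x01"  # hypothetical one-byte per-value flags


def store(value: bytes, allow_compression: bool = True) -> bytes:
    """Store a value, compressing only if allowed and only if it shrinks."""
    if allow_compression:
        packed = zlib.compress(value)
        if len(packed) < len(value):
            return COMPRESSED + packed
    return RAW + value


def load(stored: bytes) -> bytes:
    """Read a value back using nothing but its own flag."""
    flag, body = stored[:1], stored[1:]
    return zlib.decompress(body) if flag == COMPRESSED else body


repetitive = b"PostgreSQL " * 10_000  # compresses well
rng = random.Random(42)
noisy = bytes(rng.getrandbits(8) for _ in range(100_000))  # does not shrink

# Rows written while compression was allowed (think EXTENDED) ...
old_rows = [store(repetitive), store(noisy)]
# ... and a row written after compression was switched off (think EXTERNAL).
new_rows = [store(repetitive, allow_compression=False)]

# Every row can be read back, whichever way it was written.
for row, original in zip(old_rows + new_rows, [repetitive, noisy, repetitive]):
    assert load(row) == original
```

In this model the first old row is stored compressed, the second is stored raw because compressing it would not have made it smaller, and both remain readable after the setting changes.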
      
      - Shiyao Jin says:
        
        September 29, 2021 at 12:05 am
        
        Thank you for your answer. Really appreciate it
Damir Reic says:

August 5, 2022 at 5:13 pm

Hi! Maybe a thing or two to mention about the bytea type. I have a 10 GB database containing a 7.5 GB table with a bytea column. A backup takes 12 hours, and when I was troubleshooting the slowness I found that pg_dump generated 240 GB of network traffic while the backup speed was ~45 Mbit on average. Further troubleshooting showed that the bytea column in that big table was the reason the backup takes so long. Unfortunately, I still haven't found a fix, but based on my research I should migrate the data to a hex field type. Any input on that?

- laurenz says:
  
  August 16, 2022 at 7:04 am
  
  You should definitely use bytea and store the data as they are.
  I do not have a ready explanation for the slowness or the amount of traffic; this would need further investigation.
  Perhaps pg_dumpbinary can improve the situation for you.
  

Binary data performance in PostgreSQL

Alternatives for storing binary data

Storing the data outside the database

Storing the data in Large Objects

Storing the data as `bytea`

Important TOAST considerations

Benchmarking the different approaches

Code to read binary data from the file system

Code to read binary data from a Large Object

Code to read binary data from a `bytea`

Benchmark results for reading binary data

Summary

Binary data in the file system:

Binary data as Large Objects:

Binary data as `bytea`:

Laurenz Albe
