© Laurenz Albe 2020
A frequently asked question in this big data world is whether it is better to store binary data inside or outside of a PostgreSQL database. Also, since PostgreSQL has two ways of storing binary data, which one is better?
I decided to benchmark the available options to have some data points next time somebody asks me, and I thought this might be interesting for others as well.
Alternatives for storing binary data
Storing the data outside the database
For that, you store the binary data in files outside the database and store the path of the file in the database.
The obvious downside is that consistency is not guaranteed that way. With all data inside the database, PostgreSQL can guarantee the atomicity of transactions. So it will make the architecture and the implementation somewhat more complicated to store the data outside the database.
One approach to consistency is to always add a file before storing its metadata in the database. When a file’s metadata are deleted, you simply leave the file on the file system. This way, there can be no path in the database that does not exist in the file system. You can clean up the filesystem with offline reorganization runs that get rid of orphaned files.
There are two big advantages with this approach:
- It keeps the database small, which makes maintenance easier. For example, it is easier to perform an incremental backup of a file system than of a database.
- The performance of reading a file directly from the file system must be better. After all, the database is also stored on files, and there must be a certain overhead.
Storing the data in Large Objects
PostgreSQL Large Objects are the “old way” of storing binary data in PostgreSQL. The system assigns an oid
(a 4-byte unsigned integer) to the Large Object, splits it up in chunks of 2kB and stores it in the pg_largeobject
catalog table.
You refer to the Large Object by its oid
, but there is no dependency between an oid
stored in a table and the associated Large Object. If you delete the table row, you have to delete the Large Object explicitly (or use a trigger).
Large Objects are cumbersome, because the code using them has to use a special Large Object API. The SQL standard does not cover that, and not all client APIs have support for it.
There are two advantages of Large Objects:
- you can store arbitrary large data with Large Objects
- the Large Object API has support for streaming, that is, reading and writing Large Objects in chunks
Storing the data as bytea
bytea
(short for “byte array”) is the “new way” is storing binary data in PostgreSQL. It uses TOAST (The Oversized-Attribute Storage Technique, proudly called “the best thing since sliced bread” by the PostgreSQL community) to transparently store data out of line.
A bytea
is stored directly in the database table and vanishes when you delete the table row. No special maintenance is necessary.
The main disadvantages of bytea
are:
- like all TOASTed data types, there is an absolute length limit of 1GB
- when you read or write a
bytea
, all data have to be stored in memory (no streaming support)
Important TOAST considerations
If you choose bytea
, you should be aware of how TOAST works:
- for a new table row that would exceed 2000 bytes, variable length data types are compressed, if possible
- if the compressed data would still exceed 2000 bytes, PostgreSQL splits variable length data types in chunks and stores them out of line in a special “TOAST table”
Now for already compressed data, the first step is unnecessary and even harmful. After compressing the data, PostgreSQL will realize that the compressed data have actually grown (because PostgreSQL uses a fast compression algorithm) and discard them. That is an unnecessary waste of CPU time.
Moreover, if you retrieve only of a substring of a TOASTed value, PostgreSQL still has to retrieve all chunks that are required to decompress the value.
Fortunately, PostgreSQL allows you to specify how TOAST should handle a column. The default EXTENDED
storage type works as described above. If we choose EXTERNAL
instead, values will be stored out of line, but not compressed. This saves CPU time. It also allows operations that need only a substring of the data to access only those chunks that contain the actual data.
So you should always change the storage type for compressed binary data to EXTERNAL
. This also allows us to implement streaming, at least for read operations, using the substr
function (see below).
The bytea
table that I use in this benchmark is defined like
CREATE TABLE bins ( id bigint PRIMARY KEY, data bytea NOT NULL ); ALTER TABLE bins ALTER COLUMN data SET STORAGE EXTERNAL;
Benchmarking the different approaches
I chose to write my little benchmark in Java, which is frequently used for application code. I wrote an interface for the code that reads the binary data, so that it is easy to test the different implementations with the same code. This also makes it easier to compare the implementations:
import java.io.EOFException; import java.io.IOException; import java.sql.SQLException; public interface LOBStreamer { public final static int CHUNK_SIZE = 1048576; public int getNextBytes(byte[] buf) throws EOFException, IOException, SQLException; public void close() throws IOException, SQLException; }
CHUNK_SIZE
is the unit in which the data will be read.
Code to read binary data from the file system
In the constructor, the database is queried to get the path of the file. That file is opened for reading; the chunks are read in getNextBytes
.
import java.io.IOException; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.sql.SQLException; import java.io.EOFException; import java.io.File; import java.io.FileInputStream; public class FileStreamer implements LOBStreamer { private FileInputStream file; public FileStreamer(java.sql.Connection conn, long objectID) throws IOException, SQLException { PreparedStatement stmt = conn.prepareStatement( "SELECT path FROM lobs WHERE id = ?"); stmt.setLong(1, objectID); ResultSet rs = stmt.executeQuery(); rs.next(); String path = rs.getString(1); this.file = new FileInputStream(new File(path)); rs.close(); stmt.close(); } @Override public int getNextBytes(byte[] buf) throws EOFException, IOException { int result = file.read(buf); if (result == -1) throw new EOFException(); return result; } @Override public void close() throws IOException { file.close(); } }
Code to read binary data from a Large Object
The Large Object is opened in the constructor. Note that all read operations must take place in the same database transaction that opened the large object.
Since Large Objects are not covered by the SQL or JDBC standard, we have to use the PostgreSQL-specific extensions of the JDBC driver. That makes the code not portable to other database systems.
However, since Large Objects specifically support streaming, the code is simpler than for the other options.
import java.io.EOFException; import java.io.IOException; import java.sql.Connection; import java.sql.SQLException; import org.postgresql.PGConnection; import org.postgresql.largeobject.LargeObject; import org.postgresql.largeobject.LargeObjectManager; public class LargeObjectStreamer implements LOBStreamer { private LargeObject lob; public LargeObjectStreamer(Connection conn, long objectID) throws SQLException { PGConnection pgconn = conn.unwrap(PGConnection.class); this.lob = pgconn.getLargeObjectAPI().open( objectID, LargeObjectManager.READ); } @Override public int getNextBytes(byte[] buf) throws EOFException, SQLException { int result = lob.read(buf, 0, buf.length); if (result == 0) throw new EOFException(); return result; } @Override public void close() throws IOException, SQLException { lob.close(); } }
Code to read binary data from a bytea
The constructor retrieves the length of the value and prepares a statement that fetches chunks of the binary data.
Note that the code is more complicated that in the other examples, because I had to implement streaming myself.
With this approach, I don’t need to read all chunks in a single transaction, but I do so to keep the examples as similar as possible.
import java.io.EOFException; import java.io.IOException; import java.io.InputStream; import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.sql.SQLException; public class ByteaStreamer implements LOBStreamer { private PreparedStatement stmt; private Connection conn; private int position = 1, size; public ByteaStreamer(Connection conn, long objectID) throws SQLException { PreparedStatement len_stmt = conn.prepareStatement( "SELECT length(data) FROM bins WHERE id = ?"); len_stmt.setLong(1, objectID); ResultSet rs = len_stmt.executeQuery(); if (!rs.next()) throw new SQLException("no data found", "P0002"); size = rs.getInt(1); rs.close(); len_stmt.close(); this.conn = conn; this.stmt = conn.prepareStatement( "SELECT substr(data, ?, ?) FROM bins WHERE id = ?"); this.stmt.setLong(3, objectID); } @Override public int getNextBytes(byte[] buf) throws EOFException, IOException, SQLException { int result = (position > size + 1 - buf.length) ? (size - position + 1) : buf.length; if (result == 0) throw new EOFException(); this.stmt.setInt(1, position); this.stmt.setInt(2, result); ResultSet rs = this.stmt.executeQuery(); rs.next(); InputStream is = rs.getBinaryStream(1); is.read(buf); is.close(); rs.close(); position += result; return result; } @Override public void close() throws SQLException { this.stmt.close(); } }
Benchmark results for reading binary data
I performed the benchmark the code on a laptop with and Intel® Core™ i7-8565U CPU and SSD storage. The PostgreSQL version used was 12.2. Data were cached in RAM, so the results don’t reflect disk I/O overhead. The database connection used the loopback interface to reduce the network impact to a minimum.
This code was used to run the test:
Class.forName("org.postgresql.Driver"); Connection conn = DriverManager.getConnection( "jdbc:postgresql:test?user=laurenz&password=..."); // important for large objects conn.setAutoCommit(false); byte[] buf = new byte[LOBStreamer.CHUNK_SIZE]; long start = System.currentTimeMillis(); for (int i = 0; i < LOBStreamTester.ITERATIONS; ++i) { // set LOBStreamer implementation and object ID as appropriate LOBStreamer s = new LargeObjectStreamer(conn, 62409); try { while (true) s.getNextBytes(buf); } catch (EOFException e) { s.close(); } conn.commit(); } System.out.println( "Average duration: " + (double)(System.currentTimeMillis() - start) / LOBStreamTester.ITERATIONS); conn.close();
I ran each test multiple times in a tight loop, both with a large file (350 MB) and a small file (4.5 MB). The files were compressed binary data.
350 MB data | 4.5 MB data | |
---|---|---|
file system | 46 ms | 1 ms |
Large Object | 950 ms | 8 ms |
bytea | 590 ms | 6 ms |
Summary
In my benchmark, retrieving binary objects from the database is roughly ten times slower that reading them from files in a file system. Surprisingly, streaming from a bytea
with EXTERNAL
storage is measurably faster than streaming from a Large Object. Since Large Objects specifically support streaming, I would have expected the opposite.
To sum it up, here are the advantages and disadvantages of each method:
Binary data in the file system:
Advantages:
- by far the fastest method
- small database for ease of maintenance, e.g. backups
Disadvantages:
- more difficult to guarantee consistency
- more complicated architecture
Binary data as Large Objects:
Advantages:
- constistence automatically guaranteed
- API supports streaming
Disadvantages:
- very bad performance
- non-standard API
- requires special maintenance like
DELETE
triggers in the database - database becomes large and unwieldy
Binary data as bytea
:
Advantages:
- consistence automatically guaranteed
- works with standard SQL
Disadvantages:
- bad performance
- writes cannot be streamed (need lots of RAM)
- database becomes large and unwieldy
Storing large binary data in the database is only a good idea on a small scale, when a simple architecture and ease of coding are the prime goals and high performance is not important. Then the increased database size also won’t be a big problem.
Large Objects should only be used if the data exceed 1GB or streaming writes to the database is important.
If you need help with the performance and architecture of your PostgreSQL database, don’t hesitate to ask us.