PostgreSQL vs Redis vs Memcached performance

Test setup

Some characteristics on my test setup:

AMD Ryzen 3600 CPU set to performance mode. My test script is synchronous in nature i.e. basically using a single CPU core, so speed probably matters a lot.

Ubuntu 20.04 desktop as OS, swap disabled.

PostgreSQL v13.3, i.e. the latest.

Default PostgreSQL configuration except “shared_buffers=512MB”, “track_io_timing=on”, “shared_preload_libraries=’pg_stat_statements’”.

A single NVMe SSD both for system and database.

Reading/writing to Postgres was done using prepared statements to get best query performance, avoiding parsing on subsequent calls.

Size of the dataset fully fitted into the PostgreSQL managed cache (shared buffers).

Asynchronous mode was used for writing data, meaning a slight loss of most recent data is possible in case of a server crash/reboot, as a trade-off for more writing (especially in small transactions) performance. But this is consistent with how both Redis and Memcached by default handle their writes. With Redis though, one can also enable AOF persistence to get PostgreSQL default behaviour.

Both reading/writing tests go through all the inserted randomly generated floating-point data keys from 1 to $rows one-by-one. The original test didn’t mention the exact access pattern though.

The key-value table was fetched into Postgres cache before the read test so that there would be no cold-cache effect as with Redis/Memcached.

I did not measure the memory usage of Postgres as this is expected to be very constant for this use case after all data has been inserted and cached.

Execution time is recorded on the application side with calls to the system clock to match the original test. Note that I decided to measure execution time from the server-side also to get a better understanding of the amount of “waste” as after all Python is not really optimal for performance testing for many reasons that I don’t want to delve into here. And jumping ahead a bit - the results were pretty horrifying indeed and one should rather not use Python for benchmarking databases - a lot of CPU time just disappeared somewhere!

Full test script is available here. Should take around 10-15min to run. Basically something like this:

CREATE UNLOGGED TABLE kv_test(key text, value int);
CREATE INDEX ON kv_test (key);

-- pseudo-code from the Python script
for $ROWS in [1000, 10000, 100000, 1000000]:
  truncate kv_test
  generate $ROWS random values
  for $i := 1 .. $ROWS:
    insert $key into kv_test ($rand[i], $rand[i])
  vacuum analyze the table
  for $i := 1 .. $ROWS:
    select * from kv_test where key = $rand[i]

CREATE UNLOGGED TABLE kv_test(key text, value int);

CREATE INDEX ON kv_test (key);

-- pseudo-code from the Python script

for $ROWS in [1000, 10000, 100000, 1000000]:

truncate kv_test

generate $ROWS random values

for $i := 1 .. $ROWS:

insert $key into kv_test ($rand[i], $rand[i])

vacuum analyze the table

for $i := 1 .. $ROWS:

select * from kv_test where key = $rand[i]

Conclusion

In short - the numbers looked surprisingly/suspiciously good for the reading test for Postgres! I can’t imagine how the original test managed to get such high single-digit millisecond results for random key reads across the whole dataset. For my test, I only managed to see 1ms+ worst cases for the biggest rowcount. This data can, by the way, be also looked up on the “pg_stat_statements” snapshots table called “results”.

So sadly, I still cannot possibly get too ecstatic as there was a lot of information missing on the details of exactly how the original tests were performed, so it might have been a bit of an “apples to oranges” situation still in the end, I’m afraid. The average key readout times for Redis/Memcached seemed just way too slow in comparison to Postgres. I suspect they used a remote machine still for the cache, although the paper didn’t mention it and talked about a single Core i7 node.

But about the writing speed of key-value data - well: Postgres doesn’t really compete on higher row counts. ???? But this was also more or less expected! Why? A full-blown relational database engine like Postgres goes to great lengths to ensure we cannot insert invalid data violating some constraints, plus the WAL writing (minimal though for unlogged tables but still) and the on-disk-format overhead - internal columns, alignment, some index bloat on page splits, etc. That all amplifies writes a lot! So basically, I think the results still aren’t bad in the end. Only ca 2x slower than Redis for 100k and 1M row counts. The main idea of caches is that they’re only useful if we read much more from them compared to writing/updating anyways!

But be it how it is with the comparison to the other DBs in absolute numbers, it was good to see that the relative stability of Postgres responses to growing datasets was very-very good! And even beating Memcached which deteriorated 2x on 1M rows! This all probably shows that the selected algorithms for PostgreSQL are mathematically sound and well implemented!

But what can be definitely said - Postgres performance is definitely at least good enough for a “caching use case” for some smaller amount of rows. So with your next project it might be worth asking, do you really need another external component?

Or maybe just:

add some more memory/CPU to the DB and hammer it harder - pretty safe to do for reads actually;

use your HA replicas for load balancing and achieve better total resource utilization, and enjoy the benefits of not having to change the mental model when dealing with relational data on the “real persistence” level anyways.

Other thoughts/notes

Once again - not to forget: the caching software used was by now 5 years old! But at the same time, I also left a bit on the table still even for Postgres - for example, one could probably even get a slightly better read performance when using the relatively newish (v11+) “covering” indexes feature or maybe even with hash indexes.

The thought-provoking article states it actually quite well: “A performance comparison between in-memory key-value data stores is more of an intellectual exercise than of any practical importance - unless you are deploying systems at such scale that this becomes interesting as a cost-saving measure. This is because such stores are IO bound and usually the network latency might play a bigger role in application perceived latency than the database latency.”. So in short - don’t think too much about that milli- or microsecond cache response time with a local test. Real-life network latencies can and will eat that mostly away!

Note that the difference between query execution times measured from the “app side” vs from the “DB side” is huge, up to 10x for worst cases with a little number of rows! But again as the previous chapter states - for real-life database applications, the choice of languages usually doesn’t matter performance-wise.

The maximum tested rowcount of 1 million is actually too small in today’s context I think - even in the case of Postgres with its relatively large disk format, the data together with the index is only about 128MB. I suspect an average smartwatch even holds more data nowadays.

Note though that Redis is certainly much more than a simple volatile cache as the linked article also mentions. However, in practice, it is primarily used as a key-value store... which also Postgres can be, being a general-purpose management system not particularly tuned for a specific workload.

Aspect of cost - provisioning a capable and modern DB server with lots of memory usually costs much more than only a plain “discardable”, albeit optionally memory optimized, compute node. So using a cache might still be a perfectly good idea.

Dedicated caches will always perform better if you need to hammer them really hard - as there is usually no permissions checking, sessions, snapshots, complex locking scenarios, etc.

Note that in some cases one can also use totally transparent PostgreSQL caching! It’s sadly not a built-in feature of Postgres, but for example, one of the more popular connection pooling software pieces called pgpool2 provides it as an extra feature, so might be worth checking out.

9 responses to “PostgreSQL vs Redis vs Memcached performance”

Marti R. says:

August 11, 2021 at 1:46 pm

I'm confused about the "PostgreSQL vs Redis vs Memcached: Read Operation" table. The header indicates figures are in "ms".

In the last column Redis has 8, Memcached has 30, PostgreSQL has 0.029.

Did milliseconds/microseconds get mixed up here? Surely PostgreSQL can't be 3 orders of magnitude faster.

The source code to benchmark PostgreSQL was linked in the article, but not the ones for Redis/Memcached?

Reply
- Adam Brusselback says:
  
  August 11, 2021 at 2:53 pm
  
  That's timing sounds correct for Postgres for getting a single row from a table with that many in it.
  
  For example, this is me running a query (SELECT 1 FROM sale WHERE sale_id = 'xxx';) against a table with 1,292,320 rows in it.
  
  I'll let others remark on the rest.
  
  Reply
Noah Yetter says:

August 25, 2021 at 4:03 pm

To get usefully realistic benchmarks here, the database/cache instances need to be on another host, because wire time is very, very significant. For example, within EC2, inter-host latency is typically around 1ms round-trip (but variable, roughly 0.8ms to 1.6ms depending on the AZ of each host), whereas the time it takes Postgres to read a cached primary key lookup should be no more than 0.1ms. Once you take wire time into account, much less of the theoretical performance of memcached or Redis is actually usable.

Reply
Alan S says:

August 25, 2021 at 7:40 pm

Ditto what Noah Yetter wrote.

Another kind of test would have been useful: measuring the maximum calls-per-second rate for a small payload, which is the typical use-case for redis and memcached. Redis-bench can provide some test measurements for specific redis actions, too.

Imagine using redis for keeping state for a circuit-breaker, usage counting, usage limiting, rate limiting, etc. and you'll see the advantage of having CPS benchmarks to compare these datastore technologies.

Reply
David Walthour says:

August 25, 2021 at 10:42 pm

It seems that you are choosing just about the worst way to write data to postgres here. Using better approaches (e.g. using the copy statement: https://www.postgresql.org/docs/9.2/sql-copy.html) coupled with the TimeScaleDB engine (https://www.timescale.com/), I've been able to achieve writes to tables with 50-100 columns at a rate of 100,000-200,000 rows per second. These techniques should easily consume 1,000,000 rows of key-value pairs in under 4 seconds, quite possibly much less. (I don't work either for postgres or timescale, just a big fan of both of these products.)

Reply
Konstantin Root says:

August 26, 2021 at 9:35 am

>Redis (v3.0.7)

Really? Current version is 6.2 https://redis.io/download

If you used async write/read then in Redis you should compare with pipelining.

With simple key-value read/write Memcached would be faster than Redis, since it allows multi-threading.
So the number looks strange.
If you really want to company reading / writing a lot of items from Redis (1M or so) then consider using pipelining and Redis edition with multi-threading support.

Reply
Artur Formella says:

August 26, 2021 at 8:00 pm

Is there a place for a Redis simulator build on... PostgreSQL? With ACID and much more 🙂

Reply
Marc Rechté says:

August 30, 2021 at 1:46 pm

Hello. I am not clear on what you put in the above write operation table. On my box I get similar read and write figures:

= Averages for 1000 rows measured from DB side =
ins_mean (ms): 0.00885111910000002, calculated total (s): 0.00885111910000002
sel_mean (ms): 0.007435678299999974, calculated total (s): 0.007435678299999974

Reply
Jürgen Strobel says:

August 31, 2021 at 9:56 am

Interesting topic. But I feel doing the test with a single-connection client (as I understood) makes it completely useless. I doubt a single client can saturate the bandwidth of any of the 3 test subjects, and the level of saturation may differ wildly between them. For any real server query response times have to be acceptable, but what you really care for is max queries/sec.

Reply

	NUMBER OF RECORDS
Database	1,000	10,000	100,000	1,000,000
Redis (v3.0.7)	34	214	1,666	14,638
Memcached (v1.4.14)	23	100	276	2,813
PostgreSQL (v13.3)	29.6	304	2,888	31,230

	NUMBER OF RECORDS
Database	1,000	10,000	100,000	1,000,000
Redis (v3.0.7)	8	6	8	8
Memcached (v1.4.14)	9	14	14	30
PostgreSQL (v13.3)	0.026	0.028	0.027	0.029

PostgreSQL vs Redis vs Memcached performance

Test setup

PostgreSQL vs Redis vs Memcached: Write Operation

PostgreSQL vs Redis vs Memcached: Read Operation

Conclusion

Other thoughts/notes

9 responses to “PostgreSQL vs Redis vs Memcached performance”

Leave a Reply Cancel reply

CYBERTEC Guest

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts