pupils using OFFSET pagination to find page 142
© Laurenz Albe 2022

When processing a big result set in an interactive application, you want to paginate it, that is, show it page by page. Everybody is familiar with this from web searches: you get a button to move to the next page, and you get a total result count. This article shows the various options for paginating a result set and compares their performance. It also discusses the problem of getting the total result count.

An example to demonstrate pagination techniques

Let’s create a table with enough time series data:

CREATE UNLOGGED TABLE data (
   id bigint GENERATED ALWAYS AS IDENTITY,
   value double precision NOT NULL,
   created timestamp with time zone NOT NULL
);

/* make the example repeatable */
SELECT setseed(0.2740184);

INSERT INTO data (value, created)
SELECT random() * 1000, d
FROM generate_series(
        TIMESTAMP '2022-01-01 00:00:00 UTC',
        TIMESTAMP '2022-12-31 00:00:00 UTC',
        INTERVAL '1 second'
     ) AS d(d);

/* add a primary key */
ALTER TABLE data ADD PRIMARY KEY (id);

/* set hint bits, create visibility map, gather statistics */
VACUUM (ANALYZE) data;

The query that we are interested in is:

SELECT value, created
FROM data
WHERE value BETWEEN 0 AND 10
ORDER BY created;

The best index for the query (at least on my machine) is a two-column index that allows PostgreSQL to use an index-only scan that supports the ORDER BY clause:

CREATE INDEX data_created_value_idx ON data
   (created, value);

That index is good for fetching the first rows quickly, but it also is the best index for fetching the whole result set.

Simple pagination: OFFSET and LIMIT

We want to get the result set in chunks of 50 rows. Using OFFSET and LIMIT, we can do that as follows:

/* the first page */
SELECT value, created
FROM data
WHERE value BETWEEN 0 AND 10
ORDER BY created
LIMIT 50;

/* page number 100 */
SELECT value, created
FROM data
WHERE value BETWEEN 0 AND 10
ORDER BY created
OFFSET 4950 LIMIT 50;
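
In general, with a page size of 50, page n starts at offset (n − 1) × 50. A client would compute that offset and supply it as a query parameter; in the sketch below, :page stands for a hypothetical client-side placeholder:

/* page number :page, with 50 rows per page */
SELECT value, created
FROM data
WHERE value BETWEEN 0 AND 10
ORDER BY created
OFFSET (:page - 1) * 50 LIMIT 50;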

Disadvantages of OFFSET and LIMIT

  • Performance gets worse as you fetch later pages. To get page number 100, PostgreSQL has to compute the first 5000 rows of the result set, only to discard the first 4950 ones. So processing gets slower as you page through the result set, and we do a lot of unnecessary work. If you know that users will only ever look at the first few results, you can get away with that.
  • Concurrent data modifications can cause problems. If somebody inserts a new row into the table while we are looking at the first page, that row might belong on the first page of the result set. If we then fetch the second page, the whole result set shifts by one position, and the second page will start with the last row of the first page, which we have already seen. Conversely, if a concurrent statement deletes a row, we can skip a row in the result set when we move to page two.

Advantages of OFFSET and LIMIT

There is one main advantage of this technique: it is simple and can be used with all queries. Also, as long as you only look at the first few pages, it can perform pretty well.

Pagination with WITH HOLD cursors

Cursors are the natural way to fetch result sets in chunks. However, normal cursors only work in the context of a single transaction. Therefore, normal cursors are not useful for pagination, because it is a very bad idea to have user interaction while a transaction is open: not only will a long transaction hold table locks for a very long time (which could block DDL or TRUNCATE statements), but it will also block the progress of autovacuum, which can lead to table bloat.

But you can use a WITH HOLD cursor. Such a cursor fetches the complete result set at the end of the transaction and materializes it on the server:

START TRANSACTION;

/*
 * Don't use SCROLL unless you need to scroll backwards,
 * for example to get the total count (see below).
 */
DECLARE c SCROLL CURSOR WITH HOLD FOR
SELECT value, created
FROM data
WHERE value BETWEEN 0 AND 10
ORDER BY created;

/* this will calculate the whole result set */
COMMIT;

Now it is simple to fetch an arbitrary page:

/* get page number 100 */
MOVE ABSOLUTE 4950 IN c;
FETCH 50 FROM c;

Don’t forget to close the cursor when you are done:

CLOSE c;

It is a smart idea to fetch the first page before committing the transaction, because that COMMIT materializes the result set and can take a long time. During that time, the client can already render the first result page, and the user can look at it.
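
Putting that together, the complete flow could look like this sketch:

START TRANSACTION;

DECLARE c SCROLL CURSOR WITH HOLD FOR
SELECT value, created
FROM data
WHERE value BETWEEN 0 AND 10
ORDER BY created;

/* cheap: the result set is not yet materialized */
FETCH 50 FROM c;

/* materializes the result set; the user is already reading page 1 */
COMMIT;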

Advantages of WITH HOLD cursors

  • works with all queries
  • the result set is stable, and there is no danger of skipping or repeating results like with OFFSET and LIMIT
  • probably the best method if you eventually want to retrieve the whole result set or need the total count (see below), since it calculates the whole result set anyway

Disadvantages of WITH HOLD cursors

  • you must not forget to close the cursor when you are done, otherwise the result set is held on the server until the end of the database session
  • transaction level connection pooling won’t work with WITH HOLD cursors, since they are bound to the database connection
  • the whole result set is calculated when the initial transaction ends, so fetching the first page can be slow (but subsequent pages will be fast)
  • if the cursor is held open for a longer time, the results will become “stale”

Keyset pagination

This is the most advanced way to paginate a result set. It requires that we also fetch the primary key column id, so that we can uniquely identify the last row of each page. The first page is fetched with:

SELECT id, value, created
FROM data
WHERE value BETWEEN 0 AND 10
ORDER BY created, id
LIMIT 50;

We have to remember the values for id and created from the last row of the page. Then we can fetch the next page with

SELECT id, value, created
FROM data
WHERE value BETWEEN 0 AND 10
  AND (created, id) > ('2022-01-01 01:27:35+01', 5256)
ORDER BY created, id
LIMIT 50;

where 2022-01-01 01:27:35+01 and 5256 are the values we remembered from the previous page. We need id as a tie-breaker in case there are multiple rows with the same value of created.
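
Note that (created, id) > (...) is a single row comparison with lexicographic semantics, not two independent conditions on the columns. It is equivalent to this more verbose condition:

/* spelled-out equivalent of the row comparison */
  AND (created > '2022-01-01 01:27:35+01'
       OR (created = '2022-01-01 01:27:35+01' AND id > 5256))

The row comparison form is preferable, because PostgreSQL can use it directly as an index condition.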

To make keyset pagination efficient, we need to create a special index that supports the second WHERE condition:

CREATE INDEX data_keyset_idx ON data
   (created, id, value);

Keyset pagination would be fast enough without including value in the index, but I add it to get an index-only scan, so that the performance comparison with the other methods is fair.

Advantages of keyset pagination

  • each query fetches only that part of the result set that we need, so the performance will be better than with the other techniques
  • there is no danger of skipping or repeating results like with OFFSET and LIMIT
  • each query will retrieve current data that reflect the latest concurrent data modifications

Disadvantages of keyset pagination

  • requires a special index specifically designed for the query
  • only useful if the exact query is known in advance

Comparing the performance of the pagination methods

The following results were measured for the example in the beginning. I measured the response time on the client side via a local connection (average of five runs). In addition, I used EXPLAIN (ANALYZE, BUFFERS) to determine how many buffers the query had to process. All data were cached in shared buffers, and “random_page_cost” was set to 1.1.

                    page 1                       page 10         page 100
OFFSET/LIMIT        1.16 ms                      4.42 ms         15.36 ms
                    25 buffers                   184 buffers     1915 buffers
WITH HOLD cursor    642.28 ms                    0.34 ms         0.51 ms
                    (1.41 ms without COMMIT)
                    120507 buffers               0 buffers       0 buffers
keyset pagination   1.40 ms                      1.35 ms         1.39 ms
                    30 buffers                   31 buffers      24 buffers

Pagination with OFFSET and LIMIT wins for the first page (perhaps because it fetches one column fewer), but keyset pagination is better as soon as you have to fetch more pages. Cursor pagination can only compete for the first page if you COMMIT asynchronously while the user is viewing it; for subsequent pages, however, it is faster than the other methods, since the result set is already calculated.

What is the problem with the total result count?

So far, we have seen that there are smart ways to paginate a query result. However, there is no smart way to get the total result count. You have to calculate the complete result set, then count it. The impact of that varies depending on the pagination method chosen:

  • With OFFSET and keyset pagination, an extra expensive query is necessary. If only the first few pages are retrieved, the query to calculate the total result set count is usually more expensive than all other queries together.
  • With WITH HOLD cursor pagination, the whole result set is calculated in the beginning, so the result set count comes for free:
    /* the result will be "MOVE 314711" */
    MOVE ALL IN c;
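    
    Since MOVE ALL leaves the cursor positioned after the last row, you have to move back to the beginning before fetching pages; this is the reason the cursor was declared with SCROLL:
    
    /* back to before the first row */
    MOVE ABSOLUTE 0 IN c;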
    

No matter how you do it, calculating the whole result set is slow. I have seen many cases where the count(*) queries to calculate the total result set count were the most expensive queries in an application’s workload. Therefore, my recommendations are:

  • The easiest solution is simply not to display a result set count. Keep it simple!
  • The next best solution is to display an approximate result set count. This is easy if you run “EXPLAIN (FORMAT JSON) SELECT 1 FROM /* your query */” rather than “SELECT count(*) FROM /* your query */”. Then you can extract the row count estimate from the EXPLAIN output. See my article on counting rows for details. Remember that Google doesn’t display exact counts either!
  • If you really need the exact result set count, don’t display it right away. Rather, make that an option that is initially disabled, for example a button “get exact count”. Then the user only has to pay the price if he or she deems it necessary.
  • No matter what you do, only calculate or estimate the count when you display the first page. Don’t fall into the trap of repeating the query for every result set page the user displays.
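
The EXPLAIN trick from the second recommendation can be wrapped in a small PL/pgSQL function (a sketch; the name row_estimate is my own, and the result is only as accurate as the optimizer's statistics):

CREATE FUNCTION row_estimate(query text) RETURNS bigint
   LANGUAGE plpgsql AS
$$DECLARE
   plan json;
BEGIN
   EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
   RETURN (plan -> 0 -> 'Plan' ->> 'Plan Rows')::bigint;
END;$$;

/* estimated total count for our example query */
SELECT row_estimate(
   $q$SELECT value, created
      FROM data
      WHERE value BETWEEN 0 AND 10$q$
);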

Conclusion

When paginating results, you can choose between OFFSET/LIMIT, WITH HOLD cursors and keyset pagination. Each of these methods has its benefits and disadvantages, and you will have to decide which one is best in your case. Consider that displaying a total result set count is quite expensive, so avoid it whenever possible.