It happens on a quite regular basis that people contact the Cybertec PostgreSQL Support Desk to get support for a pretty common thing: Paginators on websites. What is the main issue? How hard can be it to display simple tables after all?
A typical scenario with paginators
Suppose you want to display a table on your website. You definitely don’t want to display 1 million entries at once so you turn to a pagination. What many people do is:
SELECT … FROM tab WHERE … LIMIT 100
… and then …
SELECT count(*) FROM tab WHERE …
What you have just seen is a classical recipe for disaster. Why that? The first query might be highly efficient and execute on milliseconds. However, what about the second one? Can anybody be sure that it also takes just milliseconds? No, because if the WHERE clause is not selective, the count might yield a really high number. Just recently our support team had a typical case: The second query took 9 (!) minutes to complete.
You have to ask yourself. Do you really want to wait for minutes to have an exact page count? The user has to wait minutes just to figure out that his query has yielded exactly 4.353.902.535 pages? How many people do you know, who ever checked more than maybe the first 5 pages?
To fix that problem you basically got two choices
- use estimates
- display the first 100 rows only
As far as estimates are concerned: Have you every used Google? Google will stop providing you with answers after around 30 pages. Google does that for a reason: Nobody will ever go to page number 31. Everybody will refine his search long before.
So, estimates are fine. In PostgreSQL you can simply use EXPLAIN to see, how many rows PostgreSQL expects. This is fine in many cases – especially when result sets are very large.
The second option is a query like that:
SELECT … FROM tab WHERE … LIMIT 101
Assuming that you want to display exactly 100 rows, you can always tell the user that there are more than 100 rows or more than, say, 10 pages. However, there is an even bigger advantage: Deterministic runtime. If you fetch 101 rows only – can execution times vary? Sure, they can but they will certainly stay within an acceptable range. If there is no limit, runtimes can theoretically be infinite.
Implications of slow queries:
Let us assume we got 1.000.000 queries, which need 1 ms each, and let us assume we got 1.000.000 queries, which need 1 second each.
In the first scenario, life is easy: A single CPU can do that in around 16 minutes.
In the second case 12 CPUs will be busy for a day.
So just imagine what will happen, if your “slow” query does not take 1 second but 1 minute? This is not uncommon in case you really want to count everything, to display an exact page count (which is usually done in grey, font size = 6).