It is a frequent complaint that count(*)
is so slow on PostgreSQL.
In this article I want to explore the options you have to get your result as fast as possible. Added bonus: I'll show you how to estimate query result counts.
Why is count(*) so slow?
Most people have no trouble understanding that the following is slow:
SELECT count(*) FROM /* complicated query */;
After all, it is a complicated query, and PostgreSQL has to calculate the result before it knows how many rows it will contain.
But many people are appalled if the following is slow:
SELECT count(*) FROM large_table;
Yet if you think again, the above still holds true: PostgreSQL has to calculate the result set before it can count it. Since there is no “magical row count” stored in a table (like it is in MySQL's MyISAM), the only way to count the rows is to go through them.
So count(*)
will normally perform a sequential scan of the table, which can be quite expensive.
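You can see this for yourself in the execution plan; on a table without a suitable index, it will typically show an Aggregate node on top of a Seq Scan (large_table is just a stand-in name here):

-- the plan will normally show an Aggregate node above a Seq Scan
EXPLAIN SELECT count(*) FROM large_table;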
Is the “*” in count(*) the problem?
The “*” in SELECT * FROM ...
is expanded to all columns. Consequently, many people think that using count(*)
is inefficient and should be written count(id)
or count(1)
instead.
But the “*
” in count(*)
is quite different: it just means “row” and is not expanded at all (actually, that is a “zero-argument aggregate”). count(1)
or count(id)
are actually slower than count(*)
, because they have to test if the argument IS NULL
or not (count
, like most aggregates, ignores NULL
arguments).
So there is nothing to be gained by avoiding the “*
”.
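If you want to verify this on your own data, you can time the variants directly in psql; the numbers will of course depend on your table (this sketch assumes a column named id, and note that count(id) ignores NULL values, so it need not even return the same result):

\timing on
SELECT count(*)  FROM large_table;
SELECT count(1)  FROM large_table;
SELECT count(id) FROM large_table;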
It is tempting to scan a small index rather than the whole table to count the number of rows.
However, this is not so simple in PostgreSQL because of its multi-version concurrency control strategy. Each row version (“tuple”) contains the information to which database snapshot it is visible. But this information is not (redundantly) stored in the indexes. So it usually isn't enough to count the entries in an index, because PostgreSQL has to visit the table entry (“heap tuple”) to make sure an index entry is visible.
To mitigate this problem, PostgreSQL has introduced the visibility map, a data structure that stores if all tuples in a table block are visible to everybody or not.
If most table blocks are all-visible, an index scan doesn't need to visit the heap tuple often to determine visibility. Such an index scan is called “index only scan”, and with that it is often faster to scan the index to count the rows.
Now it is VACUUM
that maintains the visibility map, so make sure that autovacuum runs often enough on the table if you want to use a small index to speed up count(*)
.
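To check whether you are actually getting an index only scan, look at the execution plan again; this sketch assumes that mytable has a primary key index:

VACUUM mytable;   -- updates the visibility map
EXPLAIN SELECT count(*) FROM mytable;
-- on a sufficiently large table with most blocks all-visible, the plan
-- should show an “Index Only Scan” (possibly parallel) instead of a Seq Scan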
I wrote above that PostgreSQL does not store the row count in the table.
Maintaining such a row count would be an overhead that every data modification has to pay for a benefit that no other query can reap. This would be a bad bargain. Moreover, since different queries can see different row versions, the counter would have to be versioned as well.
But there is nothing that keeps you from implementing such a row counter yourself.
Suppose you want to keep track of the number of rows in the table mytable
. You can do that as follows:
START TRANSACTION;

CREATE TABLE mytable_count(c bigint);

CREATE FUNCTION mytable_count() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   IF TG_OP = 'INSERT' THEN
      UPDATE mytable_count SET c = c + 1;

      RETURN NEW;
   ELSIF TG_OP = 'DELETE' THEN
      UPDATE mytable_count SET c = c - 1;

      RETURN OLD;
   ELSE
      UPDATE mytable_count SET c = 0;

      RETURN NULL;
   END IF;
END;$$;

CREATE CONSTRAINT TRIGGER mytable_count_mod
   AFTER INSERT OR DELETE ON mytable
   DEFERRABLE INITIALLY DEFERRED
   FOR EACH ROW EXECUTE PROCEDURE mytable_count();

-- TRUNCATE triggers must be FOR EACH STATEMENT
CREATE TRIGGER mytable_count_trunc AFTER TRUNCATE ON mytable
   FOR EACH STATEMENT EXECUTE PROCEDURE mytable_count();

-- initialize the counter table
INSERT INTO mytable_count
   SELECT count(*) FROM mytable;

COMMIT;
We do everything in a single transaction so that no data modifications by concurrent transactions can be “lost” due to race conditions.
This is guaranteed because CREATE TRIGGER
locks the table in SHARE ROW EXCLUSIVE
mode, which prevents all concurrent modifications.
The downside is of course that all concurrent data modifications have to wait until the initializing SELECT count(*)
is done.
This provides us with a really fast alternative to count(*)
, but at the price of slowing down all data modifications on the table. Using a deferred constraint trigger will make sure that the lock on the row in mytable_count
is held for as short a time as possible, which improves concurrency.
Even though this counter table might receive a lot of updates, there is no danger of “table bloat” because these will all be HOT updates.
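Getting the row count is then a cheap single-row lookup:

SELECT c FROM mytable_count;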
Alternatives to count(*)
Sometimes the best solution is to look for an alternative.
Often an approximation is good enough and you don't need the exact count. In that case you can use the estimate that PostgreSQL uses for query planning:
SELECT reltuples::bigint
FROM pg_catalog.pg_class
WHERE relname = 'mytable';
This value is updated by both autovacuum and autoanalyze, so it should never be much more than 10% off. You can reduce autovacuum_analyze_scale_factor
for that table so that autoanalyze runs more often there.
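For example, to have autoanalyze consider the table after roughly 1% of it has changed instead of the default 10%, you can set the parameter on the table level (the value 0.01 is just an example):

ALTER TABLE mytable SET (autovacuum_analyze_scale_factor = 0.01);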
Up to now, we have investigated how to speed up counting the rows of a table.
But sometimes you want to know how many rows a SELECT
statement will return without actually running it.
Obviously the only way to get an exact answer to this is to execute the query. But if an estimate is good enough, you can use PostgreSQL's optimizer to get it for you.
The following simple function uses dynamic SQL and EXPLAIN
to get the execution plan for the query passed as argument and returns the row count estimate:
CREATE FUNCTION row_estimator(query text) RETURNS bigint
   LANGUAGE plpgsql AS
$$DECLARE
   plan jsonb;
BEGIN
   EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;

   RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;$$;
Do not use this function to process untrusted SQL statements, since it is by nature vulnerable to SQL injection.
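A call could look like this (the query string is only an example):

SELECT row_estimator('SELECT * FROM mytable WHERE id < 1000');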
For further information on this topic, see Hans' blog post about Speeding up count(*): Why you shouldn't use max(id) – min(id).
I often see the following pattern:
select [something]
from [somewhere]
where (select count(*) from [somewhere_else] where [join_to_somewhere]) > 0;
This is extremely wasteful. An exists clause is needed here:
select [something]
from [somewhere]
where exists (select 1 from [somewhere_else] where [join_to_somewhere]);
You don't ask someone how many people there are in the conference room when you just want to know whether or not the conference room is free. 🙂
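A concrete version of that rewrite, using hypothetical tables customers and orders, could look like this:

SELECT c.name
FROM customers AS c
WHERE EXISTS (SELECT 1
              FROM orders AS o
              WHERE o.customer_id = c.id);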
Minor correction: your description of what "*" means in "count(*)" is not quite correct. It's actually just a piece of syntax that means nothing at all; for no good reason that I can see, you have to put a "*" when calling an aggregate function that takes no parameters:
postgres=> select count (rolname) from pg_roles;
count
-------
11
(1 row)
postgres=> select count (*) from pg_roles;
count
-------
11
(1 row)
postgres=> select count () from pg_roles;
ERROR: count(*) must be used to call a parameterless aggregate function
LINE 1: select count () from pg_roles;
^
postgres=> \df count
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+-------+------------------+---------------------+------
pg_catalog | count | bigint | | agg
pg_catalog | count | bigint | "any" | agg
(2 rows)
postgres=>
Incidentally, the only possible no-parameter aggregate functions are count() itself, and other functions composed with count(), i.e., f (count ()). This is why we think of this as a feature of count().
That's what I intended to say. It just counts the rows.
One category of solution that's perhaps missing here is algorithms that very quickly give you an approximate or estimated count - for example the HyperLogLog extension (postgresql-hll).
Citus also has a blog post about "Faster PostgreSQL Counting" covering similar ground. https://www.citusdata.com/blog/2016/10/12/count-performance/
Love the code snippet here using JSON to extract the row count from explain, this is a real clean approach!
What if we use sequences instead of tables to keep the row quantity of a table?
I mean something like this:
START TRANSACTION;
-- CREATE TABLE mytable_count(c bigint);
CREATE SEQUENCE t_random_counter_seq minvalue 0;
CREATE FUNCTION t_random_count() RETURNS trigger
LANGUAGE plpgsql AS
$$BEGIN
IF TG_OP = 'INSERT' THEN
perform nextval('t_random_counter_seq');
-- UPDATE mytable_count SET c = c + 1;
RETURN NEW;
ELSIF TG_OP = 'DELETE' THEN
--UPDATE mytable_count SET c = c - 1;
perform setval('t_random_counter_seq', currval('t_random_counter_seq') - 1);
RETURN OLD;
ELSE
--UPDATE mytable_count SET c = 0;
perform setval('t_random_counter_seq', 0);
RETURN NULL;
END IF;
END;$$;
CREATE TRIGGER t_random_count_mod AFTER INSERT OR DELETE ON t_random
FOR EACH ROW EXECUTE PROCEDURE t_random_count();
-- TRUNCATE triggers must be FOR EACH STATEMENT
CREATE TRIGGER t_random_count_trunc AFTER TRUNCATE ON t_random
FOR EACH STATEMENT EXECUTE PROCEDURE t_random_count();
-- initialize the counter table
--select setval('t_random_counter_seq', 0);
COMMIT;
We get the table count with select currval('t_random_counter_seq');
I suspect this could be even faster but I cannot demostrate it.
And sorry for using sequences in this way. I'm afraid they weren't made for this 🙂
That is a viable option. It might be slightly faster if you mostly insert.
To speed up the case of bulk deletions, using a statement level trigger with a transition table would be faster.
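A minimal sketch of what such a statement level trigger could look like (this is not from the original article; the names are just examples, it reuses the mytable_count table from above, and it needs PostgreSQL 10 or later for transition tables):

CREATE FUNCTION mytable_count_stmt() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   IF TG_OP = 'INSERT' THEN
      UPDATE mytable_count SET c = c + (SELECT count(*) FROM new_rows);
   ELSE
      UPDATE mytable_count SET c = c - (SELECT count(*) FROM old_rows);
   END IF;
   RETURN NULL;
END;$$;

-- transition tables require one trigger per event
CREATE TRIGGER mytable_count_stmt_ins AFTER INSERT ON mytable
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT EXECUTE PROCEDURE mytable_count_stmt();

CREATE TRIGGER mytable_count_stmt_del AFTER DELETE ON mytable
   REFERENCING OLD TABLE AS old_rows
   FOR EACH STATEMENT EXECUTE PROCEDURE mytable_count_stmt();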
Yes, in that case I'll have to find a fast way to know how many records I delete.
HyperLogLog still needs to see every row of input and is O(n) in time. It helps with count(distinct x) by making it take O(1) space.
Right, HLL is only for distinct counts and that's not the primary topic here.
However I remembered another one that you forgot to cover - the TABLESAMPLE method:
pg-11.2 root@db1=# select count(*) from test;
?column? | 150000000
Time: 300209.477 ms (05:00.209)
pg-11.2 root@db1=# select count(*)*100 from test tablesample system(1);
?column? | 149669600
Time: 6865.618 ms (00:06.866)
There are probably many other techniques.
Using TABLESAMPLE doesn't strike me as a very good one though:
- It has to scan 1% of the table, which is slower than just using reltuples.
- It has to use an estimate to guess how much 1% is, so it cannot be any more precise than the estimate itself.
So I see no advantage over using reltuples here.
I had thought the block/page count in pg_class was not an estimate; so the SYSTEM tablesample method should always be using an accurate figure for 1% of the blocks in the relation. Did I understand that incorrectly?
The default autovacuum_analyze_scale_factor is 10% of the table. However the other important factor impacting accuracy of statistics is that for large tables, the analyze process samples a small number of blocks using default_statistics_target with a default value of 100. I haven't done enough testing, but I wonder if it could be even less than 1% in some cases.
The test above showed the block sampling method to give an estimate that was less than 0.5% away from the actual number (in seconds instead of minutes). It seems to me this would be a great alternative if someone needed more control over the accuracy than what they were getting from statistics, but they still didn't need an exact answer.
While it is smart to use EXPLAIN to get a row estimate, I would point out that sometimes EXPLAIN is deeply wrong in estimating rows.
Absolutely.
But it's the best we have to estimate the result count of a query.
> Since there is no “magical row count” stored in a table (like it is in MySQL’s MyISAM)
Why MyISAM? Maybe InnoDB?
What I find out most often is that count(*) is seldom useful or required. It's most used with the traditional paginator that doesn't scale for big rows. What's the point to let the user know that there are hundreds of pages available? Even if you store the rows count in a separate table it won't often be enough because most of these systems allow those rows to be filtered somehow, and in that case that extra table would be useless and the count(*) would still be slow when the filter is not enough to reduce the rows count to some small enough value. People should forget about the traditional pagination component because it's not useful at all to start with. No one is going to navigate through hundreds of pages. That's why the filters exist in the first place. So, one could apply either infinite scroll to display the results or some buttons such as "next page" and "previous page".
I never leave comments on articles like this but I just wanted to say that this is one of the most thorough and helpful articles I've ever come across while doing a quick google search for a seemingly simple problem. Very well written and lays out responses to common confusions that I had when I was trying to troubleshoot a slow
SELECT * FROM ...
query.
Never ever use what Laurenz is suggesting -- estimated counts etc.
First.. his solution? Parsing of the planner result is far too simplistic
Second.. Estimated counts implies you are interested in counts? Sure. run analyze frequently.. but in an active DB.. it's just not worth it. Planner says 42 rows.. real? could 1.5 million. If counts are even somewhat valuable to you... use real counts. Figure out how. Estimated? never
I agree that it is best not to show counts at all.
If you object to estimated counts, I suppose you are not satisfied with Google's web search engine.
Maybe, PostgreSQL should have a function which returns estimated row count - and this runs much faster than the traditional count(*), for large data sets. So, I could simply write my query as: SELECT estimated_row_count() FROM my_table; The estimated count would return a count which is somewhat consistently within 5% (or other acceptable percentage) of the actual count.
That already exists, see the query here:
SELECT reltuples::bigint
FROM pg_catalog.pg_class
WHERE relname = 'my_table';
That is fast, and if you lower
autovacuum_analyze_scale_factor
to 0.05, it should be within 5% of the actual value.
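If you want the convenience of a function like the one suggested above, a minimal sketch could look like this (the function name and the use of regclass are just one possible design, not something built into PostgreSQL):

CREATE FUNCTION estimated_row_count(tbl regclass) RETURNS bigint
   LANGUAGE sql STABLE AS
$$SELECT reltuples::bigint
  FROM pg_catalog.pg_class
  WHERE oid = tbl$$;

-- usage:
SELECT estimated_row_count('my_table');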