To handle transactions PostgreSQL uses a mechanism called MVCC (Multi Version Concurrency Control). The core idea of this machinery is to allow the storage engine to keep more than just one version of the row.
What does it mean and why is that so? Let us consider a simple example:
BEGIN; UPDATE foo SET bar = bar + 1; COMMIT;
Let us assume that our example hits exactly one row.
What is the point of all this? Well, how can we ever perform a ROLLBACK if there is no copy of the old row around? A nice side effect of copying the row is that a SELECT statement can perfectly scan the table even while the UPDATE is running without being stopped by a lock or so. The more transactions can coexist, the more scalable is a database.
VACUUM: Cleaning up data
So, when can PostgreSQL clean out the old row? The answer is quite simple: As soon as no other transaction can see the data. This brings us to the core of the entire topic. What if there are insanely long running transactions, which just don’t terminate for some reason?
In this case VACUUM has to defer cleanup and in some cases it might happen that too many versions of a row pile up in a table.
Monitoring long running transactions
Therefore it is highly recommended to check for long running transactions. Sometimes connection pools or applications are simply buggy and can cause long running transactions, which never terminate, unless there is some manual (or automated) intervention. Aborting insanely long running transactions can definitely improve your VACUUM behavior and thus have beneficial side effects.