CYBERTEC Logo

Reporting: Creating correct output

08.2013 / Category: / Tags: |

Creating reports is a core task of every PostgreSQL database engineer or developer. However, many people think that it is enough to hack up some SQL aggregating some data and execute it. This is not quite true. We have repeatedly seen reports being just plain wrong without people even taking note of it.

How can this happen?

Transaction isolation does make a difference

In our example we want to aggregate some simple data:

We add two boys and one girl:

Let us start with our report. We want to create a nice and colorful analysis displaying the number of girls, the number of boys, and the overall total.

To prove our point we have written two queries:

We start a transaction, do an analysis per gender followed by an overall count (and yes, there are more efficient ways of doing that).

The question now is: Is this report correct or not? Most people would ask: “Why not?”

Here is why:

[one_half]
User 1

[/one_half][one_half_last]
User 2

[/one_half_last]

What we see here is that the count does not reflect what we have seen before. The overall count is two – it is supposed to be three. The reason for this (for many users) unexpected behavior is that we are in so called “read committed” mode. In “read committed” mode every SELECT statement will take a so called snapshot. In other words: Every SELECT will already see data committed by some other transaction. In our case the second SELECT will already see the other transaction's commit. So, over time, a “read committed” transaction can change its view of the data. This implies that a report consisting of more than just one query might not operate on the same data and thus return inconsistent results.

Repeatable read – a way to fix transaction visibility

To fix the problem we can run our transaction in “repeatable read” mode. This will make sure that your transaction will have a consistent view of the data throughout the transaction and therefore return correct results.

Running a transaction in “repeatable read” works like this:

If you happen to do reporting, we always recommend to use “repeatable read” because it allows you to do your reporting on a consistent view of the data.

----------
We welcome your feedback on our brand new Facebook page: www.fb.com/cybertec.postgresql

Comments are closed.

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram