Creating reports is a core task of every PostgreSQL database engineer or developer. However, many people think it is enough to hack up some SQL that aggregates some data and execute it. This is not quite true: we have repeatedly seen reports that were just plain wrong without anybody even noticing.
How can this happen?
In our example we want to aggregate some simple data:
test=# CREATE TABLE t_test (name text, gender char(1));
CREATE TABLE
We add two boys and one girl:
test=# INSERT INTO t_test VALUES ('hans', 'm');
INSERT 0 1
test=# INSERT INTO t_test VALUES ('paul', 'm');
INSERT 0 1
test=# INSERT INTO t_test VALUES ('maria', 'f');
INSERT 0 1
Let us start with our report. We want to create a nice and colorful analysis displaying the number of girls, the number of boys, and the overall total.
To prove our point we have written two queries:
test=# BEGIN;
BEGIN
test=# SELECT gender, count(*) FROM t_test GROUP BY 1;
 gender | count
--------+-------
 m      |     2
 f      |     1
(2 rows)

test=# SELECT count(*) FROM t_test;
 count
-------
     3
(1 row)

test=# COMMIT;
COMMIT
We start a transaction, do an analysis per gender followed by an overall count (and yes, there are more efficient ways of doing that).
The question now is: Is this report correct or not? Most people would ask: “Why not?”
User 1
BEGIN;

test=# SELECT gender, count(*) FROM t_test GROUP BY 1;
 gender | count
--------+-------
 m      |     2
 f      |     1
(2 rows)

.
.
.

test=# SELECT count(*) FROM t_test;
 count
-------
     2
(1 row)

COMMIT;
User 2
.
.
.
.
.
.
.
.
.
.
.
.

test=# DELETE FROM t_test WHERE name = 'hans';
DELETE 1

.
.
.
.
.
.
.
.
.
Here is why:
What we see here is that the count no longer reflects what we saw before: the overall count is two, although it is supposed to be three. The reason for this behavior, unexpected for many users, is that we are in so-called “read committed” mode. In “read committed” mode every SELECT statement takes a new snapshot, which means that every SELECT already sees data committed by other transactions before the statement started. In our case the second SELECT already sees the DELETE committed by the other transaction. So, over time, a “read committed” transaction can change its view of the data. This implies that a report consisting of more than just one query might not operate on the same data and can therefore return inconsistent results.
To fix the problem we can run our transaction in “repeatable read” mode. This will make sure that your transaction will have a consistent view of the data throughout the transaction and therefore return correct results.
Running a transaction in “repeatable read” works like this:
test=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
test=# SELECT gender, count(*) FROM t_test GROUP BY 1;
 gender | count
--------+-------
 m      |     2
 f      |     1
(2 rows)

test=# SELECT count(*) FROM t_test;
 count
-------
     3
(1 row)

test=# COMMIT;
COMMIT
If you happen to do reporting, we always recommend using “repeatable read”, because it allows you to do your reporting on a consistent view of the data.
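If a session does nothing but reporting, it can be convenient not to spell out the isolation level in every BEGIN. A minimal sketch using the standard default_transaction_isolation setting (shown here at the session level; it can also be set per role or in postgresql.conf):

-- every transaction started in this session now uses repeatable read
SET default_transaction_isolation TO 'repeatable read';

BEGIN;
-- run all the reporting queries here on one consistent snapshot
COMMIT;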
----------
UPDATED August 2023 - Table bloat in PostgreSQL: Many people wonder why deleting data from a table in a PostgreSQL database does not shrink files on disk. You would expect storage consumption to go down when data is deleted. This is not always the case.
To show how this works, I have compiled some examples.
test=# CREATE TABLE t_test (id int4);
CREATE TABLE
test=# INSERT INTO t_test SELECT * FROM generate_series(1, 100000);
INSERT 0 100000
The size of the table will be around 3.5 MB on disk. We can find out about the size of a table using pg_relation_size:
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 3544 kB
(1 row)
test=# DELETE FROM t_test WHERE id > 50000;
DELETE 50000
Many users would expect this table to shrink to half the size. This is not the case:
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 3544 kB
(1 row)
The reason the storage space doesn't decrease is that deleting rows does not necessarily mean that nobody can see those rows anymore. There might still be transactions around which need exactly the row versions you have just deleted. In other words: PostgreSQL does not remove a row physically right away – it merely marks it as deleted, so that newer transactions no longer see it while older ones still can.
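If you want to see those deleted rows show up as dead tuples before any cleanup happens, the statistics views can help. A quick check might look like this (pg_stat_user_tables is a standard statistics view; the numbers are updated asynchronously, so they can lag a little):

-- live vs. dead tuples as seen by the statistics collector
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 't_test';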
Those dead row versions can be cleaned out by running VACUUM:
test=# VACUUM t_test;
VACUUM
In our example the table has been reduced in size:
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 1776 kB
(1 row)
VACUUM is able to reclaim space and return it to the filesystem, but this is not always possible. A data file can only be shrunk by cutting it off at the end, and that in turn is only possible if there are no valid rows left after a certain position in the file. In our example we have loaded the data into the table in ascending order (1, 2, 3, etc.). Deleting everything larger than 50,000 therefore means that the entire second half of the table is gone. VACUUM will go through the table, figure out that the second half of the data file contains nothing but junk, and truncate the file to return the space to the filesystem.
In the first example, we arranged things in a way that lets us watch VACUUM return space to the filesystem. Let us now construct a case where this does not work. First of all, we drop the table:
test=# DROP TABLE t_test;
DROP TABLE
and create a new one:
test=# CREATE TABLE t_test (id int4);
CREATE TABLE
Then we load 100,000 rows just like before:
test=# INSERT INTO t_test SELECT * FROM generate_series(1, 100000);
INSERT 0 100000
In this example we will delete the first half of the data:
test=# DELETE FROM t_test WHERE id < 50000;
DELETE 49999
Just like before we can run VACUUM:
test=# VACUUM t_test;
VACUUM
This time the underlying table stays the same size:
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 3544 kB
(1 row)
Remember what we said about cleanup and VACUUM? VACUUM can only shrink a file when the free space is at its end. However, our example has been set up in a way that this is not the case.
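If you really have to hand the space back to the filesystem even though the free space sits in the middle of the table, the usual approach is to rewrite the table. A sketch (keep in mind that VACUUM FULL takes an ACCESS EXCLUSIVE lock and temporarily needs extra disk space for the new copy):

-- rewrite the table into a new, compact file and return the old space to the filesystem
VACUUM FULL t_test;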
In PostgreSQL, every row has a system column called ctid which tells you where that row is physically stored: ctid = (0,14) means the 14th row in the first data block (block 0) of the table.
test=# SELECT ctid, * FROM t_test ORDER BY ctid DESC LIMIT 5;
   ctid    |   id
-----------+--------
 (442,108) | 100000
 (442,107) |  99999
 (442,106) |  99998
 (442,105) |  99997
 (442,104) |  99996
(5 rows)
As you can see, the last row lives in block 442. Since blocks are numbered starting at 0, that makes 443 blocks of 8192 bytes each, which is exactly the 3544 kB we saw before. This is why the table has not changed in size.
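If you want to double-check the block count, you can divide the relation size by the block size; a small sketch (current_setting('block_size') returns the value as text, hence the cast):

-- number of 8 kB blocks the table currently occupies
SELECT pg_relation_size('t_test') / current_setting('block_size')::int AS blocks;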
test=# SELECT min(ctid) FROM t_test;
   min
----------
 (221,54)
(1 row)
We can see that the first valid row is somewhere in the middle of the table. Therefore the table has not been truncated by VACUUM.
Postgres version 12 introduced a new table storage parameter, vacuum_truncate, which controls whether VACUUM truncates empty pages off the end of a table. There are situations in which you might want to avoid vacuum truncation – for example, because the truncation step has to scan shared_buffers, which can take a while when shared_buffers is large (see this blog about DROP TABLE: Killing shared_buffers).
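If the truncation step causes trouble for a particular table, the storage parameter can be switched off per table on PostgreSQL 12 and later; a minimal sketch, reusing the t_test table from above:

-- disable end-of-table truncation for this table only
ALTER TABLE t_test SET (vacuum_truncate = off);

-- go back to the default behavior later
ALTER TABLE t_test RESET (vacuum_truncate);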
If your PostgreSQL database is poorly maintained (lack of VACUUM) or badly structured, you might face some table bloat.
UPDATE Sept. 2023 - See this blog for a more recent update on this topic: Table bloat revisited.
Table bloat is somewhat nasty because it slows down your database and eats up more storage than needed. Therefore it can be important to figure out whether a table is bloated or not. The PostgreSQL contrib section provides a module called pgstattuple to diagnose table bloat.
To use the module we have to activate it:
test=# CREATE EXTENSION pgstattuple;
CREATE EXTENSION
For a test we create a table and add some 10,000 rows to it on the fly:
test=# CREATE TABLE t_test AS SELECT * FROM generate_series(1, 10000);
SELECT 10000
We can already see how pgstattuple works. To make sure the psql output stays readable, we first switch to expanded display with \x:
test=# \x
Expanded display is on.
In our example pgstattuple will return the following data:
test=# SELECT * FROM pgstattuple('t_test');
-[ RECORD 1 ]------+-------
table_len          | 368640
tuple_count        | 10000
tuple_len          | 280000
tuple_percent      | 75.95
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 7380
free_percent       | 2
As you can see, the table occupies a bit over 368,000 bytes on disk and has a fill grade of around 76%. Note that those percentages don't add up to exactly 100% – the rest is internal overhead (page headers, item pointers, and so on). In reality, the fill grade of a freshly loaded table will be a lot higher than in our trivial single-column case.
To demonstrate table bloat we can delete some data. In this example we delete one third of those rows:
test=# DELETE FROM t_test WHERE generate_series % 3 = 0;
DELETE 3333
Note that the size of the table on disk is still the same. The deleted rows have merely been moved to the “dead tuple” section of the output:
test=# SELECT * FROM pgstattuple('t_test');
-[ RECORD 1 ]------+-------
table_len          | 368640
tuple_count        | 6667
tuple_len          | 186676
tuple_percent      | 50.64
dead_tuple_count   | 3333
dead_tuple_len     | 93324
dead_tuple_percent | 25.32
free_space         | 7380
free_percent       | 2
The first lesson here is that DELETE does not shrink a table on disk. It merely marks rows as dead. This is highly important – many people are misled by this behavior.
To reclaim the space occupied by those dead rows we can call VACUUM:
test=# VACUUM t_test;
VACUUM
What we see here is that those dead rows have been converted to free space. And yes, the size of the table is still the same:
test=# SELECT * FROM pgstattuple('t_test');
-[ RECORD 1 ]------+-------
table_len          | 368640
tuple_count        | 6667
tuple_len          | 186676
tuple_percent      | 50.64
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 114036
free_percent       | 30.93
This free space can now be used to store new rows inside your table.
In our example we have checked one table. But what if you want to check your entire database to see who is to blame?
In this case, you can make use of the PostgreSQL type system and a system catalog. Let us discuss the type system first: in the previous example we saw that pgstattuple can be called in the FROM clause, because it returns an entire row. But now we want to check all tables, not just one. To put the list of objects we want to inspect into the FROM clause (in the form of the pg_class system catalog), we have to move the pgstattuple call into the SELECT clause instead. The call then returns a composite value containing all the fields:
test=# SELECT relname, pgstattuple(oid) FROM pg_class WHERE relkind = 'r' LIMIT 3;
   relname    |                    pgstattuple
--------------+----------------------------------------------------
 pg_statistic | (147456,387,115800,78.53,27,8161,5.53,19840,13.45)
 pg_type      | (73728,357,61965,84.05,20,3440,4.67,4828,6.55)
 pg_authid    | (8192,3,324,3.96,0,0,0,7816,95.41)
(3 rows)
This is not really readable, so we have to break those fields up again. A pair of parentheses and a star will do the job:
test=# SELECT relname, (pgstattuple(oid)).* FROM pg_class WHERE relkind = 'r' LIMIT 3;
-[ RECORD 1 ]------+-------------
relname            | pg_statistic
table_len          | 147456
tuple_count        | 387
tuple_len          | 115800
tuple_percent      | 78.53
dead_tuple_count   | 27
dead_tuple_len     | 8161
dead_tuple_percent | 5.53
free_space         | 19840
free_percent       | 13.45
-[ RECORD 2 ]------+-------------
relname            | pg_type
table_len          | 73728
tuple_count        | 357
tuple_len          | 61965
tuple_percent      | 84.05
dead_tuple_count   | 20
dead_tuple_len     | 3440
dead_tuple_percent | 4.67
free_space         | 4828
free_percent       | 6.55
-[ RECORD 3 ]------+-------------
relname            | pg_authid
table_len          | 8192
tuple_count        | 3
tuple_len          | 324
tuple_percent      | 3.96
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 7816
free_percent       | 95.41
We can now sort and filter at will to figure out which table is to blame.
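For example, one way to get a quick “top bloat” list is to order by the percentage of dead tuples. A possible sketch using a LATERAL join so that pgstattuple is only called once per table (keep in mind that pgstattuple physically reads every table it inspects, so this can take a while on a large database):

SELECT c.relname,
       s.dead_tuple_percent,
       s.free_percent
FROM pg_class AS c
     CROSS JOIN LATERAL pgstattuple(c.oid) AS s
WHERE c.relkind = 'r'
ORDER BY s.dead_tuple_percent DESC
LIMIT 10;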
See also Laurenz Albe's post about "How a bad network connection can cause table bloat."
In my role as a PostgreSQL consultant and trainer, there are a couple of issues that pop up on a regular basis. One of them has to do with set operations: many people seem to misunderstand the way UNION and UNION ALL work.
UPDATE Sept. 2023: See this post on UNION ALL for more recent information.
The basic assumption of most users is that UNION just appends data – nothing could be further from the truth. Here is why:
test=# SELECT 1 UNION SELECT 1;
 ?column?
----------
        1
(1 row)
Most people would expect that two rows will be returned from this query. In fact, it is just one row. The reason for that is simple. UNION does NOT just append data – it also removes duplicates. As we have two identical numbers in our example, only one row will remain.
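The duplicate removal applies to the combined result set as a whole, not only to queries that happen to be identical. A quick way to convince yourself is a sketch like this, which returns just the two distinct values 1 and 2:

SELECT 1
UNION
SELECT 2
UNION
SELECT 1;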
We can use the explain command to see how this operation is performed internally:
test=# explain SELECT 1 UNION SELECT 1;
                          QUERY PLAN
------------------------------------------------------------
 Unique  (cost=0.05..0.06 rows=2 width=0)
   ->  Sort  (cost=0.05..0.06 rows=2 width=0)
         Sort Key: (1)
         ->  Append  (cost=0.00..0.04 rows=2 width=0)
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
(6 rows)
PostgreSQL will sort the data and remove duplicates later on. The way PostgreSQL handles this query also explains why many users report performance problems with UNION: PostgreSQL has to sort all the data to remove duplicates, which is clearly more expensive than simply appending one result to the other.
What people in many cases really want is UNION ALL.
Let us take a look at an example:
test=# SELECT 1 UNION ALL SELECT 1;
 ?column?
----------
        1
        1
(2 rows)
In this case the data is really just appended, and therefore we get two rows.
The plan will reveal that things are really the way we want them to be:
test=# explain SELECT 1 UNION ALL SELECT 1;
                      QUERY PLAN
------------------------------------------------------
 Result  (cost=0.00..0.04 rows=2 width=4)
   ->  Append  (cost=0.00..0.04 rows=2 width=4)
         ->  Result  (cost=0.00..0.01 rows=1 width=0)
         ->  Result  (cost=0.00..0.01 rows=1 width=0)
(4 rows)
PostgreSQL will take both result sets and just add them up.
So be careful when it comes to set operations and think twice if you really want to filter duplicates or not.
----------
After digging through a table consisting of dozens of boolean values, I thought it might be helpful for more folks out there to know about bit fields in PostgreSQL.
Basically, “bit” can be used just like a normal char data type. The idea behind bit is to have a fixed set of bits which can be accessed at will. Here is an example:
test=# SELECT '0110'::bit(4);
 bit
------
 0110
(1 row)
You can use zeros and ones to represent bits in a PostgreSQL bitfield. The nice thing about it is that you can easily cast those bitfields to integer:
test=# SELECT '0110'::bit(4)::int4;
 int4
------
    6
(1 row)
In our case we would have reduced 4 boolean columns to just one single column. Just imagine the benefit if you have to deal with hundreds of values. Clearly, normalizing that many boolean columns is not too attractive either.
The previous example has shown how a bitfield can be turned into an integer. Let us now convert things back the other way:
test=# SELECT 6::bit(4);
 bit
------
 0110
(1 row)
In this case we have turned a number into a bitfield. All it takes here is a simple cast.
What happens if you want to update a column containing a bitfield? To do so, PostgreSQL offers a function called set_bit. It takes a bitfield, a position as well as the desired new value of the n-th bit you want to change.
Setting a bit works as follows:
test=# SELECT set_bit('0110'::bit(4), 3, 1);
 set_bit
---------
 0111
(1 row)
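The same function can of course be used inside an UPDATE. Here is a small sketch using a hypothetical table t_flags, in which a single bit(4) column replaces four boolean columns:

CREATE TABLE t_flags (id int4, flags bit(4));
INSERT INTO t_flags VALUES (1, '0110');

-- switch the last flag on (bit positions start at 0, counted from the left)
UPDATE t_flags
SET flags = set_bit(flags, 3, 1)
WHERE id = 1;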
If you want to fetch a certain bit from your field, you can call get_bit:
test=# SELECT get_bit('0110'::bit(4), 2);
 get_bit
---------
       1
(1 row)
In this example we have extracted the third bit from the bitfield.
Shifting bits around is a pretty common operation, and it can be done in PostgreSQL too. Here are two examples – one with a fixed-width bitfield (bit) and one with its variable-length counterpart (varbit):
test=# SELECT '0110'::bit(10) << 1;
  ?column?
------------
 1100000000
(1 row)
test=# SELECT '0110'::varbit(10) << 1;
 ?column?
----------
 1100
(1 row)
The nice thing about bitfields is that their size is virtually unlimited. You want a thousand bits or maybe ten thousand? No problem – this is all possible and easy to do.
For more posts on data types and their use in PostgreSQL, check out our data type blog spot.
UPDATED AUGUST 2023: After my previous post, I received mails asking what was meant by using “generate_series” in an outer join.
Let me try to address this issue with an example: Many people use PostgreSQL to aggregate data. In this example, we assume that we want to aggregate data on an hourly basis.
First we create a table with just two fields:
test=# CREATE TABLE t_data (t timestamp, payload text);
CREATE TABLE
Then we add some sample data:
test=# INSERT INTO t_data VALUES ('2012-04-04 03:12', 'data 1');
INSERT 0 1
test=# INSERT INTO t_data VALUES ('2012-04-04 04:16', 'data 2');
INSERT 0 1
test=# INSERT INTO t_data VALUES ('2012-04-04 04:28', 'data 3');
INSERT 0 1
test=# INSERT INTO t_data VALUES ('2012-04-04 06:45', 'data 4');
INSERT 0 1
We have data for 3am, 4am and 6am ... we've intentionally left 5am out. If we want to start to aggregate the data to see how many rows there are per hour, we can use the following query:
test=# SELECT date_trunc('hour', t), count(payload) FROM t_data GROUP BY 1 ORDER BY 1;
     date_trunc      | count
---------------------+-------
 2012-04-04 03:00:00 |     1
 2012-04-04 04:00:00 |     2
 2012-04-04 06:00:00 |     1
(3 rows)
A simple GROUP BY returns counts for 3am, 4am and 6am, but nothing at all for 5am. No zero row is displayed, because GROUP BY only returns groups for which there actually is data.
In order to create a list of times we want to see in our final report, we can execute generate_series. Let's assume we want to have a complete table from midnight to 6am:
test=# SELECT * FROM generate_series(
           '2012-04-04 00:00'::timestamp,
           '2012-04-04 06:00'::timestamp,
           '1 hour') AS x;
          x
---------------------
 2012-04-04 00:00:00
 2012-04-04 01:00:00
 2012-04-04 02:00:00
 2012-04-04 03:00:00
 2012-04-04 04:00:00
 2012-04-04 05:00:00
 2012-04-04 06:00:00
(7 rows)
This will return a complete list – exactly what we need to display in the final results. Remember, we also want to display hours with no rows.
Let's now join things together using a standard PostgreSQL outer join:
test=# SELECT * FROM generate_series('2012-04-04 00:00'::timestamp,
                                     '2012-04-04 06:00'::timestamp,
                                     '1 hour') AS x
       LEFT JOIN t_data ON (date_trunc('hour', t) = x);
          x          |          t          | payload
---------------------+---------------------+---------
 2012-04-04 00:00:00 |                     |
 2012-04-04 01:00:00 |                     |
 2012-04-04 02:00:00 |                     |
 2012-04-04 03:00:00 | 2012-04-04 03:12:00 | data 1
 2012-04-04 04:00:00 | 2012-04-04 04:16:00 | data 2
 2012-04-04 04:00:00 | 2012-04-04 04:28:00 | data 3
 2012-04-04 05:00:00 |                     |
 2012-04-04 06:00:00 | 2012-04-04 06:45:00 | data 4
(8 rows)
We run date_trunc to make sure that the precision of our dates is actually in hours, and then use the truncated value as the join key. Note that for hours without data, the columns on the t_data side of the join stay NULL.
Once we have joined the data we have to aggregate to fetch the final result. When doing this aggregation, watch out for a common mistake:
test=# SELECT x, count(*)
       FROM generate_series('2012-04-04 00:00'::timestamp,
                            '2012-04-04 06:00'::timestamp,
                            '1 hour') AS x
       LEFT JOIN t_data ON (date_trunc('hour', t) = x)
       GROUP BY 1;
          x          | count
---------------------+-------
 2012-04-04 00:00:00 |     1
 2012-04-04 01:00:00 |     1
 2012-04-04 02:00:00 |     1
 2012-04-04 03:00:00 |     1
 2012-04-04 04:00:00 |     2
 2012-04-04 05:00:00 |     1
 2012-04-04 06:00:00 |     1
(7 rows)
Never EVER use count(*) along with an outer join. The problem is that count(*) counts rows. An hour without data still produces one row in the outer join (with NULLs on the t_data side), so it is counted as 1 instead of 0. As you can see, the result is just plain wrong.
To correct the problem, I advise you to count non-NULL values. In SQL, this means that you have to use count(column) rather than count(*). count(column) will only count values which are not NULL. That's exactly what we want:
test=# SELECT x, count(payload)
       FROM generate_series('2012-04-04 00:00'::timestamp,
                            '2012-04-04 06:00'::timestamp,
                            '1 hour') AS x
       LEFT JOIN t_data ON (date_trunc('hour', t) = x)
       GROUP BY 1;
          x          | count
---------------------+-------
 2012-04-04 00:00:00 |     0
 2012-04-04 01:00:00 |     0
 2012-04-04 02:00:00 |     0
 2012-04-04 03:00:00 |     1
 2012-04-04 04:00:00 |     2
 2012-04-04 05:00:00 |     0
 2012-04-04 06:00:00 |     1
(7 rows)
Voilà, the result is exactly as desired.
UPDATED August 2023: Time is one of those unloved topics every programmer has to deal with. Have you ever written a routine heavily dependent on time calculations in C, Python, PHP, Perl, or any other language? In that case, I'm sure you've fallen in love with time management as much as I have.
When it comes to processing time, PostgreSQL is really like a kind of revelation – simple, easy to deal with, efficient, and highly capable.
One of the most common tasks PostgreSQL excels at is figuring out yesterday, today and tomorrow. Consider the following query:
test=# SELECT 'YESTERDAY'::date, 'TODAY'::date, 'TOMORROW'::date;
    date    |    date    |    date
------------+------------+------------
 2013-08-05 | 2013-08-06 | 2013-08-07
(1 row)
These three PostgreSQL constants allow you to fetch important dates in SQL quickly and easily. There is no need for nasty math here.
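These constants are handy in WHERE clauses as well. For instance, reusing the t_data table from the outer-join example above, a report covering all of yesterday could be sketched like this:

-- everything with a timestamp that falls within yesterday
SELECT *
FROM t_data
WHERE t >= 'YESTERDAY'::date
  AND t <  'TODAY'::date;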
PostgreSQL can also handle dates like February 29th nicely:
test=# SELECT DATE '2023-02-29';
ERROR:  date/time field value out of range: '2023-02-29'
LINE 1: SELECT DATE '2023-02-29';
                    ^
test=# SELECT DATE '2024-02-29';
    date
------------
 2024-02-29
(1 row)
But PostgreSQL is not only able to work with dates. It can also be extremely convenient in terms of working with intervals. Basically, an interval can be subtracted and added to a timestamp at will.
test=# SELECT now();
              now
-------------------------------
 2013-08-06 10:41:19.202914+02
(1 row)
test=# SELECT now() + '3 decades 2 hours 5 centuries 20 minutes 90 days 12 months'::interval;
           ?column?
-------------------------------
 2544-11-04 13:01:42.298739+01
(1 row)
The interval datatype accepts an easily readable format. You don't even have to put the units in exact order – hours can be placed before centuries, or the other way around. There are absolutely no restrictions on order, which makes coding with this type a lot simpler.
generate_series
Did you ever write a web application which has to display a calendar? In case you did, I'm pretty sure you had a problem coming up with a list of days for a given month. In most programming languages, doing that kind of processing is pretty hard. PostgreSQL can assist here. It provides a simple, efficient method:
test=# SELECT * FROM generate_series('2012-02-26'::date, '2012-03-02'::date, '1 day'::interval);
    generate_series
------------------------
 2012-02-26 00:00:00+01
 2012-02-27 00:00:00+01
 2012-02-28 00:00:00+01
 2012-02-29 00:00:00+01
 2012-03-01 00:00:00+01
 2012-03-02 00:00:00+01
(6 rows)
The generate_series function takes three parameters: the first one defines the starting timestamp, the second one defines the ending timestamp, and the third one tells PostgreSQL the size of the increments. In our example we defined the increment as one day, but you can use any increment you desire.
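For a calendar you usually don't know the month boundaries in advance, but date_trunc can derive them from any date inside the month. A possible sketch that lists every day of February 2012:

SELECT generate_series(
           date_trunc('month', DATE '2012-02-15'),
           date_trunc('month', DATE '2012-02-15') + interval '1 month' - interval '1 day',
           interval '1 day'
       )::date AS day;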
Keep in mind: generating a time series like this can come in handy when you have to write outer joins – see the section on generate_series and outer joins earlier in this article for more information. Doing stuff like that on the database side can dramatically reduce the amount of engineering needed.
date_bin and timestamps in PostgreSQL

Since this blog post was written, a new function was added in PostgreSQL 14 to solve a problem which has caused challenges for many users: how can we map timestamps to time bins? The function is called date_bin.

What people often do is round a timestamp to a full hour. That's commonly done using the date_trunc function. But what if you want to round data so that it fits into a 30-minute or a 15-minute grid? Find out all about date_bin in this blog post.
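As a quick illustration (PostgreSQL 14 or later): date_bin takes a stride, a timestamp and an origin, and maps the timestamp onto the grid defined by the other two. The call below puts the timestamp into a 15-minute bin starting at midnight, i.e. it returns 2023-08-01 10:15:00:

SELECT date_bin('15 minutes',
                TIMESTAMP '2023-08-01 10:23:54',
                TIMESTAMP '2023-08-01 00:00:00');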
For more advanced information on how PostgreSQL processes time, see the following blogs:
EXCLUDE TIES, CURRENT ROW, and GROUP