Data isn’t only about storage and accumulation – sometimes it’s also about deletion, cleanup and archiving. In SQL there’s more than one way to empty a table. Two essential methods are available:
DELETE vs. TRUNCATE
Both commands serve totally different purposes, which are sometimes not fully understood.
The key difference is that
DELETE is basically a row level operation. A
DELETE statement will mark every row matching the
WHERE-clause as deleted. In the case of billions of rows, this takes a relatively long time.
TRUNCATE is different: It’s basically a table operation. Instead of touching each row separately, it will simply empty the entire table and start a new data file.
How DELETE and TRUNCATE work
Let’s create some sample data and take a look:
blog=# CREATE TABLE t_sample AS SELECT * FROM generate_series(1, 1000000) AS id; SELECT 1000000
You’ve just created 1 million rows, which can be used to check the process.
The first example shows a simple
DELETE which will delete all rows and terminate the transaction (ROLLBACK):
blog=# BEGIN; BEGIN blog=*# \timing Timing is on. blog=*# DELETE FROM t_sample; DELETE 1000000 Time: 709.771 ms blog=*# ROLLBACK; ROLLBACK Time: 0.836 ms
PostgreSQL needs around 0.7 seconds for this operation. In the next listing you can see what happens if you use
blog=# TRUNCATE t_sample; TRUNCATE TABLE Time: 2.481 ms
TRUNCATE is considerably faster than
DELETE. You need to keep in mind that
TRUNCATE can only be used if you want to clean an entire table (or partition), while
DELETE was designed to remove rows more selectively. The conclusion therefore is that
TRUNCATE is unbeatable if you want to delete all rows. Avoid
DELETE in this case.
TRUNCATE: Under the hood
It’s noteworthy that in PostgreSQL,
TRUNCATE is fully transactional. That means that
TRUNCATE can be rolled back just like any other command. People therefore often ask: How does it work– and how can this ever work? Let’s take a look and see:
blog=# SELECT oid, relfilenode, relname FROM pg_class WHERE relname = 't_sample'; oid | relfilenode | relname --------+-------------+---------- 309268 | 309271 | t_sample (1 row)
In PostgreSQL a table is stored in a set of files identified by the “relfilenode”. What happens is that PostgreSQL will lock the table during
TRUNCATE and create a new relfilenode:
blog=# BEGIN; BEGIN blog=*# TRUNCATE t_sample; TRUNCATE TABLE blog=*# SELECT oid, relfilenode, relname FROM pg_class WHERE relname = 't_sample'; oid | relfilenode | relname --------+-------------+---------- 309268 | 309272 | t_sample (1 row) blog=*# COMMIT; COMMIT
As you can see, a new file has been created. The beauty is: In case of a
ROLLBACK, the new file can be thrown away and you’re back where you started. However, if the transaction is able to
COMMIT, the new file will be used:
blog=# SELECT oid, relfilenode, relname FROM pg_class WHERE relname = 't_sample'; oid | relfilenode | relname --------+-------------+---------- 309268 | 309272 | t_sample (1 row)
During these operations, the OID (= object ID) will stay constant.
If you want to dive deeper into PostgreSQL and if you happen to be interested in how to modify data and table structures efficiently, I want to recommend my post about ALTER TABLE done right on our website.
Also, if you want to learn more about PostgreSQL and if you’re interested in other topics, feel free to leave a comment with your suggestion. We’re eager to post useful content that’s relevant and helpful to you.