Running ALTER TABLE … ADD COLUMN can have side effects with the potential to cause serious issues in production. Changing data structures comes up all the time, so it’s important to understand what is really going on.

Let’s dive in and see how to run ALTER TABLE … ADD COLUMN in a way that doesn’t hurt operations due to locking or extensive I/O.

Getting started with ALTER TABLE … ADD COLUMN

The following statement creates a simple table which is going to serve as a test dummy for the operations we plan to run:


test=# CREATE TABLE t_sample AS
SELECT a1::int
FROM generate_series(1, 100000000) AS a1;
SELECT 100000000
Time: 56272,917 ms (00:56,273)

The initial table is around 3.4 GB in size as shown in the next listing:


test=# \d+
List of relations
 Schema |   Name   | Type  | Owner | Persistence | Access method |  Size   | Description
--------+----------+-------+-------+-------------+---------------+---------+-------------
 public | t_sample | table | hs    | permanent   | heap          | 3458 MB |
(1 row)

So far this is what we expected. But what if columns are added? How does it impact the database?

Adding a column to a table in PostgreSQL

We have to address various scenarios here. The simplest one is to add a column without any default values:


test=# ALTER TABLE t_sample ADD COLUMN a2 int;
ALTER TABLE
Time: 11,472 ms

The important point here is: this operation is really fast, because all it does is run some magic on the system catalog. We do need a full table lock, BUT it is a really short one, because PostgreSQL does not actually write the column to disk. The system catalog simply records that one more column is supposed to exist; since nothing is present in the rows, the engine can safely assume that the value is indeed NULL.
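Even a short ACCESS EXCLUSIVE lock can become a problem if the ALTER TABLE has to wait behind a long-running transaction: every statement arriving afterwards queues up behind it. A simple safeguard is to cap the time the ALTER is allowed to wait and retry later if it times out. Here is a minimal sketch (the column name a2b is just a placeholder, not part of the example above):


test=# SET lock_timeout = '1s';   -- wait at most 1 second for the lock
SET
test=# ALTER TABLE t_sample ADD COLUMN a2b int;
-- if this errors out with a lock timeout, nothing has changed
-- and the statement can simply be retried later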

A similar catalog-only trick can be applied if we add a constant default value:


test=# ALTER TABLE t_sample ADD COLUMN a3 int DEFAULT 10;
ALTER TABLE
Time: 3,339 ms

Again, the operation is really quick, and it does NOT affect the amount of storage we need, because the additional column never makes it to disk in the first place:


test=# \d+
List of relations
 Schema |   Name   | Type  | Owner | Persistence | Access method |  Size   | Description
--------+----------+-------+-------+-------------+---------------+---------+-------------
 public | t_sample | table | hs    | permanent   | heap          | 3458 MB |
(1 row)

The engine knows that an additional column is supposed to exist. If nothing can be found inside a row, PostgreSQL knows what the value is supposed to be: in this case, 10. Again, there is no need to persist the entry.
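If you want to see where this implicit default lives, you can peek into the system catalog. The following query is just a quick sanity check, assuming PostgreSQL 11 or later and the t_sample table from above:


test=# SELECT attname, atthasmissing, attmissingval
       FROM pg_attribute
       WHERE attrelid = 't_sample'::regclass
             AND attnum > 0;
-- the column added with a constant default (a3) shows atthasmissing = true
-- and carries the constant in attmissingval; the column added without a
-- default (a2) needs no missing value at all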

However, the situation changes once the value we want to put into the column is no longer constant.

ALTER TABLE and random()


test=# ALTER TABLE t_sample
ADD COLUMN a4 int DEFAULT random()*1000;
ALTER TABLE
Time: 65233,954 ms (01:05,234)

random() does not yield deterministic values. The content of a table, however, is supposed to be static; it must not change on the fly. Therefore the data really has to be written to disk. Note that the operation takes more than a minute. While that may not sound like much, it also means that the table is locked for the entire duration. Having hundreds or possibly thousands of connections waiting on this operation can seriously harm availability and cause all kinds of problems, ranging from exhausted connection limits to timeouts. Extensive table locking therefore has to be avoided.
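To see whether sessions are really piling up behind such a long ALTER TABLE, you can watch pg_stat_activity from a second connection while the statement runs. This is purely a diagnostic snippet, not part of the example above:


test=# SELECT pid, wait_event_type, wait_event, state, query
       FROM pg_stat_activity
       WHERE wait_event_type = 'Lock';
-- every row returned is a session currently waiting for a lock,
-- typically everything queued up behind the ALTER TABLE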

Because the additional column had to be written to disk, the table has grown to over 4 GB:


test=# \d+
List of relations
 Schema |   Name   | Type  | Owner | Persistence | Access method | Size    | Description
--------+----------+-------+-------+-------------+---------------+---------+-------------
 public | t_sample | table | hs    | permanent   | heap          | 4223 MB |
(1 row)

Note that size is not the primary problem here; locking usually is.
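A common way to avoid the long lock in cases like this is to split the work: add the column without a default (which is instant, as shown above), and then backfill the values in small batches so that row locks are only held briefly and autovacuum can keep up. The following is only a sketch; the column name a5, the batch size and the expression are placeholders:


-- step 1: instant, catalog-only change
ALTER TABLE t_sample ADD COLUMN a5 int;

-- step 2: backfill in small batches, committing after each one;
-- repeat until the UPDATE reports 0 rows
UPDATE t_sample
SET a5 = random() * 1000
WHERE a1 IN (SELECT a1
             FROM t_sample
             WHERE a5 IS NULL
             LIMIT 10000);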

ALTER TABLE … DROP COLUMN …

Once columns have been added, we also need to see what happens if we want to get rid of them. Just like ADD COLUMN, the DROP COLUMN command tries to avoid rewriting the table as much as possible. Therefore DROP COLUMN simply marks the column as deleted:


test=# ALTER TABLE t_sample DROP COLUMN a3;
ALTER TABLE
Time: 8,148 ms
test=# \d+
List of relations
 Schema |   Name   | Type  | Owner | Persistence | Access method |  Size   | Description
--------+----------+-------+-------+-------------+---------------+---------+-------------
 public | t_sample | table | hs    | permanent   | heap          | 4223 MB |
(1 row)

The size of the table doesn’t change, since the physical column still exists. However, it is no longer visible, and the space will be reclaimed later on. ALTER TABLE … DROP COLUMN therefore (usually) doesn’t cause major locking issues, unless extremely high concurrency makes it hard to obtain even the short lock.
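The leftover column can be seen in the system catalog: it is merely flagged as dropped, and the space is only returned once the rows are physically rewritten, for example by VACUUM FULL, CLUSTER or a rewriting ALTER TABLE. A quick way to inspect this, again using the t_sample table from above:


test=# SELECT attname, attisdropped
       FROM pg_attribute
       WHERE attrelid = 't_sample'::regclass
             AND attnum > 0;
-- the dropped column shows up under a placeholder name with attisdropped = true;
-- VACUUM FULL t_sample would rewrite the table and reclaim the space, at the
-- cost of another exclusive lock on the table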

Finally …

If you want to learn more about PostgreSQL, see Michał’s post on practical examples of data normalization.

Check out our CYBERTEC Migrator to migrate from Oracle to PostgreSQL: meet the CYBERTEC Migrator

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.