PostgreSQL tablespaces misunderstood
© Laurenz Albe 2021

 

Users with an Oracle background consider tablespaces very important and are surprised that you can find so little information about them in PostgreSQL. This article will explain what they are, when they are useful and whether or not you should use them.

What is a tablespace

Essentially, a tablespace in PostgreSQL is a directory containing data files. These data files are the storage behind objects with a state: tables, sequences, indexes and materialized views. In PostgreSQL, each such object has its own data file. If the object is bigger, it will have several files called segments with a size limit of 1GB.

PostgreSQL uses the operating system’s file system for its storage. This is different from Oracle, which essentially implements its own “file system”.

Let’s compare the terms for clarity:

Terminology in Oracle vs. PostgreSQL
OraclePostgreSQL or operating system
tablespacefile system
datafilelogical/physical volume
segmentall data files of a table
extentsegment / data file

Default tablespaces

Each PostgreSQL database cluster initially has two tablespaces. You can list them with \db in psql:

       List of tablespaces
    Name    |  Owner   | Location 
------------+----------+----------
 pg_default | postgres | 
 pg_global  | postgres | 
(2 rows)

You’ll notice that there is no location specified. That is because they always correspond to fixed subdirectories of the PostgreSQL data directory: the default tablespace (pg_default) is the “base” subdirectory, and the global tablespace (pg_global) is the “global” subdirectory.

By default, all data files will be stored in the default tablespace. Only certain objects are stored in the global tablespace: the catalog tables pg_database, pg_authid, pg_tablespace and pg_shdepend and all their indexes. These are the only catalog tables shared by all databases.

Creating and using new tablespaces

To create a new tablespace, you first have to create a new directory. Don’t create that directory in the PostgreSQL data directory!

Note that the directory has to belong to the “postgres” operating system user (to be exact, the user has to have permissions to change the directory’s permissions).

Then you can create the tablespace:

CREATE TABLESPACE mytbsp LOCATION '/tmp/mytbsp';

To use the tablespace, you can create a table or another object with storage in it:

CREATE TABLE newtab (
   id  integer NOT NULL,
   val text    NOT NULL
) TABLESPACE mytbsp;

ALTER TABLE newtab
   ADD CONSTRAINT newtab_pkey PRIMARY KEY (id)
   USING INDEX TABLESPACE mytbsp;

CREATE INDEX newtab_val_idx ON newtab (val)
   TABLESPACE mytbsp;

Note that indexes are not automatically created in the same tablespace as the table.

You can also create a database in a tablespace:

CREATE DATABASE newdb TABLESPACE mytbsp;

Then all objects you create in that database will automatically be placed in the database’s tablespace.

There are ALTER commands to change the tablespace of any object. Moving an object to another tablespace copies the data files, and the object is inaccessible while it is being moved.

Backup and tablespaces

If you perform a file system backup of a database with tablespaces, you have to back up all tablespaces. You cannot back up or restore a single tablespace, and there is no equivalent to Oracle’s “transportable tablespaces”.

pg_basebackup with the plain format will try to save tablespaces in the same place as on the database server (the -D option only specifies the location of the data directory). To backup data from a tablespace to a different location, you have to use the option --tablespace-mapping=olddir=newdir. You can use this option more than once for multiple tablespaces.

Using tablespaces makes database administration more complicated, because the data directory no longer contains all the data.

When should I create a tablespace?

In the vast majority of cases, you shouldn’t create extra tablespaces in PostgreSQL. In particular, it never makes sense to create a tablespace on the same file system as the data directory or on the same file system as another tablespace.

So what are the benefits of tablespaces that justify the administrative complexity?

  • If you have your data spread over several file systems on different devices, you can distribute the I/O load. However, this can be done just as well or better by striping on a lower level.
  • If you run out of space on a disk, a tablespace will give you the option to add more storage. However, today you typically use a volume manager of some description to do this on the operating system level.
  • If you want to put a size limit on a database or table, you can put it in a tablespace on a file system with limited size.
  • If you have both fast, expensive and slow, cheap storage, you can put the objects that need good performance in a separate tablespace on fast storage. In that case, you should adjust the seq_page_cost, random_page_cost and effective_io_concurrency options on the tablespace to tell the optimizer about the performance characteristics.
  • If you want temporary files (both for temporary tables and for query processing) to be created somewhere else than in the default tablespace, you can set the temp_tablespaces parameter to a different tablespace.

If you are running in a virtualized environment with virtualized storage, all these points are moot, with the exception of the third. Since almost everybody uses virtualization these days, tablespaces are becoming an increasingly irrelevant PostgreSQL feature.

Dispelling an old myth

There is a tenacious myth circulating among database administrators that you should put tables and indexes on different disks for good performance.

You will hear people making elaborate arguments as to why the particular interplay of access patterns during an index scan will make this efficient on spinning disks. But spinning disks are going out of business, and you typically only saturate your storage system with several concurrent SQL statements, when all such access patterns will be disrupted anyway.

The truth behind the myth is that it is certainly beneficial to spread the I/O load over multiple devices. If you use striping on the operating system level, you will get a better spread than you will by carefully placing tables and indexes.

Conclusion

Tablespaces are rarely relevant in PostgreSQL. Resist the temptation to create tablespaces and leave all data in the default tablespace.