
PostgreSQL & tablespaces - it's not so scary ...

05.2017

Having spent the last week with Oracle DBAs eagerly peeking towards PostgreSQL, I saw that, besides the usual beginner questions, one of the Postgres concepts that sounded kind of familiar to Oracle folks (but still brought some confusion, as it's not quite the same thing) is tablespaces.

From a higher (user) level everything is pretty similar to Oracle and other systems though – tablespaces "house" database objects like tables and indexes. The biggest differences compared to Oracle might be that Postgres tablespaces are cluster-level objects and that there's no way to restrict a tablespace's (and thus a cluster's/database's) size. Also, you don't deal with individual data files but with folders instead.

And before going into the details – why would one need tablespaces at all? Mostly for two reasons: first, to keep the database running when disk space is running out on the current partition and there's no easy way to extend it (no LVM, for example); and second, to optimize performance and enable more parallel IO, for example by having indexes on fast SSD disks to benefit OLTP workloads. Just "for fun" I wouldn't recommend setting up tablespaces, as it has some implications for replication scenarios – see the end of the article for more.

So what is exactly a tablespace in Postgres?

As a reminder – by default a Postgres cluster is "self-contained", meaning all the data files reside inside the initialized "data directory" (also called PGDATA, as one can set the similarly named environment variable to simplify working with clusters via pg_ctl), making things very easy and portable.

But back to tablespaces – actually it's very simple: a tablespace is just another name for a folder that the Postgres server process owner (typically the "postgres" system user) owns and can write files to. The folder can be located anywhere; Postgres just needs to be able to create a symlink to that path. The symlinks are created in the "pg_tblspc" subfolder of the cluster (for example /var/lib/postgresql/9.6/main/pg_tblspc on Ubuntu) and can be inspected using the usual file system commands like "ls -l". After that, users can explicitly say that they want to store some specific tables/indexes in that new "linked folder", or one can also decide to make it the default for new objects, or even move all old objects over to the new tablespace.
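As a quick illustration, on an Ubuntu install listing that subfolder would show something like the following (the OID-named symlink and its target are of course installation specific):

ls -l /var/lib/postgresql/9.6/main/pg_tblspc
# lrwxrwxrwx 1 postgres postgres ... 16390 -> /some/tablespace/folder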

How to create one?

In its simplest form, something like this will work:
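For example, assuming a folder like /data/dbspace2 that already exists, is empty and is owned by the postgres system user (the path and tablespace name here are just for illustration):

-- the folder must exist and be writable by the "postgres" OS user
CREATE TABLESPACE dbspace2 LOCATION '/data/dbspace2';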

When using tablespaces to tune IO though, something like this would make more sense:
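A sketch, assuming a fast SSD mounted at /mnt/fast_ssd (the mount point is illustrative; lot_of_activity stands for some heavily used table):

-- a dedicated tablespace on fast storage
CREATE TABLESPACE fast_ssd LOCATION '/mnt/fast_ssd';
-- keep the table in the default tablespace, but place the hot index on the SSD
CREATE INDEX ON lot_of_activity (id) TABLESPACE fast_ssd;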

NB! Just declaring a tablespace won't help normal unprivileged users – they need a CREATE grant on it to create tables/indexes there. Something like this will alleviate the situation. Note that for databases with high security requirements one should of course give out such grants on a per-user basis.
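A minimal sketch, reusing the fast_ssd tablespace from above (granting to PUBLIC is the most permissive option; some_user is a placeholder role):

-- allow all users to create tables/indexes in the new tablespace
GRANT CREATE ON TABLESPACE fast_ssd TO PUBLIC;
-- or, for stricter setups, per user
GRANT CREATE ON TABLESPACE fast_ssd TO some_user;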

Moving existing tables to a new tablespace

When plagued by disks running out of space, then after creating a new tablespace the first thing should be to also set it as the default for new objects. This can be done by modifying the default_tablespace parameter and reloading the server config via pg_reload_conf() or "pg_ctl -D mydatadir reload". After that, one could/should also move fast-growing tables/indexes to the new tablespace. This can be done per table, or starting with Postgres 9.4 also en masse.
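Something along these lines, assuming the dbspace2 tablespace created above (big_and_growing is a placeholder table name; ALTER SYSTEM needs Postgres 9.4+, on older versions edit postgresql.conf directly):

-- make the new tablespace the default for newly created objects
ALTER SYSTEM SET default_tablespace = 'dbspace2';
SELECT pg_reload_conf();

-- move a single table (rewrites its data files in the new location)
ALTER TABLE big_and_growing SET TABLESPACE dbspace2;

-- Postgres 9.4+: move everything out of the old default tablespace en masse
ALTER TABLE ALL IN TABLESPACE pg_default SET TABLESPACE dbspace2;
ALTER INDEX ALL IN TABLESPACE pg_default SET TABLESPACE dbspace2;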

NB! Using the latter syntax locks all the affected tables/indexes (use NOWAIT to fail fast when unable to get all the locks) and will then copy the data files one by one, so basically a downtime is required. For "online" moving one can though use tools like pg_repack or pg_squeeze.

Backup & replication considerations

For logical (pg_dump, pg_dumpall) backups nothing changes, but for physical backups (or when building up a replica), pg_basebackup will by default require the exact same directory layout on the replica for things to work. When this is not possible or feasible, one can "re-map" the tablespaces using the --tablespace-mapping parameter, or just use the "tar" format, which again creates a "self-contained" blob. See the pg_basebackup documentation for more details. Also, when doing manual base backups with cp or rsync or such, one should make sure the "dereference" option is used so that the data files behind the symlinks get copied.
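For example (host name and paths are illustrative), re-mapping a tablespace directory when pulling a base backup for a replica, or alternatively using the tar format:

# place the tablespace under a different path on the replica
pg_basebackup -h primary_host -D /var/lib/postgresql/9.6/replica -X stream --tablespace-mapping=/mnt/fast_ssd=/mnt/replica_ssd

# or create self-contained tar files (one per tablespace)
pg_basebackup -h primary_host -D /backups/base --format=tar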

Hope it helped. Questions/corrections are welcome as usual!

 



10 Comments
Douglas J Hunley
6 years ago

> for example having indexes on fast SSD disks

Wouldn't you be better off w/ the data files on SSD and the indexes on spinning rust? B-TREE access is pretty fast and optimal as is but the random I/O of gathering data pages can be horrific.

Kaarel
6 years ago

Sure this would be the best option if one has enough SSD space 🙂 ...but if you're constrained and have a big and wide table with dozens of columns say, then placing the heavily used ID column on SSD might still be a better balance, winning say 20x in SSD space, giving away 10x in speed but having hope for very fast index-only scans for counts and such. Very use case dependent stuff anyways...

Glyn Astill
6 years ago

Not quite true on the pg_dump / restore level; you might need to use --no-tablespaces

Marco Pegoraro
6 years ago

Hello, thank you for the clear article.

I set up 2 table spaces where I believed PG would have dropped most of the data, but the default tablespace now keeps on growing more than the others.

All my tables and indexes are now going into one of the 2 custom tablespaces so I'm puzzled by this growing of the default disk location.

I'm doing this with the goal of splitting different data-areas of my application into different EBS disks on AWS.

Joonas Salo
2 years ago

So I want to store indexes in local SSD for performance and main data in a redundant network drive for safe keeping.

What if the local SSD breaks – would the whole database end up in a corrupted state, or can Postgres recreate the indexes from the main data?

How about materialized views?

Santhosh
4 years ago

My Windows installation is corrupted, so I need to re-install it. My database tablespace is in a folder on my USB drive, but the Postgres installation folder and data folder were not recovered. Can I restore the data from my tablespace folder?

Raul Pereira
5 years ago

Sorry, I'm just starting with PostgreSQL. Why can't I create two tablespaces in PostgreSQL in the same directory? ERROR: directory "/var/lib/postgresql/10/main/base/PG_10_201707211" already in use as a tablespace

Pavlo Golub
5 years ago
Reply to  Raul Pereira

Because Postgres wants exclusive rights on that folder, and Postgres wants to avoid a mess of files, etc.

Raul Pereira
5 years ago
Reply to  Pavlo Golub

Thank you Pavlo. As I understand it, tablespace is a physical directory. I first created the directories on the disk and then created the tablespaces by stating the respective location for each tablespace.

Peter Neave
5 years ago

There is an extra " " in the line
CREATE INDEX on lot_of_activity(id) TABLESPACE  fast_ssd;
