By Kaarel Moppel - The big question we hear quite often is, “Can and should we run production Postgres workloads in Docker? Does it work?” The answer in short: yes, it will work... if you really want it to... or if it’s all just fun and play, i.e. for throwaway stuff like testing.
Containers, commonly also just called Docker, have definitely been a thing for quite a few years now. (There are other popular container runtimes out there, and it’s not a proprietary technology per se, but let’s just say Docker to save on typing.) More and more people are “jumping on the container-ship” and want to try out Docker, or have already given this technology a go. However, containers were originally designed more as a vehicle for code; they were initially intended to provide a worry-free “batteries included” deployment experience. The idea is that it “just works” anywhere and is basically immutable. That way, quality can easily be tested and guaranteed across the board.
Those are all perfectly desirable properties indeed for developers...but what if you’re in the business of data and database management? Databases, as we know, are not really immutable - they maintain a state, so that code can stay relatively “dumb” and doesn’t have to “worry” about state. Statelessness enables rapid feature development and deployment, and even push-button scaling - just add more containers!
If your sensors are halfway functional, you might have picked up on some concerned tones in that last statement, meaning there are some “buts” - as usual. So why not fully embrace this great modern technology and go all in? Especially since I already said it definitely works.
The reason is that there are some aspects you should at least take into account to avoid cold sweats and swearing later on. To summarise: you’ll benefit greatly for your production-grade use cases only if you’re ready to do the following:
a) live fully on a container framework like Kubernetes / OpenShift
b) depend on some additional 3rd party software projects not directly affiliated with the PostgreSQL Global Development Group
c) or maintain either your own Docker images, including some commonly needed extensions, or some scripts to perform common operational tasks like upgrading between major versions.
To reiterate - yes, containers are mostly a great technology, and this type of stuff is interesting and probably would look cool on your CV...but: the origins of container technologies do not stem from persistent use cases. Also, the PostgreSQL project does not really do much for you here besides giving you a quick and convenient way to launch a standard PostgreSQL instance on version X.
Not to sound too discouraging - there is definitely at least one perfectly valid use case out there for Docker / containers: it’s perfect for all kinds of testing, especially for integration and smoke testing!
Since containers are basically implemented as super light-weight “mini VMs”, you can start and discard them in seconds! That, however, assumes you have already downloaded the image; if not, the first launch will take a minute or two, depending on how good your internet connection is 🙂
As a matter of fact, I personally usually have all the recent (9.0+) versions of Postgres constantly running on my workstation in the background, via Docker! Of course I don’t use all those versions too frequently - however, since they don’t ask for much attention and don’t use up too many resources when “idling”, they don’t bother me. Also, they’re always there for me when I need to test out some Postgres statistics-fetching queries for our Postgres monitoring tool called pgwatch2. The only thing that could pester you a bit: if you happen to also run Postgres on the host machine and want to take a look at a process listing to figure out what it’s doing (e.g. ps -efH | grep postgres), the “in container” processes show up as well and somewhat “litter” the picture.
OK, so I want to benefit from those light-weight pre-built “all-inclusive” database images that everyone is talking about and launch one - how do I get started? Which images should I use?
As always, you can’t go wrong with the official stuff - and luckily, the PostgreSQL project provides all modern major versions (going back all the way to v8.4, released in 2009, by the way!) via the official Docker Hub. You also need to know some basic “Docker-fu”. For a simple test run, you usually want something similar to what you can see in the code below.
NB! As a first step, you need to install the Docker runtime / engine (if it is not already installed). I’ll not be covering that, as it should be a simple process of following the official documentation line by line.
Also note: when launching images, we always need to explicitly expose or “remap” the default Postgres port to a free port of our preference. Ports are the “service interface” for Docker images, over which all communication normally happens. So we actually don’t need to care about how the service is internally implemented!
# Note that the first run could take a few minutes due to the image being downloaded…
docker run -d --name pg13 -p 5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13

# Connect to the container that’s been started and display the exact server version
psql -U postgres -h localhost -p 5432 -c 'show server_version' postgres

         server_version
────────────────────────────────
 13.1 (Debian 13.1-1.pgdg100+1)
(1 row)
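If port 5432 is already taken on the host (for example by a “native” Postgres installation), just remap the container port to any free host port. A minimal sketch - the container name “pg13-alt” and host port 15432 are purely illustrative:

# Expose the instance on host port 15432 instead of the default 5432
docker run -d --name pg13-alt -p 15432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13
psql -U postgres -h localhost -p 15432 -c 'show server_version' postgres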
Note that you don’t have to actually use “trust” authentication, but can also set a password for the default “postgres” superuser via the POSTGRES_PASSWORD env variable.
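Roughly like this, for example - the container name, host port and password below are just placeholders:

# Same image, but with password authentication for the default "postgres" superuser
docker run -d --name pg13-pw -p 5433:5432 -e POSTGRES_PASSWORD=mysecretpassword postgres:13
# psql will now prompt for the password
psql -U postgres -h localhost -p 5433 postgres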
Once you’ve had enough of Slonik’s services for the time being, just throw away the container and all the stored tables / files etc with the following code:
# Let’s stop the container / instance
docker stop pg13

# And let’s also throw away any data generated and stored by our instance
docker rm pg13
Couldn’t be any simpler!
NB! I could also have explicitly marked the launched container as “temporary” with the ‘--rm’ flag when launching it, so that any data remnants would automatically be destroyed upon stopping.
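A minimal sketch of such a throwaway launch (container name and host port chosen just for illustration):

# --rm removes the container (and its anonymous data volume) automatically once it is stopped
docker run -d --rm --name pg13-tmp -p 5434:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13
docker stop pg13-tmp   # after this, both the container and its data are gone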
Now that we have seen how basic container usage works, complete Docker beginners might get curious here - how does it actually function? What is actually running down there inside the container box?
First, we should probably clear up the two concepts that people often initially mix up:
* an image - the immutable, layered “template” (Postgres binaries, libraries, default configuration) that is downloaded once and then only read from;
* a container - a running (or stopped) instance launched from such an image, with its own thin writable layer on top.
Let’s make sense of this visually:
# Let’s take a look at available Postgres images on my workstation
# that can be used to start a database service (container) in the snappiest way possible
docker images | grep ^postgres | sort -k2 -n
postgres    9.0     cd2eca8588fb   5 years ago     267MB
postgres    9.1     3a9dca7b3f69   4 years ago     261MB
postgres    9.2     18cdbca56093   3 years ago     261MB
postgres    9.4     ed5a45034282   12 months ago   251MB
postgres    9.5     693ab34b0689   2 months ago    197MB
postgres    9.6     ebb1698de735   6 months ago    200MB
postgres    10      3cfd168e7b61   3 months ago    200MB
postgres    11.5    5f1485c70c9a   16 months ago   293MB
postgres    11      e07f0c129d9a   3 months ago    282MB
postgres    12      386fd8c60839   2 months ago    314MB
postgres    13      407cece1abff   14 hours ago    314MB

# List all running containers
docker ps
CONTAINER ID   IMAGE         COMMAND                  CREATED        STATUS        PORTS                    NAMES
042edf790362   postgres:13   'docker-entrypoint.s…'   11 hours ago   Up 11 hours   0.0.0.0:5432->5432/tcp   pg13
Other common tasks when working with Docker might be:
* Checking the logs of a specific container, for example, to get more insights into query errors
# Get all log entries since initial launch of the instance
docker logs pg13

# “Tail” the logs limiting the initial output to last 10 minutes
docker logs --since '10m' --follow pg13
* Listing the IP address of the container
Note that by default, all Docker containers can speak to each other, since they’re all attached to the default bridge network (subnet 172.17.0.0/16). If you don’t like that, you can also create custom networks to cordon off some containers - in which case they can also reach each other by container name!
# Simple ‘exec’ into container approach
docker exec -it pg13 hostname -I
172.17.0.2

# A more sophisticated way via the “docker inspect” command
docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' pg13
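To illustrate the custom network idea mentioned above, here is a rough sketch - the network name “pgnet” and container name “pg13net” are pure placeholders:

# Create an isolated network and attach a Postgres container to it
docker network create pgnet
docker run -d --name pg13net --network pgnet -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13

# Any other container on the same network can now reach the instance by its name
# (give the server a few seconds to initialize first)
docker run -it --rm --network pgnet postgres:13 psql -h pg13net -U postgres postgres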
* Executing custom commands on the container
Note that this should be a rather rare occasion, and is usually only necessary for troubleshooting purposes. You should try not to install new programs or change files directly, as this kind of defeats the concept of immutability. Luckily, in the case of the official Postgres images you can easily do so, since they run as “root” and the Debian package repositories are still enabled - something a lot of images remove, in order to prevent all sorts of maintenance nightmares.
Here is an example of how to install a 3rd party extension. By default, we only get the “contrib extensions” that are part of the official Postgres project.
docker exec -it pg13 /bin/bash

# Now we’re inside the container!
# Refresh the available packages listing
apt update

# Let’s install the extension that provides some Oracle compatibility functions...
apt install postgresql-13-orafce

# Let’s exit the container (can also be done with CTRL+D)
exit
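Installing the package only drops the extension files into place - the extension still has to be created in the target database before its functions become usable. Roughly like so, assuming the default “postgres” database:

# Activate the freshly installed extension in the "postgres" database
psql -U postgres -h localhost -p 5432 -c 'CREATE EXTENSION orafce' postgres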
* Changing the PostgreSQL configuration
Quite often when doing some application testing, you want to measure how much time the queries really take - i.e. measure things from the DB engine side via the indispensable “pg_stat_statements” extension. You can do it relatively easily, without going “into” the container! Starting from Postgres version 9.5, to be exact...
# Connect with our “Dockerized” Postgres instance
psql -h localhost -U postgres

postgres=# ALTER SYSTEM SET shared_preload_libraries TO pg_stat_statements;
ALTER SYSTEM
postgres=# ALTER SYSTEM SET track_io_timing TO on;
ALTER SYSTEM

# Exit psql via typing “exit” or pressing CTRL+D
# and restart the container
docker restart pg13
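After the restart, the extension still has to be created in the database you want to inspect. A quick sketch of then pulling out the most expensive queries (column names as of Postgres 13):

psql -h localhost -U postgres -c 'CREATE EXTENSION IF NOT EXISTS pg_stat_statements' postgres
psql -h localhost -U postgres -c 'SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 5' postgres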
As stated in the Docker documentation: “Ideally, very little data is written to a container’s writable layer, and you use Docker volumes to write data.”
The thing about a container’s data layer is that it’s not really meant to be changed - remember, containers should be more or less immutable. Internally this works via “copy-on-write”, and a bunch of different storage drivers have been used across historical Docker runtime versions, with some further differences stemming from the host OS. It can get quite complex, and most importantly, slow on the disk-access level via the “virtualized” file-access layer. It’s best to listen to what the documentation says, and set up volumes for your data to begin with.
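If you’re curious which storage driver your local engine actually uses for that copy-on-write layer, it can be checked along these lines (assuming a reasonably recent Docker CLI):

# Prints the storage driver in use, e.g. "overlay2"
docker info --format '{{.Driver}}'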
Aha, but what are volumes, exactly? They’re directly connected and persistent OS folders where Docker tries to stay out of the way as much as possible. That way, you don’t actually lose out on file system performance and features. The latter is not really guaranteed, though - and can be platform-dependent. Things might look a bit hairy, especially on Windows (as usual), where one nice issue comes to mind. The most important keyword here might be “persistent” - meaning volumes don’t disappear, even when a container is deleted! So they can also be used to “migrate” from one version of the software to another.
How should you use volumes in practice? There are two ways to use them: the implicit and the explicit. The “fine print”, by the way, is available here.
Also, note that we actually need to know beforehand what paths should be directly accessed, i.e. “volumized”! How can you find out such paths? Well, you could start from the Docker Hub “postgres” page, or locate the instruction files (the Dockerfile) that are used to build the Postgres images and search for the “VOLUME” keyword. The latter can be found for Postgres here.
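Besides reading the Dockerfile, the declared volume paths can also be pulled straight out of the image metadata - for the official image this should list /var/lib/postgresql/data:

# Show the VOLUME paths declared by the image
docker image inspect -f '{{.Config.Volumes}}' postgres:13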
# Implicit volumes: Docker will automatically create the left side folder if it is not already there
docker run -d --name pg13 -p5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust \
  -v /mydatamount/pg-persistent-data:/var/lib/postgresql/data postgres:13

# Explicit volumes: need to be pre-initialized via Docker
docker volume create pg13-data
docker run -d --name pg13 -p5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust \
  -v pg13-data:/var/lib/postgresql/data postgres:13

# Let’s inspect where our persistent data actually “lives”
docker volume inspect pg13-data

# To drop the volume later if the container is not needed anymore use the following command
docker volume rm pg13-data
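And to illustrate the “persistent” aspect mentioned earlier - the named volume outlives the container, so you can throw the container away and attach the very same data directory to a fresh one. A minimal sketch (sticking to the same major version, since the on-disk format is tied to it; the new container name is just an example):

# Remove the container but keep the named volume...
docker stop pg13 && docker rm pg13
# ...and start a new container on top of the very same data directory
docker run -d --name pg13-v2 -p5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust -v pg13-data:/var/lib/postgresql/data postgres:13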
To tie up the loose ends of this posting - if you like containers in general, and also need to run some PostgreSQL services - go ahead! Containers can be made to work pretty well, and for bigger organizations running hundreds of PostgreSQL services, it can actually make life a lot easier and more standardized once everything has been automated. Most of the time, the containers won’t bite you.
But at the same time, you had better be aware of the pitfalls.
I don’t want to sound like a luddite again, but before going “all in” on containers you should acknowledge two things. One, there are major benefits to production-level database containers only if you’re using some container automation platform like Kubernetes. Two, the benefits will come only if you are willing to make yourself somewhat dependent on some 3rd party software vendors. Those vendors are not out to simplify the life of smaller shops, but rather cater to bigger “K8s for the win” organizations, and often encode that way of thinking into their frameworks - which might not align well with your way of doing things.
Also, not all aspects of the typical database lifecycle are well covered. My recommendation is: if it currently works for you “as is”, and you’re not 100% migrating to some container-orchestration framework for all other parts of your software stack, be aware that you’re only winning in the ease of the initial deployment and typically also in automatic high-availability (which is great of course!) - but not necessarily in all aspects of the whole lifecycle (fast major version upgrades, backups the way you like them, access control, etc).
On the other hand - if you feel comfortable with some container framework like Kubernetes and/or can foresee that you’ll be running oodles of database instances - give it a go! -- after you research possible problem points, of course.
On the positive side - since I am in communication with a pretty wide crowd of DBAs, I can say that many bigger organizations do not want to look back at the traditional way of running databases after learning to trust containers.
Anyway, this post went a bit long - thanks for reading, and please do let me know in the comments section if you have some thoughts on the topic!