
Running Postgres in Docker - why and how?

03.2021

By Kaarel Moppel - The big question we hear quite often is, “Can and should we run production Postgres workloads in Docker? Does it work?” The answer in short: yes, it will work... if you really want it to... or if it’s all only fun and play, i.e. for throwaway stuff like testing.

Containers, commonly also just called Docker, have definitely been a thing for quite a few years now. (There are other popular container runtimes out there, and it’s not a proprietary technology per se, but let’s just say Docker to save on typing.) More and more people are “jumping on the container-ship” and want to try out Docker, or have already given this technology a go. However, containers were originally designed more as a vehicle for code; they were initially intended to provide a worry-free “batteries included” deployment experience. The idea is that it “just works” anywhere and is basically immutable. That way, quality can easily be tested and guaranteed across the board.

Those are all perfectly desirable properties indeed for developers...but what if you’re in the business of data and database management? Databases, as we know, are not really immutable - they maintain a state, so that code can stay relatively “dumb” and doesn’t have to “worry” about state. Statelessness enables rapid feature development and deployment, and even push-button scaling - just add more containers!

Running Postgres in Docker

Should I use Postgres with Docker?

If your sensors are halfway functional, you might have picked up on some concerned tones in that last statement, meaning there are some “buts” - as usual. So why not fully embrace this great modern technology and go all in? Especially since I already said it definitely works.

The reason is that there are some aspects you should at least take into account to avoid cold sweats and swearing later on. To summarise: you’ll benefit greatly for your production-grade use cases only if you’re ready to do the following:

a) live fully on a container framework like Kubernetes / OpenShift

b) depend on some additional 3rd party software projects not directly affiliated with the PostgreSQL Global Development Group

c) or maintain either your own Docker images, including some commonly needed extensions, or some scripts to perform common operational tasks like upgrading between major versions.

To reiterate - yes, containers are mostly a great technology, and this type of stuff is interesting and probably would look cool on your CV...but: the origins of container technologies do not stem from persistent use cases. Also, the PostgreSQL project does not really do much for you here besides giving you a quick and convenient way to launch a standard PostgreSQL instance on version X.

A testers’ dream

Not to sound too discouraging - there is definitely at least one perfectly valid use case out there for Docker / containers: it’s perfect for all kinds of testing, especially for integration and smoke testing!

Since containers are basically just super light-weight “mini VMs”, you can start and discard them in seconds! That, however, assumes you have already downloaded the image - if not, the first launch will take a minute or two, depending on how good your internet connection is 🙂

As a matter of fact, I personally have all the recent (9.0+) versions of Postgres constantly running on my workstation in the background, via Docker! Of course, I don’t use all of those versions frequently - but since they don’t ask for much attention and don’t use up many resources when “idling”, they don’t bother me. They’re also always there for me when I need to test out some Postgres statistics-fetching queries for our Postgres monitoring tool called pgwatch2. The only thing that could pester you a bit: if you also run Postgres on the host machine and want to take a look at a process listing to figure out what it’s doing (e.g. ps -efH | grep postgres), the “in container” processes show up and somewhat “litter” the picture.

Slonik in a box - a quickstart

OK, so I want to benefit from those light-weight pre-built “all-inclusive” database images that everyone is talking about and launch one - how do I get started? Which images should I use?

As always, you can’t go wrong with the official stuff - and luckily, the PostgreSQL project provides all modern major versions (going back all the way to v8.4, released in 2009!) via the official Docker Hub. You also need to know some “Docker foo”. For a simple test run, you usually want something similar to what you can see in the code below.

NB! As a first step, you need to install the Docker runtime / engine (if it is not already installed). I’ll not be covering that, as it should be a simple process of following the official documentation line by line.

Also note: when launching images, we always need to explicitly expose or “remap” the default Postgres port to a free port of our preference. Ports are the “service interface” for Docker images, over which all communication normally happens. So we actually don’t need to care about how the service is internally implemented!
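A minimal launch could then look roughly like this - note that the container name “pg13” and the chosen host port are just arbitrary picks for this sketch:

    # launch Postgres 13 in the background with "trust" authentication (no password),
    # remapping the container's port 5432 to port 5432 on the host
    docker run -d --name pg13 -p 5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13

    # connect from the host (assuming a local psql client is installed)
    psql -h localhost -p 5432 -U postgres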

Note that you don’t have to actually use “trust” authentication, but can also set a password for the default “postgres” superuser via the POSTGRES_PASSWORD env variable.
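For example (the password value is of course just a placeholder):

    docker run -d --name pg13 -p 5432:5432 -e POSTGRES_PASSWORD=mysecretpassword postgres:13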

Once you’ve had enough of Slonik’s services for the time being, just throw away the container and all the stored tables / files etc with the following code:
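    # stop the container and remove it, together with its writable data layer
    # (assuming the container was named "pg13" as above)
    docker stop pg13 && docker rm pg13
    # or, in one step:
    docker rm -f pg13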

Couldn’t be any simpler!

NB! I could also have explicitly marked the launched container as “temporary” with the ‘--rm’ flag, so that any data remnants would automatically be destroyed upon stopping.
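That could look something like this:

    # --rm removes the container (and its writable layer) automatically once it stops
    docker run --rm -d --name pg13 -p 5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13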

Peeking inside the container

Now that we have seen how basic container usage works, complete Docker beginners might get curious here - how does it actually function? What is actually running down there inside the container box?

First, we should probably clear up the two concepts that people often initially mix up:

  • A Docker image: an immutable “batteries (libraries) included” software package that you can download from some public or private Docker registry, or build yourself, and that can then be “instantiated”, i.e. launched.
  • A Docker container: once we have launched an image, we’re dealing with a “live clone” that should actually be called a container! Its files can now be modified, although in theory this freedom should not be overused - or at least not directly, without volumes (see below).

Let’s make sense of this with a couple of commands:
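    docker images postgres   # immutable images, downloaded or built locally
    docker ps                # running containers, i.e. "live clones" of images
    docker ps -a             # the same, but including stopped containers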

Other common tasks when working with Docker might be:

* Checking the logs of a specific container, for example, to get more insights into query errors
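For instance, for our “pg13” container from above:

    # show everything the container has logged so far
    docker logs pg13
    # or only the newest entries, and then keep following the log
    docker logs --tail 20 -f pg13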

* Listing the IP address of the container
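A quick and dirty way, plus a more precise one using a Go template (again assuming our “pg13” container):

    docker inspect pg13 | grep IPAddress
    # or, extracting exactly the field we want:
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' pg13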

Note that by default, all Docker containers can talk to each other, since they get assigned to the default subnet of 172.17.0.0/16. If you don’t like that, you can also create custom networks to cordon off some containers - containers on the same custom network can then also reach each other by container name, as sketched below!
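A sketch of such a setup (the network name “my_pg_net” is just a placeholder):

    # create a custom bridge network and attach a new container to it
    docker network create my_pg_net
    docker run -d --name pg13 --network my_pg_net -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13
    # other containers on "my_pg_net" can now reach the instance via the hostname "pg13"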

* Executing custom commands on the container
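    # get an interactive shell inside the container
    docker exec -it pg13 bash
    # or run psql directly as the default "postgres" superuser
    docker exec -it pg13 psql -U postgres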

Note that this should be a rather rare occasion, usually only necessary for troubleshooting purposes. Try not to install new programs or change files directly, as this kind of defeats the concept of immutability. Luckily, in the case of the official Postgres images you can easily do so if needed, since everything runs under “root” and the Debian package repositories are still connected - something many images remove, in order to prevent all sorts of maintenance nightmares.

Here is an example of how to install a 3rd party extension. By default, we only get the “contrib extensions” that are part of the official Postgres project.
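A sketch, picking the hypopg extension as an example - the exact package name (here for v13) is an assumption on my part and is best verified first with ‘apt-cache search postgresql-13’:

    # install the extension package from the connected PGDG Debian repo
    docker exec -it pg13 bash -c "apt-get update && apt-get install -y postgresql-13-hypopg"
    # then activate it from the host
    psql -h localhost -U postgres -c "CREATE EXTENSION hypopg"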

* Changing the PostgreSQL configuration

Quite often when doing application testing, you want to measure how much time queries really take - i.e. measure things from the DB engine side via the indispensable “pg_stat_statements” extension. This can be set up relatively easily, without going “into” the container - starting from Postgres version 9.5, to be exact...
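The whole procedure could look roughly like this:

    # ALTER SYSTEM writes to postgresql.auto.conf, so no need to touch any
    # files inside the container
    psql -h localhost -U postgres -c "ALTER SYSTEM SET shared_preload_libraries TO 'pg_stat_statements'"
    # the parameter only takes effect after a restart
    docker restart pg13
    psql -h localhost -U postgres -c "CREATE EXTENSION pg_stat_statements"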

Don’t forget about the volumes

As stated in the Docker documentation: “Ideally, very little data is written to a container’s writable layer, and you use Docker volumes to write data.”

The thing about a container’s data layer is that it’s not really meant to be changed! Remember, containers should be kind of immutable. Internally this works via “copy-on-write”, implemented by a bunch of different storage drivers that have varied across historical Docker runtime versions, with further differences springing from the host OS. It can get quite complex and, most importantly, slow on the disk-access level due to the “virtualized” file-access layer. It’s best to listen to what the documentation says and set up volumes for your data to begin with.

Aha, but what exactly are volumes? They’re directly connected, persistent OS folders where Docker tries to stay out of the way as much as possible, so that you don’t lose out on file system performance and features. The latter is not really guaranteed, though, and can be platform-dependent - things might look a bit hairy especially on Windows (as usual), where one nice issue comes to mind. The most important keyword here might be “persistent”: volumes don’t disappear even when a container is deleted! So they can also be used to “migrate” from one version of the software to another.

How should you use volumes in practice? There are two ways: implicit and explicit. The “fine print”, by the way, is available in the Docker documentation.
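A sketch of both variants (the volume name “pgdata” is again just a placeholder, and container names must be unique, so remove the previous container before trying the second command):

    # implicit: the VOLUME declaration baked into the image makes Docker create
    # an anonymous volume for the data directory automatically
    docker run -d --name pg13 -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13
    docker volume ls

    # explicit: name the volume yourself (or bind-mount a host folder instead)
    docker run -d --name pg13 -v pgdata:/var/lib/postgresql/data -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13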

Also, note that we actually need to know beforehand which paths should be directly accessed, i.e. “volumized”! How can you find out such paths? You could start from the Docker Hub “postgres” page, or locate the instruction files (Dockerfiles) used to build the Postgres images and search for the “VOLUME” keyword. For Postgres, the latter can be found in the docker-library/postgres repository on GitHub.
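Or simply ask the image itself:

    # show the volume declarations baked into the image
    docker image inspect -f '{{json .Config.Volumes}}' postgres:13
    # {"/var/lib/postgresql/data":{}}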

Some drops of tar - big benefits possible, with some drawbacks

To tie things up - if you like containers in general and also need to run some PostgreSQL services: go ahead! Containers can be made to work pretty well, and for bigger organizations running hundreds of PostgreSQL services they can actually make life a lot easier and more standardized once everything has been automated. Most of the time, the containers won’t bite you.

But at the same time, you had better be aware of the pitfalls:

  • Docker images and the whole concept of containers are optimized for a lightning-fast and slim startup experience, so that by default even the data is not properly separated into its own persistence unit - which, for databases, could end in catastrophe if appropriate measures (volumes) are not put into place.
  • Using containers won’t give you any automatic and magical high-availability capabilities. This usually comes provided by the container framework - either via some simple “stateful sets”, or more advanced “operators”, or via cleverly bundled database and “bot” images which rely on a central highly-available consensus database.
  • Life will be relatively easy only when you go “all in” on some container management framework like Kubernetes, and additionally select some “operator” software (Zalando and Crunchy Postgres operators are the most popular ones, I believe) to take care of the nitty-gritty.
  • Batteries are not included: you pretty much only get persistence and a running Postgres major version! For example, a very common task - major version upgrades - is surprisingly out of scope for the default Postgres images! It is also out of scope for some Kubernetes operators - this means you need to be ready to get your hands dirty and create some custom intermediate images, or find some 3rd party ones like Spilo.

TL;DR

I don’t want to sound like a Luddite again, but before going “all in” on containers, you should acknowledge two things. One: there are major benefits to production-level database containers only if you’re using some container automation platform like Kubernetes. Two: the benefits only come if you’re willing to make yourself somewhat dependent on 3rd party software vendors. Those vendors are not out to simplify the life of smaller shops, but rather cater to bigger “K8s for the win” organizations, and often encode that way of thinking into their frameworks - which might not align well with your way of doing things.

Also, not all aspects of the typical database lifecycle are well covered. My recommendation: if things currently work for you “as is”, and you’re not 100% migrating to some container-orchestration framework for all the other parts of your software stack, be aware that you’re mostly winning in ease of initial deployment, and typically also in automatic high-availability (which is great, of course!) - but not necessarily in all aspects of the whole lifecycle (fast major version upgrades, backups the way you like them, access control, etc.).

On the other hand - if you feel comfortable with some container framework like Kubernetes and/or can foresee that you’ll be running oodles of database instances, give it a go - after you research possible problem points, of course.

On the positive side - since I am in communication with a pretty wide crowd of DBA’s, I can say that many bigger organizations do not want to look back at the traditional way of running databases after learning to trust containers.

Anyway, it went a bit long - thanks for reading, and please do let me know in the comments section if you have some thoughts on the topic!

 



One response to “Running Postgres in Docker - why and how?”

  1. Great article - it gave me a few insights on the topic. In my company we started developing an application, and at the beginning we had this discussion about whether we should put the database in a container or not. My colleague, who is an IT specialist, was against the idea because of the points you mentioned (immutability etc.), but the IoT service provider supporting us convinced my team to go for it. We are using Postgres with volumes, as you suggested. From the development point of view it has worked perfectly. We will launch the application soon, and in a few weeks I’ll be able to give feedback regarding the production environment. Again, thanks for the great post.
