caged elephant
© Laurenz Albe 2018

 

In a recent wrestling match with the Linux “out-of-memory killer” for a Cybertec customer, I got acquainted with Linux control groups (“cgroups”), and I want to give you a short introduction to how they can be used with PostgreSQL and discuss their usefulness.

Warning: This was done on my RedHat Fedora 27 system running Linux 4.16.5 with cgroups v1 managed by systemd version 234. Both cgroups and systemd’s handling of them seem to be undergoing changes, so your mileage may vary considerably. Still, it should be a useful starting point if you want to explore cgroups.

What are Linux cgroups?

From the cgroups manual page:

Control groups, usually referred to as cgroups, are a Linux kernel feature which allow processes to be organized into hierarchical groups whose usage of various types of resources can then be limited and monitored.

cgroups are managed with commands whose names start with “cg” (from the libcgroup tools), but they can also be managed through a special cgroup file system and through systemd.

Now a running PostgreSQL cluster is a group of processes, so that’s a perfect fit.

There are several subsystems defined (also called “controllers” in cgroups terminology). Of these, the following are interesting for PostgreSQL:

  • memory: useful for limiting the total memory usage
  • blkio: useful for limiting the I/O throughput
  • cpu: useful for defining upper and lower limits on the CPU time available to the processes
  • cpuset: useful for binding the processes to a subset of the available CPU cores
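Which of these controllers your kernel provides is listed in /proc/cgroups, one controller per line with its hierarchy ID, the number of cgroups and an “enabled” flag. The following POSIX shell function is a small sketch (its name is made up for this example) that filters that list down to the four controllers discussed here:

```shell
# enabled_pg_controllers: hypothetical helper that reads text in the
# /proc/cgroups format (subsys_name, hierarchy, num_cgroups, enabled)
# from standard input and prints the PostgreSQL-relevant controllers
# whose "enabled" column is 1
enabled_pg_controllers() {
    awk '
        /^#/ { next }    # skip the header line
        $1 ~ /^(memory|blkio|cpu|cpuset)$/ && $4 == 1 { print $1 }
    '
}

# usage:
#   enabled_pg_controllers < /proc/cgroups
```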

Configuring cgroups

When the cgconfig service is enabled (see below), the cgroups defined in the /etc/cgconfig.conf configuration file are created during system startup.

Let’s create a cgroup to build a cage for a PostgreSQL cluster:

group db_cage {
    # user and group "postgres" can manage these cgroups
    perm {
        task {
            uid = postgres;
            gid = postgres;
            fperm = 774;
        }
        admin {
            uid = postgres;
            gid = postgres;
            dperm = 775;
            fperm = 774;
        }
    }

    # limit memory to 1 GB and disable swap
    memory {
        memory.limit_in_bytes = 1G;
        memory.memsw.limit_in_bytes = 1G;
    }

    # limit read and write I/O to 10MB/s each on device 8:0
    blkio {
        blkio.throttle.read_bps_device = "8:0 10485760";
        blkio.throttle.write_bps_device = "8:0 10485760";
    }

    # limit CPU time to 0.25 seconds out of each second
    cpu {
        cpu.cfs_period_us = 1000000;
        cpu.cfs_quota_us = 250000;
    }

    # only CPUs 0-3 and memory node 0 can be used
    cpuset {
        cpuset.cpus = 0-3;
        cpuset.mems = 0;
    }
}
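The numbers in this file follow from simple arithmetic. The blkio throttle is given in bytes per second, so 10 MB/s is 10 × 1024 × 1024 = 10485760. The CFS quota is the part of each cpu.cfs_period_us that the group may run, summed over all CPUs, so 25% of one CPU with a period of 1000000 µs is 250000 µs. Device 8:0 is usually the first SCSI/SATA disk (/dev/sda); `lsblk -d -o NAME,MAJ:MIN` shows the numbers on your system. A sketch with two hypothetical helper functions that compute these values:

```shell
# throttle_entry DEVICE MB_PER_SEC
# prints "MAJ:MIN BYTES" as used by blkio.throttle.read_bps_device
# and blkio.throttle.write_bps_device (1 MB = 1024 * 1024 bytes here)
throttle_entry() {
    echo "$1 $(( $2 * 1024 * 1024 ))"
}

# cfs_quota_us PERIOD_US PERCENT_OF_ONE_CPU
# prints the value for cpu.cfs_quota_us; the quota is counted across
# all CPUs, so 100 means "one full CPU" and 200 "two full CPUs"
cfs_quota_us() {
    echo $(( $1 * $2 / 100 ))
}
```

With these, `throttle_entry 8:0 10` prints `8:0 10485760` and `cfs_quota_us 1000000 25` prints `250000`, matching the configuration above.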

To activate it, run the following as root:

# /usr/sbin/cgconfigparser -l /etc/cgconfig.conf -s 1664

To have that done automatically at server start, I tell systemd to enable the cgconfig service:

# systemctl enable cgconfig
# systemctl start cgconfig
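To check that the group really exists with the intended limits, you can read its control files. With cgroups v1 they usually live below /sys/fs/cgroup/<controller>/<group>; that mount point is an assumption, and `mount -t cgroup` shows where they are on your system. `cgget -r memory.limit_in_bytes db_cage` from libcgroup-tools shows the same information. A sketch of such a check, using a hypothetical cg_expect function:

```shell
# cg_expect CONTROLLER GROUP FILE EXPECTED [ROOT]
# succeeds if the cgroup v1 control file holds EXPECTED, otherwise
# reports the mismatch on stderr; ROOT defaults to /sys/fs/cgroup
cg_expect() {
    actual=$(cat "${5:-/sys/fs/cgroup}/$1/$2/$3") || return 1
    if [ "$actual" = "$4" ]; then
        return 0
    fi
    echo "$1/$2/$3 is '$actual', expected '$4'" >&2
    return 1
}

# usage:
#   cg_expect memory db_cage memory.limit_in_bytes 1073741824
#   cg_expect cpu db_cage cpu.cfs_quota_us 250000
```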

Starting PostgreSQL in a cgroup

To start PostgreSQL in the cgroups we defined above, use the cgexec executable (you may have to install an operating system package called libcgroup or libcgroup-tools for that):

$ cgexec -g cpu,memory,blkio:db_cage \
   /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data start

We can confirm that PostgreSQL is running in the correct cgroup:

$ head -1 /var/lib/pgsql/10/data/postmaster.pid 
16284

$ cat /proc/16284/cgroup | egrep '\b(cpu|blkio|memory)\b'
10:cpu,cpuacct:/db_cage
9:blkio:/db_cage
4:memory:/db_cage
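Child processes inherit the cgroups of their parent, so every process the postmaster forks should end up in db_cage as well. A slightly stricter version of the check above could look like the following sketch; in_cage is a hypothetical helper that reads a /proc/PID/cgroup file in the v1 format shown above and succeeds only if all named controllers are attached to the given group:

```shell
# in_cage GROUP CONTROLLER...
# reads a /proc/PID/cgroup file (cgroups v1 lines of the form
# "ID:CONTROLLERS:PATH") on standard input and succeeds only if
# every CONTROLLER is attached to the path /GROUP
in_cage() {
    group=$1
    shift
    awk -F: -v group="/$group" -v wanted="$*" '
        $3 == group {
            k = split($2, c, ",")    # e.g. "cpu,cpuacct" -> cpu, cpuacct
            for (i = 1; i <= k; i++) ok[c[i]] = 1
        }
        END {
            n = split(wanted, w, " ")
            for (i = 1; i <= n; i++)
                if (!(w[i] in ok)) exit 1
        }
    '
}

# usage: check the postmaster and all of its child processes
#   pm=$(head -1 /var/lib/pgsql/10/data/postmaster.pid)
#   for pid in "$pm" $(pgrep -P "$pm"); do
#       in_cage db_cage cpu memory blkio < "/proc/$pid/cgroup" ||
#           echo "process $pid is not in db_cage"
#   done
```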

To move a running process into a cgroup, you can use cgclassify, but then you have to move every running PostgreSQL process.

Using cgroups with systemd

systemd provides a simpler interface to Linux cgroups, so you don’t have to do any of the above. systemd can create cgroups “on the fly” for the services it starts.

If your PostgreSQL service is called postgresql-10, simply create a file /etc/systemd/system/postgresql-10.service like this:

# include the original service file rather than editing it
# so that changes don't get lost during an upgrade
.include /usr/lib/systemd/system/postgresql-10.service

[Service]
# limit memory to 1GB
# sets "memory.limit_in_bytes"
MemoryMax=1G
# limit memory + swap space to 1GB
# this should set "memory.memsw.limit_in_bytes" but it only
# works with cgroups v2 ...
# MemorySwapMax=1G

# limit read I/O on block device 8:0 to 10MB per second
# sets "blkio.throttle.read_bps_device"
IOReadBandwidthMax=/dev/block/8:0 10M
# limit write I/O on block device 8:0 to 10MB per second
# sets "blkio.throttle.write_bps_device"
IOWriteBandwidthMax=/dev/block/8:0 10M

# limit CPU time to a quarter of one CPU
# sets "cpu.cfs_quota_us"
CPUQuota=25%

# there are no settings to control "cpuset" cgroups

Now you have to tell systemd that you changed the configuration and restart the service:

# systemctl daemon-reload
# systemctl restart postgresql-10
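One subtlety when translating between the two configurations: as far as I can tell from the systemd.resource-control manual, the K, M, G and T suffixes are parsed to the base of 1024 for MemoryMax=, but to the base of 1000 for IOReadBandwidthMax= and IOWriteBandwidthMax=. So the 10M in the unit file should amount to 10000000 bytes per second, slightly less than the 10485760 in /etc/cgconfig.conf. Likewise, as far as I know, systemd implements CPUQuota=25% with a period of 100 ms (cfs_period_us 100000, cfs_quota_us 25000), which is the same ratio as in the cgconfig.conf example. The hypothetical helper below expands such size values:

```shell
# systemd_bytes VALUE BASE
# hypothetical helper: expands a systemd size like "10M" or "1G" into
# bytes; pass BASE 1024 for MemoryMax= and 1000 for IO*BandwidthMax=
systemd_bytes() {
    num=${1%[KMGT]}        # the number without a unit suffix
    case $1 in
        *K) power=1 ;;
        *M) power=2 ;;
        *G) power=3 ;;
        *T) power=4 ;;
        *)  power=0 ;;
    esac
    result=$num
    while [ "$power" -gt 0 ]; do
        result=$(( $result * $2 ))
        power=$(( $power - 1 ))
    done
    echo "$result"
}
```

If you want byte-identical limits in both setups, writing the plain byte value (for example 10485760) instead of a suffixed one avoids the question altogether.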

As you can see, not all cgroup settings are available with systemd. As a workaround, you can define cgroups in /etc/cgconfig.conf and use cgexec to start the service.

How useful are cgroups for PostgreSQL?

I would say that it depends on the subsystem.

memory

At first glance, it sounds interesting to limit memory usage with cgroups. But there are several drawbacks:

  • If PostgreSQL is allowed to use swap space, it will start swapping when the memory quota is exceeded.
  • If PostgreSQL is not allowed to use swap space, the Linux OOM killer will kill PostgreSQL when the quota is exceeded (alternatively, you can configure the cgroup so that the process is paused until memory is freed, but this might never happen).
  • The memory quota also limits the amount of memory available for the file system cache.

None of this is very appealing: there is no option to make malloc fail, which PostgreSQL could handle gracefully by reporting an “out of memory” error for the affected query.

I think it is better to limit PostgreSQL’s memory footprint the traditional way, by setting shared_buffers, work_mem and max_connections so that PostgreSQL won’t use too much memory.

That also has the advantage that all PostgreSQL clusters on the machine can share the file system cache, so that clusters that need it can get more of that resource, while no cluster can become completely memory starved (everybody is guaranteed shared_buffers).
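As a rough rule of thumb (an approximation, not a limit PostgreSQL enforces), the memory needed can be estimated from these settings as shared_buffers plus max_connections times work_mem. A single query can use several work_mem-sized areas, one per sort or hash operation, and maintenance_work_mem and per-process overhead come on top, so leave a generous safety margin. A minimal sketch with a made-up function name:

```shell
# pg_mem_estimate SHARED_BUFFERS_MB MAX_CONNECTIONS WORK_MEM_MB
# hypothetical helper: rough memory estimate in MB, assuming every
# connection uses work_mem exactly once (real usage can be higher)
pg_mem_estimate() {
    echo $(( $1 + $2 * $3 ))
}
```

For example, `pg_mem_estimate 256 100 4` prints 656 (MB) for shared_buffers = 256MB, max_connections = 100 and work_mem = 4MB.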

blkio

I think that cgroups are a very useful way of limiting I/O bandwidth for PostgreSQL.

The only drawback may be that PostgreSQL cannot use more than its allotted quota, even when the I/O system is idle.
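To put a number on that drawback: a throttle is a hard ceiling, so reading a 5 GB table sequentially at the 10 MB/s used in the examples above takes at least 512 seconds (5120 MB / 10 MB/s), ignoring data that is already in the cache. A trivial hypothetical helper for such estimates, working in whole MB and seconds:

```shell
# throttled_seconds SIZE_MB MB_PER_SEC
# hypothetical helper: lower bound on the time needed to read SIZE_MB
# from disk under a blkio throttle, rounded up to whole seconds
throttled_seconds() {
    echo $(( ($1 + $2 - 1) / $2 ))
}
```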

cpu

cgroups are also a good way of limiting CPU usage by a PostgreSQL cluster.

Again, it would be nice if PostgreSQL were allowed to exceed its quota if the CPUs are idle.
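If a lower limit rather than a cap is what you are after, the cpu controller also offers cpu.shares (CPUShares= in systemd), a relative weight that only matters while sibling groups compete for CPU time, so a group can still use CPUs that would otherwise be idle. Assuming all sibling groups are busy, the share a group gets can be estimated with this hypothetical helper:

```shell
# cpu_share_percent OWN_SHARES OTHER_SHARES...
# hypothetical helper: integer percentage (rounded down) of CPU time a
# group can expect when it and all listed sibling groups are busy;
# cpu.shares is a relative weight (default 1024), not a hard limit
cpu_share_percent() {
    own=$1
    total=0
    for shares in "$@"; do
        total=$(( $total + $shares ))
    done
    echo $(( 100 * $own / $total ))
}
```

For example, a group with cpu.shares 256 next to one busy sibling with the default 1024 would get about 20% of the CPU time under full load, but could use more whenever the sibling is idle.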

cpuset

This is only useful on big machines with a NUMA architecture. On such machines, binding PostgreSQL to the CPUs and memory of one NUMA node will make sure that all memory access is local to that node and consequently fast.

You can thus partition your NUMA machine between several PostgreSQL clusters.
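To bind a cluster to one NUMA node, you need to know which CPUs belong to it; the kernel lists them in /sys/devices/system/node/nodeN/cpulist, and `numactl --hardware` shows the same layout. A sketch with a hypothetical helper that turns that list into a cpuset section for /etc/cgconfig.conf:

```shell
# cpuset_for_node NODE [SYSFS_ROOT]
# hypothetical helper: prints a cpuset section for /etc/cgconfig.conf
# that binds a group to the CPUs and memory of one NUMA node, reading
# the node's CPU list from sysfs (SYSFS_ROOT defaults to /sys)
cpuset_for_node() {
    cpus=$(cat "${2:-/sys}/devices/system/node/node$1/cpulist") || return 1
    printf 'cpuset {\n    cpuset.cpus = %s;\n    cpuset.mems = %s;\n}\n' "$cpus" "$1"
}

# usage: paste the output into a "group" section
#   cpuset_for_node 1
```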