As of version 15, PostgreSQL offers you the ability to write the server log in JSON format. This feature has been desired for a long time and has finally made it to PostgreSQL core. In this post we will discuss how JSON logs can be configured and what this brand new feature does for users.

Configuring JSON logs in PostgreSQL

If you want to ensure that your database server produces log in JSON format, you first have to make some adjustments to postgresql.conf or user ALTER SYSTEM to make changes to postgresql.auto.conf.

Basically, a handful of settings are needed to teach PostgreSQL what we want:

#------------------------------------------------------------------------------
# REPORTING AND LOGGING
#------------------------------------------------------------------------------

# - Where to Log -

log_destination = 'jsonlog' # Valid values are combinations of
# stderr, csvlog, jsonlog, syslog, and
# eventlog, depending on platform.
# csvlog and jsonlog require
# logging_collector to be on.

# This is used when logging to stderr:
logging_collector = on # Enable capturing of stderr, jsonlog,
# and csvlog into log files. Required
# to be on for csvlogs and jsonlogs.
# (change requires restart)

# These are only used if logging_collector is on:
log_directory = 'log' # directory where log files are written,
# can be absolute or relative to PGDATA
log_filename = 'postgresql-%a.log' # log file name pattern,
# can include strftime() escapes

First of all, you need to adjust the log_destination to “jsonlog”. The default setting here is stderr which simply writes logs in the standard text format which has been in use for years. In most cases, we want the log to be written to a file, so we turn the logging collector on. Voila, we are done. To enable log rotation, I changed the log_filename to postgresql-%a.log which ensures that we produce a logfile for every day of the week.

Once this is done, a simple restart will enable those settings. Actually, jsonlog does not require a restart – however, enabling the logging collector does. Usually this does not cause any additional downtime because those settings are in most cases set when the server is deployed. Turning the logging collector on later is the exception rather than the rule.

Checking the JSON logs

The server is now going to produce the desired files. What you’ll see are actually two files:

[hs@hansmacbook log]$ ls -l
total 16
-rw------- 1 hs staff 2039 Nov 4 08:50 postgresql-Fri.json
-rw------- 1 hs staff 169 Nov 4 08:50 postgresql-Fri.log

Why is that the case? Let’s dive in and figure out what’s going on:

[hs@hansmacbook log]$ cat postgresql-Fri.log 
2022-11-04 08:50:59.000 CET [32183] LOG: ending log output to stderr
2022-11-04 08:50:59.000 CET [32183] HINT: Future log output will go to log destination "jsonlog".

The .log file is created before the JSON machinery starts. You’ll simply find two lines in there – nothing more. The bulk of the log will be sent to the .json file, which is the expected behavior:

[hs@hansmacbook log]$ head postgresql-Fri.json 
{"timestamp":"2022-11-04 08:50:59.000 CET","pid":32183,"session_id":"6364c462.7db7","line_num":1,"session_start":"2022-11-04 08:50:58 CET","txid":0,"error_severity":"LOG","message":"ending log output to stderr","hint":"Future log output will go to log destination \"jsonlog\".","backend_type":"postmaster","query_id":0}
{"timestamp":"2022-11-04 08:50:59.000 CET","pid":32183,"session_id":"6364c462.7db7","line_num":2,"session_start":"2022-11-04 08:50:58 CET","txid":0,"error_severity":"LOG","message":"starting PostgreSQL 15.0 on x86_64-apple-darwin21.6.0, compiled by Apple clang version 13.1.6 (clang-1316.0.21.2.5), 64-bit","backend_type":"postmaster","query_id":0}
...

Use a tool to read JSON logs

Reading a tightly packed file containing millions of JSON documents is not really user-friendly, so I recommend using a tool such as “jq” to make this stream more readable and easier to process:

[hs@hansmacbook log]$ tail -f postgresql-Fri.json | jq
{
"timestamp": "2022-11-04 08:50:59.000 CET",
"pid": 32183,
"session_id": "6364c462.7db7",
"line_num": 1,
"session_start": "2022-11-04 08:50:58 CET",
"txid": 0,
"error_severity": "LOG",
"message": "ending log output to stderr",
"hint": "Future log output will go to log destination \"jsonlog\".",
"backend_type": "postmaster",
"query_id": 0
}
{
"timestamp": "2022-11-04 08:50:59.000 CET",
"pid": 32183,
"session_id": "6364c462.7db7",
"line_num": 2,
"session_start": "2022-11-04 08:50:58 CET",
"txid": 0,
"error_severity": "LOG",
"message": "starting PostgreSQL 15.0 on x86_64-apple-darwin21.6.0, compiled by Apple clang version 13.1.6 (clang-1316.0.21.2.5), 64-bit",
"backend_type": "postmaster",
"query_id": 0
}
{
"timestamp": "2022-11-04 08:50:59.006 CET",
"pid": 32183,
"session_id": "6364c462.7db7",
"line_num": 3,
"session_start": "2022-11-04 08:50:58 CET",
"txid": 0,
"error_severity": "LOG",
"message": "listening on IPv6 address \"::1\", port 5432",
"backend_type": "postmaster",
"query_id": 0
}
...

Check system messages

What’s really important is that a line of JSON does not always contain the same number of fields. System messages, etc will show different information than query-related stuff. This is done for efficiency reasons, as the following log snippet reveals:

test=# SELECT 1 / 0;
ERROR: division by zero
test=# \q
[hs@hansmacbook log]$ tail -n2 postgresql-Fri.json | jq
{
"timestamp": "2022-11-04 08:55:59.017 CET",
"pid": 32185,
"session_id": "6364c463.7db9",
"line_num": 2,
"session_start": "2022-11-04 08:50:59 CET",
"txid": 0,
"error_severity": "LOG",
"message": "checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.005 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB",
"backend_type": "checkpointer",
"query_id": 0
}
{
"timestamp": "2022-11-04 09:05:53.346 CET",
"user": "hs",
"dbname": "test",
"pid": 32266,
"remote_host": "[local]",
"session_id": "6364c7dd.7e0a",
"line_num": 1,
"ps": "SELECT",
"session_start": "2022-11-04 09:05:49 CET",
"vxid": "3/60",
"txid": 0,
"error_severity": "ERROR",
"state_code": "22012",
"message": "division by zero",
"statement": "SELECT 1 / 0;",
"application_name": "psql",
"backend_type": "client backend",
"query_id": 0
}

The second message contains information about the database (which is not relevant to the checkpoint message before), the user and a lot more. You need to be aware of this when processing the JSON document for later use.

JSON logs: Efficiency matters

Keep in mind that producing log in JSON format is a lot more verbose than, say, the standard text log. If you happen to log system events and errors only, this is not that much of an issue. However, it is a major problem if you happen to log every single query. You need to be aware of the fact that producing millions and millions of lines does impact performance and space consumption.

Finally …

PostgreSQL offers many additional features which are particularly useful if you are dealing with a large data set. One of those features is called “GROUPING SETS”. I’ve written a blog to help you understand how cool this feature really is. Check out my performance-related presentation about GROUP BY CUBE on YouTube which will give you a quick insider view.

If you want to start building PostgreSQL applications more quickly, I also recommend checking out our CYPEX video tutorials. The tutorials will help you to build powerful replacements for Oracle APEX and Oracle Forms.