UPDATED 14.02.2023: pg_timetable is a brand-new job scheduler for PostgreSQL implemented completely from scratch (in Go). Traditional job scheduling is quite limited and does not cover all we needed in real life. Therefore, we thought about job execution in more detail and came up with a new concept that has not been seen in the PostgreSQL world before. pg_timetable does not simply execute some simple SQL statements periodically. It has been designed to execute entire sequences of operations at a time while offering some other important features we will discuss in this post.
Installing and running pg_timetable
You can run pg_timetable in 2 ways:
- Standalone application
Both methods are described on our Github page so it should be easy for most people to run pg_timetable and to give it a try.
For the sake of simplicity we assume that you have installed our new PostgreSQL job scheduler and everything is up and running. So let’s take a look and see how pg_timetable can be configured.
Base tasks and chains
The core idea is to allow users to execute more complicated sequences of operations. Now what does that mean? Suppose you want to …
- Start a transaction
- Write a log entry to the database
- Download a file
- Import the file
- Run aggregations
- Commit the transaction
A sequence like that is quite common. Let us take a look at those steps in pg_timetable terminology: Each step on the way is a “base task”. There are various types of base tasks:
- Built-in tasks (such as “send email”, start transaction, etc.)
- SQL tasks (some database instruction)
- Program tasks (some external executables)
Those tasks can be combined and turned into chains. A base task can take parameters which are added at runtime. What does that mean? Suppose you want to send an email once a day. Who is going to receive this email? This kind of information can be passed to chains / base tasks scheduled for execution.
Configuring pg_timetable: An example
Let us take a look at an easy pg_timetable configuration examples:
-- Run public.my_func() at 00:05 every day in August: SELECT timetable.add_job('execute-func', '5 0 * 8 *', 'SELECT public.my_func()'); -- Run VACUUM at minute 23 past every 2nd hour from 0 through 20 every day: SELECT timetable.add_job('run-vacuum', '23 0-20/2 * * *', 'VACUUM'); -- Refresh materialized view every 2 hours: SELECT timetable.add_job('refresh-matview', '@every 2 hours', 'REFRESH MATERIALIZED VIEW public.mat_view'); -- Clear log table after pg_timetable restart: SELECT timetable.add_job('clear-log', '@reboot', 'TRUNCATE public.log'); -- Reindex at midnight on Sundays with reindexdb utility: -- using default database under default user (no command line arguments) SELECT timetable.add_job('reindex-job', '0 0 * * 7', 'reindexdb', job_kind := 'PROGRAM'); -- specifying target database and tables, and be verbose SELECT timetable.add_job('reindex-job', '0 0 * * 7', 'reindexdb', '["--table=foo", "--dbname=postgres", "--verbose"]'::jsonb, 'PROGRAM'); -- passing password using environment variable through bash shell SELECT timetable.add_job(job_name := 'reindex-job', job_schedule := '0 0 * * 7', job_command := 'bash', job_max_instances := 1, job_exclusive := true, job_parameters := '["-c", "PGPASSWORD=5m3R7K4754p4m"] reindexdb -U postgres -h 192.168.0.221 -v'::jsonb, job_kind := 'PROGRAM', job_self_destruct := false, job_ignore_errors := true, job_live := true);
These are the simplest one-task chain examples, so-called jobs. A job is a chain with only one task according to our terminology.
There are a couple of interesting things here:
- job_max_instances tells us if this job must not run concurrently. Set the vaslue to one to specify there is only single running job instance allowed at a time
- job_self_destruct = false means that the chain will survive its own execution. If you set this to true, a chain is only executed once (in case it succeeds).
- job_exclusive means that other chains doing something should wait until the end of this chain execution.
Detailed description of the add_job() function is available in the manual.
Using pg_timetable concurrency protection
Concurrency protection is an important aspect. Suppose you want to backup a database daily. What is going to happen if the backup takes 60 hours to complete? At some point, your server is going to die because jobs pile up. Or maybe backup files will start to overwrite each other. In short: If you want to make sure that only one backup at a time is allowed, pg_timetable concurrency protection is what you want. The
max_instances = 1 setting is the easiest way to do that. There is no more need for pid files, no need for anything else. pg_timetable does it all for you.
Utilizing self-destructive chains
More often than not, it happens that a job should only be done once. The idea is simple: Suppose somebody gets a 7-day free trial. After 7 days, you want to shut down the person’s account. Obviously, shutting down somebody’s account is not done daily – it happens once, and if the process was successful, that was basically it.
Self-destructive chains are especially useful for asynchronous execution. What do I mean by that? Suppose you have a website and you press a button to launch a job that runs for 2 hours. Clearly, nobody is going to sit in front of the browser window and wait. What you do instead is to tell pg_timetable to execute stuff as soon as possible and kill the job as soon as it is ended successfully. Asynchronous execution is the prime use case.
In case you’re wondering how many simultaneous jobs can pg_timetable run in one session, we have two special examples for you: one simulates many parallel chains and another shows how to handle many tasks within a chain.
What is even better, the later shows you how to check the result code and output of a previous task. If the last task failed, that is possible only if
ignore_error boolean = true is set for that task. Otherwise, a scheduler will stop the chain. This sample shows how to calculate failed, successful, and the total number of tasks executed. Based on these values, we can calculate the success ratio:
... [INFO] [chain:1] [task:497] [txid:499374] Starting task [INFO] [chain:1] [task:497] [txid:499374] Task executed successfully [INFO] [chain:1] [task:498] [txid:499374] Starting task [ERROR] [chain:1] [task:498] [sql:SELECT 1.0 / round(random())::int4;] [args:] [err:ERROR: division by zero (SQLSTATE 22012)] [pid:9616] [time:505.3µs] [txid:499374] Query [ERROR] [chain:1] [task:498] [error:ERROR: division by zero (SQLSTATE 22012)] [txid:499374] Task execution failed [INFO] [chain:1] [task:498] [txid:499374] Ignoring task failure [INFO] [chain:1] [task:499] [txid:499374] Starting task [INFO] [chain:1] [task:499] [txid:499374] Task executed successfully [INFO] [chain:1] [task:500] [txid:499374] Starting task [ERROR] [chain:1] [task:500] [sql:SELECT 1.0 / round(random())::int4;] [args:] [err:ERROR: division by zero (SQLSTATE 22012)] [pid:9616] [time:2.1592ms] [txid:499374] Query [ERROR] [chain:1] [task:500] [error:ERROR: division by zero (SQLSTATE 22012)] [txid:499374] Task execution failed [INFO] [chain:1] [task:500] [txid:499374] Ignoring task failure [INFO] [chain:1] [task:501] [txid:499374] Starting task [INFO] [chain:1] [task:501] [txid:499374] Task executed successfully [INFO] [chain:1] [task:1] [txid:499374] Starting task [INFO] [notice:Tasks executed:500; succeeded: 274; failed: 226; ratio: 54.8000000000000000] [severity:NOTICE] Notice received [INFO] [chain:1] [task:1] [txid:499374] Task executed successfully [INFO] [chain:1] [txid:499374] Chain executed successfully ...
Making use of cron-style and interval configuration
To make configuration easier, we have introduced a cron-style configuration interface:
-- CRON-Style -- * * * * * command to execute -- ┬ ┬ ┬ ┬ ┬ -- │ │ │ │ │ -- │ │ │ │ └──── day of the week (0 - 7) (Sunday to Saturday)(0 and 7 is Sunday); -- │ │ │ └────── month (1 - 12) -- │ │ └──────── day of the month (1 - 31) -- │ └────────── hour (0 - 23) -- └──────────── minute (0 - 59)
But what if you want to have enhanced control over jobs? It’s not that unusual one cannot foresee the execution time of some complicated operations, like
REFRESH MATERIALIZED VIEW,
CREATE INDEX CONCURRENTLY, etc. It’s wise to run the next iteration after the some period, let say in an hour, or a day. In this case pg_timetable provides special interval format:
'@after 2 hours'will start the next chain iteration in two hours after the finish of the previous run;
'@every 2 hours'will spin a new chain run every two hours no matter if and when the previous run is finished;
- and a special
'@reboot'value will run the chain at the very beginning of a pg_timetable session. Pay attention: not the server reboot! Rotate logs, install updates, check system status are the good examples of things you might want to do during start-up.
If you like pg_timetable, we are looking forward to receive you feedback, or maybe even feature requests, bug reports and contributions. We also want to point out that commercial support for pg_timetable is available internationally.
If you want to read something else right now, we can recommend our post about pg_permissions which is available here.