pg_timetable: Advanced PostgreSQL scheduling

01.2020 | Category: Performance | Tags: automation, pg_timetable

01.2020

UPDATED 14.02.2023: pg_timetable is a brand-new job scheduler for PostgreSQL implemented completely from scratch (in Go). Traditional job scheduling is quite limited and does not cover all we needed in real life. Therefore, we thought about job execution in more detail and came up with a new concept that has not been seen in the PostgreSQL world before. pg_timetable does not simply execute some simple SQL statements periodically. It has been designed to execute entire sequences of operations at a time while offering some other important features we will discuss in this post.

Table of Contents

Installing and running pg_timetable

You can run pg_timetable in 2 ways:

Container
Standalone application

Both methods are described on our Github page so it should be easy for most people to run pg_timetable and to give it a try.

For the sake of simplicity we assume that you have installed our new PostgreSQL job scheduler and everything is up and running. So let's take a look and see how pg_timetable can be configured.

Base tasks and chains

The core idea is to allow users to execute more complicated sequences of operations. Now what does that mean? Suppose you want to …

Start a transaction
Write a log entry to the database
Download a file
Import the file
Run aggregations
Commit the transaction

A sequence like that is quite common. Let us take a look at those steps in pg_timetable terminology: Each step on the way is a “base task”. There are various types of base tasks:

Built-in tasks (such as “send email”, start transaction, etc.)
SQL tasks (some database instruction)
Program tasks (some external executables)

Those tasks can be combined and turned into chains. A base task can take parameters which are added at runtime. What does that mean? Suppose you want to send an email once a day. Who is going to receive this email? This kind of information can be passed to chains / base tasks scheduled for execution.

Configuring pg_timetable: An example

Let us take a look at an easy pg_timetable configuration examples:

-- Run public.my_func() at 00:05 every day in August:
SELECT timetable.add_job('execute-func', '5 0 * 8 *', 'SELECT public.my_func()');

-- Run VACUUM at minute 23 past every 2nd hour from 0 through 20 every day:
SELECT timetable.add_job('run-vacuum', '23 0-20/2 * * *', 'VACUUM');

-- Refresh materialized view every 2 hours:
SELECT timetable.add_job('refresh-matview', '@every 2 hours', 
  'REFRESH MATERIALIZED VIEW public.mat_view');

-- Clear log table after pg_timetable restart:
SELECT timetable.add_job('clear-log', '@reboot', 'TRUNCATE public.log');

-- Reindex at midnight on Sundays with reindexdb utility:

--  using default database under default user (no command line arguments)
SELECT timetable.add_job('reindex-job', '0 0 * * 7', 'reindexdb', job_kind := 'PROGRAM');

--  specifying target database and tables, and be verbose
SELECT timetable.add_job('reindex-job', '0 0 * * 7', 'reindexdb',
          '['--table=foo', '--dbname=postgres', '--verbose']'::jsonb, 'PROGRAM');

--  passing password using environment variable through bash shell
SELECT timetable.add_job(job_name := 'reindex-job', job_schedule := '0 0 * * 7', 
    job_command := 'bash', job_max_instances := 1, job_exclusive := true,
    job_parameters := '['-c', 'PGPASSWORD=5m3R7K4754p4m'] reindexdb -U postgres -h 
192.168.0.221 -v'::jsonb,
    job_kind := 'PROGRAM', job_self_destruct := false, 
job_ignore_errors := true, job_live := true);

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

-- Run public.my_func() at 00:05 every day in August:

SELECT timetable.add_job('execute-func', '5 0 * 8 *', 'SELECT public.my_func()');

-- Run VACUUM at minute 23 past every 2nd hour from 0 through 20 every day:

SELECT timetable.add_job('run-vacuum', '23 0-20/2 * * *', 'VACUUM');

-- Refresh materialized view every 2 hours:

SELECT timetable.add_job('refresh-matview', '@every 2 hours',

'REFRESH MATERIALIZED VIEW public.mat_view');

-- Clear log table after pg_timetable restart:

SELECT timetable.add_job('clear-log', '@reboot', 'TRUNCATE public.log');

-- Reindex at midnight on Sundays with reindexdb utility:

-- using default database under default user (no command line arguments)

SELECT timetable.add_job('reindex-job', '0 0 * * 7', 'reindexdb', job_kind := 'PROGRAM');

-- specifying target database and tables, and be verbose

SELECT timetable.add_job('reindex-job', '0 0 * * 7', 'reindexdb',

'['--table=foo', '--dbname=postgres', '--verbose']'::jsonb, 'PROGRAM');

-- passing password using environment variable through bash shell

SELECT timetable.add_job(job_name := 'reindex-job', job_schedule := '0 0 * * 7',

job_command := 'bash', job_max_instances := 1, job_exclusive := true,

job_parameters := '['-c', 'PGPASSWORD=5m3R7K4754p4m'] reindexdb -U postgres -h

192.168.0.221 -v'::jsonb,

job_kind := 'PROGRAM', job_self_destruct := false,

job_ignore_errors := true, job_live := true);

These are the simplest one-task chain examples, so-called jobs. A job is a chain with only one task according to our terminology.

There are a couple of interesting things here:

job_max_instances tells us if this job must not run concurrently. Set the vaslue to one to specify there is only single running job instance allowed at a time
job_self_destruct = false means that the chain will survive its own execution. If you set this to true, a chain is only executed once (in case it succeeds).
job_exclusive means that other chains doing something should wait until the end of this chain execution.

Detailed description of the add_job() function is available in the manual.

If you want to see more complex multi-step chain examples, please, visit our Samples folder in the repo and appropriate Samples section in the manual.

Using pg_timetable concurrency protection

Concurrency protection is an important aspect. Suppose you want to backup a database daily. What is going to happen if the backup takes 60 hours to complete? At some point, your server is going to die because jobs pile up. Or maybe backup files will start to overwrite each other. In short: If you want to make sure that only one backup at a time is allowed, pg_timetable concurrency protection is what you want. The max_instances = 1 setting is the easiest way to do that. There is no more need for pid files, no need for anything else. pg_timetable does it all for you.

Utilizing self-destructive chains

More often than not, it happens that a job should only be done once. The idea is simple: Suppose somebody gets a 7-day free trial. After 7 days, you want to shut down the person's account. Obviously, shutting down somebody's account is not done daily - it happens once, and if the process was successful, that was basically it.
Self-destructive chains are especially useful for asynchronous execution. What do I mean by that? Suppose you have a website and you press a button to launch a job that runs for 2 hours. Clearly, nobody is going to sit in front of the browser window and wait. What you do instead is to tell pg_timetable to execute stuff as soon as possible and kill the job as soon as it is ended successfully. Asynchronous execution is the prime use case.

High load

In case you're wondering how many simultaneous jobs can pg_timetable run in one session, we have two special examples for you: one simulates many parallel chains and another shows how to handle many tasks within a chain.

What is even better, the later shows you how to check the result code and output of a previous task. If the last task failed, that is possible only if ignore_error boolean = true is set for that task. Otherwise, a scheduler will stop the chain. This sample shows how to calculate failed, successful, and the total number of tasks executed. Based on these values, we can calculate the success ratio:

...
 [INFO] [chain:1] [task:497] [txid:499374] Starting task
 [INFO] [chain:1] [task:497] [txid:499374] Task executed successfully
 [INFO] [chain:1] [task:498] [txid:499374] Starting task
[ERROR] [chain:1] [task:498] [sql:SELECT 1.0 / round(random())::int4;] 
[args:[]] [err:ERROR: division by zero (SQLSTATE 22012)] [pid:9616] [time:505.3µs] [txid:499374] Query
[ERROR] [chain:1] [task:498] [error:ERROR: division by zero (SQLSTATE 22012)] [txid:499374] Task execution failed
 [INFO] [chain:1] [task:498] [txid:499374] Ignoring task failure
 [INFO] [chain:1] [task:499] [txid:499374] Starting task
 [INFO] [chain:1] [task:499] [txid:499374] Task executed successfully
 [INFO] [chain:1] [task:500] [txid:499374] Starting task
[ERROR] [chain:1] [task:500] [sql:SELECT 1.0 / round(random())::int4;] 
[args:[]] [err:ERROR: division by zero (SQLSTATE 22012)] [pid:9616] [time:2.1592ms] [txid:499374] Query
[ERROR] [chain:1] [task:500] [error:ERROR: division by zero (SQLSTATE 22012)] [txid:499374] Task execution failed
 [INFO] [chain:1] [task:500] [txid:499374] Ignoring task failure
 [INFO] [chain:1] [task:501] [txid:499374] Starting task
 [INFO] [chain:1] [task:501] [txid:499374] Task executed successfully
 [INFO] [chain:1] [task:1] [txid:499374] Starting task
 [INFO] [notice:Tasks executed:500; succeeded: 274; failed: 226; ratio: 54.8000000000000000] [severity:NOTICE] Notice received
 [INFO] [chain:1] [task:1] [txid:499374] Task executed successfully
 [INFO] [chain:1] [txid:499374] Chain executed successfully
...

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

...

[INFO] [chain:1] [task:497] [txid:499374] Starting task

[INFO] [chain:1] [task:497] [txid:499374] Task executed successfully

[INFO] [chain:1] [task:498] [txid:499374] Starting task

[ERROR] [chain:1] [task:498] [sql:SELECT 1.0 / round(random())::int4;]

[args:[]] [err:ERROR: division by zero (SQLSTATE 22012)] [pid:9616] [time:505.3µs] [txid:499374] Query

[ERROR] [chain:1] [task:498] [error:ERROR: division by zero (SQLSTATE 22012)] [txid:499374] Task execution failed

[INFO] [chain:1] [task:498] [txid:499374] Ignoring task failure

[INFO] [chain:1] [task:499] [txid:499374] Starting task

[INFO] [chain:1] [task:499] [txid:499374] Task executed successfully

[INFO] [chain:1] [task:500] [txid:499374] Starting task

[ERROR] [chain:1] [task:500] [sql:SELECT 1.0 / round(random())::int4;]

[args:[]] [err:ERROR: division by zero (SQLSTATE 22012)] [pid:9616] [time:2.1592ms] [txid:499374] Query

[ERROR] [chain:1] [task:500] [error:ERROR: division by zero (SQLSTATE 22012)] [txid:499374] Task execution failed

[INFO] [chain:1] [task:500] [txid:499374] Ignoring task failure

[INFO] [chain:1] [task:501] [txid:499374] Starting task

[INFO] [chain:1] [task:501] [txid:499374] Task executed successfully

[INFO] [chain:1] [task:1] [txid:499374] Starting task

[INFO] [notice:Tasks executed:500; succeeded: 274; failed: 226; ratio: 54.8000000000000000] [severity:NOTICE] Notice received

[INFO] [chain:1] [task:1] [txid:499374] Task executed successfully

[INFO] [chain:1] [txid:499374] Chain executed successfully

...

Making use of cron-style and interval configuration

To make configuration easier, we have introduced a cron-style configuration interface:

-- CRON-Style
-- * * * * * command to execute
-- ┬ ┬ ┬ ┬ ┬
-- │ │ │ │ │
-- │ │ │ │ └──── day of the week (0 - 7) (Sunday to Saturday)(0 and 7 is Sunday);
-- │ │ │ └────── month (1 - 12)
-- │ │ └──────── day of the month (1 - 31)
-- │ └────────── hour (0 - 23)
-- └──────────── minute (0 - 59)

1

2

3

4

5

6

7

8

9

-- CRON-Style

-- * * * * * command to execute

-- ┬ ┬ ┬ ┬ ┬

-- │ │ │ │ │

-- │ │ │ │ └──── day of the week (0 - 7) (Sunday to Saturday)(0 and 7 is Sunday);

-- │ │ │ └────── month (1 - 12)

-- │ │ └──────── day of the month (1 - 31)

-- │ └────────── hour (0 - 23)

-- └──────────── minute (0 - 59)

But what if you want to have enhanced control over jobs? It's not that unusual one cannot foresee the execution time of some complicated operations, like REFRESH MATERIALIZED VIEW, CREATE INDEX CONCURRENTLY, etc. It's wise to run the next iteration after the some period, let say in an hour, or a day. In this case pg_timetable provides special interval format:

'@after 2 hours' will start the next chain iteration in two hours after the finish of the previous run;
while '@every 2 hours' will spin a new chain run every two hours no matter if and when the previous run is finished;
and a special '@reboot' value will run the chain at the very beginning of a pg_timetable session. Pay attention: not the server reboot! Rotate logs, install updates, check system status are the good examples of things you might want to do during start-up.

Finally …

If you like pg_timetable, we are looking forward to receive you feedback, or maybe even feature requests, bug reports and contributions. We also want to point out that commercial support for pg_timetable is available internationally.

If you want to read something else right now, we can recommend our post about pg_permissions which is available here.

7 responses to “pg_timetable: Advanced PostgreSQL scheduling”

Denn Sporcic says:

January 29, 2020 at 10:12 pm

Fantastic work team.
I really don't understand why you are implementing core system features in Go.
Why is this not done C ?
Cheers

Reply
- Pavlo Golub says:
  
  January 30, 2020 at 7:12 am
  
  1. Goroutines
  2. Cross-platform support
  3. Easy parallelism management
  
  Thanks for your kind words!
  
  Reply
Rafael Ponte says:

April 26, 2020 at 2:32 am

Very nice, thanks for sharing it.

One question: why didn't you guys use the PGQ Extension? I think it's very mature it could solve this issue as well.

Reply
- Pavlo Golub says:
  
  April 27, 2020 at 8:33 am
  
  Hello Rafael. Thanks for your comment.
  
  Would you be kind to provide some examples on how PGQ can be used for scheduling? They are completely different, unless we missed something.
  
  Thanks
  
  Reply
Shaba says:

October 17, 2024 at 1:44 pm

Hi, I am new to PostgreSQL as I come from a SQL Server environment.
I could not get the Windows installer to work but managed to install pg_timetable using the git clone option.

I do like the tool but have a question regarding the scheduling.
Do I have to have a command prompt open with the scheduling enabled for it to work (one per database?), or is there another option, i.e., backgroup worker, etc.? As I don't like keeping the command prompt open and running on the LIVE environment as it could be easy closed by someone without realising?

Reply
- Laurenz Albe says:
  
  October 18, 2024 at 5:38 pm
  
  No; that would be pretty useless.
  You just add the job, then you can close the database connection, and the job will be started all by itself.
  
  Reply
Stephan Borsodi says:

January 17, 2025 at 8:43 am

Inspired by Shaba's question:
I am a senior Oracle database solution engineer "brought up" on Unix systems ... and I have "some difficulties" on Windows based systems 😉
I could not install the pg_timetable service on my Windows 11 system. Hence, "Prof. ChatGPT" recommended me to use nssm (Non-Sucking Service Manager - download the appropriate version for your system from: https://nssm.cc/download ). Steps:
1. Extract the archive and copy the nssm.exe file to a location in your PATH
(e.g., C:\Windows\System32).
2. Create the Service in cmd.exe as Administrator, e.g.:
nssm install postgresql_pg_timetable_worker001
3. Add the necessary arguments, such as:
Path: Browse to the pg_timetable.exe executable
Parameters, e.g.: --host=localhost --port=5432 --dbname=mydb --clientname=worker001 --user=scheduler --password=mydb
You may add a description for the service, define the startmode etc.
4. Finalize by clicking on "Add service"
5. You might check the service by calling services.msc (in your cmd.exe session)

And yessss, it works perfectly 🙂 - both the Windows service and pg_timetable!

Reply

pg_timetable: Advanced PostgreSQL scheduling

Installing and running pg_timetable

Base tasks and chains

Configuring pg_timetable: An example

There are a couple of interesting things here:

Using pg_timetable concurrency protection

Utilizing self-destructive chains

High load

Making use of cron-style and interval configuration

Finally …

7 responses to “pg_timetable: Advanced PostgreSQL scheduling”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts