Hello, my name is Pavlo Golub, and I am a scheduler addict. That began when I implemented pg_timetable for PostgreSQL. I wrote a lot about it. In this post, I want to share the result of my investigations on the schedulers available for PostgreSQL. I gave a talk about this topic at the CERN meetup, so you may want to check it out for more details.

Comparison table of PostgreSQL schedulers

Let’s start with the complete comparison table. If you want to know more about each aspect, you’ll find further explanations below.

I would like to get comments and suggestions on this table, especially from developers or users of these products. I can be biased towards my own creation, so please don’t judge me too harshly. 🙂

Feature\Productpg_timetablepg_cronpgAgentjpgAgentpgbucket
Architecture
Year20192016200820162015
Implementationstandalonebgworkerstandalonestandalonestandalone
LanguageGoCC++JavaC++
Can operate w\o extension✔️✔️
Jobs meta stored indatabasedatabasedatabasedatabasefile
Remote Database Execution✔️✔️✔️✔️
Cross Platform✔️✔️✔️✔️
Functionality
SQL tasks✔️✔️✔️✔️✔️
Program/Shell tasks✔️✔️✔️✔️
Built-in tasks✔️
Parallel Jobs✔️✔️✔️✔️✔️
Parallel Jobs Limit✔️✔️
Concurrency protection✔️✔️
Task Parameters✔️
Arbitrary Role✔️✔️✔️✔️
On Success Task✔️✔️✔️
On Error Task✔️✔️
Scheduling
Standard Cron✔️✔️✔️
Interval✔️✔️
On Reboot✔️
Start Manually✔️✔️
Kill Running Job✔️✔️✔️
Job Timeout✔️✔️
Task Timeout✔️
Disable Job✔️✔️✔️✔️
Auto Job Disable✔️
Self-Destructive Jobs✔️
Logging Levels
Job✔️
Task✔️
Session✔️
Logging Destinations
stdout/stderr✔️
file✔️
database✔️

PostgreSQL Schedulers Architecture

Implementation

The first important aspect of each scheduler is its implementation. Either it is a standalone client or a background worker.
A standalone client can be run on any host or platform. The only requirement for such architecture is the ability to connect to the target PostgreSQL server.
On the other hand, background worker implementation requires it to be one of the PostgreSQL processes. That, in turn, means you need to change the shared_preload_libraries configuration parameter and restart the server. But in that case, the scheduler doesn’t need to connect to PostgreSQL (it can, though) and can use a unique SPI protocol.
There is no right or wrong choice; each of these choices has pros and cons. The proper solution depends on the user and the environment.

Extension

Each of the schedulers heavily relies on a database infrastructure. Specific purpose tables, functions, and views are a common thing. So it’s up to the developer to decide how to organize this set of objects in a database. Database extensions are one of the obvious ways of doing so. The difference reveals itself during upgrades: either the user should run ALTER EXTENSION name UPDATE for extension-based deployments, or the scheduler is responsible for updating itself.
Right now, only one of the five schedulers can store jobs and task descriptions from the database: pgbucket can store jobs and tasks in files.

Functionality

Tasks

Each database scheduler is, at a minimum, supposed to be able to execute SQL tasks. The ability to run external programs or shell commands is a big plus. For some everyday tasks, e.g., send mail, log, copy table, etc., we’ve implemented them as productive internal tasks in pg_timetable.

Parallelization

An essential aspect of scheduler operation is the capability

  • to run several jobs simultaneously,
  • to control simultaneous job execution,
  • to limit how many instances of the same job can run simultaneously (concurrency).

Execution control

When it comes to execution control, I check if:

  • values can be passed to tasks as arguments;
  • a task/job can be executed under the specified role/user (both database and OS);
  • a job/task succeeded (any specific task performed);
  • a job/task failed (any specific task performed);

Scheduling

Standard Cron

This is probably the main criteria by which people evaluate schedulers. cron syntax is the standard, de facto.
Both pgAgent and jpgAgent heavily depend on GUI, and it’s impossible to use cron syntax as the input value. The user needs to use checkboxes to specify the required schedule.

Interval

Interval values specify if jobs can be executed within a time interval, e.g., every hour, every 15 minutes.

On Reboot

On reboot indicates that jobs will be executed after a scheduler restart, not after a PostgreSQL restart. It’s almost impossible to handle server restart adequately, unless the scheduler is implemented as a background worker. In that case, a restart of PostgreSQL means restarting all processes, including the scheduling process.

Start and Kill Manually

Sometimes it’s good to have the opportunity to start a job manually, for example, for debugging purposes or during maintenance windows. The same applies to the ability to stop a frozen or long-running job.

Timeouts

Job and task timeouts allow you to terminate long-running processes automatically.

Job Disabling/Deletion

When using PostgreSQL schedulers, we are interested in:

  • whether jobs can be paused/enabled;
  • if this job can be disabled automatically, e.g., if it fails continuously 3 times;
  • if a job can be deleted after successful execution automatically.

Logging

In this part, we check what, how detailed and where PostgreSQL schedulers can:

  • log job details;
  • log task details;
  • log session details;
  • output logs to stdout/stderr;
  • output logs to file;
  • store logs in database tables;

Conclusion

I would like to see more PostgreSQL schedulers available! Let me know if you have any suggestions or fixes, and I will update this table with the new info.

I want to remind you that pg_timetable is a community project. So, please, don’t hesitate to ask any questions, to report bugs, to star pg_timetable project, and to tell the world about it.

In conclusion, I wish you all the best! ♥️
Please, stay safe – so we can meet in person at one of the conferences, meetups, or training sessions!