Hello, my name is Pavlo Golub, and I am a scheduler addict. That began when I implemented pg_timetable for PostgreSQL. I wrote a lot about it. In this post, I want to share the result of my investigations on the schedulers available for PostgreSQL. I gave a talk about this topic at the CERN meetup, so you may want to check it out for more details.
Comparison table of PostgreSQL schedulers
Let’s start with the complete comparison table. If you want to know more about each aspect, you’ll find further explanations below.
I would like to get comments and suggestions on this table, especially from developers or users of these products. I can be biased towards my own creation, so please don’t judge me too harshly. 🙂
|Can operate w\o extension||✔️||❌||❌||❌||✔️|
|Jobs meta stored in||database||database||database||database||file|
|Remote Database Execution||✔️||❌||✔️||✔️||✔️|
|Parallel Jobs Limit||✔️||❔||❔||❔||✔️|
|On Success Task||✔️||❌||✔️||❔||✔️|
|On Error Task||❌||❌||❌||❔||✔️|
|Kill Running Job||✔️||❌||❌||✔️||✔️|
|Auto Job Disable||❌||❌||❌||❌||✔️|
PostgreSQL Schedulers Architecture
The first important aspect of each scheduler is its implementation. Either it is a standalone client or a background worker.
A standalone client can be run on any host or platform. The only requirement for such architecture is the ability to connect to the target PostgreSQL server.
On the other hand, background worker implementation requires it to be one of the PostgreSQL processes. That, in turn, means you need to change the
shared_preload_libraries configuration parameter and restart the server. But in that case, the scheduler doesn’t need to connect to PostgreSQL (it can, though) and can use a unique SPI protocol.
There is no right or wrong choice; each of these choices has pros and cons. The proper solution depends on the user and the environment.
Each of the schedulers heavily relies on a database infrastructure. Specific purpose tables, functions, and views are a common thing. So it’s up to the developer to decide how to organize this set of objects in a database. Database extensions are one of the obvious ways of doing so. The difference reveals itself during upgrades: either the user should run
ALTER EXTENSION name UPDATE for extension-based deployments, or the scheduler is responsible for updating itself.
Right now, only one of the five schedulers can store jobs and task descriptions from the database: pgbucket can store jobs and tasks in files.
Each database scheduler is, at a minimum, supposed to be able to execute SQL tasks. The ability to run external programs or shell commands is a big plus. For some everyday tasks, e.g., send mail, log, copy table, etc., we’ve implemented them as productive internal tasks in pg_timetable.
An essential aspect of scheduler operation is the capability
- to run several jobs simultaneously,
- to control simultaneous job execution,
- to limit how many instances of the same job can run simultaneously (concurrency).
When it comes to execution control, I check if:
- values can be passed to tasks as arguments;
- a task/job can be executed under the specified role/user (both database and OS);
- a job/task succeeded (any specific task performed);
- a job/task failed (any specific task performed);
This is probably the main criteria by which people evaluate schedulers. cron syntax is the standard, de facto.
Both pgAgent and jpgAgent heavily depend on GUI, and it’s impossible to use cron syntax as the input value. The user needs to use checkboxes to specify the required schedule.
Interval values specify if jobs can be executed within a time interval, e.g., every hour, every 15 minutes.
On reboot indicates that jobs will be executed after a scheduler restart, not after a PostgreSQL restart. It’s almost impossible to handle server restart adequately, unless the scheduler is implemented as a background worker. In that case, a restart of PostgreSQL means restarting all processes, including the scheduling process.
Start and Kill Manually
Sometimes it’s good to have the opportunity to start a job manually, for example, for debugging purposes or during maintenance windows. The same applies to the ability to stop a frozen or long-running job.
Job and task timeouts allow you to terminate long-running processes automatically.
When using PostgreSQL schedulers, we are interested in:
- whether jobs can be paused/enabled;
- if this job can be disabled automatically, e.g., if it fails continuously 3 times;
- if a job can be deleted after successful execution automatically.
In this part, we check what, how detailed and where PostgreSQL schedulers can:
- log job details;
- log task details;
- log session details;
- output logs to stdout/stderr;
- output logs to file;
- store logs in database tables;
I would like to see more PostgreSQL schedulers available! Let me know if you have any suggestions or fixes, and I will update this table with the new info.