Why migrate to pg_timetable from pgAgent?
There are many reasons, actually, why people might want to export pgAgent jobs to pg_timetable. Most of them may be spotted by taking a look at the PostgreSQL schedulers comparison table I introduced in one of my previous posts and the supported platform table from the official readme.
Differences and similarities between pg_timetable and pgAgent architectures
- Both tools use multi-step job architecture. pg_timetable uses “chain” and “task” terms, while pgAgent uses “job” and “step” terminology.
- pgAgent is able to execute
BATCH(shell) tasks. pg_timetable‘s task can be one of
BUILTIN. Thus to emulate pgAgent‘s
BATCHstep, pg_timetable will run a
PROGRAMtask calling shell execution, e.g.
bash -c "command".
- pgAgent allows specifying many schedules for a job. If several schedules are found, we should duplicate the chain definition for each of them.
- To store scheduling information, pg_timetable uses cron notation, but pgAgent uses boolean arrays as storage. We need to provide a way of transforming the array to cron notation.
- pgAgent, as well as pg_timetable, allows ignoring task errors. The possible scenarios are: fail the job, mark the step as succeeded and continue, mark as fail but ignore it and proceed. pg_timetable allows only ignoring the error without marking it as succeeded.
- While pgAgent can limit job execution by the agent’s host, pg_timetable is more flexible in implementing control with the client name. Here we will assume the host name becomes a client name after migration.
- pgAgent allows setting start and end timestamp for a schedule, as well as if a schedule is enabled. We will migrate only valid schedules: those which are enabled and active right now within the date range.
- You have installed the pgagent extension in your database.
- You’ve scheduled pgagent jobs to export.
- You have created the pg_timetable schema by running the pg_timetable against this database at least once. The schema will be created automatically. It doesn’t matter if you have already added some chains or not.
The pgAgent to pg_timetable migration script – full explanation
To export every piece of information available as precisely as possible, use this SQL snippet. I will explain it step by step.
As I said earlier, we need to convert boolean arrays storing execution information for minutes, hours, days, and months to proper cron notation.
The first thing to notice is that arrays are of different lengths, e.g., 60 for minutes, 12 for months, etc. Second, the cron clause starts from 0 for hours and minutes but with 1 for other units.
The trick is simple: unnest the input array, join it with order numbers, throw out rows with false values, and combine numbers into a cron-acceptable comma-separated string. If all values in the array are false, then just use asterisk cron notation.
CREATE OR REPLACE FUNCTION bool_array_to_cron(bool, start_with int4 DEFAULT 0) RETURNS TEXT AS $$ WITH u AS ( SELECT unnest($1) e, generate_series($2, array_length($1, 1)-1+$2) AS i ) SELECT COALESCE(string_agg(i::text, ','), '*') FROM u WHERE e $$ LANGUAGE sql;
Which shell to use
pg_timetable is a cross-platform tool that may run on most of the known platforms. Because of this, we want to know exactly which shell will be used for batch pgAgent tasks. On Unix-like platforms the obvious choice would be
sh -c "command". On Windows, it might be either
pwsh -c "command" or
cmd /C "command".
cte_shell clause serves precisely that purpose. You can easily specify the default shell to use:
cte_shell(shell, cmd_param) AS ( VALUES ('sh', '-c') )
Please note the naming convention I’m using here. If the CTE clause starts with ‘cte’, it somehow produces the final result. Meanwhile, CTE clauses named ‘pga_*’ prepare pgAgent-specific result sets for transformation.
In this CTE clause, we want to return all active pgAgent schedules with cron notation. The schedule name will later be used in the chain name.
pga_schedule AS ( SELECT s.jscjobid, s.jscname, format('%s %s %s %s %s', bool_array_to_cron(s.jscminutes), bool_array_to_cron(s.jschours), bool_array_to_cron(s.jscmonthdays), bool_array_to_cron(s.jscmonths, 1), bool_array_to_cron(s.jscweekdays, 1)) AS schedule FROM pgagent.pga_schedule s WHERE s.jscenabled AND now() < COALESCE(s.jscend, 'infinity'::timestamptz) AND now() > s.jscstart )
pga_chain and cte_chain
This clause will return the result set for all pgAgent jobs scheduled. Because on the target database there might already be some chains, we cannot use
jobid directly. Instead, we want to generate the next
chain_id from the sequence to use later.
Given that pgAgent might have several schedules for the same job, we are specifying
jobname as a concatenation of the job name and the schedule name.
cte_chain will insert all selected rows into
pga_chain AS ( SELECT nextval('timetable.chain_chain_id_seq'::regclass) AS chain_id, jobid, format('%s @ %s', jobname, jscname) AS jobname, jobhostagent, jobenabled, schedule FROM pgagent.pga_job JOIN pga_schedule ON jobid = jscjobid ), cte_chain AS ( INSERT INTO timetable.chain (chain_id, chain_name, client_name, run_at, live) SELECT chain_id, jobname, jobhostagent, schedule, jobenabled FROM pga_chain )
This clause is the most complicated one here. So let me throw some light on it. We want to generate the
task_id in a similar way as we did for
pgAgent steps are performed in alphanumeric order by name. pg_timetable uses the numeric value to specify the task order. To generate these numeric values, we are using the rank() window function here.
The hardest part is with the connection string. pg_timetable and pgAgent allow remote execution for SQL tasks. Within the pgAgent settings, you may use either the database name to execute a task on the same cluster, or specify the connection string for a custom remote cluster. pg_timetable uses connection string notation in both cases. The logic is:
pga_step AS ( SELECT c.chain_id, nextval('timetable.task_task_id_seq'::regclass) AS task_id, rank() OVER (ORDER BY jstname) AS jstorder, jstid, jstname, jstenabled, CASE jstkind WHEN 'b' THEN 'PROGRAM' ELSE 'SQL' END AS jstkind, jstcode, COALESCE( NULLIF(jstconnstr, ''), CASE WHEN jstdbname = current_database() THEN NULL WHEN jstdbname > '' THEN 'dbname=' || jstdbname END ) AS jstconnstr, jstonerror != 'f' AS jstignoreerror FROM pga_chain c JOIN pgagent.pga_jobstep js ON c.jobid = js.jstjobid )
cte_tasks and cte_parameters
These clauses are pretty straightforward. The only thing to explain is how
PROGRAM tasks are specified in pg_timetable. The
command column should contain the program name (possibly with full path) only. In our case, it’s the name of the shell we specified in the first step. All command-line parameters for executing the shell go into the
timetable.parameter table. Pay attention; we don’t do any quote escaping here. pg_timetable handles all parameter values automatically.
cte_tasks AS ( INSERT INTO timetable.task(task_id, chain_id, task_name, task_order, kind, command, database_connection) SELECT task_id, chain_id, jstname, jstorder, jstkind::timetable.command_kind, CASE jstkind WHEN 'SQL' THEN jstcode ELSE sh.shell END, jstconnstr FROM pga_step, cte_shell sh ), cte_parameters AS ( INSERT INTO timetable.parameter (task_id, order_id, value) SELECT task_id, 1, jsonb_build_array(sh.cmd_param, s.jstcode) FROM pga_step s, cte_shell sh WHERE s.jstkind = 'PROGRAM' )
Finally, when you execute the migration script, you might see something like this:
chain_id|jobid|jobname |jobhostagent|jobenabled|schedule | --------+-----+------------------------------------------------+------------+----------+--------------------------------------------------------------------------------------------------------+ 91| 1|Download locations and aggregate @ every min |node1 |true |* * * * * | 92| 1|Download locations and aggregate @ test schedule|node1 |true |2 3,6 3 4 1 | 93| 1|Download locations and aggregate @ run download |node1 |true |0,5 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 * 1,2,3,4,5,6,7,8,9,10,11,12 2,3,4,5,6|
From this particular example, one may conclude pgAgent has:
- One scheduled job with the name “Download locations and aggregate”.
- That job has 3 enabled schedules: “every min”, “test schedule”, “run download”.
- In order to execute the migrated chain, pg_timetable should be started with the command-line option
If you want to contribute to pg_timetable and help to make it better:
- ⭐give a star to the project,
- feel free to open an 🤚issue and ask a 🎓question
- or even consider submitting a 📜pull request.
In conclusion, I wish you all the best! ♥️
One more thing. I want to thank everyone here in the Cybertec and the PostgreSQL community for the support you provided to Ukrainians and me personally! #StandWithUkraine 💙💛