Migrate scheduled jobs to pg_timetable from pgAgent

Why migrate to pg_timetable from pgAgent?

There are many reasons, actually, why people might want to export pgAgent jobs to pg_timetable. Most of them may be spotted by taking a look at the PostgreSQL schedulers comparison table I introduced in one of my previous posts and the supported platform table from the official readme.

Table of Contents

In my previous article I highlighted the reasons why and the ways how one might want to move scheduled jobs from pg_cron to pg_timetable.

Differences and similarities between pg_timetable and pgAgent architectures

Both tools use multi-step job architecture. pg_timetable uses "chain" and "task" terms, while pgAgent uses "job" and "step" terminology.
pgAgent is able to execute SQL and BATCH (shell) tasks. pg_timetable's task can be one of SQL, PROGRAM, and BUILTIN. Thus to emulate pgAgent's BATCH step, pg_timetable will run a PROGRAM task calling shell execution, e.g. bash -c "command".
pgAgent allows specifying many schedules for a job. If several schedules are found, we should duplicate the chain definition for each of them.
To store scheduling information, pg_timetable uses cron notation, but pgAgent uses boolean arrays as storage. We need to provide a way of transforming the array to cron notation.
pgAgent, as well as pg_timetable, allows ignoring task errors. The possible scenarios are: fail the job, mark the step as succeeded and continue, mark as fail but ignore it and proceed. pg_timetable allows only ignoring the error without marking it as succeeded.
While pgAgent can limit job execution by the agent's host, pg_timetable is more flexible in implementing control with the client name. Here we will assume the host name becomes a client name after migration.
pgAgent allows setting start and end timestamp for a schedule, as well as if a schedule is enabled. We will migrate only valid schedules: those which are enabled and active right now within the date range.

Prerequisites

You have installed the pgagent extension in your database.
You've scheduled pgagent jobs to export.
You have created the pg_timetable schema by running the pg_timetable against this database at least once. The schema will be created automatically. It doesn't matter if you have already added some chains or not.

The pgAgent to pg_timetable migration script - full explanation

To export every piece of information available as precisely as possible, use this SQL snippet. I will explain it step by step.

bool_array_to_cron()

As I said earlier, we need to convert boolean arrays storing execution information for minutes, hours, days, and months to proper cron notation.

The first thing to notice is that arrays are of different lengths, e.g., 60 for minutes, 12 for months, etc. Second, the cron clause starts from 0 for hours and minutes but with 1 for other units.

The trick is simple: unnest the input array, join it with order numbers, throw out rows with false values, and combine numbers into a cron-acceptable comma-separated string. If all values in the array are false, then just use asterisk cron notation.

CREATE OR REPLACE FUNCTION bool_array_to_cron(bool[], start_with int4 DEFAULT 0) RETURNS TEXT AS
$$
WITH u AS (
	SELECT unnest($1) e, generate_series($2, array_length($1, 1)-1+$2) AS i 
)
SELECT COALESCE(string_agg(i::text, ','), '*') FROM u WHERE e
$$
LANGUAGE sql;

CREATE OR REPLACE FUNCTION bool_array_to_cron(bool[], start_with int4 DEFAULT 0) RETURNS TEXT AS

WITH u AS (

SELECT unnest($1) e, generate_series($2, array_length($1, 1)-1+$2) AS i

)

SELECT COALESCE(string_agg(i::text, ','), '*') FROM u WHERE e

LANGUAGE sql;

Which shell to use

pg_timetable is a cross-platform tool that may run on most of the known platforms. Because of this, we want to know exactly which shell will be used for batch pgAgent tasks. On Unix-like platforms the obvious choice would be sh -c "command". On Windows, it might be either pwsh -c "command" or cmd /C "command".

Thecte_shell clause serves precisely that purpose. You can easily specify the default shell to use:

cte_shell(shell, cmd_param) AS (
	VALUES ('sh', '-c')
)

cte_shell(shell, cmd_param) AS (

VALUES ('sh', '-c')

)

Please note the naming convention I'm using here. If the CTE clause starts with 'cte', it somehow produces the final result. Meanwhile, CTE clauses named 'pga_*' prepare pgAgent-specific result sets for transformation.

pga_schedule

In this CTE clause, we want to return all active pgAgent schedules with cron notation. The schedule name will later be used in the chain name.

pga_schedule AS (
	SELECT
		s.jscjobid,
		s.jscname,
		format('%s %s %s %s %s', 
			bool_array_to_cron(s.jscminutes), 
			bool_array_to_cron(s.jschours), 
			bool_array_to_cron(s.jscmonthdays), 
			bool_array_to_cron(s.jscmonths, 1), 
			bool_array_to_cron(s.jscweekdays, 1)) AS schedule
	FROM 
		pgagent.pga_schedule s  
	WHERE 
		s.jscenabled 
		AND now() < COALESCE(s.jscend, 'infinity'::timestamptz)
		AND now() > s.jscstart
)

pga_schedule AS (

SELECT

s.jscjobid,

s.jscname,

format('%s %s %s %s %s',

bool_array_to_cron(s.jscminutes),

bool_array_to_cron(s.jschours),

bool_array_to_cron(s.jscmonthdays),

bool_array_to_cron(s.jscmonths, 1),

bool_array_to_cron(s.jscweekdays, 1)) AS schedule

FROM

pgagent.pga_schedule s

WHERE

s.jscenabled

AND now() < COALESCE(s.jscend, 'infinity'::timestamptz)

AND now() > s.jscstart

)

pga_chain and cte_chain

This clause will return the result set for all pgAgent jobs scheduled. Because on the target database there might already be some chains, we cannot use jobid directly. Instead, we want to generate the next chain_id from the sequence to use later.

Given that pgAgent might have several schedules for the same job, we are specifying jobname as a concatenation of the job name and the schedule name. cte_chain will insert all selected rows into timetable.chain table.

pga_chain AS (
    SELECT
        nextval('timetable.chain_chain_id_seq'::regclass) AS chain_id,
        jobid,
        format('%s @ %s', jobname, jscname) AS jobname,
        jobhostagent,
        jobenabled,
        schedule
    FROM
        pgagent.pga_job JOIN pga_schedule ON jobid = jscjobid
),
cte_chain AS (
	INSERT INTO timetable.chain (chain_id, chain_name, client_name, run_at, live)
	    SELECT 
	    	chain_id, jobname, jobhostagent, schedule, jobenabled
	    FROM
	    	pga_chain
)

pga_chain AS (

SELECT

nextval('timetable.chain_chain_id_seq'::regclass) AS chain_id,

jobid,

format('%s @ %s', jobname, jscname) AS jobname,

jobhostagent,

jobenabled,

schedule

FROM

pgagent.pga_job JOIN pga_schedule ON jobid = jscjobid

cte_chain AS (

INSERT INTO timetable.chain (chain_id, chain_name, client_name, run_at, live)

SELECT

chain_id, jobname, jobhostagent, schedule, jobenabled

FROM

pga_chain

)

pga_step

This clause is the most complicated one here. So let me throw some light on it. We want to generate the task_id in a similar way as we did for chain_id.

pgAgent steps are performed in alphanumeric order by name. pg_timetable uses the numeric value to specify the task order. To generate these numeric values, we are using the rank() window function here.

The hardest part is with the connection string. pg_timetable and pgAgent allow remote execution for SQL tasks. Within the pgAgent settings, you may use either the database name to execute a task on the same cluster, or specify the connection string for a custom remote cluster. pg_timetable uses connection string notation in both cases. The logic is:

1. Use connection string if specified. Ignore empty string with NULLIF().
2. Then check if the database name is equal to the current database. If so, do nothing.
3. Otherwise, generate the proper connection string with the database name specified.

pga_step AS (
	SELECT 
		c.chain_id,
		nextval('timetable.task_task_id_seq'::regclass) AS task_id,
		rank() OVER (ORDER BY jstname) AS jstorder,
		jstid,
		jstname,
		jstenabled,
		CASE jstkind WHEN 'b' THEN 'PROGRAM' ELSE 'SQL' END AS jstkind,
		jstcode,
		COALESCE(
			NULLIF(jstconnstr, ''), 
			CASE 
				WHEN jstdbname = current_database() THEN NULL
				WHEN jstdbname > '' THEN 'dbname=' || jstdbname 
			END
		) AS jstconnstr,
		jstonerror != 'f' AS jstignoreerror
	FROM
		pga_chain c JOIN pgagent.pga_jobstep js ON c.jobid = js.jstjobid
)

pga_step AS (

SELECT

c.chain_id,

nextval('timetable.task_task_id_seq'::regclass) AS task_id,

rank() OVER (ORDER BY jstname) AS jstorder,

jstid,

jstname,

jstenabled,

CASE jstkind WHEN 'b' THEN 'PROGRAM' ELSE 'SQL' END AS jstkind,

jstcode,

COALESCE(

NULLIF(jstconnstr, ''),

CASE

WHEN jstdbname = current_database() THEN NULL

WHEN jstdbname > '' THEN 'dbname=' || jstdbname

END

) AS jstconnstr,

jstonerror != 'f' AS jstignoreerror

FROM

pga_chain c JOIN pgagent.pga_jobstep js ON c.jobid = js.jstjobid

)

cte_tasks and cte_parameters

These clauses are pretty straightforward. The only thing to explain is how PROGRAM tasks are specified in pg_timetable. The command column should contain the program name (possibly with full path) only. In our case, it's the name of the shell we specified in the first step. All command-line parameters for executing the shell go into the timetable.parameter table. Pay attention; we don't do any quote escaping here. pg_timetable handles all parameter values automatically.

cte_tasks AS (
	INSERT INTO timetable.task(task_id, chain_id, task_name, task_order, kind, command, database_connection)
	    SELECT
	    	task_id, chain_id, jstname, jstorder, jstkind::timetable.command_kind, 
	    	CASE jstkind WHEN 'SQL' THEN jstcode ELSE sh.shell END,
	    	jstconnstr
	    FROM
	    	pga_step, cte_shell sh
),
cte_parameters AS (
	INSERT INTO timetable.parameter (task_id, order_id, value)
		SELECT 
			task_id, 1, jsonb_build_array(sh.cmd_param, s.jstcode)
	    FROM
	    	pga_step s, cte_shell sh
	    WHERE 
	    	s.jstkind = 'PROGRAM'
)

cte_tasks AS (

INSERT INTO timetable.task(task_id, chain_id, task_name, task_order, kind, command, database_connection)

SELECT

task_id, chain_id, jstname, jstorder, jstkind::timetable.command_kind,

CASE jstkind WHEN 'SQL' THEN jstcode ELSE sh.shell END,

jstconnstr

FROM

pga_step, cte_shell sh

cte_parameters AS (

INSERT INTO timetable.parameter (task_id, order_id, value)

SELECT

task_id, 1, jsonb_build_array(sh.cmd_param, s.jstcode)

FROM

pga_step s, cte_shell sh

WHERE

s.jstkind = 'PROGRAM'

)

Finally...

Finally, when you execute the migration script, you might see something like this:

chain_id|jobid|jobname                                         |jobhostagent|jobenabled|schedule                                                                                                |
--------+-----+------------------------------------------------+------------+----------+--------------------------------------------------------------------------------------------------------+
      91|    1|Download locations and aggregate @ every min    |node1       |true      |* * * * *                                                                                               |
      92|    1|Download locations and aggregate @ test schedule|node1       |true      |2 3,6 3 4 1                                                                                             |
      93|    1|Download locations and aggregate @ run download |node1       |true      |0,5 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 * 1,2,3,4,5,6,7,8,9,10,11,12 2,3,4,5,6|

--------+-----+------------------------------------------------+------------+----------+--------------------------------------------------------------------------------------------------------+

91| 1|Download locations and aggregate @ every min |node1 |true |* * * * * |

92| 1|Download locations and aggregate @ test schedule|node1 |true |2 3,6 3 4 1 |

93| 1|Download locations and aggregate @ run download |node1 |true |0,5 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 * 1,2,3,4,5,6,7,8,9,10,11,12 2,3,4,5,6|

From this particular example, one may conclude pgAgent has:

One scheduled job with the name "Download locations and aggregate".
That job has 3 enabled schedules: "every min", "test schedule", "run download".
In order to execute the migrated chain, pg_timetable should be started with the command-line option --clientname=node1.

If you want to contribute to pg_timetable and help to make it better:

⭐give a star to the project,
feel free to open an 🤚issue and ask a 🎓question
or even consider submitting a 📜pull request.

In conclusion, I wish you all the best! ♥️

One more thing. I want to thank everyone here in the Cybertec and the PostgreSQL community for the support you provided to Ukrainians and me personally! #StandWithUkraine 💙💛

I hope we can meet in person at one of the conferences, meetups, or training sessions!

Please leave your comments below. In order to receive regular updates on important changes in PostgreSQL, subscribe to the CYBERTEC newsletter, or follow us on Twitter, Facebook, or LinkedIn