
pg_timetable v4 is out and available immediately!

08.2021

Our team is proud to introduce a new major pg_timetable v4 release!

This time a huge amount of work has been done and the pg_timetable v4 release is backward-incompatible with previous versions. However, job migration is straightforward and shouldn't stop anyone from updating.

We've added detailed Read the Docs documentation and will enhance it further. Let us know what areas should be highlighted or what topics we should add.

Now, let's check out some of the great new features! You can review the official release page for a complete changelog with links to issues, commits, and pull requests.

I want to remind you that pg_timetable is a community project. So please don't hesitate to ask questions, report bugs, star the pg_timetable project, and tell the world about it.

pg_timetable v4 vs v3

The previous release is still considered stable. There are no confirmed bugs or critical performance issues, and it is absolutely safe to stick with it.

However, it won't be backpatched if new bugs are discovered. In addition, new features won't be implemented for the v3 branch.

Performance

To increase performance, we worked on several fronts.

First of all, we reworked the database schema and simplified it as much as possible. This way, the new pg_timetable executes fewer system queries, leaving more bandwidth and connections for the real workers.

Secondly, we rewrote the cron-related functions in pure SQL instead of PL/pgSQL. This allows us not only to increase performance, but also to get rid of tedious debugging.
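
To illustrate the idea, here is a simplified, hypothetical example (not the actual pg_timetable source): the same check written in PL/pgSQL and in pure SQL. The pure SQL version is a single expression the planner can inline, with no procedural code to step through when something goes wrong.

    -- Hypothetical helper: does a value match one parsed cron field?
    -- (NULL stands for "*", i.e. any value matches.)

    -- PL/pgSQL version: opaque to the planner, procedural debugging needed
    CREATE FUNCTION cron_field_matches_plpgsql(field int[], val int)
    RETURNS boolean AS $$
    BEGIN
        RETURN field IS NULL OR val = ANY (field);
    END;
    $$ LANGUAGE plpgsql IMMUTABLE;

    -- Pure SQL version: one expression, inlinable, nothing to debug
    CREATE FUNCTION cron_field_matches_sql(field int[], val int)
    RETURNS boolean AS $$
        SELECT field IS NULL OR val = ANY (field);
    $$ LANGUAGE sql IMMUTABLE;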

And the third pillar is the comprehensive logging system. See details below in the separate section.

Configuration File Support

As the functionality grows, it becomes error-prone and tedious to handle configuration using only command-line parameters and environment variables.

Common sense suggests using configuration files in this case. We've added a detailed, self-explanatory config.example.yaml file for you.

Of course, you are free to use a custom set of options, or not to use a configuration file at all. For example, for the sake of security, it is a good idea to put general options into the configuration file and pass connection options through environment variables.
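
A minimal sketch of such a split might look like the following. The keys are illustrative only; config.example.yaml in the repository documents the real option names and structure.

    # pg_timetable.yaml -- general, non-sensitive options only (illustrative keys)
    clientname: worker001
    resource:
      cron-workers: 16
      interval-workers: 16
    logging:
      log-level: info

    # Connection settings (host, user, password, ...) are deliberately not in
    # the file; supply them via the scheduler's environment variables instead.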

CopyFromFile built-in task

One of the most frequent tasks for database schedulers is to import data from external sources into the database, e.g. sensor readings, logs, official statistics, etc. The CSV format is the de facto standard for this kind of data. PostgreSQL can use the COPY command to import CSV files. But to use this command, the file has to be placed in the server's file system, so that the PostgreSQL process can access it. That is not always the case in production environments, especially in the cloud era.

To overcome this limitation, in previous versions of pg_timetable we suggested using psql's \copy command, which performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.
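
For reference, such a client-side copy looks roughly like this (table and file names are placeholders):

    -- run inside psql on the client machine; the file lives on the client,
    -- not on the database server, and no superuser privileges are needed
    \copy sensor_data FROM 'readings.csv' WITH (FORMAT csv, HEADER true)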

From now on, you don't need any additional programs to perform a client copy. You can use the new built-in CopyFromFile task instead. You may find the full example in the manual. Below you will notice that the task accepts JSON parameters in the form of an object.
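
For example, the parameter object might look like this (table, COPY options, and path are placeholders; see the manual for the authoritative example):

    {
        "sql": "COPY sensor_data FROM STDIN WITH (FORMAT csv, HEADER true)",
        "filename": "readings.csv"
    }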

  • Here sql is a valid COPY ... FROM STDIN command. You can use any of the standard clauses controlling the file specification, e.g. file encoding, conditions for insertion, escaping, quoting, etc.
  • And filename is a valid path (absolute, or relative to pg_timetable) to the source CSV file.

This task works perfectly with all formats supported by PostgreSQL: text, csv, and binary.

Enhanced logging

One of the key architectural decisions made for pg_timetable was the database-driven concept, meaning that scheduler sessions are fully controlled by the database, and every piece of output is stored back in the database for further analysis.

File logs

That said, people were using output redirection if they wanted to have logs stored in files.
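
Typically that meant something like the following (the connection flags and file name are placeholders; check pg_timetable --help for the exact flags):

    pg_timetable --clientname=worker001 --dbname=timetable --user=scheduler > pg_timetable.log 2>&1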

The drawback of this approach is that the information is stored as plain text, making it hard to process automatically. In the new version, we have introduced the ability to duplicate logs to the file system using the --log-file command-line parameter. Moreover, the user can choose the exact format of the underlying file by specifying --log-file-format: json or text.
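
For instance (again, treat the connection flags as placeholders):

    pg_timetable --clientname=worker001 --dbname=timetable --user=scheduler \
        --log-file=pg_timetable.log --log-file-format=json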

Database log performance

Another bottleneck we wanted to fix was database logging. Busy clients produce a lot of information that used to be stored with separate INSERT statements, causing intensive use of connections. We have come up with an outstanding log hook implementation for the famous logrus package.

The key benefits are:

  1. Usage of the COPY machinery;
  2. Cache controlled by the time and the number of entries to be logged;
  3. Ability to skip entry logging in case of extremely high load.

I personally highly encourage you to use our log hook in your projects. Let us know if you think we should make it a standalone project.
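
For the curious, here is a heavily simplified sketch in Go of how such a batching hook can be built on top of logrus and pgx. This is not the actual pg_timetable code: the table name, columns, and buffer handling are made up, and error handling is omitted for brevity.

    package pgloghook

    import (
        "context"
        "time"

        "github.com/jackc/pgx/v4"
        "github.com/jackc/pgx/v4/pgxpool"
        "github.com/sirupsen/logrus"
    )

    // CopyHook buffers log entries and flushes them to PostgreSQL in batches
    // using the COPY protocol instead of one INSERT per entry.
    type CopyHook struct {
        pool    *pgxpool.Pool
        entries chan []interface{}
    }

    // NewCopyHook starts a background flusher that writes the buffer either
    // when it is full or when the flush interval elapses.
    func NewCopyHook(pool *pgxpool.Pool, cacheSize int, flushEvery time.Duration) *CopyHook {
        h := &CopyHook{pool: pool, entries: make(chan []interface{}, cacheSize)}
        go h.flusher(cacheSize, flushEvery)
        return h
    }

    // Levels tells logrus to send entries of every level to this hook.
    func (h *CopyHook) Levels() []logrus.Level { return logrus.AllLevels }

    // Fire queues an entry. If the buffer is already full, the entry is
    // dropped instead of blocking the caller -- the "skip logging under
    // extremely high load" behaviour.
    func (h *CopyHook) Fire(e *logrus.Entry) error {
        select {
        case h.entries <- []interface{}{e.Time, e.Level.String(), e.Message}:
        default: // buffer full: drop the entry
        }
        return nil
    }

    func (h *CopyHook) flusher(cacheSize int, flushEvery time.Duration) {
        ticker := time.NewTicker(flushEvery)
        batch := make([][]interface{}, 0, cacheSize)
        flush := func() {
            if len(batch) == 0 {
                return
            }
            // one COPY per batch instead of one INSERT per log entry
            _, _ = h.pool.CopyFrom(context.Background(),
                pgx.Identifier{"app_log"}, // hypothetical target table
                []string{"ts", "level", "message"},
                pgx.CopyFromRows(batch))
            batch = batch[:0]
        }
        for {
            select {
            case row := <-h.entries:
                batch = append(batch, row)
                if len(batch) >= cacheSize {
                    flush()
                }
            case <-ticker.C:
                flush()
            }
        }
    }

Registering such a hook is then a one-liner: logrus.AddHook(NewCopyHook(pool, 1000, time.Second)).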

Tasks output

pg_timetable has been storing the output of program tasks from the very beginning. It's useful for debugging and back-in-time incident analysis. We never thought this feature could be applied to the built-in SQL tasks until Nikolay Samokhvalov proposed it. So now you can have an SQL task with a SELECT statement, and its output will be saved upon execution.
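
For example, you can schedule a chain like this (timetable.add_job() is the helper shipped with the v4 schema; double-check its exact signature in the documentation):

    -- run every minute; the command tag and row count of the SELECT
    -- end up in the execution log
    SELECT timetable.add_job('count-sessions', '* * * * *',
                             'SELECT count(*) FROM pg_stat_activity');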

The output of an SQL task is the tag of the command executed and the number of rows returned or affected. We do not store the result set itself, since it can be arbitrarily complicated or rather huge. Users should take care of this themselves and explicitly store anything important.
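
For instance, instead of a bare SELECT, the scheduled statement can persist its result (the target table below is a placeholder):

    -- store the interesting numbers instead of relying on the task output
    INSERT INTO session_stats (snapshot_time, active_sessions)
    SELECT now(), count(*) FROM pg_stat_activity;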

Resource Options

We want our scheduler to be as flexible as possible. That's why we have introduced a new category of options for resource management (a sample invocation follows the list):

  • --cron-workers: Number of parallel workers for scheduled chains (default: 16);
  • --interval-workers: Number of parallel workers for interval chains (default: 16);
  • --chain-timeout: Abort any chain that takes more than the specified number of milliseconds;
  • --task-timeout: Abort any task within a chain that takes more than the specified number of milliseconds.
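
Put together, a resource-constrained worker might be started like this (the connection flags are placeholders; the resource flags are the ones listed above):

    pg_timetable --clientname=worker001 --dbname=timetable --user=scheduler \
        --cron-workers=4 --interval-workers=4 \
        --chain-timeout=60000 --task-timeout=30000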

Finally

This was the first in a series of posts dedicated to the new pg_timetable v4 features. Stay tuned for the coolest features to be highlighted.

You can find previous publications by following the pg_timetable tag.

Stay safe, healthy, and wealthy!
Be happy! Peace! Love! ❤

