
pg_timetable v4 is out and available immediately!

08.2021

Our team is proud to introduce a new major pg_timetable v4 release!

This time a huge amount of work has been done and the pg_timetable v4 release is backward-incompatible with previous versions. However, job migration is straightforward and shouldn't stop anyone from updating.

We've added detailed Read the Docs documentation and will enhance it further. Let us know what areas should be highlighted or what topics we should add.

Now, let's check out some of the great new features! You can review the official release page for a complete changelog with links to issues, commits, and pull requests.

I want to remind you that pg_timetable is a community project. So please don't hesitate to ask questions, report bugs, star the pg_timetable project, and tell the world about it.

pg_timetable v4 vs v3

The previous release is still considered stable. There are no confirmed bugs or critical performance issues, and it is absolutely safe to stick with it.

However, it won't be backpatched if new bugs are discovered. In addition, new features won't be implemented for the v3 branch.

Performance

To increase performance, we worked on several fronts.

First of all, we reworked the database schema and simplified it as much as possible. This way, the new pg_timetable executes fewer system queries, leaving more bandwidth and connections for the real workers.

Secondly, we rewrote the cron-related functions in pure SQL instead of PL/pgSQL. This allows us not only to increase performance, but also to get rid of tedious debugging.
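
To illustrate the idea, here is a simplified, hypothetical example (not the actual pg_timetable source): the same check written in PL/pgSQL and in pure SQL. The pure SQL version is a single expression the planner can inline, with no procedural code to step through when something goes wrong.

    -- Hypothetical helper: does a value match one parsed cron field?
    -- (NULL stands for "*", i.e. any value matches.)

    -- PL/pgSQL version: opaque to the planner, procedural debugging needed
    CREATE FUNCTION cron_field_matches_plpgsql(field int[], val int)
    RETURNS boolean AS $$
    BEGIN
        RETURN field IS NULL OR val = ANY (field);
    END;
    $$ LANGUAGE plpgsql IMMUTABLE;

    -- Pure SQL version: one expression, inlinable, nothing to debug
    CREATE FUNCTION cron_field_matches_sql(field int[], val int)
    RETURNS boolean AS $$
        SELECT field IS NULL OR val = ANY (field);
    $$ LANGUAGE sql IMMUTABLE;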

And the third pillar is the comprehensive logging system. See details below in the separate section.

Configuration File Support

As the functionality grows, it becomes error-prone and tedious to handle configuration using only command-line parameters and environment variables.

Common sense suggests using configuration files in this case. We've added a detailed, self-explanatory config.example.yaml file for you.

Of course, you are free to use a custom set of options, or not to use a configuration file at all. For example, for the sake of security, it is a good idea to put general options into the configuration file and pass connection options through environment variables.
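
A minimal sketch of such a split might look like the following. The keys are illustrative only; config.example.yaml in the repository documents the real option names and structure.

    # pg_timetable.yaml -- general, non-sensitive options only (illustrative keys)
    clientname: worker001
    resource:
      cron-workers: 16
      interval-workers: 16
    logging:
      log-level: info

    # Connection settings (host, user, password, ...) are deliberately not in
    # the file; supply them via the scheduler's environment variables instead.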

CopyFromFile built-in task

One of the most frequent tasks for database schedulers is to import data from external sources into the database, e.g. sensor readings, logs, official statistics, etc. The CSV format is the de facto standard for this kind of data. PostgreSQL can use the COPY command to import CSV files. But to use this command, the file has to be placed in the server's file system, so that the PostgreSQL process can access it. That is not always the case in production environments, especially in the cloud era.

To overcome this limitation, in previous versions of pg_timetable we suggested using psql's \copy command, which performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.
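
For reference, such a client-side copy looks roughly like this (table and file names are placeholders):

    -- run inside psql on the client machine; the file lives on the client,
    -- not on the database server, and no superuser privileges are needed
    \copy sensor_data FROM 'readings.csv' WITH (FORMAT csv, HEADER true)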

From now on, you don't need any additional programs to perform a client copy. You can use the new built-in CopyFromFile task instead. You may find the full example in the manual. Below you will notice that the task accepts JSON parameters in the form of an object.
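
For example, the parameter object might look like this (table, COPY options, and path are placeholders; see the manual for the authoritative example):

    {
        "sql": "COPY sensor_data FROM STDIN WITH (FORMAT csv, HEADER true)",
        "filename": "readings.csv"
    }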

  • Here sql is a valid COPY ... FROM STDIN command. You can use any of the standard clauses controlling the file specification, e.g. file encoding, conditions for insertion, escaping, quoting, etc.
  • And filename is a valid path (absolute, or relative to pg_timetable) to the source CSV file.

This task works perfectly with all formats supported by PostgreSQL: text, csv, and binary.

Enhanced logging

One of the key architectural decisions made for pg_timetable was the database-driven concept, meaning that scheduler sessions are fully controlled by the database, and every piece of output is stored back in the database for further analysis.

File logs

That said, people were using output redirection if they wanted to have logs stored in files.
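
Typically that meant something like the following (the connection flags and file name are placeholders; check pg_timetable --help for the exact flags):

    pg_timetable --clientname=worker001 --dbname=timetable --user=scheduler > pg_timetable.log 2>&1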

The drawback of this approach is that the information is stored as plain text, making it hard to process automatically. In the new version, we have introduced the ability to duplicate logs to the file system using the --log-file command-line parameter. Moreover, the user can choose the exact format of the underlying file by specifying --log-file-format: json or text.
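
For instance (again, treat the connection flags as placeholders):

    pg_timetable --clientname=worker001 --dbname=timetable --user=scheduler \
        --log-file=pg_timetable.log --log-file-format=json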

Database log performance

Another bottleneck we wanted to fix was database logging. Busy clients produce a lot of information that used to be stored with separate INSERT statements, causing intensive use of connections. We have come up with an outstanding log hook implementation for the famous logrus package.

The key benefits are:

  1. Usage of the COPY machinery;
  2. Cache controlled by the time and the number of entries to be logged;
  3. Ability to skip entry logging in case of extremely high load.

I personally highly encourage you to use our log hook in your projects. Let us know if you think we should make it a standalone project.
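
For the curious, here is a heavily simplified sketch in Go of how such a batching hook can be built on top of logrus and pgx. This is not the actual pg_timetable code: the table name, columns, and buffer handling are made up, and error handling is omitted for brevity.

    package pgloghook

    import (
        "context"
        "time"

        "github.com/jackc/pgx/v4"
        "github.com/jackc/pgx/v4/pgxpool"
        "github.com/sirupsen/logrus"
    )

    // CopyHook buffers log entries and flushes them to PostgreSQL in batches
    // using the COPY protocol instead of one INSERT per entry.
    type CopyHook struct {
        pool    *pgxpool.Pool
        entries chan []interface{}
    }

    // NewCopyHook starts a background flusher that writes the buffer either
    // when it is full or when the flush interval elapses.
    func NewCopyHook(pool *pgxpool.Pool, cacheSize int, flushEvery time.Duration) *CopyHook {
        h := &CopyHook{pool: pool, entries: make(chan []interface{}, cacheSize)}
        go h.flusher(cacheSize, flushEvery)
        return h
    }

    // Levels tells logrus to send entries of every level to this hook.
    func (h *CopyHook) Levels() []logrus.Level { return logrus.AllLevels }

    // Fire queues an entry. If the buffer is already full, the entry is
    // dropped instead of blocking the caller -- the "skip logging under
    // extremely high load" behaviour.
    func (h *CopyHook) Fire(e *logrus.Entry) error {
        select {
        case h.entries <- []interface{}{e.Time, e.Level.String(), e.Message}:
        default: // buffer full: drop the entry
        }
        return nil
    }

    func (h *CopyHook) flusher(cacheSize int, flushEvery time.Duration) {
        ticker := time.NewTicker(flushEvery)
        batch := make([][]interface{}, 0, cacheSize)
        flush := func() {
            if len(batch) == 0 {
                return
            }
            // one COPY per batch instead of one INSERT per log entry
            _, _ = h.pool.CopyFrom(context.Background(),
                pgx.Identifier{"app_log"}, // hypothetical target table
                []string{"ts", "level", "message"},
                pgx.CopyFromRows(batch))
            batch = batch[:0]
        }
        for {
            select {
            case row := <-h.entries:
                batch = append(batch, row)
                if len(batch) >= cacheSize {
                    flush()
                }
            case <-ticker.C:
                flush()
            }
        }
    }

Registering such a hook is then a one-liner: logrus.AddHook(NewCopyHook(pool, 1000, time.Second)).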

Tasks output

pg_timetable has been storing the output of program tasks from the very beginning. It's useful for debugging and back-in-time incident analysis. We never thought this feature could be applied to the built-in SQL tasks until Nikolay Samokhvalov proposed it. So now you can have an SQL task with a SELECT statement, and its output will be saved upon execution.
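
For example, you can schedule a chain like this (timetable.add_job() is the helper shipped with the v4 schema; double-check its exact signature in the documentation):

    -- run every minute; the command tag and row count of the SELECT
    -- end up in the execution log
    SELECT timetable.add_job('count-sessions', '* * * * *',
                             'SELECT count(*) FROM pg_stat_activity');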

The output of an SQL task is the tag of the command executed and the number of rows returned or affected. We do not store the result set itself, since it can be arbitrarily complicated or rather huge. Users should take care of this themselves and explicitly store anything important.
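
For instance, instead of a bare SELECT, the scheduled statement can persist its result (the target table below is a placeholder):

    -- store the interesting numbers instead of relying on the task output
    INSERT INTO session_stats (snapshot_time, active_sessions)
    SELECT now(), count(*) FROM pg_stat_activity;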

Resource Options

We want our scheduler to be as flexible as possible. That's why we have introduced a new category of options for resource management (a sample invocation follows the list):

  • --cron-workers: Number of parallel workers for scheduled chains (default: 16);
  • --interval-workers: Number of parallel workers for interval chains (default: 16);
  • --chain-timeout: Abort any chain that takes more than the specified number of milliseconds;
  • --task-timeout: Abort any task within a chain that takes more than the specified number of milliseconds.
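
Put together, a resource-constrained worker might be started like this (the connection flags are placeholders; the resource flags are the ones listed above):

    pg_timetable --clientname=worker001 --dbname=timetable --user=scheduler \
        --cron-workers=4 --interval-workers=4 \
        --chain-timeout=60000 --task-timeout=30000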

Finally

This was the first in a series of posts dedicated to the new pg_timetable v4 features. Stay tuned for the coolest features to be highlighted.

You can find previous publications by following the pg_timetable tag.

Stay safe, healthy, and wealthy!
Be happy! Peace! Love! ❤

