Data Masking for PostgreSQL

pg_datamask is a PostgreSQL extension built by CYBERTEC. It limits the exposure of sensitive data by anonymizing it, so you can provide real-world testing data to your developers without risk. The extension ensures that real data is never exposed to software developers, while preserving the original characteristics of your data. Your result: realistic testing without the risk of data leaks.

Secure data: The need for data masking

Masking data is not just desirable – in some cases it is even required by law. CYBERTEC provides a means to help PostgreSQL users protect their data.

What data masking can do for you:

Prevent data proliferation: Data proliferation affects all areas of a business and can gravely affect its profitability.
Conform to legal requirements: In recent years, regulations and legal requirements such as PCI-DSS and the GDPR (the European Union's General Data Protection Regulation) have been created to ensure that private data stays private.
Protect privileged or secret information: In many cases, companies work with highly critical data that must not be seen by potential competitors.

Data masking is an elegant solution to these problems.

How data masking for PostgreSQL works

Our module hooks into the PostgreSQL core and processes data while it streams.
As a user, you first create a masked backup, which developers can then use to work on fully anonymized data. The advantage of this process is that developers are always clearly separated from the production system, which drastically reduces the risk of a leak.

Here is how data masking with pg_datamask works:

1. Configure PostgreSQL for data masking
2. Build a model to handle anonymization
3. Create a user to create secure backups
4. Take a masked backup
5. Provide the secure backups to your developers

Customized masking vs. generic masking

Currently there are two options available to anonymize data:

• Generic masking: We ship a ready-made function, suitable for most needs
• Custom-built code: You can write your own functions to mask your data

If you want to go the fast route, use our ready-made solution and get started quickly. If you need more control, write your own custom code and get all the flexibility you need to handle data the way you want.
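To illustrate the difference between the two options, here is a minimal conceptual sketch in Python. It is not the pg_datamask API: the names `generic_mask`, `mask_email`, `CUSTOM_RULES`, and `mask_row` are hypothetical and exist only for this example. The idea is that a generic function covers every sensitive column by default, while a custom function takes over for columns that need special handling.

```python
from typing import Callable, Dict, Set

# Hypothetical types for this sketch; not part of pg_datamask.
Row = Dict[str, str]
MaskFn = Callable[[str], str]


def generic_mask(value: str) -> str:
    """Generic default: replace every character with 'x', keeping the length."""
    return "x" * len(value)


def mask_email(value: str) -> str:
    """Custom rule: hide the local part but keep the domain."""
    local, sep, domain = value.partition("@")
    if not sep:
        # Not a well-formed address, so fall back to the generic rule.
        return generic_mask(value)
    return "user" + sep + domain


# Columns listed here use custom code; all other sensitive columns
# fall back to the generic function.
CUSTOM_RULES: Dict[str, MaskFn] = {"email": mask_email}


def mask_row(row: Row, sensitive: Set[str]) -> Row:
    """Return a copy of the row with every sensitive column masked."""
    masked: Row = {}
    for column, value in row.items():
        if column in sensitive:
            masked[column] = CUSTOM_RULES.get(column, generic_mask)(value)
        else:
            masked[column] = value
    return masked
```

In this sketch, adding a custom rule is a one-line change to the mapping, which mirrors the trade-off described above: the generic path needs no code, the custom path gives full control per column.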

Generic masking: Our masking library

Our masking library allows you to choose how to mask specific data types. It provides functionality for the most typical use cases, so you can easily customize your masking process while retaining efficiency.

Our library has all you need:

Simple replacements for very basic needs
Fully irreversible anonymization
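
The two ends of this range can be sketched in Python as follows. This is an illustration of the general techniques, not the implementation used by the masking library: `simple_replace` and `make_irreversible_masker` are hypothetical names, and the keyed-hash approach is an assumption chosen for the example. The key is generated at random and never stored, so the masked values cannot be mapped back to the originals once the run is over.

```python
import hashlib
import hmac
import secrets
from typing import Callable


def simple_replace(value: str, placeholder: str = "REDACTED") -> str:
    """Simple replacement: every value becomes the same placeholder."""
    return placeholder


def make_irreversible_masker() -> Callable[[str], str]:
    """Build a masker backed by a random key that is never stored.

    Equal inputs give equal outputs within one run, so relationships
    between rows survive, but without the key nobody can recompute or
    look up the original values.
    """
    key = secrets.token_bytes(32)  # lives only inside this closure

    def mask(value: str) -> str:
        digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]

    return mask
```

A plain unkeyed hash would be weaker here, because short or predictable values (names, phone numbers) could be recovered by hashing candidate values and comparing. The discarded random key is what prevents that in this sketch.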

Limitations and side notes:

NULL fields won’t be masked, because the content is already “unknown” anyway.
In some rare cases, CHECK constraints (e.g. CHECK (field < 100)) can fail on replay. Some constraints are so restrictive that automatically masked values violate them, so we let you decide directly how to handle those cases: either write your own masking function, or simply ignore those failures on replay.
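
Both notes can be illustrated with a small Python sketch. It is a conceptual model only, not pg_datamask code: `generic_mask_int`, `custom_mask_int`, `apply_mask`, and `check_field` are hypothetical names, and the value ranges are assumptions picked for the example. It shows NULL values passing through unmasked, a generic numeric mask that can violate CHECK (field < 100), and a custom mask that stays inside the allowed range.

```python
import random
from typing import Callable, Optional

MaskFn = Callable[[int, random.Random], int]


def check_field(value: Optional[int]) -> bool:
    """Model of CHECK (field < 100). As in SQL, NULL does not violate it."""
    return value is None or value < 100


def generic_mask_int(value: int, rng: random.Random) -> int:
    """Generic mask: any non-negative integer, unaware of the constraint."""
    return rng.randrange(0, 1_000_000)


def custom_mask_int(value: int, rng: random.Random) -> int:
    """Custom mask: only draws values that CHECK (field < 100) accepts."""
    return rng.randrange(0, 100)


def apply_mask(value: Optional[int], mask: MaskFn,
               rng: random.Random) -> Optional[int]:
    """Mask a value, leaving NULL (None) untouched."""
    if value is None:
        return None
    return mask(value, rng)
```

With the generic mask, most replacement values fall outside the constraint and the row would be rejected on replay. The custom mask avoids that at the cost of knowing the constraint in advance.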