There are still many people out there who cannot spell the name of their favorite database. “PostgreSQL”, “PostgresSQL”, “Postgre SQL” … the list goes on and on. Can we blame those people? Actually, no. Some words are simply tricky, and each of us has failed once in a while. After all, database work is not about blaming people - it is about helping them. Fuzzy search is a way to solve the problem and to improve the user experience. The goal is to make sure that users get the chance to find something - even if the search string contains typos.
PostgreSQL provides a module called “pg_trgm”, which allows users to use trigrams along with indexes. “pg_trgm” is a very capable module and even allows regular expression matches. However, there is more: many other algorithms exist that can be used to measure the distance between words or groups of words.
One module, which has been around for quite a while, is pg_similarity. It can be downloaded for free from the following website: http://pgsimilarity.projects.pgfoundry.org/ It features a couple of algorithms such as Jaro-Winkler, q-grams, and a lot more.
To see pg_similarity in action we have compiled a couple of examples. Here is a q-gram example:
```
test=# SELECT qgram('PostgreSQL', 'PostgresSQL')  AS q1,
              qgram('PostgreSQL', 'Postgres SQL') AS q2,
              qgram('PostgreSQL', 'Bostgres SQL') AS q3;
 q1  |        q2         |        q3
-----+-------------------+-------------------
 0.8 | 0.769230769230769 | 0.538461538461538
(1 row)
```
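To make the idea behind q-grams more concrete, here is a small plain-Python sketch. The exact normalization pg_similarity applies is not spelled out here, so the sketch uses the Dice coefficient over letter bigrams - one common way to turn q-gram overlap into a similarity score between 0 and 1. It illustrates the technique rather than reproducing the extension's exact numbers; the function names are ours, not part of the extension.

```python
from collections import Counter

def qgrams(s: str, q: int = 2) -> Counter:
    """Multiset of overlapping q-grams of s (bigrams by default)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))

def qgram_similarity(a: str, b: str, q: int = 2) -> float:
    """Dice coefficient over q-gram multisets: 2*|A & B| / (|A| + |B|)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    overlap = sum((ga & gb).values())  # size of the multiset intersection
    return 2.0 * overlap / (sum(ga.values()) + sum(gb.values()))

print(qgram_similarity('PostgreSQL', 'PostgresSQL'))
print(qgram_similarity('PostgreSQL', 'Bostgres SQL'))
```

The key property is the same as in the SQL example above: identical strings score 1.0, and the score degrades gracefully as the strings drift apart.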
Now the same with Jaro-Winkler:
```
test=# SELECT jarowinkler('PostgreSQL', 'PostgresSQL')  AS j1,
              jarowinkler('PostgreSQL', 'Postgres SQL') AS j2,
              jarowinkler('PostgreSQL', 'Bostgres SQL') AS j3;
        j1         |        j2         |        j3
-------------------+-------------------+-------------------
 0.981818181818182 | 0.966666666666667 | 0.883333333333333
(1 row)
```
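For readers who want to see what this metric actually computes, here is a plain-Python sketch of the standard Jaro-Winkler algorithm (the function names are ours, not part of the extension). With the usual scaling factor p = 0.1 and the four-character prefix cap, it reproduces the j1 to j3 values shown above.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity: matches within a sliding window,
    penalized by transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(len1, len2) // 2 - 1
    matched2 = [False] * len2
    m1 = []
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched2[j] = True
                m1.append(c)
                break
    m2 = [s2[j] for j in range(len2) if matched2[j]]
    m = len(m1)
    if m == 0:
        return 0.0
    t = sum(a != b for a, b in zip(m1, m2)) / 2  # transpositions
    return (m / len1 + m / len2 + (m - t) / m) / 3.0

def jarowinkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score boosted by the length of the common prefix (capped at 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1.0 - j)

print(jarowinkler('PostgreSQL', 'PostgresSQL'))   # matches j1 above
print(jarowinkler('PostgreSQL', 'Bostgres SQL'))  # matches j3 above
```

Note how the prefix boost rewards the shared “Post” prefix in j1 and j2, while “Bostgres SQL” gets no boost at all - which is exactly why j3 drops so sharply.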
Depending on your needs, you can choose which algorithm is best suited to your problem. In some cases, even a combination of various algorithms can be useful.
For updated information, check the Postgres documentation about GIN indexes.
See this 2023 blog post about fuzzy searches for more specific information concerning PostgreSQL 16.
Nowadays JSON is used pretty much everywhere. It's not only web developers who like JSON. It is also used for configuration, data transfer, and a lot more. Luckily PostgreSQL is pretty good at JSON. Recently I have discovered a module called wal2json, which even allows the transformation of xlog to JSON. The module can be found here: https://github.com/eulerto/wal2json
The module can be compiled just like any other contrib module. Once it has been installed, postgresql.conf can be adapted to allow replication slots:
```
wal_level = logical
max_replication_slots = 10
```
After a database restart a replication slot can be created:
```
test=# SELECT * FROM pg_create_logical_replication_slot('hans_slot', 'wal2json');
 slot_name | xlog_position
-----------+---------------
 hans_slot | 0/18DD268
(1 row)
```
Some data can be inserted to demonstrate how the replication slot works:
```
test=# CREATE TABLE t_data (id int, name text, payload int[]);
CREATE TABLE
test=# INSERT INTO t_data VALUES (1, 'hans', '{10, 20, 30}');
INSERT 0 1
test=# INSERT INTO t_data VALUES (2, 'paul', '{23, 49, 87}');
INSERT 0 1
```
When the data is dequeued, a perfect stream of JSON documents can be seen:
```
test=# SELECT * FROM pg_logical_slot_get_changes('hans_slot', NULL, NULL);
 location  | xid |                           data
-----------+-----+--------------------------------------------------
 0/18DD2F0 | 993 | {                                                +
           |     |         'xid': 993,                              +
           |     | 'change': [
 0/18F9678 | 993 | ]                                                +
           |     | }
 0/18F96B0 | 994 | {                                                +
           |     |         'xid': 994,                              +
           |     | 'change': [
 0/18F96B0 | 994 | {                                                +
           |     |         'kind': 'insert',                        +
           |     |         'schema': 'public',                      +
           |     |         'table': 't_data',                       +
           |     |         'columnnames': ['id', 'name', 'payload'],+
           |     |         'columntypes': ['int4', 'text', '_int4'],+
           |     |         'columnvalues': [1, 'hans', '{10,20,30}']+
           |     | }
 0/18F9748 | 994 | ]                                                +
           |     | }
 0/18F9780 | 995 | {                                                +
           |     |         'xid': 995,                              +
           |     | 'change': [
 0/18F9780 | 995 | {                                                +
           |     |         'kind': 'insert',                        +
           |     |         'schema': 'public',                      +
           |     |         'table': 't_data',                       +
           |     |         'columnnames': ['id', 'name', 'payload'],+
           |     |         'columntypes': ['int4', 'text', '_int4'],+
           |     |         'columnvalues': [2, 'paul', '{23,49,87}']+
           |     | }
 0/18F9818 | 995 | ]                                                +
           |     | }
(8 rows)
```
The real beauty here is that all PostgreSQL JSON functions can be used to process the JSON stream. It is pretty easy to do ex-post analysis on the changes fetched from the xlog.
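The same analysis can of course be done outside the database. The following Python sketch parses one wal2json change record, of the shape shown above, and zips column names with column values to recover the inserted row; the variable names and sample payload are ours. (Note that actual wal2json output is valid JSON with double quotes; the single quotes in the listing above are a formatting artifact.)

```python
import json

# One change record of the shape wal2json emits (sample payload, ours).
payload = """
{
  "xid": 994,
  "change": [
    {
      "kind": "insert",
      "schema": "public",
      "table": "t_data",
      "columnnames": ["id", "name", "payload"],
      "columntypes": ["int4", "text", "_int4"],
      "columnvalues": [1, "hans", "{10,20,30}"]
    }
  ]
}
"""

doc = json.loads(payload)
# Pair each column name with its value to rebuild the affected rows.
rows = [dict(zip(c["columnnames"], c["columnvalues"])) for c in doc["change"]]
for c, row in zip(doc["change"], rows):
    print(c["kind"], c["schema"] + "." + c["table"], row)
```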
Find out more about JSON and PostgreSQL in our blog posts tagged JSON.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.