CYBERTEC PostgreSQL Logo

PostgreSQL vs PostgreSQL vs Postgres vs Oracle vs all the rest

09.2020 / Category: / Tags: |

Is it Postgre, PostGreSQL, Postgres or PostgreSQL? We have all seen a couple of wrong ways to spell “PostgreSQL”. The question therefore is: How can one find data even if there are typos? In PostgreSQL there are various solutions to the problem. Depending on what kind of search you need you can choose between various methods.

Before we get started it is necessary to compile some sample data:

All we have here is the name of some database along with a handful of different spellings and typos.

Simple case-insensitive search in PostgreSQL

There is one common thing people need quite frequently: Case-insensitive search. Of course one can work around this problem using “upper” and “lower” but in many cases this is simply less convenient than it should be. In addition to that developers have to keep these things in mind at all times which can lead to bugs.

Fortunately, there is a more simplistic solution available: The “citext” datatype (as provided by the “citext” extension) can handle case-insensitive comparisons. The following example shows how it works:

As you can see, the data in the table is elegantly preserved. Just the comparison function is a bit more relaxed. Solving the case-sensitivity issue on the data type level is really nice and makes life a lot easier for all software making use of the database. The citext extension is available on all public clouds including Amazon AWS, Microsoft Azure and so on.

LIKE queries: When we know at least something

LIKE and ILIKE are the classical means to do similarity search in PostgreSQL:

As you can see, we can find all incarnations of PostgreSQL in an easy way. However, LIKE and ILIKE have a major limitation: One has to know quite a lot about the spelling of things you are looking for. In my example we needed at string that contained “postgr” - but what if somebody had used “BostgreSQL”? We would not have found anything. In other words: LIKE and ILIKE are good but in many cases these two keywords are not sufficient to find what is really needed.

Trigrams are an additional way, provided by PostgreSQL, to handle similarity search and are more suitable to handle typos. To use trigrams one has to install the extension:

The trigram extension provides us with the distance operator. It helps us to find similar strings. Let us give it a try: Support we want to search for “db3” because we did not quite understand that the IBM product is actually called “DB2”. Here is what happens:

We can sort by distance which ideally gives us the string closest to what we are searching for. As you can see “DB2” and “DB/2” come out on top which is exactly the way it should be. Of course pg_trgm is not the holy grail and should be used with caution. However, it is one way to tackle the problems caused by wrong data.

While pg_trgm is useful is you are looking for names, addresses and so on full text search has a slightly different purpose. The goal is to look for certain words in text. Here is an example:

We are looking for the words “server” and “microsoft”. Naturally one row is returned. What is noteworthy here is that the order of words can be swapped. We simply want to make sure that all words are in the string - for now order does not matter.

Full text search in PostgreSQL is quite powerful and it offers a variety of additional features which are way beyond the scope of this posting.

However, what can we do if the order of words does matter? The answer is “phrase search”: PostgreSQL allows you to specify in which order words have to appear:

In this case we want “microsoft” followed by “sql”. PostgreSQL will return the right row. However, what if we look for “microsoft” followed by “server”? In this case things will fail because there happens to be a word in between which is not allowed:

Fortunately, PostgreSQL (https://www.postgresql.org/docs/current/functions-textsearch.html) has a feature indicating how far things are allowed to be apart. The following listing shows how this works:

Finally …

PostgreSQL has many options to handle fuzzy search. There are many more things out there (such as pgsimilarity) which can be used to make our lives easier. One of the things people are often not aware of is the fact that full text search and indexing can also have an impact on the optimal VACUUM policy. Check out my posting about the “GIN pending list” to learn more.

5 1 vote
Article Rating
Subscribe
Notify of
guest
3 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Stirling Butcher
Stirling Butcher
4 years ago

Poor title mislead me to coming here

Sameer A
Sameer A
4 years ago

Agree. Nevertheless it was a good and simple article for those struggling to tackle fuzzy matches.

Zixma Liamg
Zixma Liamg
4 years ago

Thank you, I've share this url done.

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram