CYBERTEC PostgreSQL Logo

Can Collations Be Used Over citext?

09.2025
Category: 

Introduction

Recently, I read Laurenz Albe's blog about case insensitive string search. He recommended case insensitive collations saying, "it won't be any more expensive (and maybe more intuitive) than the current techniques of using upper() or the citext extension are today". In other words, a custom collation can be used instead of citext to achieve case insensivity with a lower performance penalty. Today we will be testing it.

Test Setup

We created the collation as case insensitive but accent sensitive beceause citext works in the same way.

Then duplicate all these records until we have ~32m rows for both tables.

Comparison For Different Operators

"=" Operator

The following results show the average execution time after executing them 20 times:

As it can be seen index scans for both are roughly the same. That’s expected, since an index lookup only executes the comparison operator a few times. Interesting part is on the sequential scan. Let' s check the sequential scan:

As we know that there 5 different values in the demo1 and demo2 tables; 'apple', 'APPLE', 'Åpple', 'Apple', 'applex'. These values each make up about %25 of the table(except 'apple' and 'applex', since the first 'apple' record is updated as 'applex'). Because of case insensitivity and accent sensitivity, the condition "word = 'apple'" matches roughly %75 of the rows. As a result, query runs on demo1 with custom collation is ~3 times faster than the other.

For <, <=, > and >= Operators

What about <, <=, > and >= operators, do they work? Yes, they do:

In this case as well, the query runs on the demo1 table is ~4 times faster than the one runs on demo2 table. An important point to note is that as the number of fetched rows increases, the performance gap between the custom collation and citext grows wider.

Note: be carefull, 'Åpple' comes before 'applex' in both queries because both implementations are Unicode aware.

LIKE Operator

This is the tricky part because on PostgreSQL 17 and earlier, the LIKE operator is not supported with nondeterministic collations. For example;

However, Laurenz pointed out this commit, starting from PostgreSQL 18 the LIKE operator is supported with nondeterministic collations as well.

The query on demo1 with the custom collation runs faster than on demo2. However, both queries use a sequential scan — why? The reason is that both solutions (custom collation and citext) are Unicode - aware. Normally, to speed up LIKE queries on text, char, or varchar columns, we create an index with the xxx_pattern_ops option. But since our collation is nondeterministic, creating such an index results in an error:

For citext story is a bit different, to test it, I inserted some rows into demo2:

In contrast, with a proper index on a citext column, it is possible to use the LIKE operator.

Conclusion

A custom nondeterministic collation can indeed serve as an alternative to citext for case-insensitive queries, often with a much lower performance penalty.

  • For equality lookups (=) both approaches perform similarly when indexes are used, but under sequential scans the custom collation is consistently 2–4x faster than citext.
  • For range queries (<, <=, >, >=), both solutions work correctly and respect Unicode ordering, but the performance gap widens as the number of matching rows increases, with the custom collation again outperforming citext.
  • For LIKE queries, things are more nuanced:
    • On PostgreSQL 17 and earlier, LIKE is not supported with nondeterministic collations at all.
    • Starting with PostgreSQL 18, it is supported, but still limited to sequential scans because pattern operator classes cannot be used with nondeterministic collations.
    • citext, on the other hand, can still leverage functional indexes, making it more flexible for LIKE queries when proper indexing is required.

In short:

  • If your main use case is equality and range comparisons, a custom ICU collation is the better choice, offering better performance with simpler semantics.
  • If you rely heavily on pattern matching with LIKE and indexes, citext remains more practical until nondeterministic collations gain better index support.

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2025
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram