CYBERTEC PostgreSQL Logo

CREATE INDEX: Data types matter

05.2025 / Category: / Tags:

In the PostgreSQL world, as well as in order database systems, data types play a crucial role in ensuring optimal performance, efficiency, as well as semantics. Moreover, some data types are inherently easier and faster to index than others. Many people are not aware of the fact that indexes indeed make a difference, so let us take a look and see how long it takes to index the very same data using different
types.

Creating some sample data

To show which differences a data type makes, we first have to create some sample table. In this case, it contains 5 data types that we want to inspect to understand how they behave:

Finally, we use our good old friend (the generate_series function), which has served me very well over the years:

What this does is generate 50 million random rows and put them into my table. Note that all column entries are identical to ensure fairness for our index creation.

The data might look as follows:

Note that PostgreSQL does not necessarily write data to disk immediately. In addition to that, there is a thing known as "hint bits", which can impact performance. We want to ensure that we can compare the time needed to create those indexes, so we VACUUM the table (for hint bits) and force a checkpoint to force everything to disk:

Running CREATE INDEX for real

In this example, all tests are using the default settings of PostgreSQL, which are of course suboptimal and can be tuned as described in one of our more successful blog posts about CREATE INDEX.

One of the key settings (to be found in postgresql.conf) controlling indexing speed is maintenance_work_mem. For this quick benchmark, we kept the standard setting of 64 MB:

Let us create the index on the integer column first:

As we can see, this is the fastest process we can expect - indexing integer is really quick because CPUs are made to handle numbers. For 64 bit integers, we see similarly good results:

However, the story starts to change if we move from integer numbers to floating point values. The process starts to be slower:

While native CPU floating point numbers (we are using ARM chips here) are still reasonably okay, we will see a massive slowdown compared to integer values when we move to "numeric":

What is the purpose of "numeric" anyway? The answer is: Accounting. Floating point numbers have different precision depending on the magnitude of the number. "numeric" ensures that the precision is exactly the way we want it to be, regardless of small or large values. Therefore, we should always use "numeric" for money, accounting, and so on. One thing has to be kept in mind at all times:
The result has to always be correct - even if it takes longer to calculate it. Therefore, one should never use "float" to handle currency.

Obviously, the by far most expensive data type is "text". Here are the numbers:

Obviously, one would not want to store numbers as text anyway, but I decided to put the value here for comparison.

Comparing results

After running those tests, we can compare the results and see what happens. The following graph visualizes the results:

graphic for blog CREATE INDEX Data types matter

What is interesting here is that the difference in speed is quite substantial. Most people greatly underestimate those changes in performance and thus it makes sense to spread some awareness among readers.

2 responses to “CREATE INDEX: Data types matter”

  1. Tried to reproduce and got interesting side-effects (only partly reproducing your numbers). Especially because indexes are typically created only once, and that's it (summary at the end):

    VACUUM ANALYZE;
    CHECKPOINT;

    CREATE INDEX ON t_demo (v1);
    Time: 49167.688 ms (00:49.168)
    CREATE INDEX ON t_demo (v2);
    Time: 40263.288 ms (00:40.263)
    CREATE INDEX ON t_demo (v3);
    Time: 58812.693 ms (00:58.813)
    CREATE INDEX ON t_demo (v4);
    Time: 56302.695 ms (00:56.303)
    CREATE INDEX ON t_demo (v5);
    Time: 133345.734 ms (02:13.346)

    DROP INDEX t_demo_v1_idx;
    DROP INDEX t_demo_v2_idx;
    DROP INDEX t_demo_v3_idx;
    DROP INDEX t_demo_v4_idx;
    DROP INDEX t_demo_v5_idx;
    VACUUM ANALYZE;
    CHECKPOINT;

    CREATE INDEX ON t_demo (v5);
    Time: 119213.866 ms (01:59.214)
    CREATE INDEX ON t_demo (v4);
    Time: 53486.847 ms (00:53.487)
    CREATE INDEX ON t_demo (v3);
    Time: 49539.559 ms (00:49.540)
    CREATE INDEX ON t_demo (v2);
    Time: 32803.024 ms (00:32.803)
    CREATE INDEX ON t_demo (v1);
    Time: 33744.911 ms (00:33.745)

    DROP INDEX t_demo_v1_idx;
    DROP INDEX t_demo_v2_idx;
    DROP INDEX t_demo_v3_idx;
    DROP INDEX t_demo_v4_idx;
    DROP INDEX t_demo_v5_idx;
    VACUUM ANALYZE;
    CHECKPOINT;

    CREATE INDEX ON t_demo (v1);
    Time: 30534.489 ms (00:30.534)
    CREATE INDEX ON t_demo (v2);
    Time: 33775.389 ms (00:33.775)
    CREATE INDEX ON t_demo (v3);
    Time: 47776.601 ms (00:47.777)
    CREATE INDEX ON t_demo (v4);
    Time: 59156.192 ms (00:59.156)
    CREATE INDEX ON t_demo (v5);
    Time: 124224.454 ms (02:04.224)

    DROP INDEX t_demo_v1_idx;
    DROP INDEX t_demo_v2_idx;
    DROP INDEX t_demo_v3_idx;
    DROP INDEX t_demo_v4_idx;
    DROP INDEX t_demo_v5_idx;
    VACUUM ANALYZE;
    CHECKPOINT;

    CREATE INDEX ON t_demo (v5);
    Time: 119638.285 ms (01:59.638)
    CREATE INDEX ON t_demo (v4);
    Time: 56985.595 ms (00:56.986)
    CREATE INDEX ON t_demo (v3);
    Time: 49229.118 ms (00:49.229)
    CREATE INDEX ON t_demo (v2);
    Time: 32311.887 ms (00:32.312)
    CREATE INDEX ON t_demo (v1);
    Time: 33835.165 ms (00:33.835)

    Summary:

    v1: 0:49.168 0:33.745 0:30.534 0:33.835
    v2: 0:40.263 0:32.803 0:33.775 0:32.312
    v3: 0:58.813 0:49.540 0:47.777 0:49.229
    v4: 0:56.303 0:53.487 0:59.156 0:56.986
    v5: 2:13.346 1:59.214 2:04.224 1:59.638

    select version();
    PostgreSQL 16.8 (Ubuntu 16.8-0ubuntu0.24.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, 64-bit

    Any explanation?

    • Those numbers seem to confirm that text is the most expensive data type to index.
      The differences in the measurements are perhaps due to different hardware, software or operating system.
      Benchmarks are difficult to generalize...

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2025
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram