PostgreSQL gets a query parameter of type "unknown"
© Laurenz Albe 2022

Recently, I could help solve a “mysterious” performance problem for a customer. The problem turned out to be a badly chosen query parameter type, which led to poor query performance. After working on this customer’s case, I realized that the meaning of the data type of a query parameter is not universally understood, which is why I decided to write this article.

What are query parameters?

In SQL, we often encounter queries that only differ in constant values, for example

SELECT val FROM large WHERE id = 42;
SELECT val FROM large WHERE id = 4711;
SELECT val FROM large WHERE id = 1001;

It makes sense to consider them as different instances of the same query:

SELECT val FROM large WHERE id = $1;

Here, $1 is a placeholder for an arbitrary value. Such a placeholder is called a query parameter (also known as a bind variable). When you execute the query, you have to supply an actual parameter value as an argument for the parameter.

Using a placeholder has several advantages:

  • the database can parse and optimize the statement once and reuse the execution plan several times with different parameter values to speed up processing (read this article to understand how that is done in PostgreSQL).
  • separating the query from the parameters prevents SQL injection, which is a major threat for database applications.

Support for query parameters in PostgreSQL

The PostgreSQL frontend/backend protocol provides two ways to send an SQL statement:

  • the simple query protocol sends the complete SQL statement as a single string
  • the extended query protocol separates the SQL statement and its parameters:
    • the “Prepare” message sends the statement with the placeholders $1, $2 etc.
    • the “Bind” message sends the parameter values
    • the “Execute” message executes the query

Note that you can use query parameters only for constants. It is not possible to use parameters for identifiers like table, column or schema names. Also, parameters are only supported in SELECT, INSERT, UPDATE and DELETE.
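
For example, a parameter in place of a table name is rejected right at parse time (a quick illustration; the exact error wording may vary):

PREPARE bad AS SELECT val FROM $1 WHERE id = 42;

ERROR:  syntax error at or near "$1"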

Prepared statements

When you prepare a statement, you can assign it a name, so that the same statement can be executed with different parameter values. This is called a prepared statement, and every database client API provides support for that.

Using JDBC in Java, that would work as follows:

java.sql.PreparedStatement stmt =
	conn.prepareStatement("SELECT * FROM large WHERE id = ?");
stmt.setLong(1, 42L);
java.sql.ResultSet rs = stmt.executeQuery();

The placeholder is always ?, and the first argument of the set* methods specifies the parameter number.

In PostgreSQL, you can also use prepared statements in SQL. PREPARE creates a prepared statement, and EXECUTE executes it:

PREPARE stmt(bigint) AS
   SELECT val FROM large WHERE id = $1;

EXECUTE stmt(42);
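
If you want to see which parameter types a prepared statement ended up with, you can look at the pg_prepared_statements view; for the statement prepared above, the result would look like this:

SELECT name, parameter_types
FROM pg_prepared_statements;

 name │ parameter_types 
══════╪═════════════════
 stmt │ {bigint}
(1 row)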

Query parameter data types and type conversion

You may have noticed that in both of the above cases, there was a data type associated with the parameter: long in Java and bigint in SQL. However, you don’t have to specify a data type; in that case, PostgreSQL tries to infer the data type itself. Such an “untyped value” is represented by the data type “unknown” in PostgreSQL.
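
You can see this inference at work with a SQL-level prepared statement: omit the type list from PREPARE, and PostgreSQL deduces the parameter type from the context, here from the comparison with the bigint column id (a minimal sketch):

PREPARE stmt2 AS
   SELECT val FROM large WHERE id = $1;

EXECUTE stmt2(42);

Querying pg_prepared_statements as above would show {bigint} for stmt2 as well.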

But PostgreSQL will also consider type conversions for data types other than unknown. For example, there is no equality operator to compare bigint and numeric in PostgreSQL:

SELECT o.oprname
FROM pg_operator AS o
   JOIN pg_amop AS ao
      ON o.oid = ao.amopopr
   JOIN pg_am AS a
      ON a.oid = ao.amopmethod
WHERE /* an equality operator as used in B-tree indexes */
      ao.amopstrategy = 3 AND a.amname = 'btree'
  AND (o.oprleft, o.oprright)
      = ('bigint'::regtype, 'numeric'::regtype);

 oprname 
═════════
(0 rows)

Still, you can compare a bigint with a numeric, because PostgreSQL will implicitly convert the bigint to a numeric, as the latter is the preferred data type for numbers. You can find the exact rules that govern these conversions in the documentation. These rules are quite complicated, because PostgreSQL supports overloading for functions and operators.
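
A quick way to see this implicit conversion in action (the bigint value is converted to numeric before the comparison):

SELECT 42::bigint = 42.0::numeric;

 ?column? 
══════════
 t
(1 row)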

An example of the performance impact of query parameter types

To illustrate that, let’s consider this simple example:

CREATE TABLE large (
   id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
   val double precision NOT NULL
);

INSERT INTO large (val)
SELECT random()
FROM generate_series(1, 100000);

VACUUM (ANALYZE) large;

Then we write a Java program that queries the database with the code snippet from above, but we set the parameter in three different ways. Using the auto_explain extension, we capture the execution plan on the server in each case.
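
The auto_explain configuration for such a test could look as follows; this is just one possible setup (loading the module normally requires superuser rights, or you can use session_preload_libraries in postgresql.conf instead):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;   -- log the plan of every statement
SET auto_explain.log_analyze = on;       -- include actual timings and row counts
SET auto_explain.log_buffers = on;       -- include buffer usage, as in the plans below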

Using integer as the parameter type

The parameter is set with

stmt.setInt(1, 42);

and the resulting execution plan is

Query Text: SELECT * FROM large WHERE id = $1
Index Scan using large_pkey on large  (...)
                        (actual time=0.013..0.013 rows=1 loops=1)
  Index Cond: (id = 42)
  Buffers: shared hit=3

This works fine, since setInt marks the parameter as type integer, and there is an equality operator to compare integer and bigint.
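
You can verify that with the same catalog query as before, just with different operand types; the result should show the cross-type equality operator:

SELECT o.oprname
FROM pg_operator AS o
   JOIN pg_amop AS ao
      ON o.oid = ao.amopopr
   JOIN pg_am AS a
      ON a.oid = ao.amopmethod
WHERE ao.amopstrategy = 3 AND a.amname = 'btree'
  AND (o.oprleft, o.oprright)
      = ('integer'::regtype, 'bigint'::regtype);

 oprname 
═════════
 =
(1 row)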

Using unknown as the parameter type

The parameter is set as type java.sql.Types.OTHER with

stmt.setObject(1, 42);

and the resulting execution plan is

Query Text: SELECT * FROM large WHERE id = $1
Index Scan using large_pkey on large  (...)
                        (actual time=0.005..0.006 rows=1 loops=1)
  Index Cond: (id = 42)
  Buffers: shared hit=3

This works just as well, because a parameter of type unknown is inferred to have the same type as id, which is bigint.

Using numeric as the parameter type

The parameter is set with

stmt.setBigDecimal(1, java.math.BigDecimal.valueOf(42));

and the resulting execution plan is

Query Text: SELECT * FROM large WHERE id = $1
Seq Scan on large  (...)
                   (actual time=0.012..11.123 rows=1 loops=1)
  Filter: ((id)::numeric = '42'::numeric)
  Rows Removed by Filter: 99999
  Buffers: shared hit=935

Surprisingly, PostgreSQL used a sequential scan, and the performance is much worse. The reason is that the JDBC driver maps java.math.BigDecimal to numeric, so the parameter has the data type numeric. As we have seen above, there is no equality operator for bigint and numeric, so PostgreSQL converts both to numeric. This is clearly visible in the execution plan above. Since the primary key index is defined on “id” and not on “id::numeric”, it cannot be used in this case.
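
You don’t need Java to reproduce the effect; forcing the parameter type in plain SQL shows the same plan shape (output abridged):

EXPLAIN (COSTS OFF)
SELECT * FROM large WHERE id = 42::numeric;

                 QUERY PLAN                 
════════════════════════════════════════════
 Seq Scan on large
   Filter: ((id)::numeric = '42'::numeric)
(2 rows)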

How to avoid problems with query parameter data types

From the above, we can learn the following lesson:

  • Ideally, choose the correct data type for query parameters.
    In the above example, that would have meant using setLong, since the JDBC driver maps long to bigint.
  • In case of doubt, choose an untyped parameter (type unknown) and let PostgreSQL infer the correct data type.

Sometimes neither of these strategies works, because there is no data type in your host language that matches the PostgreSQL data type – and PostgreSQL guesses wrong. An example would be this query:

SELECT id, name FROM person
WHERE birthday <= current_timestamp - $1;

We want to supply an interval like “1 year” or “1-0”. Java does not have a data type for intervals, so we would have to send an unknown parameter. However, PostgreSQL infers that we mean the operator

timestamp with time zone - timestamp with time zone

rather than

timestamp with time zone - interval

so we will end up with error messages complaining that “1 year” is not a valid timestamp. In this case, the solution is to add an explicit type cast to the statement:

SELECT id, name FROM person
WHERE birthday <= current_timestamp - CAST ($1 AS interval);
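
To make the difference tangible, here is a sketch at the SQL level; the person table is hypothetical and only serves to make the statements runnable, and the exact error text may differ slightly:

CREATE TABLE person (
   id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
   name text NOT NULL,
   birthday date NOT NULL
);

/* without the cast, the untyped parameter is resolved as timestamp with time zone */
PREPARE broken AS
   SELECT id, name FROM person
   WHERE birthday <= current_timestamp - $1;

EXECUTE broken('1 year');
ERROR:  invalid input syntax for type timestamp with time zone: "1 year"

/* with the explicit cast, the parameter is resolved as interval */
PREPARE fixed AS
   SELECT id, name FROM person
   WHERE birthday <= current_timestamp - CAST ($1 AS interval);

EXECUTE fixed('1 year');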

So we have to amend the above list with a third point:

  • If automatic type resolution fails, add an explicit type cast to the statement.

Conclusion

It is easy to overlook the fact that the data type of a query parameter can matter for query performance. To avoid problems, either don’t specify a type and let PostgreSQL do the right thing, or explicitly specify the correct type.

If you want to read more about the PostgreSQL query protocol and performance, you might be interested in my article on pipeline mode.