Operator classes explained

After my morning session in Dublin some people asked me to share the codes of an operator class I created for demo reasons. This posting is supposed to cover those codes.

Table of Contents

What is an operator class?

In PostgreSQL an index is not a hardwired thing, capable of following just one strategy. Instead an index is more of a framework people can use to implement their own indexing strategies. What is the point of all that?

Let us consider two values stored in a database:

BEGIN;

CREATE TABLE t_sva (sva text);
 
INSERT INTO t_sva VALUES ('1118090878');
INSERT INTO t_sva VALUES ('2345010477');

BEGIN;

CREATE TABLE t_sva (sva text);

INSERT INTO t_sva VALUES ('1118090878');

INSERT INTO t_sva VALUES ('2345010477');

At first glance those two numbers seem to be in their right order (alphabetical). However, in reality they are not. Those two odd numbers are in fact Austrian social security numbers. The first number is related to the 9^th of August 1978 (09 = day of month, 08 = month, 78 = year). 1118 is just a sequential number issued by the social security system. Taking into account that 1977 was actually before 1978 the second number should be the first entry in a sorted list.

The fact that those numbers need a “strange” sort order create some challenge when it comes to indexing this stuff. This is exactly when PostgreSQL operator classes kick in. An operator allows a user to create his own handcrafted indexing strategy. As the name (= operator class) already implies, creating an indexing strategy is all about operators. The goal is to create our own operators and use them alongside a normal btree.

Here is how it works:

CREATE OR REPLACE FUNCTION normalize_si(text)
RETURNS text AS $$
        BEGIN
	        RETURN substring($1, 9, 2) ||
       	         	substring($1, 7, 2) || substring($1, 5, 2) || substring($1, 1, 4);
        END; 
$$
LANGUAGE 'plpgsql' IMMUTABLE;

CREATE OR REPLACE FUNCTION normalize_si(text)

RETURNS text AS $$

BEGIN

RETURN substring($1, 9, 2) ||

substring($1, 7, 2) || substring($1, 5, 2) || substring($1, 1, 4);

END;

LANGUAGE 'plpgsql' IMMUTABLE;

First of all we create a simple stored procedure capable of transforming our strange string and turn it into something which can be sorted normally:

SELECT normalize_si('1118090878');
 normalize_si
--------------
 7808091118
(1 row)

SELECT normalize_si('1118090878');

normalize_si

--------------

7808091118

(1 row)

The clue here is that we can now use our own logic to compare values and turn this logic into operators. To do so we need a set of functions:

CREATE OR REPLACE FUNCTION si_lt(text, text)
RETURNS boolean AS
$$
        BEGIN
                RETURN public.normalize_si($1) < public.normalize_si($2);
        END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;
 
-- lower equals
CREATE OR REPLACE FUNCTION si_le(text, text)
RETURNS boolean AS
$$
        BEGIN
                RETURN public.normalize_si($1) <= public.normalize_si($2);
        END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;

CREATE OR REPLACE FUNCTION si_lt(text, text)

RETURNS boolean AS

BEGIN

RETURN public.normalize_si($1) < public.normalize_si($2);

END;

$$ LANGUAGE 'plpgsql' IMMUTABLE;

-- lower equals

CREATE OR REPLACE FUNCTION si_le(text, text)

RETURNS boolean AS

BEGIN

RETURN public.normalize_si($1) <= public.normalize_si($2);

END;

$$ LANGUAGE 'plpgsql' IMMUTABLE;

In PostgreSQL every operator is simply based on a stored procedure. So if you have proper procedures at hand already, it is pretty simple to turn those into operators. CREATE OPERATOR does the job:

-- greater equal
 
CREATE OR REPLACE FUNCTION si_ge(text, text)
RETURNS boolean AS
$$
        BEGIN
                RETURN public.normalize_si($1) >= public.normalize_si($2);
        END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;
 
-- greater
CREATE OR REPLACE FUNCTION si_gt(text, text)
RETURNS boolean AS
$$
        BEGIN
                RETURN public.normalize_si($1) > public.normalize_si($2);
        END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;

CREATE OPERATOR <# ( PROCEDURE=public.si_lt,
                        LEFTARG=text,
                        RIGHTARG=text);
 
CREATE OPERATOR <=# ( PROCEDURE=public.si_le,
                        LEFTARG=text,
                        RIGHTARG=text);
 
CREATE OPERATOR >=# ( PROCEDURE=public.si_ge,
                        LEFTARG=text,
                        RIGHTARG=text);
 
CREATE OPERATOR ># ( PROCEDURE=public.si_gt,
                        LEFTARG=text,
                        RIGHTARG=text);

-- greater equal

CREATE OR REPLACE FUNCTION si_ge(text, text)

RETURNS boolean AS

BEGIN

RETURN public.normalize_si($1) >= public.normalize_si($2);

END;

$$ LANGUAGE 'plpgsql' IMMUTABLE;

-- greater

CREATE OR REPLACE FUNCTION si_gt(text, text)

RETURNS boolean AS

BEGIN

RETURN public.normalize_si($1) > public.normalize_si($2);

END;

$$ LANGUAGE 'plpgsql' IMMUTABLE;

CREATE OPERATOR <# ( PROCEDURE=public.si_lt,

LEFTARG=text,

RIGHTARG=text);

CREATE OPERATOR <=# ( PROCEDURE=public.si_le,

LEFTARG=text,

RIGHTARG=text);

CREATE OPERATOR >=# ( PROCEDURE=public.si_ge,

LEFTARG=text,

RIGHTARG=text);

CREATE OPERATOR ># ( PROCEDURE=public.si_gt,

LEFTARG=text,

RIGHTARG=text);

If you want to create an operator you have to tell PostgreSQL which function to call. Then we have to map the left and the right argument to our two arguments passed to the function. Of course, there are more features here which can be used to make this work even better, but for the most simplistic stuff this is enough.

The names of those operators make no difference here. However, I would recommend to use something readable.

Support functions

Before an index shows us how to handle a new strategy, we have to define some support functions. Depending on the type of index we will need different sets of support functions. In our case we are planning to use a simple btree so all we need is a comparison function returning a simple integer value:

CREATE OR REPLACE FUNCTION si_same(text, text)
        RETURNS int AS
$$
        BEGIN
                IF      public.normalize_si($1) < public.normalize_si($2)
                THEN
                        RETURN -1;
                ELSIF   public.normalize_si($1) > public.normalize_si($2)
                THEN
                        RETURN +1;
                ELSE
                        RETURN 0;
                END IF;
        END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;

CREATE OR REPLACE FUNCTION si_same(text, text)

RETURNS int AS

BEGIN

IF public.normalize_si($1) < public.normalize_si($2)

THEN

RETURN -1;

ELSIF public.normalize_si($1) > public.normalize_si($2)

THEN

RETURN +1;

ELSE

RETURN 0;

END IF;

END;

$$ LANGUAGE 'plpgsql' IMMUTABLE;

Creating the operator class

Now we have all ingredients in place to use our own indexing strategy. All it takes now is one last command:

CREATE OPERATOR CLASS sva_special_ops
       FOR TYPE text USING btree
AS
    OPERATOR        1       <#  ,
    OPERATOR        2       <=# ,
    OPERATOR        3       =  ,
    OPERATOR        4       >=# ,
    OPERATOR        5       >#  ,
    FUNCTION        1       public.si_same(text, text)
;

CREATE OPERATOR CLASS sva_special_ops

FOR TYPE text USING btree

OPERATOR 1 <# ,

OPERATOR 2 <=# ,

OPERATOR 3 = ,

OPERATOR 4 >=# ,

OPERATOR 5 ># ,

FUNCTION 1 public.si_same(text, text)

;

We have to tell the index which operators to use when and which support functions to use. In case of btrees this is really simple.

We can already create an index using our own operator class now:

CREATE INDEX idx_special ON t_sva (sva sva_special_ops);

1	CREATE INDEX idx_special ON t_sva (sva sva_special_ops);

The optimizer is already capable of using our index:

SET enable_seqscan TO off;          
     -- to make sure that this work with just two rows

SELECT * FROM t_sva WHERE sva = '0000112276';
                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Index Only Scan using idx_special on t_sva  (cost=0.13..8.14 rows=1 width=32)
   Index Cond: (sva = '0000112276'::text)
(2 rows)

SET enable_seqscan TO off;

-- to make sure that this work with just two rows

SELECT * FROM t_sva WHERE sva = '0000112276';

QUERY PLAN

-------------------------------------------------------------------------------

Index Only Scan using idx_special on t_sva (cost=0.13..8.14 rows=1 width=32)

Index Cond: (sva = '0000112276'::text)

(2 rows)

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

Operator classes explained

What is an operator class?

Here is how it works:

Support functions

Creating the operator class

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts