Keyword mess

01.2019 / Category: / Tags:

I once received a mail with question: Can you tell me why I can't select the column references?


Well, the quick answer will be: because REFERENCES is a keyword you should use double quotes around it, e.g.

=# select references from v_table_relation;

But why do you sometimes need to explicitly quote identifiers and sometimes it works without them? And how do you know what keywords exactly are used in the PostgreSQL? And what keywords may be used as object names? And so on...

As usual one may find the answers in the PostgreSQL manual:

There we have detailed table that lists all tokens that are key words in the SQL standard and in PostgreSQL. From the manual you will know about reserved and non-reserved tokens, SQL standard compatibility and much more which is out of scope of this post.

There are two more ways to know what tokens can or cannot be used in certain situations. First is for true programmers, and the second is for true admins. Choose your pill, Neo.

Investigating the sources

PostgreSQL uses LALR(1)* parser to work with SQL statements. Implementation of the grammar (parser itself) may be found in the gram.y and implementation of the lexical analyzer (lexer) in the scan.l file.

Query first chewed to by lexer, which splits the text into tokens and sends them to the parser. It is logical to assume that lexer is the first who knows if the token is a keyword or not. Examining the source of it we find such lines:

So, the lexer knows if the identifier is a keyword, but this doesn't give us much, since during lexical analyzis we're lacking context. And that is what parser takes care of.

Our journey started from simple SELECT statement producing error, so let's try to examine gram.y from the top to bottom. Our route will be like this:

stmtblock -> stmtmulti -> stmt -> CreateStmt

Here we see that our columns are listed in a OptTableElementList node, so let's dig it:

OptTableElementList -> TableElementList -> TableElement -> columnDef -> ColId

Bamn! We found the right place!

By the way, I'm absolutely sure PostgreSQL sources are the best academic reading in the world. From here we see that we want to accept as much "reserved" keywords as possible, but this produces difficulties in different places causing grammar conflicts (reducereduce mostly, because shiftreduce are solved automatically). Thus, to prevent these errors developers divided keywords into several classes, e.g. "unreserved_keyword" and "col_name_keyword". And only one step left: to find these classes. Piece of cake! Let's check "unreserved_keyword" node:

Asking the server

There is one more way knowing how restrictive keyword usage is: execute query

SELECT * FROM pg_get_keywords();

Function pg_get_keywords() returns list of SQL keywords and their categories:

The catcode column contains a category code: U for unreserved, C for column name, T for type or function name, or R for reserved. The catdesc column contains a possibly-localized string describing the category.

0 0 votes
Article Rating
Notify of
1 Comment
Newest Most Voted
Inline Feedbacks
View all comments
5 years ago

Oh man! you're going deep!!! I feel so small, as far as I felt in person. You are doing a great job! Thank you for sharing.

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf

+43 (0) 2622 93022-0

Get the newest PostgreSQL Info & Tools

    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    CYBERTEC PostgreSQL International GmbH
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram