CYBERTEC Logo

Conditional foreign keys and polymorphism in SQL: 4 Methods

08.2023 / Category: / Tags: | |
Is a key left by the neighbor with a request to feed the cat while he is gone a conditional foreign key?
© Laurenz Albe 2023

 

This article is about the notoriously difficult problem of “conditional foreign keys”. In object-oriented programming languages, polymorphism is a natural concept. However, it does not translate well to SQL. As a consequence, many people whose data models are driven by an object-oriented application design end up trying to implement conditional foreign keys. In this article, I will explore different ways to solve the problem and their respective advantages and disadvantages. Most of this discussion is not specific to PostgreSQL, but I will also explore how PostgreSQL features like JSON support and table inheritance can help with conditional foreign keys.

Polymorphism in object-oriented languages

As an example, let's consider a data model in Java for a traffic insurance company. We want to model different kinds of vehicles:

Vehicle and MotorVehicle are abstract classes, that is, you cannot create objects of that type. They define attributes that are common to all vehicles or motor vehicles. The classes you can instantiate (create actual objects of that type) are Car and Bicycle.

Now we can model customers and contracts:

A Contract references a Customer and a Vehicle. Note that while there can be no objects of type Vehicle, the attribute vehicle can reference any object of a subclass of Vehicle, like a Car or a Bicycle. This useful feature is called polymorphism.

The problem of conditional foreign keys

To persist the data from the insurance application, we have to map the application objects to database tables. Customer, Car and Bicycle are easy to translate:

The problem occurs when we try to map Contract to a database table:

[sql highlight="5"]
CREATE TABLE contract (
contract_id bigint GENERATED ALWAYS AS IDENTITY
PRIMARY KEY,
customer_id bigint REFERENCES customer NOT NULL,
vehicle_id bigint REFERENCES ???,
start date NOT NULL,
end date NOT NULL,
payment numeric(15,2) NOT NULL
);

The foreign key vehicle_id should sometimes reference a car, sometimes a bicycle, and sometimes some other type of vehicle. This is what I call a “conditional foreign key”. You cannot directly write a conditional foreign key in SQL. So, what options do you have to model one?

Method 1: Modeling conditional foreign keys with multiple foreign keys

With this technique, you add several foreign keys to contract: one for each type of vehicle. An additional column v_type identified the type of vehicle.

I added v_type further down to avoid wasting storage space. An additional check constraint makes sure that the data are consistent:

This code makes use of the fact that the boolean FALSE converts to the integer 0 and TRUE converts to 1.

The advantages of this solution are that consistency is guaranteed, and that we don't have to modify car and bicycle. The disadvantage is that we need many foreign key columns in contract. We have to add another foreign key for each new kind of vehicle, and we have to adapt the check constraint. This solution is particularly appealing in cases with not too many types of vehicles and if the various types have few common attributes.

Method 2: Modeling conditional foreign keys by storing all referenced data in a single table

With this technique, we store all vehicle data in a single table:

Again, a check constraint makes sure that the data are consistent, and each type of vehicle has only the appropriate attributes set. If we want, we can create views for car and bicycle:

Now, the definition of contract is simple:

The advantage of this solution is that the definition of contract is straightforward. The disadvantage is that vehicle has to grow new columns for each additional vehicle type. This solution is particularly useful if the tables referenced have most columns in common, so that vehicle does not end up having too many columns. It is also an appealing solution if there are many types of vehicles.

Method 3: Modeling conditional foreign keys with a single table and jsonb

This is a variation of the previous method and relies on PostgreSQL's JSON capabilities. Instead of creating columns for all vehicle types in the vehicle table, we add the additional attributes in a single jsonb column:

Again, we can create views for the individual types of vehicles:

The idea is to model all frequent attributes as normal columns and all rare ones as JSON attributes. The advantage is that the number of columns stays moderate, which is great if there are many types of vehicles with many attributes. If you create a GIN index on the jsonb column, you can perform efficient searches. On the down side, fetching and updating the JSON attributes will be more expensive.

Method 4: Modeling conditional foreign keys by pointing the foreign key in the other direction - "faking it"

With this method, we don't add a foreign key vehicle_id to contract. Instead, we add foreign keys from car and bicycle to contract:

You will say that that is not really correct, and you are right. The disadvantage of this solution is that we don't have a guarantee that each contract is related to one and only one vehicle. But it is a simple solution that you can resort to if the previous solutions are not feasible. If you index these foreign keys like you should, you can still find the vehicle that belongs to a contract fairly efficiently.

PostgreSQL table inheritance and polymorphism

Back in the day when object-oriented programming was the latest fad, PostgreSQL added table inheritance to turn from a RDBMS (relational database management system) to an ORDBMS (object-relational database management system). With inheritance, you can model subclasses as tables:

At first glance, this looks like it would be a perfect fit for our problem! Indeed, it can save you from writing the same column definition over and over, since tables automatically inherit all columns from their inheritance parents. But all these tables are separate tables, and a primary key constraint on vehicle will exist only on that table itself. It won't constrain the values of vehicle_id in the other tables. Each table needs its own primary key, and you will have to implement a solution like the first one I suggested.

Conclusion

I showed you some ways to model conditional foreign keys. Each of them has its advantages and disadvantages, and none of them is perfect. PostgreSQL table inheritance does not solve the problem either. Polymorphism is nice in object-oriented languages, but it doesn't carry over well into SQL.

If you are interested in data models that are not a natural fit for relational databases, you might want to read my article on how to use JSON in PostgreSQL correctly. For an easy introduction to normalization, read Michał's article on the topic.

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

0 0 votes
Article Rating
Subscribe
Notify of
guest
13 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Yaroslav Schekin
Yaroslav Schekin
7 months ago

But you forgot the [arguably] best method --- using extra "vehicle" table, with all the tables having FK to it (making it 100% consistent requires triggers, though).

laurenz
laurenz
7 months ago

I'd argue that that your method is a hybrid of method 2 and method 4, with all the disadvantages of method 4. Triggers are not a safe way to implement constraints - there are always race conditions.

But I agree with you: there are probably many other variations and combinations of these methods that I didn't list. I may even have omitted some new, creative ways to solve the problem.

Where I disagree is that your method is the best one. I maintain that each of these methods has its advantages and disadvantages, and what is best will depend on the individual use case. Sadly, none of the methods is perfect.

Michael Vodep
Michael Vodep
7 months ago
Reply to  laurenz

> Triggers are not a safe way to implement constraints - there are always race conditions.

Why that? In the article you share a solution with row-locks where you claim that its safe?

Alphatier Wolf
Alphatier Wolf
7 months ago

Why do you not use sequence object as PK in table vehicle? And than one constrain in table car /bicycle only

I would suggest Method 5 PostgreSQL table inheritance and polymorphism with jsonb columns.

laurenz
laurenz
7 months ago
Reply to  Alphatier Wolf

I definitely recommend using a sequence (in the guise of an identity column) tu generate artificial primary keys. But I figured that that was irrelevant to the theme of my article.

If you use table inheritance, that would essentially be method 1 (several tables, several foreign keys). JSON is a trick you can use if you want the benefit of a single table, but avoid having hundreds of columns (method 3). I don't see any benefit in having several tables and using JSON in those.

Alphatier Wolf
Alphatier Wolf
7 months ago
Reply to  laurenz

Sorry, but an Identity Column is not the same as a SERIAL / Sequence Column!

And you wrote:
"... a primary key constraint on vehicle will exist only on that table itself. It won’t constrain the values of vehicle_id in the other tables. Each table needs its own primary key, ..."
With BIGSERIAL Column you have the same PK and every Table has own PK too.
When you then build CHECK CONSTRAIN's for Table car and bicycle you have a "perfect" solution.

Yaroslav Schekin
Yaroslav Schekin
7 months ago

is a hybrid of method 2 and method 4, with all the disadvantages of method 4.

No, it is not? This is what was meant:


CREATE TABLE customer (
customer_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL,
birthday date NOT NULL
);

CREATE TABLE vehicle (
vehicle_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
manufacturer text NOT NULL,
price numeric(15,2) NOT NULL
);

CREATE TABLE car (
car_id bigint PRIMARY KEY REFERENCES vehicle,
watt integer NOT NULL,
doors integer NOT NULL,
seats integer NOT NULL,
powersource text NOT NULL,
license_plate text NOT NULL
);

CREATE TABLE bicycle (
bicycle_id bigint PRIMARY KEY REFERENCES vehicle,
gears integer NOT NULL
);

CREATE TABLE contract (
contract_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL REFERENCES customer ,
vehicle_id bigint REFERENCES vehicle,
d_start date NOT NULL,
d_end date NOT NULL,
payment numeric(15,2) NOT NULL
);

Triggers are not a safe way to implement constraints - there are always race conditions.

Only they are, if used correctly (the problem illustrated in your article is not triggers, it's the complexity of writing reliable constraints on lower isolation levels).

But I agree with you: there are probably many other variations and combinations of these methods that I didn't list.

Indeed, all that you wrote in this post is hardly new --- google for "polymorphic associations", and you'll find all of the above methods (and more).

Where I disagree is that your method is the best one.

Whence "arguably".

Sadly, none of the methods is perfect.

Yes, but that is due to inability of some RDMBS-es to support non-trivial declarative constraints. 😉

laurenz
laurenz
7 months ago

> > is a hybrid of method 2 and method 4, with all the disadvantages of method 4.
>
> No, it is not? This is what was meant: [tables with additional properties referencing vehicle]

Right, it is not exactly the same, since your tables reference vehicle, not contract. What I meant when I said it is like my method 4 is that the data model does not enforce consistency. There could be a car and a bicycle with the same primary key value.

Still, your suggestion has merits.

> > Triggers are not a safe way to implement constraints - there are always race conditions.
>
> Only they are, if used correctly (the problem illustrated in your article is not triggers, it's the complexity of writing reliable constraints on lower isolation levels).

Either you didn't read my article, or you didn't understand it.

> all that you wrote in this post is hardly new

I don't claim that I invented anything new here. The idea of the article is a synopsis of known solutions for the problem.

laurenz
laurenz
7 months ago

I was sloppy.
I should have said "triggers are not a safe way to implement constraints unless you use the SERIALIZABLE isolation level or heavy locking".

Yaroslav Schekin
Yaroslav Schekin
7 months ago

the data model does not enforce consistency.

Unfortunately, yes --- declarative constraints in modern RDBMS-es are too primitive for that. So, one needs triggers to use it.

Still, your suggestion has merits.

Sure it does --- it's a relational (textbook) solution/material, unlike the others that are demonstrated in the article (which are just workarounds for the RDBMS defects, see above).

Either you didn't read my article, or you didn't understand it.

Why would I need to read something that makes counterfactual claims? 😉

Anyway, a) writing reliable triggers is possible (and even trivial if one uses SERIALIZABLE), but it's complex on lower isolation levels and b) the same integrity checks could be done without triggers (in the application), but there are obvious downsides to that.

Yet again, let me steal David's comment from that article which I wholeheartedly agree to:

As you've demonstrated here, we still need ASSERTIONs.

The idea of the article is a synopsis of known solutions for the problem.

And you didn't succeed to provide a complete one.
As you missed the "classic" solution, you either didn't read a textbook (about relational database design), or you didn't understand it. 😉

laurenz
laurenz
7 months ago

Thanks for the more detailed information. I am less confused now..
Yes, using table inheritance in combination with method 1 can be a good choice!

Alphatier Wolf
Alphatier Wolf
7 months ago

Sorry. This is your Answer?

Of sure is bigserial column not a PK, but identity column it isnt.
Wrote: CREATE TABLE vehicle ( vehicle_id bigserial primary key not null,
manufacturer date NOT NULL,
price numeric(15,2) NOT NULL
);
And see the BIGSERIAL column is a PK!

And yes, check contrain can only guarantee conditions within a single table row.

Your theme is not IDENTITY OR SERIAL COLUMN. It is "What is the best Schema Design"!

When you use inheritance, it is most better to use SERIAL columns, then will all inherited tables have a good PK column with this
And you need only two check contrains on car/bicycle. You will never insert data in vehicle / motorvehicle table. Why?. They are only relevant for search or report querys ...
You should only use car/bicycle table in your application for insert/updates ....

Thats it!

laurenz
laurenz
7 months ago

An identity column is a sequence column. It is better than serial, because it conforms to the standard and makes it hard to manually insert non-sequence values.

A bigserial column is not a primary key.

A check constraint can only guarantee conditions within a single table row, never across rows or tables.

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    13
    0
    Would love your thoughts, please comment.x
    ()
    x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram