Entity-attribute-value (EAV) design in PostgreSQL - don't do it!

11.2021

Category: Performance

Tags: administration, table

good (?) reasons to use an entity-attribute-value design — © Laurenz Albe 2022

Table of Contents

Customers have often asked me what I think of “Entity-attribute-value” (EAV) design. So I thought it would be a good idea to lay down my opinion in writing.

What is entity-attribute-value design?

The idea is not to create a table for each entity in the application. Rather, you store each attribute as a separate entry in an attribute table:

CREATE TABLE objects (
   objectid bigint PRIMARY KEY
   /* other object-level properties */
);

CREATE TABLE attstring (
   objectid bigint
      REFERENCES objects ON DELETE CASCADE NOT NULL,
   attname text NOT NULL,
   attval text,
   PRIMARY KEY (objectid, attname)
);

CREATE TABLE attint (
   objectid bigint
      REFERENCES objects ON DELETE CASCADE NOT NULL,
   attname text NOT NULL,
   attval integer,
   PRIMARY KEY (objectid, attname)
);

/* more tables for other data types */

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

CREATE TABLE objects (

objectid bigint PRIMARY KEY

/* other object-level properties */

);

CREATE TABLE attstring (

objectid bigint

REFERENCES objects ON DELETE CASCADE NOT NULL,

attname text NOT NULL,

attval text,

PRIMARY KEY (objectid, attname)

);

CREATE TABLE attint (

objectid bigint

REFERENCES objects ON DELETE CASCADE NOT NULL,

attname text NOT NULL,

attval integer,

PRIMARY KEY (objectid, attname)

);

/* more tables for other data types */

The name of the model is derived from the “att...” tables, which have the three columns: “entity ID”, “attribute name” and “value”.

There are several variations of the basic theme, among them:

omit the objects table
add additional tables that define “object types”, so that each type can only have certain attributes

Why would anybody consider an entity-attribute-value design?

The principal argument I hear in support of the EAV design is flexibility. You can create new entity types without having to create a database table. Taken to the extreme, each entity can have different attributes.

I suspect that another reason for people to consider such a data model is that they are more familiar with key-value stores than with relational databases.

Performance considerations of entity-attribute-value design

In my opinion, EAV database design is the worst possible design when it comes to performance. You will never get good database performance with such a data model.

The only use cases where EAV shines are when it is used as a key-value store.

`INSERT`

Inserting an entity will look like this:

INSERT INTO objects (objectid) VALUES (42);

INSERT INTO attstring (objectid, attname, attval)
VALUES (42, 'name', 'myobject');

INSERT INTO attint (objectid, attname, attval)
VALUES (42, 'start', 100),
       (42, 'end',   1000);

1

2

3

4

5

6

7

8

INSERT INTO objects (objectid) VALUES (42);

INSERT INTO attstring (objectid, attname, attval)

VALUES (42, 'name', 'myobject');

INSERT INTO attint (objectid, attname, attval)

VALUES (42, 'start', 100),

(42, 'end', 1000);

That means that we insert four rows into three tables and have four index modifications. Also, the three statements will require three client-server round trips. You can save on the round trips by using CTEs to turn that into a single statement, or by using the new pipeline mode of libpq. Still, it will be much more expensive than inserting a single table row.

`DELETE`

If you use cascading delete, you could do that in a single statement:

DELETE FROM objects WHERE objectid = 42;

1	DELETE FROM objects WHERE objectid = 42;

Still, you will end up deleting four table rows and modifying four indexes. That's much more work than deleting a single table row.

`UPDATE`

A single-column update could actually be faster with the EAV design, because only one small table row is modified:

UPDATE attint
SET attval = 2000
WHERE objectid = 42 AND attname = 'end';

1

2

3

UPDATE attint

SET attval = 2000

WHERE objectid = 42 AND attname = 'end';

But if you have to modify several columns, you will need to run several UPDATE statements. That will be slower than if you only had to modify a single (albeit bigger) table row.

`SELECT`

Querying the attributes of an entity requires a join:

SELECT as.attval AS 'name',
       ai1.attval AS 'start',
       ai2.attval AS 'end'
FROM objects AS o
   LEFT JOIN attstring AS as USING (objectid)
   LEFT JOIN attint AS ai1 USING (objectid)
   LEFT JOIN attint AS ai2 USING (objectid)
WHERE objectid = 42
  AND as.attname = 'name'
  AND ai1.attname = 'start'
  AND ai2.attname = 'end';

1

2

3

4

5

6

7

8

9

10

11

SELECT as.attval AS 'name',

ai1.attval AS 'start',

ai2.attval AS 'end'

FROM objects AS o

LEFT JOIN attstring AS as USING (objectid)

LEFT JOIN attint AS ai1 USING (objectid)

LEFT JOIN attint AS ai2 USING (objectid)

WHERE objectid = 42

AND as.attname = 'name'

AND ai1.attname = 'start'

AND ai2.attname = 'end';

Alternatively, you could run three separate queries, one for each attribute. No matter how you do it, it will be less efficient than a single-row SELECT from a single table.

Single-column aggregates

As an example for a query that might be faster with the EAV model, consider a query that aggregates data from a single column:

SELECT sum(attval) AS total
FROM othertab
   JOIN attint USING (objectid)
WHERE othertab.col = 'x'
  AND attint.attname = 'attendants';

1

2

3

4

5

SELECT sum(attval) AS total

FROM othertab

JOIN attint USING (objectid)

WHERE othertab.col = 'x'

AND attint.attname = 'attendants';

With a covering index on attint(objectid, attname) INCLUDE (attval), this could be quite a bit faster than aggregating a column from a wider table.

More complicated queries

After these examples, it is clear that writing more complicated queries will be a pain with the EAV design. Imagine a simple join:

SELECT e1a1.attval AS person_name,
       e1a2.attval AS person_id,
       e2a1.attval AS address_street,
       e2a2.attval AS address_city
FROM attint AS e1a2
   JOIN attstring AS e1a1
      ON e1a2.objectid = e1a1.objectid
   LEFT JOIN attint AS e2a0
      ON e1a2.attval = e2a0.attval
   LEFT JOIN attstring AS e2a1
      ON e2a0.objectid = e2a1.objectid
   LEFT JOIN attstring AS e2a2
      ON e2a0.objectid = e2a2.objectid
WHERE e1a1.attname = 'name'
  AND e1a2.attname = 'persnr'
  AND e2a0.attname = 'persnr'
  AND e2a1.attname = 'street'
  AND e2a2.attname = 'city';

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

SELECT e1a1.attval AS person_name,

e1a2.attval AS person_id,

e2a1.attval AS address_street,

e2a2.attval AS address_city

FROM attint AS e1a2

JOIN attstring AS e1a1

ON e1a2.objectid = e1a1.objectid

LEFT JOIN attint AS e2a0

ON e1a2.attval = e2a0.attval

LEFT JOIN attstring AS e2a1

ON e2a0.objectid = e2a1.objectid

LEFT JOIN attstring AS e2a2

ON e2a0.objectid = e2a2.objectid

WHERE e1a1.attname = 'name'

AND e1a2.attname = 'persnr'

AND e2a0.attname = 'persnr'

AND e2a1.attname = 'street'

AND e2a2.attname = 'city';

If you think that this query is hard to read, I agree with you. In a normal relational data model, the same operation could look like this:

SELECT person.name AS person_name,
       persnr AS person_id
       address.street,
       address.city
FROM person
   LEFT JOIN address
      USING (persnr);

1

2

3

4

5

6

7

SELECT person.name AS person_name,

persnr AS person_id

address.street,

address.city

FROM person

LEFT JOIN address

USING (persnr);

You can guess which query will perform better.

But we need an entity-attribute-value design for flexibility!

Relational data models are not famous for their flexibility. After all, that is the drive behind the NoSQL movement. However, there are good ways to deal with variable entities.

Creating tables on the fly

Nothing keeps you from running statements like CREATE TABLE and CREATE INDEX from your application. So if there is a limited number of entity types, and each type has a certain number of attributes, you can easily model that with a traditional relational model.

Certain problems remain:

A data model that grows on the fly may not end up being well-designed. But that's not different in the entity-attribute-value design.
If the application has to create tables, it needs permission to do so. But today, when many applications create their own database tables anyway, few people will worry about that.

Creating tables on the fly will only work well if the set of attributes for each entity is well-defined. If that is not the case, we need a different approach.

Using JSON for a flexible data model

PostgreSQL has extensive JSON support that can be used to model entities with a variable number of attributes.

For that, you model the important and frequently occurring attributes as normal table columns. Then you add an additional column of type jsonb with a GIN index on it. This column contains the “rare attributes” of the entity as key-value pairs.

When using a model like this, you should take care that attributes

used in joins
on which you need a database constraint
that you want to use in a WHERE condition with an operator different from “=”

are modeled as regular table columns.

Conclusion

Avoid entity-attribute-value designs in your relational database. EAV causes bad performance, and there are other ways to have a flexible data model in PostgreSQL.

Need help with your data modeling? Find out about CYBERTEC's data modeling services

3 responses to “Entity-attribute-value (EAV) design in PostgreSQL - don't do it!”

mohammad mahdi says:

November 23, 2021 at 7:10 pm

Consider this: you are creating a commerce app, each product needs various custom attributes, JSON is a solid solution but how about indexing? How about creating filtering interfaces on the fly? These need to be built in the background jobs and cached. Somethimes the project and data are small enough that it is just not worth it, in the technical term you are absolutely right but in the business terms somethings are not worth the effort

Reply
- Luca Ferrari says:
  
  November 24, 2021 at 12:23 pm
  
  Probably I don't get your point, but jsonb has indexing and it works quite well. Of course, you cannot migrate "all columns" into "all json", because that would be stupid. You have the chance to get advantages of both worlds, so take it. And it allows for dynamic filtering, so I don't see the problem.
  
  Reply
Luca Ferrari says:

November 24, 2021 at 12:21 pm

I've encountered this approach (luckily) only once, and it was horrible. The real problem, according to me, behind this "introspection on steroids" is that it is not tested enough: it appears to work great if you have a bunch of "objects", but quickly drives the database (and the application) unresponsive as soon as you load some real data into it.
I had to refactor all the application and database to get rid of this crap!

Besides, JSON(B) offers a great way to implement a part of schema-free entities easing also the application deployment against other customers (since you don't need a way to deploy schema changes because, well, there are not!).

Reply

Leave a Reply Cancel reply

Stay tuned with our

NEWSLETTER

CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0

office@cybertec-postgresql.com

ISO_27001_Badge

Customer Support

Support Platform

Services

Support CYBERTEC Partner PostgreSQL Books

Company

STAY TUNED WITH OUR NEWSLETTER

Get the newest PostgreSQL Info & Tools

Data Protection Policy Terms and conditions Terms of Service Imprint

©

2025

CYBERTEC PostgreSQL International GmbH