What does ER stand for?

An Entity Relationship (ER) diagram is one of the most important tools for database design. It helps you visualize the relationships between different entities and how they interact with each other. Many GUI tools have their own tools to build ER diagrams, e.g. pgAdmin IV, DBeaver, etc.
In this blog post, we’ll explore how to create an ER diagram for a PostgreSQL database using plain SQL and Mermaid. Mermaid is a JavaScript-based diagramming and charting tool that renders Markdown-inspired text definitions to create and modify diagrams dynamically.

mermaid ER diagram
Mermaid ER diagram example

Setup

Today I will show you how to generate an ER diagram for any PostgreSQL database using only plain SQL and Mermaid. I propose to use the Pagila example database as a target. You may either install it locally, or run a Docker Compos script.

The task is to create an SQL script which will output valid Mermaid syntax. Later we can use either Mermaid Live Editor or a local installation to view and save the ER diagram in one of the preferred formats, e.g. .svg, .png, etc.

Script

If you examine Mermaid ER syntax, you will find that:

  • it’s possible to define tables separately from references;
  • that you can specify table columns with types;
  • that you can add a name for the relationships.

Mermaid ER header

I’ve split the SQL script into 3 parts. The first one is very basic – it outputs the special keyword erDiagram to indicate how Mermaid should visualize the diagram.

select 'erDiagram'
union all
...

Collecting tables and columns

select 
    format(E'\t%s{\n%s\n}', 
        c.relname, 
        string_agg(format(E'\t\t~%s~ %s', 
            format_type(t.oid, a.atttypmod), 
            a.attname
        ), E'\n'))
from 
    pg_class c 
    join pg_namespace n on n.oid = c.relnamespace
    left join pg_attribute a ON c.oid = a.attrelid and a.attnum > 0 and not a.attisdropped
    left join pg_type t ON a.atttypid = t.oid
where 
    c.relkind in ('r', 'p') 
    and not c.relispartition
    and n.nspname !~ '^pg_' AND n.nspname <> 'information_schema'
group by c.relname
union all
...

Here’s what the above snippet does:

  1. It selects the table name (c.relname) and the associated column names (a.attname) with data types (t.oid) from the pg_class, pg_namespace, pg_attribute, and pg_type tables.
  2. It uses left join for columns and types because PostgreSQL allows the creation of tables without attributes, e.g. create table foo(); is a valid DDL statement.
  3. The join condition for attributes explicitly specifies it wants only user-defined columns (a.attnum > 0) and columns that are still valid (not a.attisdropped).
  4. It filters out any entities that are not regular tables (partitioned are OK), and excludes any tables in the PostgreSQL system or information_schema schemas. The partitions themselves are not interesting for us, because they are only implementation details.
  5. To produce a correct type name it uses special format_type() system function. Pay attention, the snippet uses special string constants with C-style escapes. That allows you to easily format tab indents as well as new lines.
  6. To aggregate column names with data types, a special string_agg() function is used.
  7. It then formats the output as a new-line-delimited string with the column definitions in braces.

Collecting relationships

select 
    format('%s }|..|| %s : %s', c1.relname, c2.relname, c.conname)
from 
    pg_constraint c
    join pg_class c1 on c.conrelid = c1.oid and c.contype = 'f'
    join pg_class c2 on c.confrelid = c2.oid
where
    not c1.relispartition and not c2.relispartition;

This snippet is much easier:

  1. It gets all the foreign key constraints in the database (c.contype = 'f').
  2. It filters out any foreign keys that are on a partition table or reference a partition table.
  3. It formats the output to show the table name, the referenced table name, and the constraint name. The cardinality is one-to-many by default.

Finally

Here is the final script to copy-paste:

select 'erDiagram'
union all
select 
    format(E'\t%s{\n%s\n}', 
        c.relname, 
        string_agg(format(E'\t\t~%s~ %s', 
            format_type(t.oid, a.atttypmod), 
            a.attname
        ), E'\n'))
from 
    pg_class c 
    join pg_namespace n on n.oid = c.relnamespace
    left join pg_attribute a ON c.oid = a.attrelid and a.attnum > 0 and not a.attisdropped
    left join pg_type t ON a.atttypid = t.oid
where 
    c.relkind in ('r', 'p') 
    and not c.relispartition
    and n.nspname !~ '^pg_' AND n.nspname <> 'information_schema'
group by c.relname
union all
select 
    format('%s }|..|| %s : %s', c1.relname, c2.relname, c.conname)
from 
    pg_constraint c
    join pg_class c1 on c.conrelid = c1.oid and c.contype = 'f'
    join pg_class c2 on c.confrelid = c2.oid
where
    not c1.relispartition and not c2.relispartition;

And here is how the final result looks for the Pagila database (click to enlarge):

 

Thanks for reading, I hope you enjoyed it! If you liked this one, check out my blog post about usql, universal psql!