CYBERTEC Logo

ER diagrams with SQL and Mermaid

02.2023 / Category: / Tags:

What does ER stand for?

An Entity Relationship (ER) diagram is one of the most important tools for database design. It helps you visualize the relationships between different entities and how they interact with each other. Many GUI tools have their own tools to build ER diagrams, e.g. pgAdmin IV, DBeaver, etc.
In this blog post, we'll explore how to create an ER diagram for a PostgreSQL database using plain SQL and Mermaid. Mermaid is a JavaScript-based diagramming and charting tool that renders Markdown-inspired text definitions to create and modify diagrams dynamically.

mermaid ER diagram
Mermaid ER diagram example

Setup

Today I will show you how to generate an ER diagram for any PostgreSQL database using only plain SQL and Mermaid. I propose to use the Pagila example database as a target. You may either install it locally, or run a Docker Compos script.

The task is to create an SQL script which will output valid Mermaid syntax. Later we can use either Mermaid Live Editor or a local installation to view and save the ER diagram in one of the preferred formats, e.g. .svg, .png, etc.

Script

If you examine Mermaid ER syntax, you will find that:

  • it's possible to define tables separately from references;
  • that you can specify table columns with types;
  • that you can add a name for the relationships.

Mermaid ER header

I've split the SQL script into 3 parts. The first one is very basic - it outputs the special keyword erDiagram to indicate how Mermaid should visualize the diagram.

Collecting tables and columns

Here's what the above snippet does:

  1. It selects the table name (c.relname) and the associated column names (a.attname) with data types (t.oid) from the pg_class, pg_namespace, pg_attribute, and pg_type tables.
  2. It uses left join for columns and types because PostgreSQL allows the creation of tables without attributes, e.g. create table foo(); is a valid DDL statement.
  3. The join condition for attributes explicitly specifies it wants only user-defined columns (a.attnum > 0) and columns that are still valid (not a.attisdropped).
  4. It filters out any entities that are not regular tables (partitioned are OK), and excludes any tables in the PostgreSQL system or information_schema schemas. The partitions themselves are not interesting for us, because they are only implementation details.
  5. To produce a correct type name it uses special format_type() system function. Pay attention, the snippet uses special string constants with C-style escapes. That allows you to easily format tab indents as well as new lines.
  6. To aggregate column names with data types, a special string_agg() function is used.
  7. It then formats the output as a new-line-delimited string with the column definitions in braces.

Collecting relationships

This snippet is much easier:

  1. It gets all the foreign key constraints in the database (c.contype = 'f').
  2. It filters out any foreign keys that are on a partition table or reference a partition table.
  3. It formats the output to show the table name, the referenced table name, and the constraint name. The cardinality is one-to-many by default.

Finally

Here is the final script to copy-paste:

And here is how the final result looks for the Pagila database (click to enlarge):

 

Thanks for reading, I hope you enjoyed it! If you liked this one, check out my blog post about usql, universal psql!

0 0 votes
Article Rating
Subscribe
Notify of
guest
9 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Yuriy_D
Yuriy_D
1 year ago

I want to make updates to your SQL sample:
from
____pg_class c
____join pg_namespace n on n.oid = c.relnamespace
____left join pg_attribute a ON c.oid = a.attrelid

I would write like this:
FROM pg_class c
__JOIN pg_namespace n
____ON n.oid = c.relnamespace
__LEFT JOIN pg_attribute a
____ON a.attrelid = c.oid

please note:
- reserved keywords are in upper case
- visually easy to find joined tables and conditions
- in join condition expression for joined table is always on left

Pavlo Golub
Pavlo Golub
1 year ago
Reply to  Yuriy_D

The choice of using all uppercase or lowercase SQL reserved words in
coding is a matter of personal preference, which may have originated
from a time when code editors did not have color-coding features. While I
used to favor all uppercase SQL reserved words, I am now gravitating
towards using all lowercase. Regardless of the choice, it's essential to
maintain consistency throughout.

Pavlo Golub
Pavlo Golub
1 year ago
Reply to  Yuriy_D

Thanks. UPPERCASE reserved SQL words are kind of holy war nowadays. ­čÖé If you check my previous posts, I always use uppercase there. This time I decided to use lowercase.

Are these rules specified somewhere or are they your personal preferences?

Yuriy_D
Yuriy_D
1 year ago
Reply to  Pavlo Golub

its my best practice after 27 years of experience working with sql

Florian Klein
Florian Klein
1 year ago

I love it! Thanks. Just tried it on a schema of mine containing names
with spaces (I like problems), and the mermaid parser failed. It should
render names with double quotes, something that format() should be
able to handle with a %I formatter, no?

Pavlo Golub
Pavlo Golub
1 year ago
Reply to  Florian Klein

Yeah, it's not perfect. I already broke it several times ( https://github.com/mermaid-js/mermaid/issues/4008 ). And AFAIK one cannot use double quoting for relation names. It should be improved, agree. You can check the sources and maybe propose a PR ( https://github.com/mermaid-js/mermaid/tree/develop/packages/mermaid/src/diagrams/er ).

Florian Klein
Florian Klein
1 year ago
Reply to  Pavlo Golub

from what I tried, it looks like it works, just wrap names with double quotes as in:


family_has_attribute }|..|| family : "family has attribute"

Chris
Chris
3 days ago

The code samples don't show backslashes that are required to represent tab and newlines characters. Copying and pasting and SQL code as-is doesn't work. For example:

should be

Chris
Chris
3 days ago

It seems like the web page strips single backslashes from string. My comment from a few months ago had as its 2nd code sample: format(E' followed by backslash-t -- but the backslash isn't shown in my comment. Probably the same is true with the code sample.

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
R├Âmerstra├če 19
2752 W├Âllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ┬ę
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram