OSM to PostGIS – The Basics

Ever wondered how to import OSM (OpenStreetMap) data into PostGIS [1] for the purpose of visualization and further analytics? Here are the basic steps to do so.
There are a bunch of tools on the market— osm2pgsql; imposm; ogr2org; just to mention some of those. In this article I will focus on osm2pgsql [2] .

Let’s start with the software prerequisites. PostGIS comes as a PostgreSQL database extension, which must be installed in addition to the core database. Up till now, the latest PostGIS version is 3, which was released some days ago. For the current tasks I utilized PostGIS 2.5 on top of PostgreSQL 11.
This brings me to the basic requirements for the import – PostgreSQL >= 9.4 and PostGIS 2.2 are required, even though I recommend installing PostGIS >=2.5 on your database;  it’s supported from 9.4 upwards. Please consult PostGIS’ overall compatibility and support matrix [3] to find a matching pair of components.

Osm2pgsql Setup

Let’s start by setting up osm2pgsql on the OS of your choice – I stick to Ubuntu 18.04.04 Bionic Beaver and compiled osm2gsql from source to get the latest updates.

Install required libraries

 sudo apt-get install make cmake g++ libboost-dev libboost-system-dev \
libboost-filesystem-dev libexpat1-dev zlib1g-dev \
libbz2-dev libpq-dev libproj-dev lua5.2 liblua5.2-dev 

Grab the repo

 git clone https://github.com/openstreetmap/osm2pgsql.git 

Compile

 mkdir build && cd build
cmake ..
make
sudo make install 

If everything went fine, I suggest checking the resulting binary and its release by executing

 ./osm2pgsql-version. 
osm2pgsql version 1.0.0 (64 bit id space)

Data acquisition

In the world of OSM, data acquisition is a topic of its own, and worth writing a separate post discussing different acquisition strategies depending on business needs, spatial extent and update frequency. I won’t get into details here, instead, I’ll just grab my osm data for my preferred area directly from Geofabrik, a company offering data extracts and related daily updates for various regions of the world. This can be very handy when you are just interested in a subregion and therefore don’t want to deal with splitting the whole planet osm depending on your area of interest – even though osm2pgsql offers the possibility to hand over a bounding box as a spatial mask.
As a side note – osm data’s features are delivered as lon/lat by default.

So let’s get your hands dirty and fetch a pbf of your preferred area from Geofabrik’s download servers [4] [5]. For a quick start, I recommend downloading a dataset covering a small area:

wget https://download.geofabrik.de/europe/iceland-latest.osm.pbf 

Optionally utilize osmium to check the pbf file by reading its metadata afterwards.

 
osmium fileinfo ~/osmdata/iceland-latest.osm.pbf 
 
File:
 Name: /home/florian/osmdata/iceland-latest.osm.pbf
 Format: PBF
 Compression: none
 Size: 40357343
Header:
 Bounding boxes:
   (-25.7408,62.8455,-12.4171,67.5008)
 With history: no
 Options:
   generator=osmium/1.8.0
   osmosis_replication_base_url=http://download.geofabrik.de/europe/iceland-updates
   osmosis_replication_sequence_number=2413
   osmosis_replication_timestamp=2019-10-26T20:18:02Z
   pbf_dense_nodes=true
   timestamp=2019-10-26T20:18:02Z

Database setup

Before finally importing the osm into PostGIS, we have to set up a database enabling the PostGIS extension. As easy as it sounds – connect to your database with your preferred database client or pgsql, and enable the extension by executing

 create extension postgis 

.
Subsequently, execute

 select POSTGIS_VERSION() 

to validate the PostGIS installation within your database.

Database import

osm2pgsql is a powerful tool to import osmdata into PostGIS offering various parameters to tune. It’s worthwhile to mention the existence of the default-style parameter [6], which defines how osm data is parsed and finally represented in the database. The diagram below shows the common database model generated by osm2pgsql using the default style.

Figure 1 Default osm2pgsql db-model [6]
Figure 1 – Default osm2pgsql db-model [6]

 

It’s hard to give a recommendation on how this style should be adopted, as this heavily depends on its application. As a rule of thumb, the default style is a good starting point for spatial analysis, visualizations and can even be fed with spatial services, since this layout is supported by various solutions (e.g. Mapnik rendering engine).

I will start off with some basic import routines and then move to more advanced ones. To speed up the process in general, I advise you to define the number of processes and cache in MB to use. Even if this blogpost is not intended as a performance report, I attached some execution times and further numbers for the given parametrized commands to better understand the impact the mentioned parameters have.

The imports were performed on a virtualized Ubuntu 18.04 (KVM) machine equipped with 24 cores (out of 32 logical cores provided by an AMD Ryzen Threadripper 2950X), 24GB RAM, and a dedicated 2TB NVMe SSD (Samsung 970 EVO).

Mandatory parameters

Before going into details, internalize the following main parameters:
-U for database user, -W to prompt for the database password, -d refers to the database and finally -H defines the host. The database schema is not exposed as a parameter and therefore must be adjusted via the search path.

Import utilizing default style

Default import of pbf: existing tables will be overwritten. Features are projected to WebMercator (SRID 3857) by default.

 osm2pgsql -U postgres -W -d osmDatabase -H 192.168.0.195 --number-processes 24 -C 20480 iceland-latest.osm.pbf 

The command was executed in ~11 seconds. The table below shows generated objects and cardinalities.

table_schematable_namerow_estimatetotalindextable
publicplanet_osm_line14268197 MB13 MB67 MB
publicplanet_osm_point14501719 MB7240 kB12 MB
publicplanet_osm_polygon176204110 MB17 MB66 MB
publicplanet_osm_roads882414 MB696 kB5776 kB

Import utilizing default style with slim

Parameter s (“slim”) forces the tool to store temporary node information in the database instead of in the memory. It is an optional parameter intended to enable huge imports and avoid out-of-memory exceptions. The parameter is mandatory if you want to enable incremental updates instead of full ones.  The parameter can be complemented with

--flat-nodes 

to store this information outside the database as a file.

The command was executed in ~37 seconds. The table below shows generated objects and cardinalities.

table_schematable_namerow_estimatetotalindextable
publicplanet_osm_line142681102 MB18 MB67 MB
publicplanet_osm_nodes6100392388 MB131 MB258 MB
publicplanet_osm_point14501723 MB11 MB12 MB
publicplanet_osm_polygon176204117 MB24 MB66 MB
publicplanet_osm_rels91419824 kB4872 kB4736 kB
publicplanet_osm_roads882415 MB1000 kB5776 kB
publicplanet_osm_ways325545399 MB306 MB91 MB

Import utilizing default style, tag configuration

By default, tags referenced by a column are exposed as separate columns. Parameter hstore forces the tool to store unreferenced tags in a separate hstore column.

Note: Database extension hstore must be installed beforehand.

 osm2pgsql -U postgres -W -d osmDatabase -H 192.168.0.195 -s --hstore --number-processes 24 -C 20480 iceland-latest.osm.pbf 

For the sake of completeness

  • –hstore-match-only and -hstore-all should be mentioned as well:
  • –hstore-all pushes standard tags to individual columns and the hstore column as well

The command was executed in ~35 seconds. The table below shows generated objects and cardinalities.

table_schematable_namerow_estimatetotalindextable
publicplanet_osm_line142875106 MB18 MB71 MB
publicplanet_osm_nodes6100560388 MB131 MB258 MB
publicplanet_osm_point15104127 MB12 MB15 MB
publicplanet_osm_polygon176342122 MB24 MB72 MB
publicplanet_osm_rels91419824 kB4872 kB4736 kB
publicplanet_osm_roads882416 MB1000 kB6240 kB
publicplanet_osm_ways325545399 MB306 MB91 MB

Import utilizing default style, multi-geometry

By default, objects containing multiple disjoint geometries are stored as separate features within the database. Think of Vienna and its districts, which could be represented as 23 individual polygons or one multi-polygon. Parameter -G forces the tool to store geometries belonging to the same object as multi-polygon.

 osm2pgsql -U postgres -W -d osmDatabase -H 192.168.0.195 -s -G --number-processes 24 -C 20480 iceland-latest.osm.pbf 

The impact emerges most clearly during spatial operations, since the spatial index utilizes the feature to its fullextent, in order to decide which features must be considered.

The command was executed in ~36 seconds. The table below shows generated objects and cardinalities.

table_schematable_namerow_estimatetotalindextable
publicplanet_osm_line142681102 MB18 MB67 MB
publicplanet_osm_nodes6100392388 MB131 MB258 MB
publicplanet_osm_point14501723 MB11 MB12 MB
publicplanet_osm_polygon174459117 MB24 MB65 MB
publicplanet_osm_rels91419824 kB4872 kB4736 kB
publicplanet_osm_roads882415 MB1000 kB5776 kB
publicplanet_osm_ways325545399 MB306 MB91 MB

Results and next steps

The table highlights the influence of osm2pgsql parameters on execution time, generated objects, cardinalities and subsequently sizing. In addition, it’s worth it to understand the impact of parameters like multi-geometry, which forces the tool to create multi-geometry features instead of single-geometry features. Preferring one over the other might lead to performance issues, especially when executing spatial operators (as those normally take advantage of the extent of the features).

The next posts will complement this post by inspecting and visualizing our import results and subsequently dealing with osm updates to stay up to date.

OSM to Postgis - QGIS Map of iceland
Figure 2 – QGIS-Map of iceland, SRID 3857 [7]

 

OSM to PostGIS - Qgis-Map of Reykjavík, SRID 3857 [7]
Figure 3 – QGIS-Map of Reykjavík, SRID 3857 [7]

References

[1]„PostGIS Reference,“ [Online]. Available: https://postgis.net/.
[2]„osm2pgsql GitHub Repository,“ [Online]. Available: https://github.com/openstreetmap/osm2pgsql.
[3]„PostGIS Compatiblity and Support,“ [Online]. Available: https://trac.osgeo.org/postgis/wiki/UsersWikiPostgreSQLPostGIS.
[4]„Geofabrik Download Server,“ [Online]. Available: https://download.geofabrik.de/.
[5]„PBF Format,“ [Online]. Available: https://wiki.openstreetmap.org/wiki/PBF_Format.
[6]„Default Style – osm2pgsql,“ [Online]. Available: https://wiki.openstreetmap.org/wiki/Osm2pgsql/schema.
[7]„QGIS Osm styles,“ [Online]. Available: https://github.com/yannos/Beautiful_OSM_in_QGIS.
[8]„osm2pgsql Database schema,“ [Online]. Available: https://wiki.openstreetmap.org/w/images/b/bf/UMLclassOf-osm2pgsql-schema.png.

 

Appendix

Generated objects and cardinalities from statistics

 SELECT *, pg_size_pretty(total_bytes) AS total
, pg_size_pretty(index_bytes) AS INDEX
, pg_size_pretty(table_bytes) AS TABLE

FROM (
SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
, c.reltuples AS row_estimate
, pg_total_relation_size(c.oid) AS total_bytes
, pg_indexes_size(c.oid) AS index_bytes
, pg_total_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE relkind = 'r'
) a
) a where a.table_schema ='public'; 

Generated objects and cardinalities with count

 create function rowcount_all(schema_name text DEFAULT 'public'::text)
returns TABLE(table_name text, cnt bigint)
language plpgsql
as
$$
declare
table_name text;
begin
for table_name in SELECT c.relname FROM pg_class c
JOIN pg_namespace s ON (c.relnamespace=s.oid)
WHERE c.relkind = 'r' AND s.nspname=schema_name
LOOP
RETURN QUERY EXECUTE format('select cast(%L as text),count(*) from %I.%I',
table_name, schema_name, table_name);
END LOOP;
end
$$;

alter function rowcount_all(text) owner to postgres;

 

with pgClass as (
SELECT *, pg_size_pretty(total_bytes) AS total
, pg_size_pretty(index_bytes) AS INDEX
, pg_size_pretty(table_bytes) AS TABLE

FROM (
SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
, c.reltuples AS row_estimate
, pg_total_relation_size(c.oid) AS total_bytes
, pg_indexes_size(c.oid) AS index_bytes
, pg_total_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE relkind = 'r'
) a
) a)
select rowcount_all.table_name, row_estimate,cnt,total,pgClass.INDEX,pgClass.TABLE from rowcount_all(),pgClass where rowcount_all.table_name=pgClass.TABLE_NAME