Traveling Salesman problem with PostGIS and pgRouting

Last time, we experimented with lesser-known PostGIS functions to extract areas of interest for sales. Now, let's extend our catchment-area example by optimizing trips within the area of interest we generated in the previous post, which is around Hamburg. Let's ask the following question:
In which order should we visit our major cities so that our trip's costs are minimized?

This optimization problem is commonly known as the Traveling Salesman Problem (or TSP).
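Before reaching for the database, the combinatorial core of the problem can be illustrated with a tiny brute-force sketch in Python. This is purely illustrative and unrelated to pgRouting's solver, which typically uses a heuristic rather than exhaustive search:

```python
from itertools import permutations

def brute_force_tsp(dist, start=0):
    """Return the cheapest round trip over all cities, fixing the start city.

    dist is a symmetric matrix of pairwise costs; feasible only for a
    handful of cities, since the search space grows factorially.
    """
    cities = [c for c in range(len(dist)) if c != start]
    best_tour, best_cost = None, float("inf")
    for perm in permutations(cities):
        tour = (start, *perm, start)  # a round trip returns to the start
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return best_tour, best_cost

# four cities with symmetric pairwise costs
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(brute_force_tsp(dist))  # → ((0, 1, 3, 2, 0), 18)
```

pgRouting spares us exactly this kind of hand-rolled search by operating directly on a graph stored in the database.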

Apart from PostGIS, pgRouting will be used to tackle this challenge within PostgreSQL only. pgRouting ships as a PostgreSQL extension, which adds a lot of geospatial routing functionality on top of PostGIS.
I recommend you check out its online documentation, located at https://pgrouting.org/, to get an overview.



Figure 1 Area of interest around Hamburg
Figure 2 Area of interest around Hamburg, Zoom

This article is organized as follows:

  1. Dataset import
  2. Layer setup
  3. Solving TSP
  4. Final thoughts and outlook

Dataset import

We will re-use the osm files and datasets processed in my last blogpost, so first, please play through the relevant blogpost sections. Subsequently, an intermediate table “tspPoints” must be created, which contains only the major cities and airports covered by our preferred area around Hamburg (see the annex for the DDL).
To solve TSP by utilizing our pgRouting functionality, we must generate a graph out of osm data. Different tools like osm2pgrouting or osm2po exist, which generate a routable graph from plain osm data. Due to limited memory resources on my development machine, I decided to give osm2po a try.

So then - let’s start. After downloading the latest osm2po release from https://osm2po.de/releases/,
we need to adapt its configuration in order to generate the desired graph as an sql file.
Please uncomment the following line in osm2po.config to accomplish this task.

postp.0.class = de.cm.osm2po.plugins.postp.PgRoutingWriter

Now we’re ready to generate our graph by executing
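The original command line is not reproduced here; a typical osm2po invocation looks like the following, where the jar version, the project prefix hh, and the pbf file name are placeholders for your download and area:

```shell
# run osm2po on an OSM extract; "hh" is an arbitrary project prefix
java -Xmx2g -jar osm2po-core-5.2.43-signed.jar prefix=hh hamburg-latest.osm.pbf
```

With the PgRoutingWriter enabled, this produces an sql file inside the prefix directory.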

The outputted sql file can easily be imported to PostGIS by calling
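Assuming osm2po produced hh_2po_4pgr.sql (its default table name pattern), the import is a single psql call; host, user and database name are placeholders:

```shell
psql -U postgres -d osmDatabase -h 192.168.0.195 -q -f hh/hh_2po_4pgr.sql
```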

The resulting graph table contains geometries for edges only (see the annex for the DDL). To generate source and target nodes too, let's utilize pgr_createVerticesTable as follows:
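The call itself might look like this, assuming osm2po's default edge table hh_2po_4pgr with geometry column geom_way:

```sql
-- builds hh_2po_4pgr_vertices_pgr, one row per distinct source/target node
SELECT pgr_createVerticesTable('hh_2po_4pgr', 'geom_way', 'source', 'target');
```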

Layer setup

Slowly, we're getting closer. Figures 3 to 5 represent our stacked layer setup on top of OpenStreetMap, which is based upon the following tables:


Figure 3 Intermediate trip stops


Figure 4 Edges around Hamburg


Figure 5 Edges + nodes around Hamburg

Solving TSP

Now let's ask pgRouting to solve our traveling salesman problem by querying
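The query is not reproduced above; a sketch, assuming pgRouting 3, osm2po's hh_2po_4pgr edge table, and a node_id column in tspPoints holding each stop's nearest graph node (the name filter is likewise an assumption of this sketch):

```sql
-- cost matrix between all stops, handed to the TSP solver
SELECT seq, node, cost, agg_cost
FROM pgr_TSP(
    $$SELECT * FROM pgr_dijkstraCostMatrix(
        'SELECT id, source, target, cost, reverse_cost FROM hh_2po_4pgr',
        (SELECT array_agg(node_id) FROM tsppoints),
        directed := false)$$,
    start_id := (SELECT node_id FROM tsppoints WHERE name = 'Hamburg Airport')
);
```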

This command results in the optimal order of intermediate stops (the edges next to our cities) forming our trip. It's worth pointing out that the optimal order depends on the cost function used to calculate the edges' weights when building the cost matrix. In our case, the cost function is prescribed by osm2po and expressed as length/kmh.

To export the sequence of city names to pass through, the nodes returned by pgr_TSP must be joined back to tspPoints. Please consult the annex for this particular query extension. From figure 6 we now realize that the optimal route goes from Hamburg Airport to Kiel, Lübeck, Rostock and Hamburg before bringing us back to our airport.

Figure 6 Optimal order of stops

Final thoughts and outlook

This time, we took a quick look at pgRouting to solve a basic traveling salesman problem within PostGIS. Feel free to experiment with further cost functions and to move on to calculating detailed routes between cities by utilizing further pgRouting functions.

Annex

"Catchment areas" with PostgreSQL and PostGIS

Recently, a colleague in our sales department asked me for a way to partition an area of interest spatially. He wanted to approximate customer potential and optimize our sales strategies and the respective trips.
Furthermore, he wanted the resulting regions to be created around international airports first, and then intersected with potential customer locations, in order to support a basic ranking. Well, I thought this would be a good opportunity to experiment with lesser-known PostGIS functions.

In our case, customer locations could be proxied by major cities.
It's also worth mentioning that I only used freely available datasets for this demo.

Here is the basic structure of the process:

  1. Dataset import
  2. Layer setup
  3. Setup areas around airports
  4. Intersect areas with major cities

Dataset import

Our sales team will kick off their trips from international airports in Germany, so we require locations and classifications of German airports. Airports can be extracted from OpenStreetMap datasets by filtering osm_points on the tag aeroway=aerodrome. Alternatively, any other data source containing recent airport locations and classifications is welcome. For our demo, I decided to give a freely available csv from “OurAirports” a chance.

In addition, I used OpenStreetMap datasets to gather both spatial and attributive data for cities located in Germany. To do so, I downloaded the most recent dataset covering Germany from Geofabrik and subsequently triggered the PostGIS import utilizing osm2pgsql. Please check out my blogpost on this topic for more details on how to accomplish this easily.

At the end of this article, you will find data structures for airports, osm_points and osm_polygons (country border). These are the foundation for our analysis.

Layer setup

Let’s start with cities and filter osm_points by place to retrieve only the major cities and towns.
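Such a filter can be sketched as follows (column names follow osm2pgsql's default style):

```sql
SELECT name, way
FROM planet_osm_point
WHERE place IN ('city', 'town');
```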

Since we are only interested in international airports within Germany, let’s get rid of those tiny airports and heliports which are not relevant for our simple analysis.
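With the OurAirports csv loaded into a table, the filter might look like this; the table name airports is an assumption, while type and iso_country are columns of the OurAirports csv:

```sql
SELECT name, latitude_deg, longitude_deg
FROM airports
WHERE iso_country = 'DE'
  AND type = 'large_airport';
```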

Check out Figures 1 and 2 below to get an impression of what our core layers look like.


Figure 1 Major cities and international airports


Figure 2 Major cities, towns and international airports


Areas around airports

So far, so good. It is time to start with our analysis by generating catchment areas around international airports first.
Instead of creating Voronoi polygons (Voronoi-Diagrams) using our preferred GIS-client, let us instead investigate how we can accomplish this task using only PostGIS’ functions.
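One way to sketch this uses ST_VoronoiPolygons (available since PostGIS 2.3); the tables airports_de and border are hypothetical stand-ins for the airport layer and the country border polygon from above:

```sql
-- build one Voronoi cell per airport and clip the cells to the country border
WITH cells AS (
    SELECT (ST_Dump(ST_VoronoiPolygons(ST_Collect(geom)))).geom AS cell
    FROM airports_de
)
SELECT ST_Intersection(b.geom, c.cell) AS area
FROM cells c
JOIN border b ON ST_Intersects(b.geom, c.cell);
```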

Intersection of catchment areas with potential customers

So how do we proceed? Let’s count the major cities by area as a proxy for potential customer counts, in order to rank the most interesting areas. As a reminder - major cities are only used here as a proxy for more detailed customer locations. They should be enriched by further datasets in order to more realistically approximate customer potential. Below, you see the final query which outputs major city count by polygon. Figure 4 represents the resulting Voronoi polygons extended by their major city counts. Query results and visualisation might be used as building blocks for optimizing customer acquisition in the future.
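The count itself might be sketched as follows, with areas and cities as hypothetical tables holding the clipped Voronoi polygons and the major cities:

```sql
SELECT a.id,
       count(c.name) AS city_count
FROM areas a
LEFT JOIN cities c ON ST_Contains(a.geom, c.geom)
GROUP BY a.id
ORDER BY city_count DESC;
```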

Final thoughts and outlook

Today we started with some basic spatial analytics and took the opportunity to experiment with lesser-known PostGIS functions.
Hopefully you enjoyed this session, and even more importantly, you’re now eager to play around with PostGIS and its rich feature set. It's just waiting to tackle your real-world challenges.

Annex

Intersecting Tracks of individuals – MobilityDB

Last time I announced we would check out MobilityDB to improve our approach to extract overlapping passage times of healthy and infected individuals – here we go!

MobilityDB itself is a PostgreSQL extension built on top of PostGIS, specializing in processing and analysing spatio-temporal data. To do so, the extension adds a bunch of types and functions on top of PostGIS to solve different kinds of spatio-temporal questions.

Please check out the documentation (mobilitydb.com) to get an impression of what you can expect here.
The extension is currently available for PostgreSQL 11 and PostGIS 2.5 as a v1.0-beta version; from the announcements, I understand that we can expect a first stable release in early 2020. For a quick start, I definitely recommend using their docker container codewit/mobility.

The blog-post is structured as follows:

  1. Set up data structures within PostgreSQL (MobilityDB enabled)
  2. Set up trips based on our initial mobile_points
  3. Intersection of infected individual to retrieve possible contacts

Data structures of MobilityDB

As a reminder, table mobile_points is a relic of our first blog-post and contains points of individuals. This table and its contents will be used to set up the trajectories forming our trips. Table mobile_trips represents trips of individuals, where trips are modelled as trajectories utilizing MobilityDB's tgeompoint data type. Instances of traj are generated from the trip's geometry and subsequently used for visualization.

Trip generation

Let’s start by generating trips out of points:
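The generation query is not shown above; here is a sketch along the lines of MobilityDB's 1.0-beta constructors. The mobile_points columns customer_id, point and ts are assumptions of this sketch, and constructor names may differ between MobilityDB versions:

```sql
-- one trip per customer: wrap each point into a tgeompoint instant,
-- aggregate the instants into a sequence, and keep a plain geometry (traj)
INSERT INTO mobile_trips (customer_id, trip, traj)
SELECT customer_id,
       tgeompointseq(array_agg(tgeompointinst(point, ts) ORDER BY ts)),
       trajectory(tgeompointseq(array_agg(tgeompointinst(point, ts) ORDER BY ts)))
FROM mobile_points
GROUP BY customer_id;
```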

For each customer, a trip is generated as a sequence of instants of tgeompoint. tgeompoint is a continuous, temporal type introduced by MobilityDB. A sequence of tgeompoint interpolates spatio-temporally between our fulcrums. To dig deeper, I would like to refer to MobilityDB's documentation, with emphasis on temporal types.

Figure 1 shows, not surprisingly, a visualization of resulting trip geometries (traj).

Figure 1 Customer trips as sequence of tgeompoint instants

Analysis and results MobilityDB

Let’s start with our analysis and identify spatio-temporally overlapping segments of individuals. The following query returns overlapping segments (within 2 meters) by customer pairs represented as geometries.

To do so and to utilize spatio-temporal indexes on trips, expanded bounding-boxes are intersected first.

Next, we evaluate if trips overlap spatio-temporally by filtering:

getTime returns a detailed temporal profile of the overlapping sections as a set of periods constrained by tdwithin. A period is a custom type introduced by MobilityDB, a customized version of tstzrange. Subsequently, to extract only the intersecting spatial segments of our trip, atPeriodSet restricts the trip by the periodset. Finally, getValues extracts geometries out of the tgeompoint returned by atPeriodSet.
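Putting these building blocks together, the query might be sketched as follows. This is a sketch against the 1.0-beta API; table and column names are assumptions, and some of these functions have been renamed in later MobilityDB releases:

```sql
SELECT a.customer_id AS healthy,
       b.customer_id AS infected,
       getValues(atPeriodSet(a.trip,
                 getTime(atValue(tdwithin(a.trip, b.trip, 2.0), true)))) AS segments
FROM mobile_trips a
JOIN mobile_trips b
  ON expandSpatial(a.trip, 2) && b.trip        -- cheap, index-assisted bbox test first
WHERE a.customer_id <> b.customer_id
  AND dwithin(a.trip, b.trip, 2.0);            -- exact spatio-temporal proximity
```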

But we're not done. Multiple disjoint trip segments result in multi-geometries (Figure 2: two disjoint segments for customers 1 and 2). To extract passage times by disjoint segment and customer, the multi-geometries must be split up first. This action can be carried out utilizing st_dump. To extract passage times for the resulting simple, disjoint geometries, the periods of the periodset must be related accordingly.

This can be accomplished as follows:

First we turn our periodset into an array of periods. Next, we flatten the array utilizing unnest.
In the end, we call timespan on periods to gather passage times by disjoint segment.
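In isolation, that unnesting step might look like this, with ps standing for the periodset computed before; the periods accessor and timespan follow the 1.0-beta docs:

```sql
-- one row per period of the periodset, together with its duration
SELECT p AS period, timespan(p) AS passage_time
FROM unnest(periods(ps)) AS p;
```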

Please find attached to the end of the article an even more elegant, improved way to extract passage times by disjoint trip segment utilizing MobilityDB’s functions only.

Visualization of results

The image below now shows the results for both queries by highlighting segments of contact for our individuals in blue, labelled by their passage times.

Figure 2 Overlapping segments of healthy/infected individuals

So far so good – results correspond with our initial approach from my last blogpost.

What happens if we now remove some of our fulcrums and re-generate our trips?

Remember what I mentioned in the beginning regarding interpolation?

Figure 3 represents our generalized sequence of points; Figure 4 presents the resulting trips, whose visualization already indicates that interpolation between points worked as expected. So even though we removed a bunch of fulcrums, the resulting passage times correspond with our initial assessment (see Figure 5).

Figure 3 Generalized points
Figure 4 Generated trips
Figure 5 Overlapping segments of healthy/infected individuals, generalized points

Let’s go one step further...

and change a customer's speed by manipulating the points' timestamps at the beginning of just one of our trips (Figures 6 and 7). Figure 8 gives an impression of how this affects our results.

Figure 6 Generalized points
Figure 7 Different start times
Figure 8 Overlapping segments of healthy/infected individuals, diverging speeds

I just scratched the surface to showcase MobilityDB's capabilities, but hopefully I made you curious enough to take a look for yourselves.
To be honest, I have already fallen in love with this extension and will definitely continue exploring it.

Big respect for this great extension and special thanks goes to Esteban Zimanyi and Mahmoud Sakr, both main contributors, for their support!


Intersecting GPS-Tracks to identify infected individuals

In times of COVID-19, governments contemplate tough measures to identify and trace infected people. These measures include the utilization of mobile phone data to trace infected individuals and subsequently their contacts to curb the epidemic. This article shows how PostGIS' functions can be used to identify “overlapping” sections of tracks of infected and healthy individuals by analysing them spatio-temporally.
This time we don't focus on performance and tuning, but rather strive to boost your creativity regarding PostgreSQL's spatial extension and its functionalities.

The article is structured as follows:

  1. Set up data structures within PostgreSQL
  2. Set up sample tracks via QGIS
  3. Segment sample tracks to retrieve individual track points
  4. Intersection of infected individual to retrieve possible contacts

Data structures of GPS tracks

Let’s start by defining tables representing tracks and their points.
Table mobile_tracks acts as a helper table storing artificial tracks of individuals, which have been drawn via QGIS.
Table mobile_points stores points, which result from segmenting tracks. Finally, they have been enriched by timestamps, which are used to identify temporal intersections on top of spatial intersections. 

The test area is located in Vienna, Austria; therefore I chose MGI / M34 (EPSG 31286) as the appropriate projection.

GPS tracks and points

As mentioned at the beginning, I decided to generate artificial points from tracks I had previously digitized via QGIS. Figure 1 shows the tracks of infected people in red; healthy individuals are represented as green linestrings.

Figure 1 GPS tracks of healthy and infected individuals

For our simple example, I made the following assumptions:

To extract individual points for tracks, I utilized PostGIS’s ST_Segmentize function as follows:
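The query itself is not reproduced here; a sketch, assuming a table mobile_tracks(customer_id, geom) in a metric projection and an arbitrary, hypothetical start timestamp:

```sql
-- densify each track to ~1 m vertex spacing, dump the vertices as points,
-- and derive a timestamp per point (1000 ms between consecutive points)
SELECT t.customer_id,
       dp.geom AS point,
       timestamptz '2020-04-01 12:00:00+02' + dp.path[1] * interval '1000 ms' AS ts
FROM mobile_tracks t,
     LATERAL ST_DumpPoints(ST_Segmentize(t.geom, 1)) AS dp;
```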

The query creates points every meter; timestamps are subsequently increased by 1000 milliseconds.

Figure 2 Extracted GPS-Points

Identification of intersection points

Now it’s time to start with our analysis. Did our infected individual meet somebody?
Let's start with an easy approach and select points of healthy individuals which have been within 2 meters of infected individuals while respecting a time interval of 2 seconds.
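A sketch of that query; the boolean column infected on mobile_points is an assumption of this sketch:

```sql
SELECT h.customer_id AS healthy,
       i.customer_id AS infected,
       h.point, h.ts
FROM mobile_points h
JOIN mobile_points i
  ON ST_DWithin(h.point, i.point, 2)                  -- within 2 meters
 AND abs(extract(epoch FROM h.ts - i.ts)) <= 2        -- within 2 seconds
WHERE i.infected
  AND NOT h.infected;
```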

Figure 3 and 4 show results for the given query by highlighting points of contact for our individuals in blue. As mentioned before, this query covers the most basic solution to identify people who met.
Alternatively, PostGIS’ ST_CPAWithin and ST_ClosestPointOfApproach functions could be used here as well to solve this in a similar manner. To do so, our points must get modelled as trajectories first.

Figure 3 Contact points
Figure 4 Contact points - Zoom

To refine our solution, let's identify temporally coherent segments of spatial proximity and their respective passage times. The goal is to understand how long people have been close enough together to possibly pass on an infection. The full query is attached at the end of the post.

Step 1 - aggregStep1

Based on our first query's results, we calculate for each point the time interval to its predecessor. Considering our initial assumptions, a non-coherent temporal segment will lead to gaps > 1 second.

Figure 5 Temporal gaps between gps points

Step 2 – aggregStep2

With our new column indicating the temporal gap between the current and the previous point, we cluster the coherent segments.

Step 3 – aggregStep3

For each cluster, we subtract the min from the max timestamp to extract a passage interval.
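Since the full query is attached elsewhere, here is a condensed sketch of steps 1 to 3, assuming the contact points from above are available as contacts(healthy, infected, point, ts):

```sql
WITH step1 AS (      -- step 1: temporal gap to the predecessor point
    SELECT *,
           extract(epoch FROM ts - lag(ts) OVER w) AS gap
    FROM contacts
    WINDOW w AS (PARTITION BY healthy, infected ORDER BY ts)
), step2 AS (        -- step 2: every gap > 1 s starts a new cluster
    SELECT *,
           count(*) FILTER (WHERE gap > 1)
               OVER (PARTITION BY healthy, infected ORDER BY ts) AS cluster_id
    FROM step1
)
SELECT healthy, infected, cluster_id,
       max(ts) - min(ts) AS passage_time     -- step 3: passage interval per cluster
FROM step2
GROUP BY healthy, infected, cluster_id;
```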

Figure 6 Coherent segments + passage time by infected/healthy customer

Step 4

Finally, for each combination of infected/healthy individual, the segment with the maximum passage time is extracted and a linestring is generated by utilizing st_makeline (see Figure 7).

Figure 7 Max passage time by infected/healthy customer

Even though results can serve as foundation for further analysis, our approach still remains constrained by our initially taken assumptions. 

Recently, a very promising PostgreSQL extension named MobilityDB has been published, which offers a lot of functionalities to solve all kinds of spatio-temporal related questions. Take a look at this blogpost to learn more.

Visualizing OSM data in QGIS

Last time we imported OpenStreetMap datasets from Iceland to PostGIS. To quickly visualize our OSM data results, we will now use QGIS to render our datasets and generate some nice maps on the client.

Let’s start with our prerequisites:

  1. A PostGIS enabled PostgreSQL database preloaded with OpenStreetMap datasets as described in blogpost /en/open-street-map-to-postgis-the-basics/
  2. An up and running QGIS Client, see https://www.qgis.org/de/site/ for further instructions
  3. A Git-Client to checkout some predefined styles for QGIS

Import OSM data sets into QGIS

QGIS supports PostgreSQL/PostGIS out of the box.
We can import our datasets by defining our database connection (1), browsing the db catalog (2) and adding the datasets of interest (3) to our workspace.

Figure 1 shows the resulting map, which serves as a good basis for further adaptations.

QGIS will automatically apply basic styles for the given geometry types and will start loading features for the current viewport and extent. Hereby, the tool uses information from PostGIS's geometry_columns view to extract the geometry column, geometry type, spatial reference identifier and number of dimensions. To check what PostGIS is offering to QGIS, let's query this view:
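For the planet_osm tables, the lookup boils down to:

```sql
SELECT f_table_name, f_geometry_column, coord_dimension, srid, type
FROM geometry_columns
WHERE f_table_name LIKE 'planet_osm%';
```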

This brings up the following results:

tblname             geocol  geodim  srid  type
planet_osm_point    way     2       3857  POINT
planet_osm_line     way     2       3857  LINESTRING
planet_osm_polygon  way     2       3857  GEOMETRY
planet_osm_roads    way     2       3857  LINESTRING

For type GEOMETRY, QGIS uses geometrytype to finally assess the geometry's type modifier.

select distinct geometrytype(way) from planet_osm_polygon;

Basic styling of OSM data in QGIS

As mentioned before, QGIS applies default styles to our datasets after being added to the QGIS workspace. Figure 1 gives a first impression on how points and lines from OpenStreetMap are visualized this way.

Figure 1 Basic QGIS style applied to osm_points and osm_lines from Iceland

It's now up to the user to define further styling rules to turn this basic map into a readable and performant one.

As a quick reminder - we imported our osm dump into PostGIS utilizing osm2pgsql with a default style and no transformations. This resulted in three major tables for points, lines and polygons. It's now up to the map designer to filter out relevant objects within the GIS client, such as amenities represented as points, and to apply appropriate styling rules. Alternatively, it's good practice to parametrize osm2pgsql during import or to set up further views to normalize the database model.

Back to our styling task - how can we turn this useless map into something useable?

Please find below some basic rules to start with:

  1. Study OSM's data-model
  2. Define which features are of interest and on which scale to present
  3. Develop queries to grab relevant features
  4. Turn queries into QGIS styling rules utilizing expressions
  5. Evaluate map beauty and performance
  6. Enjoy

Let me present this approach using amenities, picking some representative amenity types.
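The counts below can be gathered with a simple aggregation:

```sql
SELECT amenity, count(*)
FROM planet_osm_point
WHERE amenity IN ('waste_basket', 'bench', 'parking')
GROUP BY amenity
ORDER BY count(*) DESC;
```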

amenity       count
waste_basket  1346
bench         1285
parking       479

To style osm_points and apply individual styles for waste_basket, bench and parking, we open up the layer properties, click on symbology and finally select rule-based styling, as demonstrated in the pictures below.
For each amenity type, we then set up an individual rule.


After zooming in, your map shows your styled amenities like below.

Figure 2 Custom QGIS styling applied to amenities from Reykjavík

Well, better but not really appealing, you might think, and you are absolutely right.
Luckily, we don't have to start from scratch; we can apply existing styles to achieve better results.

But wait - we are not done. Let's see what QGIS is requesting from the database, to understand how actions in the map are converted to SQL statements. To do that, let's enable the logging collector in postgresql.conf and monitor the log file to grab the current sql statements. Subsequently, this enables us to execute the query with EXPLAIN ANALYZE in the console and to start with possible optimizations.

The resulting explain plan draws attention to a possibly useful, but missing, index on column amenity, which is not created by default by osm2pgsql. Additionally, the explain plan highlights the utilization of planet_osm_point_way_idx, an index on geometry column way, which is used to quickly identify features whose bounding boxes intersect with our map extent.

At this point it's worth mentioning auto_explain, a PostgreSQL extension which automatically logs execution plans of slow-running queries. As a shortcut, check out Kaarel's post about this great extension.

Import and apply QGIS styles

Let’s import layer settings on our datasets. For this demo I downloaded layer settings from https://github.com/yannos/Beautiful_OSM_in_QGIS, which result in a typical OpenStreetMap rendering view.

For each layer (point, line and polygon), open up the layer properties, go to symbology and load the individual layer settings file.


Figure 3 shows the resulting map, which serves as a good foundation for further adaptations.

It should be stressed that adding styling complexity is not free in terms of computational resources.
Slow-loading maps on the client side are mostly caused by missing scale boundaries, expensive expressions leading to complex queries, and missing indexes on your backend, just to mention some of those traps.
OSM's common style (https://www.openstreetmap.org) is generally very complex, resulting in resource-intensive rendering.

Figure 3 Customized QGIS Styling Yannos, https://github.com/yannos/Beautiful_OSM_in_QGIS 

Results

This time I briefly discussed how spatial data can be easily added from PostGIS to QGIS to ultimately create a simple visualization. This particular post emphasizes style complexity and finally draws attention to major pitfalls leading to slow rendering processes.

Open Street Map to PostGIS - The Basics


Ever wondered how to import OSM (OpenStreetMap) data into PostGIS [1] for the purpose of visualization and further analytics? Here are the basic steps to do so.
There are a bunch of tools on the market (osm2pgsql, imposm, ogr2ogr, just to mention some of those). In this article I will focus on osm2pgsql [2].

Let's start with the software prerequisites. PostGIS comes as a PostgreSQL database extension, which must be installed in addition to the core database. At the time of writing, the latest PostGIS version is 3, which was released a few days ago. For the current tasks I utilized PostGIS 2.5 on top of PostgreSQL 11.
This brings me to the basic requirements for the import - PostgreSQL >= 9.4 and PostGIS >= 2.2 are required, even though I recommend installing PostGIS >= 2.5 on your database; it's supported from PostgreSQL 9.4 upwards. Please consult PostGIS' overall compatibility and support matrix [3] to find a matching pair of components.

Osm2pgsql Setup

Let's start by setting up osm2pgsql on the OS of your choice - I stick to Ubuntu 18.04.4 Bionic Beaver and compiled osm2pgsql from source to get the latest updates.

Install required libraries

sudo apt-get install make cmake g++ libboost-dev libboost-system-dev
libboost-filesystem-dev libexpat1-dev zlib1g-dev
libbz2-dev libpq-dev libproj-dev lua5.2 liblua5.2-dev

Grab the repo

git clone https://github.com/openstreetmap/osm2pgsql.git

Compile

mkdir build && cd build
cmake ..
make
sudo make install

If everything went fine, I suggest checking the resulting binary and its release by executing

./osm2pgsql --version

Data acquisition

In the world of OSM, data acquisition is a topic of its own, worth a separate post discussing different acquisition strategies depending on business needs, spatial extent and update frequency. I won't get into details here; instead, I'll just grab the osm data for my preferred area directly from Geofabrik, a company offering data extracts and related daily updates for various regions of the world. This can be very handy when you are just interested in a subregion and therefore don't want to deal with splitting the whole planet file depending on your area of interest - even though osm2pgsql offers the possibility to hand over a bounding box as a spatial mask.
As a side note – osm data’s features are delivered as lon/lat by default.

So let's get our hands dirty and fetch a pbf of our preferred area from Geofabrik's download servers [4] [5]. For a quick start, I recommend downloading a dataset covering a small area:

wget https://download.geofabrik.de/europe/iceland-latest.osm.pbf

Optionally utilize osmium to check the pbf file by reading its metadata afterwards.

osmium fileinfo iceland-latest.osm.pbf

Database setup

Before finally importing the osm into PostGIS, we have to set up a database enabling the PostGIS extension. As easy as it sounds – connect to your database with your preferred database client or pgsql, and enable the extension by executing

create extension postgis

Subsequently, execute

select POSTGIS_VERSION()

to validate the PostGIS installation within your database.

Database import

osm2pgsql is a powerful tool to import osm data into PostGIS, offering various parameters to tune. It's worthwhile to mention the existence of the default-style parameter [6], which defines how osm data is parsed and finally represented in the database. The diagram below shows the common database model generated by osm2pgsql using the default style.

Figure 1 - Default osm2pgsql db-model [6]

It’s hard to give a recommendation on how this style should be adopted, as this heavily depends on its application. As a rule of thumb, the default style is a good starting point for spatial analysis, visualizations and can even be fed with spatial services, since this layout is supported by various solutions (e.g. Mapnik rendering engine).

I will start off with some basic import routines and then move to more advanced ones. To speed up the process in general, I advise you to define the number of processes and cache in MB to use. Even if this blogpost is not intended as a performance report, I attached some execution times and further numbers for the given parametrized commands to better understand the impact the mentioned parameters have.

The imports were performed on a virtualized Ubuntu 18.04 (KVM) machine equipped with 24 cores (out of 32 logical cores provided by an AMD Ryzen Threadripper 2950X), 24GB RAM, and a dedicated 2TB NVMe SSD (Samsung 970 EVO).

Mandatory parameters

Before going into details, internalize the following main parameters:
-U for database user, -W to prompt for the database password, -d refers to the database and finally -H defines the host. The database schema is not exposed as a parameter and therefore must be adjusted via the search path.

Import utilizing default style

Default import of pbf: existing tables will be overwritten. Features are projected to WebMercator (SRID 3857) by default.

osm2pgsql -U postgres -W -d osmDatabase -H 192.168.0.195 --number-processes 24 -C 20480 iceland-latest.osm.pbf

The command was executed in ~11 seconds. The table below shows generated objects and cardinalities.

table_schema  table_name          row_estimate  total   index    table
public        planet_osm_line     142681        97 MB   13 MB    67 MB
public        planet_osm_point    145017        19 MB   7240 kB  12 MB
public        planet_osm_polygon  176204        110 MB  17 MB    66 MB
public        planet_osm_roads    8824          14 MB   696 kB   5776 kB

Import utilizing default style with slim

Parameter -s (“slim”) forces the tool to store temporary node information in the database instead of in memory. It is an optional parameter intended to enable huge imports and avoid out-of-memory exceptions. The parameter is mandatory if you want to enable incremental updates instead of full ones. It can be complemented with --flat-nodes to store this information outside the database as a file.
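The command used for this run is not shown above; it is simply the default import extended by -s:

```shell
osm2pgsql -U postgres -W -d osmDatabase -H 192.168.0.195 -s --number-processes 24 -C 20480 iceland-latest.osm.pbf
```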

The command was executed in ~37 seconds. The table below shows generated objects and cardinalities.

table_schema  table_name          row_estimate  total    index    table
public        planet_osm_line     142681        102 MB   18 MB    67 MB
public        planet_osm_nodes    6100392       388 MB   131 MB   258 MB
public        planet_osm_point    145017        23 MB    11 MB    12 MB
public        planet_osm_polygon  176204        117 MB   24 MB    66 MB
public        planet_osm_rels     9141          9824 kB  4872 kB  4736 kB
public        planet_osm_roads    8824          15 MB    1000 kB  5776 kB
public        planet_osm_ways     325545        399 MB   306 MB   91 MB

Import utilizing default style, tag configuration

By default, tags referenced by a column are exposed as separate columns. Parameter --hstore forces the tool to store unreferenced tags in a separate hstore column.

Note: Database extension hstore must be installed beforehand.

osm2pgsql -U postgres -W -d osmDatabase -H 192.168.0.195 -s --hstore --number-processes 24 -C 20480 iceland-latest.osm.pbf

For the sake of completeness, here are the execution time and the resulting figures for this run as well.

The command was executed in ~35 seconds. The table below shows generated objects and cardinalities.

table_schema  table_name          row_estimate  total    index    table
public        planet_osm_line     142875        106 MB   18 MB    71 MB
public        planet_osm_nodes    6100560       388 MB   131 MB   258 MB
public        planet_osm_point    151041        27 MB    12 MB    15 MB
public        planet_osm_polygon  176342        122 MB   24 MB    72 MB
public        planet_osm_rels     9141          9824 kB  4872 kB  4736 kB
public        planet_osm_roads    8824          16 MB    1000 kB  6240 kB
public        planet_osm_ways     325545        399 MB   306 MB   91 MB

Import utilizing default style, multi-geometry

By default, objects containing multiple disjoint geometries are stored as separate features within the database. Think of Vienna and its districts, which could be represented as 23 individual polygons or one multi-polygon. Parameter -G forces the tool to store geometries belonging to the same object as multi-polygon.

osm2pgsql -U postgres -W -d osmDatabase -H 192.168.0.195 -s -G --number-processes 24 -C 20480 iceland-latest.osm.pbf

The impact emerges most clearly during spatial operations, since the spatial index utilizes the feature's full extent in order to decide which features must be considered.

The command was executed in ~36 seconds. The table below shows generated objects and cardinalities.

table_schema  table_name          row_estimate  total    index    table
public        planet_osm_line     142681        102 MB   18 MB    67 MB
public        planet_osm_nodes    6100392       388 MB   131 MB   258 MB
public        planet_osm_point    145017        23 MB    11 MB    12 MB
public        planet_osm_polygon  174459        117 MB   24 MB    65 MB
public        planet_osm_rels     9141          9824 kB  4872 kB  4736 kB
public        planet_osm_roads    8824          15 MB    1000 kB  5776 kB
public        planet_osm_ways     325545        399 MB   306 MB   91 MB

Results and next steps

The tables highlight the influence of osm2pgsql parameters on execution time, generated objects, cardinalities and subsequently sizing. In addition, it's worth understanding the impact of parameters like multi-geometry (-G), which forces the tool to create multi-geometry features instead of single-geometry ones. Preferring one over the other might lead to performance issues, especially when executing spatial operators (as those normally take advantage of the features' extents).

The next posts will complement this post by inspecting and visualizing our import results and subsequently dealing with osm updates to stay up to date.

Figure 2 - QGIS-Map of Iceland, SRID 3857 [7]
Figure 3 - QGIS-Map of Reykjavík, SRID 3857 [7]

References

[1] "PostGIS Reference," [Online]. Available: https://postgis.net/.
[2] "osm2pgsql GitHub Repository," [Online]. Available: https://github.com/openstreetmap/osm2pgsql.
[3] "PostGIS Compatibility and Support," [Online]. Available: https://trac.osgeo.org/postgis/wiki/UsersWikiPostgreSQLPostGIS.
[4] "Geofabrik Download Server," [Online]. Available: https://download.geofabrik.de/.
[5] "PBF Format," [Online]. Available: https://wiki.openstreetmap.org/wiki/PBF_Format.
[6] "Default Style - osm2pgsql," [Online]. Available: https://wiki.openstreetmap.org/wiki/Osm2pgsql/schema.
[7] "QGIS OSM styles," [Online]. Available: https://github.com/yannos/Beautiful_OSM_in_QGIS.
[8] "osm2pgsql Database schema," [Online]. Available: https://wiki.openstreetmap.org/w/images/b/bf/UMLclassOf-osm2pgsql-schema.png.

Appendix

Generated objects and cardinalities from statistics
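A sketch of such a statistics-based query; the estimates come from the statistics collector, so they can deviate slightly from exact counts:

```sql
-- row estimates and sizes from the statistics collector and catalog functions
SELECT schemaname AS table_schema,
       relname AS table_name,
       n_live_tup AS row_estimate,
       pg_size_pretty(pg_total_relation_size(relid)) AS total,
       pg_size_pretty(pg_indexes_size(relid)) AS index,
       pg_size_pretty(pg_table_size(relid)) AS "table"
FROM pg_stat_user_tables
WHERE relname LIKE 'planet_osm%'
ORDER BY relname;
```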

Generated objects and cardinalities with count
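And the exact, but slower, counterpart using count:

```sql
SELECT 'planet_osm_line' AS table_name, count(*) FROM planet_osm_line
UNION ALL
SELECT 'planet_osm_point', count(*) FROM planet_osm_point
UNION ALL
SELECT 'planet_osm_polygon', count(*) FROM planet_osm_polygon
UNION ALL
SELECT 'planet_osm_roads', count(*) FROM planet_osm_roads;
```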