MobilityDB is on the move

Since my last article about MobilityDB, the overall project has further developed and improved.
This is a very good reason to invest some time and showcase MobilityDB’s rich feature stack by analyzing historical flight data from OpenSky-Network, a non-profit organisation which has been collecting air traffic surveillance data since 2013.

From data preparation and data visualization to analysis – this article covers all the steps needed to quick-start analyzing spatio-temporal data with PostGIS and MobilityDB together.
For a basic introduction to MobilityDB, please check out previous MobilityDB post first.

The article is divided into four sections:

  • Software requirements
  • Data allocation and preparation
  • Analysis
  • Outlook

Software requirements

The following software packages are recommended:

First, you need a current PostgreSQL database, equipped with PostGIS and MobilityDB. I recommend using the latest releases available for your OS of choice, even though MobilityDB works with older releases of PostgreSQL and PostGIS, too. Alternatively, use a docker container from https://registry.hub.docker.com/r/codewit/mobilitydb for those who don’t want to build MobilityDB from scratch.
Second, you’ll need a tool to copy our raw flight data served as huge csv files to our database. For this kind of task, I typically use ogr2ogr, a command-line tool shipped with gdal.
Lastly, to visualize our results graphically, we’ll utilize our good old friend “Quantum GIS, a feature-rich GIS-client, which is available for various operating systems.

Here’s a quick summary of my setup:

  • Ubuntu (20.04.3),
  • PostgreSQL (13),
  • PostGIS (3.2.1),
  • MobilityDB (1.0.0),
  • ogr2ogr (3.0.4),
  • QGIS (3.20.3)

Data allocation and preparation

The foundation of our analysis is historical flight data, offered by OpenSky-Network free of charge for non-commercial usage. OpenSky offers snapshots of the previous Monday’s complete state vector data for the last 6 months. These data sets cover information about time (update interval is one second), icao24, lat/lon, velocity, heading, vertrate, callsign, onground, alert/spi, squawk, baro/geoaltitude, lastposupdate and lastcontact. A detailed description can be found at here.

Enough theory; let’s start by downloading historical datasets for our analysis. The foundation for this post are 24 csv files covering flight data for the day 2022-02-28, which can be downloaded from here.

We’ll continue by creating a new PostgreSQL database, with extensions for PostGIS and MobilityDB enabled. The database will store our state vectors.

postgres=# create database flightanalysis;
CREATE DATABASE

postgres=# \c flightanalysis
You are now connected to database "flightanalysis" as user "postgres".
flightanalysis=# create extension MobilityDB cascade;
NOTICE: installing required extension "postgis"
CREATE EXTENSION

flightanalysis=# \dx
List of installed extensions
Name        | Version | Schema      | Description
------------+---------+------------+------------------------------------------------------------
mobilitydb  | 1.0.0   | public      | MobilityDB Extension
plpgsql     | 1.0     | pg_catalog  | PL/pgSQL procedural language
postgis     | 3.2.1   | public      | PostGIS geometry and geography spatial types and functions
(3 rows)

From OpenSky’s dataset descriptions, we inherit a data structure for a staging table, which stores unaltered raw state vectors.

create table tvectors
(
    ogc_fid       integer default nextval('flightsegments_ogc_fid_seq'::regclass) not null
        constraint flightsegments_pkey
            primary key,
    time          integer,
    ftime         timestamp with time zone,
    icao24        varchar,
    lat           double precision,
    lon           double precision,
    velocity      double precision,
    heading       double precision,
    vertrate      double precision,
    callsign      varchar,
    onground      boolean,
    alert         boolean,
    spi           boolean,
    squawk        integer,
    baroaltitude  double precision,
    geoaltitude   double precision,
    lastposupdate double precision,
    lastcontact   double precision
);

create index idx_points_icao24
    on tvectors (icao24, callsign);

From a directory containing the uncompressed csv files, state vectors can now be imported to our flightanalysis database as follows:

for f in `ls *.csv`; do
ogr2ogr -f PostgreSQL PG:"user=postgres dbname=flightanalysis" $f -oo AUTODETECT_TYPE=YES -nln tvectors --config PG_USE_COPY YES
done

How many vectors did we import?

flightanalysis=# select count(*) from tvectors;
count
----------
52261548
(1 row)

To analyse our vectors with MobilityDB, we must now turn our native position data into trajectories.
MobilityDB offers various data types to model trajectories, such as tgeompoint, which represents a temporal geometry point type.

create table if not exists flightsegments
(
icao24 varchar not null,
callsign varchar not null,
trip tgeompoint,
traj geometry(Geometry,4326),
constraint flightsegment_2_pkey
primary key (icao24, callsign)
);

create index idx_trip
on flightsegments using gist (trip);

create index idx_traj
on flightsegments using gist (traj);

To track individual flights from airplanes, trajectories must be generated by selecting vectors by icao24 and callsign ordered by time.

Initially, timestamps (column time) are represented as a unix timestamp in our staging table. For ease of use, we’ll translate unix timestamps to timestamps with time zones first.


update tvectors
set ftime= to_timestamp(time) AT TIME ZONE 'UTC';

create index idx_points_ftime
on tvectors (ftime);

Finally, we can create our trajectories by aggregating locations by icao24 and callsign ordered by time.

insert into flightsegments(icao24, callsign, trip)
SELECT icao24,
       callsign,
       tgeompoint_seq(array_agg(tgeompoint_inst(ST_SetSRID(ST_MakePoint(Lon, Lat), 4326), ftime) ORDER BY ftime))
from tvectors
where lon is not null
  and lat is not null
group by icao24, callsign;

To visualize our aggregated vectors in QGIS, someone must extract the vectors’ raw geometries from tgeompoint, as this data type is not supported natively out of the box from QGIS.
To create a geometrically simplified version for better performance, we’ll utilize st_simplify on top of trajectory. It’s worth mentioning that simplification methods from MobilityDB also exist, which simplify the whole trajectory and not only its geometry.

update flightsegments
set traj = st_simplify(trajectory(trip)::geometry, 0.001)::geometry;

From the images below, you can see the vast number of vectors these freely available datasets contain just for one single day.

You can see in the image that the data is noisy, and must be further cleaned and filtered. Gaps in coverage and recording lead to trajectories spanning the world – and subsequently lead to wrong and misleading results. Nevertheless, we’ll skip this extensive cleansing step today (but might cover this as a separate blog post) and move on with our analysis focusing on a rather small area.

 

Figure 1: Flight vectors world, 2022-02-28 - MobilityDB Blog
Figure 1: Flight vectors world, 2022-02-28

Analysis

Let’s investigate the trajectories and play through some interesting scenarios.

Number of distinct airframes (identified by icao24)

flightanalysis=# select count(distinct icao24) from flightsegments;
count
-------
41663
(1 row)

Average flight duration

flightanalysis=# select avg(duration(trip)) from flightsegments where st_length(trajectory(trip::tgeogpoint)) > 0;
avg
-----------------
03:50:48.245525
(1 row)

Flight vectors intersecting with Iceland between [2022-02-28 00:00:00+00, 2022-02-28 03:00:00+00]

For this kind of analysis, download and import country borders from Natural Earth first.

select icao24,
      callsign,
      trajectory(atPeriod(T.Trip, '[2022-02-28 00:00:00+00, [2022-2-28 03:00:00+00]'::period))
FROM flightsegments T,
    ne_10m_admin_0_countries R
WHERE T.Trip && stbox(R.geom, '[2022-02-28 00:00:00+00, [2022-02-28 03:00:00+00]'::period)
  AND st_intersects(trajectory(atPeriod(T.Trip, '[2022-02-28 00:00:00+00, [2022-2-28 03:00:00+00]'::period)),
                    r.geom)
  and R.name = 'Iceland'
  and st_length(trajectory(atPeriod(T.Trip, '[2022-02-28 00:00:00+00, [2022-2-28 03:00:00+00]'::period))::geography) > 0
Figure 2 Vectors intersecting with Iceland: MobilityDB Blog
Figure 2: Vectors intersecting with Iceland

Duration of flyover for flight vectors intersecting with Iceland between [2022-02-28 00:00:00+00, 2022-02-28 03:00:00+00]

select icao24,
      callsign,
      (duration(
              (atgeometry((atPeriod(T.Trip::tgeompoint, '[2022-02-28 00:00:00+00, [2022-02-28 03:00:00+00]'::period)),
                          R.geom))))::varchar,
      trajectory(
              atgeometry((atPeriod(T.Trip::tgeompoint, '[2022-02-28 00:00:00+00, [2022-02-28 03:00:00+00]'::period)),
                          R.geom))
FROM flightsegments T,
    ne_10m_admin_0_countries R
WHERE T.Trip && stbox(R.geom, '[2022-02-28 00:00:00+00, [2022-02-28 03:00:00+00]'::period)
  AND st_intersects(trajectory(atPeriod(T.Trip, '[2022-02-28 00:00:00+00, [2022-2-28 03:00:00+00]'::period)),
                    r.geom)
  and R.name = 'Iceland'
  and st_length(trajectory(atPeriod(T.Trip, '[2022-02-28 00:00:00+00, [2022-2-28 03:00:00+00]'::period))::geography) > 0

Figure 3 Clipped vectors intersecting with Iceland: Mobility DB
Figure 3: Clipped vectors intersecting with Iceland

 

Figure 4 Clipped vector intersecting with Iceland, icao24 "4cc2c5" callsign "ICE1046"
Figure 4: Clipped vector intersecting with Iceland, icao24 “4cc2c5” callsign “ICE1046”

Border crossing for individual airframe and callsign while intersecting Iceland between [2022-02-28 00:00:00+00, [2022-02-28 03:00:00+00]

Note that in this case, the plane did not leave the country during this trip.

with segments as (
    select icao24,
           callsign,
           unnest(sequences(
                   atgeometry(
                           (atPeriod(T.Trip::tgeompoint, '[2022-02-28 00:00:00+00, [2022-02-28 03:00:00+00]'::period)),
                           R.geom))) segment
    FROM flightsegments T,
         ne_10m_admin_0_countries R
    WHERE T.Trip && stbox(R.geom, '[2022-02-28 00:00:00+00, [2022-02-28 03:00:00+00]'::period)
      AND st_intersects(trajectory(atPeriod(T.Trip, '[2022-02-28 00:00:00+00, [2022-2-28 03:00:00+00]'::period)),
                        r.geom)
      and R.name = 'Iceland'
      and trim(callsign) = 'ICE1046'
      and st_length(trajectory(atPeriod(T.Trip,
                                        '[2022-02-28 00:00:00+00, [2022-2-28 03:00:00+00]'::period))::geography) > 0)

select icao24,
       callsign,
       st_startpoint(getvalues(segment)),
       st_endpoint(getvalues(segment)),
       starttimestamp(segment),
       endtimestamp(segment)
from segments
Figure 5 Border crossing, icao24 "4cc2c5" callsign "ICE1046" - MobilityDB Blog
Figure 5: Border crossing, icao24 “4cc2c5” callsign “ICE1046”

Outlook

MobilityDB’s rich feature stack drastically simplifies analysis on spatio-temporal data. If I may be allowed to imagine further uses for it, various businesses come to my mind where it could be applied in a smart and efficient manner. Toll systems, surveillance, logistics – just to name a few. Finally, its upcoming release will pave the way for new usages, from research and test utilization to production scenarios.

I hope you enjoyed this session. Learn more about using PostGIS by checking out my blog on spatial datasets based on the OpenStreetMap service

In case you want to learn more about PostGIS and Mobility, just get in touch with the folks from CYBERTEC. Stay tuned!

4 thoughts on “Analyzing historical flight data with MobilityDB

  1. I have a few queries:

    1. How to use velocity, heading and altitude fields to precisely depict/present the flights on map.
    2. What sort of different analysis can be drawn from this type of data?

    Would you please guide about it?

  2. Thanks for this, MobilityDB has great features but the documentation is not that great.

Comments are closed.