In times of COVID-19, governments contemplate tough measures to identify and trace infected people. These measures include the utilization of mobile phone data to trace down infected individuals and subsequently contacts to curb the epidemic. This article shows how PostGIS’ functions can be used to identify “overlapping” sections of infected and healthy individuals by analysing tracks spatio-temporally.
This time we don’t focus on performance and tuning, rather strive boosting your creativity on PostgreSQL’s spatial extension and its functionalities.

The article is structured as follows:

  1. Set up data structures within PostgreSQL
  2. Set up sample tracks via QGIS
  3. Segment sample tracks to retrieve induvial track points
  4. Intersection of infected individual to retrieve possible contacts

Data structures

Let’s start by defining tables representing tracks and their points.
Table mobile_tracks acts as a helper table storing artificial tracks of individuals, which have been drawn via QGIS.
Table mobile_points stores points, which result from segmenting tracks. Finally, they have been enriched by timestamps, which are used to identify temporal intersections on top of spatial intersections. 

The test area is located in Vienna/Austria; therefore I chose MGI/Austria 34 (EPSG as 31286) as appropriate projection.

create table if not exists mobile_tracks
(
   gid serial not null
      constraint mobile_tracks_ok
         primary key,
   customer_id integer,
   geom geometry(LineString,31286),
   infected boolean default false
);

create index if not exists mobile_tracks_geom_idx
   on mobile_tracks using gist (geom);

create unique index if not exists mobile_tracks_customer_id_uindex
    on mobile_tracks (customer_id);
create table if not exists mobile_points
(
   gid serial not null
      constraint mobile_points_ok
         primary key,
   customer_id integer,
   geom geometry(Point,31286),
   infected boolean,
   recorded timestamp
);

create index if not exists mobile_points_geom_idx
   on mobile_points using gist (geom);


Tracks and points

As mentioned in the beginning, for points I decided to generate artificial ones from tracks, I previously digitized via QGIS. Figure 1 shows tracks of infected people in red, healthy individuals are represented as green line strings as foundation.

GPS tracks of healthy and infected individuals, GPS-Tracks
Figure 1 GPS tracks of healthy and infected individuals

 

For our simple example, I made the following assumptions:

  • Individuals are moving with the same speed
  • Tracks of individuals start at the same time

To extract individual points for tracks, I utilized PostGIS’s ST_Segmentize function as follows:

with dumpedPoints as (
    select (st_dumppoints(st_segmentize(geom, 1))).geom,
           ((st_dumppoints(st_segmentize(geom, 1))).path[1]) as path,
           customer_id,
           infected,
           gid
    from mobile_tracks),
     aggreg as (
         select *, now() + interval '1 second' * row_number() over (partition by customer_id order by path) as tstamp
         from dumpedPoints)
insert
into mobile_points(geom, customer_id, infected, recorded)
select geom, customer_id, infected, tstamp
from aggreg;

The query creates points every meter, timestamps are subsequently increased by 1000 milliseconds.

Extracted GPS-Points, GPS-Tracks
Figure 2 Extracted GPS-Points

Identification of intersection points

Now it’s time to start with our analysis. Did our infected individual meet somebody?
Let’s start with an easy approach and select points of healthy individuals, which have been within 2 meters to infected individuals while respecting a time interval of 2 seconds. 

SELECT distinct on (m1.gid) m1.customer_id infectionSourceCust,
                            m2.customer_id infectionTargetCust,
                            m1.gid,
                            m1.recorded,
                            m2.gid,
                            m2.recorded
FROM mobile_points m1
         inner join
     mobile_points m2 on st_dwithin(m1.geom, m2.geom, 2)
where m1.infected = true
  and m2.infected = false
  and m1.gid < m2.gid AND (m2.recorded >= m1.recorded - interval '1seconds' and
       m2.recorded <= m1.recorded + interval '1seconds')
order by m1.gid, st_dwithin(m1.geom, m2.geom, 2) asc


Figure 3 and 4 show results for the given query by highlighting points of contact for our individuals in blue. As mentioned before, this query covers the most basic solution to identify people who met.
Alternatively, PostGIS’ ST_CPAWithin and ST_ClosestPointOfApproach functions could be used here as well to solve this in a similar manner. To do so, our points must get modelled as trajectories first.

Contact points, GPS-Tracks
Figure 3 Contact points

Contact points - Zoom
Figure 4 Contact points – Zoom

 

To refine our solution, let’s identify temporal coherent segments of spatial proximity and respective passage times. The goal is to understand, how long people have been close enough together to inherit possible infections more realistically. The full query is attached to the end of the post. 

Step 1 – aggregStep1
Based on our first query results, for each point we calculate the time interval to its predecessor. Considering our initial assumptions, a non-coherent temporal segment will lead to gaps >1 second.

Temporal gaps between gps points, GPS-Tracks
Figure 5 Temporal gaps between gps points

 

Step 2 – aggregStep2
With our new column indicating the temporal gap between current and previous point, we cluster segments 

Step 3 – aggregStep3
For each cluster, we subtract min from max timestamp to extract a passage interval.

Coherent segments + passage time by infected/healthy customer, GPS-Tracks
Figure 6 Coherent segments + passage time by infected/healthy customer

 

Step 4
Finally, for each combination of infected/healthy individual, the segment with the maximum passage time is being extracted and a linestring is generated by utilizing st_makeline (see figure 7).

Max passage time by infected/healthy customer
Figure 7 Max passage time by infected/healthy customer

 

Even though results can serve as foundation for further analysis, our approach still remains constrained by our initially taken assumptions. 

Recently, a very promising PostgreSQL extension named MobilityDB has been published, which offers a lot of functionalities to solve all kinds of spatio-temporal related questions. Take a look at this blogpost to learn more.

with points as
         (
             SELECT distinct on (m1.gid) m1.customer_id infectionSourceCust,

                                         m2.customer_id infectionTargetCust,
                                         m1.gid,
                                         m1.geom,
                                         m1.recorded    m1rec,
                                         m2.gid,
                                         m2.recorded
             FROM mobile_points m1
                      inner join
                  mobile_points m2 on st_dwithin(m1.geom, m2.geom, 2)
             where m1.infected = true
               and m2.infected = false
               and m1.gid < m2.gid AND (m2.recorded >= m1.recorded - interval '1seconds' and
                    m2.recorded <= m1.recorded + interval '1seconds') order by m1.gid, m1.recorded), aggregStep1 as ( SELECT *, (m1rec - lag(m1rec, 1) OVER (partition by infectionSourceCust,infectionTargetCust ORDER by m1rec ASC)) as lag from points), aggregStep2 as (SELECT *, SUM(CASE WHEN extract('epoch' from lag) > 1 THEN 1 ELSE 0 END)
                 OVER (partition by infectionSourceCust,infectionTargetCust ORDER BY m1rec ASC) AS legSegment
          from aggregStep1),
     aggregStep3 as (
         select *,
                min(m1rec) OVER w   minRec,
                max(m1rec) OVER w   maxRec,
                (max(m1rec) over w) -
                (min(m1rec) OVER w) recDiff
         from aggregStep2
             window w as (partition by infectionSourceCust,infectionTargetCust,legSegment)
     )
select distinct on (infectionsourcecust, infectiontargetcust) infectionsourcecust,
                                                              infectiontargetcust,
                                                              (extract('epoch' from (recdiff))) passageTime,
                                                              st_makeline(geom) OVER (partition by infectionSourceCust,infectionTargetCust,legSegment)
from aggregStep3
order by infectionsourcecust, infectiontargetcust, passageTime desc