In times of COVID-19, governments contemplate tough measures to identify and trace infected people. These measures include the utilization of mobile phone data to trace down infected individuals and subsequently contacts to curb the epidemic. This article shows how PostGIS’ functions can be used to identify “overlapping” sections of infected and healthy individuals by analysing tracks spatio-temporally.
This time we don’t focus on performance and tuning, rather strive boosting your creativity on PostgreSQL’s spatial extension and its functionalities.
The article is structured as follows:
- Set up data structures within PostgreSQL
- Set up sample tracks via QGIS
- Segment sample tracks to retrieve induvial track points
- Intersection of infected individual to retrieve possible contacts
Let’s start by defining tables representing tracks and their points.
Table mobile_tracks acts as a helper table storing artificial tracks of individuals, which have been drawn via QGIS.
Table mobile_points stores points, which result from segmenting tracks. Finally, they have been enriched by timestamps, which are used to identify temporal intersections on top of spatial intersections.
The test area is located in Vienna/Austria; therefore I chose MGI/Austria 34 (EPSG as 31286) as appropriate projection.
create table if not exists mobile_tracks ( gid serial not null constraint mobile_tracks_ok primary key, customer_id integer, geom geometry(LineString,31286), infected boolean default false ); create index if not exists mobile_tracks_geom_idx on mobile_tracks using gist (geom); create unique index if not exists mobile_tracks_customer_id_uindex on mobile_tracks (customer_id);
create table if not exists mobile_points ( gid serial not null constraint mobile_points_ok primary key, customer_id integer, geom geometry(Point,31286), infected boolean, recorded timestamp ); create index if not exists mobile_points_geom_idx on mobile_points using gist (geom);
Tracks and points
As mentioned in the beginning, for points I decided to generate artificial ones from tracks, I previously digitized via QGIS. Figure 1 shows tracks of infected people in red, healthy individuals are represented as green line strings as foundation.
For our simple example, I made the following assumptions:
- Individuals are moving with the same speed
- Tracks of individuals start at the same time
To extract individual points for tracks, I utilized PostGIS’s ST_Segmentize function as follows:
with dumpedPoints as ( select (st_dumppoints(st_segmentize(geom, 1))).geom, ((st_dumppoints(st_segmentize(geom, 1))).path) as path, customer_id, infected, gid from mobile_tracks), aggreg as ( select *, now() + interval '1 second' * row_number() over (partition by customer_id order by path) as tstamp from dumpedPoints) insert into mobile_points(geom, customer_id, infected, recorded) select geom, customer_id, infected, tstamp from aggreg;
The query creates points every meter, timestamps are subsequently increased by 1000 milliseconds.
Identification of intersection points
Now it’s time to start with our analysis. Did our infected individual meet somebody?
Let’s start with an easy approach and select points of healthy individuals, which have been within 2 meters to infected individuals while respecting a time interval of 2 seconds.
SELECT distinct on (m1.gid) m1.customer_id infectionSourceCust, m2.customer_id infectionTargetCust, m1.gid, m1.recorded, m2.gid, m2.recorded FROM mobile_points m1 inner join mobile_points m2 on st_dwithin(m1.geom, m2.geom, 2) where m1.infected = true and m2.infected = false and m1.gid < m2.gid AND (m2.recorded >= m1.recorded - interval '1seconds' and m2.recorded <= m1.recorded + interval '1seconds') order by m1.gid, st_dwithin(m1.geom, m2.geom, 2) asc
Figure 3 and 4 show results for the given query by highlighting points of contact for our individuals in blue. As mentioned before, this query covers the most basic solution to identify people who met.
Alternatively, PostGIS’ ST_CPAWithin and ST_ClosestPointOfApproach functions could be used here as well to solve this in a similar manner. To do so, our points must get modelled as trajectories first.
To refine our solution, let’s identify temporal coherent segments of spatial proximity and respective passage times. The goal is to understand, how long people have been close enough together to inherit possible infections more realistically. The full query is attached to the end of the post.
Step 1 – aggregStep1
Based on our first query results, for each point we calculate the time interval to its predecessor. Considering our initial assumptions, a non-coherent temporal segment will lead to gaps >1 second.
Step 2 – aggregStep2
With our new column indicating the temporal gap between current and previous point, we cluster segments
Step 3 – aggregStep3
For each cluster, we subtract min from max timestamp to extract a passage interval.
Finally, for each combination of infected/healthy individual, the segment with the maximum passage time is being extracted and a linestring is generated by utilizing st_makeline (see figure 7).
Even though results can serve as foundation for further analysis, our approach still remains constrained by our initially taken assumptions.
Recently, a very promising PostgreSQL extension named MobilityDB has been published, which offers a lot of functionalities to solve all kinds of spatio-temporal related questions. Take a look at this blogpost to learn more.
with points as ( SELECT distinct on (m1.gid) m1.customer_id infectionSourceCust, m2.customer_id infectionTargetCust, m1.gid, m1.geom, m1.recorded m1rec, m2.gid, m2.recorded FROM mobile_points m1 inner join mobile_points m2 on st_dwithin(m1.geom, m2.geom, 2) where m1.infected = true and m2.infected = false and m1.gid < m2.gid AND (m2.recorded >= m1.recorded - interval '1seconds' and m2.recorded <= m1.recorded + interval '1seconds') order by m1.gid, m1.recorded), aggregStep1 as ( SELECT *, (m1rec - lag(m1rec, 1) OVER (partition by infectionSourceCust,infectionTargetCust ORDER by m1rec ASC)) as lag from points), aggregStep2 as (SELECT *, SUM(CASE WHEN extract('epoch' from lag) > 1 THEN 1 ELSE 0 END) OVER (partition by infectionSourceCust,infectionTargetCust ORDER BY m1rec ASC) AS legSegment from aggregStep1), aggregStep3 as ( select *, min(m1rec) OVER w minRec, max(m1rec) OVER w maxRec, (max(m1rec) over w) - (min(m1rec) OVER w) recDiff from aggregStep2 window w as (partition by infectionSourceCust,infectionTargetCust,legSegment) ) select distinct on (infectionsourcecust, infectiontargetcust) infectionsourcecust, infectiontargetcust, (extract('epoch' from (recdiff))) passageTime, st_makeline(geom) OVER (partition by infectionSourceCust,infectionTargetCust,legSegment) from aggregStep3 order by infectionsourcecust, infectiontargetcust, passageTime desc