pingkit — Mobile location data for transport

Workshop materials for Development Data Partnership Day, Washington D.C., 5 June 2026. Event page: https://datapartnership.org/updates/partnership-day/

pingkit is a small Python toolkit and a pair of teaching notebooks that walk through the end-to-end workflow for working with mobile-location (GPS ping) data in a transport-analysis setting: loading raw ping tables, quality-checking the panel, detecting activity stops, and building a trip-based origin-destination (OD) matrix.

Presenters

  • Sebastian Mueller, Asian Development Bank
  • Maria Sol Tadeo, World Bank

Learning objectives

By the end of the workshop an attendee can:

  1. Explain what a GPS ping record contains and how it is collected.
  2. Name 2–3 transport use cases for ping data and articulate the key biases.
  3. Describe how Irys and Quadrant differ on coverage, sampling, and access.
  4. List the main re-identification risks and the standard mitigations (aggregation, k-anonymity).
  5. (Part 2 attendees) Load a ping dataset, run quality-control checks, and build a simple OD matrix with a map.

Agenda — 90 minutes (45 + 45)

Part 1 — Theory (45 min, stands alone)

  • Mobile location data 101: what it is (GPS pings), how it's collected, what a typical record looks like
  • Use cases for transport: commuting patterns, OD matrix generation, etc.
  • Data providers: overview of Irys and Quadrant, coverage and sampling differences, access models, and known limitations (representativeness, panel bias, privacy considerations)
  • Privacy, ethics, and responsible use: re-identification risks, aggregation and k-anonymity practices
  • How to build an OD matrix
  • Intro to KidoDynamics, an analytics provider whose products are derived from telecom data
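To make "what a typical record looks like" concrete, here is a minimal sketch of a ping table in pandas. The column names (`device_id`, `timestamp`, `lat`, `lon`, `accuracy_m`) and the values are illustrative assumptions, not the Irys or Quadrant schema and not necessarily the schema of the sample dataset (see data/README.md for that).

```python
import pandas as pd

# Hypothetical column names and made-up values, for illustration only.
pings = pd.DataFrame(
    {
        "device_id": ["a1", "a1", "b2"],
        "timestamp": pd.to_datetime(
            ["2026-06-01 08:00:05", "2026-06-01 08:04:40", "2026-06-01 08:01:12"],
            utc=True,
        ),
        "lat": [38.8977, 38.8990, 38.9072],
        "lon": [-77.0365, -77.0340, -77.0369],
        "accuracy_m": [12.0, 35.0, 8.0],  # reported horizontal accuracy
    }
)

# Order each device's pings in time, then measure the gap to the previous ping.
pings = pings.sort_values(["device_id", "timestamp"]).reset_index(drop=True)
pings["gap_s"] = pings.groupby("device_id")["timestamp"].diff().dt.total_seconds()

print(pings)
```

The gap column is empty for each device's first ping. Irregular gaps like these are one reason sampling differs so much between devices and providers.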

Brief Q&A

Part 2 — Hands-On (45 min, for those who stay)

  • Hands-on Part 1 — pingkit walkthrough: loading a sample dataset, basic exploration, quality checks
  • Hands-on Part 2 — applied workflow: building an OD matrix from sample data, plus a quick visualization
  • Final Q&A and next steps: how to request data access, where to find documentation, and follow-up channel
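The quality checks in the first hands-on session can be pictured with a sketch like the one below. It is not the pingkit `quality` module; `basic_qc` and the column names are hypothetical, and the checks shown (coordinate range, duplicate pings, per-device activity) are just a plausible starting set.

```python
import pandas as pd


def basic_qc(pings: pd.DataFrame) -> dict:
    """Illustrative QC pass; assumes device_id, timestamp, lat, lon columns."""
    # Coordinates outside the valid latitude/longitude range are unusable.
    valid = pings["lat"].between(-90, 90) & pings["lon"].between(-180, 180)
    # A repeated (device, timestamp) pair is treated as a duplicate ping.
    dupes = pings.duplicated(subset=["device_id", "timestamp"])
    clean = pings[valid & ~dupes]
    # Panel depth: how many pings and how many distinct days per device.
    per_device = clean.groupby("device_id").agg(
        n_pings=("timestamp", "size"),
        active_days=("timestamp", lambda s: s.dt.normalize().nunique()),
    )
    return {
        "n_invalid_coords": int((~valid).sum()),
        "n_duplicates": int(dupes.sum()),
        "per_device": per_device,
    }


sample = pd.DataFrame(
    {
        "device_id": ["a", "a", "a", "b"],
        "timestamp": pd.to_datetime(
            ["2026-06-01 08:00", "2026-06-01 08:00", "2026-06-02 09:00", "2026-06-01 10:00"]
        ),
        "lat": [38.90, 38.90, 38.91, 95.0],  # 95.0 is deliberately invalid
        "lon": [-77.03, -77.03, -77.04, -77.00],
    }
)
report = basic_qc(sample)
print(report["per_device"])
```

In a heavy-tailed panel, the per-device table is usually the first thing to inspect: a few devices contribute most pings, and many appear on only one or two days.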

Audience and prerequisites

  • Part 1 prerequisites. None. The theory section is code-free and defines all jargon inline.
  • Part 2 prerequisites. Python and pandas literacy. No prior mobile-data experience required.
  • Access requirements. None for this repository — the sample dataset is fully synthetic. Real Irys / Quadrant data is available to staff of Development Data Partnership member organisations through the Partnership Portal.

What's in this repository

The table of contents below is generated from docs/_toc.yml:

A flat map of the key files:

| Path | What it is |
| --- | --- |
| docs/training.md | Part 1 theory chapter: slide-ready Markdown with speaker notes |
| notebooks/01_explore.ipynb | Part 2 hands-on 1: load the sample dataset, run QC |
| notebooks/02_od_matrix.ipynb | Part 2 hands-on 2: detect stops, build a trip-based (time-resolved) OD matrix with k-anonymity, render a flow map |
| src/pingkit/ | Small library: io, quality, od, viz |
| data/sample_pings_dc.parquet | Synthetic dataset (~2.75M pings, 5,000 devices, 7 days, Washington D.C.; heavy-tailed panel, employment-centre commutes); see data/README.md |
| scripts/generate_sample.py | Reproducible generator for the sample dataset (fixed seed) |
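As a rough picture of what "detect stops, build a trip-based OD matrix" involves, the sketch below uses a deliberately simple rule: a stop is a run of consecutive pings in the same zone lasting at least a minimum duration, and a trip links two consecutive stops in different zones. This is a simplified stand-in, not the algorithm in `pingkit.od`; the function names, the `zone` column, and the 10-minute threshold are all assumptions.

```python
import pandas as pd


def detect_stops(pings: pd.DataFrame, min_duration: str = "10min") -> pd.DataFrame:
    """Collapse runs of consecutive same-zone pings into stops (illustrative)."""
    p = pings.sort_values(["device_id", "timestamp"]).copy()
    # A new run starts whenever the device or the zone changes.
    changed = (p["device_id"] != p["device_id"].shift()) | (p["zone"] != p["zone"].shift())
    p["run"] = changed.cumsum()
    stops = p.groupby("run").agg(
        device_id=("device_id", "first"),
        zone=("zone", "first"),
        start=("timestamp", "min"),
        end=("timestamp", "max"),
    )
    long_enough = (stops["end"] - stops["start"]) >= pd.Timedelta(min_duration)
    return stops[long_enough].reset_index(drop=True)


def build_od(stops: pd.DataFrame) -> pd.DataFrame:
    """Link consecutive stops into trips and count them per origin, destination, hour."""
    s = stops.sort_values(["device_id", "start"]).copy()
    s["dest"] = s.groupby("device_id")["zone"].shift(-1)
    s["dep_hour"] = s["end"].dt.hour  # departure = end of the origin stop
    trips = s.dropna(subset=["dest"])
    trips = trips[trips["zone"] != trips["dest"]]
    od = (
        trips.groupby(["zone", "dest", "dep_hour"])
        .agg(trips=("device_id", "size"), devices=("device_id", "nunique"))
        .reset_index()
    )
    return od.rename(columns={"zone": "origin"})


pings = pd.DataFrame(
    {
        "device_id": ["d1"] * 5 + ["d2"] * 4,
        "timestamp": pd.to_datetime(
            ["2026-06-01 08:00", "2026-06-01 08:20", "2026-06-01 08:40",
             "2026-06-01 09:00", "2026-06-01 09:30",
             "2026-06-01 07:00", "2026-06-01 07:15",
             "2026-06-01 07:50", "2026-06-01 08:10"]
        ),
        "zone": ["A", "A", "B", "C", "C", "A", "A", "C", "C"],
    }
)
stops = detect_stops(pings)
od = build_od(stops)
print(od)
```

The single ping in zone B is too short to count as a stop, so device d1 contributes one trip from A to C rather than two shorter trips. Keeping a distinct-device count alongside the trip count is what later makes a k-anonymity threshold possible.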

Getting started

Option A — GitHub Codespaces (recommended)

  1. Open this repository on GitHub.
  2. Click Code → Codespaces → Create codespace on main.
  3. Wait for the devcontainer to build. The post-create command (uv pip install --system -e .) installs pingkit and all dependencies from pyproject.toml — typically under two minutes.
  4. Open notebooks/01_explore.ipynb, select the Python 3 kernel when prompted, and run cells top to bottom.

See docs/github-codespaces-setup.md for a step-by-step guide, including how to avoid charges on a paid GitHub plan.

Option B — Local install

Requires Python ≥ 3.10.

git clone https://github.com/datapartnership/pingkit.git
cd pingkit
pip install -e .
jupyter lab notebooks/

GeoPandas depends on GDAL, GEOS, and PROJ. If pip install -e . fails locally, install those system libraries first and retry: brew install gdal geos proj on macOS; on Debian or Ubuntu the corresponding packages are libgdal-dev, libgeos-dev, and libproj-dev.
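After a local install, a quick way to see whether the geospatial stack is importable is a check like the following. It is a small diagnostic sketch using only the standard library; `missing_packages` is a hypothetical helper, and the default package list is an assumption about what the notebooks need.

```python
from importlib.util import find_spec


def missing_packages(names=("geopandas", "shapely", "pyproj")):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# An empty list means every listed package was found.
print(missing_packages())
```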

Re-generating the sample dataset

The committed Parquet at data/sample_pings_dc.parquet is reproducible from a fixed seed:

python scripts/generate_sample.py

Pass --n-devices, --seed, or --output to vary it. See data/README.md for the full schema, generation method, and known limitations of the synthetic data.

Data use and privacy

  • The dataset shipped with this repository is synthetic.
  • No real Irys or Quadrant data is included.
  • The Part 1 theory chapter and the OD-matrix notebook walk through re-identification risks and the aggregation / k-anonymity practices recommended for working with real ping data.
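One common way to apply a k-anonymity style rule to an aggregated OD table is small-cell suppression: drop every cell backed by fewer than k distinct devices. The sketch below assumes a hypothetical `devices` column holding that distinct-device count and uses k = 10 purely as an example threshold; it is not necessarily how the notebook implements the rule.

```python
import pandas as pd


def suppress_small_cells(od: pd.DataFrame, k: int = 10, count_col: str = "devices") -> pd.DataFrame:
    """Keep only OD cells observed for at least k distinct devices."""
    return od[od[count_col] >= k].reset_index(drop=True)


# Made-up counts, for illustration only.
od = pd.DataFrame(
    {
        "origin": ["A", "A", "B"],
        "dest": ["B", "C", "C"],
        "devices": [25, 3, 10],
    }
)
safe_od = suppress_small_cells(od, k=10)
print(safe_od)
```

Suppression on its own does not rule out every re-identification route (for example, published row or column totals can sometimes be used to infer a suppressed cell), so it is best read as one mitigation among several.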

Follow-up

  • Data access. Staff of Development Data Partnership member organisations can request Irys or Quadrant data via the Partnership Portal.

License

Mozilla Public License 2.0.

Generated from worldbank/template