Workshop materials for Development Data Partnership Day, Washington D.C., 5 June 2026. Event page: https://datapartnership.org/updates/partnership-day/
pingkit is a small Python toolkit and a pair of teaching notebooks that walk through the end-to-end workflow for working with mobile-location (GPS ping) data in a transport-analysis setting: loading raw ping tables, quality-checking the panel, detecting activity stops and building a trip-based OD matrix.
- Sebastian Mueller, Asian Development Bank
- Maria Sol Tadeo, World Bank
By the end of the workshop an attendee can:
- Explain what a GPS ping record contains and how it is collected.
- Name 2–3 transport use cases for ping data and articulate the key biases.
- Describe how Irys and Quadrant differ on coverage, sampling, and access.
- List the main re-identification risks and the standard mitigations (aggregation, k-anonymity).
- (Part 2 attendees) Load a ping dataset, run quality-control checks, and build a simple OD matrix with a map.
Part 1 — Theory (45 min, stands alone)
- Mobile location data 101: what it is (GPS pings), how it's collected, what a typical record looks like
- Use cases for transport: commuting patterns, OD matrix generation, etc.
- Data providers: overview of Irys and Quadrant, coverage and sampling differences, access models, and known limitations (representativeness, panel bias, privacy considerations)
- Privacy, ethics, and responsible use: re-identification risks, aggregation and k-anonymity practices
- How to build an OD matrix
- Intro to KidoDynamics, an analytics provider that works from telecom data
Brief Q&A
Part 2 — Hands-On (45 min, for those who stay)
- Hands-on Part 1 — pingkit walkthrough: loading a sample dataset, basic exploration, quality checks
- Hands-on Part 2 — applied workflow: building an OD matrix from sample data, plus a quick visualization
- Final Q&A and next steps: how to request data access, where to find documentation, and follow-up channel
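
As a rough illustration of the kind of quality checks covered in the first hands-on block, the sketch below counts duplicate pings, out-of-range coordinates, and devices with too few pings. It uses only the standard library and does not call pingkit. The record layout (device_id, timestamp, lat, lon), the function name quality_report, and the min_pings threshold are assumptions made for this example; the actual schema is described in data/README.md.

```python
from collections import Counter

# Hypothetical ping records: (device_id, unix_timestamp, lat, lon).
# The real column names and types may differ; see data/README.md.
pings = [
    ("a", 1000, 38.90, -77.03),
    ("a", 1060, 38.90, -77.03),
    ("a", 1060, 38.90, -77.03),  # exact duplicate of the previous row
    ("b", 1000, 95.00, -77.03),  # latitude outside [-90, 90]
    ("c", 1000, 38.91, -77.04),
]


def quality_report(pings, min_pings=2):
    """Return simple panel-quality counts for a list of ping tuples."""
    unique = set(pings)
    # Keep only pings whose coordinates are valid WGS84 values.
    valid = {p for p in unique if -90 <= p[2] <= 90 and -180 <= p[3] <= 180}
    per_device = Counter(p[0] for p in valid)
    return {
        "n_duplicates": len(pings) - len(unique),
        "n_invalid_coords": len(unique) - len(valid),
        # Devices with fewer valid pings than the (illustrative) threshold.
        "sparse_devices": sorted(d for d, n in per_device.items() if n < min_pings),
    }


report = quality_report(pings)
print(report)
```

On a heavy-tailed panel, the share of sparse devices is usually the first number worth checking, because it determines how many devices can contribute usable trips.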
- Part 1 prerequisites. None. The theory section is code-free and defines all jargon inline.
- Part 2 prerequisites. Python and pandas literacy. No prior mobile-data experience required.
- Access requirements. None for this repository: the sample dataset is fully synthetic. Real Irys / Quadrant data is available to staff of Development Data Partnership member organisations through the Partnership Portal.
The table of contents below is generated from docs/_toc.yml:
A flat map of the key files:
| Path | What it is |
|---|---|
| docs/training.md | Part 1 theory chapter: slide-ready Markdown with speaker notes |
| notebooks/01_explore.ipynb | Part 2 hands-on 1: load the sample dataset, run QC |
| notebooks/02_od_matrix.ipynb | Part 2 hands-on 2: detect stops, build a trip-based (time-resolved) OD matrix with k-anonymity, render a flow map |
| src/pingkit/ | Small library: io, quality, od, viz |
| data/sample_pings_dc.parquet | Synthetic dataset (~2.75M pings, 5,000 devices, 7 days, Washington D.C.; heavy-tailed panel, employment-centre commutes); see data/README.md |
| scripts/generate_sample.py | Reproducible generator for the sample dataset (fixed seed) |
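
The stop-detection and OD steps listed for notebooks/02_od_matrix.ipynb can be sketched in plain Python. This is not the pingkit implementation: the anchor-based dwell rule, the function names (haversine_m, detect_stops, trips_from_stops, zone_of), the 200 m radius, the 10-minute minimum dwell, and the two-zone split are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def detect_stops(track, radius_m=200.0, min_duration_s=600):
    """track: time-sorted (timestamp_s, lat, lon) tuples for one device.

    Returns (start_s, end_s, lat, lon) for every run of pings that stays
    within radius_m of its first ping for at least min_duration_s.
    """
    stops = []
    i = 0
    while i < len(track):
        t0, lat0, lon0 = track[i]
        j = i
        # Extend the run while the next ping stays close to the anchor ping.
        while j + 1 < len(track) and haversine_m(
            lat0, lon0, track[j + 1][1], track[j + 1][2]
        ) <= radius_m:
            j += 1
        if track[j][0] - t0 >= min_duration_s:
            stops.append((t0, track[j][0], lat0, lon0))
        i = j + 1
    return stops


def trips_from_stops(stops, zone_of):
    """Count (origin_zone, destination_zone) pairs between consecutive stops."""
    od = Counter()
    for a, b in zip(stops, stops[1:]):
        origin, dest = zone_of(a[2], a[3]), zone_of(b[2], b[3])
        if origin != dest:
            od[(origin, dest)] += 1
    return od


def zone_of(lat, lon):
    """Hypothetical two-zone split, for illustration only."""
    return "north" if lat >= 38.93 else "south"


# One device: dwell in the south, one ping in transit, dwell in the north.
track = [
    (0, 38.90, -77.03), (300, 38.90, -77.03), (900, 38.90, -77.03),
    (1200, 38.92, -77.02),
    (1800, 38.95, -77.00), (2400, 38.95, -77.00), (3000, 38.95, -77.00),
]
stops = detect_stops(track)
od = trips_from_stops(stops, zone_of)
print(stops)
print(od)
```

A time-resolved matrix could be built the same way by adding the departure hour of the origin stop to the Counter key.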
- Open this repository on GitHub.
- Click Code → Codespaces → Create codespace on main.
- Wait for the devcontainer to build. The post-create command (uv pip install --system -e .) installs pingkit and all dependencies from pyproject.toml, typically in under two minutes.
- Open notebooks/01_explore.ipynb, select the Python 3 kernel when prompted, and run cells top to bottom.
See docs/github-codespaces-setup.md for a step-by-step guide, including how to avoid charges on a paid GitHub plan.
Requires Python ≥ 3.10.
git clone https://github.com/datapartnership/pingkit.git
cd pingkit
pip install -e .
jupyter lab notebooks/

GeoPandas brings in GDAL, GEOS, and PROJ. If pip install -e . fails locally, install those system libraries first (brew install gdal geos proj on macOS; on Linux they are usually already present via libgdal-dev, libgeos-dev, and libproj-dev).
The committed Parquet at data/sample_pings_dc.parquet is reproducible from a fixed seed:
python scripts/generate_sample.py

Pass --n-devices, --seed, or --output to vary it. See data/README.md for the full schema, generation method, and known limitations of the synthetic data.
- The dataset shipped with this repository is synthetic.
- No real Irys or Quadrant data is included.
- The Part 1 theory chapter and the OD-matrix notebook walk through re-identification risks and the aggregation / k-anonymity practices recommended for working with real ping data.
- Data access. Staff of Development Data Partnership member organisations can request Irys or Quadrant data via the Partnership Portal.
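
The aggregation and k-anonymity practice mentioned above can be sketched as a suppression rule: an OD cell is published only if at least k distinct devices contribute to it. The snippet below is a minimal standard-library illustration, not the notebook's code. The record layout, the function name k_anonymous_od, and k=3 are assumptions; the threshold appropriate for real data depends on the provider's terms and the sensitivity of the context.

```python
from collections import defaultdict

# Hypothetical trip records: (device_id, origin_zone, destination_zone).
trips = [
    ("a", "Z1", "Z2"),
    ("b", "Z1", "Z2"),
    ("c", "Z1", "Z2"),
    ("a", "Z1", "Z2"),  # same device again: adds a trip, not a device
    ("d", "Z2", "Z3"),
]


def k_anonymous_od(trips, k=3):
    """Return trip counts per OD cell, dropping cells with fewer than k devices."""
    devices = defaultdict(set)
    counts = defaultdict(int)
    for device_id, origin, dest in trips:
        devices[(origin, dest)].add(device_id)
        counts[(origin, dest)] += 1
    # Suppression is based on distinct devices, not on the number of trips.
    return {cell: n for cell, n in counts.items() if len(devices[cell]) >= k}


safe_od = k_anonymous_od(trips, k=3)
print(safe_od)
```

Counting distinct devices rather than trips matters because a single device making many trips between the same pair of zones would otherwise pass the threshold on its own.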