pykci

Python Knowledge Graph for Cities

Ingest CityGML city models into a Neo4j graph, query them in natural language, and export them back.


What is pykci?

Cities publish their 3D models as CityGML, an XML-based OGC standard. pykci reads those files into a Neo4j property graph, where buildings, walls, roofs, trees, bridges, and tunnels become nodes and relationships you can query.

From the graph you can:

  • Ask questions in natural language ("How many buildings are taller than 20 m near the harbour?") via a local or commercial LLM that generates Cypher.
  • View the city in 3D in a browser (3D Tiles + CesiumJS) or explore the graph itself interactively.
  • Export back to CityGML. The round-trip aims to be lossless: IDs, coordinate strings, and attributes are stored verbatim and re-emitted.

New to 3D city models? CityGML describes a city's geometry together with semantic attributes (function, roof type, height, and so on). Representing it as a graph makes the relationships between objects - which surfaces belong to a building, which buildings are near each other - explicit and queryable.


Highlights

🏙️ CityGML 2.0 coverage: The 10 thematic modules with geometry (Building, Transportation, Vegetation, WaterBody, LandUse, CityFurniture, Relief, Bridge, Tunnel, CityObjectGroup), LoD 0–4, implicit geometry.
🔁 Lossless round-trip: Ingest → graph → export. A census gate checks that elements, attributes, text nodes, and coordinate tokens are reproduced; verified across LoD0–4 and all modules (the Appearance module is excluded by design).
🗣️ Natural-language queries: Model-agnostic LangChain layer: run a local LLM (Ollama) to keep data on-premise, or use Claude for harder spatial/semantic queries.
🌐 3D in the browser: Exports OGC 3D Tiles 1.1, including window/door openings cut as holes in walls, viewable in CesiumJS.
🕸️ Graph viewers: Self-contained HTML: a 2D force-directed graph and a combined 3D-Tiles-plus-graph view, both with light/dark mode.
📍 Spatial features: Neo4j Spatial R-tree index, NEARBY_BUILDING proximity edges, and GDS Louvain community detection.
📊 Benchmarked: Compared against 3DCityKG and 3DCityDB v5 on a 6.93 GB Hamburg LoD2 dataset under a fixed resource envelope (see Benchmarks).

Architecture

                       input/citygml/*.gml
                               │
                               ▼
                       ingest_citygml.py
                               │
                               ▼
                    ┌──────────────────────┐
                    │   Neo4j graph store   │
                    │  bolt://localhost:7687│
                    └──────────┬───────────┘
                               │
   ┌───────────────┬───────────┼────────────────┬─────────────────────┐
   ▼               ▼           ▼                 ▼                     ▼
export_citygml  build_3dtiles  visualize_*    export_graph_html   query_ollama_full
   │               │           │                 │                  query_ollama_min
   ▼               ▼           ▼                 ▼                     │
output/citygml  output/3dtiles  output/html    output/html        (NL → Cypher answers)
 (lossless GML) (CesiumJS 3D)  (Three.js / CesiumJS / vis-network)

Graph store: Local Neo4j (bolt://localhost:7687) with APOC, Spatial, GenAI, and GDS plugins
Input: CityGML 2.0, all LoD levels (LoD0–4), implicit geometry
3D output: OGC 3D Tiles 1.1 (GLB + tileset.json), viewable in CesiumJS
Viewers: Three.js (from a CSV of IDs) · vis-network (full graph) · CesiumJS (3D + graph overlay)
NL queries: LangChain → Cypher, model-agnostic - Ollama (private, on-premise) or Anthropic Claude (ANTHROPIC_API_KEY)

Quick start (Docker)

All services run via Docker Compose from the docker/ directory.

1 · Neo4j only - ingestion and export

cd docker
docker compose up -d neo4j

Neo4j is ready when http://localhost:7474 responds (health-check polls every 15 s). Then, from the repository root (where the scripts live), ingest a CityGML file:

python ingest_citygml.py input/citygml/yourfile.gml
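
If you script the start-up, the readiness check above can be automated. The following is a minimal sketch, not part of pykci: the helper name wait_for_neo4j, its parameters, and the 5 s polling interval are assumptions made for illustration. It simply polls the HTTP port until it answers.

```python
import time
import urllib.error
import urllib.request


def wait_for_neo4j(url="http://localhost:7474", timeout_s=120.0, interval_s=5.0,
                   probe=None, sleep=time.sleep):
    """Return True once `url` answers with HTTP 200, False after `timeout_s`.

    `probe` and `sleep` are injectable so the loop can be exercised without
    a running database (hypothetical helper, not a pykci API).
    """

    def http_probe(target):
        try:
            with urllib.request.urlopen(target, timeout=5) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the container is still starting.
            return False

    probe = probe or http_probe
    waited = 0.0
    while True:
        if probe(url):
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Calling wait_for_neo4j() before the ingest command avoids a connection error when the container is still booting.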

2 · Neo4j + Ollama - ingestion and natural-language queries

cd docker
docker compose up -d neo4j ollama

# First run only - pull a model (pick one)
docker compose exec ollama ollama pull qwen2.5-coder:14b   # ~9 GB
docker compose exec ollama ollama pull qwen2.5-coder:7b    # ~4.7 GB

# Start an interactive query session
docker compose run --rm llm-query

Reset to a fresh database

cd docker
docker compose down
docker volume rm docker_neo4j_data docker_neo4j_logs
docker compose up -d neo4j

Remove everything (including the downloaded model)

cd docker
docker compose down -v

Requirements

Python dependencies

pip install lxml neo4j pyproj mapbox-earcut numpy
pip install shapely                                                        # optional: clean window/door holes in 3D Tiles
pip install langchain langchain-community langchain-ollama langchain-neo4j  # NL queries (local)
pip install langchain-anthropic                                            # NL queries (Claude)

For the Claude backend, set an API key: export ANTHROPIC_API_KEY=sk-ant-... (PowerShell: $env:ANTHROPIC_API_KEY = "sk-ant-...").

Neo4j plugins

These plugins extend the database. Copy their JARs into Neo4j's plugins/ directory (or use the provided docker/Dockerfile.neo4j, which bundles them automatically).

| Plugin | Used for | Bundled? |
|---|---|---|
| neo4j-contrib/spatial | R-tree spatial index (spatial.bbox, spatial.intersects) | yes |
| APOC | Batch operations, collection utilities | yes |
| Neo4j GenAI | Vector embeddings for semantic similarity search | yes |
| Graph Data Science | Louvain community detection on the proximity graph | no (download separately) |

The Graph Data Science JAR is not bundled (its license restricts redistribution); download it from the link above into docker/plugins/ before building the Neo4j image if you want Louvain community detection. All plugins are detected at runtime - scripts skip gracefully if a plugin is absent. Set NEO4J_GENAI_KEY=<your-OpenAI-key> before running ingest_citygml.py to populate building embeddings.
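
One way to picture the runtime detection is a check on the procedure namespaces each plugin registers. This is a sketch under assumptions, not the scripts' actual logic: detect_plugins is a hypothetical helper, and it expects the list of names you would get from a query such as SHOW PROCEDURES YIELD name.

```python
# Namespace prefixes under which each plugin registers its procedures.
PLUGIN_PREFIXES = {
    "spatial": "spatial.",
    "apoc": "apoc.",
    "genai": "genai.",
    "gds": "gds.",
}


def detect_plugins(procedure_names):
    """Map each plugin to True/False depending on whether any procedure
    in `procedure_names` lives in its namespace (hypothetical helper)."""
    names = list(procedure_names)
    return {
        plugin: any(name.startswith(prefix) for name in names)
        for plugin, prefix in PLUGIN_PREFIXES.items()
    }


# Example: a server with APOC and Spatial installed, but neither GenAI nor GDS.
available = detect_plugins(["apoc.periodic.iterate", "spatial.bbox", "db.labels"])
```

A caller can then skip an optional step, for example Louvain community detection, when available["gds"] is False.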


Scripts

1 · Ingest CityGML → Neo4j

python ingest_citygml.py path/to/file.gml

Parses a CityGML 2.0 file and writes all supported modules into Neo4j. The script is idempotent - re-running merges into existing nodes via MERGE … SET without creating duplicates. Works for any CityGML dataset; CRS, codes, LOD level, and attribute names are all read from the file at runtime.
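
The MERGE … SET pattern is what makes re-runs safe: MERGE matches an existing node by its key or creates it, and SET then overwrites the remaining properties. A minimal sketch of such a statement builder follows; merge_statement is a hypothetical helper for illustration, not the function used in ingest_citygml.py.

```python
def merge_statement(label, node_id, props):
    """Build an idempotent upsert: MERGE on the id, then SET the other properties.

    Labels cannot be passed as query parameters in classic Cypher, so the
    label is validated before it is interpolated into the query text.
    """
    if not label.isidentifier():
        raise ValueError(f"unsafe label: {label!r}")
    query = f"MERGE (n:{label} {{id: $id}}) SET n += $props"
    return query, {"id": node_id, "props": dict(props)}


query, params = merge_statement("Building", "B1", {"measured_height": 12.5})
# With the official neo4j Python driver this would be executed as:
#     session.run(query, params)
# Running it twice leaves exactly one (:Building {id: "B1"}) node.
```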

Supported CityGML modules: Building, Transportation (Road, Railway, Track, Square), Vegetation (SolitaryVegetationObject, PlantCover), WaterBody, LandUse, CityFurniture, Relief, Bridge, Tunnel, CityObjectGroup.

After writing feature nodes the script also:

  • Registers all features in the Neo4j Spatial R-tree index (layer features)
  • Creates a building_embeddings vector index and encodes Building.description via the GenAI plugin (requires NEO4J_GENAI_KEY)
  • Creates NEARBY_BUILDING edges (100 m radius, R-tree accelerated)
  • Runs GDS Louvain community detection and writes community_id onto each Building node
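
To illustrate what the NEARBY_BUILDING step computes, here is a small self-contained sketch. It is an assumption-laden stand-in: it measures centroid-to-centroid distance and uses a uniform grid instead of the Neo4j Spatial R-tree, and nearby_pairs is a hypothetical name.

```python
import math
from collections import defaultdict


def nearby_pairs(centroids, radius_m=100.0):
    """Return sorted (id_a, id_b, distance_m) tuples for centroids within radius_m.

    `centroids` maps a building id to (x, y) in a projected, metre-based CRS.
    Cells are radius-sized, so only the 3x3 neighbourhood has to be searched.
    """
    grid = defaultdict(list)
    for bid, (x, y) in centroids.items():
        grid[(math.floor(x / radius_m), math.floor(y / radius_m))].append(bid)

    pairs = []
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for other in grid.get((cx + dx, cy + dy), ()):
                    for bid in members:
                        if bid < other:  # emit each unordered pair once
                            ax, ay = centroids[bid]
                            bx, by = centroids[other]
                            d = math.hypot(ax - bx, ay - by)
                            if d <= radius_m:
                                pairs.append((bid, other, round(d, 2)))
    return sorted(pairs)


edges = nearby_pairs({"a": (0, 0), "b": (60, 80), "c": (500, 500)})
```

Each returned tuple corresponds to one NEARBY_BUILDING edge with its distance_m property.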

Output: nodes and relationships in Neo4j + output/<stem>_summary.md

2 · Export Neo4j → CityGML

python export_citygml.py

Reconstructs a CityGML 2.0 file from the graph, covering all ingested modules. Geometry, bounding boxes, creationDate, gen:stringAttribute values (original CamelCase names preserved), externalReference, LOD solids, boundary surfaces, and terrain intersections are all restored.

Output: output/citygml/<stem>_export.gml

3 · Export Neo4j → 3D Tiles

python build_3dtiles.py [--dataset <file.gml>] [--out <dir>]

Converts the graph's geometry (ingest first) to a 3D Tiles 1.1 dataset for browser visualisation; omit --dataset when the graph holds a single dataset. The CRS and geoid undulation are derived from the dataset's srsName automatically - no hardcoded EPSG codes. LoD3 window/door openings stored as gml:interior rings are cut as holes in the wall meshes (overlapping/touching openings resolved with shapely when installed) so they are not hidden behind solid walls.

Output:

output/3dtiles/
├── tileset.json     # 3D Tiles root (dataset CRS → ECEF tile transform)
└── tiles/city.glb   # GLB: all feature modules, one mesh primitive per material

To view in a browser: serve the output directory over HTTP (3D Tiles will not load from file://) and point a CesiumJS viewer - e.g. the combined viewer from visualize_graph_3d.py - at tileset.json.
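
The last step of the "dataset CRS → ECEF" chain is the standard WGS84 geodetic-to-ECEF conversion. The sketch below shows only that step and assumes the coordinates have already been reprojected to WGS84 longitude/latitude (for example with pyproj); the function name is illustrative and not taken from build_3dtiles.py. The height must be ellipsoidal, that is orthometric height plus geoid undulation.

```python
import math

WGS84_A = 6378137.0                    # semi-major axis in metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lon_deg, lat_deg, h_ellipsoidal_m):
    """Convert WGS84 lon/lat (degrees) and ellipsoidal height (m) to ECEF x, y, z (m)."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    # Prime-vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h_ellipsoidal_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_ellipsoidal_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + h_ellipsoidal_m) * math.sin(lat)
    return x, y, z


def ellipsoidal_height(orthometric_m, geoid_undulation_m):
    """h = H + N: height above the ellipsoid from height above the geoid."""
    return orthometric_m + geoid_undulation_m
```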

4 · Generate a 3D HTML viewer from a list of IDs

python visualize_buildings_3d.py ids.csv
python visualize_buildings_3d.py ids.csv --radius 100
python visualize_buildings_3d.py ids.csv --output output/html/my_view.html --title "HafenCity selection"
python visualize_buildings_3d.py ids.csv --ref-id DEHHALKA4IB000Cv --radius 100

Reads feature IDs from a CSV, queries Neo4j for LOD2 surface geometry (walls, roof, ground), and writes a self-contained Three.js HTML file that works directly from file:// without a web server.

CSV format - one column named gmlid, id, building_id, or fid (auto-detected). Optional role column (reference / neighbour / highlight) controls colours. Optional label column sets the display name.
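
The column auto-detection can be pictured as follows. This is a sketch, not the code in visualize_buildings_3d.py: read_ids is a hypothetical helper, and both the case-sensitive matching and the priority order when several ID columns are present are assumptions.

```python
import csv
import io

# Accepted ID column names, checked in this order (order is an assumption).
ID_COLUMNS = ("gmlid", "id", "building_id", "fid")


def read_ids(csv_text):
    """Parse CSV text into a list of {"id", "role", "label"} dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    id_col = next((c for c in ID_COLUMNS if c in fields), None)
    if id_col is None:
        raise ValueError(f"no ID column found; expected one of {ID_COLUMNS}")
    rows = []
    for row in reader:
        rows.append({
            "id": row[id_col].strip(),
            # Optional columns: empty or missing values become None.
            "role": (row.get("role") or "").strip() or None,
            "label": (row.get("label") or "").strip() or None,
        })
    return rows


rows = read_ids("gmlid,role\nDEHHALKA4IB000Cv,reference\nX2,\n")
```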

Output: output/html/buildings_3d.html (default) or the path given by --output.

Features:

  • Actual LOD2 polygon geometry from Neo4j (walls, roof faces, ground)
  • Orbit / zoom / pan controls (Three.js OrbitControls)
  • Dashed radius circle and connector lines when --radius is set
  • Hover tooltip with feature ID, description, and distance from reference
  • Works for any CityGML dataset - no hardcoded coordinates

4b · Export the full property graph as an interactive HTML viewer

python export_graph_html.py
python export_graph_html.py --skip-geometry
python export_graph_html.py --limit 500
python export_graph_html.py --output output/html/mygraph.html

Queries all nodes and relationships from Neo4j and writes a self-contained vis-network HTML file that works from file:// without a web server. By default every node and edge is exported; use --skip-geometry to hide geometry leaf nodes (GeometryRing, GeometryPolygon, etc.) when the graph is too large to render comfortably.

| Flag | Default | Effect |
|---|---|---|
| --output <path> | output/html/graph_<timestamp>.html | Custom output path |
| --limit <N> | none | Cap the number of nodes fetched |
| --skip-geometry | off | Exclude geometry leaf nodes |

Features:

  • Force-directed layout (vis-network) with physics toggle
  • Light / dark mode toggle (🌙 / ☀️ button)
  • Left panel: label legend with per-label visibility toggle
  • Right panel: full property list for any selected node or edge
  • Search bar - matches node label or any property value and flies the camera to the first hit

4c · Combined 3D Tiles + graph viewer

# First serve the 3D Tiles (built by build_3dtiles.py)
cd output/3dtiles && python serve.py

# In a second terminal, generate the viewer
python visualize_graph_3d.py
python visualize_graph_3d.py --tileset-url http://localhost:8080/tileset.json
python visualize_graph_3d.py --opacity 0.4
python visualize_graph_3d.py --output output/html/graph_3d.html

Queries every node and edge in the Neo4j graph, transforms coordinates into WGS84 (with geoid undulation so nodes align vertically with the 3D tile geometry), and generates a self-contained CesiumJS HTML file.

| Flag | Default | Effect |
|---|---|---|
| --tileset-url <url> | http://localhost:8080/tileset.json | URL of the served tileset |
| --opacity <0–1> | 0.5 | Initial 3D Tiles transparency |
| --output <path> | output/html/graph_3d_<timestamp>.html | Output path |

Node placement

Nodes are positioned in 3D space according to two strategies:

Spatial nodes - nodes that own or reference geometry are placed at the centroid of that geometry.

| Node type | Position source | Altitude |
|---|---|---|
| Building / CityObject | Precomputed center_x / center_y bbox centroid | ground_z + measured_height / 2 |
| BoundarySurface | Average of all exterior-ring vertices | Ring centroid z |
| GeometrySolid | Average of all surface-member ring vertices | Ring centroid z + 6 m |
| GeometryPolygon | Average of exterior-ring vertices | Ring centroid z + 3 m |
| GeometryRing | Average of pos_list vertices | Ring centroid z − 3 m |
| GeometryLineString | Average of pos_list vertices | Vertex centroid z |
| TerrainIntersection | Average of all connected GeometryLineString vertices | Vertex centroid z |

When multiple node types share the same centroid (e.g. a surface with a single polygon and a single ring), each is shifted by a small deterministic XY jitter (2 m radius, derived from a hash of the node's element ID) so they remain distinguishable.
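
A deterministic jitter of this kind can be derived from any stable hash. The sketch below is one possible construction, not the viewer's actual code: it assumes SHA-256 as the hash and places the node exactly on the 2 m circle, with the angle taken from the hash.

```python
import hashlib
import math


def xy_jitter(element_id, radius_m=2.0):
    """Return a (dx, dy) offset in metres that is stable for a given element ID."""
    digest = hashlib.sha256(element_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to an angle in [0, 2*pi).
    angle = int.from_bytes(digest[:8], "big") / 2**64 * 2.0 * math.pi
    return radius_m * math.cos(angle), radius_m * math.sin(angle)


dx, dy = xy_jitter("4:abc:17")  # the ID string here is made up for the example
```

Because the offset depends only on the ID, regenerating the viewer keeps every node in the same place.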

Thematic / semantic nodes - nodes with no own geometry are anchored to their related spatial nodes.

| Node type | Anchor | Altitude offset |
|---|---|---|
| BuildingFunction | Average centroid of connected Building nodes | Building rooftop + 8 m |
| RoofType | Average centroid of connected Building nodes | Building rooftop + 5 m |
| District | Average centroid of all buildings in the district | Building rooftops + 80 m |
| City | Average centroid of all buildings in the city | Building rooftops + 150 m |
| Dataset | Average centroid of all buildings in the dataset | Building rooftops + 220 m |
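
The anchoring rule can be sketched in a few lines. Two things here are assumptions rather than documented behaviour: the rooftop is taken as ground_z + measured_height, and when several buildings are connected the highest rooftop is used. anchor_position is a hypothetical name; the property names come from the spatial-node placement rules above.

```python
# Altitude offsets above the rooftop, in metres, per thematic node type.
OFFSETS_M = {
    "BuildingFunction": 8.0,
    "RoofType": 5.0,
    "District": 80.0,
    "City": 150.0,
    "Dataset": 220.0,
}


def anchor_position(node_type, buildings):
    """Place a thematic node above the average centroid of its buildings.

    Each building is a dict with center_x, center_y, ground_z, measured_height.
    """
    if not buildings:
        raise ValueError("a thematic node needs at least one connected building")
    n = len(buildings)
    x = sum(b["center_x"] for b in buildings) / n
    y = sum(b["center_y"] for b in buildings) / n
    # Assumption: use the highest rooftop so the node clears every building.
    rooftop = max(b["ground_z"] + b["measured_height"] for b in buildings)
    return x, y, rooftop + OFFSETS_M[node_type]


pos = anchor_position("RoofType", [
    {"center_x": 0.0, "center_y": 0.0, "ground_z": 2.0, "measured_height": 10.0},
    {"center_x": 10.0, "center_y": 20.0, "ground_z": 0.0, "measured_height": 20.0},
])
```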

Visual encoding

  • Node size scales with hierarchy: Dataset / City (13–14 px) → Building (14 px) → BoundarySurface (7 px) → GeometryRing (4 px).
  • Node colour is assigned per type; the left-panel legend shows all types and allows per-group visibility toggle.
  • Edges - all relationships in the graph (excluding internal R-tree edges) are drawn as 3D polylines. Toggle with the Edges button.
  • Depth test disabled on nodes - they are always visible and clickable even when occluded by 3D tile meshes.
  • Individual building highlight on hover / click via EXT_mesh_features per-vertex feature IDs embedded in the GLB.

Other features

  • 3D Tiles rendered semi-transparent (opacity slider) so graph elements show through the building models
  • Right panel: full property list for any clicked node or building
  • Node labels, edge labels, and edges each have an independent toggle button
  • Light / dark mode toggle - light mode uses a warm beige basemap (CartoDB Positron, semi-transparent over a sandy globe base); dark mode uses CartoDB Dark Matter
  • Camera positioned at the data extent on load; flies to the tileset once it finishes loading

4d · Scalable graph point-cloud tiles

python build_graph_pointcloud.py
python build_graph_pointcloud.py --building-tileset-url http://localhost:8080/tileset.json
python build_graph_pointcloud.py --max-per-tile 500 --geo-error-scale 20
python build_graph_pointcloud.py --output output/graph_tiles

Exports every node and edge in the Neo4j graph as an OGC 3D Tiles 1.0 point-cloud (pnts) quadtree for browser visualisation that scales to large graphs: where visualize_graph_3d.py embeds the whole graph in one HTML file, this script streams tiles on demand. Nodes are placed using the same spatial / aggregate strategy as section 4c and projected to WGS84 (with geoid undulation); the most important nodes (Dataset → City → Building → … → GeometryRing) stay in coarser tiles so they appear first as you zoom in. Edges are loaded from edges.json and drawn only when the camera is below 2 km altitude.

| Flag | Default | Effect |
|---|---|---|
| --output <path> | output/graph_tiles | Output directory |
| --building-tileset-url <url> | http://localhost:8080/tileset.json | Optional building 3D Tiles overlay shown alongside the graph |
| --max-per-tile <N> | 500 | Max nodes per quadtree tile before it splits |
| --max-depth <N> | 8 | Max quadtree depth |
| --geo-error-scale <N> | 20 | geometricError = tile_diagonal_m / scale |
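
How the three tuning flags interact is easiest to see in a toy quadtree. The sketch below is a simplified illustration, not the code in build_graph_pointcloud.py: it works on planar (x, y) points in metres, assumes the input list is already sorted by importance, and keeps the first max_per_tile points in the coarser tile while pushing the rest into child tiles.

```python
import math


def build_quadtree(points, bounds, max_per_tile=500, max_depth=8,
                   geo_error_scale=20.0, depth=0):
    """Recursively split `points` (sorted by importance) into a quadtree.

    bounds is (min_x, min_y, max_x, max_y) in metres.
    """
    min_x, min_y, max_x, max_y = bounds
    diagonal = math.hypot(max_x - min_x, max_y - min_y)
    tile = {
        "bounds": bounds,
        "geometricError": diagonal / geo_error_scale,
        "points": points,
        "children": [],
    }
    if len(points) <= max_per_tile or depth >= max_depth:
        return tile  # leaf: small enough, or depth limit reached

    # Most important points stay here; the remainder moves down one level.
    tile["points"] = points[:max_per_tile]
    rest = points[max_per_tile:]
    mid_x, mid_y = (min_x + max_x) / 2.0, (min_y + max_y) / 2.0
    buckets = [[], [], [], []]
    for p in rest:
        buckets[(p[0] >= mid_x) + 2 * (p[1] >= mid_y)].append(p)
    child_bounds = [
        (min_x, min_y, mid_x, mid_y),
        (mid_x, min_y, max_x, mid_y),
        (min_x, mid_y, mid_x, max_y),
        (mid_x, mid_y, max_x, max_y),
    ]
    for pts, cb in zip(buckets, child_bounds):
        if pts:
            tile["children"].append(
                build_quadtree(pts, cb, max_per_tile, max_depth,
                               geo_error_scale, depth + 1))
    return tile


root = build_quadtree([(i, i) for i in range(10)], (0, 0, 10, 10), max_per_tile=4)
```

A smaller --geo-error-scale yields a larger geometricError per tile, which makes a viewer refine to child tiles sooner.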

Output:

output/graph_tiles/
├── tileset.json     # 3D Tiles quadtree manifest
├── nodes.json       # node properties for the property panel
├── edges.json       # graph edges (drawn when the camera is close)
├── viewer.html      # CesiumJS viewer
├── serve.py         # CORS-aware HTTP server (port 8081)
└── tiles/           # *.pnts tile files

To view in a browser (run each server in its own terminal, starting from the repository root):

cd output/graph_tiles && python serve.py        # graph tiles at :8081
cd output/3dtiles && python serve.py             # buildings at :8080 (optional overlay)
# then open http://localhost:8081/viewer.html

5 · Natural-language queries

python db-llm/query_ollama_full.py
python db-llm/query_ollama_full.py --model qwen2.5-coder:7b

Interactive CLI that translates plain-English questions into Cypher via GraphCypherQAChain. Neo4j must be running. The query layer is model-agnostic:

  • Local (Ollama) - default; runs entirely on-premise, keeping data private. Requires Ollama at http://localhost:11434. Best for routine queries.
  • Commercial (Claude) - swap the LLM object (ChatOllama → ChatAnthropic) for the most demanding spatial/semantic reasoning. Requires ANTHROPIC_API_KEY. Note: the schema, question, and returned rows leave the local machine - prefer the local backend for privacy-sensitive data.

Minimal-context variant (small models on a laptop)

python db-llm/query_ollama_min.py --model qwen2.5-coder:7b
python db-llm/query_ollama_min.py --ctx 16384 --summarize

query_ollama_min.py is a lean alternative tuned for small open models on a laptop with no discrete GPU (e.g. Apple Silicon). Instead of injecting all of GRAPH.md + CYPHER.md (~24k tokens) on every call, it embeds a hand-distilled ~850-token schema + Cypher cheat-sheet (MINIMAL_CONTEXT) directly in the script, so the context window (--ctx, default 8192) stays small and the KV cache fits in RAM: no swapping, and much faster responses. It makes one LLM call per question (add --summarize for a plain-English answer), and strips <think>…</think> blocks and ``` fences before running the query. Prefer a tool/code model such as qwen2.5-coder:7b; deepseek-r1 adds <think> overhead and has weaker accuracy on current Cypher.
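
The clean-up step can be approximated with two regular expressions. This is a sketch of the idea, not the code in query_ollama_min.py; extract_cypher is a hypothetical name, and the fence marker is built at runtime only so that the example can itself be shown inside a fenced block.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime (see note above)

# Reasoning blocks emitted by "thinking" models.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
# A fenced block, with an optional language tag on the opening line.
FENCE_RE = re.compile(FENCE + r"(?:[A-Za-z]+[ \t]*\n)?(.*?)" + FENCE, re.DOTALL)


def extract_cypher(raw):
    """Drop <think> blocks, unwrap the first fenced block if any, and trim."""
    text = THINK_RE.sub("", raw)
    fenced = FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1)
    return text.strip()


raw = ("<think>count the buildings</think>\n"
       + FENCE + "cypher\nMATCH (b:Building) RETURN count(b)\n" + FENCE)
cleaned = extract_cypher(raw)
```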


Graph model

CityGML 2.0 modules

CityGML 2.0 defines 14 modules. 10 are ingested; the remaining 4 are not stored as graph nodes.

| Module | Neo4j label(s) | Geometry style | Supported |
|---|---|---|---|
| Building | Building | Solid + BoundarySurfaces | yes |
| Transportation | Road, Railway, Track, Square, TransportationComplex | lod*MultiSurface | yes |
| Vegetation | SolitaryVegetationObject, PlantCover | lod*Geometry / ImplicitGeometry | yes |
| WaterBody | WaterBody | Solid or lod*MultiSurface + BoundarySurfaces | yes |
| LandUse | LandUse | lod*MultiSurface | yes |
| CityFurniture | CityFurniture | lod*Geometry / ImplicitGeometry | yes |
| Relief | ReliefFeature | TIN / MassPoint / Breakline | yes |
| Bridge | Bridge | Solid or lod*MultiSurface + BoundarySurfaces | yes |
| Tunnel | Tunnel | Solid or lod*MultiSurface + BoundarySurfaces | yes |
| CityObjectGroup | CityObjectGroup | lod*Geometry | yes |
| Appearance | - | - | - |
| Generics | - | gen:*Attribute values stored on feature nodes | - |
| TexturedSurface | - | deprecated in CityGML 2.0 | - |
| Core | - | abstract base types, not stored directly | - |

All concrete feature types (Building + all CityObject subtypes) that carry a gml:boundedBy envelope are registered in the Neo4j Spatial R-tree (layer features) after ingestion, enabling spatial.bbox, spatial.intersects, and spatial.withinDistance queries across the entire dataset.

Schema

All non-Building feature nodes carry two labels: the module-specific label (e.g. Bridge) and the shared base label CityObject. Query either way:

MATCH (n:Bridge) ...                          // requires APOC at ingest time
MATCH (n:CityObject {feature_type: 'Bridge'}) // always works

Most nodes also carry an extra Entity label backed by an Entity(id) index. It is internal plumbing: it lets the ingester's generic edge-creation lookups (MATCH (a:Entity {id: …}), where the endpoint label isn't known at write time) hit an index instead of scanning all nodes. Ignore Entity in your own queries; the high-cardinality leaf nodes (GeometryRing, GeometryLineString, TerrainIntersection, Dataset) don't carry it.

── Spatial context ──────────────────────────────────────────────────────────────
(Dataset)                  -[:COVERS]->                         (District)
(District)                 -[:PART_OF]->                        (City)

── Building module ──────────────────────────────────────────────────────────────
(Building)                 -[:PART_OF]->                        (Dataset)
(Building)                 -[:LOCATED_IN]->                     (District)
(Building)                 -[:HAS_FUNCTION]->                   (BuildingFunction)
(Building)                 -[:HAS_ROOF_TYPE]->                  (RoofType)
(Building)                 -[:HAS_LOD_SOLID]->                  (GeometrySolid)
(Building)                 -[:HAS_BOUNDARY {order}]->           (BoundarySurface)
(Building)                 -[:HAS_TERRAIN_INTERSECTION]->       (TerrainIntersection)
(Building)                 -[:NEARBY_BUILDING {distance_m}]->   (Building)

── Transportation module ────────────────────────────────────────────────────────
(Road)                     -[:PART_OF]->                        (Dataset)
(Railway)                  -[:PART_OF]->                        (Dataset)
(Track)                    -[:PART_OF]->                        (Dataset)
(Square)                   -[:PART_OF]->                        (Dataset)
(TransportationComplex)    -[:PART_OF]->                        (Dataset)
(Road|Railway|Track|Square|TransportationComplex) -[:HAS_LOD_SURFACE {lod}]-> (GeometryMultiSurface)

── Vegetation module ────────────────────────────────────────────────────────────
(SolitaryVegetationObject) -[:PART_OF]->                        (Dataset)
(PlantCover)               -[:PART_OF]->                        (Dataset)
(SolitaryVegetationObject|PlantCover) -[:HAS_LOD_SURFACE {lod}]-> (GeometryMultiSurface)

── WaterBody module ─────────────────────────────────────────────────────────────
(WaterBody)                -[:PART_OF]->                        (Dataset)
(WaterBody)                -[:HAS_LOD_SOLID]->                  (GeometrySolid)
(WaterBody)                -[:HAS_BOUNDARY {order}]->           (BoundarySurface)

── LandUse module ───────────────────────────────────────────────────────────────
(LandUse)                  -[:PART_OF]->                        (Dataset)
(LandUse)                  -[:HAS_LOD_SURFACE {lod}]->          (GeometryMultiSurface)

── CityFurniture module ─────────────────────────────────────────────────────────
(CityFurniture)            -[:PART_OF]->                        (Dataset)
(CityFurniture)            -[:HAS_LOD_SURFACE {lod}]->          (GeometryMultiSurface)

── Relief module ────────────────────────────────────────────────────────────────
(ReliefFeature)            -[:PART_OF]->                        (Dataset)
(ReliefFeature)            -[:HAS_LOD_SURFACE {lod}]->          (GeometryMultiSurface)

── Bridge module ────────────────────────────────────────────────────────────────
(Bridge)                   -[:PART_OF]->                        (Dataset)
(Bridge)                   -[:HAS_LOD_SOLID]->                  (GeometrySolid)
(Bridge)                   -[:HAS_BOUNDARY {order}]->           (BoundarySurface)

── Tunnel module ────────────────────────────────────────────────────────────────
(Tunnel)                   -[:PART_OF]->                        (Dataset)
(Tunnel)                   -[:HAS_LOD_SOLID]->                  (GeometrySolid)
(Tunnel)                   -[:HAS_BOUNDARY {order}]->           (BoundarySurface)

── CityObjectGroup module ───────────────────────────────────────────────────────
(CityObjectGroup)          -[:PART_OF]->                        (Dataset)
(CityObjectGroup)          -[:HAS_LOD_SURFACE {lod}]->          (GeometryMultiSurface)

── Geometry (shared by all feature types) ───────────────────────────────────────
(GeometrySolid)            -[:HAS_SURFACE_MEMBER {order}]->     (GeometryPolygon)
(GeometryMultiSurface)     -[:HAS_SURFACE_MEMBER {order}]->     (GeometryPolygon)
(BoundarySurface)          -[:HAS_POLYGON {order}]->            (GeometryPolygon)
(GeometryPolygon)          -[:HAS_EXTERIOR_RING]->              (GeometryRing)
(GeometryPolygon)          -[:HAS_INTERIOR_RING {ring_index}]-> (GeometryRing)
(TerrainIntersection)      -[:HAS_LINE {order}]->               (GeometryLineString)

The graph aims to be lossless: gen:*Attribute values, gml:id values, and coordinate strings are stored verbatim so the source CityGML can be reconstructed. A census gate verifies this (the Appearance module is excluded by design).
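
A census comparison of this kind can be sketched with the standard library. The version below is a simplified illustration, not the gate shipped in eval/: the function names are hypothetical, and it only counts elements, attributes, non-empty text nodes, and whitespace-separated tokens in gml:pos / gml:posList elements.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def census(xml_text):
    """Count elements, attributes, non-empty text nodes, and coordinate tokens."""
    counts = Counter()
    for elem in ET.fromstring(xml_text).iter():
        counts["elements"] += 1
        counts["attributes"] += len(elem.attrib)
        text = (elem.text or "").strip()
        if text:
            counts["text_nodes"] += 1
            # Namespaced tags look like "{uri}posList" in ElementTree.
            if elem.tag.endswith("}posList") or elem.tag.endswith("}pos"):
                counts["coordinate_tokens"] += len(text.split())
    return counts


def census_matches(source_xml, exported_xml):
    """True when both documents produce the same census."""
    return census(source_xml) == census(exported_xml)


SOURCE = ('<root xmlns:gml="http://www.opengis.net/gml">'
          '<gml:posList srsDimension="3">0 0 0 1 0 0 1 1 0</gml:posList></root>')
```

Comparing the census of the input file with that of the exported file flags dropped elements or truncated coordinate lists without needing a byte-for-byte diff.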


Neo4j Browser

Open http://localhost:7474 to explore the graph visually.

// Overview of all node types and counts
MATCH (n) RETURN labels(n)[0] AS label, count(n) AS count ORDER BY count DESC

// All buildings with function and height
MATCH (b:Building)-[:HAS_FUNCTION]->(f:BuildingFunction)
RETURN b.id, b.measured_height, f.name ORDER BY b.measured_height DESC

// All city objects by module in a dataset
MATCH (n:CityObject)-[:PART_OF]->(d:Dataset)
RETURN d.name AS dataset, n.feature_type AS type, count(n) AS count
ORDER BY dataset, count DESC

// Full geometry of one building
MATCH (b:Building {id: "<building-gml-id>"})
      -[:HAS_BOUNDARY]->(s:BoundarySurface)
      -[:HAS_POLYGON]->(p:GeometryPolygon)
      -[:HAS_EXTERIOR_RING]->(r:GeometryRing)
RETURN s.surface_type, p.id, r.pos_list
ORDER BY s.surface_type

// Buildings with flat roofs (ALKIS roofType code 1000)
MATCH (b:Building)-[:HAS_ROOF_TYPE]->(rt:RoofType {code: "1000"})
RETURN b.id, b.measured_height ORDER BY b.measured_height DESC

// Spatial R-tree: all features within a bounding rectangle (dataset CRS coordinates)
CALL spatial.bbox('features',
     {x: <min_x>, y: <min_y>},
     {x: <max_x>, y: <max_y>}) YIELD node
RETURN node.id, labels(node) AS types,
       coalesce(node.feature_type, 'Building') AS feature_type,
       node.measured_height AS height
ORDER BY height DESC

// Vegetation: trees with world-space position (implicit geometry)
MATCH (v:SolitaryVegetationObject)-[:HAS_LOD_SURFACE]->(ms:GeometryMultiSurface)
RETURN v.id, v.species, ms.reference_point AS position, ms.transformation_matrix AS transform

// NEARBY_BUILDING: neighbours of one building within 100 m
MATCH (a:Building {id: "<building-gml-id>"})-[r:NEARBY_BUILDING]-(b:Building)
RETURN b.id, r.distance_m ORDER BY r.distance_m

// GDS community detection results
MATCH (b:Building) WHERE b.community_id IS NOT NULL
RETURN b.community_id AS community, count(b) AS members,
       round(avg(b.measured_height) * 10) / 10 AS avg_height_m
ORDER BY members DESC

// GenAI semantic search (requires NEO4J_GENAI_KEY and populated embeddings)
// genai.vector.encode is a function, so it is used in an expression rather than via CALL … YIELD
WITH genai.vector.encode('<query text>', 'OpenAI', {token: $api_key}) AS qv
CALL db.index.vector.queryNodes('building_embeddings', 10, qv) YIELD node AS b, score
RETURN b.id, b.description, score ORDER BY score DESC

Benchmarks

pykci was compared with 3DCityKG (Neo4j) and 3DCityDB v5 (PostGIS) on the 6.93 GB Hamburg LoD2 dataset, under the same 8-core / 20 GB Docker envelope (wall-clock time averaged over 3 runs).

| Metric | pykci | 3DCityKG | 3DCityDB v5 |
|---|---|---|---|
| Graph size | 15.2 M nodes / 21.3 M edges | ~5.9× nodes / ~4.4× edges | relational (not a graph) |
| Spatial index | R-tree persisted | R-tree not persisted at this scale | PostGIS spatial index |
| Mapping time | slower than 3DCityDB | - | fastest of the three |

In this run, pykci produced the smaller of the two graphs and retained its R-tree, while 3DCityDB was the fastest to load. These numbers are from a single hardware and dataset configuration; treat them as indicative rather than definitive. Full methodology, raw numbers, and reproduction steps are in eval/tool_comparison/comparison_2026-06-19.md; reproduce with run_benchmark.sh.


File layout

pykci/
├── input/
│   └── citygml/          ← place *.gml files here (read-only)
├── output/
│   ├── citygml/          ← exported GML files
│   ├── 3dtiles/          ← tileset.json, tiles/city.glb, serve.py
│   ├── graph_tiles/      ← graph point-cloud tileset + viewer.html, serve.py
│   ├── html/             ← standalone HTML exports (maps, viewers, reports)
│   └── stats/            ← census JSON + round-trip summaries
├── db-llm/               ← LangChain + Ollama query interface
│   ├── query_ollama_full.py  ← full-context NL→Cypher (GraphCypherQAChain)
│   └── query_ollama_min.py   ← minimal-context NL→Cypher for small/laptop models
├── docker/               ← Docker Compose for Neo4j + Ollama + query container
│   └── Dockerfile.ingest ← containerized pykci ingest (used by the benchmark)
├── eval/                 ← QA harness: round-trip fidelity + GML inventory
│   └── tool_comparison/  ← pykci vs 3DCityKG vs 3DCityDB benchmark (report + run_benchmark.sh)
├── resources/            ← project logo and brand assets
├── ingest_citygml.py     ← CityGML → Neo4j
├── export_citygml.py     ← Neo4j → CityGML
├── build_3dtiles.py      ← CityGML → 3D Tiles
├── build_graph_pointcloud.py  ← Neo4j graph → 3D Tiles point-cloud
├── GRAPH.md              ← Full graph schema, Cypher rules, ALKIS codes
├── CYPHER.md             ← Cypher 25 + Neo4j Spatial reference
└── README.md

Reference docs

| File | Audience | Contents |
|---|---|---|
| GRAPH.md | Agents / developers | Full graph schema, ingestion rules, ALKIS codes, Cypher examples |
| CYPHER.md | Agents / LLM (injected) | Cypher 25 + Neo4j Spatial plugin reference |
| eval/tool_comparison/comparison_2026-06-19.md | Developers / reviewers | Mapping-time & database benchmark: pykci vs 3DCityKG vs 3DCityDB v5. Reproduce with run_benchmark.sh |

License

Released under the MIT License. Sample CityGML datasets, CityGML codelists/XSDs, and the bundled Neo4j plugin JARs are the property of their respective authors and remain under their own licenses.


pykci · Python Knowledge Graph for Cities · CityGML → Neo4j → 3D & natural language
