diff --git a/docs/plans/2026-02-16-rust-port-analysis.md b/docs/plans/2026-02-16-rust-port-analysis.md
new file mode 100644
index 0000000000..e611de22cc
--- /dev/null
+++ b/docs/plans/2026-02-16-rust-port-analysis.md
@@ -0,0 +1,431 @@
+# Rust Port Analysis: onadata Performance Hotspots
+
+**Date**: 2026-02-16
+**Branch**: `rusty`
+**Status**: Analysis / Proposal
+
+## Executive Summary
+
+onadata is a Django-based ODK (Open Data Kit) data collection platform. After thorough
+analysis, we identified **5 high-impact areas** where porting CPU/memory-intensive Python
+code to Rust (via PyO3 native extensions) would yield significant performance gains.
+
+The biggest wins come from **export generation**, **XML parsing/submission processing**,
+and **data transformation pipelines** -- all of which involve tight loops over large
+datasets with recursive data structures, string manipulation, and format conversion.
+
+---
+
+## Architecture Overview
+
+```
+                    ┌─────────────────────────────────────────────┐
+                    │              Django REST API                 │
+                    └──────────┬──────────────┬───────────────────┘
+                               │              │
+                    ┌──────────▼──┐    ┌──────▼──────────────┐
+                    │  Submission  │    │   Export / Query     │
+                    │  Pipeline    │    │   Pipeline           │
+                    └──────┬──────┘    └──────┬───────────────┘
+                           │                  │
+                    ┌──────▼──────┐    ┌──────▼───────────────┐
+                    │ XML Parse → │    │ DB Query → Transform  │
+                    │ JSON → Save │    │ → Format → File I/O   │
+                    └─────────────┘    └──────────────────────┘
+                           │                  │
+                    ┌──────▼──────────────────▼───────────────┐
+                    │         Celery Async Task Queue          │
+                    └──────────────────────────────────────────┘
+```
+
+---
+
+## 1. EXPORT GENERATION (Priority: CRITICAL)
+
+**Files**: `export_builder.py`, `csv_builder.py`, `export_tools.py`
+**Impact**: Largest single performance bottleneck
+
+### What it does
+
+Converts form submissions (100k+ rows) into XLSX, CSV, SAV (SPSS), KML, and GeoJSON formats.
+Each export runs as a Celery task and involves:
+
+1. Querying all submissions from PostgreSQL
+2. Flattening nested repeat groups into tabular format
+3. Processing select-multiple fields (splitting into binary columns)
+4. Type conversion, GPS parsing, label lookups
+5. Writing to output format (openpyxl for XLSX, csv module for CSV, etc.)
+
+### Why it's slow in Python
+
+| Bottleneck | Location | Complexity | Description |
+|---|---|---|---|
+| `dict_to_joined_export()` | export_builder.py:112-180 | O(r * f * d) per row | Recursive dict creation for every submission. Creates intermediate dicts at each nesting level. |
+| `split_select_multiples()` | export_builder.py:746-796 | O(s * c) per row | Dict comprehension per select-multiple field. 50 fields * 100 choices = 5000 dict updates/row. |
+| `pre_process_row()` | export_builder.py:835-909 | O(v) per row | Regex compiled per string value per row. Dynamic value replacement with `re.findall()` on every cell. |
+| CSV column discovery | csv_builder.py:803-818 | O(N), two full passes | Iterates ALL data twice: once to discover repeat columns, once to write. |
+| Nested repeat writes | export_builder.py:1137-1143 | O(r * n * d) | For nested repeats: 100k submissions * 100 repeats * 50 sub-repeats = 500M write operations. |
+
+### Rust opportunity
+
+A Rust export engine exposed via PyO3 could:
+
+- **Stream-process rows** without intermediate dict allocation (zero-copy where possible)
+- **Pre-compile regex** once, reuse across all rows
+- **Flatten nested structures** iteratively with stack-based traversal instead of Python recursion
+- **Write output formats** directly using Rust crates (`rust_xlsxwriter` for XLSX, the `csv` crate for CSV; `calamine` only reads spreadsheets, so it does not apply to export)
+- **Parallelize section writes** across threads (Python's GIL prevents this)
+
+**Estimated speedup**: 10-50x for large exports (100k+ rows with repeat groups)
+
+### Proposed Rust module: `onadata_export`
+
+```
+onadata_export/
+├── src/
+│   ├── lib.rs              # PyO3 module entry
+│   ├── flatten.rs          # dict_to_joined_export replacement
+│   ├── select_multiples.rs # split_select_multiples replacement
+│   ├── preprocess.rs       # pre_process_row with compiled regex
+│   ├── writers/
+│   │   ├── xlsx.rs         # XLSX writer (rust_xlsxwriter)
+│   │   ├── csv.rs          # CSV writer
+│   │   └── sav.rs          # SAV/SPSS writer
+│   └── schema.rs           # Form schema representation
+└── Cargo.toml
+```
+
+---
+
+## 2. XML PARSING & SUBMISSION PROCESSING (Priority: HIGH)
+
+**Files**: `xform_instance_parser.py`, `instance.py`, `logger_tools.py`
+**Impact**: Every single submission goes through this path
+
+### What it does
+
+When a form submission arrives (XML from a mobile device):
+
+1. Read entire XML into memory
+2. Parse with minidom (full DOM tree, 2-3x memory of raw XML)
+3. Recursively convert DOM to Python dict (`_xml_node_to_dict`)
+4. Flatten nested dict into key-value pairs
+5. Extract geolocation, media references, UUIDs
+6. Convert numeric strings to numbers (recursive traversal)
+7. Compute SHA256 hash
+8. Save JSON representation
+
+### Why it's slow in Python
+
+| Bottleneck | Location | Issue |
+|---|---|---|
+| `clean_and_parse_xml()` | xform_instance_parser.py:174-183 | Regex on full XML + minidom DOM tree (2-3x memory) |
+| `_xml_node_to_dict()` | xform_instance_parser.py:187-240 | Recursive DOM traversal with `xpath_from_xml_node()` called per node (walks parent chain each time) |
+| `_flatten_dict_nest_repeats()` | xform_instance_parser.py:243-273 | Recursive generator with `list(new_prefix)` copy on every iteration |
+| `numeric_converter()` | instance.py:398-414 | Recursive dict traversal with try/except int/float per value |
+| `get_values_matching_key()` | dict_tools.py:8-33 | Full recursive document search for geolocation/media extraction |
+| **XML parsed 6+ times** | Multiple locations | `get_dict()` called separately by `save()`, `_set_geom()`, `get_expected_media()`, `get_full_dict()` |
+
+### Rust opportunity
+
+A Rust XML processor could:
+
+- **Parse XML once** with `quick-xml` (streaming pull parser, no DOM tree) and extract all needed data in a single pass
+- **Build the flat dict, geolocation, media list, UUID, and numeric conversions** all in one traversal
+- **Avoid recursive Python calls** -- use iterative stack-based traversal
+- **Return a Python dict** via PyO3 with all data ready, so the XML is parsed once instead of 6+ times
+- **Compute SHA256** in the same native call (`hashlib` is already C-backed, so expect a modest gain here rather than an order of magnitude)
+
+**Estimated speedup**: 5-20x per submission (more for large submissions with many repeat groups)
+
+### Proposed Rust module: `onadata_xml`
+
+```
+onadata_xml/
+├── src/
+│   ├── lib.rs           # PyO3 module entry
+│   ├── parser.rs        # Single-pass XML → structured data
+│   ├── flatten.rs       # Iterative flattening (replaces recursive Python)
+│   ├── numeric.rs       # Fast numeric conversion
+│   ├── geom.rs          # Geolocation extraction
+│   └── media.rs         # Media reference extraction
+└── Cargo.toml
+```
+
+---
+
+## 3. DATA AGGREGATION & CHART BUILDING (Priority: MEDIUM-HIGH)
+
+**Files**: `query.py`, `chart_tools.py`, `parsed_instance.py`
+**Impact**: Every chart render and data view query
+
+### What it does
+
+Aggregates submission data for charts/dashboards:
+
+1. Execute raw PostgreSQL queries with JSON operators
+2. Fetch results into Python dicts
+3. Group, sort, and label-map results in Python
+4. Build chart-ready data structures
+
+### Why it's slow in Python
+
+| Bottleneck | Location | Issue |
+|---|---|---|
+| `_flatten_multiple_dict_into_one()` | chart_tools.py:151-170 | **O(N^2)** nested loop: iterates results * unique values to group data |
+| `_use_labels_from_field_name()` | chart_tools.py:173-197 | Double iteration over data (once for labels, once for key rename) |
+| `_use_labels_from_group_by_name()` | chart_tools.py:212-219 | Nested loop: items * sub-items for label replacement |
+| Post-query sorting | chart_tools.py:329-341 | Python re-sort + timezone regex on every row |
+| `_dictfetchall()` | query.py:18-22 | All rows materialized as dicts in memory |
+| `get_field_records()` | query.py:244-247 | Python float conversion instead of SQL CAST |
+| JSON parsing per row | parsed_instance.py:136-142 | `json.loads()` called per row in result iterator |
+
+### Rust opportunity
+
+- **Replace O(N^2) grouping** with HashMap-based O(N) grouping
+- **Batch label lookups** with pre-built HashMap instead of linear scan
+- **Parse JSON in bulk** using `serde_json` (much faster than Python's `json` module)
+- **Handle timezone conversion** with compiled regex + chrono crate
+
+**Estimated speedup**: 3-10x for aggregation queries on large datasets
+
+### Proposed Rust module: `onadata_agg`
+
+```
+onadata_agg/
+├── src/
+│   ├── lib.rs         # PyO3 module entry
+│   ├── grouping.rs    # HashMap-based grouping (replaces O(N²) loop)
+│   ├── labels.rs      # Pre-indexed label lookups
+│   ├── json_parse.rs  # Bulk JSON parsing
+│   └── datetime.rs    # Timezone handling
+└── Cargo.toml
+```
+
+---
+
+## 4. ENCRYPTION / DECRYPTION (Priority: MEDIUM)
+
+**Files**: `libs/kms/tools.py`, `logger/tasks.py`
+**Impact**: Every encrypted submission
+
+### What it does
+
+For encrypted form submissions:
+
+1. Load all encrypted attachments into memory (`BytesIO(file.read())`)
+2. Call external KMS for key material (network-bound)
+3. Decrypt submission XML and media files
+4. Compute SHA256 of decrypted content
+5. Save decrypted attachments individually
+
+### Why it's slow in Python
+
+| Bottleneck | Location | Issue |
+|---|---|---|
+| Attachment loading | tools.py:487-491 | All attachments loaded into memory simultaneously |
+| SHA256 hashing | tools.py:560 | Python hashlib for potentially large files |
+| Per-attachment DB writes | tools.py:570 | Individual `instance.attachments.create()` per file, no `bulk_create()` |
+
+### Rust opportunity
+
+- **Stream-decrypt** attachments without loading all into memory
+- **Native SHA256** via `ring` or `sha2` crate (2-5x faster for large files)
+- **Prepare bulk insert data** for batch DB writes
+- Note: The KMS network call is the dominant bottleneck here and Rust won't help with that
+
+**Estimated speedup**: 2-5x for the crypto/hashing portions (network I/O dominates overall)
+
+### Proposed Rust module: `onadata_crypto`
+
+```
+onadata_crypto/
+├── src/
+│   ├── lib.rs         # PyO3 module entry
+│   ├── decrypt.rs     # Streaming decryption
+│   └── hash.rs        # Fast SHA256
+└── Cargo.toml
+```
+
+---
+
+## 5. BULK CSV IMPORT (Priority: MEDIUM)
+
+**Files**: `csv_import.py`, `entities_utils.py`
+**Impact**: Large CSV uploads (100k+ rows)
+
+### What it does
+
+Imports CSV data as form submissions or entity updates:
+
+1. Count total rows (full file scan)
+2. Parse each row, validate types
+3. Transform flat CSV dict to nested dict
+4. Generate XML submission per row
+5. Process through full submission pipeline
+
+### Why it's slow in Python
+
+| Bottleneck | Location | Issue |
+|---|---|---|
+| Upfront row count | csv_import.py:341 | `sum(1 for row in csv_file)` scans entire file before processing |
+| Per-row dict transformation | csv_import.py:424-432 | 3 nested function calls: `csv_dict_to_nested_dict()`, `flatten_split_select_multiples()`, `dict_merge()` |
+| Per-row XML generation | csv_import.py:462 | `dict2xmlsubmission()` string manipulation per row |
+| Per-row entity persistence | entities_utils.py:355 | Individual `serializer.save()`, no `bulk_create()` |
+
+### Rust opportunity
+
+- **Single-pass CSV parsing** with row count + processing combined (using `csv` crate)
+- **Batch dict-to-XML conversion** with pre-compiled templates
+- **Prepare bulk inserts** instead of per-row saves
+- **Validate types** at parse time using Rust's type system
+
+**Estimated speedup**: 3-8x for CSV parsing and transformation (DB writes still dominate)
+
+---
+
+## Prioritized Implementation Roadmap
+
+### Phase 1: Export Engine (Highest ROI)
+```
+Effort:   ████████░░  (8/10)
+Impact:   ██████████  (10/10)
+Speedup:  10-50x for large exports
+```
+- Replace `ExportBuilder` core with Rust
+- Stream-process rows, write XLSX/CSV directly
+- Eliminate intermediate dict allocations
+- Parallelize section writes across threads
+
+### Phase 2: XML Submission Parser (High ROI)
+```
+Effort:   ██████░░░░  (6/10)
+Impact:   ████████░░  (8/10)
+Speedup:  5-20x per submission
+```
+- Single-pass XML parser replacing 6+ redundant parses
+- Returns complete Python dict with all extracted data
+- Eliminates recursive traversals
+
+### Phase 3: Aggregation Engine (Medium ROI)
+```
+Effort:   ████░░░░░░  (4/10)
+Impact:   ██████░░░░  (6/10)
+Speedup:  3-10x for chart queries
+```
+- HashMap-based grouping replacing O(N^2) loops
+- Bulk JSON parsing
+- Pre-indexed label lookups
+
+### Phase 4: Crypto Helpers (Lower ROI)
+```
+Effort:   ███░░░░░░░  (3/10)
+Impact:   ████░░░░░░  (4/10)
+Speedup:  2-5x for hashing (network I/O dominates)
+```
+- Streaming decryption
+- Native SHA256
+
+### Phase 5: CSV Import Parser (Lower ROI)
+```
+Effort:   ████░░░░░░  (4/10)
+Impact:   ████░░░░░░  (4/10)
+Speedup:  3-8x for parsing (DB writes dominate)
+```
+- Combined count + parse pass
+- Batch transformation
+
+---
+
+## Integration Strategy: PyO3 Native Extensions
+
+### Why PyO3
+
+- Mature Rust-Python bridge with zero-copy where possible
+- Compiles to native `.so`/`.dylib` that imports like any Python module
+- Supports Python dicts, lists, strings natively
+- Can release GIL for true parallelism
+
+### Integration pattern
+
+```python
+# Before (Python)
+from onadata.libs.utils.export_builder import ExportBuilder
+
+builder = ExportBuilder()
+builder.set_survey(survey)
+builder.to_xlsx_export(path, data, username, xform)
+
+# After (Rust via PyO3, drop-in replacement)
+from onadata_export import RustExportBuilder
+
+builder = RustExportBuilder()
+builder.set_survey(survey)  # accepts Python survey object
+builder.to_xlsx_export(path, data, username, xform)
+```
+
+### Build integration
+
+```toml
+# pyproject.toml addition
+[build-system]
+requires = ["maturin>=1.0,<2.0"]
+
+[tool.maturin]
+features = ["pyo3/extension-module"]
+```
+
+### Rollout strategy
+
+1. Feature-flag each Rust module: `USE_RUST_EXPORTS=true`
+2. Run both Python and Rust paths in parallel, compare outputs
+3. Benchmark with production-scale data
+4. Gradually shift traffic to Rust path
+5. Remove Python implementation after validation
+
+---
+
+## Risk Assessment
+
+| Risk | Mitigation |
+|---|---|
+| Rust introduces subtle behavior differences | Parallel execution + output comparison in staging |
+| Build complexity (Rust toolchain in CI/CD) | maturin handles cross-compilation; pre-built wheels |
+| Team unfamiliar with Rust | Start with Phase 1 (export) as a learning project; well-defined interface |
+| PyO3 overhead for small operations | Only port hot paths; keep Django/ORM in Python |
+| Maintenance burden of two languages | Clear module boundaries; Rust modules are self-contained |
+
+---
+
+## Estimated Impact Summary
+
+| Component | Current (100k rows) | With Rust | Speedup |
+|---|---|---|---|
+| XLSX Export | ~45-90 min | ~2-5 min | 10-50x |
+| XML Submission Parse | ~15ms/submission | ~1-3ms | 5-20x |
+| Chart Aggregation | ~5-15s | ~1-3s | 3-10x |
+| Decryption (crypto only) | ~200ms/submission | ~50-100ms | 2-5x |
+| CSV Import (parse only) | ~8ms/row | ~1-2ms/row | 3-8x |
+
+**Note**: These are estimates based on typical Python-to-Rust speedups for similar workloads.
+Actual numbers depend on data shape, hardware, and I/O patterns. The DB and network I/O
+portions remain unchanged regardless of language.
+
+---
+
+## Conclusion
+
+The form processing and export pipelines are the prime candidates for Rust porting.
+The export engine (Phase 1) offers the highest ROI because:
+
+1. It's the most CPU/memory-intensive code path
+2. It processes the largest data volumes
+3. It has well-defined inputs/outputs (easy to wrap with PyO3)
+4. Python's GIL prevents parallelizing section writes
+5. The recursive dict manipulation and regex-per-row patterns are exactly where Rust excels
+
+Phase 2 (XML parsing) is the second priority because it affects every single submission
+and currently parses the same XML 6+ times due to lack of caching between method calls.
+
+The Django ORM, REST API, authentication, and routing should stay in Python -- Rust
+offers no meaningful advantage for I/O-bound web framework code.
diff --git a/docs/plans/2026-02-16-rust-xml-parser-design.md b/docs/plans/2026-02-16-rust-xml-parser-design.md
new file mode 100644
index 0000000000..cda52ef5f8
--- /dev/null
+++ b/docs/plans/2026-02-16-rust-xml-parser-design.md
@@ -0,0 +1,282 @@
+# Design: Rust XML Submission Parser (`onadata_xml`)
+
+**Date**: 2026-02-16
+**Branch**: `rusty`
+**Approach**: C -- Rust parser + Python cached wrapper (drop-in replacement)
+
+## Problem
+
+The `XFormInstanceParser` parses submission XML using Python's minidom (full DOM tree,
+2-3x memory of raw XML). The same XML is parsed 6+ times per submission because
+`get_dict()` is called independently by `save()`, `_set_geom()`, `get_expected_media()`,
+and `get_full_dict()`. Each parse involves recursive DOM traversal, xpath computation
+per node, recursive dict flattening, and recursive numeric conversion.
+
+## Solution
+
+A Rust native extension (`onadata_xml`) that parses XML in a single pass using
+`quick-xml` (streaming pull parser, no DOM), returning all extracted data at once. A Python wrapper
+class (`RustXFormInstanceParser`) provides the same interface as the existing parser,
+so callers don't change.
+
+## Rust Module: `onadata_xml`
+
+### Crate Structure
+
+```
+rust/onadata_xml/
+├── Cargo.toml
+├── pyproject.toml
+└── src/
+    ├── lib.rs          # PyO3 module entry, parse_submission()
+    ├── parser.rs       # Single-pass XML -> structured data
+    ├── flatten.rs      # Iterative dict flattening (stack-based)
+    ├── numeric.rs      # String -> int/float conversion
+    └── geom.rs         # Geopoint extraction from parsed data
+```
+
+### Core Function
+
+```rust
+#[pyfunction]
+fn parse_submission(
+    xml_str: &str,
+    repeat_xpaths: Vec<String>,
+    encrypted: bool,
+    numeric_fields: HashSet<String>,
+    geo_xpaths: Vec<String>,
+) -> PyResult<SubmissionResult> { ... }
+```
+
+### SubmissionResult Fields
+
+| Field | Type | Replaces |
+|---|---|---|
+| `dict` | `dict` | `_xml_node_to_dict()` output |
+| `flat_dict` | `dict` | `_flatten_dict_nest_repeats()` + `numeric_converter()` |
+| `attributes` | `dict[str, str]` | `_get_all_attributes()` + `_set_attributes()` |
+| `root_node_name` | `str` | `_root_node.nodeName` |
+| `uuid` | `Optional[str]` | `get_uuid_from_xml()` |
+| `deprecated_uuid` | `Optional[str]` | `get_deprecated_uuid_from_xml()` |
+| `submission_date` | `Optional[str]` | `get_submission_date_from_xml()` |
+| `geom_points` | `list[tuple[float, float]]` | `_set_geom()` point extraction |
+| `checksum` | `str` | `sha256(xml).hexdigest()` |
+
+### Rust Crates
+
+- `quick-xml` -- streaming pull parser (no DOM; estimated ~10x faster than minidom, not yet benchmarked)
+- `sha2` -- native SHA256
+- `pyo3` -- Python bindings
+
+### Parser Algorithm
+
+Single pass over XML using `quick_xml::Reader`. Maintains a stack of node names for
+xpath computation. As it encounters relevant nodes, it accumulates:
+
+1. The nested dict structure (handling repeats via the `repeat_xpaths` set)
+2. Attributes (skipping `entity` node attributes)
+3. UUID from `meta/instanceID` or root `instanceID` attribute
+4. Deprecated UUID from `meta/deprecatedID`
+5. Submission date from root `submissionDate` attribute
+6. Text values with numeric conversion applied inline
+
+After the parse pass, a second in-Rust step:
+
+1. Flattens the dict iteratively (stack-based, not recursive)
+2. Extracts geopoints from fields matching `geo_xpaths`
+3. Computes SHA256 checksum
+
+All returned to Python as a single `SubmissionResult` object.
+
+## Python Wrapper: `RustXFormInstanceParser`
+
+Lives in `onadata/apps/logger/xform_instance_parser.py`, same file as the original.
+
+```python
+class RustXFormInstanceParser:
+    def __init__(self, xml_str, data_dictionary):
+        self.data_dictionary = data_dictionary
+        repeat_xpaths = [
+            get_abbreviated_xpath(e.get_xpath())
+            for e in data_dictionary.get_survey_elements_of_type("repeat")
+        ]
+        numeric_fields = get_numeric_fields(data_dictionary)
+        geo_xpaths = data_dictionary.geopoint_xpaths()
+
+        from onadata_xml import parse_submission
+        self._result = parse_submission(
+            xml_str, repeat_xpaths, data_dictionary.encrypted,
+            numeric_fields, geo_xpaths,
+        )
+
+    def to_dict(self):
+        return self._result.dict
+
+    def to_flat_dict(self):
+        return self._result.flat_dict
+
+    def get_root_node(self):
+        return None  # DOM node not available; callers only use root_node_name
+
+    def get_root_node_name(self):
+        return self._result.root_node_name
+
+    def get_attributes(self):
+        return self._result.attributes
+
+    def get_xform_id_string(self):
+        return self._result.attributes["id"]
+
+    def get_version(self):
+        return self._result.attributes.get("version")
+
+    def get_flat_dict_with_attributes(self):
+        result = self.to_flat_dict().copy()
+        result[XFORM_ID_STRING] = self.get_xform_id_string()
+        version = self.get_version()
+        if version:
+            result[VERSION] = version
+        return result
+```
+
+## Integration Points
+
+### `Instance._set_parser()` (instance.py:516)
+
+```python
+def _set_parser(self):
+    if not hasattr(self, "_parser"):
+        if settings.USE_RUST_XML_PARSER:
+            self._parser = RustXFormInstanceParser(self.xml, self.xform)
+        else:
+            self._parser = XFormInstanceParser(self.xml, self.xform)
+```
+
+### `Instance._set_geom()` (instance.py:416)
+
+Reads from cached result instead of re-parsing:
+
+```python
+def _set_geom(self):
+    self._set_parser()
+    if settings.USE_RUST_XML_PARSER and hasattr(self._parser, '_result'):
+        points = [Point(lng, lat) for lat, lng in self._parser._result.geom_points]
+    else:
+        # existing code path
+        ...
+```
+
+### `Instance._set_uuid()` (instance.py:528)
+
+Reads from cached result:
+
+```python
+def _set_uuid(self):
+    if self.xml and not self.uuid:
+        if settings.USE_RUST_XML_PARSER and hasattr(self, '_parser'):
+            uuid = self._parser._result.uuid
+        else:
+            uuid = get_uuid_from_xml(self.xml)
+        if uuid is not None:
+            self.uuid = uuid
+    set_uuid(self)
+```
+
+### `create_instance()` in `logger_tools.py`
+
+The `sha256(xml).hexdigest()` call (line 637) can reuse the `checksum` already held on the
+parser result when the Rust parser is active, avoiding a redundant hash computation.
+
+## Feature Flag & Rollout
+
+### Settings
+
+```python
+# onadata/settings/common.py
+USE_RUST_XML_PARSER = False
+RUST_XML_PARSER_SHADOW_MODE = False
+```
+
+### Shadow Mode
+
+Run both parsers, compare outputs, log differences:
+
+```python
+def _set_parser(self):
+    if not hasattr(self, "_parser"):
+        self._parser = XFormInstanceParser(self.xml, self.xform)
+        if settings.RUST_XML_PARSER_SHADOW_MODE:
+            rust_parser = RustXFormInstanceParser(self.xml, self.xform)
+            _compare_parser_outputs(self._parser, rust_parser, self.pk)
+```
+
+### Rollout Sequence
+
+1. Shadow mode in staging -- validate output parity
+2. Feature flag on in production
+3. Remove Python parser + shadow mode after validation
+
+## Error Handling
+
+| Condition | Exception |
+|---|---|
+| Empty XML / no children | `InstanceEmptyError` |
+| Malformed XML | `InstanceParseError` |
+| No survey element | `ValueError` |
+| Missing `id` attribute | `KeyError` |
+
+The Rust module imports and raises the existing Python exception classes via PyO3.
+
+## Testing Strategy
+
+### Layer 1: Rust Unit Tests (`cargo test`)
+
+- Simple flat forms
+- Repeat groups (single and nested)
+- CDATA sections
+- Encrypted submissions with `<media>` nodes
+- Entity metadata (entity node attributes skipped)
+- Missing/empty nodes
+- Geopoint extraction (valid, malformed, multiple)
+- Numeric conversion edge cases (int, float, NaN, empty string)
+- UUID extraction from `<meta><instanceID>` and from root attribute
+- SHA256 checksum correctness
+
+### Layer 2: Python Integration Tests
+
+Run existing `XFormInstanceParser` test fixtures against `RustXFormInstanceParser`,
+assert identical outputs for `to_dict()`, `to_flat_dict()`,
+`get_flat_dict_with_attributes()`, `get_root_node_name()`, `get_attributes()`.
+
+### Layer 3: Shadow Mode
+
+Comparison logging in staging against real-world submissions.
+
+## Build & CI
+
+### pyproject.toml (rust/onadata_xml/)
+
+```toml
+[build-system]
+requires = ["maturin>=1.0,<2.0"]
+build-backend = "maturin"
+
+[project]
+name = "onadata-xml"
+requires-python = ">=3.10"
+
+[tool.maturin]
+features = ["pyo3/extension-module"]
+```
+
+### CI Additions
+
+- Install Rust toolchain (`rustup`) in CI
+- `cd rust/onadata_xml && maturin develop` before running Python tests
+- `cargo test` as separate CI step for Rust unit tests
+
+### Dev Workflow
+
+- `maturin develop` builds and installs into active virtualenv
+- Rust code changes require re-running `maturin develop`
+- Python wrapper changes reload normally
diff --git a/docs/plans/2026-02-16-rust-xml-parser-plan.md b/docs/plans/2026-02-16-rust-xml-parser-plan.md
new file mode 100644
index 0000000000..86f0479880
--- /dev/null
+++ b/docs/plans/2026-02-16-rust-xml-parser-plan.md
@@ -0,0 +1,989 @@
+# Rust XML Submission Parser Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Replace Python's minidom-based XML submission parser with a single-pass Rust native extension that eliminates 6+ redundant parses per submission.
+
+**Architecture:** A PyO3 Rust crate (`onadata_xml`) exposes a `parse_submission()` function. A Python wrapper class (`RustXFormInstanceParser`) provides an identical interface to the existing `XFormInstanceParser`. Feature-flagged via `USE_RUST_XML_PARSER` setting.
+
+**Tech Stack:** Rust, PyO3, maturin, quick-xml, sha2
+
+---
+
+### Task 1: Scaffold the Rust Crate
+
+**Files:**
+- Create: `rust/onadata_xml/Cargo.toml`
+- Create: `rust/onadata_xml/pyproject.toml`
+- Create: `rust/onadata_xml/src/lib.rs`
+
+**Step 1: Create directory structure**
+
+Run: `mkdir -p rust/onadata_xml/src`
+
+**Step 2: Create Cargo.toml**
+
+Create `rust/onadata_xml/Cargo.toml`:
+
+```toml
+[package]
+name = "onadata_xml"
+version = "0.1.0"
+edition = "2021"
+
+[lib]
+name = "onadata_xml"
+crate-type = ["cdylib"]
+
+[dependencies]
+pyo3 = "0.23"  # extension-module is enabled via [tool.maturin] features so that cargo test can link
+quick-xml = "0.37"
+sha2 = "0.10"
+```
+
+**Step 3: Create pyproject.toml**
+
+Create `rust/onadata_xml/pyproject.toml`:
+
+```toml
+[build-system]
+requires = ["maturin>=1.0,<2.0"]
+build-backend = "maturin"
+
+[project]
+name = "onadata-xml"
+requires-python = ">=3.9"
+
+[tool.maturin]
+features = ["pyo3/extension-module"]
+```
+
+**Step 4: Create minimal lib.rs**
+
+Create `rust/onadata_xml/src/lib.rs`:
+
+```rust
+use pyo3::prelude::*;
+
+#[pyfunction]
+fn parse_submission(xml_str: &str) -> PyResult<String> {
+    Ok(format!("received {} bytes", xml_str.len()))
+}
+
+#[pymodule]
+fn onadata_xml(m: &Bound<'_, PyModule>) -> PyResult<()> {
+    m.add_function(wrap_pyfunction!(parse_submission, m)?)?;
+    Ok(())
+}
+```
+
+**Step 5: Build and verify**
+
+Run: `cd rust/onadata_xml && maturin develop`
+Expected: Builds successfully, installs into virtualenv
+
+Run: `python -c "from onadata_xml import parse_submission; print(parse_submission('<test/>'))"`
+Expected: `received 7 bytes`
+
+**Step 6: Commit**
+
+```bash
+git add rust/
+git commit -m "feat: scaffold onadata_xml Rust crate with PyO3 + maturin"
+```
+
+---
+
+### Task 2: Implement the Core XML-to-Dict Parser in Rust
+
+**Files:**
+- Create: `rust/onadata_xml/src/parser.rs`
+- Modify: `rust/onadata_xml/src/lib.rs`
+
+**Step 1: Write Rust unit tests for XML-to-dict conversion**
+
+Create `rust/onadata_xml/src/parser.rs` containing the parser module and its tests. The parser
+must handle these cases (matching Python's `_xml_node_to_dict` behavior):
+
+- Leaf text nodes → `{"nodeName": "textValue"}`
+- Empty nodes → skipped (None)
+- CDATA sections → `{"parentNodeName": "cdataValue"}`
+- Repeat groups (xpaths in `repeat_xpaths`) → values collected into lists
+- Encrypted forms with `<media>` nodes → treated as repeats
+- Nested repeats → lists of dicts inside lists
+- Duplicate node names not in repeats → aggregated into lists
+
+Test cases from existing fixtures:
+
+```rust
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_simple_flat_form() {
+        let xml = r#"<tutorial id="tutorial"><name>Larry</name><age>23</age></tutorial>"#;
+        let result = xml_to_dict(xml, &[], false);
+        // result["tutorial"]["name"] == "Larry"
+        // result["tutorial"]["age"] == "23"
+    }
+
+    #[test]
+    fn test_repeat_nodes() {
+        // From repeated_nodes.xml fixture
+        let xml = r#"<RW id="R" version="1"><S2A><S2_1>1</S2_1></S2A><S2A><S2_2>2</S2_2></S2A></RW>"#;
+        let result = xml_to_dict(xml, &["S2A"], false);
+        // result["RW"]["S2A"] is a list of 2 dicts
+    }
+
+    #[test]
+    fn test_encrypted_media_nodes() {
+        let xml = r#"<data id="enc" encrypted="yes"><media><file>a.enc</file></media><media><file>b.enc</file></media></data>"#;
+        let result = xml_to_dict(xml, &[], true);
+        // result["data"]["media"] is a list of 2 dicts
+    }
+
+    #[test]
+    fn test_empty_nodes_skipped() {
+        let xml = r#"<form id="f"><note/><val>x</val></form>"#;
+        let result = xml_to_dict(xml, &[], false);
+        // result["form"] has only "val", no "note"
+    }
+}
+```
+
+**Step 2: Run Rust tests to verify they fail**
+
+Run: `cd rust/onadata_xml && cargo test`
+Expected: Compilation errors (functions don't exist yet)
+
+**Step 3: Implement `xml_to_dict` using quick-xml**
+
+In `rust/onadata_xml/src/parser.rs`, implement a stack-based streaming parser using
+`quick_xml::Reader`. The algorithm:
+
+1. Create a stack of `(node_name, HashMap)` entries
+2. On `Event::Start(tag)` → push new frame onto stack, compute xpath from stack
+3. On `Event::Text(text)` → set text value on current frame
+4. On `Event::CData(text)` → set CDATA value on parent frame
+5. On `Event::End(tag)` → pop frame, merge into parent:
+   - If xpath is in `repeat_xpaths` or (encrypted && name == "media") → append to list
+   - Else if key already exists → convert to list and append
+   - Else → insert as dict value
+6. On `Event::Empty(tag)` → skip (empty node, matches Python's `return None`)
+
+The function returns a `PyObject` (Python dict) via PyO3.
+
+Key types:
+
+```rust
+use pyo3::prelude::*;
+use pyo3::types::{PyDict, PyList, PyString};
+use quick_xml::events::Event;
+use quick_xml::Reader;
+use std::collections::HashSet;
+
+pub fn xml_to_dict(
+    py: Python<'_>,
+    xml_str: &str,
+    repeat_xpaths: &HashSet<String>,
+    encrypted: bool,
+) -> PyResult<PyObject> { ... }
+```
+
+**Step 4: Run Rust tests to verify they pass**
+
+Run: `cd rust/onadata_xml && cargo test`
+Expected: All tests pass
+
+**Step 5: Commit**
+
+```bash
+git add rust/onadata_xml/src/parser.rs rust/onadata_xml/src/lib.rs
+git commit -m "feat: implement xml_to_dict parser using quick-xml"
+```
+
+---
+
+### Task 3: Implement Dict Flattening
+
+**Files:**
+- Create: `rust/onadata_xml/src/flatten.rs`
+- Modify: `rust/onadata_xml/src/lib.rs`
+
+**Step 1: Write Rust unit tests for flattening**
+
+Must match Python's `_flatten_dict_nest_repeats` behavior:
+
+```rust
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_flatten_simple() {
+        // {"form": {"info": {"name": "Adam", "age": "80"}}}
+        // → {"info/name": "Adam", "info/age": "80"}
+    }
+
+    #[test]
+    fn test_flatten_repeats() {
+        // {"form": {"kids": [{"name": "Abel"}, {"name": "Cain"}]}}
+        // → {"kids": [{"kids/name": "Abel"}, {"kids/name": "Cain"}]}
+    }
+
+    #[test]
+    fn test_flatten_nested_repeats() {
+        // Nested repeats produce nested lists of flattened dicts
+    }
+}
+```
+
+**Step 2: Run tests to verify they fail**
+
+Run: `cd rust/onadata_xml && cargo test`
+Expected: FAIL
+
+**Step 3: Implement `flatten_dict` using iterative stack-based approach**
+
+In `rust/onadata_xml/src/flatten.rs`:
+
+```rust
+pub fn flatten_dict_nest_repeats(
+    py: Python<'_>,
+    data_dict: &Bound<'_, PyDict>,
+) -> PyResult<PyObject> { ... }
+```
+
+Uses a `Vec` as an explicit stack instead of recursion. Walks the nested dict,
+building xpath keys by joining path components with "/". Lists produce sub-dicts
+with flattened keys (matching Python behavior where repeats become
+`[{"kids/kids_details/kids_name": "Abel"}]`).
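+
+The traversal described above can be sketched in Python with an explicit stack. This is an illustrative sketch rather than the existing `_flatten_dict_nest_repeats`: the function name is hypothetical, and it assumes the input is the dict below the root element (for example `parsed["form"]`).
+
+```python
+def flatten_nest_repeats_sketch(data):
+    """Flatten nested dicts into xpath keys; lists become lists of flat dicts."""
+    flat = {}
+    # Each frame is (path so far, node to walk, dict that receives the output).
+    stack = [((), data, flat)]
+    while stack:
+        path, node, out = stack.pop()
+        for key, value in node.items():
+            child_path = path + (key,)
+            xpath = "/".join(child_path)
+            if isinstance(value, dict):
+                # Groups add a path component and keep writing into `out`.
+                stack.append((child_path, value, out))
+            elif isinstance(value, list):
+                # Repeats: each item gets its own flat dict with full xpath keys.
+                items = []
+                out[xpath] = items
+                for item in value:
+                    if isinstance(item, dict):
+                        item_out = {}
+                        items.append(item_out)
+                        stack.append((child_path, item, item_out))
+                    else:
+                        items.append(item)
+            else:
+                out[xpath] = value
+    return flat
+```
+
+Because each repeat item carries its own output dict on the stack, nested repeats need no recursion: an inner list is written into the item dict of the enclosing repeat.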
+
+**Step 4: Run tests to verify they pass**
+
+Run: `cd rust/onadata_xml && cargo test`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add rust/onadata_xml/src/flatten.rs rust/onadata_xml/src/lib.rs
+git commit -m "feat: implement iterative dict flattening for repeat groups"
+```
+
+---
+
+### Task 4: Implement Numeric Conversion and Geom Extraction
+
+**Files:**
+- Create: `rust/onadata_xml/src/numeric.rs`
+- Create: `rust/onadata_xml/src/geom.rs`
+- Modify: `rust/onadata_xml/src/lib.rs`
+
+**Step 1: Write Rust tests for numeric conversion**
+
+Must match Python's `numeric_checker` (instance.py:152-166):
+- `"42"` → `42` (int)
+- `"3.14"` → `3.14` (float)
+- `"NaN"` → `0` (matches Python: `0 if math.isnan(value) else value`)
+- `"hello"` → `"hello"` (unchanged)
+- `""` → `""` (unchanged)
+- `"-7"` → `-7` (negative int)
+
+```rust
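+#[test]
+fn test_numeric_checker() {
+    // `numeric_check` returns a `PyObject`, so extract each result before comparing.
+    // Assumes an embedded interpreter is available to `cargo test`
+    // (e.g. PyO3's `auto-initialize` feature).
+    Python::with_gil(|py| {
+        assert_eq!(numeric_check(py, "42").extract::<i64>(py).unwrap(), 42);
+        assert_eq!(numeric_check(py, "3.14").extract::<f64>(py).unwrap(), 3.14);
+        assert_eq!(numeric_check(py, "NaN").extract::<i64>(py).unwrap(), 0);
+        assert_eq!(numeric_check(py, "hello").extract::<String>(py).unwrap(), "hello");
+    });
+}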
+```
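+
+For comparison, the conversion table above corresponds to the following Python behavior. This is a sketch that assumes `numeric_checker` follows the usual try-`int`-then-`float` pattern; the name `numeric_checker_sketch` is hypothetical.
+
+```python
+import math
+
+
+def numeric_checker_sketch(value):
+    """Return an int or float when `value` parses as one, else the original string."""
+    try:
+        return int(value)
+    except ValueError:
+        pass
+    try:
+        number = float(value)
+    except ValueError:
+        return value  # not numeric: leave unchanged
+    return 0 if math.isnan(number) else number
+```
+
+One parity detail worth testing: Python's `int()` accepts surrounding whitespace and underscore separators, so `" 42 "` and `"1_000"` convert to integers here, whereas Rust's `str::parse::<i64>()` rejects both.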
+
+**Step 2: Write Rust tests for geopoint extraction**
+
+Must match `_set_geom` behavior (instance.py:416-441):
+- Input: flat dict + geo_xpaths list
+- Searches the flat dict, including dicts nested inside repeat lists, for keys equal to each xpath
+- Splits each GPS string, e.g. `"-1.2627 36.7926 0.0 30.0"`, and keeps the first two values as a `(lat, lng)` tuple
+- Returns `Vec<(f64, f64)>`
+
+```rust
+#[test]
+fn test_extract_geopoints() {
+    // flat_dict with "gps" = "-1.2627 36.7926 0.0 30.0"
+    // geo_xpaths = ["gps"]
+    // → [(−1.2627, 36.7926)]
+}
+```
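+
+The expected behavior can be sketched in Python as follows. Both function names are hypothetical, and the recursive search is an assumption about how `get_values_matching_key` walks nested dicts and lists. Returning `None` on a malformed value mirrors the early `return` in `_set_geom`; the Rust signature would need its own way to signal that case.
+
+```python
+def values_matching_key(doc, key):
+    """Yield every value stored under `key`, searching nested dicts and lists."""
+    if isinstance(doc, dict):
+        for name, value in doc.items():
+            if name == key:
+                yield value
+            else:
+                yield from values_matching_key(value, key)
+    elif isinstance(doc, list):
+        for item in doc:
+            yield from values_matching_key(item, key)
+
+
+def extract_geopoints_sketch(doc, geo_xpaths):
+    """Return (lat, lng) tuples for each geo xpath, or None if a value is malformed."""
+    points = []
+    for xpath in geo_xpaths:
+        for gps in values_matching_key(doc, xpath):
+            try:
+                # Only the first two fields are used; altitude and accuracy are ignored.
+                lat, lng = (float(part) for part in gps.split()[:2])
+            except ValueError:
+                return None
+            points.append((lat, lng))
+    return points
+```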
+
+**Step 3: Run tests to verify they fail**
+
+Run: `cd rust/onadata_xml && cargo test`
+Expected: FAIL
+
+**Step 4: Implement numeric conversion**
+
+In `rust/onadata_xml/src/numeric.rs`:
+
+```rust
+use pyo3::prelude::*;
+use pyo3::types::PyString;
+
+/// Applies numeric conversion inline during dict construction.
+/// Called on leaf text values when the xpath is in numeric_fields.
+pub fn numeric_check(py: Python<'_>, value: &str) -> PyObject {
+    if let Ok(i) = value.parse::<i64>() {
+        return i.into_pyobject(py).unwrap().into_any().unbind();
+    }
+    if let Ok(f) = value.parse::<f64>() {
+        if f.is_nan() {
+            return 0i64.into_pyobject(py).unwrap().into_any().unbind();
+        }
+        return f.into_pyobject(py).unwrap().into_any().unbind();
+    }
+    PyString::new(py, value).into_any().unbind()
+}
+```
+
+**Step 5: Implement geopoint extraction**
+
+In `rust/onadata_xml/src/geom.rs`:
+
+```rust
+use pyo3::prelude::*;
+use std::collections::HashSet;
+
+/// Extracts (lat, lng) tuples from the flat dict for matching geo_xpaths.
+/// Searches recursively through nested dicts/lists (matching get_values_matching_key).
+pub fn extract_geopoints(
+    py: Python<'_>,
+    flat_dict: &Bound<'_, PyDict>,
+    geo_xpaths: &HashSet<String>,
+) -> PyResult<Vec<(f64, f64)>> { ... }
+```
+
+**Step 6: Run tests to verify they pass**
+
+Run: `cd rust/onadata_xml && cargo test`
+Expected: PASS
+
+**Step 7: Commit**
+
+```bash
+git add rust/onadata_xml/src/numeric.rs rust/onadata_xml/src/geom.rs rust/onadata_xml/src/lib.rs
+git commit -m "feat: add numeric conversion and geopoint extraction"
+```
+
+---
+
+### Task 5: Wire Up the Full `parse_submission` Function
+
+**Files:**
+- Modify: `rust/onadata_xml/src/lib.rs`
+
+**Step 1: Write Rust integration test for full parse_submission**
+
+```rust
+#[test]
+fn test_parse_submission_full() {
+    let xml = r#"<?xml version='1.0' ?><tutorial id="tutorial">
+        <name>Larry</name><age>23</age>
+        <gps>-1.2836198 36.8795437 0.0 1044.0</gps>
+        <meta><instanceID>uuid:729f173c688e482486a48661700455ff</instanceID></meta>
+    </tutorial>"#;
+
+    Python::with_gil(|py| {
+        let result = parse_submission(
+            py, xml,
+            vec![],        // repeat_xpaths
+            false,         // encrypted
+            HashSet::new(), // numeric_fields
+            vec!["gps".into()], // geo_xpaths
+        ).unwrap();
+
+        // Verify all fields of SubmissionResult
+        // result.root_node_name == "tutorial"
+        // result.uuid == Some("729f173c688e482486a48661700455ff")
+        // result.geom_points == [(-1.2836198, 36.8795437)]
+        // result.checksum == sha256 of xml
+        // result.attributes["id"] == "tutorial"
+        // result.dict["tutorial"]["name"] == "Larry"
+        // result.flat_dict["name"] == "Larry"
+        // result.flat_dict["age"] == "23" (not in numeric_fields)
+    });
+}
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `cd rust/onadata_xml && cargo test`
+Expected: FAIL
+
+**Step 3: Implement `parse_submission` and `SubmissionResult`**
+
+In `rust/onadata_xml/src/lib.rs`:
+
+```rust
+use pyo3::prelude::*;
+use pyo3::types::PyDict;
+use sha2::{Sha256, Digest};
+use std::collections::HashSet;
+
+mod parser;
+mod flatten;
+mod numeric;
+mod geom;
+
+// No `#[derive(Clone)]`: `Py<T>` only implements `Clone` with PyO3's `py-clone` feature.
+#[pyclass]
+pub struct SubmissionResult {
+    #[pyo3(get)]
+    pub dict: PyObject,
+    #[pyo3(get)]
+    pub flat_dict: PyObject,
+    #[pyo3(get)]
+    pub attributes: PyObject,
+    #[pyo3(get)]
+    pub root_node_name: String,
+    #[pyo3(get)]
+    pub uuid: Option<String>,
+    #[pyo3(get)]
+    pub deprecated_uuid: Option<String>,
+    #[pyo3(get)]
+    pub submission_date: Option<String>,
+    #[pyo3(get)]
+    pub geom_points: Vec<(f64, f64)>,
+    #[pyo3(get)]
+    pub checksum: String,
+}
+
+#[pyfunction]
+fn parse_submission(
+    py: Python<'_>,
+    xml_str: &str,
+    repeat_xpaths: Vec<String>,
+    encrypted: bool,
+    numeric_fields: HashSet<String>,
+    geo_xpaths: Vec<String>,
+) -> PyResult<SubmissionResult> {
+    // 1. Clean XML (strip whitespace between tags)
+    let clean_xml = clean_xml(xml_str);
+
+    // 2. Parse XML to dict + extract attributes, uuid, etc.
+    let parsed = parser::parse_full(py, &clean_xml, &repeat_xpaths.into_iter().collect(),
+                                     encrypted, &numeric_fields)?;
+
+    // 3. Flatten dict
+    let flat_dict = flatten::flatten_dict_nest_repeats(py, &parsed.dict)?;
+
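+    // 4. Extract geopoints from the flat dict (Python's `_set_geom` searches `get_dict()`,
+    //    which is flat, so xpath keys such as "group/gps" only match there)
+    let geo_set: HashSet<String> = geo_xpaths.into_iter().collect();
+    let geom_points =
+        geom::extract_geopoints(py, flat_dict.bind(py).downcast::<PyDict>()?, &geo_set)?;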
+
+    // 5. Compute SHA256
+    let mut hasher = Sha256::new();
+    hasher.update(xml_str.as_bytes());
+    let checksum = format!("{:x}", hasher.finalize());
+
+    Ok(SubmissionResult {
+        dict: parsed.dict,
+        flat_dict,
+        attributes: parsed.attributes,
+        root_node_name: parsed.root_node_name,
+        uuid: parsed.uuid,
+        deprecated_uuid: parsed.deprecated_uuid,
+        submission_date: parsed.submission_date,
+        geom_points,
+        checksum,
+    })
+}
+
+fn clean_xml(xml_str: &str) -> String {
+    // Equivalent to: re.sub(r">\s+<", "><", xml_string.strip())
+    let trimmed = xml_str.trim();
+    let mut result = String::with_capacity(trimmed.len());
+    let mut after_close = false;
+    let mut whitespace_buf = String::new();
+    for ch in trimmed.chars() {
+        if after_close {
+            if ch.is_whitespace() {
+                whitespace_buf.push(ch);
+                continue;
+            } else if ch == '<' {
+                // drop whitespace between > and <
+                after_close = false;
+                result.push('<');
+                whitespace_buf.clear();
+                continue;
+            } else {
+                // not followed by <, flush whitespace
+                result.push_str(&whitespace_buf);
+                whitespace_buf.clear();
+                after_close = false;
+            }
+        }
+        if ch == '>' {
+            after_close = true;
+            whitespace_buf.clear();
+        }
+        result.push(ch);
+    }
+    result.push_str(&whitespace_buf);
+    result
+}
+
+#[pymodule]
+fn onadata_xml(m: &Bound<'_, PyModule>) -> PyResult<()> {
+    m.add_function(wrap_pyfunction!(parse_submission, m)?)?;
+    m.add_class::<SubmissionResult>()?;
+    Ok(())
+}
+```
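+
+When writing the Rust unit tests for `clean_xml` and the checksum, a Python reference can supply the expected values. The sketch below uses the regular expression quoted in the comment above and hashes the UTF-8 bytes of the input, as the Rust code does; both function names are hypothetical.
+
+```python
+import hashlib
+import re
+
+
+def clean_xml_reference(xml_string):
+    """Strip the string, then drop whitespace that sits between `>` and `<`."""
+    return re.sub(r">\s+<", "><", xml_string.strip())
+
+
+def checksum_reference(xml_string):
+    """Lowercase hex SHA-256 digest of the UTF-8 encoded XML."""
+    return hashlib.sha256(xml_string.encode("utf-8")).hexdigest()
+```
+
+Whitespace inside text content, such as `<b> x </b>`, is preserved because it is not bounded by `>` and `<` on both sides.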
+
+**Step 4: Run tests to verify they pass**
+
+Run: `cd rust/onadata_xml && cargo test`
+Expected: PASS
+
+**Step 5: Build and smoke test from Python**
+
+Run: `cd rust/onadata_xml && maturin develop`
+
+Run:
+```bash
+python -c "
+from onadata_xml import parse_submission
+r = parse_submission(
+    '<tutorial id=\"tutorial\"><name>Larry</name><age>23</age><meta><instanceID>uuid:abc123</instanceID></meta></tutorial>',
+    [], False, set(), []
+)
+print('root:', r.root_node_name)
+print('uuid:', r.uuid)
+print('dict:', r.dict)
+print('flat:', r.flat_dict)
+print('attrs:', r.attributes)
+print('sha:', r.checksum[:16])
+"
+```
+
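+Expected: `root: tutorial` and `uuid: abc123` (the `uuid:` prefix is stripped), followed by the parsed dict, the flat dict, the attributes, and the first 16 hex characters of the checksum.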
+
+**Step 6: Commit**
+
+```bash
+git add rust/onadata_xml/src/
+git commit -m "feat: wire up full parse_submission with SubmissionResult"
+```
+
+---
+
+### Task 6: Add the Python Wrapper Class
+
+**Files:**
+- Modify: `onadata/apps/logger/xform_instance_parser.py`
+- Create: `onadata/apps/logger/tests/test_rust_parsing.py`
+
+**Step 1: Write Python test for RustXFormInstanceParser**
+
+Create test in `onadata/apps/logger/tests/test_rust_parsing.py`:
+
+```python
+"""Tests that RustXFormInstanceParser produces identical output to XFormInstanceParser."""
+import os
+
+from django.test import override_settings
+
+from onadata.apps.main.tests.test_base import TestBase
+from onadata.apps.logger.xform_instance_parser import (
+    RustXFormInstanceParser,
+    XFormInstanceParser,
+)
+
+
+class TestRustXFormInstanceParser(TestBase):
+    """Compare Rust parser output against Python parser for identical inputs."""
+
+    def _publish_and_get_xml(self, fixture_dir, xls_name, xml_name):
+        self._create_user_and_login()
+        xls_path = os.path.join(
+            os.path.dirname(os.path.abspath(__file__)),
+            f"../fixtures/{fixture_dir}/{xls_name}",
+        )
+        self._publish_xls_file_and_set_xform(xls_path)
+        xml_path = os.path.join(
+            os.path.dirname(os.path.abspath(__file__)),
+            f"../fixtures/{fixture_dir}/instances/{xml_name}",
+        )
+        with open(xml_path) as f:
+            return f.read()
+
+    @override_settings(USE_RUST_XML_PARSER=True)
+    def test_nested_repeats_match(self):
+        xml = self._publish_and_get_xml(
+            "new_repeats", "new_repeats.xlsx",
+            "new_repeats_2012-07-05-14-33-53.xml",
+        )
+        py_parser = XFormInstanceParser(xml, self.xform)
+        rust_parser = RustXFormInstanceParser(xml, self.xform)
+
+        self.assertEqual(py_parser.to_dict(), rust_parser.to_dict())
+        self.assertEqual(py_parser.to_flat_dict(), rust_parser.to_flat_dict())
+        self.assertEqual(
+            py_parser.get_flat_dict_with_attributes(),
+            rust_parser.get_flat_dict_with_attributes(),
+        )
+        self.assertEqual(py_parser.get_root_node_name(), rust_parser.get_root_node_name())
+        self.assertEqual(py_parser.get_xform_id_string(), rust_parser.get_xform_id_string())
+
+    @override_settings(USE_RUST_XML_PARSER=True)
+    def test_encrypted_form_match(self):
+        xml = self._publish_and_get_xml(
+            "tutorial_encrypted", "tutorial_encrypted.xlsx",
+            "tutorial_encrypted.xml",
+        )
+        py_parser = XFormInstanceParser(xml, self.xform)
+        rust_parser = RustXFormInstanceParser(xml, self.xform)
+
+        self.assertEqual(py_parser.to_dict(), rust_parser.to_dict())
+        self.assertEqual(py_parser.to_flat_dict(), rust_parser.to_flat_dict())
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `python manage.py test onadata.apps.logger.tests.test_rust_parsing -v2 --settings=onadata.settings.github_actions_test`
+Expected: FAIL (RustXFormInstanceParser doesn't exist yet)
+
+**Step 3: Add RustXFormInstanceParser to xform_instance_parser.py**
+
+Add at the end of `onadata/apps/logger/xform_instance_parser.py`:
+
+```python
+class RustXFormInstanceParser:
+    """Drop-in replacement for XFormInstanceParser using Rust native parser."""
+
+    def __init__(self, xml_str, data_dictionary):
+        self.data_dicionary = data_dictionary
+        repeat_xpaths = [
+            get_abbreviated_xpath(e.get_xpath())
+            for e in data_dictionary.get_survey_elements_of_type("repeat")
+        ]
+
+        from onadata.libs.data.query import get_numeric_fields
+
+        numeric_fields = set(get_numeric_fields(data_dictionary))
+        geo_xpaths = (
+            data_dictionary.geopoint_xpaths()
+            if hasattr(data_dictionary, "geopoint_xpaths")
+            else []
+        )
+
+        from onadata_xml import parse_submission
+
+        self._result = parse_submission(
+            # smart_str decodes bytes, so the Rust side always receives a str
+            smart_str(xml_str).strip(),
+            repeat_xpaths,
+            data_dictionary.encrypted,
+            numeric_fields,
+            geo_xpaths,
+        )
+
+    def get_root_node(self):
+        return None
+
+    def get_root_node_name(self):
+        return self._result.root_node_name
+
+    def to_dict(self):
+        return self._result.dict
+
+    def to_flat_dict(self):
+        return self._result.flat_dict
+
+    def get_attributes(self):
+        return self._result.attributes
+
+    def get_xform_id_string(self):
+        return self._result.attributes["id"]
+
+    def get_version(self):
+        return self._result.attributes.get("version")
+
+    def get_flat_dict_with_attributes(self):
+        result = self.to_flat_dict().copy()
+        result[XFORM_ID_STRING] = self.get_xform_id_string()
+        version = self.get_version()
+        if version:
+            result[VERSION] = version
+        return result
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `python manage.py test onadata.apps.logger.tests.test_rust_parsing -v2 --settings=onadata.settings.github_actions_test`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add onadata/apps/logger/xform_instance_parser.py onadata/apps/logger/tests/test_rust_parsing.py
+git commit -m "feat: add RustXFormInstanceParser wrapper class with parity tests"
+```
+
+---
+
+### Task 7: Integrate Feature Flag and Wire Into Instance Model
+
+**Files:**
+- Modify: `onadata/settings/common.py`
+- Modify: `onadata/apps/logger/models/instance.py`
+
+**Step 1: Add feature flag settings**
+
+Add to end of `onadata/settings/common.py`:
+
+```python
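+# Rust XML parser feature flags; read from the environment (default off).
+# Assumes `os` is already imported in this settings module.
+USE_RUST_XML_PARSER = os.environ.get("USE_RUST_XML_PARSER") == "True"
+RUST_XML_PARSER_SHADOW_MODE = os.environ.get("RUST_XML_PARSER_SHADOW_MODE") == "True"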
+```
+
+**Step 2: Modify Instance._set_parser()**
+
+In `onadata/apps/logger/models/instance.py`, lines 516-520, change:
+
+```python
+def _set_parser(self):
+    if not hasattr(self, "_parser"):
+        self._parser = XFormInstanceParser(self.xml, self.xform)
+```
+
+To:
+
+```python
+def _set_parser(self):
+    if not hasattr(self, "_parser"):
+        if getattr(settings, "USE_RUST_XML_PARSER", False):
+            from onadata.apps.logger.xform_instance_parser import (
+                RustXFormInstanceParser,
+            )
+            self._parser = RustXFormInstanceParser(self.xml, self.xform)
+        else:
+            self._parser = XFormInstanceParser(self.xml, self.xform)
+```
+
+**Step 3: Modify Instance._set_geom() to use cached geom_points**
+
+In `onadata/apps/logger/models/instance.py`, lines 416-441, change `_set_geom`
+to check for the Rust parser result first:
+
+```python
+def _set_geom(self):
+    xform = self.xform
+    self._set_parser()
+
+    if (
+        getattr(settings, "USE_RUST_XML_PARSER", False)
+        and hasattr(self._parser, "_result")
+    ):
+        points = [
+            Point(lng, lat) for lat, lng in self._parser._result.geom_points
+        ]
+    else:
+        geo_xpaths = xform.geopoint_xpaths()
+        doc = self.get_dict()
+        points = []
+        if geo_xpaths:
+            for xpath in geo_xpaths:
+                for gps in get_values_matching_key(doc, xpath):
+                    try:
+                        geometry = [float(s) for s in gps.split()]
+                        lat, lng = geometry[0:2]
+                        points.append(Point(lng, lat))
+                    except ValueError:
+                        return
+
+    if not xform.instances_with_geopoints and points:
+        xform.instances_with_geopoints = True
+        xform.save()
+
+    if points:
+        self.geom = GeometryCollection(points)
+    else:
+        self.geom = None
+```
+
+**Step 4: Modify Instance._set_uuid() to use cached uuid**
+
+In `onadata/apps/logger/models/instance.py`, lines 528-536, change `_set_uuid` to:
+
+```python
+def _set_uuid(self):
+    if self.xml and not self.uuid:
+        if (
+            getattr(settings, "USE_RUST_XML_PARSER", False)
+            and hasattr(self, "_parser")
+            and hasattr(self._parser, "_result")
+        ):
+            uuid = self._parser._result.uuid
+        else:
+            uuid = get_uuid_from_xml(self.xml)
+        if uuid is not None:
+            self.uuid = uuid
+    set_uuid(self)
+```
+
+**Step 5: Run existing tests to verify no regression**
+
+Run: `python manage.py test onadata.apps.logger.tests.test_parsing -v2 --settings=onadata.settings.github_actions_test`
+Expected: PASS (feature flag is off, so existing Python path is used)
+
+**Step 6: Run with feature flag on**
+
+Run: `USE_RUST_XML_PARSER=True python manage.py test onadata.apps.logger.tests.test_parsing -v2 --settings=onadata.settings.github_actions_test`
+Expected: PASS
+
+**Step 7: Commit**
+
+```bash
+git add onadata/settings/common.py onadata/apps/logger/models/instance.py
+git commit -m "feat: integrate Rust XML parser with feature flag in Instance model"
+```
+
+---
+
+### Task 8: Add Shadow Mode for Safe Rollout
+
+**Files:**
+- Modify: `onadata/apps/logger/models/instance.py`
+
+**Step 1: Implement shadow mode comparison**
+
+Add a helper function to `onadata/apps/logger/models/instance.py`:
+
+```python
+def _compare_parser_outputs(py_parser, rust_parser, instance_pk):
+    """Log differences between Python and Rust parser outputs."""
+    logger = logging.getLogger("onadata.rust_parser_shadow")
+    try:
+        if py_parser.to_dict() != rust_parser.to_dict():
+            logger.warning("dict mismatch for instance pk=%s", instance_pk)
+        if py_parser.to_flat_dict() != rust_parser.to_flat_dict():
+            logger.warning("flat_dict mismatch for instance pk=%s", instance_pk)
+        if py_parser.get_root_node_name() != rust_parser.get_root_node_name():
+            logger.warning("root_node_name mismatch for instance pk=%s", instance_pk)
+    except Exception:
+        logger.exception("Shadow mode comparison failed for instance pk=%s", instance_pk)
+```
+
+**Step 2: Wire shadow mode into _set_parser()**
+
+Update `_set_parser`:
+
+```python
+def _set_parser(self):
+    if not hasattr(self, "_parser"):
+        if getattr(settings, "USE_RUST_XML_PARSER", False):
+            from onadata.apps.logger.xform_instance_parser import (
+                RustXFormInstanceParser,
+            )
+            self._parser = RustXFormInstanceParser(self.xml, self.xform)
+        else:
+            self._parser = XFormInstanceParser(self.xml, self.xform)
+
+            if getattr(settings, "RUST_XML_PARSER_SHADOW_MODE", False):
+                try:
+                    from onadata.apps.logger.xform_instance_parser import (
+                        RustXFormInstanceParser,
+                    )
+                    rust_parser = RustXFormInstanceParser(self.xml, self.xform)
+                    _compare_parser_outputs(self._parser, rust_parser, self.pk)
+                except Exception:
+                    logger = logging.getLogger("onadata.rust_parser_shadow")
+                    logger.exception("Shadow mode Rust parser failed for pk=%s", self.pk)
+```
+
+**Step 3: Run tests**
+
+Run: `python manage.py test onadata.apps.logger.tests.test_parsing -v2 --settings=onadata.settings.github_actions_test`
+Expected: PASS
+
+**Step 4: Commit**
+
+```bash
+git add onadata/apps/logger/models/instance.py
+git commit -m "feat: add shadow mode for Rust XML parser comparison logging"
+```
+
+---
+
+### Task 9: Final Integration Test and Push
+
+**Files:**
+- Modify: `onadata/apps/logger/tests/test_rust_parsing.py`
+
+**Step 1: Add end-to-end submission test with Rust parser**
+
+Add to `test_rust_parsing.py`:
+
+```python
+@override_settings(USE_RUST_XML_PARSER=True)
+def test_full_submission_with_rust_parser(self):
+    """Test that a full submission round-trip works with the Rust parser."""
+    self._create_user_and_login()
+    xls_path = os.path.join(
+        os.path.dirname(os.path.abspath(__file__)),
+        "../fixtures/tutorial/tutorial.xlsx",
+    )
+    self._publish_xls_file_and_set_xform(xls_path)
+    xml_path = os.path.join(
+        os.path.dirname(os.path.abspath(__file__)),
+        "../fixtures/tutorial/instances/tutorial_2012-06-27_11-27-53_w_uuid.xml",
+    )
+    self._make_submission(xml_path)
+    self.assertEqual(self.response.status_code, 201)
+
+    # Verify instance was saved correctly
+    instance = self.xform.instances.first()
+    self.assertIsNotNone(instance)
+    self.assertEqual(instance.uuid, "729f173c688e482486a48661700455ff")
+
+    # Verify get_dict works
+    data = instance.get_dict()
+    self.assertEqual(data.get("name"), "Larry\n        Again")
+    self.assertEqual(data.get("age"), "23")
+```
+
+**Step 2: Run full test suite**
+
+Run: `python manage.py test onadata.apps.logger -v2 --settings=onadata.settings.github_actions_test --parallel=4`
+Expected: All existing tests PASS
+
+**Step 3: Run with Rust parser enabled**
+
+Run: `USE_RUST_XML_PARSER=True python manage.py test onadata.apps.logger.tests.test_rust_parsing -v2 --settings=onadata.settings.github_actions_test`
+Expected: PASS
+
+**Step 4: Commit and push**
+
+```bash
+git add onadata/apps/logger/tests/test_rust_parsing.py
+git commit -m "test: add end-to-end submission test with Rust XML parser"
+git push origin rusty
+```
+
+---
+
+## Summary of Tasks
+
+| Task | Description | Key Files |
+|------|-------------|-----------|
+| 1 | Scaffold Rust crate | `rust/onadata_xml/` |
+| 2 | Core XML-to-dict parser | `src/parser.rs` |
+| 3 | Dict flattening | `src/flatten.rs` |
+| 4 | Numeric conversion + geom extraction | `src/numeric.rs`, `src/geom.rs` |
+| 5 | Wire up `parse_submission` + `SubmissionResult` | `src/lib.rs` |
+| 6 | Python wrapper class | `xform_instance_parser.py` |
+| 7 | Feature flag + Instance model integration | `instance.py`, `common.py` |
+| 8 | Shadow mode | `instance.py` |
+| 9 | Final integration test + push | `test_rust_parsing.py` |
diff --git a/onadata/apps/logger/models/instance.py b/onadata/apps/logger/models/instance.py
index 7295004e98..c06a16d9b4 100644
--- a/onadata/apps/logger/models/instance.py
+++ b/onadata/apps/logger/models/instance.py
@@ -383,6 +383,24 @@ def convert_to_serializable_date(date):
     return date
 
 
+def _compare_parser_outputs(py_parser, rust_parser, instance_pk):
+    """Log differences between Python and Rust parser outputs for shadow mode."""
+    _logger = logging.getLogger("onadata.rust_parser_shadow")
+    try:
+        if py_parser.to_dict() != rust_parser.to_dict():
+            _logger.warning("dict mismatch for instance pk=%s", instance_pk)
+        if py_parser.to_flat_dict() != rust_parser.to_flat_dict():
+            _logger.warning("flat_dict mismatch for instance pk=%s", instance_pk)
+        if py_parser.get_root_node_name() != rust_parser.get_root_node_name():
+            _logger.warning(
+                "root_node_name mismatch for instance pk=%s", instance_pk
+            )
+    except Exception:
+        _logger.exception(
+            "Shadow mode comparison failed for instance pk=%s", instance_pk
+        )
+
+
 class InstanceBaseClass:
     """Interface of functions for Instance and InstanceHistory model"""
 
@@ -416,29 +434,40 @@ def numeric_converter(self, json_dict, numeric_fields=None):
     def _set_geom(self):
         # pylint: disable=no-member
         xform = self.xform
-        geo_xpaths = xform.geopoint_xpaths()
-        doc = self.get_dict()
-        points = []
-
-        if geo_xpaths:
-            for xpath in geo_xpaths:
-                for gps in get_values_matching_key(doc, xpath):
-                    try:
-                        geometry = [float(s) for s in gps.split()]
-                        lat, lng = geometry[0:2]
-                        points.append(Point(lng, lat))
-                    except ValueError:
-                        return
+        self._set_parser()
 
-            if not xform.instances_with_geopoints and points:
-                xform.instances_with_geopoints = True
-                xform.save()
+        if (
+            getattr(settings, "USE_RUST_XML_PARSER", False)
+            and hasattr(self._parser, "_result")
+        ):
+            points = [
+                Point(lng, lat)
+                for lat, lng in self._parser._result.geom_points
+            ]
+        else:
+            geo_xpaths = xform.geopoint_xpaths()
+            doc = self.get_dict()
+            points = []
+
+            if geo_xpaths:
+                for xpath in geo_xpaths:
+                    for gps in get_values_matching_key(doc, xpath):
+                        try:
+                            geometry = [float(s) for s in gps.split()]
+                            lat, lng = geometry[0:2]
+                            points.append(Point(lng, lat))
+                        except ValueError:
+                            return
+
+        if not xform.instances_with_geopoints and points:
+            xform.instances_with_geopoints = True
+            xform.save()
 
-            # pylint: disable=attribute-defined-outside-init
-            if points:
-                self.geom = GeometryCollection(points)
-            else:
-                self.geom = None
+        # pylint: disable=attribute-defined-outside-init
+        if points:
+            self.geom = GeometryCollection(points)
+        else:
+            self.geom = None
 
     def get_full_dict(self, include_related=True):
         """Returns the submission XML as a python dictionary object
@@ -517,7 +546,34 @@ def _set_parser(self):
         if not hasattr(self, "_parser"):
             # pylint: disable=no-member
             # pylint: disable=attribute-defined-outside-init
-            self._parser = XFormInstanceParser(self.xml, self.xform)
+            if getattr(settings, "USE_RUST_XML_PARSER", False):
+                from onadata.apps.logger.xform_instance_parser import (
+                    RustXFormInstanceParser,
+                )
+
+                self._parser = RustXFormInstanceParser(self.xml, self.xform)
+            else:
+                self._parser = XFormInstanceParser(self.xml, self.xform)
+
+                if getattr(settings, "RUST_XML_PARSER_SHADOW_MODE", False):
+                    try:
+                        from onadata.apps.logger.xform_instance_parser import (
+                            RustXFormInstanceParser,
+                        )
+
+                        rust_parser = RustXFormInstanceParser(
+                            self.xml, self.xform
+                        )
+                        _compare_parser_outputs(
+                            self._parser, rust_parser, self.pk
+                        )
+                    except Exception:
+                        _shadow_logger = logging.getLogger(
+                            "onadata.rust_parser_shadow"
+                        )
+                        _shadow_logger.exception(
+                            "Shadow mode Rust parser failed for pk=%s", self.pk
+                        )
 
     def _set_survey_type(self):
         # pylint: disable=attribute-defined-outside-init
@@ -529,8 +585,15 @@ def _set_uuid(self):
         # pylint: disable=no-member,attribute-defined-outside-init
         # pylint: disable=access-member-before-definition
         if self.xml and not self.uuid:
-            # pylint: disable=no-member
-            uuid = get_uuid_from_xml(self.xml)
+            if (
+                getattr(settings, "USE_RUST_XML_PARSER", False)
+                and hasattr(self, "_parser")
+                and hasattr(self._parser, "_result")
+            ):
+                uuid = self._parser._result.uuid
+            else:
+                # pylint: disable=no-member
+                uuid = get_uuid_from_xml(self.xml)
             if uuid is not None:
                 self.uuid = uuid
         set_uuid(self)
diff --git a/onadata/apps/logger/tests/test_rust_parsing.py b/onadata/apps/logger/tests/test_rust_parsing.py
new file mode 100644
index 0000000000..ebd634f9ef
--- /dev/null
+++ b/onadata/apps/logger/tests/test_rust_parsing.py
@@ -0,0 +1,124 @@
+"""Tests that RustXFormInstanceParser produces identical output to XFormInstanceParser."""
+
+import os
+
+from django.test import override_settings
+
+from onadata.apps.logger.xform_instance_parser import (
+    RustXFormInstanceParser,
+    XFormInstanceParser,
+)
+from onadata.apps.main.tests.test_base import TestBase
+
+
+class TestRustXFormInstanceParser(TestBase):
+    """Compare Rust parser output against Python parser for identical inputs."""
+
+    def _get_fixture_path(self, *parts):
+        return os.path.join(
+            os.path.dirname(os.path.abspath(__file__)),
+            "..",
+            "fixtures",
+            *parts,
+        )
+
+    def _publish_and_get_xml(self, fixture_dir, xls_name, xml_rel_path):
+        self._create_user_and_login()
+        xls_path = self._get_fixture_path(fixture_dir, xls_name)
+        self._publish_xls_file_and_set_xform(xls_path)
+        xml_path = self._get_fixture_path(fixture_dir, xml_rel_path)
+        with open(xml_path) as f:
+            return f.read()
+
+    def _assert_parsers_match(self, xml):
+        """Assert that Python and Rust parsers produce identical output."""
+        py_parser = XFormInstanceParser(xml, self.xform)
+        rust_parser = RustXFormInstanceParser(xml, self.xform)
+
+        self.assertEqual(
+            py_parser.to_dict(),
+            rust_parser.to_dict(),
+            "to_dict() mismatch",
+        )
+        self.assertEqual(
+            py_parser.to_flat_dict(),
+            rust_parser.to_flat_dict(),
+            "to_flat_dict() mismatch",
+        )
+        self.assertEqual(
+            py_parser.get_flat_dict_with_attributes(),
+            rust_parser.get_flat_dict_with_attributes(),
+            "get_flat_dict_with_attributes() mismatch",
+        )
+        self.assertEqual(
+            py_parser.get_root_node_name(),
+            rust_parser.get_root_node_name(),
+            "get_root_node_name() mismatch",
+        )
+        self.assertEqual(
+            py_parser.get_xform_id_string(),
+            rust_parser.get_xform_id_string(),
+            "get_xform_id_string() mismatch",
+        )
+
+    def test_nested_repeats_parity(self):
+        """Test that nested repeats produce identical output."""
+        xml = self._publish_and_get_xml(
+            "new_repeats",
+            "new_repeats.xlsx",
+            os.path.join("instances", "new_repeats_2012-07-05-14-33-53.xml"),
+        )
+        self._assert_parsers_match(xml)
+
+    def test_encrypted_form_parity(self):
+        """Test that encrypted form parsing produces identical output."""
+        xml = self._publish_and_get_xml(
+            "tutorial_encrypted",
+            "tutorial_encrypted.xlsx",
+            os.path.join("instances", "tutorial_encrypted.xml"),
+        )
+        self._assert_parsers_match(xml)
+
+    def test_rust_parser_uuid_extraction(self):
+        """Test UUID extraction from Rust parser."""
+        xml = self._publish_and_get_xml(
+            "new_repeats",
+            "new_repeats.xlsx",
+            os.path.join("instances", "new_repeats_2012-07-05-14-33-53.xml"),
+        )
+        rust_parser = RustXFormInstanceParser(xml, self.xform)
+        # The new_repeats fixture has no UUID in its meta block, so this only
+        # verifies that parsing succeeds and the result object is accessible.
+        self.assertIsNotNone(rust_parser._result)
+        self.assertEqual(rust_parser.get_root_node_name(), "new_repeats")
+
+    def test_rust_parser_geom_extraction(self):
+        """Test geopoint extraction from Rust parser."""
+        xml = self._publish_and_get_xml(
+            "new_repeats",
+            "new_repeats.xlsx",
+            os.path.join("instances", "new_repeats_2012-07-05-14-33-53.xml"),
+        )
+        rust_parser = RustXFormInstanceParser(xml, self.xform)
+        # The new_repeats form has a gps field
+        geom_points = rust_parser._result.geom_points
+        self.assertIsInstance(geom_points, list)
+
+    @override_settings(USE_RUST_XML_PARSER=True)
+    def test_full_submission_with_rust_parser(self):
+        """Test that a full submission round-trip works with the Rust parser."""
+        self._create_user_and_login()
+        xls_path = self._get_fixture_path("tutorial", "tutorial.xlsx")
+        self._publish_xls_file_and_set_xform(xls_path)
+        xml_path = self._get_fixture_path(
+            "tutorial",
+            "instances",
+            "tutorial_2012-06-27_11-27-53_w_uuid.xml",
+        )
+        self._make_submission(xml_path)
+        self.assertEqual(self.response.status_code, 201)
+
+        # Verify instance was saved correctly
+        instance = self.xform.instances.first()
+        self.assertIsNotNone(instance)
+        self.assertEqual(instance.uuid, "729f173c688e482486a48661700455ff")
diff --git a/onadata/apps/logger/xform_instance_parser.py b/onadata/apps/logger/xform_instance_parser.py
index 4927b82eaa..41e00f6b8a 100644
--- a/onadata/apps/logger/xform_instance_parser.py
+++ b/onadata/apps/logger/xform_instance_parser.py
@@ -463,3 +463,62 @@ def get_entity_uuid_from_xml(xml):
     """Returns the uuid for the XML submission's entity"""
     entity_node = get_meta_from_xml(xml, "entity")
     return entity_node.getAttribute("id")
+
+
+class RustXFormInstanceParser:
+    """Drop-in replacement for XFormInstanceParser using the native Rust parser."""
+
+    def __init__(self, xml_str, data_dictionary):
+        self.data_dictionary = data_dictionary
+        repeat_xpaths = [
+            get_abbreviated_xpath(e.get_xpath())
+            for e in data_dictionary.get_survey_elements_of_type("repeat")
+        ]
+
+        from onadata.libs.data.query import get_numeric_fields
+
+        numeric_fields = set(get_numeric_fields(data_dictionary))
+        geo_xpaths = (
+            data_dictionary.geopoint_xpaths()
+            if hasattr(data_dictionary, "geopoint_xpaths")
+            else []
+        )
+
+        from onadata_xml import parse_submission
+
+        self._result = parse_submission(
+            smart_str(xml_str.strip()) if isinstance(xml_str, str) else xml_str,
+            repeat_xpaths,
+            data_dictionary.encrypted,
+            numeric_fields,
+            geo_xpaths,
+        )
+
+    def get_root_node(self):
+        return None
+
+    def get_root_node_name(self):
+        return self._result.root_node_name
+
+    def to_dict(self):
+        return self._result.dict
+
+    def to_flat_dict(self):
+        return self._result.flat_dict
+
+    def get_attributes(self):
+        return self._result.attributes
+
+    def get_xform_id_string(self):
+        return self._result.attributes["id"]
+
+    def get_version(self):
+        return self._result.attributes.get("version")
+
+    def get_flat_dict_with_attributes(self):
+        result = self.to_flat_dict().copy()
+        result[XFORM_ID_STRING] = self.get_xform_id_string()
+        version = self.get_version()
+        if version:
+            result[VERSION] = version
+        return result
diff --git a/onadata/settings/common.py b/onadata/settings/common.py
index 18906bea76..545a612d55 100644
--- a/onadata/settings/common.py
+++ b/onadata/settings/common.py
@@ -722,3 +722,7 @@ def configure_logging(logger, **kwargs):
 CSP_INCLUDE_NONCE_IN = ["script-src", "style-src"]
 
 ENABLE_TABLE_PARTITIONING = False
+
+# Rust XML parser feature flags
+USE_RUST_XML_PARSER = False
+RUST_XML_PARSER_SHADOW_MODE = False
diff --git a/rust/onadata_xml/.gitignore b/rust/onadata_xml/.gitignore
new file mode 100644
index 0000000000..ea8c4bf7f3
--- /dev/null
+++ b/rust/onadata_xml/.gitignore
@@ -0,0 +1 @@
+/target
diff --git a/rust/onadata_xml/Cargo.lock b/rust/onadata_xml/Cargo.lock
new file mode 100644
index 0000000000..f09faf281d
--- /dev/null
+++ b/rust/onadata_xml/Cargo.lock
@@ -0,0 +1,268 @@
+# This file is automatically @generated by Cargo.
+# It is not intended for manual editing.
+version = 4
+
+[[package]]
+name = "autocfg"
+version = "1.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8"
+
+[[package]]
+name = "block-buffer"
+version = "0.10.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3078c7629b62d3f0439517fa394996acacc5cbc91c5a20d8c658e77abd503a71"
+dependencies = [
+ "generic-array",
+]
+
+[[package]]
+name = "cfg-if"
+version = "1.0.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"
+
+[[package]]
+name = "cpufeatures"
+version = "0.2.17"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "59ed5838eebb26a2bb2e58f6d5b5316989ae9d08bab10e0e6d103e656d1b0280"
+dependencies = [
+ "libc",
+]
+
+[[package]]
+name = "crypto-common"
+version = "0.1.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "78c8292055d1c1df0cce5d180393dc8cce0abec0a7102adb6c7b1eef6016d60a"
+dependencies = [
+ "generic-array",
+ "typenum",
+]
+
+[[package]]
+name = "digest"
+version = "0.10.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9ed9a281f7bc9b7576e61468ba615a66a5c8cfdff42420a70aa82701a3b1e292"
+dependencies = [
+ "block-buffer",
+ "crypto-common",
+]
+
+[[package]]
+name = "generic-array"
+version = "0.14.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "85649ca51fd72272d7821adaf274ad91c288277713d9c18820d8499a7ff69e9a"
+dependencies = [
+ "typenum",
+ "version_check",
+]
+
+[[package]]
+name = "heck"
+version = "0.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "2304e00983f87ffb38b55b444b5e3b60a884b5d30c0fca7d82fe33449bbe55ea"
+
+[[package]]
+name = "indoc"
+version = "2.0.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "79cf5c93f93228cf8efb3ba362535fb11199ac548a09ce117c9b1adc3030d706"
+dependencies = [
+ "rustversion",
+]
+
+[[package]]
+name = "libc"
+version = "0.2.182"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6800badb6cb2082ffd7b6a67e6125bb39f18782f793520caee8cb8846be06112"
+
+[[package]]
+name = "memchr"
+version = "2.8.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f8ca58f447f06ed17d5fc4043ce1b10dd205e060fb3ce5b979b8ed8e59ff3f79"
+
+[[package]]
+name = "memoffset"
+version = "0.9.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "488016bfae457b036d996092f6cb448677611ce4449e970ceaf42695203f218a"
+dependencies = [
+ "autocfg",
+]
+
+[[package]]
+name = "onadata_xml"
+version = "0.1.0"
+dependencies = [
+ "pyo3",
+ "quick-xml",
+ "sha2",
+]
+
+[[package]]
+name = "once_cell"
+version = "1.21.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "42f5e15c9953c5e4ccceeb2e7382a716482c34515315f7b03532b8b4e8393d2d"
+
+[[package]]
+name = "portable-atomic"
+version = "1.13.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c33a9471896f1c69cecef8d20cbe2f7accd12527ce60845ff44c153bb2a21b49"
+
+[[package]]
+name = "proc-macro2"
+version = "1.0.106"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8fd00f0bb2e90d81d1044c2b32617f68fcb9fa3bb7640c23e9c748e53fb30934"
+dependencies = [
+ "unicode-ident",
+]
+
+[[package]]
+name = "pyo3"
+version = "0.23.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7778bffd85cf38175ac1f545509665d0b9b92a198ca7941f131f85f7a4f9a872"
+dependencies = [
+ "cfg-if",
+ "indoc",
+ "libc",
+ "memoffset",
+ "once_cell",
+ "portable-atomic",
+ "pyo3-build-config",
+ "pyo3-ffi",
+ "pyo3-macros",
+ "unindent",
+]
+
+[[package]]
+name = "pyo3-build-config"
+version = "0.23.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "94f6cbe86ef3bf18998d9df6e0f3fc1050a8c5efa409bf712e661a4366e010fb"
+dependencies = [
+ "once_cell",
+ "target-lexicon",
+]
+
+[[package]]
+name = "pyo3-ffi"
+version = "0.23.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e9f1b4c431c0bb1c8fb0a338709859eed0d030ff6daa34368d3b152a63dfdd8d"
+dependencies = [
+ "libc",
+ "pyo3-build-config",
+]
+
+[[package]]
+name = "pyo3-macros"
+version = "0.23.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "fbc2201328f63c4710f68abdf653c89d8dbc2858b88c5d88b0ff38a75288a9da"
+dependencies = [
+ "proc-macro2",
+ "pyo3-macros-backend",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "pyo3-macros-backend"
+version = "0.23.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "fca6726ad0f3da9c9de093d6f116a93c1a38e417ed73bf138472cf4064f72028"
+dependencies = [
+ "heck",
+ "proc-macro2",
+ "pyo3-build-config",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "quick-xml"
+version = "0.37.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "331e97a1af0bf59823e6eadffe373d7b27f485be8748f71471c662c1f269b7fb"
+dependencies = [
+ "memchr",
+]
+
+[[package]]
+name = "quote"
+version = "1.0.44"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "21b2ebcf727b7760c461f091f9f0f539b77b8e87f2fd88131e7f1b433b3cece4"
+dependencies = [
+ "proc-macro2",
+]
+
+[[package]]
+name = "rustversion"
+version = "1.0.22"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b39cdef0fa800fc44525c84ccb54a029961a8215f9619753635a9c0d2538d46d"
+
+[[package]]
+name = "sha2"
+version = "0.10.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a7507d819769d01a365ab707794a4084392c824f54a7a6a7862f8c3d0892b283"
+dependencies = [
+ "cfg-if",
+ "cpufeatures",
+ "digest",
+]
+
+[[package]]
+name = "syn"
+version = "2.0.116"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3df424c70518695237746f84cede799c9c58fcb37450d7b23716568cc8bc69cb"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "unicode-ident",
+]
+
+[[package]]
+name = "target-lexicon"
+version = "0.12.16"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "61c41af27dd6d1e27b1b16b489db798443478cef1f06a660c96db617ba5de3b1"
+
+[[package]]
+name = "typenum"
+version = "1.19.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "562d481066bde0658276a35467c4af00bdc6ee726305698a55b86e61d7ad82bb"
+
+[[package]]
+name = "unicode-ident"
+version = "1.0.24"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e6e4313cd5fcd3dad5cafa179702e2b244f760991f45397d14d4ebf38247da75"
+
+[[package]]
+name = "unindent"
+version = "0.2.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7264e107f553ccae879d21fbea1d6724ac785e8c3bfc762137959b5802826ef3"
+
+[[package]]
+name = "version_check"
+version = "0.9.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"
diff --git a/rust/onadata_xml/Cargo.toml b/rust/onadata_xml/Cargo.toml
new file mode 100644
index 0000000000..b50272e694
--- /dev/null
+++ b/rust/onadata_xml/Cargo.toml
@@ -0,0 +1,13 @@
+[package]
+name = "onadata_xml"
+version = "0.1.0"
+edition = "2021"
+
+[lib]
+name = "onadata_xml"
+crate-type = ["cdylib"]
+
+[dependencies]
+pyo3 = "0.23"
+quick-xml = "0.37"
+sha2 = "0.10"
diff --git a/rust/onadata_xml/pyproject.toml b/rust/onadata_xml/pyproject.toml
new file mode 100644
index 0000000000..f349263cc3
--- /dev/null
+++ b/rust/onadata_xml/pyproject.toml
@@ -0,0 +1,11 @@
+[build-system]
+requires = ["maturin>=1.0,<2.0"]
+build-backend = "maturin"
+
+[project]
+name = "onadata-xml"
+version = "0.1.0"
+requires-python = ">=3.9"
+
+[tool.maturin]
+features = ["pyo3/extension-module"]
diff --git a/rust/onadata_xml/src/flatten.rs b/rust/onadata_xml/src/flatten.rs
new file mode 100644
index 0000000000..af7cda2d69
--- /dev/null
+++ b/rust/onadata_xml/src/flatten.rs
@@ -0,0 +1,278 @@
+//! Dict flattening module.
+//!
+//! Replicates Python's `_flatten_dict_nest_repeats` which:
+//! - For regular values: yields (path, value), where path is a list of keys
+//! - For dicts: recurses deeper
+//! - For lists (repeat groups): creates a list of flattened sub-dicts,
+//!   each with full xpath keys joined by "/", stripped of the root node prefix
+//! - The final flat_dict is built by: {"/".join(path[1:]): value}
+use crate::parser::Value;
+
+/// A flattened entry: (path_segments, value).
+/// The value can be a simple Value::Str or a Value::List of flattened dicts.
+type FlatEntry = (Vec<String>, Value);
+
+/// Flatten a nested dict with repeat nesting.
+///
+/// Replicates `_flatten_dict_nest_repeats(data_dict, prefix)`.
+///
+/// `data_dict` is the slice of (key, value) pairs held by a Value::Dict.
+/// `prefix` is the current path prefix.
+fn flatten_dict_nest_repeats_inner(data_dict: &[(String, Value)], prefix: &[String]) -> Vec<FlatEntry> {
+    let mut entries = Vec::new();
+
+    for (key, value) in data_dict {
+        let mut new_prefix = prefix.to_vec();
+        new_prefix.push(key.clone());
+
+        match value {
+            Value::Dict(inner_pairs) => {
+                // Recurse into dict
+                let sub = flatten_dict_nest_repeats_inner(inner_pairs, &new_prefix);
+                entries.extend(sub);
+            }
+            Value::List(items) => {
+                // Create a list of flattened sub-dicts
+                let mut repeats: Vec<Value> = Vec::new();
+
+                for item in items {
+                    let item_prefix = new_prefix.clone();
+
+                    match item {
+                        Value::Dict(item_pairs) => {
+                            // Flatten each dict item into a flat dict
+                            let sub_entries =
+                                flatten_dict_nest_repeats_inner(item_pairs, &item_prefix);
+                            let mut repeat_dict: Vec<(String, Value)> = Vec::new();
+
+                            for (path, r_value) in sub_entries {
+                                // Join path[1:] with "/"
+                                let flat_key = path[1..].join("/");
+                                repeat_dict.push((flat_key, r_value));
+                            }
+                            repeats.push(Value::Dict(repeat_dict));
+                        }
+                        _ => {
+                            // Non-dict item in list (e.g. a string)
+                            let flat_key = item_prefix[1..].join("/");
+                            let mut repeat_dict: Vec<(String, Value)> = Vec::new();
+                            repeat_dict.push((flat_key, item.clone()));
+                            repeats.push(Value::Dict(repeat_dict));
+                        }
+                    }
+                }
+
+                entries.push((new_prefix, Value::List(repeats)));
+            }
+            Value::Str(_) => {
+                entries.push((new_prefix, value.clone()));
+            }
+        }
+    }
+
+    entries
+}
+
+/// Flatten a parsed XML dict into a flat dict.
+///
+/// Takes the top-level dict (e.g. {"tutorial": {...}}) and returns
+/// a flat dict where keys are xpath segments joined by "/", with the
+/// root node name stripped.
+///
+/// This matches the Python code:
+/// ```python
+/// for path, value in _flatten_dict_nest_repeats(self._dict, []):
+///     self._flat_dict["/".join(path[1:])] = value
+/// ```
+pub fn flatten_dict(dict: &Value) -> Vec<(String, Value)> {
+    match dict {
+        Value::Dict(pairs) => {
+            let entries = flatten_dict_nest_repeats_inner(pairs, &[]);
+            let mut flat = Vec::new();
+            for (path, value) in entries {
+                let key = path[1..].join("/");
+                flat.push((key, value));
+            }
+            flat
+        }
+        _ => Vec::new(),
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::parser::{parse_xml, Value};
+
+    fn find_flat<'a>(flat: &'a [(String, Value)], key: &str) -> Option<&'a Value> {
+        flat.iter().find(|(k, _)| k == key).map(|(_, v)| v)
+    }
+
+    #[test]
+    fn test_simple_form_flatten() {
+        let xml = r#"<?xml version='1.0' ?><tutorial id="tutorial">
+  <name>Larry
+        Again
+  </name>
+  <age>23</age>
+  <picture>1333604907194.jpg</picture>
+  <has_children>0</has_children>
+  <gps>-1.2836198 36.8795437 0.0 1044.0</gps>
+  <web_browsers>firefox chrome safari</web_browsers>
+  <meta>
+    <instanceID>uuid:729f173c688e482486a48661700455ff</instanceID>
+  </meta>
+</tutorial>"#;
+
+        let result = parse_xml(xml, &[], false).unwrap();
+        let dict = result.dict.unwrap();
+        let flat = flatten_dict(&dict);
+
+        assert_eq!(
+            find_flat(&flat, "name"),
+            Some(&Value::Str("Larry\n        Again\n  ".to_string()))
+        );
+        assert_eq!(
+            find_flat(&flat, "age"),
+            Some(&Value::Str("23".to_string()))
+        );
+        assert_eq!(
+            find_flat(&flat, "picture"),
+            Some(&Value::Str("1333604907194.jpg".to_string()))
+        );
+        assert_eq!(
+            find_flat(&flat, "has_children"),
+            Some(&Value::Str("0".to_string()))
+        );
+        assert_eq!(
+            find_flat(&flat, "gps"),
+            Some(&Value::Str(
+                "-1.2836198 36.8795437 0.0 1044.0".to_string()
+            ))
+        );
+        assert_eq!(
+            find_flat(&flat, "web_browsers"),
+            Some(&Value::Str("firefox chrome safari".to_string()))
+        );
+        assert_eq!(
+            find_flat(&flat, "meta/instanceID"),
+            Some(&Value::Str(
+                "uuid:729f173c688e482486a48661700455ff".to_string()
+            ))
+        );
+    }
+
+    #[test]
+    fn test_nested_repeats_flatten() {
+        let xml = r#"<new_repeats id="new_repeats">
+  <info><age>80</age><name>Adam</name></info>
+  <kids><kids_details><kids_age>50</kids_age><kids_name>Abel</kids_name></kids_details><has_kids>1</has_kids></kids>
+  <web_browsers>chrome ie</web_browsers>
+  <gps>-1.2627557 36.7926442 0.0 30.0</gps>
+</new_repeats>"#;
+
+        let repeats = vec!["kids/kids_details".to_string()];
+        let result = parse_xml(xml, &repeats, false).unwrap();
+        let dict = result.dict.unwrap();
+        let flat = flatten_dict(&dict);
+
+        assert_eq!(
+            find_flat(&flat, "gps"),
+            Some(&Value::Str(
+                "-1.2627557 36.7926442 0.0 30.0".to_string()
+            ))
+        );
+        assert_eq!(
+            find_flat(&flat, "kids/has_kids"),
+            Some(&Value::Str("1".to_string()))
+        );
+        assert_eq!(
+            find_flat(&flat, "info/age"),
+            Some(&Value::Str("80".to_string()))
+        );
+        assert_eq!(
+            find_flat(&flat, "info/name"),
+            Some(&Value::Str("Adam".to_string()))
+        );
+        assert_eq!(
+            find_flat(&flat, "web_browsers"),
+            Some(&Value::Str("chrome ie".to_string()))
+        );
+
+        // kids/kids_details should be a list of flattened dicts
+        let kids_details = find_flat(&flat, "kids/kids_details").unwrap();
+        match kids_details {
+            Value::List(list) => {
+                assert_eq!(list.len(), 1);
+                match &list[0] {
+                    Value::Dict(d) => {
+                        // Check for kids/kids_details/kids_age and kids/kids_details/kids_name
+                        assert!(d.iter().any(|(k, v)| k == "kids/kids_details/kids_age"
+                            && *v == Value::Str("50".to_string())));
+                        assert!(d.iter().any(|(k, v)| k == "kids/kids_details/kids_name"
+                            && *v == Value::Str("Abel".to_string())));
+                    }
+                    _ => panic!("Expected Dict in list"),
+                }
+            }
+            _ => panic!("Expected List for kids/kids_details"),
+        }
+    }
+
+    #[test]
+    fn test_encrypted_media_flatten() {
+        let xml = r#"<data id="tutorial_encrypted" version="201701031234" encrypted="yes" xmlns="http://www.opendatakit.org/xforms/encrypted"><base64EncryptedKey>ZJTc</base64EncryptedKey><orx:meta xmlns:orx="http://openrosa.org/xforms"><orx:instanceID>uuid:f8971231-f3b8-4b2b-8c35-d95fa207d937</orx:instanceID></orx:meta>
+<media><file>1483528430996.jpg.enc</file></media>
+<media><file>1483528445767.jpg.enc</file></media>
+<encryptedXmlFile>submission.xml.enc</encryptedXmlFile><base64EncryptedElementSignature>UUR8</base64EncryptedElementSignature></data>"#;
+
+        let result = parse_xml(xml, &[], true).unwrap();
+        let dict = result.dict.unwrap();
+        let flat = flatten_dict(&dict);
+
+        let media = find_flat(&flat, "media").unwrap();
+        match media {
+            Value::List(list) => {
+                assert_eq!(list.len(), 2);
+                match &list[0] {
+                    Value::Dict(d) => {
+                        assert!(d.iter().any(|(k, v)| k == "media/file"
+                            && *v == Value::Str("1483528430996.jpg.enc".to_string())));
+                    }
+                    _ => panic!("Expected Dict"),
+                }
+                match &list[1] {
+                    Value::Dict(d) => {
+                        assert!(d.iter().any(|(k, v)| k == "media/file"
+                            && *v == Value::Str("1483528445767.jpg.enc".to_string())));
+                    }
+                    _ => panic!("Expected Dict"),
+                }
+            }
+            _ => panic!("Expected List for media"),
+        }
+    }
+
+    #[test]
+    fn test_auto_repeated_flatten() {
+        // S2A repeated 3 times without being in repeat_xpaths
+        let xml = r#"<RW_OUNIS_2016 id="ROUNIS2" version="201608211141">
+<S2A><S2A_note/><S2_1_3_2_2>1</S2_1_3_2_2><S2_1_3_2_3>1.25</S2_1_3_2_3></S2A>
+<S2A><S2A_note/><S2_1_3_3_2>1</S2_1_3_3_2><S2_1_3_3_3>1.25</S2_1_3_3_3></S2A>
+<S2A><S2A_note/><S2_1_3_5_2>1</S2_1_3_5_2><S2_1_3_5_3><S3B><S3_1_3_4>2</S3_1_3_4><S3_1_3_4>test</S3_1_3_4></S3B><S3B><S3_1_3_5>8</S3_1_3_5><S3_1_3_6>test2</S3_1_3_6></S3B><S3B><S3_1_3_7>5</S3_1_3_7><S3_1_3_8>test</S3_1_3_8></S3B></S2_1_3_5_3></S2A>
+</RW_OUNIS_2016>"#;
+
+        let result = parse_xml(xml, &[], false).unwrap();
+        let dict = result.dict.unwrap();
+        let flat = flatten_dict(&dict);
+
+        // S2A should be a list in the flat dict
+        let s2a = find_flat(&flat, "S2A").unwrap();
+        match s2a {
+            Value::List(list) => {
+                assert_eq!(list.len(), 3);
+            }
+            _ => panic!("Expected List for S2A, got {:?}", s2a),
+        }
+    }
+}
diff --git a/rust/onadata_xml/src/geom.rs b/rust/onadata_xml/src/geom.rs
new file mode 100644
index 0000000000..bdd7c0f96c
--- /dev/null
+++ b/rust/onadata_xml/src/geom.rs
@@ -0,0 +1,200 @@
+//! Geopoint extraction module.
+//!
+//! Replicates Python's `_set_geom` which:
+//! 1. For each xpath in geo_xpaths, searches the NESTED dict recursively
+//!    for matching keys (using `get_values_matching_key` from dict_tools.py)
+//! 2. Splits the GPS string by whitespace, takes the first 2 as (lat, lng) floats
+//!
+//! The search function `get_values_matching_key` does a recursive traversal
+//! of the entire dict structure, including into lists.
+use crate::parser::Value;
+
+/// Recursively search a Value tree for all values with a matching key.
+///
+/// Replicates Python's `get_values_matching_key(doc, key)` from dict_tools.py.
+///
+/// The Python code:
+/// - If key in doc: yield doc[key]
+/// - For each (k, v) in doc.items():
+///   - If v is dict: recurse
+///   - If v is list: for each item, if dict/list: recurse; elif item == key: yield item
+fn get_values_matching_key<'a>(value: &'a Value, key: &str) -> Vec<&'a Value> {
+    let mut results = Vec::new();
+
+    match value {
+        Value::Dict(pairs) => {
+            // First check if this dict directly contains the key
+            if let Some(v) = pairs.iter().find(|(k, _)| k == key) {
+                results.push(&v.1);
+            }
+
+            // Then recurse into all values
+            for (_k, v) in pairs {
+                match v {
+                    Value::Dict(_) => {
+                        results.extend(get_values_matching_key(v, key));
+                    }
+                    Value::List(items) => {
+                        for item in items {
+                            match item {
+                                Value::Dict(_) | Value::List(_) => {
+                                    results.extend(get_values_matching_key(item, key));
+                                }
+                                Value::Str(s) if s == key => {
+                                    results.push(item);
+                                }
+                                _ => {}
+                            }
+                        }
+                    }
+                    _ => {}
+                }
+            }
+        }
+        Value::List(items) => {
+            for item in items {
+                match item {
+                    Value::Dict(_) | Value::List(_) => {
+                        results.extend(get_values_matching_key(item, key));
+                    }
+                    Value::Str(s) if s == key => {
+                        results.push(item);
+                    }
+                    _ => {}
+                }
+            }
+        }
+        _ => {}
+    }
+
+    results
+}
+
+/// Extract geopoints from the nested dict.
+///
+/// For each geo_xpath, search the nested dict recursively for matching keys.
+/// For each matching value (GPS string), split by whitespace and take first 2 as (lat, lng).
+///
+/// Returns a list of (lat, lng) tuples. On any parse error for a geopoint,
+/// returns early with the points collected so far (matching Python's `return` on ValueError).
+pub fn extract_geopoints(dict: &Value, geo_xpaths: &[String]) -> Vec<(f64, f64)> {
+    let mut points = Vec::new();
+
+    for xpath in geo_xpaths {
+        // Search the nested dict recursively for matching keys.
+        // geo_xpaths contains abbreviated xpaths used as search keys.
+        let values = get_values_matching_key(dict, xpath);
+        for gps_val in values {
+            if let Value::Str(gps_str) = gps_val {
+                let parts: Vec<&str> = gps_str.split_whitespace().collect();
+                if parts.len() >= 2 {
+                    match (parts[0].parse::<f64>(), parts[1].parse::<f64>()) {
+                        (Ok(lat), Ok(lng)) => {
+                            points.push((lat, lng));
+                        }
+                        _ => {
+                            // Python returns on ValueError, stopping all processing
+                            return points;
+                        }
+                    }
+                }
+            }
+        }
+    }
+
+    points
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::parser::parse_xml;
+
+    #[test]
+    fn test_extract_gps_simple() {
+        let xml = r#"<?xml version='1.0' ?><tutorial id="tutorial">
+  <name>Larry</name>
+  <gps>-1.2836198 36.8795437 0.0 1044.0</gps>
+  <meta><instanceID>uuid:abc</instanceID></meta>
+</tutorial>"#;
+
+        let result = parse_xml(xml, &[], false).unwrap();
+        let dict = result.dict.unwrap();
+        let points = extract_geopoints(&dict, &["gps".to_string()]);
+
+        assert_eq!(points.len(), 1);
+        assert!((points[0].0 - (-1.2836198)).abs() < 1e-10);
+        assert!((points[0].1 - 36.8795437).abs() < 1e-10);
+    }
+
+    #[test]
+    fn test_extract_gps_nested() {
+        let xml = r#"<new_repeats id="new_repeats">
+  <info><age>80</age></info>
+  <gps>-1.2627557 36.7926442 0.0 30.0</gps>
+</new_repeats>"#;
+
+        let result = parse_xml(xml, &[], false).unwrap();
+        let dict = result.dict.unwrap();
+        let points = extract_geopoints(&dict, &["gps".to_string()]);
+
+        assert_eq!(points.len(), 1);
+        assert!((points[0].0 - (-1.2627557)).abs() < 1e-10);
+        assert!((points[0].1 - 36.7926442).abs() < 1e-10);
+    }
+
+    #[test]
+    fn test_no_gps() {
+        let xml = "<root><name>test</name></root>";
+        let result = parse_xml(xml, &[], false).unwrap();
+        let dict = result.dict.unwrap();
+        let points = extract_geopoints(&dict, &["gps".to_string()]);
+        assert!(points.is_empty());
+    }
+
+    #[test]
+    fn test_empty_geo_xpaths() {
+        let xml = "<root><gps>-1.0 36.0 0.0 0.0</gps></root>";
+        let result = parse_xml(xml, &[], false).unwrap();
+        let dict = result.dict.unwrap();
+        let points = extract_geopoints(&dict, &[]);
+        assert!(points.is_empty());
+    }
+
+    #[test]
+    fn test_get_values_matching_key_nested() {
+        // Simulate a nested dict structure
+        let dict = Value::Dict(vec![
+            (
+                "root".to_string(),
+                Value::Dict(vec![
+                    (
+                        "group".to_string(),
+                        Value::Dict(vec![("gps".to_string(), Value::Str("-1.0 36.0".to_string()))]),
+                    ),
+                ]),
+            ),
+        ]);
+
+        let values = get_values_matching_key(&dict, "gps");
+        assert_eq!(values.len(), 1);
+        assert_eq!(*values[0], Value::Str("-1.0 36.0".to_string()));
+    }
+
+    #[test]
+    fn test_get_values_matching_key_in_list() {
+        // Value with a list of dicts (repeat group)
+        let dict = Value::Dict(vec![
+            (
+                "locations".to_string(),
+                Value::List(vec![
+                    Value::Dict(vec![("gps".to_string(), Value::Str("-1.0 36.0".to_string()))]),
+                    Value::Dict(vec![("gps".to_string(), Value::Str("-2.0 37.0".to_string()))]),
+                ]),
+            ),
+        ]);
+
+        let values = get_values_matching_key(&dict, "gps");
+        assert_eq!(values.len(), 2);
+    }
+}
diff --git a/rust/onadata_xml/src/lib.rs b/rust/onadata_xml/src/lib.rs
new file mode 100644
index 0000000000..96a4078f05
--- /dev/null
+++ b/rust/onadata_xml/src/lib.rs
@@ -0,0 +1,184 @@
+use std::collections::HashSet;
+
+use pyo3::prelude::*;
+use pyo3::types::{PyDict, PyList, PyNone, PyString};
+use sha2::{Digest, Sha256};
+
+mod flatten;
+mod geom;
+mod numeric;
+mod parser;
+
+use flatten::flatten_dict;
+use geom::extract_geopoints;
+use numeric::{numeric_checker, NumericValue};
+use parser::{parse_xml, Value};
+
+// ---------------------------------------------------------------------------
+// SubmissionResult PyO3 class
+// ---------------------------------------------------------------------------
+
+#[pyclass]
+pub struct SubmissionResult {
+    #[pyo3(get)]
+    pub dict: PyObject,
+    #[pyo3(get)]
+    pub flat_dict: PyObject,
+    #[pyo3(get)]
+    pub attributes: PyObject,
+    #[pyo3(get)]
+    pub root_node_name: String,
+    #[pyo3(get)]
+    pub uuid: Option<String>,
+    #[pyo3(get)]
+    pub deprecated_uuid: Option<String>,
+    #[pyo3(get)]
+    pub submission_date: Option<String>,
+    #[pyo3(get)]
+    pub geom_points: Vec<(f64, f64)>,
+    #[pyo3(get)]
+    pub checksum: String,
+}
+
+// ---------------------------------------------------------------------------
+// Value -> Python object conversion
+// ---------------------------------------------------------------------------
+
+/// Convert a parser::Value to a Python object.
+///
+/// - Value::Str -> Python str (or int/float if in numeric_fields)
+/// - Value::Dict -> Python dict
+/// - Value::List -> Python list
+fn value_to_py(py: Python<'_>, value: &Value, numeric_fields: &HashSet<String>, current_key: &str) -> PyResult<PyObject> {
+    match value {
+        Value::Str(s) => {
+            if numeric_fields.contains(current_key) {
+                match numeric_checker(s) {
+                    NumericValue::Int(i) => Ok(i.into_pyobject(py)?.into_any().unbind()),
+                    NumericValue::Float(f) => Ok(f.into_pyobject(py)?.into_any().unbind()),
+                    NumericValue::Str(s) => Ok(PyString::new(py, &s).into_any().unbind()),
+                }
+            } else {
+                Ok(PyString::new(py, s).into_any().unbind())
+            }
+        }
+        Value::Dict(pairs) => {
+            let dict = PyDict::new(py);
+            for (key, val) in pairs {
+                let py_val = value_to_py(py, val, numeric_fields, key)?;
+                dict.set_item(key, py_val)?;
+            }
+            Ok(dict.into_any().unbind())
+        }
+        Value::List(items) => {
+            let list = PyList::empty(py);
+            for item in items {
+                let py_item = value_to_py(py, item, numeric_fields, current_key)?;
+                list.append(py_item)?;
+            }
+            Ok(list.into_any().unbind())
+        }
+    }
+}
+
+/// Convert a flat dict (Vec of (String, Value)) to a Python dict.
+///
+/// The numeric_fields set contains abbreviated xpaths that should be converted
+/// to numeric values. This matches Python's `numeric_converter` which walks the
+/// flat dict recursively.
+fn flat_dict_to_py(
+    py: Python<'_>,
+    flat: &[(String, Value)],
+    numeric_fields: &HashSet<String>,
+) -> PyResult<PyObject> {
+    let dict = PyDict::new(py);
+    for (key, value) in flat {
+        let py_val = value_to_py(py, value, numeric_fields, key)?;
+        dict.set_item(key, py_val)?;
+    }
+    Ok(dict.into_any().unbind())
+}
+
+// ---------------------------------------------------------------------------
+// SHA256 checksum
+// ---------------------------------------------------------------------------
+
+fn sha256_hex(data: &str) -> String {
+    let mut hasher = Sha256::new();
+    hasher.update(data.as_bytes());
+    format!("{:x}", hasher.finalize())
+}
+
+// ---------------------------------------------------------------------------
+// parse_submission pyfunction
+// ---------------------------------------------------------------------------
+
+/// Parse an XML submission and return a SubmissionResult.
+///
+/// Arguments:
+/// - xml_str: The raw XML string
+/// - repeat_xpaths: List of xpaths that should be treated as repeating groups
+/// - encrypted: Whether the form is encrypted (forces "media" to be list-type)
+/// - numeric_fields: Set of abbreviated xpaths for numeric conversion
+/// - geo_xpaths: List of field names for geopoint extraction
+#[pyfunction]
+#[pyo3(signature = (xml_str, repeat_xpaths, encrypted, numeric_fields, geo_xpaths))]
+fn parse_submission(
+    py: Python<'_>,
+    xml_str: &str,
+    repeat_xpaths: Vec<String>,
+    encrypted: bool,
+    numeric_fields: HashSet<String>,
+    geo_xpaths: Vec<String>,
+) -> PyResult<SubmissionResult> {
+    // Parse XML
+    let parse_result = parse_xml(xml_str, &repeat_xpaths, encrypted)
+        .map_err(pyo3::exceptions::PyValueError::new_err)?;
+
+    // Build Python dict from parsed Value tree
+    let py_dict = match &parse_result.dict {
+        Some(dict) => value_to_py(py, dict, &numeric_fields, "")?,
+        None => PyNone::get(py).to_owned().into_any().unbind(),
+    };
+
+    // Build flat dict
+    let flat = match &parse_result.dict {
+        Some(dict) => flatten_dict(dict),
+        None => Vec::new(),
+    };
+    let py_flat_dict = flat_dict_to_py(py, &flat, &numeric_fields)?;
+
+    // Build attributes dict
+    let attrs_dict = PyDict::new(py);
+    for (key, value) in &parse_result.attributes {
+        attrs_dict.set_item(key, value)?;
+    }
+
+    // Extract geopoints from the nested dict
+    let geom_points = match &parse_result.dict {
+        Some(dict) => extract_geopoints(dict, &geo_xpaths),
+        None => Vec::new(),
+    };
+
+    // SHA256 of original XML string
+    let checksum = sha256_hex(xml_str);
+
+    Ok(SubmissionResult {
+        dict: py_dict,
+        flat_dict: py_flat_dict,
+        attributes: attrs_dict.into_any().unbind(),
+        root_node_name: parse_result.root_node_name,
+        uuid: parse_result.uuid,
+        deprecated_uuid: parse_result.deprecated_uuid,
+        submission_date: parse_result.submission_date,
+        geom_points,
+        checksum,
+    })
+}
+
+#[pymodule]
+fn onadata_xml(m: &Bound<'_, PyModule>) -> PyResult<()> {
+    m.add_class::<SubmissionResult>()?;
+    m.add_function(wrap_pyfunction!(parse_submission, m)?)?;
+    Ok(())
+}
diff --git a/rust/onadata_xml/src/numeric.rs b/rust/onadata_xml/src/numeric.rs
new file mode 100644
index 0000000000..2c789f72f4
--- /dev/null
+++ b/rust/onadata_xml/src/numeric.rs
@@ -0,0 +1,102 @@
+//! Numeric conversion utilities matching Python's `numeric_checker`.
+//!
+//! Tries int, then float (NaN -> 0), otherwise returns original string.
+
+/// Result of numeric checking - either a parsed number or the original string.
+#[derive(Debug, Clone, PartialEq)]
+pub enum NumericValue {
+    Int(i64),
+    Float(f64),
+    Str(String),
+}
+
+/// Replicates Python's `numeric_checker(string_value)`:
+/// - Try int(string_value) -> return int
+/// - Try float(string_value) -> if NaN return 0, else return float
+/// - Otherwise return string unchanged
+pub fn numeric_checker(string_value: &str) -> NumericValue {
+    // Try parsing as integer first
+    if let Ok(i) = string_value.parse::<i64>() {
+        return NumericValue::Int(i);
+    }
+
+    // Try parsing as float
+    if let Ok(f) = string_value.parse::<f64>() {
+        if f.is_nan() {
+            return NumericValue::Int(0);
+        }
+        return NumericValue::Float(f);
+    }
+
+    NumericValue::Str(string_value.to_string())
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_integer() {
+        assert_eq!(numeric_checker("23"), NumericValue::Int(23));
+    }
+
+    #[test]
+    fn test_negative_integer() {
+        assert_eq!(numeric_checker("-5"), NumericValue::Int(-5));
+    }
+
+    #[test]
+    fn test_zero() {
+        assert_eq!(numeric_checker("0"), NumericValue::Int(0));
+    }
+
+    #[test]
+    fn test_float() {
+        assert_eq!(numeric_checker("1.25"), NumericValue::Float(1.25));
+    }
+
+    #[test]
+    fn test_negative_float() {
+        assert_eq!(numeric_checker("-1.2836198"), NumericValue::Float(-1.2836198));
+    }
+
+    #[test]
+    fn test_nan() {
+        assert_eq!(numeric_checker("NaN"), NumericValue::Int(0));
+    }
+
+    #[test]
+    fn test_nan_lowercase() {
+        // Python's float("nan") and Rust's `str::parse::<f64>` both accept "nan" case-insensitively
+        assert_eq!(numeric_checker("nan"), NumericValue::Int(0));
+    }
+
+    #[test]
+    fn test_string() {
+        assert_eq!(
+            numeric_checker("hello"),
+            NumericValue::Str("hello".to_string())
+        );
+    }
+
+    #[test]
+    fn test_empty_string() {
+        assert_eq!(numeric_checker(""), NumericValue::Str("".to_string()));
+    }
+
+    #[test]
+    fn test_gps_string() {
+        assert_eq!(
+            numeric_checker("-1.2836198 36.8795437 0.0 1044.0"),
+            NumericValue::Str("-1.2836198 36.8795437 0.0 1044.0".to_string())
+        );
+    }
+
+    #[test]
+    fn test_uuid_string() {
+        assert_eq!(
+            numeric_checker("uuid:729f173c688e482486a48661700455ff"),
+            NumericValue::Str("uuid:729f173c688e482486a48661700455ff".to_string())
+        );
+    }
+}
diff --git a/rust/onadata_xml/src/parser.rs b/rust/onadata_xml/src/parser.rs
new file mode 100644
index 0000000000..c2fdeedd64
--- /dev/null
+++ b/rust/onadata_xml/src/parser.rs
@@ -0,0 +1,1041 @@
+//! Core XML-to-dict parser.
+//!
+//! Replicates Python's `clean_and_parse_xml`, `_xml_node_to_dict`,
+//! `xpath_from_xml_node`, `_get_all_attributes`, UUID/deprecatedID extraction,
+//! and submissionDate extraction.
+use std::collections::HashSet;
+
+use quick_xml::events::{BytesCData, BytesStart, BytesText, Event};
+use quick_xml::Reader;
+
+// ---------------------------------------------------------------------------
+// Value enum -- our Rust-side representation of the nested Python dict
+// ---------------------------------------------------------------------------
+
+/// A value in the parsed XML dict tree.
+/// Mirrors what the Python code produces:
+/// - `Str` for leaf text nodes
+/// - `Dict` for element nodes with children (preserves insertion order)
+/// - `List` for repeated elements / encrypted media
+#[derive(Debug, Clone, PartialEq)]
+pub enum Value {
+    Str(String),
+    Dict(Vec<(String, Value)>),
+    List(Vec<Value>),
+}
+
+impl Value {
+    /// Lookup a key in a Dict value.  Returns None for non-Dict variants.
+    #[allow(dead_code)]
+    pub fn get(&self, key: &str) -> Option<&Value> {
+        match self {
+            Value::Dict(pairs) => pairs.iter().find(|(k, _)| k == key).map(|(_, v)| v),
+            _ => None,
+        }
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Attribute triple
+// ---------------------------------------------------------------------------
+
+/// (attr_key, attr_value, element_name)
+#[derive(Debug, Clone)]
+pub struct XmlAttribute {
+    pub key: String,
+    pub value: String,
+    pub node_name: String,
+}
+
+// ---------------------------------------------------------------------------
+// ParseResult -- everything extracted from a single parse pass
+// ---------------------------------------------------------------------------
+
+#[derive(Debug)]
+pub struct ParseResult {
+    /// The nested dict, e.g. {"tutorial": {"name": "Larry", ...}}
+    pub dict: Option<Value>,
+    /// Root element name (e.g. "tutorial")
+    pub root_node_name: String,
+    /// All XML attributes (respecting entity-skip and first-wins rules)
+    pub attributes: Vec<(String, String)>,
+    /// UUID extracted from meta/instanceID (uuid: prefix stripped)
+    pub uuid: Option<String>,
+    /// Deprecated UUID from meta/deprecatedID (uuid: prefix stripped)
+    pub deprecated_uuid: Option<String>,
+    /// submissionDate attribute from root element
+    pub submission_date: Option<String>,
+}
+
+// ---------------------------------------------------------------------------
+// Internal DOM tree built from quick-xml events
+// ---------------------------------------------------------------------------
+
+/// Minimal DOM node built during SAX-style parsing.
+#[derive(Debug, Clone)]
+enum DomNode {
+    Element {
+        /// Qualified name exactly as written in the XML. Any namespace prefix
+        /// (e.g. `orx:`) is kept, matching Python minidom's `nodeName`.
+        name: String,
+        attrs: Vec<(String, String)>,
+        children: Vec<DomNode>,
+    },
+    Text(String),
+    CData(String),
+}
+
+/// Build a minimal DOM tree from cleaned XML bytes.
+fn build_dom(xml_bytes: &[u8]) -> Result<DomNode, String> {
+    let mut reader = Reader::from_reader(xml_bytes);
+    reader.config_mut().trim_text_start = false;
+    reader.config_mut().trim_text_end = false;
+
+    let mut stack: Vec<DomNode> = Vec::new();
+    // Sentinel root
+    stack.push(DomNode::Element {
+        name: "#document".to_string(),
+        attrs: vec![],
+        children: vec![],
+    });
+
+    let mut buf = Vec::new();
+    loop {
+        match reader.read_event_into(&mut buf) {
+            Ok(Event::Start(ref e)) => {
+                let name = elem_name(e);
+                let attrs = elem_attrs(e);
+                stack.push(DomNode::Element {
+                    name,
+                    attrs,
+                    children: vec![],
+                });
+            }
+            Ok(Event::Empty(ref e)) => {
+                // Self-closing element like <note/>
+                let name = elem_name(e);
+                let attrs = elem_attrs(e);
+                let node = DomNode::Element {
+                    name,
+                    attrs,
+                    children: vec![],
+                };
+                // Push onto current top
+                if let Some(DomNode::Element { children, .. }) = stack.last_mut() {
+                    children.push(node);
+                }
+            }
+            Ok(Event::End(ref _e)) => {
+                let node = stack.pop().ok_or("Unexpected end tag")?;
+                if let Some(DomNode::Element { children, .. }) = stack.last_mut() {
+                    children.push(node);
+                } else {
+                    return Err("No parent for end tag".to_string());
+                }
+            }
+            Ok(Event::Text(ref e)) => {
+                let text = decode_text(e);
+                if let Some(DomNode::Element { children, .. }) = stack.last_mut() {
+                    children.push(DomNode::Text(text));
+                }
+            }
+            Ok(Event::CData(ref e)) => {
+                let text = decode_cdata(e);
+                if let Some(DomNode::Element { children, .. }) = stack.last_mut() {
+                    children.push(DomNode::CData(text));
+                }
+            }
+            Ok(Event::Decl(_)) | Ok(Event::PI(_)) | Ok(Event::Comment(_)) => {}
+            Ok(Event::DocType(_)) => {}
+            Ok(Event::Eof) => break,
+            Err(e) => return Err(format!("XML parse error: {e}")),
+        }
+        buf.clear();
+    }
+
+    // stack should have only the sentinel #document
+    if stack.len() != 1 {
+        return Err("Malformed XML: unclosed elements".to_string());
+    }
+    Ok(stack.pop().unwrap())
+}
+
+fn elem_name(e: &BytesStart) -> String {
+    String::from_utf8_lossy(e.name().as_ref()).to_string()
+}
+
+fn elem_attrs(e: &BytesStart) -> Vec<(String, String)> {
+    e.attributes()
+        .filter_map(|a| a.ok())
+        .map(|a| {
+            let key = String::from_utf8_lossy(a.key.as_ref()).to_string();
+            let val = String::from_utf8_lossy(&a.value).to_string();
+            (key, val)
+        })
+        .collect()
+}
+
+fn decode_text(e: &BytesText) -> String {
+    // Unescape XML entities
+    match e.unescape() {
+        Ok(s) => s.to_string(),
+        Err(_) => String::from_utf8_lossy(e.as_ref()).to_string(),
+    }
+}
+
+fn decode_cdata(e: &BytesCData) -> String {
+    String::from_utf8_lossy(e.as_ref()).to_string()
+}
+
+// ---------------------------------------------------------------------------
+// Clean XML (matching Python's clean_and_parse_xml)
+// ---------------------------------------------------------------------------
+
+/// Strips whitespace, removes whitespace between XML tags.
+/// Matches: `re.sub(r">\s+<", "><", smart_str(xml_string.strip()))`
+pub fn clean_xml(xml_str: &str) -> String {
+    let trimmed = xml_str.trim();
+    // Remove whitespace between tags
+    let mut result = String::with_capacity(trimmed.len());
+    let mut chars = trimmed.chars().peekable();
+    while let Some(c) = chars.next() {
+        if c == '>' {
+            result.push(c);
+            // Buffer the whitespace run after '>' (dropped below only if '<' follows)
+            let mut ws_buf = String::new();
+            while let Some(&next) = chars.peek() {
+                if next.is_whitespace() {
+                    ws_buf.push(next);
+                    chars.next();
+                } else {
+                    break;
+                }
+            }
+            // If next char is '<', drop the whitespace; otherwise keep it
+            if let Some(&'<') = chars.peek() {
+                // drop ws_buf
+            } else {
+                result.push_str(&ws_buf);
+            }
+        } else {
+            result.push(c);
+        }
+    }
+    result
+}
+
+// ---------------------------------------------------------------------------
+// xpath computation (matching Python's xpath_from_xml_node)
+// ---------------------------------------------------------------------------
+
+/// Compute the xpath for a node from the names of its ancestors.
+///
+/// Python's `xpath_from_xml_node` walks the parent chain, collects the node
+/// names, and returns `"/".join(names[1:])`. The names are gathered by
+/// `_gather_parent_node_list`, which stops recursing once it reaches the
+/// root element (whose `parentNode.parentNode` is `None`):
+///
+/// ```python
+/// def _gather_parent_node_list(node):
+///     node_names = []
+///     if node.parentNode and node.parentNode.parentNode:
+///         node_names.extend(_gather_parent_node_list(node.parentNode))
+///     node_names.extend([node.nodeName])
+///     return node_names
+/// ```
+///
+/// For a node at path document -> root -> child -> grandchild:
+/// - grandchild: parent=child, parent.parent=root (exists) -> recurse to child
+///   - child: parent=root, parent.parent=document (exists) -> recurse to root
+///     - root: parent=document, parent.parent=None -> STOP, return ["root"]
+///   - child returns ["root", "child"]
+/// - grandchild returns ["root", "child", "grandchild"]
+///
+/// `xpath_from_xml_node` then returns "/".join(names[1:]) = "child/grandchild".
+/// The document node never appears in the list, and the slice drops the root
+/// element name, so the xpath runs from the root's children down to the node.
+///
+/// `ancestor_names` holds the element names from the root element down to the
+/// node's parent. It excludes both the document node and the node itself,
+/// whose name is passed separately as `node_name`.
+///
+/// Example for grandchild:
+///   ancestor_names = ["root", "child"], node_name = "grandchild"
+///   full path = ["root", "child", "grandchild"], xpath = "child/grandchild"
+///
+/// In short: append `node_name`, skip the first element (root), join the rest.
+pub fn compute_xpath(ancestor_names: &[String], node_name: &str) -> String {
+    // ancestor_names[0] is the root element name.
+    // We want: ancestor_names[1..] joined with "/" then "/" then node_name
+    let mut parts: Vec<&str> = ancestor_names.iter().skip(1).map(|s| s.as_str()).collect();
+    parts.push(node_name);
+    parts.join("/")
+}
+
+// ---------------------------------------------------------------------------
+// _xml_node_to_dict equivalent
+// ---------------------------------------------------------------------------
+
+/// Convert a DomNode (element) into our Value tree.
+/// `repeats` is the set of xpaths that should be treated as list-type.
+/// `encrypted` when true forces "media" child elements to be list-type.
+/// `ancestor_names` tracks the path for xpath computation.
+fn node_to_dict(
+    node: &DomNode,
+    repeats: &HashSet<String>,
+    encrypted: bool,
+    ancestor_names: &[String],
+) -> Option<Value> {
+    match node {
+        DomNode::Text(_) | DomNode::CData(_) => {
+            // Leaf nodes handled by parent
+            None
+        }
+        DomNode::Element {
+            name,
+            children,
+            ..
+        } => {
+            // If node has 0 children -> None
+            if children.is_empty() {
+                return None;
+            }
+
+            // If node has 1 child that is Text -> {nodeName: textValue}
+            if children.len() == 1 {
+                match &children[0] {
+                    DomNode::Text(text) => {
+                        return Some(Value::Dict(vec![(name.clone(), Value::Str(text.clone()))]));
+                    }
+                    DomNode::CData(text) => {
+                        // CDATA section -> {parentNodeName: cdataValue}
+                        return Some(Value::Dict(vec![(name.clone(), Value::Str(text.clone()))]));
+                    }
+                    _ => {}
+                }
+            }
+
+            // Check for CDATA among children (Python checks this in the loop)
+            for child in children {
+                if let DomNode::CData(text) = child {
+                    return Some(Value::Dict(vec![(name.clone(), Value::Str(text.clone()))]));
+                }
+            }
+
+            // This is an internal node - iterate children
+            let mut value: Vec<(String, Value)> = Vec::new();
+            let mut current_path = ancestor_names.to_vec();
+            current_path.push(name.clone());
+
+            for child in children {
+                match child {
+                    DomNode::Text(_) => {
+                        // Text nodes among element siblings are ignored
+                        // (Python: the loop only processes element children
+                        //  via _xml_node_to_dict which returns None for text)
+                        continue;
+                    }
+                    DomNode::CData(text) => {
+                        // CDATA found during iteration (Python line 200-201); the scan above already returns for it
+                        return Some(Value::Dict(vec![(name.clone(), Value::Str(text.clone()))]));
+                    }
+                    DomNode::Element {
+                        name: child_name, ..
+                    } => {
+                        let child_dict =
+                            node_to_dict(child, repeats, encrypted, &current_path);
+
+                        if child_dict.is_none() {
+                            continue;
+                        }
+
+                        let child_dict = child_dict.unwrap();
+
+                        // Extract the child's value from the wrapper dict
+                        let child_value = match &child_dict {
+                            Value::Dict(pairs) => {
+                                if pairs.len() == 1 && pairs[0].0 == *child_name {
+                                    pairs[0].1.clone()
+                                } else {
+                                    // This shouldn't happen per Python assertion
+                                    child_dict.clone()
+                                }
+                            }
+                            _ => child_dict.clone(),
+                        };
+
+                        let child_xpath = compute_xpath(&current_path, child_name);
+
+                        let is_list_type = repeats.contains(&child_xpath)
+                            || (encrypted && child_name == "media");
+
+                        // Find if child_name already exists in value
+                        let existing_idx =
+                            value.iter().position(|(k, _)| k == child_name);
+
+                        if is_list_type {
+                            // List type: always append to list
+                            if let Some(idx) = existing_idx {
+                                match &mut value[idx].1 {
+                                    Value::List(list) => {
+                                        list.push(child_value);
+                                    }
+                                    _ => {
+                                        // Shouldn't happen since we always init as list
+                                    }
+                                }
+                            } else {
+                                value.push((
+                                    child_name.clone(),
+                                    Value::List(vec![child_value]),
+                                ));
+                            }
+                        } else {
+                            // Dict type
+                            if let Some(idx) = existing_idx {
+                                // Node is repeated, aggregate
+                                let existing = &mut value[idx].1;
+                                match existing {
+                                    Value::List(list) => {
+                                        // Already a list, just append
+                                        list.push(child_value);
+                                    }
+                                    _ => {
+                                        // Convert to list
+                                        let prev = existing.clone();
+                                        *existing = Value::List(vec![prev, child_value]);
+                                    }
+                                }
+                            } else {
+                                value.push((child_name.clone(), child_value));
+                            }
+                        }
+                    }
+                }
+            }
+
+            if value.is_empty() {
+                return None;
+            }
+
+            Some(Value::Dict(vec![(name.clone(), Value::Dict(value))]))
+        }
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Attribute collection (matching Python's _get_all_attributes + _set_attributes)
+// ---------------------------------------------------------------------------
+
+/// Recursively collect all attributes from an element tree.
+fn collect_attributes(node: &DomNode, out: &mut Vec<XmlAttribute>) {
+    if let DomNode::Element {
+        name,
+        attrs,
+        children,
+    } = node
+    {
+        for (key, val) in attrs {
+            out.push(XmlAttribute {
+                key: key.clone(),
+                value: val.clone(),
+                node_name: name.clone(),
+            });
+        }
+        for child in children {
+            collect_attributes(child, out);
+        }
+    }
+}
+
+/// Apply Python's _set_attributes logic: skip entity nodes, first-wins for duplicates.
+fn build_attributes(raw: &[XmlAttribute]) -> Vec<(String, String)> {
+    let mut result: Vec<(String, String)> = Vec::new();
+    let mut seen: HashSet<String> = HashSet::new();
+    for attr in raw {
+        if attr.node_name == "entity" {
+            continue;
+        }
+        if seen.contains(&attr.key) {
+            // Duplicate - skip (first wins)
+            continue;
+        }
+        seen.insert(attr.key.clone());
+        result.push((attr.key.clone(), attr.value.clone()));
+    }
+    result
+}
+
+// ---------------------------------------------------------------------------
+// UUID extraction
+// ---------------------------------------------------------------------------
+
+/// Extract UUID from meta/instanceID or orx:meta/orx:instanceID.
+/// Also checks root element's instanceID attribute.
+fn extract_uuid(root: &DomNode, attributes: &[(String, String)]) -> Option<String> {
+    // First try meta/instanceID in the XML tree
+    if let Some(uuid) = extract_meta_value(root, "instanceID") {
+        return strip_uuid_prefix(&uuid);
+    }
+
+    // Then check root element's instanceID attribute
+    for (key, value) in attributes {
+        if key == "instanceID" {
+            return strip_uuid_prefix(value);
+        }
+    }
+
+    None
+}
+
+/// Extract deprecated UUID from meta/deprecatedID or orx:meta/orx:deprecatedID.
+fn extract_deprecated_uuid(root: &DomNode) -> Option<String> {
+    if let Some(uuid) = extract_meta_value(root, "deprecatedID") {
+        return strip_uuid_prefix(&uuid);
+    }
+    None
+}
+
+/// Extract a value from meta/<tag_name> or orx:meta/orx:<tag_name>.
+fn extract_meta_value(root: &DomNode, tag_name: &str) -> Option<String> {
+    if let DomNode::Element { children, .. } = root {
+        for child in children {
+            if let DomNode::Element {
+                name, children: meta_children, ..
+            } = child
+            {
+                let name_lower = name.to_lowercase();
+                if name_lower == "meta" || name_lower == "orx:meta" {
+                    for meta_child in meta_children {
+                        if let DomNode::Element {
+                            name: child_name,
+                            children: value_children,
+                            ..
+                        } = meta_child
+                        {
+                            let child_name_lower = child_name.to_lowercase();
+                            if child_name_lower == tag_name.to_lowercase()
+                                || child_name_lower
+                                    == format!("orx:{}", tag_name.to_lowercase())
+                            {
+                                // Get text content
+                                if let Some(text) = get_text_content(value_children) {
+                                    return Some(text.trim().to_string());
+                                }
+                            }
+                        }
+                    }
+                }
+            }
+        }
+    }
+    None
+}
+
+/// Get the text content of a node's children.
+fn get_text_content(children: &[DomNode]) -> Option<String> {
+    for child in children {
+        match child {
+            DomNode::Text(text) => return Some(text.clone()),
+            DomNode::CData(text) => return Some(text.clone()),
+            _ => {}
+        }
+    }
+    None
+}
+
+/// Strip "uuid:" prefix from a UUID string.
+fn strip_uuid_prefix(s: &str) -> Option<String> {
+    if let Some(rest) = s.strip_prefix("uuid:") {
+        if rest.is_empty() {
+            None
+        } else {
+            Some(rest.to_string())
+        }
+    } else if !s.is_empty() {
+        // Return as-is if no uuid: prefix but non-empty
+        Some(s.to_string())
+    } else {
+        None
+    }
+}
+
+/// Extract the first non-empty `submissionDate` value from a list of attributes.
+fn extract_submission_date(attributes: &[(String, String)]) -> Option<String> {
+    for (key, value) in attributes {
+        // Skip empty values so a blank attribute is not mistaken for a
+        // real submission date.
+        if key == "submissionDate" && !value.is_empty() {
+            return Some(value.clone());
+        }
+    }
+    None
+}
+
+// ---------------------------------------------------------------------------
+// Public parse entry point
+// ---------------------------------------------------------------------------
+
+/// Parse an XML submission string.
+///
+/// This is the main entry point that performs:
+/// 1. Clean XML (strip whitespace between tags)
+/// 2. Build DOM tree
+/// 3. Convert root element to nested dict (skipping #document wrapper)
+/// 4. Collect attributes (entity-skip, first-wins)
+/// 5. Extract UUID, deprecated UUID, submission date
+pub fn parse_xml(
+    xml_str: &str,
+    repeat_xpaths: &[String],
+    encrypted: bool,
+) -> Result<ParseResult, String> {
+    let cleaned = clean_xml(xml_str);
+    let dom = build_dom(cleaned.as_bytes())?;
+
+    // Get the root element (first element child of #document)
+    let root_element = match &dom {
+        DomNode::Element { children, .. } => children
+            .iter()
+            .find(|c| matches!(c, DomNode::Element { .. }))
+            .ok_or("No root element found")?,
+        _ => return Err("Expected document node".to_string()),
+    };
+
+    let root_name = match root_element {
+        DomNode::Element { name, .. } => name.clone(),
+        _ => unreachable!(),
+    };
+
+    // Build repeat xpath set
+    let repeats: HashSet<String> = repeat_xpaths.iter().cloned().collect();
+
+    // Convert root element to dict
+    // ancestor_names is empty because we start at root (no ancestors above it)
+    let dict = node_to_dict(root_element, &repeats, encrypted, &[]);
+
+    // Collect attributes from root element (not #document)
+    let mut raw_attrs = Vec::new();
+    collect_attributes(root_element, &mut raw_attrs);
+    let attributes = build_attributes(&raw_attrs);
+
+    // Extract UUID and deprecated UUID
+    let uuid = extract_uuid(root_element, &attributes);
+    let deprecated_uuid = extract_deprecated_uuid(root_element);
+
+    // Extract submission date from the collected attributes
+    let submission_date = extract_submission_date(&attributes);
+
+    Ok(ParseResult {
+        dict,
+        root_node_name: root_name,
+        attributes,
+        uuid,
+        deprecated_uuid,
+        submission_date,
+    })
+}
+
+// ---------------------------------------------------------------------------
+// Tests
+// ---------------------------------------------------------------------------
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_clean_xml() {
+        let input = "  <?xml version='1.0' ?><root>  \n  <child>text</child>  \n  </root>  ";
+        let cleaned = clean_xml(input);
+        assert_eq!(
+            cleaned,
+            "<?xml version='1.0' ?><root><child>text</child></root>"
+        );
+    }
+
+    #[test]
+    fn test_clean_xml_preserves_inner_text() {
+        let input = "<root><name>Larry\n        Again\n  </name></root>";
+        let cleaned = clean_xml(input);
+        // Text inside a single element should be preserved
+        assert_eq!(cleaned, "<root><name>Larry\n        Again\n  </name></root>");
+    }
+
+    #[test]
+    fn test_simple_form() {
+        let xml = r#"<?xml version='1.0' ?><tutorial id="tutorial">
+  <name>Larry
+        Again
+  </name>
+  <age>23</age>
+  <picture>1333604907194.jpg</picture>
+  <has_children>0</has_children>
+  <gps>-1.2836198 36.8795437 0.0 1044.0</gps>
+  <web_browsers>firefox chrome safari</web_browsers>
+  <meta>
+    <instanceID>uuid:729f173c688e482486a48661700455ff</instanceID>
+  </meta>
+</tutorial>"#;
+
+        let result = parse_xml(xml, &[], false).unwrap();
+
+        assert_eq!(result.root_node_name, "tutorial");
+        assert_eq!(
+            result.uuid,
+            Some("729f173c688e482486a48661700455ff".to_string())
+        );
+        assert_eq!(result.deprecated_uuid, None);
+        assert_eq!(result.submission_date, None);
+
+        // Check attributes
+        assert_eq!(result.attributes, vec![("id".to_string(), "tutorial".to_string())]);
+
+        // Check dict structure
+        let dict = result.dict.unwrap();
+        match &dict {
+            Value::Dict(pairs) => {
+                assert_eq!(pairs.len(), 1);
+                assert_eq!(pairs[0].0, "tutorial");
+                match &pairs[0].1 {
+                    Value::Dict(inner) => {
+                        // Check name preserves whitespace
+                        let name_val = inner.iter().find(|(k, _)| k == "name").unwrap();
+                        match &name_val.1 {
+                            Value::Str(s) => {
+                                assert_eq!(s, "Larry\n        Again\n  ");
+                            }
+                            _ => panic!("Expected Str for name"),
+                        }
+
+                        // Check age
+                        let age_val = inner.iter().find(|(k, _)| k == "age").unwrap();
+                        assert_eq!(age_val.1, Value::Str("23".to_string()));
+
+                        // Check meta/instanceID
+                        let meta_val = inner.iter().find(|(k, _)| k == "meta").unwrap();
+                        match &meta_val.1 {
+                            Value::Dict(meta_inner) => {
+                                assert_eq!(meta_inner.len(), 1);
+                                assert_eq!(meta_inner[0].0, "instanceID");
+                                assert_eq!(
+                                    meta_inner[0].1,
+                                    Value::Str(
+                                        "uuid:729f173c688e482486a48661700455ff".to_string()
+                                    )
+                                );
+                            }
+                            _ => panic!("Expected Dict for meta"),
+                        }
+                    }
+                    _ => panic!("Expected Dict for tutorial"),
+                }
+            }
+            _ => panic!("Expected Dict"),
+        }
+    }
+
+    #[test]
+    fn test_nested_repeats() {
+        let xml = r#"<new_repeats id="new_repeats">
+  <info><age>80</age><name>Adam</name></info>
+  <kids><kids_details><kids_age>50</kids_age><kids_name>Abel</kids_name></kids_details><has_kids>1</has_kids></kids>
+  <web_browsers>chrome ie</web_browsers>
+  <gps>-1.2627557 36.7926442 0.0 30.0</gps>
+</new_repeats>"#;
+
+        let repeats = vec!["kids/kids_details".to_string()];
+        let result = parse_xml(xml, &repeats, false).unwrap();
+
+        let dict = result.dict.unwrap();
+        // dict = {"new_repeats": {"info": ..., "kids": ..., ...}}
+        match &dict {
+            Value::Dict(pairs) => {
+                assert_eq!(pairs[0].0, "new_repeats");
+                let inner = match &pairs[0].1 {
+                    Value::Dict(d) => d,
+                    _ => panic!("Expected Dict"),
+                };
+
+                // Check kids/kids_details is a list
+                let kids = inner.iter().find(|(k, _)| k == "kids").unwrap();
+                match &kids.1 {
+                    Value::Dict(kids_inner) => {
+                        let kids_details =
+                            kids_inner.iter().find(|(k, _)| k == "kids_details").unwrap();
+                        match &kids_details.1 {
+                            Value::List(list) => {
+                                assert_eq!(list.len(), 1);
+                                // The single item should be a dict with kids_age and kids_name
+                                match &list[0] {
+                                    Value::Dict(d) => {
+                                        assert!(d.iter().any(|(k, _)| k == "kids_age"));
+                                        assert!(d.iter().any(|(k, _)| k == "kids_name"));
+                                    }
+                                    _ => panic!("Expected Dict in list"),
+                                }
+                            }
+                            _ => panic!("Expected List for kids_details"),
+                        }
+                    }
+                    _ => panic!("Expected Dict for kids"),
+                }
+            }
+            _ => panic!("Expected Dict"),
+        }
+    }
+
+    #[test]
+    fn test_encrypted_media() {
+        let xml = r#"<data id="tutorial_encrypted" version="201701031234" encrypted="yes" xmlns="http://www.opendatakit.org/xforms/encrypted"><base64EncryptedKey>ZJTc</base64EncryptedKey><orx:meta xmlns:orx="http://openrosa.org/xforms"><orx:instanceID>uuid:f8971231-f3b8-4b2b-8c35-d95fa207d937</orx:instanceID></orx:meta>
+<media><file>1483528430996.jpg.enc</file></media>
+<media><file>1483528445767.jpg.enc</file></media>
+<encryptedXmlFile>submission.xml.enc</encryptedXmlFile><base64EncryptedElementSignature>UUR8</base64EncryptedElementSignature></data>"#;
+
+        let result = parse_xml(xml, &[], true).unwrap();
+
+        assert_eq!(
+            result.uuid,
+            Some("f8971231-f3b8-4b2b-8c35-d95fa207d937".to_string())
+        );
+
+        let dict = result.dict.unwrap();
+        match &dict {
+            Value::Dict(pairs) => {
+                assert_eq!(pairs[0].0, "data");
+                let inner = match &pairs[0].1 {
+                    Value::Dict(d) => d,
+                    _ => panic!("Expected Dict"),
+                };
+
+                // media should be a list with 2 items
+                let media = inner.iter().find(|(k, _)| k == "media").unwrap();
+                match &media.1 {
+                    Value::List(list) => {
+                        assert_eq!(list.len(), 2);
+                        // First item
+                        match &list[0] {
+                            Value::Dict(d) => {
+                                assert_eq!(d[0].0, "file");
+                                assert_eq!(
+                                    d[0].1,
+                                    Value::Str("1483528430996.jpg.enc".to_string())
+                                );
+                            }
+                            _ => panic!("Expected Dict in media list"),
+                        }
+                        // Second item
+                        match &list[1] {
+                            Value::Dict(d) => {
+                                assert_eq!(d[0].0, "file");
+                                assert_eq!(
+                                    d[0].1,
+                                    Value::Str("1483528445767.jpg.enc".to_string())
+                                );
+                            }
+                            _ => panic!("Expected Dict in media list"),
+                        }
+                    }
+                    _ => panic!("Expected List for media"),
+                }
+            }
+            _ => panic!("Expected Dict"),
+        }
+    }
+
+    #[test]
+    fn test_repeated_nodes_auto_list() {
+        // S2A appears 3 times without being in repeat_xpaths.
+        // Python auto-converts to list on second occurrence.
+        let xml = r#"<RW_OUNIS_2016 id="ROUNIS2" version="201608211141">
+<S2A><S2A_note/><S2_1_3_2_2>1</S2_1_3_2_2><S2_1_3_2_3>1.25</S2_1_3_2_3></S2A>
+<S2A><S2A_note/><S2_1_3_3_2>1</S2_1_3_3_2><S2_1_3_3_3>1.25</S2_1_3_3_3></S2A>
+<S2A><S2A_note/><S2_1_3_5_2>1</S2_1_3_5_2><S2_1_3_5_3><S3B><S3_1_3_4>2</S3_1_3_4><S3_1_3_4>test</S3_1_3_4></S3B><S3B><S3_1_3_5>8</S3_1_3_5><S3_1_3_6>test2</S3_1_3_6></S3B><S3B><S3_1_3_7>5</S3_1_3_7><S3_1_3_8>test</S3_1_3_8></S3B></S2_1_3_5_3></S2A>
+</RW_OUNIS_2016>"#;
+
+        let result = parse_xml(xml, &[], false).unwrap();
+        let dict = result.dict.unwrap();
+
+        match &dict {
+            Value::Dict(pairs) => {
+                assert_eq!(pairs[0].0, "RW_OUNIS_2016");
+                let inner = match &pairs[0].1 {
+                    Value::Dict(d) => d,
+                    _ => panic!("Expected Dict"),
+                };
+
+                // S2A should be a list of 3 dicts
+                let s2a = inner.iter().find(|(k, _)| k == "S2A").unwrap();
+                match &s2a.1 {
+                    Value::List(list) => {
+                        assert_eq!(list.len(), 3);
+
+                        // First S2A: {S2_1_3_2_2: "1", S2_1_3_2_3: "1.25"}
+                        // (S2A_note is empty/self-closing, so skipped)
+                        match &list[0] {
+                            Value::Dict(d) => {
+                                assert!(d.iter().any(|(k, v)| k == "S2_1_3_2_2"
+                                    && *v == Value::Str("1".to_string())));
+                                assert!(d.iter().any(|(k, v)| k == "S2_1_3_2_3"
+                                    && *v == Value::Str("1.25".to_string())));
+                            }
+                            _ => panic!("Expected Dict in S2A list"),
+                        }
+
+                        // Third S2A has nested S2_1_3_5_3 with S3B repeats
+                        match &list[2] {
+                            Value::Dict(d) => {
+                                let s2_1_3_5_3 =
+                                    d.iter().find(|(k, _)| k == "S2_1_3_5_3").unwrap();
+                                match &s2_1_3_5_3.1 {
+                                    Value::Dict(inner_d) => {
+                                        let s3b =
+                                            inner_d.iter().find(|(k, _)| k == "S3B").unwrap();
+                                        match &s3b.1 {
+                                            Value::List(s3b_list) => {
+                                                assert_eq!(s3b_list.len(), 3);
+                                                // First S3B has S3_1_3_4 appearing twice -> list ["2", "test"]
+                                                match &s3b_list[0] {
+                                                    Value::Dict(d) => {
+                                                        let field = d
+                                                            .iter()
+                                                            .find(|(k, _)| k == "S3_1_3_4")
+                                                            .unwrap();
+                                                        match &field.1 {
+                                                            Value::List(vals) => {
+                                                                assert_eq!(vals.len(), 2);
+                                                                assert_eq!(
+                                                                    vals[0],
+                                                                    Value::Str("2".to_string())
+                                                                );
+                                                                assert_eq!(
+                                                                    vals[1],
+                                                                    Value::Str(
+                                                                        "test".to_string()
+                                                                    )
+                                                                );
+                                                            }
+                                                            _ => panic!(
+                                                                "Expected List for S3_1_3_4"
+                                                            ),
+                                                        }
+                                                    }
+                                                    _ => panic!("Expected Dict in S3B list"),
+                                                }
+                                            }
+                                            _ => panic!("Expected List for S3B"),
+                                        }
+                                    }
+                                    _ => panic!("Expected Dict for S2_1_3_5_3"),
+                                }
+                            }
+                            _ => panic!("Expected Dict in S2A list"),
+                        }
+                    }
+                    _ => panic!("Expected List for S2A, got {:?}", s2a.1),
+                }
+            }
+            _ => panic!("Expected Dict"),
+        }
+    }
+
+    #[test]
+    fn test_self_closing_tag_skipped() {
+        let xml = "<root><note/><name>test</name></root>";
+        let result = parse_xml(xml, &[], false).unwrap();
+        let dict = result.dict.unwrap();
+        match &dict {
+            Value::Dict(pairs) => {
+                let inner = match &pairs[0].1 {
+                    Value::Dict(d) => d,
+                    _ => panic!("Expected Dict"),
+                };
+                // note should be skipped
+                assert!(!inner.iter().any(|(k, _)| k == "note"));
+                // name should be present
+                assert!(inner.iter().any(|(k, _)| k == "name"));
+            }
+            _ => panic!("Expected Dict"),
+        }
+    }
+
+    #[test]
+    fn test_entity_attributes_skipped() {
+        let xml = r#"<data id="form1"><entity id="ent1" dataset="people"><label>test</label></entity><name>test</name></data>"#;
+        let result = parse_xml(xml, &[], false).unwrap();
+        // "id" from data should be present, but "id" and "dataset" from entity should be skipped
+        assert_eq!(
+            result.attributes,
+            vec![("id".to_string(), "form1".to_string())]
+        );
+    }
+
+    #[test]
+    fn test_submission_date_extraction() {
+        let xml = r#"<data id="form1" submissionDate="2023-01-15T10:30:00.000Z"><name>test</name></data>"#;
+        let result = parse_xml(xml, &[], false).unwrap();
+        assert_eq!(
+            result.submission_date,
+            Some("2023-01-15T10:30:00.000Z".to_string())
+        );
+    }
+
+    #[test]
+    fn test_deprecated_uuid() {
+        let xml = r#"<data id="form1"><meta><instanceID>uuid:new-uuid</instanceID><deprecatedID>uuid:old-uuid</deprecatedID></meta><name>test</name></data>"#;
+        let result = parse_xml(xml, &[], false).unwrap();
+        assert_eq!(result.uuid, Some("new-uuid".to_string()));
+        assert_eq!(result.deprecated_uuid, Some("old-uuid".to_string()));
+    }
+
+    #[test]
+    fn test_orx_namespace_uuid() {
+        let xml = r#"<data id="test" xmlns:orx="http://openrosa.org/xforms"><orx:meta><orx:instanceID>uuid:f8971231-f3b8-4b2b-8c35-d95fa207d937</orx:instanceID></orx:meta><name>test</name></data>"#;
+        let result = parse_xml(xml, &[], false).unwrap();
+        assert_eq!(
+            result.uuid,
+            Some("f8971231-f3b8-4b2b-8c35-d95fa207d937".to_string())
+        );
+    }
+
+    #[test]
+    fn test_empty_root() {
+        let xml = "<root/>";
+        let result = parse_xml(xml, &[], false).unwrap();
+        assert!(result.dict.is_none());
+        assert_eq!(result.root_node_name, "root");
+    }
+
+    #[test]
+    fn test_xpath_computation() {
+        // For a child "age" under root "tutorial", xpath should be "age"
+        assert_eq!(compute_xpath(&["tutorial".to_string()], "age"), "age");
+
+        // For grandchild "instanceID" under root "tutorial" > "meta"
+        assert_eq!(
+            compute_xpath(
+                &["tutorial".to_string(), "meta".to_string()],
+                "instanceID"
+            ),
+            "meta/instanceID"
+        );
+
+        // For deeply nested
+        assert_eq!(
+            compute_xpath(
+                &["root".to_string(), "a".to_string(), "b".to_string()],
+                "c"
+            ),
+            "a/b/c"
+        );
+    }
+
+    #[test]
+    fn test_xmlns_attributes_included() {
+        // xmlns attributes should be included (they are regular attributes to quick-xml)
+        let xml = r#"<data id="test" xmlns="http://example.com"><name>v</name></data>"#;
+        let result = parse_xml(xml, &[], false).unwrap();
+        // Should have both 'id' and 'xmlns'
+        assert!(result.attributes.iter().any(|(k, _)| k == "id"));
+        assert!(result.attributes.iter().any(|(k, _)| k == "xmlns"));
+    }
+}