[BUG] - Fix three stale-cache bugs after swarm particle addition#216
Open
bknight1 wants to merge 2 commits into
Open
[BUG] - Fix three stale-cache bugs after swarm particle addition#216bknight1 wants to merge 2 commits into
bknight1 wants to merge 2 commits into
Conversation
Bug 1 — Stale kd-tree in add_particles_with_coordinates: add_particles_with_coordinates() uses self.dm.migrate() directly, bypassing Swarm.migrate() which calls _invalidate_canonical_data(). The manual invalidation at lines 3574-3577 nils _canonical_data but missed self._kdtree, causing RBF interpolation to use old particle coordinates/neighbors. Bug 2 — Stale cached projector in _project_to_work_variable: Cached Projection solvers on the mesh object were reused across evaluate() calls without _force_setup=True. After a Stokes solve modifies the DM, the cached projector's PETSc solver state is stale, causing deadlock on PETSc 3.22.2 and silent wrong results on 3.24.2. Bug 3 — Stale proxy mesh variable after swarm write (documented only): _update() marks proxy as stale, but the lazy re-interpolation (_update_proxy_if_stale()) only fires on material.sym access. Code reading the proxy MeshVariable DM directly (e.g. a Projection solver evaluating its uw_function) gets stale data. See GitHub issue #215. Underworld development team with AI support from Claude Code
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes two of three stale-cache bugs that occur after Swarm.add_particles_with_coordinates() is called: a stale kd-tree in the swarm, and a stale cached Projection solver used by uw.function.evaluate() for derivative/L2 expressions. A third bug (stale proxy MeshVariable DM after swarm writes) is documented via a TODO comment but not fixed; a workaround is recommended in the PR description. A new regression/stress test file is added.
Changes:
- Add
self._kdtree = Noneinvalidation inSwarm.add_particles_with_coordinates()so subsequent_get_kdtree()rebuilds against the new coordinates. - Force
_force_setup=Trueon the scalar and tensor cached projectors in_project_to_work_variable(), rebuilding SNES/DM state every call. - Add
tests/test_0112_swarm_add_particles.pycovering add-after-populate, proxy updates, MPI re-add cycles, kd-tree + save patterns, and a Stokes/projection/evaluate/write loop.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/underworld3/swarm.py | Adds kd-tree cache invalidation after the direct dm.migrate() in add_particles_with_coordinates, plus a TODO comment documenting the unfixed stale-proxy case. |
| src/underworld3/function/_function.pyx | Passes _force_setup=True to both cached Projection/MultiComponent_Projection solves to avoid stale solver state. |
| tests/test_0112_swarm_add_particles.py | New regression and stress tests for swarm particle addition, proxy updates, MPI patterns, and a Stokes+projection+evaluate cycle. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+613
to
+615
| # _force_setup=True: rebuild solver state to avoid stale cached | ||
| # projector after Stokes/DM modifications (issue #215, Bug 2). | ||
| projector.solve(zero_init_guess=False, _force_setup=True) |
Comment on lines
3579
to
3584
| # Invalidate cached data — particle count changed after addNPoints + migrate | ||
| self._particle_coordinates._canonical_data = None | ||
| self._kdtree = None # issue #215, Bug 1: stale kd-tree after add_particles_with_coordinates | ||
| for var in self._vars.values(): | ||
| if hasattr(var, "_canonical_data"): | ||
| var._canonical_data = None |
Comment on lines
3577
to
3584
| @@ -3572,6 +3578,7 @@ def add_particles_with_coordinates(self, coordinatesArray) -> int: | |||
|
|
|||
| # Invalidate cached data — particle count changed after addNPoints + migrate | |||
| self._particle_coordinates._canonical_data = None | |||
| self._kdtree = None # issue #215, Bug 1: stale kd-tree after add_particles_with_coordinates | |||
| for var in self._vars.values(): | |||
| if hasattr(var, "_canonical_data"): | |||
| var._canonical_data = None | |||
Comment on lines
+242
to
+244
| with swarm.access(material): | ||
| if n_added > 0: | ||
| material.data[old_size:old_size + n_added, 0] = material_init[re_add_mask, 0] |
Comment on lines
+614
to
+615
| _log = [] | ||
| dbg = uw.mpi.rank == 0 and (lambda s: (_log.append(s), print(s, flush=True))[-1]) or (lambda s: None) |
Replace three manual canonical-data invalidation blocks with calls to _invalidate_canonical_data(), which now also marks swarm variable proxies as stale via _update(). This ensures the proxy mesh variable is re-interpolated after migration, particle addition, and snapshot restore. Affected callers: - Swarm.migrate() — already used _invalidate_canonical_data - add_particles_with_coordinates — replaced manual block - apply_snapshot_payload — replaced manual block - disk_snapshot.py — replaced manual block Underworld development team with AI support from Claude Code
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes three stale-cache bugs that occur when a swarm is modified via
add_particles_with_coordinates()and then used for interpolation or projection. These affect the common pattern of re-adding particles to empty cells each timestep.Fixes #215
Bug 1 — Stale kd-tree in
add_particles_with_coordinatesadd_particles_with_coordinates()callsself.dm.migrate()directly, bypassingSwarm.migrate()which calls_invalidate_canonical_data(). The manual invalidation inadd_particles_with_coordinatesset_canonical_data = Nonefor coordinates and variables but missedself._kdtree. After particle addition,swarm._get_kdtree()returned a kd-tree built from old coordinates.Fix: Added
self._kdtree = Nonein the invalidation block inadd_particles_with_coordinates.Bug 2 — Stale cached projector in
_project_to_work_variable_project_to_work_variable()cachesProjectionsolver instances on the mesh object (_eval_projector_scalarand_eval_{shape}_projector). When reused acrossevaluate()calls after a Stokes solve modifies the DM, the cached projectors PETSc solver state is stale. On PETSc 3.22.2 (Setonix HPC) this causes an MPI deadlock; on PETSc 3.24.2 (macOS) it silently returns stale results.Fix: Added
_force_setup=Trueto both the scalar (line 640) and tensor (line 613)projector.solve()calls.Bug 3 — Stale proxy mesh variable after swarm write (documented, no fix)
The lazy proxy update pattern marks the proxy as stale on data write (
_proxy_stale = True), but the actual re-interpolation (_rbf_to_meshVar) only fires whenmaterial.symis accessed or_update_proxy_if_stale()is called. Code that reads the proxy MeshVariable DM directly (e.g. aProjectionsolver evaluating itsuw_functionat quadrature points) reads stale data.Workaround: Call
material._update_proxy_if_stale()(or accessmaterial.sym) after modifying a swarm variable and before using its proxy mesh variable.Test
Added
tests/test_0112_swarm_add_particles.pywithtest_proxy_updates_after_add_particlesthat verifies the proxy mesh variable correctly reflects swarm changes after adding particles.Platform-dependent behavior
See issue #215 comment for details. PETSc 3.22.2 deadlocks on these bugs; 3.24.2 silently returns wrong values.
Underworld development team with AI support from Claude Code