Skip to content

refactor(data): replace fragile regex in __inc__ and _get_cat_dim with explicit key sets #58

Description

@laserkelvin

Context

During the attribute rename work in #56, we added neighbor_list to the regex pattern in DataMixin.__inc__() (nvalchemi/data/data.py:155):

return self.num_nodes if bool(re.search("(index|face|neighbor_list)", key)) else 0

This method determines which fields receive cumulative node-index offsets when batching. The regex approach is implicit and fragile — it matches any field whose name contains the substring, so a field like neighbor_list_shifts would incorrectly match (flagged by @zubatyuk in PR #56 review).

A similar pattern exists in the Zarr backend's _get_cat_dim() (nvalchemi/data/datapipes/backends/zarr.py:231):

if bool(re.search("(index|face)", key)):
    return -1

In contrast, _get_field_level() in the same file already does this correctly — it checks membership in AtomicData._default_node_keys, _default_edge_keys, and _default_system_keys.

Proposed fix

Replace both regex sites with explicit set membership, as @zubatyuk suggested:

  1. Add an _index_keys ClassVar on AtomicData (or DataMixin) containing the canonical set of fields that require node-index offsets (currently {"neighbor_list"}).
  2. Update __inc__ to use key in self._index_keys instead of regex.
  3. Update _get_cat_dim to use AtomicData._index_keys (or an equivalent edge-key set) instead of regex.

This makes the behavior explicit, self-documenting, and safe against substring collisions.

Affected files

  • nvalchemi/data/data.pyDataMixin.__inc__()
  • nvalchemi/data/datapipes/backends/zarr.py_get_cat_dim()

Origin

Identified during review of PR #56 (attribute rename). Deferred as out of scope for that PR.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions