Skip to content

Validate NX_class against known NeXus class names#68

Open
SimonHeybrock wants to merge 5 commits into
mainfrom
validate-nx-class-name
Open

Validate NX_class against known NeXus class names#68
SimonHeybrock wants to merge 5 commits into
mainfrom
validate-nx-class-name

Conversation

@SimonHeybrock
Copy link
Copy Markdown
Member

Summary

Adds a NX_class_invalid validator that errors when a group's NX_class attribute is not a known NeXus class name. Catches typos such as NXPositioner vs NXpositioner, which previously slipped through entirely: NX_class_attr_missing only checked presence, and downstream validators keying off specific names (e.g. chopper_frequency_units_invalid) silently did not apply.

The allow-list is the full set of NeXus base classes pulled from nexusformat/definitions plus NXcanSAS. Application and contributed definitions are intentionally not included: a raw ESS file declaring e.g. NXapm_paraprobe_clusterer_results is almost certainly a mistake and should fail validation. Extend nexus_application_classes when a legitimate new application class is needed.

The base class list lives in its own module (src/chexus/nexus_base_classes.py) with the upstream URL and a gh api refresh command in the docstring.

Deprecated base classes (NXgeometry, NXorientation, NXshape, NXtranslation) remain in the allow-list so they continue to be reported only by the dedicated NX_class_is_legacy validator, not flagged twice.

Test plan

  • Existing test suite passes (72 tests)
  • New tests cover the motivating typo, other unknown names, the known-good NXcanSAS, and the no-NX_class-attr applies-to case

Catches typos like NXPositioner vs NXpositioner that previously slipped
through unchecked (NX_class_attr_missing only verified presence, and
downstream validators keying off specific names silently did not apply).

The allow-list contains all NeXus base classes (pulled from
nexusformat/definitions) plus NXcanSAS. Application/contributed
definitions are intentionally not included: a raw ESS file declaring,
say, NXapm_paraprobe_clusterer_results is almost certainly a mistake
and should fail validation. Extend nexus_application_classes when a
legitimate new application class is needed.

The base class list lives in its own module with the upstream URL and a
gh api refresh command in the docstring.
@SimonHeybrock SimonHeybrock marked this pull request as ready for review May 20, 2026 05:23
Copy link
Copy Markdown
Member

@jl-wynen jl-wynen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good. Just comments about the data file.

Comment thread src/chexus/nexus_base_classes.py Outdated
Refresh with:

gh api repos/nexusformat/definitions/contents/base_classes \
--jq '.[].name' | grep '\\.nxdl\\.xml$' | sed 's/\\.nxdl\\.xml$//' | sort
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This outputs nothing for me. Is your grep an alias for something else? I need

grep '.nxdl.xml$' | sed 's/.nxdl.xml$//'

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably some quoting issue, replacing with a jq only command.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Single backslashes work for me, too. So is this an issue with Python strings? You could simply use a raw string.

Comment thread src/chexus/nexus_base_classes.py Outdated
Refresh with:

gh api repos/nexusformat/definitions/contents/base_classes \
--jq '.[].name' | grep '\\.nxdl\\.xml$' | sed 's/\\.nxdl\\.xml$//' | sort
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this command does not generate the complete Python module file here, how about you pipe the result into a text file and load that from Python? That would simplify updating the list.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need to ship the text file in the wheel etc., is it worth it? Alternative: add a script that writes the file.

Copy link
Copy Markdown
Member

@jl-wynen jl-wynen May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We ship text files in other packages, too. It's pretty simple. But a separate script in a tools folder would work, too.

SimonHeybrock and others added 3 commits May 20, 2026 08:27
The previous refresh command relied on `\.` in a single-quoted regex,
which several shell/grep combinations interpret differently or which
copy-paste from the rendered docstring mangles. Doing everything in jq
removes the shell-escaping hazard.
The 142-entry list is now regenerated by tools/refresh_nexus_base_classes.py
which fetches the directory listing via `gh api` and rewrites the Python
module in place. Avoids hand-editing the literal on every upstream update.

Per reviewer suggestion; chose a script over a separate .txt data file to
avoid runtime resource-loading machinery for a static list.
Comment thread tools/refresh_nexus_base_classes.py Outdated
)

HEADER = '''# SPDX-License-Identifier: BSD-3-Clause
# Copyright (c) 2023 Scipp contributors (https://github.com/scipp)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sticking with 2023 now? I've noticed this in a couple of your files but wasn't sure if it's on purpose.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is a while since I looked at headers, just copy and paste from other file I suppose :D

Comment thread src/chexus/nexus_base_classes.py Outdated
Comment thread src/chexus/validators.py Outdated
Comment thread tools/refresh_nexus_base_classes.py Outdated
Comment thread tools/refresh_nexus_base_classes.py Outdated
Co-authored-by: Simon Heybrock <12912489+SimonHeybrock@users.noreply.github.com>
@SimonHeybrock SimonHeybrock enabled auto-merge May 20, 2026 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants