Validate NX_class against known NeXus class names#68
Conversation
Catches typos like NXPositioner vs NXpositioner that previously slipped through unchecked (NX_class_attr_missing only verified presence, and downstream validators keying off specific names silently did not apply). The allow-list contains all NeXus base classes (pulled from nexusformat/definitions) plus NXcanSAS. Application/contributed definitions are intentionally not included: a raw ESS file declaring, say, NXapm_paraprobe_clusterer_results is almost certainly a mistake and should fail validation. Extend nexus_application_classes when a legitimate new application class is needed. The base class list lives in its own module with the upstream URL and a gh api refresh command in the docstring.
jl-wynen
left a comment
There was a problem hiding this comment.
The code looks good. Just comments about the data file.
| Refresh with: | ||
|
|
||
| gh api repos/nexusformat/definitions/contents/base_classes \ | ||
| --jq '.[].name' | grep '\\.nxdl\\.xml$' | sed 's/\\.nxdl\\.xml$//' | sort |
There was a problem hiding this comment.
This outputs nothing for me. Is your grep an alias for something else? I need
grep '.nxdl.xml$' | sed 's/.nxdl.xml$//'
There was a problem hiding this comment.
Probably some quoting issue, replacing with a jq only command.
There was a problem hiding this comment.
Ah, right. Single backslashes work for me, too. So is this an issue with Python strings? You could simply use a raw string.
| Refresh with: | ||
|
|
||
| gh api repos/nexusformat/definitions/contents/base_classes \ | ||
| --jq '.[].name' | grep '\\.nxdl\\.xml$' | sed 's/\\.nxdl\\.xml$//' | sort |
There was a problem hiding this comment.
Since this command does not generate the complete Python module file here, how about you pipe the result into a text file and load that from Python? That would simplify updating the list.
There was a problem hiding this comment.
We'd need to ship the text file in the wheel etc., is it worth it? Alternative: add a script that writes the file.
There was a problem hiding this comment.
We ship text files in other packages, too. It's pretty simple. But a separate script in a tools folder would work, too.
The previous refresh command relied on `\.` in a single-quoted regex, which several shell/grep combinations interpret differently or which copy-paste from the rendered docstring mangles. Doing everything in jq removes the shell-escaping hazard.
The 142-entry list is now regenerated by tools/refresh_nexus_base_classes.py which fetches the directory listing via `gh api` and rewrites the Python module in place. Avoids hand-editing the literal on every upstream update. Per reviewer suggestion; chose a script over a separate .txt data file to avoid runtime resource-loading machinery for a static list.
| ) | ||
|
|
||
| HEADER = '''# SPDX-License-Identifier: BSD-3-Clause | ||
| # Copyright (c) 2023 Scipp contributors (https://github.com/scipp) |
There was a problem hiding this comment.
Are we sticking with 2023 now? I've noticed this in a couple of your files but wasn't sure if it's on purpose.
There was a problem hiding this comment.
I think is a while since I looked at headers, just copy and paste from other file I suppose :D
Co-authored-by: Simon Heybrock <12912489+SimonHeybrock@users.noreply.github.com>
Summary
Adds a
NX_class_invalidvalidator that errors when a group'sNX_classattribute is not a known NeXus class name. Catches typos such asNXPositionervsNXpositioner, which previously slipped through entirely:NX_class_attr_missingonly checked presence, and downstream validators keying off specific names (e.g.chopper_frequency_units_invalid) silently did not apply.The allow-list is the full set of NeXus base classes pulled from
nexusformat/definitionsplusNXcanSAS. Application and contributed definitions are intentionally not included: a raw ESS file declaring e.g.NXapm_paraprobe_clusterer_resultsis almost certainly a mistake and should fail validation. Extendnexus_application_classeswhen a legitimate new application class is needed.The base class list lives in its own module (
src/chexus/nexus_base_classes.py) with the upstream URL and agh apirefresh command in the docstring.Deprecated base classes (
NXgeometry,NXorientation,NXshape,NXtranslation) remain in the allow-list so they continue to be reported only by the dedicatedNX_class_is_legacyvalidator, not flagged twice.Test plan
NX_class-attr applies-to case