Skip to content

autotailor: Add validation for variable names, rule IDs, and profile IDs #2342

Open
ggbecker wants to merge 2 commits intoOpenSCAP:mainfrom
ggbecker:autotailor-validate-datastream-fields
Open

autotailor: Add validation for variable names, rule IDs, and profile IDs #2342
ggbecker wants to merge 2 commits intoOpenSCAP:mainfrom
ggbecker:autotailor-validate-datastream-fields

Conversation

@ggbecker
Copy link
Copy Markdown
Member

Add DataStreamValidator class that validates all IDs against the SCAP
datastream before generating tailoring XML. This prevents silent failures
from invalid variable names, rule IDs, group IDs, or profile IDs.

Key features:
- Parses datastream to extract valid profiles, values, rules, and groups
- Validates IDs before use in Profile and Tailoring classes
- Provides fuzzy matching suggestions for typos using difflib
- Generates clear error messages with suggestions
- Add --no-validate flag for performance-critical use cases

Performance:
- ~227ms overhead on 20MB datastream (validation enabled by default)
- ~33ms with --no-validate flag (7x faster)
- Validation prevents compliance drift and silent failures

Fixes issue where autotailor accepted arbitrary variable names without
validation, creating invalid XML that fails at evaluation time.
Add comprehensive unit tests for the new validation feature:

- test_datastream_validator: Tests validator with valid and invalid IDs
  for profiles, values, rules, and groups

- test_profile_with_validator: Tests Profile class integration with
  validator, ensuring invalid IDs are rejected

- test_validator_suggestions: Tests fuzzy matching suggestions for
  typos in ID names

All tests pass and verify that:
- Valid IDs are accepted
- Invalid IDs are rejected with clear error messages
- Similar valid IDs are suggested for typos
- Validation integrates properly with Profile class
@sonarqubecloud
Copy link
Copy Markdown

@jan-cerny jan-cerny self-assigned this Apr 22, 2026
Comment thread utils/autotailor
root = tree.getroot()

# Register namespaces
namespaces = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a global variable. It could be used all over the script.

Comment thread utils/autotailor
self.group_ids = set()
self._parse_datastream()

def _parse_datastream(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split this method to more smaller methods.

Comment thread utils/autotailor
Comment on lines +613 to +614
help="Skip validation of IDs against the datastream. This significantly speeds up "
"execution on large datastreams but may produce invalid tailoring files if incorrect "
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
help="Skip validation of IDs against the datastream. This significantly speeds up "
"execution on large datastreams but may produce invalid tailoring files if incorrect "
help="Skip validation of IDs against the data stream. This significantly speeds up "
"execution on large data streams but may produce invalid tailoring files if incorrect "

Comment thread utils/autotailor
return msg

def validate_profile(self, profile_id):
"""Validate a profile ID exists in the datastream."""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always a "data stream", never a "datastream" or even something worse like "DataStream".

Comment thread utils/autotailor
if args.json_tailoring:
t.import_json_tailoring(args.json_tailoring)
try:
t.import_json_tailoring(args.json_tailoring)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the JSON tailoring also validated?

Comment thread utils/autotailor
help="Use local path for the benchmark href instead of absolute file:// URI. "
"Absolute paths are converted to basename, relative paths are preserved.")
parser.add_argument(
"--no-validate", action="store_true",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this option to the man page.

Comment thread utils/autotailor
if profile_id not in self.profile_ids:
raise ValueError(self._create_validation_error("Profile", profile_id, self.profile_ids))

def validate_value(self, value_id):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be amazing if the tool would also verify validity of selectors used in the -V CLI option.

For example:

utils/autotailor -V xccdf_org.ssgproject.content_value_var_selinux_state=OOO /usr/share/xml/scap/ssg/content/ssg-fedora-ds.xml cis

This command should error because the XCCDF Value doesn't contain selector OOO. But currently it isn't checked and the command doesn't error and creates invalid tailoring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants