Skip to content

Bug Fix - Fix NVBandwidth benchmark results parsing bug (#748)#782

Open
NJX-njx wants to merge 2 commits into
microsoft:mainfrom
NJX-njx:fix/nvbandwidth-parsing-bug-748
Open

Bug Fix - Fix NVBandwidth benchmark results parsing bug (#748)#782
NJX-njx wants to merge 2 commits into
microsoft:mainfrom
NJX-njx:fix/nvbandwidth-parsing-bug-748

Conversation

@NJX-njx
Copy link
Copy Markdown

@NJX-njx NJX-njx commented Mar 3, 2026

Summary

Fixes #748 - NVBandwidth benchmark results parsing bug causing invalid metric names like _sum_None\ and \device_to_device_latency_sm_sum_None.

Root Cause

When parsing SUM lines, \parse_status['test_name']\ or \parse_status['benchmark_type']\ could be empty/None (e.g., after waived tests or when header parsing fails), leading to invalid metric keys.

Changes

  • Add validation: only add SUM result when both \ est_name\ and \�enchmark_type\ are valid
  • Use test name from SUM line (\match.group(1)) as fallback when \parse_status\ is stale
  • Infer \�enchmark_type\ from test name when not set: 'latency' in name -> 'lat', else 'bw'

Testing

  • Logic verified against \ ests/data/nvbandwidth_results.log\ format
  • Unit tests require CUDA platform (CI will run on Linux)

…ency_sm_sum_None

- Add validation before adding SUM results: only add when test_name and benchmark_type are valid
- Use test name from SUM line as fallback when parse_status is stale (e.g., after waived tests)
- Infer benchmark_type from test name when not set (latency tests -> lat, else bw)

Fixes microsoft#748

Made-with: Cursor
Copilot AI review requested due to automatic review settings March 3, 2026 14:11
@NJX-njx NJX-njx requested a review from a team as a code owner March 3, 2026 14:11
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes NVBandwidth SUM-line parsing to prevent invalid metric keys (e.g., _sum_None) by making SUM parsing more resilient when parsing state is missing or stale.

Changes:

  • Use the SUM line’s captured test name as a fallback when parse_status['test_name'] is empty.
  • Infer benchmark_type from the test name when it wasn’t detected from the matrix header.
  • Guard SUM metric emission to only occur when both test_name and benchmark_type are valid.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 192 to +194
if self.re_summary_pattern.match(line):
value = self.re_summary_pattern.match(line).group(2)
test_name = parse_status['test_name']
match = self.re_summary_pattern.match(line)
value = match.group(2)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re_summary_pattern.match(line) is evaluated twice (once in the if and again to assign match). Consider assigning once (e.g., match = ...; if match:) to avoid duplicated work and keep the control flow simpler.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NJX-njx Could you please resolve the comments from Copilot? Thanks.

Comment on lines +195 to +203
# Use test_name from parse_status, fallback to group(1) from SUM line
test_name = parse_status['test_name'] or match.group(1).lower()
benchmark_type = parse_status['benchmark_type']
parse_status['results'][f'{test_name}_sum_{benchmark_type}'] = float(value)
# Infer benchmark_type from test_name if not set (e.g., after waived tests)
if benchmark_type is None:
benchmark_type = 'lat' if 'latency' in test_name else 'bw'
# Only add result when we have valid metric name (avoid _sum_None or sum_None)
if test_name and benchmark_type:
parse_status['results'][f'{test_name}_sum_{benchmark_type}'] = float(value)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new fallback/inference paths for SUM parsing (using match.group(1) when parse_status['test_name'] is empty, and inferring benchmark_type when it's None) aren’t covered by existing nvbandwidth tests (current tests only exercise the full header+matrix path). Adding a unit test with a minimal raw output containing a SUM ... line but no prior matrix header (and/or no preceding Running ...) would help prevent regressions of the _sum_None metric-name bug.

Copilot uses AI. Check for mistakes.
@guoshzhao guoshzhao changed the title Fix NVBandwidth benchmark results parsing bug (#748) Bug Fix - Fix NVBandwidth benchmark results parsing bug (#748) Mar 11, 2026
@guoshzhao guoshzhao requested review from guoshzhao and polarG March 11, 2026 16:16
@guoshzhao guoshzhao self-assigned this Mar 11, 2026
@guoshzhao guoshzhao added bug Something isn't working benchmarks SuperBench Benchmarks micro-benchmarks Micro Benchmark Test for SuperBench Benchmarks labels Mar 11, 2026
@@ -190,10 +190,17 @@ def _process_raw_line(self, line, parse_status):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re_summary_pattern.match(line) is still evaluated twice — once in this if, then again on the line below to assign match. Copilot raised this in #discussion_r2878517701 and @guoshzhao explicitly asked for it to be addressed in #discussion_r3070415674; please collapse the two calls:

match = self.re_summary_pattern.match(line)
if match:
    value = match.group(2)
    ...

test_name = parse_status['test_name'] or match.group(1).lower()
benchmark_type = parse_status['benchmark_type']
parse_status['results'][f'{test_name}_sum_{benchmark_type}'] = float(value)
# Infer benchmark_type from test_name if not set (e.g., after waived tests)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fallback fixes the SUM symptom but masks the true root cause of device_to_device_latency_sm_sum_None from issue #748.

Upstream nvbandwidth prints the matrix header for this test as:

Device to Device Latency SM GPU(row) <-> GPU(column) (ns)

(see third_party/nvbandwidth/testcases_sm.cpp:141). That line does NOT start with memcpy or memory latency, so re_matrix_header_line at line 19 never fires for this test. As a consequence:

  1. parse_status['benchmark_type'] stays None.
  2. metrix_row / metrix_col are never extracted.
  3. The matrix-row branch at line 167 is guarded by parse_status['benchmark_type'] and therefore silently skips all per-cell results for device_to_device_latency_sm.

With this PR, the SUM is rescued via the 'latency' in test_name heuristic, but the per-cell device_to_device_latency_sm_gpu0_gpu1_lat (etc.) results are still lost — users will only see the SUM, not the matrix.

A more complete fix is to broaden the header detector itself, e.g.:

re_matrix_header_line = re.compile(r'^(memcpy|memory latency|Device to Device Latency)', re.IGNORECASE)

The existing branch 'bw' if 'bandwidth' in line else 'lat' already classifies correctly for both new and old header lines, so the fallback + inference here can stay as a defense-in-depth net while the regex change recovers the lost matrix data.

parse_status['results'][f'{test_name}_sum_{benchmark_type}'] = float(value)
# Infer benchmark_type from test_name if not set (e.g., after waived tests)
if benchmark_type is None:
benchmark_type = 'lat' if 'latency' in test_name else 'bw'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this guard drops a SUM line (either test_name is empty or benchmark_type is still falsy after inference), nothing is logged. That trades a malformed metric name for silent data loss, which makes the next regression (a new upstream output format change) hard to detect from a passing run.

Please emit a warning here so the drop is observable, e.g.:

if test_name and benchmark_type:
    parse_status['results'][f'{test_name}_sum_{benchmark_type}'] = float(value)
else:
    logger.warning(
        f'nvbandwidth: skipping SUM line, incomplete parse state '
        f'(test_name={test_name!r}, benchmark_type={benchmark_type!r}, line={line!r})'
    )

The warning also gives users a clear signal in the runner logs when a future nvbandwidth release adds a test whose header line doesn't match the existing detection patterns.

value = self.re_summary_pattern.match(line).group(2)
test_name = parse_status['test_name']
match = self.re_summary_pattern.match(line)
value = match.group(2)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the three new behaviors here (or match.group(1).lower() fallback, 'latency' in test_name inference, if test_name and benchmark_type guard) is exercised by the existing test suite — tests/data/nvbandwidth_results.log and test_nvbandwidth.py::test_nvbandwidth_result_parsing_real_output only cover the happy memcpy ... / memory latency ... header path. Without a regression test, a future refactor can silently undo the fix from issue #748.

Copilot already flagged this in #discussion_r2878517758; please add a unit test using a synthetic raw output that triggers each new path. Minimal example exercising the device_to_device_latency_sm regression:

def test_nvbandwidth_sum_fallback_for_unknown_header(self):
    benchmark_name = 'nvbandwidth'
    (cls, _) = BenchmarkRegistry._BenchmarkRegistry__select_benchmark(benchmark_name, Platform.CUDA)
    benchmark = cls(benchmark_name, parameters='')
    assert benchmark._preprocess()

    raw_output = (
        'Running device_to_device_latency_sm.\n'
        'Device to Device Latency SM GPU(row) <-> GPU(column) (ns)\n'
        '           0         1\n'
        ' 0    772.58    810.10\n'
        ' 1    810.10    772.58\n'
        '\n'
        'SUM device_to_device_latency_sm 3165.36\n'
    )
    assert benchmark._process_raw_result(0, raw_output)
    assert benchmark.result['device_to_device_latency_sm_sum_lat'][0] == 3165.36
    # \u2191 with the suggested regex broadening (see [comment on L198](https://github.com/microsoft/superbenchmark/pull/782#discussion_r3326258884)), per-cell metrics should also be present.

A second case covering the test_name == '' and benchmark_type is None drop path (asserting the SUM is skipped and a warning is logged) would also be valuable, especially if the logger.warning suggestion on L200 is adopted.

@polarG
Copy link
Copy Markdown
Contributor

polarG commented Jun 1, 2026

Hi @NJX-njx, could address the current comments? We are planning to merge this PR. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

benchmarks SuperBench Benchmarks bug Something isn't working micro-benchmarks Micro Benchmark Test for SuperBench Benchmarks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NVBandwidth benchmark results parsing bug

4 participants