[CI] Surface upstream pytorch CI job link in parity summary #3264
Open
pablo-garay wants to merge 4 commits into
509bb1f to f10baac
The parity summary's FAILED TESTS and LOG-BASED FAILURES tables list the
failing test tuples but stop short of pointing the reviewer at the
upstream pytorch/pytorch CI job that actually ran the test, so reaching
the stack trace takes several extra clicks.
download_testlogs already knows the job id of every artifact and log file
it pulls. Persist it through the pipeline and surface it as a clickable
"Job ID" column at the end of both tables:
- download_testlogs: keep the trailing "_<jobid>" segment of the original
artifact name when shortening unzipped XML dirs, and write a single
"_wf_run_id" file at the parent rocm_xml/cuda_xml level. For per-log
artifacts, write a companion "<filename>.job_url" file with the
canonical html_url from the GitHub API job object.
- summarize_xml_testreports.py: read _wf_run_id once, parse "_<jobid>"
off each test-<cfg>-N-N dir, stamp a job_url on every test case, and
emit job_url_{set1_name}/job_url_{set2_name} columns in the per-arch
CSV.
- detect_log_failures.py: read the per-log .job_url file and stamp
job_url on every emitted failure/flaky row; add job_url to both CSV
writers.
- generate_summary.py: propagate job_url_* through collect_failed_tests
and through the flaky-as-log-failure loader, and add a "Job ID" column
at the end of both markdown tables rendered as [<jobid>](<url>).
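The log-file half of this handoff (a companion `.job_url` file written next to each log, then read back and stamped on every row parsed from that log) can be sketched as follows. `save_log_with_job_url` and `stamp_rows_from_log` are hypothetical stand-ins for `write_test_log_to_file` and `scan_logs`; the only detail taken from the description is that the URL comes from the GitHub API job object's `html_url`.

```python
import os

# Hypothetical stand-ins for write_test_log_to_file / scan_logs; the real
# signatures in download_testlogs and detect_log_failures.py may differ.


def save_log_with_job_url(job, log_text, out_dir, filename):
    """Write a log file plus a '<filename>.job_url' companion.

    `job` is assumed to be a GitHub API job object (a dict with 'html_url').
    """
    log_path = os.path.join(out_dir, filename)
    with open(log_path, "w") as f:
        f.write(log_text)
    with open(log_path + ".job_url", "w") as f:
        f.write(job["html_url"])
    return log_path


def stamp_rows_from_log(log_path, rows):
    """Copy the companion job URL onto every failure/flaky row from this log.

    A missing companion file (older artifacts) yields an empty string.
    """
    job_url = ""
    companion = log_path + ".job_url"
    if os.path.isfile(companion):
        with open(companion) as f:
            job_url = f.read().strip()
    for row in rows:
        row["job_url"] = job_url
    return rows
```

Keeping the URL in a sibling file rather than inside the log means the log contents stay byte-for-byte what upstream produced.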
Every read uses .get(..., '') / os.path.isfile, so existing artifacts and
CSVs without the new fields render as empty cells and the pipeline keeps
working unchanged.
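A minimal sketch of that fallback on the CSV side, assuming the per-arch CSV is read with `csv.DictReader`. `load_failed_rows` and the `set1`/`set2` set names are hypothetical placeholders, not the names used in `generate_summary.py`.

```python
import csv
import io


def load_failed_rows(csv_text, set_names):
    """Parse per-arch CSV text, tolerating files written before job_url_* existed.

    `set_names` (e.g. ["set1", "set2"]) are placeholders for the real set names.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name in set_names:
            key = f"job_url_{name}"
            # Missing column -> .get() default ''; short row -> None -> ''.
            row[key] = row.get(key, "") or ""
        rows.append(row)
    return rows
```

An old CSV simply yields `''` for every `job_url_*` key, which the table renderer can emit as an empty cell.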
Signed-off-by: Garay-Fernandez <pgarayfe@amd.com>
f10baac to 11af07f
Jenkins build for 842bb2995390e4be256dbfd15dfa0e65b7da4b1f commit finished as NOT_BUILT
Drop comment bloat and a try/except layer that didn't match the surrounding style:
- _shorten_unzipped_dirs: keep the original `if m:` structure and add the job id suffix inside it, instead of restructuring with `if not m: continue` + extra locals.
- write_test_log_to_file / download_xml_files / scan_logs / parse_xml_reports_as_dict: drop the try/except around the small per-file reads and writes; none of the other file I/O in these functions is guarded either.
- parse_xml_reports_as_dict: always set case["job_url"] (empty string when absent), mirroring how case["shard"] is set unconditionally one line above.
- generate_summary.py: rename _job_url_cell -> _job_id_link to reflect what the markdown cell actually shows; drop its docstring.
No behavior change.
Signed-off-by: Garay-Fernandez <pgarayfe@amd.com>
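For illustration, the `if m:` shape described for `_shorten_unzipped_dirs` might look like this. It assumes (the PR does not show this) that an unzipped artifact dir name contains a `test-<cfg>-N-N` segment and ends in `_<jobid>`; `shortened_name` is a hypothetical pure-function variant that returns the new name instead of renaming on disk.

```python
import re

# Assumed artifact-dir shape: ...test-<cfg>-N-N...[_<jobid>]
SHARD_RE = re.compile(r"test-[A-Za-z0-9_]+?-\d+-\d+")
JOB_ID_RE = re.compile(r"_(\d+)$")


def shortened_name(name):
    """Return 'test-<cfg>-N-N[_<jobid>]' for an unzipped artifact dir, or None."""
    m = SHARD_RE.search(name)
    if m:
        new_name = m.group(0)
        # Keep the upstream job id so later stages can rebuild the job URL.
        job_id_match = JOB_ID_RE.search(name)
        if job_id_match:
            new_name += f"_{job_id_match.group(1)}"
        return new_name
    return None
```

Dirs without a job id still shorten to `test-<cfg>-N-N`, which matches the backward-compatible behavior described for older artifacts.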
Jenkins build for cb9a83d5226e30b77a3cae86ad61db5ad4783c24 commit finished as NOT_BUILT
Explain why these reads/writes exist: how download_testlogs hands the upstream pytorch CI job id to summarize_xml_testreports.py (via the "_<jobid>" suffix on each shard dir plus a "_wf_run_id" file at the parent) and the job page URL to detect_log_failures.py (via a "<log>.job_url" file next to each log).
Also rename the local "jid" match to "job_id_match" so the code reads clearly on a first pass, without extra context.
No behavior change.
Signed-off-by: Garay-Fernandez <pgarayfe@amd.com>
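A sketch of the consuming side of the XML handoff: read `_wf_run_id` once per XML root, parse `_<jobid>` off each shard dir, and build a URL in the `https://github.com/pytorch/pytorch/actions/runs/<wf>/job/<job_id>` form given in the PR summary. The helper names and the `cases_by_shard` dict shape are hypothetical, not the actual internals of `summarize_xml_testreports.py`.

```python
import os
import re

# Assumed shard dir shape after shortening: test-<cfg>-N-N_<jobid>
SHARD_DIR_RE = re.compile(r"^test-.+-\d+-\d+_(\d+)$")
JOB_URL_FMT = "https://github.com/pytorch/pytorch/actions/runs/{run}/job/{job}"


def read_wf_run_id(xml_root):
    """Return the workflow run id stored in '<xml_root>/_wf_run_id', or ''."""
    path = os.path.join(xml_root, "_wf_run_id")
    if not os.path.isfile(path):
        return ""
    with open(path) as f:
        return f.read().strip()


def job_url_for_shard(shard_dir_name, wf_run_id):
    """Build the upstream job URL, or '' if either id is unavailable."""
    m = SHARD_DIR_RE.match(shard_dir_name)
    if not (m and wf_run_id):
        return ""
    return JOB_URL_FMT.format(run=wf_run_id, job=m.group(1))


def stamp_cases(xml_root, cases_by_shard):
    """Set case['job_url'] on every case, always (empty when unknown)."""
    wf_run_id = read_wf_run_id(xml_root)  # read once per XML root
    for shard_dir_name, cases in cases_by_shard.items():
        job_url = job_url_for_shard(shard_dir_name, wf_run_id)
        for case in cases:
            case["job_url"] = job_url
```

Splitting the two ids this way means one small file per workflow run instead of repeating the run id in every shard dir name.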
Jenkins build for d0120b1e943d8ff4f834c80084b1bc2a35c26bff commit finished as NOT_BUILT
Split the trailing ternary that was deciding both the label and the whole f-string into a "get label, then build link" flow.
Drop the [job](url) fallback: both URL writers (summarize_xml_testreports.py and write_test_log_to_file) produce URLs containing "/job/<digits>", so the fallback never fired in practice, and a cell labeled "job" wouldn't tell a reviewer anything. If the URL is malformed, render an empty cell, the same as when no URL is available.
No behavior change for the URLs we actually emit.
Signed-off-by: Garay-Fernandez <pgarayfe@amd.com>
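The "get label, then build link" flow could look roughly like this. `job_id_link` mirrors the renamed `_job_id_link` helper in spirit only; its exact signature in `generate_summary.py` is an assumption.

```python
import re

JOB_ID_IN_URL = re.compile(r"/job/(\d+)")


def job_id_link(url):
    """Render a markdown '[<jobid>](<url>)' cell, or '' if there is no /job/<digits>."""
    # Step 1: get the label (the numeric job id) from the URL.
    m = JOB_ID_IN_URL.search(url or "")
    if not m:
        # Missing or malformed URL: empty cell, no generic "job" fallback.
        return ""
    # Step 2: build the link.
    return f"[{m.group(1)}]({url})"
```

Treating "no URL" and "malformed URL" identically keeps the table free of labels that would not help a reviewer.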
Jenkins build for d0120b1e943d8ff4f834c80084b1bc2a35c26bff commit finished as FAILURE
Summary
- Adds a `Job ID` column at the end of the `FAILED TESTS` and `LOG-BASED FAILURES (not in XML)` tables in the parity summary markdown. Each cell renders as `[<job_id>](https://github.com/pytorch/pytorch/actions/runs/<wf>/job/<job_id>)`, putting the reviewer one click away from the stack trace.
- Carries the upstream `pytorch/pytorch` CI job URL through the existing pipeline: `download_testlogs` was already fetching that info, it just wasn't being preserved. No new API calls and no schema migrations; just persistence through `download_testlogs` → `summarize_xml_testreports.py` / `detect_log_failures.py` → `generate_summary.py`.
- Every read uses `.get(..., '')` / `os.path.isfile`, so older artifacts and CSVs render the column as empty cells instead of breaking.

Example resulting row (FAILED TESTS, set2-disabled case)
Data flow
- XML failures: `_shorten_unzipped_dirs` keeps the trailing `_<jobid>` of the artifact name on each `test-<cfg>-N-N/` dir → `download_xml_files` writes one `_wf_run_id` file at the parent → `parse_xml_reports_as_dict` builds the URL and stamps it on each test case → the per-arch CSV carries `job_url_{set_name}` → `collect_failed_tests` propagates it → markdown renders.
- Log failures: `write_test_log_to_file` writes a companion `<filename>.job_url` file (full URL from the job's `html_url`) → `scan_logs` reads it and stamps `job_url` on every failure / flaky row → `log_failures_<arch>.csv` / `flaky_tests_<arch>.csv` carry it → `load_log_failures` / `load_flaky_tests_as_log_failures` propagate it → markdown renders.

Test plan
- Trigger a `parity.yml` run and confirm:
  - shard dirs are named `test-<cfg>-N-N_<jobid>` after `_shorten_unzipped_dirs`.
  - a `_wf_run_id` file exists alongside the shard dirs in `rocm_xml/` and `cuda_xml/`.
  - `<filename>.job_url` companion files exist next to each `rocm*.txt` / `cuda*.txt` log file.
- Run `summarize_xml_testreports.py` and confirm the `job_url_<set1_name>` / `job_url_<set2_name>` columns are populated for failing rows.
- Inspect `log_failures_<arch>.csv` / `flaky_tests_<arch>.csv` and confirm the `job_url` column is populated.
- Click a `Job ID` cell in both tables → lands on the failing pytorch/pytorch job page with the stack trace.
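The file-layout checks in the test plan could be automated with a small script along these lines. It assumes `rocm_xml/`, `cuda_xml/`, and the `rocm*.txt` / `cuda*.txt` logs all sit under one download root; `check_artifacts` is a hypothetical helper, not part of this PR.

```python
import glob
import os
import re

# Shard dirs are expected to keep their upstream job id: test-<cfg>-N-N_<jobid>
SHARD_WITH_JOB_ID = re.compile(r"^test-.+-\d+-\d+_\d+$")


def check_artifacts(root):
    """Return a list of layout problems under a download root (empty list = OK)."""
    problems = []
    for xml_dir in ("rocm_xml", "cuda_xml"):
        base = os.path.join(root, xml_dir)
        if not os.path.isdir(base):
            problems.append(f"missing {xml_dir}/")
            continue
        if not os.path.isfile(os.path.join(base, "_wf_run_id")):
            problems.append(f"missing {xml_dir}/_wf_run_id")
        for name in sorted(os.listdir(base)):
            if os.path.isdir(os.path.join(base, name)) and not SHARD_WITH_JOB_ID.match(name):
                problems.append(f"shard dir without job id: {xml_dir}/{name}")
    logs = glob.glob(os.path.join(root, "rocm*.txt")) + glob.glob(os.path.join(root, "cuda*.txt"))
    for log in sorted(logs):
        if not os.path.isfile(log + ".job_url"):
            problems.append(f"missing companion: {os.path.basename(log)}.job_url")
    return problems
```

Returning a list rather than raising on the first problem makes it easy to print everything that is missing from a single CI run.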