iOS device startup: upload .logarchive and pull crash reports on cleanup failure#5232
Open
matouskozak wants to merge 3 commits into
Open
iOS device startup: upload .logarchive and pull crash reports on cleanup failure#5232matouskozak wants to merge 3 commits into
matouskozak wants to merge 3 commits into
Conversation
…nup failure When post-iteration app cleanup fails (devicectl 'No such process'), the app already exited before we could terminate it. To diagnose why — crash, iOS- initiated termination, or natural exit: 1. Zip and upload the iOS device's .logarchive to the Helix results container 2. Pull device-side crash reports (.ips) via xcrun devicectl Investigative-only: triggers only on the cleanup failure that already terminates the iteration. Cherry-picked from matouskozak/ios-sdk-jobs-macos-26 (71bfd4a1, 3101d30c). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR adds diagnostic collection for iOS device startup cleanup failures, helping investigate cases where the app process is already gone when the runner tries to kill it.
Changes:
- Wraps the iOS app kill step in
CalledProcessErrorhandling. - Uploads the collected
.logarchiveand attempts to copy device crash logs before re-raising. - Adds
make_archiveimport for zipping the log archive.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Open
3 tasks
- Wrap os.makedirs into the same try/except as the crash copy so a diagnostic-path failure cannot mask the original CalledProcessError from the failed app kill. - Prune the systemCrashLogs copy after devicectl to entries relevant to this iteration: keep files matching the bundle name or with mtime >= the iteration's log-collect start. Drops dozens of unrelated .ips files (WiFiLQMMetrics, wifip2pd, old crashes, etc.) per upload. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member
Author
Comment on lines
+771
to
+792
| # The Mac.iPhone.17.Perf devices are shared across many test runs and iOS retains | ||
| # crash reports in /var/mobile/Library/Logs/CrashReporter/ until rotated out, so | ||
| # systemCrashLogs accumulates days of stale .ips files from previous runs of the | ||
| # same app bundles. Wipe it now so the crash collection on a kill failure below | ||
| # only contains crashes generated during this test run. devicectl has no per-file | ||
| # delete; copying an empty directory in with --remove-existing-content replaces | ||
| # the destination contents. | ||
| getLogger().info("Clearing systemCrashLogs on device.") | ||
| try: | ||
| with tempfile.TemporaryDirectory() as emptyCrashDir: | ||
| crashClearCmd = [ | ||
| 'xcrun', 'devicectl', 'device', 'copy', 'to', | ||
| '--device', deviceUDID, | ||
| '--domain-type', 'systemCrashLogs', | ||
| '--source', emptyCrashDir, | ||
| '--destination', '/', | ||
| '--remove-existing-content', 'true', | ||
| ] | ||
| RunCommand(crashClearCmd, verbose=True).run() | ||
| except Exception as ex: | ||
| getLogger().warning(f"Failed to clear systemCrashLogs (continuing): {ex}") | ||
|
|
Comment on lines
+872
to
+876
| # The kill is cleanup-only; the measurement data is already in the .logarchive above. | ||
| # devicectl returns non-zero when the app process is already gone (e.g. iOS terminated | ||
| # it, the app crashed, or it self-exited). Upload the .logarchive AND any device-side | ||
| # crash reports for this bundle to the Helix results container so we can diagnose | ||
| # why the app was already gone before re-raising. |
Replace devicectl-based copy-everything-then-filter approach with XHarness's CrashSnapshotReporter pattern, ported to Python: * Before the iteration loop, run 'mlaunch --list-crash-reports' to capture the device's existing crash report set. * In the kill-failure handler, re-list and download only the diff via 'mlaunch --download-crash-report --download-crash-report-to'. * Poll the final snapshot for up to 60s so iOS has time to finish writing the crash report after the process dies (matches CrashSnapshotReporter.EndCaptureAsync). The previous bundle/mtime filter still uploaded the historical backlog of unrelated crashes that accumulate on shared Mac.iPhone.17.Perf devices (8-20 stale .ips per work item seen in build 2986965). The snapshot diff scopes uploads to only the crashes generated during this test run. It's a list+copy, not a move — device state is unchanged, matching XHarness behaviour. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0d442be to
bad02e6
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When the post-iteration iOS app cleanup fails (devicectl returns 'No such process'), the app already exited before we could terminate it. This PR adds diagnostic data collection to help investigate why — crash, iOS-initiated termination, or natural exit.
Changes
src/scenarios/shared/runner.py:make_archiveto theshutilimportkillCmdCommand.run()in atry/except CalledProcessErrorblock that:.logarchivetoHELIX_WORKITEM_UPLOAD_ROOT.ips) viaxcrun devicectl device copy from --domain-type systemCrashLogsWhy
Origin
Cherry-picked from
matouskozak/ios-sdk-jobs-macos-26(commits 71bfd4a1, 3101d30c) which were left unmerged after PR #5231 was merged.Verification
python3 -m py_compile src/scenarios/shared/runner.pypasses