ohcl_boot: exclude CPUs with restored NVMe interrupts from sidecar #3706
Merged
Pull request overview
Adjusts OpenHCL boot’s servicing-restore sidecar CPU override logic so that vCPUs involved in restored NVMe state (either outstanding I/O or merely a mapped NVMe interrupt) are kernel-started. This avoids missed NVMe completion interrupts, which can cause keepalive restore to recreate completion queues.
Changes:
- Combine `cpus_with_outstanding_io` and `cpus_with_mapped_interrupts_no_io` from persisted state into a single sidecar-exclusion CPU set (sorted + deduped).
- Drive the sidecar per-CPU override / sidecar-disable decision off that combined set, and update logging/comments to match the intent.
chris-oo reviewed Jun 10, 2026
jstarks reviewed Jun 17, 2026
emirceski approved these changes Jun 18, 2026
chris-oo approved these changes Jun 18, 2026
Hopefully fixes intermittent failures of openvmm_openhcl_linux_x64_servicing_keepalive_with_nvme_fault.
The test arms an NVMe fault that panics on any CREATE_IO_COMPLETION_QUEUE after servicing with keepalive, asserting that the restore path never re-creates I/O completion queues.
In a failing run, the persisted boot state was:
`read_from_dt` only used `cpus_with_outstanding_io` to drive the sidecar override, so both CPUs stayed sidecar-started after restore. NVMe interrupts targeted at those CPUs were not delivered, and the keepalive restore eventually issued a CREATE_IO_COMPLETION_QUEUE, tripping the fault.

Fix: in `read_from_dt`, combine `cpus_with_outstanding_io` and `cpus_with_mapped_interrupts_no_io` (sorted, deduped) into a single "needs kernel start" set and use it for the sidecar exclusion / disable decision.
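
For illustration only, the "combine, sort, dedup" step could look roughly like the sketch below. This is not the code from the PR: the helper name `sidecar_exclusion_cpus`, the use of slices, and the `u32` CPU index type are all assumptions made for the example. Only the two field names come from the description above.

```rust
/// Hypothetical helper (not from the PR): merge the two persisted CPU lists
/// into a single sorted, deduplicated "needs kernel start" set.
/// The `u32` CPU index type is an assumption for this sketch.
fn sidecar_exclusion_cpus(
    cpus_with_outstanding_io: &[u32],
    cpus_with_mapped_interrupts_no_io: &[u32],
) -> Vec<u32> {
    // Concatenate both lists; a CPU may appear in either or both.
    let mut cpus: Vec<u32> = cpus_with_outstanding_io
        .iter()
        .chain(cpus_with_mapped_interrupts_no_io)
        .copied()
        .collect();

    // `dedup` only removes adjacent duplicates, so sort first.
    cpus.sort_unstable();
    cpus.dedup();
    cpus
}
```

With this shape, a state where `cpus_with_outstanding_io` is empty but `cpus_with_mapped_interrupts_no_io` is not still yields a non-empty exclusion set, which is the case the description says the old logic missed.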