feat(sched): add Partitioned EDF scheduler #686
Conversation
Clone the selected network device while holding the net manager read lock, then release the lock before invoking the debug dump path. This keeps potentially slow diagnostic dumping outside the shared manager lock and matches the existing interface operation pattern.
Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>
This reverts commit b4176f8.
This reverts commit 8c0cda5.
Pull request overview
This PR introduces a new Partitioned EDF (Earliest Deadline First) scheduling mode that pins tasks to specific CPU cores and runs EDF independently per core, along with a userland test application and documentation updates.
Changes:
- Add `SchedulerType::PartitionedEDF(deadline, core)` and a new `partitioned_edf` scheduler implementation with per-core run queues and IPI-based preemption.
- Track partitioned runnable-task counts per CPU and update worker wake-up logic to prioritize cores with partitioned work.
- Add a `test_partitioned_edf` application and wire it into `userland`/`kernel` feature flags; update scheduler documentation/output HTML.
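The per-core EDF run queue summarized above can be sketched with the standard library's `BinaryHeap`. Rust's `BinaryHeap` is a max-heap, so wrapping the key in `Reverse` pops the earliest absolute deadline first. This is an illustrative sketch under assumed names (`EdfEntry`, `CoreQueue`), not the kernel's actual `EDFData` type:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Hypothetical task entry; the real type also carries an Arc<Task>.
#[derive(Debug)]
struct EdfEntry {
    absolute_deadline: u64, // microseconds
    task_id: u32,
}

// One run queue per core. BinaryHeap is a max-heap, so entries are
// wrapped in Reverse to dequeue the earliest absolute deadline first.
struct CoreQueue {
    queue: BinaryHeap<Reverse<(u64, u32)>>,
}

impl CoreQueue {
    fn new() -> Self {
        Self { queue: BinaryHeap::new() }
    }

    fn push(&mut self, e: EdfEntry) {
        self.queue.push(Reverse((e.absolute_deadline, e.task_id)));
    }

    // Pop the entry with the earliest absolute deadline, if any.
    fn pop_earliest(&mut self) -> Option<EdfEntry> {
        self.queue
            .pop()
            .map(|Reverse((d, id))| EdfEntry { absolute_deadline: d, task_id: id })
    }
}

fn main() {
    let mut q = CoreQueue::new();
    q.push(EdfEntry { absolute_deadline: 9_900_000, task_id: 1 }); // "heavy"
    q.push(EdfEntry { absolute_deadline: 990_000, task_id: 2 });   // "light"
    // EDF: the shorter absolute deadline is dispatched first.
    assert_eq!(q.pop_earliest().unwrap().task_id, 2);
    assert_eq!(q.pop_earliest().unwrap().task_id, 1);
    println!("ok");
}
```

Because each core owns its own heap, a push or pop on one core never touches another core's queue, which is what makes the scheduling decisions independent.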
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| userland/src/lib.rs | Adds feature-gated invocation of the new test app. |
| userland/Cargo.toml | Adds optional dependency + feature flag for test_partitioned_edf. |
| kernel/Cargo.toml | Wires kernel feature to userland feature for the test. |
| awkernel_async_lib/src/scheduler.rs | Adds PartitionedEDF variant and integrates it into scheduler selection/priority. |
| awkernel_async_lib/src/scheduler/partitioned_edf.rs | Implements the new partitioned EDF scheduler with per-core queues and preemption. |
| awkernel_async_lib/src/task.rs | Adds Task::partitioned_core, per-core queue counters, and adjusts worker wake-up logic. |
| applications/tests/test_partitioned_edf/src/lib.rs | Implements test scenarios for pinning, preemption, and multi-core independence. |
| applications/tests/test_partitioned_edf/Cargo.toml | Adds the new test application crate. |
| mdbook/src/internal/scheduler.md | Documents the new scheduler and updates scheduler directory listing. |
| docs/print.html | Regenerates/updates rendered documentation with PartitionedEDF content. |
| docs/internal/scheduler.html | Regenerates/updates rendered documentation with PartitionedEDF content. |
```rust
let mut node = MCSNode::new();
let _guard = GLOBAL_WAKE_GET_MUTEX.lock(&mut node);
if !self.invoke_preemption(task.clone()) {
    let partitioned_core = partitioned_core as usize;
    if partitioned_core >= self.data.len() {
        panic!("PartitionedEDF: core {partitioned_core} exceeds max supported CPU count");
    }
```
wake_task() calls invoke_preemption() before validating partitioned_core. invoke_preemption() indexes per-CPU state using task.partitioned_core (and expects it), so an invalid core_id (e.g. >= NUM_MAX_CPU) will panic before the later bounds check in wake_task(). Validate partitioned_core (non-zero, < num_cpu(), and < NUM_MAX_CPU) before invoking preemption, or pass a validated core into invoke_preemption() so it can’t index out of range.
Suggested change:
```diff
-let mut node = MCSNode::new();
-let _guard = GLOBAL_WAKE_GET_MUTEX.lock(&mut node);
-if !self.invoke_preemption(task.clone()) {
-let partitioned_core = partitioned_core as usize;
-if partitioned_core >= self.data.len() {
-    panic!("PartitionedEDF: core {partitioned_core} exceeds max supported CPU count");
-}
+let partitioned_core = partitioned_core as usize;
+if partitioned_core == 0
+    || partitioned_core >= awkernel_lib::cpu::num_cpu()
+    || partitioned_core >= awkernel_lib::cpu::NUM_MAX_CPU
+    || partitioned_core >= self.data.len()
+{
+    panic!("PartitionedEDF: core {partitioned_core} exceeds max supported CPU count");
+}
+let mut node = MCSNode::new();
+let _guard = GLOBAL_WAKE_GET_MUTEX.lock(&mut node);
+if !self.invoke_preemption(task.clone()) {
```
Check this as follows.

```rust
let partitioned_core = partitioned_core as usize;
if partitioned_core >= num_cpu() || partitioned_core == 0 {
    panic!("PartitionedEDF: core {partitioned_core} is out of range");
}
```

```rust
let partitioned_core = scheduler_type.partitioned_core();
if let Some(i) = partitioned_core {
    if i == 0 || i >= num_cpu() as u16 {
        log::warn!(
            "Partitioned core should be between 1 and {}. It is addressed as {}.",
            num_cpu() - 1,
            i
        );
    }
}
```
Invalid PartitionedEDF core IDs are only logged as warnings during spawn, but the task is still created with partitioned_core = Some(i). If i == 0 this contradicts the stated design (BSP excluded), and if i >= num_cpu() (but < NUM_MAX_CPU) the task will be enqueued for a non-existent CPU and effectively never run (and may also lead to panics in preemption paths). Consider failing fast (panic / return an error) or falling back to a non-partitioned scheduler when the core ID is out of range.
Suggested change:
```diff
-let partitioned_core = scheduler_type.partitioned_core();
-if let Some(i) = partitioned_core {
-    if i == 0 || i >= num_cpu() as u16 {
-        log::warn!(
-            "Partitioned core should be between 1 and {}. It is addressed as {}.",
-            num_cpu() - 1,
-            i
-        );
-    }
-}
+let partitioned_core = match scheduler_type.partitioned_core() {
+    Some(i) if i == 0 || i >= num_cpu() as u16 => {
+        log::warn!(
+            "Partitioned core should be between 1 and {}. It is addressed as {}. Falling back to a non-partitioned task.",
+            num_cpu() - 1,
+            i
+        );
+        None
+    }
+    partitioned_core => partitioned_core,
+};
```
Fallback to core 1 as follows.

```rust
let partitioned_core = if let Some(i) = scheduler_type.partitioned_core() {
    if i == 0 || i >= num_cpu() as u16 {
        log::warn!(
            "Partitioned core should be between 1 and {}. Falling back to core 1. Given core: {}",
            num_cpu() - 1,
            i
        );
        Some(1)
    } else {
        Some(i)
    }
} else {
    None
};
```

```rust
let partitioned_core = partitioned_core as usize;
if partitioned_core >= self.data.len() {
    panic!("PartitionedEDF: core {partitioned_core} exceeds max supported CPU count");
}

NUM_PARTITIONED_TASKS_IN_QUEUE[partitioned_core].fetch_add(1, Ordering::Relaxed);

let mut node_inner = MCSNode::new();
let mut data = self.data[partitioned_core].lock(&mut node_inner);
let internal_data = data.get_or_insert_with(EDFData::new);
internal_data.queue.push(PartitionedEDFTask {
    task: task.clone(),
    absolute_deadline,
    wake_time,
});
```
The bounds check here uses self.data.len() (NUM_MAX_CPU), but not the actual online CPU count (num_cpu()). A partitioned_core in [num_cpu(), NUM_MAX_CPU) will pass this check, be enqueued, and then never be dequeued/woken because no such CPU exists. It would be safer to validate against num_cpu() (and core != 0) and reject or remap the task.
Check it as follows.

```rust
let partitioned_core = partitioned_core as usize;
if partitioned_core >= num_cpu() || partitioned_core == 0 {
    panic!("PartitionedEDF: core {partitioned_core} is out of range");
}
```

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>
```rust
};

let partitioned_core = partitioned_core as usize;
if partitioned_core >= num_cpu() || partitioned_core == 0 {
```
Tasks::new already validates partitioned_core and falls back to Some(1) with a warning when invalid, but wake_task re-reads the original value from info.scheduler_type and panics here. The fallback path is therefore dead code: any task spawned with core == 0 or core >= num_cpu() will panic the kernel on first wake despite the friendly-warning intent. Read task.partitioned_core (the validated value) here, or rewrite info.scheduler_type at spawn time so both paths agree.
Use task's partitioned_core as follows.

```rust
let partitioned_core =
    task.partitioned_core.expect("Task has no partitioned core") as usize;
```

```rust
}

NUM_TASK_IN_QUEUE.fetch_add(1, Ordering::Release);
if self.partitioned_core.is_none() {
```
When a partitioned task panics, the wake path falls into the global panicked::SCHEDULER, but this conditional skips the NUM_TASK_IN_QUEUE increment because partitioned_core.is_some(). The partitioned scheduler's counter is also not touched in this path. Neither counter reflects the queued task, so wake_workers will not wake any CPU and the panicked task can sit indefinitely. Always increment NUM_TASK_IN_QUEUE for the panicked path.
Fixed. The increment condition in Task::wake() is now:

```rust
if panicked || self.partitioned_core.is_none() {
    NUM_TASK_IN_QUEUE.fetch_add(1, Ordering::Release);
}
```

When panicked is true the counter is always incremented regardless of partitioned_core, so wake_workers will correctly wake a CPU for the panicked task. The panicked task is then routed to panicked::SCHEDULER, which reads from NUM_TASK_IN_QUEUE.
```rust
};

pub struct PartitionedEDFScheduler {
    data: [Mutex<Option<EDFData>>; NUM_MAX_CPU], // Run queue.
```
Mutex<Option<EDFData>> plus get_or_insert_with(EDFData::new) adds a branch and an extra as_mut() to every wake_task / get_next. BinaryHeap::new() does not allocate until pushed, so the lazy init buys nothing — and the slot is created NUM_MAX_CPU times regardless. Use [Mutex<EDFData>; NUM_MAX_CPU] (or a BinaryHeap directly) and drop the Option.
Fixed. The field is now `[Mutex<EDFData>; NUM_MAX_CPU]`, initialized via a `const fn EDFData::new()` which returns an empty `BinaryHeap`. Since `BinaryHeap::new()` does not allocate, there is no cost difference compared to the Option-based lazy init, and the branch and `as_mut()` are gone from the hot path.
```diff
 let num_cpu = awkernel_lib::cpu::num_cpu();

-for i in 1..num_cpu {
+for (i, partitioned_tasks) in NUM_PARTITIONED_TASKS_IN_QUEUE[..num_cpus]
```
wake_workers iterates NUM_PARTITIONED_TASKS_IN_QUEUE[..num_cpus].iter().enumerate().skip(1) twice — first to skip CPUs that have partitioned work, then to wake those same CPUs. Each atomic load acts as an optimization barrier, doubling memory traffic on a hot path. Fuse into one loop: load the count once, wake_cpu and continue if > 0, otherwise fall through to the global wake logic.
Fixed. The second loop has been removed. The single loop now handles both cases in one pass: load NUM_PARTITIONED_TASKS_IN_QUEUE[i] once per CPU; if it is > 0 call wake_cpu(i) and continue; otherwise fall through to the global task logic in the same iteration.
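The single-pass wake loop described in this fix can be modeled in isolation as below. This is a hedged sketch, not the kernel's code: `wake_cpu` is a hypothetical callback standing in for the real IPI send, and the statics are simplified stand-ins for the counters named in the PR.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const NUM_MAX_CPU: usize = 8;
const ZERO: AtomicU32 = AtomicU32::new(0);

// Simplified stand-ins for the kernel's counters (names follow the PR).
static NUM_PARTITIONED_TASKS_IN_QUEUE: [AtomicU32; NUM_MAX_CPU] = [ZERO; NUM_MAX_CPU];
static NUM_TASK_IN_QUEUE: AtomicU32 = AtomicU32::new(0);

// Single pass: each per-CPU counter is loaded exactly once. A core with
// partitioned work is woken and skipped; otherwise it may receive one of
// the globally queued tasks.
fn wake_workers(num_cpu: usize, mut wake_cpu: impl FnMut(usize)) {
    let mut global = NUM_TASK_IN_QUEUE.load(Ordering::Acquire);
    for (i, partitioned) in NUM_PARTITIONED_TASKS_IN_QUEUE[..num_cpu]
        .iter()
        .enumerate()
        .skip(1) // core 0 (BSP) never runs partitioned tasks
    {
        if partitioned.load(Ordering::Relaxed) > 0 {
            wake_cpu(i); // partitioned work: wake this core and move on
            continue;
        }
        if global > 0 {
            wake_cpu(i); // otherwise hand out one global task
            global -= 1;
        }
    }
}

fn main() {
    NUM_PARTITIONED_TASKS_IN_QUEUE[2].store(1, Ordering::Relaxed);
    NUM_TASK_IN_QUEUE.store(1, Ordering::Release);
    let mut woken = Vec::new();
    wake_workers(4, |i| woken.push(i));
    // Core 1 gets the global task, core 2 its partitioned task; core 3 stays idle.
    assert_eq!(woken, vec![1, 2]);
    println!("ok");
}
```

The point of the fusion is that the per-CPU atomic is read once per iteration instead of once per loop, halving the atomic traffic on the wake path.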
```rust
    tasks
}

pub fn get_task_running(cpu_id: usize) -> RunningTask {
```
This new public function lacks a doc comment. The convention that task_id == 0 means "idle" and the valid range of cpu_id (< num_cpu()) are non-obvious, and the relationship to the existing get_tasks_running() is unclear. Add a /// comment covering return-value semantics and bounds.
Fixed. A doc comment has been added:

```rust
/// Get the running task for the given CPU.
///
/// `cpu_id` must be less than `awkernel_lib::cpu::num_cpu()`.
/// The returned [`RunningTask::task_id`] is `0` when the CPU is idle.
///
/// To get the running tasks for all CPUs at once, use [`get_tasks_running`].
```
```rust
    )
}

pub const fn partitioned_core(&self) -> Option<u16> {
```
This new public method has no doc comment. The semantics of Some(n) (the assigned core index, expected to be in 1..num_cpu()) and the fact that all non-PartitionedEDF variants return None should be documented for consumers.
Fixed. A doc comment has been added:

```rust
/// Return the partitioned core index if this is a [`SchedulerType::PartitionedEDF`] scheduler.
///
/// Returns `Some(n)` where `n` is the CPU core index (`1..num_cpu()`) assigned to the
/// partitioned EDF scheduler. Returns `None` for all other scheduler variants.
```
Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>
Address review feedback from atsushi421 on PR tier4#686. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…or partitioned tasks

- wake(): change the counter increment so that panicked and non-partitioned tasks both use NUM_TASK_IN_QUEUE. Non-panicked partitioned tasks are left to partitioned_edf::wake_task(), which was already the original design. This removes the double-increment introduced in 848a478, where both task.rs and partitioned_edf.rs incremented NUM_PARTITIONED_TASKS_IN_QUEUE for the same non-preempted task.
- spawn(): validate and normalise the partitioned core *before* creating TaskInfo, so info.scheduler_type and task.partitioned_core always agree after a fallback to core 1. Previously info.scheduler_type could still hold the original invalid core number while task.partitioned_core was 1.

Addresses review feedback from atsushi421 on PR tier4#686. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tionedTask<T>

Add PartitionedTask<T> to scheduler.rs — a RAII smart pointer that increments NUM_PARTITIONED_TASKS_IN_QUEUE[cpu_id] on construction and decrements it on Drop. The PartitionedEDF queue now stores BinaryHeap<PartitionedTask<PartitionedEDFTask>> instead of a bare BinaryHeap<PartitionedEDFTask>. This removes all manual fetch_add / fetch_sub calls from partitioned_edf.rs and the outer fetch_sub from get_next_task():

- Terminated/Panicked entries: Drop fires automatically when `continue` is reached in the loop.
- Valid entries: Drop fires after entry.task.clone() evaluates, decrementing exactly once per dequeue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Change PartitionedTask<T>::inner from T to Option<T> and add a take() method. The counter is now decremented exactly once:

- take(): explicitly transfers ownership and decrements (used when returning a valid task from get_next).
- Drop: decrements only when inner.is_some(), i.e. take() was not called (used for Terminated/Panicked tasks discarded in the loop).

This avoids Arc::clone() in get_next() — take().unwrap().task moves the Arc directly — and makes the decrement site unambiguous.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove Deref impl (required unwrap on Option<T>)
- Delegate Ord/PartialEq to Option<T>'s built-in impls (no unwrap needed)
- Replace unwrap+Deref access in get_next() with let-else + take() pattern

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
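The final `PartitionedTask<T>` design from the commits above (RAII increment on construction; a single decrement via either `take()` or `Drop`) can be modeled in isolation as follows. The struct shape follows the commit messages, but the surrounding kernel types are omitted, so treat this as a sketch with illustrative names rather than the actual implementation.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const NUM_MAX_CPU: usize = 4;
const ZERO: AtomicU32 = AtomicU32::new(0);
static NUM_PARTITIONED_TASKS_IN_QUEUE: [AtomicU32; NUM_MAX_CPU] = [ZERO; NUM_MAX_CPU];

// RAII counter guard: increments the per-core counter on construction and
// decrements it exactly once, either via take() or in Drop.
struct PartitionedTask<T> {
    inner: Option<T>,
    cpu_id: usize,
}

impl<T> PartitionedTask<T> {
    fn new(inner: T, cpu_id: usize) -> Self {
        NUM_PARTITIONED_TASKS_IN_QUEUE[cpu_id].fetch_add(1, Ordering::Relaxed);
        Self { inner: Some(inner), cpu_id }
    }

    // Transfer ownership out and decrement now; Drop then sees None and skips.
    fn take(mut self) -> Option<T> {
        let inner = self.inner.take();
        if inner.is_some() {
            NUM_PARTITIONED_TASKS_IN_QUEUE[self.cpu_id].fetch_sub(1, Ordering::Relaxed);
        }
        inner
    }
}

impl<T> Drop for PartitionedTask<T> {
    fn drop(&mut self) {
        // Only decrement if take() was never called.
        if self.inner.is_some() {
            NUM_PARTITIONED_TASKS_IN_QUEUE[self.cpu_id].fetch_sub(1, Ordering::Relaxed);
        }
    }
}

fn main() {
    let a = PartitionedTask::new("task-a", 1);
    let b = PartitionedTask::new("task-b", 1);
    assert_eq!(NUM_PARTITIONED_TASKS_IN_QUEUE[1].load(Ordering::Relaxed), 2);

    drop(a); // discarded entry (e.g. Terminated/Panicked): Drop decrements
    assert_eq!(NUM_PARTITIONED_TASKS_IN_QUEUE[1].load(Ordering::Relaxed), 1);

    let task = b.take().unwrap(); // valid entry: take() decrements, moves the value
    assert_eq!(task, "task-b");
    assert_eq!(NUM_PARTITIONED_TASKS_IN_QUEUE[1].load(Ordering::Relaxed), 0);
    println!("ok");
}
```

The `Option<T>` inner field is what makes the decrement site unambiguous: whichever of `take()` and `Drop` runs first sets or sees `None`, so the counter can never be decremented twice for one entry.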
Description
This PR adds a Partitioned EDF (Earliest Deadline First) scheduler (`SchedulerType::PartitionedEDF(deadline, core)`), which statically assigns tasks to specific CPU cores and applies EDF scheduling independently within each core.

Key design
- Each core keeps a `BinaryHeap` ordered by absolute deadline, ensuring scheduling decisions on one core do not affect others.
- Tasks are spawned with `PartitionedEDF(relative_deadline_us, core_id)`. The assigned core is stored in `Task::partitioned_core`.
- Preemption compares against `max(running_task, pending_preemption)` to avoid unnecessary preemption inversion.
- `PartitionedEDF` sits at the top of `PRIORITY_LIST`, above `GEDF`. CPUs with partitioned tasks in queue exclusively serve those tasks.
- Uses `gedf::calculate_and_update_dag_deadline` for DAG-aware task graphs.
- `wake_workers` wakes cores with partitioned tasks via a dedicated loop, separately from the general non-partitioned task distribution.

Files
- `awkernel_async_lib/src/scheduler/partitioned_edf.rs`
- `awkernel_async_lib/src/scheduler.rs`: `PartitionedEDF` variant, `get_num_partitioned_schedulers()` const fn, updated `get_next_task`
- `awkernel_async_lib/src/task.rs`: `Task::partitioned_core`, `NUM_PARTITIONED_TASKS_IN_QUEUE`, `get_task_running`, updated `wake_workers`
- `applications/tests/test_partitioned_edf/`

Related links
How was this PR tested?
- `make x86_64 RELEASE=1` passes with no errors.
- Userland test (`test_partitioned_edf`): built and linked via `--features test_partitioned_edf`. Three test scenarios are implemented:
  - Pinning: each task logs `cpu_id()` and checks it matches the assigned core; a mismatch is reported as `[FAIL]`.
  - Preemption: `heavy` (deadline=9900ms) and `light` (deadline=990ms) are pinned to core 1; `light` is expected to preempt `heavy` based on EDF ordering, observable from start/end timestamps in the log.
  - Multi-core independence: tasks pinned to different cores (skipped when `num_cpu < 3`) run in parallel; each task's `cpu_id()` log confirms no cross-core migration.

To run:
Notes for reviewers
- `get_num_partitioned_schedulers()` is a `const fn` that counts leading `PartitionedEDF` entries in `PRIORITY_LIST` at compile time. If a new partitioned scheduler type is added, it must be placed at the front of `PRIORITY_LIST` and the `matches!` pattern updated accordingly.
- `wake_workers` skips core 0 (`.skip(1)`), and the task spawn path emits `log::warn!` if `core == 0` or `core >= num_cpu()` is given.
- `invoke_preemption` compares the new task against `max(running_task, pending_preemption)`, mirroring the GEDF implementation. This prevents unnecessary preemption when a higher-priority task is already pending.
- `NUM_TASK_IN_QUEUE` was changed from `AtomicU64` to `AtomicU32` on 64-bit targets to match the 32-bit path; task counts do not require 64-bit range.
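The compile-time counting of leading `PartitionedEDF` entries can be sketched like this. The enum and `PRIORITY_LIST` contents are illustrative assumptions; only the `const fn` + `matches!` counting pattern mirrors the note above.

```rust
// Illustrative scheduler kinds; not the kernel's actual enum.
#[derive(Clone, Copy)]
enum SchedulerKind {
    PartitionedEDF,
    GEDF,
    PrioritizedFIFO,
}

// Partitioned entries must stay at the front for the count to be correct.
const PRIORITY_LIST: [SchedulerKind; 3] = [
    SchedulerKind::PartitionedEDF,
    SchedulerKind::GEDF,
    SchedulerKind::PrioritizedFIFO,
];

// Counts leading PartitionedEDF entries; stops at the first other variant.
// Evaluated at compile time because it is a const fn used in a const item.
const fn get_num_partitioned_schedulers() -> usize {
    let mut n = 0;
    while n < PRIORITY_LIST.len() {
        if !matches!(PRIORITY_LIST[n], SchedulerKind::PartitionedEDF) {
            break;
        }
        n += 1;
    }
    n
}

const NUM_PARTITIONED: usize = get_num_partitioned_schedulers();

fn main() {
    assert_eq!(NUM_PARTITIONED, 1);
    println!("ok");
}
```

Because the scan stops at the first non-partitioned variant, inserting a new partitioned scheduler anywhere except the front of `PRIORITY_LIST` would silently make the count wrong, which is exactly the caveat the note raises.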