feat(sched): add Partitioned EDF scheduler #686
Conversation
Clone the selected network device while holding the net manager read lock, then release the lock before invoking the debug dump path. This keeps potentially slow diagnostic dumping outside the shared manager lock and matches the existing interface operation pattern.
Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>
This reverts commit b4176f8.
This reverts commit 8c0cda5.
Pull request overview
This PR introduces a new Partitioned EDF (Earliest Deadline First) scheduling mode that pins tasks to specific CPU cores and runs EDF independently per core, along with a userland test application and documentation updates.
Changes:
- Add `SchedulerType::PartitionedEDF(deadline, core)` and a new `partitioned_edf` scheduler implementation with per-core run queues and IPI-based preemption.
- Track partitioned runnable-task counts per CPU and update worker wake-up logic to prioritize cores with partitioned work.
- Add a `test_partitioned_edf` application and wire it into `userland`/`kernel` feature flags; update scheduler documentation/output HTML.
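The per-core EDF run queue summarized above can be sketched with the standard library's `BinaryHeap`. Rust's `BinaryHeap` is a max-heap, so wrapping the key in `Reverse` pops the earliest absolute deadline first. This is an illustrative sketch under assumed names (`EdfEntry`, `CoreQueue`), not the kernel's actual `EDFData` type:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Hypothetical task entry; the real type also carries an Arc<Task>.
#[derive(Debug)]
struct EdfEntry {
    absolute_deadline: u64, // microseconds
    task_id: u32,
}

// One run queue per core. BinaryHeap is a max-heap, so entries are
// wrapped in Reverse to dequeue the earliest absolute deadline first.
struct CoreQueue {
    queue: BinaryHeap<Reverse<(u64, u32)>>,
}

impl CoreQueue {
    fn new() -> Self {
        Self { queue: BinaryHeap::new() }
    }

    fn push(&mut self, e: EdfEntry) {
        self.queue.push(Reverse((e.absolute_deadline, e.task_id)));
    }

    // Pop the entry with the earliest absolute deadline, if any.
    fn pop_earliest(&mut self) -> Option<EdfEntry> {
        self.queue
            .pop()
            .map(|Reverse((d, id))| EdfEntry { absolute_deadline: d, task_id: id })
    }
}

fn main() {
    let mut q = CoreQueue::new();
    q.push(EdfEntry { absolute_deadline: 9_900_000, task_id: 1 }); // "heavy"
    q.push(EdfEntry { absolute_deadline: 990_000, task_id: 2 });   // "light"
    // EDF: the shorter absolute deadline is dispatched first.
    assert_eq!(q.pop_earliest().unwrap().task_id, 2);
    assert_eq!(q.pop_earliest().unwrap().task_id, 1);
    println!("ok");
}
```

Because each core owns its own heap, a push or pop on one core never touches another core's queue, which is what makes the scheduling decisions independent.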
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| userland/src/lib.rs | Adds feature-gated invocation of the new test app. |
| userland/Cargo.toml | Adds optional dependency + feature flag for test_partitioned_edf. |
| kernel/Cargo.toml | Wires kernel feature to userland feature for the test. |
| awkernel_async_lib/src/scheduler.rs | Adds PartitionedEDF variant and integrates it into scheduler selection/priority. |
| awkernel_async_lib/src/scheduler/partitioned_edf.rs | Implements the new partitioned EDF scheduler with per-core queues and preemption. |
| awkernel_async_lib/src/task.rs | Adds Task::partitioned_core, per-core queue counters, and adjusts worker wake-up logic. |
| applications/tests/test_partitioned_edf/src/lib.rs | Implements test scenarios for pinning, preemption, and multi-core independence. |
| applications/tests/test_partitioned_edf/Cargo.toml | Adds the new test application crate. |
| mdbook/src/internal/scheduler.md | Documents the new scheduler and updates scheduler directory listing. |
| docs/print.html | Regenerates/updates rendered documentation with PartitionedEDF content. |
| docs/internal/scheduler.html | Regenerates/updates rendered documentation with PartitionedEDF content. |
```rust
let mut node = MCSNode::new();
let _guard = GLOBAL_WAKE_GET_MUTEX.lock(&mut node);
if !self.invoke_preemption(task.clone()) {
    let partitioned_core = partitioned_core as usize;
    if partitioned_core >= self.data.len() {
        panic!("PartitionedEDF: core {partitioned_core} exceeds max supported CPU count");
    }
```
wake_task() calls invoke_preemption() before validating partitioned_core. invoke_preemption() indexes per-CPU state using task.partitioned_core (and expects it), so an invalid core_id (e.g. >= NUM_MAX_CPU) will panic before the later bounds check in wake_task(). Validate partitioned_core (non-zero, < num_cpu(), and < NUM_MAX_CPU) before invoking preemption, or pass a validated core into invoke_preemption() so it can’t index out of range.
Suggested change:
```diff
-let mut node = MCSNode::new();
-let _guard = GLOBAL_WAKE_GET_MUTEX.lock(&mut node);
-if !self.invoke_preemption(task.clone()) {
-let partitioned_core = partitioned_core as usize;
-if partitioned_core >= self.data.len() {
-    panic!("PartitionedEDF: core {partitioned_core} exceeds max supported CPU count");
-}
+let partitioned_core = partitioned_core as usize;
+if partitioned_core == 0
+    || partitioned_core >= awkernel_lib::cpu::num_cpu()
+    || partitioned_core >= awkernel_lib::cpu::NUM_MAX_CPU
+    || partitioned_core >= self.data.len()
+{
+    panic!("PartitionedEDF: core {partitioned_core} exceeds max supported CPU count");
+}
+let mut node = MCSNode::new();
+let _guard = GLOBAL_WAKE_GET_MUTEX.lock(&mut node);
+if !self.invoke_preemption(task.clone()) {
```
Check this as follows.

```rust
let partitioned_core = partitioned_core as usize;
if partitioned_core >= num_cpu() || partitioned_core == 0 {
    panic!("PartitionedEDF: core {partitioned_core} is out of range");
}
```

```rust
let partitioned_core = scheduler_type.partitioned_core();
if let Some(i) = partitioned_core {
    if i == 0 || i >= num_cpu() as u16 {
        log::warn!(
            "Partitioned core should be between 1 and {}. It is addressed as {}.",
            num_cpu() - 1,
            i
        );
    }
}
```
Invalid PartitionedEDF core IDs are only logged as warnings during spawn, but the task is still created with partitioned_core = Some(i). If i == 0 this contradicts the stated design (BSP excluded), and if i >= num_cpu() (but < NUM_MAX_CPU) the task will be enqueued for a non-existent CPU and effectively never run (and may also lead to panics in preemption paths). Consider failing fast (panic / return an error) or falling back to a non-partitioned scheduler when the core ID is out of range.
Suggested change:
```diff
-let partitioned_core = scheduler_type.partitioned_core();
-if let Some(i) = partitioned_core {
-    if i == 0 || i >= num_cpu() as u16 {
-        log::warn!(
-            "Partitioned core should be between 1 and {}. It is addressed as {}.",
-            num_cpu() - 1,
-            i
-        );
-    }
-}
+let partitioned_core = match scheduler_type.partitioned_core() {
+    Some(i) if i == 0 || i >= num_cpu() as u16 => {
+        log::warn!(
+            "Partitioned core should be between 1 and {}. It is addressed as {}. Falling back to a non-partitioned task.",
+            num_cpu() - 1,
+            i
+        );
+        None
+    }
+    partitioned_core => partitioned_core,
+};
```
Fallback to core 1 as follows.

```rust
let partitioned_core = if let Some(i) = scheduler_type.partitioned_core() {
    if i == 0 || i >= num_cpu() as u16 {
        log::warn!(
            "Partitioned core should be between 1 and {}. Falling back to core 1. Given core: {}",
            num_cpu() - 1,
            i
        );
        Some(1)
    } else {
        Some(i)
    }
} else {
    None
};
```

```rust
let partitioned_core = partitioned_core as usize;
if partitioned_core >= self.data.len() {
    panic!("PartitionedEDF: core {partitioned_core} exceeds max supported CPU count");
}

NUM_PARTITIONED_TASKS_IN_QUEUE[partitioned_core].fetch_add(1, Ordering::Relaxed);

let mut node_inner = MCSNode::new();
let mut data = self.data[partitioned_core].lock(&mut node_inner);
let internal_data = data.get_or_insert_with(EDFData::new);
internal_data.queue.push(PartitionedEDFTask {
    task: task.clone(),
    absolute_deadline,
    wake_time,
});
```
The bounds check here uses self.data.len() (NUM_MAX_CPU), but not the actual online CPU count (num_cpu()). A partitioned_core in [num_cpu(), NUM_MAX_CPU) will pass this check, be enqueued, and then never be dequeued/woken because no such CPU exists. It would be safer to validate against num_cpu() (and core != 0) and reject or remap the task.
Check it as follows.

```rust
let partitioned_core = partitioned_core as usize;
if partitioned_core >= num_cpu() || partitioned_core == 0 {
    panic!("PartitionedEDF: core {partitioned_core} is out of range");
}
```

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>
```rust
};

let partitioned_core = partitioned_core as usize;
if partitioned_core >= num_cpu() || partitioned_core == 0 {
```
Tasks::new already validates partitioned_core and falls back to Some(1) with a warning when invalid, but wake_task re-reads the original value from info.scheduler_type and panics here. The fallback path is therefore dead code: any task spawned with core == 0 or core >= num_cpu() will panic the kernel on first wake despite the friendly-warning intent. Read task.partitioned_core (the validated value) here, or rewrite info.scheduler_type at spawn time so both paths agree.
Use task's partitioned_core as follows.

```rust
let partitioned_core =
    task.partitioned_core.expect("Task has no partitioned core") as usize;
```

```rust
}

NUM_TASK_IN_QUEUE.fetch_add(1, Ordering::Release);
if self.partitioned_core.is_none() {
```
When a partitioned task panics, the wake path falls into the global panicked::SCHEDULER, but this conditional skips the NUM_TASK_IN_QUEUE increment because partitioned_core.is_some(). The partitioned scheduler's counter is also not touched in this path. Neither counter reflects the queued task, so wake_workers will not wake any CPU and the panicked task can sit indefinitely. Always increment NUM_TASK_IN_QUEUE for the panicked path.
Fixed. The increment condition in Task::wake() is now:

```rust
if panicked || self.partitioned_core.is_none() {
    NUM_TASK_IN_QUEUE.fetch_add(1, Ordering::Release);
}
```

When panicked is true the counter is always incremented regardless of partitioned_core, so wake_workers will correctly wake a CPU for the panicked task. The panicked task is then routed to panicked::SCHEDULER, which reads from NUM_TASK_IN_QUEUE.
```rust
};

pub struct PartitionedEDFScheduler {
    data: [Mutex<Option<EDFData>>; NUM_MAX_CPU], // Run queue.
```
Mutex<Option<EDFData>> plus get_or_insert_with(EDFData::new) adds a branch and an extra as_mut() to every wake_task / get_next. BinaryHeap::new() does not allocate until pushed, so the lazy init buys nothing — and the slot is created NUM_MAX_CPU times regardless. Use [Mutex<EDFData>; NUM_MAX_CPU] (or a BinaryHeap directly) and drop the Option.
Fixed. The field is now `[Mutex<EDFData>; NUM_MAX_CPU]`, initialized via a `const fn EDFData::new()` which returns an empty `BinaryHeap`. Since `BinaryHeap::new()` does not allocate, there is no cost difference compared to the Option-based lazy init, and the branch and `as_mut()` are gone from the hot path.
```diff
 let num_cpu = awkernel_lib::cpu::num_cpu();

-for i in 1..num_cpu {
+for (i, partitioned_tasks) in NUM_PARTITIONED_TASKS_IN_QUEUE[..num_cpus]
```
wake_workers iterates NUM_PARTITIONED_TASKS_IN_QUEUE[..num_cpus].iter().enumerate().skip(1) twice — first to skip CPUs that have partitioned work, then to wake those same CPUs. Each atomic load acts as an optimization barrier, doubling memory traffic on a hot path. Fuse into one loop: load the count once, wake_cpu and continue if > 0, otherwise fall through to the global wake logic.
Fixed. The second loop has been removed. The single loop now handles both cases in one pass: load NUM_PARTITIONED_TASKS_IN_QUEUE[i] once per CPU; if it is > 0 call wake_cpu(i) and continue; otherwise fall through to the global task logic in the same iteration.
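The single-pass wake loop described in this fix can be modeled in isolation as below. This is a hedged sketch, not the kernel's code: `wake_cpu` is a hypothetical callback standing in for the real IPI send, and the statics are simplified stand-ins for the counters named in the PR.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const NUM_MAX_CPU: usize = 8;
const ZERO: AtomicU32 = AtomicU32::new(0);

// Simplified stand-ins for the kernel's counters (names follow the PR).
static NUM_PARTITIONED_TASKS_IN_QUEUE: [AtomicU32; NUM_MAX_CPU] = [ZERO; NUM_MAX_CPU];
static NUM_TASK_IN_QUEUE: AtomicU32 = AtomicU32::new(0);

// Single pass: each per-CPU counter is loaded exactly once. A core with
// partitioned work is woken and skipped; otherwise it may receive one of
// the globally queued tasks.
fn wake_workers(num_cpu: usize, mut wake_cpu: impl FnMut(usize)) {
    let mut global = NUM_TASK_IN_QUEUE.load(Ordering::Acquire);
    for (i, partitioned) in NUM_PARTITIONED_TASKS_IN_QUEUE[..num_cpu]
        .iter()
        .enumerate()
        .skip(1) // core 0 (BSP) never runs partitioned tasks
    {
        if partitioned.load(Ordering::Relaxed) > 0 {
            wake_cpu(i); // partitioned work: wake this core and move on
            continue;
        }
        if global > 0 {
            wake_cpu(i); // otherwise hand out one global task
            global -= 1;
        }
    }
}

fn main() {
    NUM_PARTITIONED_TASKS_IN_QUEUE[2].store(1, Ordering::Relaxed);
    NUM_TASK_IN_QUEUE.store(1, Ordering::Release);
    let mut woken = Vec::new();
    wake_workers(4, |i| woken.push(i));
    // Core 1 gets the global task, core 2 its partitioned task; core 3 stays idle.
    assert_eq!(woken, vec![1, 2]);
    println!("ok");
}
```

The point of the fusion is that the per-CPU atomic is read once per iteration instead of once per loop, halving the atomic traffic on the wake path.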
```rust
    tasks
}

pub fn get_task_running(cpu_id: usize) -> RunningTask {
```
This new public function lacks a doc comment. The convention that task_id == 0 means "idle" and the valid range of cpu_id (< num_cpu()) are non-obvious, and the relationship to the existing get_tasks_running() is unclear. Add a /// comment covering return-value semantics and bounds.
Fixed. A doc comment has been added:

```rust
/// Get the running task for the given CPU.
///
/// `cpu_id` must be less than `awkernel_lib::cpu::num_cpu()`.
/// The returned [`RunningTask::task_id`] is `0` when the CPU is idle.
///
/// To get the running tasks for all CPUs at once, use [`get_tasks_running`].
```
```rust
    )
}

pub const fn partitioned_core(&self) -> Option<u16> {
```
This new public method has no doc comment. The semantics of Some(n) (the assigned core index, expected to be in 1..num_cpu()) and the fact that all non-PartitionedEDF variants return None should be documented for consumers.
Fixed. A doc comment has been added:

```rust
/// Return the partitioned core index if this is a [`SchedulerType::PartitionedEDF`] scheduler.
///
/// Returns `Some(n)` where `n` is the CPU core index (`1..num_cpu()`) assigned to the
/// partitioned EDF scheduler. Returns `None` for all other scheduler variants.
```
Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>
Address review feedback from atsushi421 on PR tier4#686. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…or partitioned tasks

- wake(): change the counter increment so that panicked and non-partitioned tasks both use NUM_TASK_IN_QUEUE. Non-panicked partitioned tasks are left to partitioned_edf::wake_task(), which was already the original design. This removes the double-increment introduced in 848a478, where both task.rs and partitioned_edf.rs incremented NUM_PARTITIONED_TASKS_IN_QUEUE for the same non-preempted task.
- spawn(): validate and normalise the partitioned core *before* creating TaskInfo, so info.scheduler_type and task.partitioned_core always agree after a fallback to core 1. Previously info.scheduler_type could still hold the original invalid core number while task.partitioned_core was 1.

Addresses review feedback from atsushi421 on PR tier4#686. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tionedTask<T>

Add PartitionedTask<T> to scheduler.rs — a RAII smart pointer that increments NUM_PARTITIONED_TASKS_IN_QUEUE[cpu_id] on construction and decrements it on Drop. The PartitionedEDF queue now stores BinaryHeap<PartitionedTask<PartitionedEDFTask>> instead of a bare BinaryHeap<PartitionedEDFTask>. This removes all manual fetch_add / fetch_sub calls from partitioned_edf.rs and the outer fetch_sub from get_next_task():

- Terminated/Panicked entries: Drop fires automatically when `continue` is reached in the loop.
- Valid entries: Drop fires after entry.task.clone() evaluates, decrementing exactly once per dequeue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Change PartitionedTask<T>::inner from T to Option<T> and add a take() method. The counter is now decremented exactly once:

- take(): explicitly transfers ownership and decrements (used when returning a valid task from get_next).
- Drop: decrements only when inner.is_some(), i.e. take() was not called (used for Terminated/Panicked tasks discarded in the loop).

This avoids Arc::clone() in get_next() — take().unwrap().task moves the Arc directly — and makes the decrement site unambiguous.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove Deref impl (required unwrap on Option<T>)
- Delegate Ord/PartialEq to Option<T>'s built-in impls (no unwrap needed)
- Replace unwrap+Deref access in get_next() with let-else + take() pattern

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
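The final `PartitionedTask<T>` design from the commits above (RAII increment on construction; a single decrement via either `take()` or `Drop`) can be modeled in isolation as follows. The struct shape follows the commit messages, but the surrounding kernel types are omitted, so treat this as a sketch with illustrative names rather than the actual implementation.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const NUM_MAX_CPU: usize = 4;
const ZERO: AtomicU32 = AtomicU32::new(0);
static NUM_PARTITIONED_TASKS_IN_QUEUE: [AtomicU32; NUM_MAX_CPU] = [ZERO; NUM_MAX_CPU];

// RAII counter guard: increments the per-core counter on construction and
// decrements it exactly once, either via take() or in Drop.
struct PartitionedTask<T> {
    inner: Option<T>,
    cpu_id: usize,
}

impl<T> PartitionedTask<T> {
    fn new(inner: T, cpu_id: usize) -> Self {
        NUM_PARTITIONED_TASKS_IN_QUEUE[cpu_id].fetch_add(1, Ordering::Relaxed);
        Self { inner: Some(inner), cpu_id }
    }

    // Transfer ownership out and decrement now; Drop then sees None and skips.
    fn take(mut self) -> Option<T> {
        let inner = self.inner.take();
        if inner.is_some() {
            NUM_PARTITIONED_TASKS_IN_QUEUE[self.cpu_id].fetch_sub(1, Ordering::Relaxed);
        }
        inner
    }
}

impl<T> Drop for PartitionedTask<T> {
    fn drop(&mut self) {
        // Only decrement if take() was never called.
        if self.inner.is_some() {
            NUM_PARTITIONED_TASKS_IN_QUEUE[self.cpu_id].fetch_sub(1, Ordering::Relaxed);
        }
    }
}

fn main() {
    let a = PartitionedTask::new("task-a", 1);
    let b = PartitionedTask::new("task-b", 1);
    assert_eq!(NUM_PARTITIONED_TASKS_IN_QUEUE[1].load(Ordering::Relaxed), 2);

    drop(a); // discarded entry (e.g. Terminated/Panicked): Drop decrements
    assert_eq!(NUM_PARTITIONED_TASKS_IN_QUEUE[1].load(Ordering::Relaxed), 1);

    let task = b.take().unwrap(); // valid entry: take() decrements, moves the value
    assert_eq!(task, "task-b");
    assert_eq!(NUM_PARTITIONED_TASKS_IN_QUEUE[1].load(Ordering::Relaxed), 0);
    println!("ok");
}
```

The `Option<T>` inner field is what makes the decrement site unambiguous: whichever of `take()` and `Drop` runs first sets or sees `None`, so the counter can never be decremented twice for one entry.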
Description
This PR adds a Partitioned EDF (Earliest Deadline First) scheduler (`SchedulerType::PartitionedEDF(deadline, core)`), which statically assigns tasks to specific CPU cores and applies EDF scheduling independently within each core.

Key design
- Each core keeps a `BinaryHeap` ordered by absolute deadline, ensuring scheduling decisions on one core do not affect others.
- Tasks are spawned with `PartitionedEDF(relative_deadline_us, core_id)`. The assigned core is stored in `Task::partitioned_core`.
- Preemption compares against `max(running_task, pending_preemption)` to avoid unnecessary preemption inversion.
- `PartitionedEDF` sits at the top of `PRIORITY_LIST`, above `GEDF`. CPUs with partitioned tasks in queue exclusively serve those tasks.
- Uses `gedf::calculate_and_update_dag_deadline` for DAG-aware task graphs.
- `wake_workers` wakes cores with partitioned tasks via a dedicated loop, separately from the general non-partitioned task distribution.

Files
- `awkernel_async_lib/src/scheduler/partitioned_edf.rs`
- `awkernel_async_lib/src/scheduler.rs`: `PartitionedEDF` variant, `get_num_partitioned_schedulers()` const fn, updated `get_next_task`
- `awkernel_async_lib/src/task.rs`: `Task::partitioned_core`, `NUM_PARTITIONED_TASKS_IN_QUEUE`, `get_task_running`, updated `wake_workers`
- `applications/tests/test_partitioned_edf/`

Related links
How was this PR tested?
- `make x86_64 RELEASE=1` passes with no errors.
- Userland test (`test_partitioned_edf`): built and linked via `--features test_partitioned_edf`. Three test scenarios are implemented:
  - Pinning: each task logs `cpu_id()` and checks it matches the assigned core; a mismatch is reported as `[FAIL]`.
  - Preemption: `heavy` (deadline=9900ms) and `light` (deadline=990ms) are pinned to core 1; `light` is expected to preempt `heavy` based on EDF ordering, observable from start/end timestamps in the log.
  - Multi-core independence: tasks pinned to different cores (skipped when `num_cpu < 3`) run in parallel; each task's `cpu_id()` log confirms no cross-core migration.

To run:
Notes for reviewers
- `get_num_partitioned_schedulers()` is a `const fn` that counts leading `PartitionedEDF` entries in `PRIORITY_LIST` at compile time. If a new partitioned scheduler type is added, it must be placed at the front of `PRIORITY_LIST` and the `matches!` pattern updated accordingly.
- `wake_workers` skips core 0 (`.skip(1)`), and the task spawn path emits `log::warn!` if `core == 0` or `core >= num_cpu()` is given.
- `invoke_preemption` compares the new task against `max(running_task, pending_preemption)`, mirroring the GEDF implementation. This prevents unnecessary preemption when a higher-priority task is already pending.
- `NUM_TASK_IN_QUEUE` was changed from `AtomicU64` to `AtomicU32` on 64-bit targets to match the 32-bit path; task counts do not require 64-bit range.
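The compile-time counting of leading `PartitionedEDF` entries can be sketched like this. The enum and `PRIORITY_LIST` contents are illustrative assumptions; only the `const fn` + `matches!` counting pattern mirrors the note above.

```rust
// Illustrative scheduler kinds; not the kernel's actual enum.
#[derive(Clone, Copy)]
enum SchedulerKind {
    PartitionedEDF,
    GEDF,
    PrioritizedFIFO,
}

// Partitioned entries must stay at the front for the count to be correct.
const PRIORITY_LIST: [SchedulerKind; 3] = [
    SchedulerKind::PartitionedEDF,
    SchedulerKind::GEDF,
    SchedulerKind::PrioritizedFIFO,
];

// Counts leading PartitionedEDF entries; stops at the first other variant.
// Evaluated at compile time because it is a const fn used in a const item.
const fn get_num_partitioned_schedulers() -> usize {
    let mut n = 0;
    while n < PRIORITY_LIST.len() {
        if !matches!(PRIORITY_LIST[n], SchedulerKind::PartitionedEDF) {
            break;
        }
        n += 1;
    }
    n
}

const NUM_PARTITIONED: usize = get_num_partitioned_schedulers();

fn main() {
    assert_eq!(NUM_PARTITIONED, 1);
    println!("ok");
}
```

Because the scan stops at the first non-partitioned variant, inserting a new partitioned scheduler anywhere except the front of `PRIORITY_LIST` would silently make the count wrong, which is exactly the caveat the note raises.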