feat(sched): add Partitioned EDF scheduler#686

Open
ytakano wants to merge 22 commits into tier4:main from ytakano:partitioned_edf

Conversation


@ytakano ytakano commented Apr 27, 2026

Description

This PR adds a Partitioned EDF (Earliest Deadline First) scheduler (SchedulerType::PartitionedEDF(deadline, core)), which statically assigns tasks to specific CPU cores and applies EDF scheduling independently within each core.

Key design

  • Per-CPU run queues: Each core maintains its own BinaryHeap ordered by absolute deadline, ensuring scheduling decisions on one core do not affect others.
  • Static task-to-core binding: Tasks are pinned to a designated core at spawn time via PartitionedEDF(relative_deadline_us, core_id). The assigned core is stored in Task::partitioned_core.
  • Intra-core EDF preemption: When a newly woken task has an earlier deadline than the task currently running on the same core, an IPI is sent to trigger preemption. The preemption target is determined as max(running_task, pending_preemption) to avoid unnecessary preemption inversion.
  • Priority over GEDF: PartitionedEDF sits at the top of PRIORITY_LIST, above GEDF. CPUs with partitioned tasks in queue exclusively serve those tasks.
  • DAG support: Absolute deadline calculation reuses gedf::calculate_and_update_dag_deadline for DAG-aware task graphs.
  • Worker wake-up: wake_workers wakes cores with partitioned tasks via a dedicated loop, separately from the general non-partitioned task distribution.
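The per-CPU run queue design above can be illustrated with a minimal standalone sketch. QueuedTask and the queue layout are hypothetical stand-ins for the PR's PartitionedEDFTask and per-core state, not the actual implementation:

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Hypothetical stand-in for the PR's PartitionedEDFTask; only the deadline
// ordering matters for this sketch.
#[derive(Eq, PartialEq)]
struct QueuedTask {
    task_id: u64,
    absolute_deadline_us: u64,
}

// BinaryHeap is a max-heap, so reverse the comparison: the task with the
// earliest absolute deadline is popped first (EDF order).
impl Ord for QueuedTask {
    fn cmp(&self, other: &Self) -> Ordering {
        other.absolute_deadline_us.cmp(&self.absolute_deadline_us)
    }
}
impl PartialOrd for QueuedTask {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

fn main() {
    // One independent heap per core: operations on core 1 never touch core 2.
    let mut queues: Vec<BinaryHeap<QueuedTask>> = (0..4).map(|_| BinaryHeap::new()).collect();
    queues[1].push(QueuedTask { task_id: 10, absolute_deadline_us: 9_900_000 });
    queues[1].push(QueuedTask { task_id: 11, absolute_deadline_us: 990_000 });
    // The earlier deadline (task 11) wins within core 1.
    assert_eq!(queues[1].pop().unwrap().task_id, 11);
    assert!(queues[2].pop().is_none());
}
```

Because each heap is guarded independently (a per-core Mutex in the PR), scheduling decisions on one core need no cross-core synchronization beyond the wake/preemption path.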

Files

File Change
awkernel_async_lib/src/scheduler/partitioned_edf.rs New scheduler implementation
awkernel_async_lib/src/scheduler.rs Add PartitionedEDF variant, get_num_partitioned_schedulers() const fn, updated get_next_task
awkernel_async_lib/src/task.rs Add Task::partitioned_core, NUM_PARTITIONED_TASKS_IN_QUEUE, get_task_running, updated wake_workers
applications/tests/test_partitioned_edf/ Test application (core pinning, EDF preemption, multi-core independence)

Related links

How was this PR tested?

  • Build: make x86_64 RELEASE=1 passes with no errors.
  • Test application (test_partitioned_edf): built and linked via --features test_partitioned_edf. Three test scenarios are implemented:
    1. Core pinning — each task logs cpu_id() and checks it matches the assigned core; mismatch is reported as [FAIL].
    2. EDF preemption — heavy (deadline=9900ms) and light (deadline=990ms) are pinned to core 1; light is expected to preempt heavy based on EDF ordering, observable from start/end timestamps in the log.
    3. Multi-core independence — two tasks on core 1 and core 2 (skipped when num_cpu < 3) run in parallel; each task's cpu_id() log confirms no cross-core migration.
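As a rough illustration of how scenario 1 can be evaluated from the log, here is a minimal sketch; check_pinning and the log tuple format are hypothetical, not the test application's actual code:

```rust
// Hypothetical sketch: each logged record is (task_name, assigned_core,
// observed_cpu_id); any mismatch between the assigned and observed core is
// reported as a [FAIL] line, mirroring the test's pass/fail convention.
fn check_pinning(logs: &[(&str, usize, usize)]) -> Vec<String> {
    logs.iter()
        .filter(|(_, assigned, observed)| assigned != observed)
        .map(|(name, assigned, observed)| {
            format!("[FAIL] {name}: assigned core {assigned}, ran on core {observed}")
        })
        .collect()
}

fn main() {
    let logs = [("task_a", 1, 1), ("task_b", 2, 3)];
    let failures = check_pinning(&logs);
    assert_eq!(failures.len(), 1);
    assert!(failures[0].starts_with("[FAIL] task_b"));
}
```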

To run:

RUSTFLAGS="-C panic=unwind" cargo +nightly-2025-11-16 x86 --release --features test_partitioned_edf

Notes for reviewers

  • get_num_partitioned_schedulers() is a const fn that counts leading PartitionedEDF entries in PRIORITY_LIST at compile time. If a new partitioned scheduler type is added, it must be placed at the front of PRIORITY_LIST and the matches! pattern updated accordingly.
  • Core ID 0 (BSP) is intentionally excluded from partitioned task assignment. wake_workers skips core 0 (.skip(1)), and the task spawn path emits log::warn! if core == 0 or core >= num_cpu() is given.
  • The preemption logic in invoke_preemption compares the new task against max(running_task, pending_preemption), mirroring the GEDF implementation. This prevents unnecessary preemption when a higher-priority task is already pending.
  • NUM_TASK_IN_QUEUE was changed from AtomicU64 to AtomicU32 on 64-bit targets to match the 32-bit path; task counts do not require 64-bit range.
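The compile-time counting trick from the first note can be sketched standalone; the enum and list below are simplified stand-ins for the real SchedulerType and PRIORITY_LIST:

```rust
// Simplified stand-ins for the kernel's scheduler types and PRIORITY_LIST.
#[derive(Clone, Copy)]
enum Kind {
    PartitionedEdf,
    Gedf,
    PrioritizedFifo,
}

const PRIORITY_LIST: [Kind; 3] = [Kind::PartitionedEdf, Kind::Gedf, Kind::PrioritizedFifo];

// Counts leading PartitionedEdf entries at compile time. As the note says, a
// new partitioned scheduler type must go at the front of the list and be added
// to this matches! pattern.
const fn num_leading_partitioned(list: &[Kind]) -> usize {
    let mut n = 0;
    while n < list.len() && matches!(list[n], Kind::PartitionedEdf) {
        n += 1;
    }
    n
}

const NUM_PARTITIONED: usize = num_leading_partitioned(&PRIORITY_LIST);

fn main() {
    assert_eq!(NUM_PARTITIONED, 1);
}
```

Because the count is a const, the scheduler-selection code can branch on it with no runtime cost.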

ytakano added 14 commits April 27, 2026 11:54
Clone the selected network device while holding the net manager read lock, then release the lock before invoking the debug dump path.

This keeps potentially slow diagnostic dumping outside the shared manager lock and matches the existing interface operation pattern.
Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>
@ytakano ytakano requested a review from Copilot April 27, 2026 10:16
@ytakano ytakano marked this pull request as ready for review April 27, 2026 10:16

Copilot AI left a comment


Pull request overview

This PR introduces a new Partitioned EDF (Earliest Deadline First) scheduling mode that pins tasks to specific CPU cores and runs EDF independently per core, along with a userland test application and documentation updates.

Changes:

  • Add SchedulerType::PartitionedEDF(deadline, core) and a new partitioned_edf scheduler implementation with per-core run queues and IPI-based preemption.
  • Track partitioned runnable-task counts per CPU and update worker wake-up logic to prioritize cores with partitioned work.
  • Add a test_partitioned_edf application and wire it into userland/kernel feature flags; update scheduler documentation/output HTML.

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.

File Description
userland/src/lib.rs Adds feature-gated invocation of the new test app.
userland/Cargo.toml Adds optional dependency + feature flag for test_partitioned_edf.
kernel/Cargo.toml Wires kernel feature to userland feature for the test.
awkernel_async_lib/src/scheduler.rs Adds PartitionedEDF variant and integrates it into scheduler selection/priority.
awkernel_async_lib/src/scheduler/partitioned_edf.rs Implements the new partitioned EDF scheduler with per-core queues and preemption.
awkernel_async_lib/src/task.rs Adds Task::partitioned_core, per-core queue counters, and adjusts worker wake-up logic.
applications/tests/test_partitioned_edf/src/lib.rs Implements test scenarios for pinning, preemption, and multi-core independence.
applications/tests/test_partitioned_edf/Cargo.toml Adds the new test application crate.
mdbook/src/internal/scheduler.md Documents the new scheduler and updates scheduler directory listing.
docs/print.html Regenerates/updates rendered documentation with PartitionedEDF content.
docs/internal/scheduler.html Regenerates/updates rendered documentation with PartitionedEDF content.


Comment thread kernel/Cargo.toml
Comment on lines +95 to +102
    let mut node = MCSNode::new();
    let _guard = GLOBAL_WAKE_GET_MUTEX.lock(&mut node);
    if !self.invoke_preemption(task.clone()) {
    let partitioned_core = partitioned_core as usize;
    if partitioned_core >= self.data.len() {
        panic!("PartitionedEDF: core {partitioned_core} exceeds max supported CPU count");
    }



Copilot AI Apr 27, 2026


wake_task() calls invoke_preemption() before validating partitioned_core. invoke_preemption() indexes per-CPU state using task.partitioned_core (and expects it), so an invalid core_id (e.g. >= NUM_MAX_CPU) will panic before the later bounds check in wake_task(). Validate partitioned_core (non-zero, < num_cpu(), and < NUM_MAX_CPU) before invoking preemption, or pass a validated core into invoke_preemption() so it can’t index out of range.

Suggested change, from:

    let mut node = MCSNode::new();
    let _guard = GLOBAL_WAKE_GET_MUTEX.lock(&mut node);
    if !self.invoke_preemption(task.clone()) {
    let partitioned_core = partitioned_core as usize;
    if partitioned_core >= self.data.len() {
        panic!("PartitionedEDF: core {partitioned_core} exceeds max supported CPU count");
    }

to:

    let partitioned_core = partitioned_core as usize;
    if partitioned_core == 0
        || partitioned_core >= awkernel_lib::cpu::num_cpu()
        || partitioned_core >= awkernel_lib::cpu::NUM_MAX_CPU
        || partitioned_core >= self.data.len()
    {
        panic!("PartitionedEDF: core {partitioned_core} exceeds max supported CPU count");
    }
    let mut node = MCSNode::new();
    let _guard = GLOBAL_WAKE_GET_MUTEX.lock(&mut node);
    if !self.invoke_preemption(task.clone()) {

Collaborator Author


Check this as follows.

        let partitioned_core = partitioned_core as usize;
        if partitioned_core >= num_cpu() || partitioned_core == 0 {
            panic!("PartitionedEDF: core {partitioned_core} is out of range");
        }

Comment thread awkernel_async_lib/src/task.rs Outdated
Comment on lines +308 to +317
    let partitioned_core = scheduler_type.partitioned_core();
    if let Some(i) = partitioned_core {
        if i == 0 || i >= num_cpu() as u16 {
            log::warn!(
                "Partitioned core should be between 1 and {}. It is addressed as {}.",
                num_cpu() - 1,
                i
            );
        }
    }

Copilot AI Apr 27, 2026


Invalid PartitionedEDF core IDs are only logged as warnings during spawn, but the task is still created with partitioned_core = Some(i). If i == 0 this contradicts the stated design (BSP excluded), and if i >= num_cpu() (but < NUM_MAX_CPU) the task will be enqueued for a non-existent CPU and effectively never run (and may also lead to panics in preemption paths). Consider failing fast (panic / return an error) or falling back to a non-partitioned scheduler when the core ID is out of range.

Suggested change, from:

    let partitioned_core = scheduler_type.partitioned_core();
    if let Some(i) = partitioned_core {
        if i == 0 || i >= num_cpu() as u16 {
            log::warn!(
                "Partitioned core should be between 1 and {}. It is addressed as {}.",
                num_cpu() - 1,
                i
            );
        }
    }

to:

    let partitioned_core = match scheduler_type.partitioned_core() {
        Some(i) if i == 0 || i >= num_cpu() as u16 => {
            log::warn!(
                "Partitioned core should be between 1 and {}. It is addressed as {}. Falling back to a non-partitioned task.",
                num_cpu() - 1,
                i
            );
            None
        }
        partitioned_core => partitioned_core,
    };

Collaborator Author


Fallback to core 1 as follows.

                let partitioned_core = if let Some(i) = scheduler_type.partitioned_core() {
                    if i == 0 || i >= num_cpu() as u16 {
                        log::warn!(
                            "Partitioned core should be between 1 and {}. Falling back to core 1. Given core: {}",
                            num_cpu() - 1,
                            i
                        );
                        Some(1)
                    } else {
                        Some(i)
                    }
                } else {
                    None
                };

Comment on lines +98 to +112
    let partitioned_core = partitioned_core as usize;
    if partitioned_core >= self.data.len() {
        panic!("PartitionedEDF: core {partitioned_core} exceeds max supported CPU count");
    }

    NUM_PARTITIONED_TASKS_IN_QUEUE[partitioned_core].fetch_add(1, Ordering::Relaxed);

    let mut node_inner = MCSNode::new();
    let mut data = self.data[partitioned_core].lock(&mut node_inner);
    let internal_data = data.get_or_insert_with(EDFData::new);
    internal_data.queue.push(PartitionedEDFTask {
        task: task.clone(),
        absolute_deadline,
        wake_time,
    });

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bounds check here uses self.data.len() (NUM_MAX_CPU), but not the actual online CPU count (num_cpu()). A partitioned_core in [num_cpu(), NUM_MAX_CPU) will pass this check, be enqueued, and then never be dequeued/woken because no such CPU exists. It would be safer to validate against num_cpu() (and core != 0) and reject or remap the task.

Collaborator Author


Check it as follows.

        let partitioned_core = partitioned_core as usize;
        if partitioned_core >= num_cpu() || partitioned_core == 0 {
            panic!("PartitionedEDF: core {partitioned_core} is out of range");
        }

ytakano added 2 commits April 28, 2026 00:29
Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>
    };

    let partitioned_core = partitioned_core as usize;
    if partitioned_core >= num_cpu() || partitioned_core == 0 {
Contributor


Tasks::new already validates partitioned_core and falls back to Some(1) with a warning when invalid, but wake_task re-reads the original value from info.scheduler_type and panics here. The fallback path is therefore dead code: any task spawned with core == 0 or core >= num_cpu() will panic the kernel on first wake despite the friendly-warning intent. Read task.partitioned_core (the validated value) here, or rewrite info.scheduler_type at spawn time so both paths agree.

Collaborator Author


Use task's partitioned_core as follows.

        let partitioned_core =
            task.partitioned_core.expect("Task has no partitioned core") as usize;

Comment thread awkernel_async_lib/src/task.rs Outdated
    }

    NUM_TASK_IN_QUEUE.fetch_add(1, Ordering::Release);
    if self.partitioned_core.is_none() {
Contributor


When a partitioned task panics, the wake path falls into the global panicked::SCHEDULER, but this conditional skips the NUM_TASK_IN_QUEUE increment because partitioned_core.is_some(). The partitioned scheduler's counter is also not touched in this path. Neither counter reflects the queued task, so wake_workers will not wake any CPU and the panicked task can sit indefinitely. Always increment NUM_TASK_IN_QUEUE for the panicked path.

Collaborator Author


Fixed. The increment condition in Task::wake() is now:

if panicked || self.partitioned_core.is_none() {
    NUM_TASK_IN_QUEUE.fetch_add(1, Ordering::Release);
}

When panicked is true the counter is always incremented regardless of partitioned_core, so wake_workers will correctly wake a CPU for the panicked task. The panicked task is then routed to panicked::SCHEDULER which reads from NUM_TASK_IN_QUEUE.

    };

    pub struct PartitionedEDFScheduler {
        data: [Mutex<Option<EDFData>>; NUM_MAX_CPU], // Run queue.
Contributor


Mutex<Option<EDFData>> plus get_or_insert_with(EDFData::new) adds a branch and an extra as_mut() to every wake_task / get_next. BinaryHeap::new() does not allocate until pushed, so the lazy init buys nothing — and the slot is created NUM_MAX_CPU times regardless. Use [Mutex<EDFData>; NUM_MAX_CPU] (or a BinaryHeap directly) and drop the Option.

Collaborator Author


Fixed. The field is now [Mutex<EDFData>; NUM_MAX_CPU], initialized via const fn EDFData::new(), which returns an EDFData containing an empty BinaryHeap. Since BinaryHeap::new() does not allocate, there is no cost difference compared to the Option-based lazy init, and the branch and as_mut() are gone from the hot path.

    let num_cpu = awkernel_lib::cpu::num_cpu();

    for i in 1..num_cpu {
    for (i, partitioned_tasks) in NUM_PARTITIONED_TASKS_IN_QUEUE[..num_cpus]
Contributor


wake_workers iterates NUM_PARTITIONED_TASKS_IN_QUEUE[..num_cpus].iter().enumerate().skip(1) twice — first to skip CPUs that have partitioned work, then to wake those same CPUs. Each atomic load acts as an optimization barrier, doubling memory traffic on a hot path. Fuse into one loop: load the count once, wake_cpu and continue if > 0, otherwise fall through to the global wake logic.

Collaborator Author


Fixed. The second loop has been removed. The single loop now handles both cases in one pass: load NUM_PARTITIONED_TASKS_IN_QUEUE[i] once per CPU; if it is > 0 call wake_cpu(i) and continue; otherwise fall through to the global task logic in the same iteration.
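
The fused single-pass loop can be sketched as follows; the counter array, wake_cpu callback, and the global fall-through are simplified placeholders for the kernel's actual wake_workers:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const NUM_MAX_CPU: usize = 8;

// Simplified stand-in for NUM_PARTITIONED_TASKS_IN_QUEUE.
#[allow(clippy::declare_interior_mutable_const)]
const ZERO: AtomicU32 = AtomicU32::new(0);
static PARTITIONED_IN_QUEUE: [AtomicU32; NUM_MAX_CPU] = [ZERO; NUM_MAX_CPU];

// Single pass over worker CPUs (core 0, the BSP, is skipped): one atomic load
// per CPU decides whether to wake it for partitioned work or to fall through
// to the global (non-partitioned) wake logic.
fn wake_workers(num_cpu: usize, mut wake_cpu: impl FnMut(usize)) {
    for (i, counter) in PARTITIONED_IN_QUEUE[..num_cpu].iter().enumerate().skip(1) {
        if counter.load(Ordering::Relaxed) > 0 {
            wake_cpu(i); // this core has partitioned tasks queued
            continue;
        }
        // ...global task wake logic would run here for CPU i...
    }
}

fn main() {
    PARTITIONED_IN_QUEUE[2].store(1, Ordering::Relaxed);
    let mut woken = Vec::new();
    wake_workers(4, |i| woken.push(i));
    assert_eq!(woken, vec![2]);
}
```

Loading each counter exactly once per iteration halves the atomic traffic of the two-loop version while preserving the "partitioned work first" priority.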

    tasks
    }

    pub fn get_task_running(cpu_id: usize) -> RunningTask {
Contributor


This new public function lacks a doc comment. The convention that task_id == 0 means "idle" and the valid range of cpu_id (< num_cpu()) are non-obvious, and the relationship to the existing get_tasks_running() is unclear. Add a /// comment covering return-value semantics and bounds.

Collaborator Author


Fixed. A doc comment has been added:

    /// Get the running task for the given CPU.
    ///
    /// `cpu_id` must be less than `awkernel_lib::cpu::num_cpu()`.
    /// The returned [`RunningTask::task_id`] is `0` when the CPU is idle.
    ///
    /// To get the running tasks for all CPUs at once, use [`get_tasks_running`].

    )
    }

    pub const fn partitioned_core(&self) -> Option<u16> {
Contributor


This new public method has no doc comment. The semantics of Some(n) (the assigned core index, expected to be in 1..num_cpu()) and the fact that all non-PartitionedEDF variants return None should be documented for consumers.

Collaborator Author


Fixed. A doc comment has been added:

    /// Return the partitioned core index if this is a [`SchedulerType::PartitionedEDF`] scheduler.
    ///
    /// Returns `Some(n)` where `n` is the CPU core index (`1..num_cpu()`) assigned to the
    /// partitioned EDF scheduler. Returns `None` for all other scheduler variants.

ytakano and others added 6 commits April 28, 2026 17:43
Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>
Address review feedback from atsushi421 on PR tier4#686.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…or partitioned tasks

- wake(): change counter increment so that panicked and non-partitioned
  tasks both use NUM_TASK_IN_QUEUE. Non-panicked partitioned tasks are
  left to partitioned_edf::wake_task(), which was already the original
  design. This removes the double-increment introduced in 848a478 where
  both task.rs and partitioned_edf.rs incremented NUM_PARTITIONED_TASKS_IN_QUEUE
  for the same non-preempted task.
- spawn(): validate and normalise the partitioned core *before* creating
  TaskInfo, so info.scheduler_type and task.partitioned_core always agree
  after a fallback to core 1. Previously info.scheduler_type could still
  hold the original invalid core number while task.partitioned_core was 1.

Addresses review feedback from atsushi421 on PR tier4#686.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tionedTask<T>

Add PartitionedTask<T> to scheduler.rs — a RAII smart pointer that
increments NUM_PARTITIONED_TASKS_IN_QUEUE[cpu_id] on construction and
decrements it on Drop. The PartitionedEDF queue now stores
BinaryHeap<PartitionedTask<PartitionedEDFTask>> instead of a bare
BinaryHeap<PartitionedEDFTask>.

This removes all manual fetch_add / fetch_sub calls from
partitioned_edf.rs and the outer fetch_sub from get_next_task():
- Terminated/Panicked entries: Drop fires automatically when
  `continue` is reached in the loop.
- Valid entries: Drop fires after entry.task.clone() evaluates,
  decrementing exactly once per dequeue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Change PartitionedTask<T>::inner from T to Option<T> and add a take()
method. The counter is now decremented exactly once:
- take(): explicitly transfers ownership and decrements (used when
  returning a valid task from get_next).
- Drop: decrements only when inner.is_some(), i.e. take() was not
  called (used for Terminated/Panicked tasks discarded in the loop).

This avoids Arc::clone() in get_next() — take().unwrap().task moves
the Arc directly — and makes the decrement site unambiguous.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
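
The take()/Drop protocol described in this commit can be sketched standalone. CountedTask and the single global counter are hypothetical simplifications of PartitionedTask<T> and the per-CPU counter array:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Simplified stand-in for NUM_PARTITIONED_TASKS_IN_QUEUE[cpu_id].
static COUNTER: AtomicU32 = AtomicU32::new(0);

// RAII wrapper: construction increments the counter; exactly one decrement
// happens later, either via take() or via Drop.
struct CountedTask<T> {
    inner: Option<T>,
}

impl<T> CountedTask<T> {
    fn new(inner: T) -> Self {
        COUNTER.fetch_add(1, Ordering::Relaxed);
        Self { inner: Some(inner) }
    }

    // Explicitly transfers ownership and decrements; Drop then sees None
    // and does nothing, so the decrement cannot fire twice.
    fn take(mut self) -> Option<T> {
        let inner = self.inner.take();
        if inner.is_some() {
            COUNTER.fetch_sub(1, Ordering::Relaxed);
        }
        inner
    }
}

impl<T> Drop for CountedTask<T> {
    fn drop(&mut self) {
        // Decrements only when take() was never called (e.g. a Terminated or
        // Panicked entry discarded inside the dequeue loop).
        if self.inner.is_some() {
            COUNTER.fetch_sub(1, Ordering::Relaxed);
        }
    }
}

fn main() {
    let a = CountedTask::new("valid");
    let b = CountedTask::new("terminated");
    assert_eq!(COUNTER.load(Ordering::Relaxed), 2);
    drop(b); // discarded entry: Drop decrements
    assert_eq!(COUNTER.load(Ordering::Relaxed), 1);
    let v = a.take(); // explicit transfer: decrements exactly once
    assert_eq!(v, Some("valid"));
    assert_eq!(COUNTER.load(Ordering::Relaxed), 0);
}
```

The Option<T> inner field is what makes the decrement site unambiguous: whichever of take() or drop() runs first clears it, and the other becomes a no-op.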
- Remove Deref impl (required unwrap on Option<T>)
- Delegate Ord/PartialEq to Option<T>'s built-in impls (no unwrap needed)
- Replace unwrap+Deref access in get_next() with let-else + take() pattern

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

3 participants