Sort incoming device tasks using heap by devreal · Pull Request #784 · ICLDisco/parsec

devreal · 2026-06-04T21:20:08Z

Currently, sorting of device tasks is broken (they don't actually contain a priority field) and linear over a slice of the input list. Instead, we can use a max-heap to keep a sorted array of device tasks, push all new tasks into the heap, and take out the highest priority task.

Complexity of inserting the ring into the heap is O(N+M) or O(N*logM), depending on the whether we just insert elements one by one or rebuild from scratch (beneficial if we insert more tasks than exist already).

We can detach the full lifo at once to avoid repeated atomic operations by the management thread.

Also, use nolock variants on stream-local lists and change to lifo for the shared queue to avoid the lock of the fifo. They are only modified by the device management thread. Inter-thread communication happens through the pending queue.

Once tasks are trickling through the system they do not need to be sorted anymore.

Currently, sorting of device tasks is broken (they don't actually contain a priority field) and linear over a slice of the input list. Instead, we can use a max-heap to keep a sorted array of device tasks, push all new tasks into the heap, and take out the highest priority task. Complexity of inserting the ring into the heap is O(N+M) or O(N*logM), depending on the whether we just insert elements one by one or rebuild from scratch (beneficial if we insert more tasks than exist already). We can detach the full lifo at once to avoid repeated atomic operations by the management thread. Also, use nolock variants on stream-local lists and change to lifo for the shared queue to avoid the lock of the fifo. They are only modified by the device management thread. Inter-thread communication happens through the pending queue. Once tasks are trickling through the system they do not need to be sorted anymore. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

Copilot

Pull request overview

This PR reworks GPU device task intake to restore proper priority-based selection and reduce overhead by switching from FIFO-based pending queues and ad-hoc sorting to a lock-free LIFO intake plus a manager-private max-heap for prioritized pop.

Changes:

Replace shared pending queues in CUDA and Level Zero devices with a lock-free parsec_lifo_t, and introduce a pending_heap (parsec_heap_*) used by the device management thread to pop the highest-priority task.
Remove the old “sort pending tasks” mechanism and MCA parameters (sort_pending_tasks) along with the associated function pointer in parsec_device_module_t.
Add parsec/class/parsec_heap.{h,c} and integrate it into the build, plus add parsec_lifo_detach_all() helpers to batch-move tasks with fewer atomic operations.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
parsec/mca/device/level_zero/device_level_zero_module.c	Switch Level Zero pending queue to LIFO and initialize/finalize the new pending heap.
parsec/mca/device/level_zero/device_level_zero_component.c	Remove the Level Zero MCA knob and wiring for pending-task sorting.
parsec/mca/device/device.h	Remove the `sort_pending_list` function pointer from the device module interface.
parsec/mca/device/device_gpu.h	Introduce `pq_priority`, swap pending type to LIFO, and add a manager-private pending heap field.
parsec/mca/device/device_gpu.c	Detach all pending tasks from LIFO, assign heap priorities, push into heap, and pop highest-priority task; switch stream-local queues to nolock list ops.
parsec/mca/device/cuda/device_cuda_module.c	Switch CUDA pending queue to LIFO and initialize/finalize the new pending heap.
parsec/mca/device/cuda/device_cuda_component.c	Remove the CUDA MCA knob and wiring for pending-task sorting.
parsec/CMakeLists.txt	Add `class/parsec_heap.c` to the build.
parsec/class/parsec_heap.h	Add new array-based max-heap API (pointer elements, `int32_t` priority key by offset).
parsec/class/parsec_heap.c	Implement heap push/pop and ring batch-push with adaptive heapify strategy.
parsec/class/lifo.h	Add helpers to detach the entire LIFO at once and convert the internal chain into a doubly-linked ring.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+typedef struct parsec_heap_s {
+    parsec_object_t   super;
+    void            **nodes;       /**< pointer array; nodes[0] is the maximum */
+    size_t            size;        /**< current number of elements */
+    size_t            capacity;    /**< allocated slots in nodes[] */
+    size_t            comp_offset; /**< byte offset of int32_t priority key */
+} parsec_heap_t;


+        for (ssize_t i = (ssize_t)(heap->size / 2) - 1; i >= 0; i--)
+            heap_sift_down(heap, (size_t)i);


devreal · 2026-06-04T21:58:04Z

Looking into merging the two heap implementations. Didn't see the existing one because it wasn't in the class/ directory.

Similar to the existing maxheap implementation. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

The old implementation now uses the new class and was renamed to parsec_task_heap_t, since it is specialized for hierarchical heaps. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

+static inline int heap_cmp(const parsec_heap_t *h,
+                            const parsec_list_item_t *a,
+                            const parsec_list_item_t *b)
+{
+    return COMPARISON_VAL(a, h->comp_offset) - COMPARISON_VAL(b, h->comp_offset);
+}


+        char tmp[MAX_TASK_STRLEN];
+        PARSEC_DEBUG_VERBOSE(20, parsec_debug_output,
+                             "MH:\tStole exec C %s (%p) from heap %p",
+                             parsec_task_snprintf(tmp, MAX_TASK_STRLEN, task), task, heap);
+    }


        char tmp[MAX_TASK_STRLEN];
-        PARSEC_DEBUG_VERBOSE(20, parsec_debug_output, "MH:\tStole exec C %s (%p) from heap %p",
+        PARSEC_DEBUG_VERBOSE(20, parsec_debug_output,
+                             "MH:\tStole exec C %s (%p) from heap %p",
                             parsec_task_snprintf(tmp, MAX_TASK_STRLEN, to_use), to_use, heap);
    }


    assert(NULL == task);
    if( NULL == stream->tasks[stream->start] ) {  /* there is room on the stream */
-        task = (parsec_gpu_task_t*)parsec_list_pop_front(stream->fifo_pending);  /* get the best task */
+        task = (parsec_gpu_task_t*)parsec_list_nolock_pop_front(stream->fifo_pending);  /* get the best task */


Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

Carefully handle list_prev/list_next since PARSEC_ITEM_DETACH may reset them. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

devreal requested a review from Copilot June 4, 2026 21:20

devreal requested a review from a team as a code owner June 4, 2026 21:20

Copilot started reviewing on behalf of devreal June 4, 2026 21:20 View session

Copilot AI reviewed Jun 4, 2026

View reviewed changes

devreal added 2 commits June 4, 2026 18:07

Heap: Replace heap-allocated array with linked-list

74a08cb

Similar to the existing maxheap implementation. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

Rename legacy parsec heap and implement using class/parsec_heap

ab2c274

The old implementation now uses the new class and was renamed to parsec_task_heap_t, since it is specialized for hierarchical heaps. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

devreal requested a review from Copilot June 4, 2026 22:09

Copilot started reviewing on behalf of devreal June 4, 2026 22:09 View session

Copilot AI reviewed Jun 4, 2026

View reviewed changes

devreal added 4 commits June 4, 2026 18:57

Overflow-safe priority comparison in heap

1fd3151

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

Detach items from lifo when turning it into a ring

3c6b6d6

Carefully handle list_prev/list_next since PARSEC_ITEM_DETACH may reset them. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

maxheap: print heap pointer, not indirect pointer

a3e9160

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

Consistently use parsec_task_heap_t in hbbuffer

826f3cf

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

devreal force-pushed the device-task-priorities-heap branch from 524089e to 826f3cf Compare June 5, 2026 02:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sort incoming device tasks using heap#784

Sort incoming device tasks using heap#784
devreal wants to merge 7 commits into
ICLDisco:masterfrom
devreal:device-task-priorities-heap

devreal commented Jun 4, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

devreal commented Jun 4, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		for (ssize_t i = (ssize_t)(heap->size / 2) - 1; i >= 0; i--)
		heap_sift_down(heap, (size_t)i);

Conversation

devreal commented Jun 4, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

devreal commented Jun 4, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants