Skip to content

fix(bthread): refactor sharded priority queue with lock-free MPSC inb…#3270

Open
yannan-wyn wants to merge 1 commit intoapache:masterfrom
yannan-wyn:fix_priority_bthread
Open

fix(bthread): refactor sharded priority queue with lock-free MPSC inb…#3270
yannan-wyn wants to merge 1 commit intoapache:masterfrom
yannan-wyn:fix_priority_bthread

Conversation

@yannan-wyn
Copy link
Copy Markdown

@yannan-wyn yannan-wyn commented Apr 15, 2026

What problem does this PR solve?

Problem Summary:
The original priority queue implementation misuses WorkStealingQueue — multiple producers call push() concurrently, but push() is designed for single-owner use only. This leads to potential data races under contention.
See also: #2819, #3055, #3078, #3096

What is changed and the side effects?

Changed:

  • Replace the single WSQ per tag with per-ED WSQs — each EventDispatcher gets its own dedicated WorkStealingQueue, indexed by priority_index
  • Store priority_index in TaskMeta, used by priority_to_run to locate the correct WSQ
  • steal_task iterates all per-ED WSQs within the tag to find priority bthreads
  • Add unit tests covering both start_background and start_foreground (the actual ED preempt-and-resume path), including multi-ED concurrent preemption

Why per-ED WSQ instead of sharded MPSC?

  • Each ED bthread runs on exactly one worker at a time — push() is naturally single-producer per WSQ. This avoid the race condition. The implementation is minimal (~30 lines of core logic change) with identical overhead to PR epoll bthread deal first #2819.

Side effects:

QPS Metric PR#2819 per-ED WSQ Δ
10k avg 47 47 0
p99 68 71 +3
50k avg 81 79 −2
p99 148 139 −9
100k avg 112 112 0
p99 171 174 +3
  • Breaking backward compatibility:
    • None. Gated by FLAGS_enable_bthread_priority_queue (default false).

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the bthread “global priority” scheduling path to avoid unsafe multi-producer use of WorkStealingQueue by introducing a sharded design with an inbound lock-free queue per shard, plus adds new unit tests and a microbenchmark.

Changes:

  • Introduces PriorityShard per tag and routes priority submissions via shard inbound queues with owner bind/unbind lifecycle.
  • Updates scheduling/stealing logic to prefer owner shards, steal across shards, and provide fallback behavior during teardown.
  • Adds unit tests for correctness/concurrency/owner changes and a microbenchmark suite for the new primitives.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
test/bthread_priority_queue_unittest.cpp Adds correctness + concurrent-producer tests for priority queue behavior.
test/bthread_priority_queue_owner_unittest.cpp Adds tests around dynamic worker/owner binding behavior under load.
test/bthread_priority_queue_benchmark.cpp Adds microbenchmarks for inbound enqueue/dequeue, flush, steal, and baseline comparisons.
src/bthread/task_group.h Adds _priority_shard_index for O(1) lookup of a TaskGroup’s owned shard.
src/bthread/task_control.h Adds PriorityShard definition and TaskControl helper APIs for shard management.
src/bthread/task_control.cpp Implements sharded priority queue logic, owner lifecycle, flushing, stealing, and fallback enqueue behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/bthread_priority_queue_unittest.cpp Outdated
Comment thread test/bthread_priority_queue_unittest.cpp Outdated
Comment thread test/bthread_priority_queue_owner_unittest.cpp Outdated
Comment thread src/bthread/task_control.cpp Outdated
Comment thread src/bthread/task_control.cpp Outdated
Comment thread src/bthread/task_control.cpp Outdated
@yannan-wyn
Copy link
Copy Markdown
Author

@wwbmmm
comments from copolit replied, please review again

@wwbmmm
Copy link
Copy Markdown
Contributor

wwbmmm commented Apr 16, 2026

LGTM

@yannan-wyn
Copy link
Copy Markdown
Author

#2819 #3078
Hi @wwbmmm @chenBright,
all comments have been addressed and CI is green.
Please take another look when convenient, thanks!

@chenBright
Copy link
Copy Markdown
Contributor

@yannan-wyn Could you provide some performance test data, like in #2819?

Comment thread src/bthread/task_control.cpp Outdated
if (_priority_queues[tag].steal(tid)) {
return true;
// priority queue: owner-first, then steal from other shards
if (_enable_priority_queue && !_priority_shards[tag].empty()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to encapsulate the priority_queue-related logic in a separate function?

Comment thread src/bthread/task_control.cpp Outdated
@yannan-wyn
Copy link
Copy Markdown
Author

yannan-wyn commented Apr 28, 2026

@yannan-wyn Could you provide some performance test data, like in #2819?

Hi @chenBright , I run the e2e benchmark and attach the results below.

Params:

  • CPU: Intel Xeon E5-2640 v3 @ 2.60GHz, 64 cores
  • Workers: num_threads=9
  • Event Dispatchers: event_dispatcher_num=1
  • Priority Queue: enable_bthread_priority_queue=true
  • Client: rpc_press, thread_num=4
  • Protocol: echo, method example.EchoService.Echo, input {"message":"hello"}

Summary:

Latency (us) — 10k QPS

Percentile PR#2819 per-ED WSQ
avg 47 47
p50 47 47
p90 54 53
p99 68 71
p99.9 84 91
p99.99 132 218

Latency (us) — 50k QPS

Percentile PR#2819 per-ED WSQ
avg 81 79
p50 76 77
p90 114 112
p99 148 139
p99.9 383 202
p99.99 1558 1127

Latency (us) — 100k QPS

Percentile PR#2819 per-ED WSQ
avg 112 112
p50 112 112
p90 143 142
p99 171 174
p99.9 215 701
p99.99 1409 2275

perf stat (50k QPS, 30s)

Metric PR#2819 per-ED WSQ Diff
context-switches 2,103K 1,624K −22.8%
cpu-migrations 151K 164K +8.6% ↑
cycles 103.5B 97.2B −6.1%
IPC 0.63 0.63 0% —

Pictures:
per-ED 10K
截屏2026-04-28 09 52 36

per-ED 50k
截屏2026-04-28 09 50 26

per-ED 100k
截屏2026-04-28 09 52 09

@yannan-wyn yannan-wyn force-pushed the fix_priority_bthread branch 2 times, most recently from 8c6f4a6 to 8c434ca Compare April 28, 2026 02:03
Each EventDispatcher gets its own WorkStealingQueue, making concurrent
push from multiple EDs naturally SPMC-safe without spinlocks.
@yannan-wyn yannan-wyn force-pushed the fix_priority_bthread branch from 8c434ca to 886cc48 Compare April 28, 2026 02:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants