fix: stop printing and order saving from freezing after extended runtime or heavy workload #375
Merged
…ifts
Four bugs combined to freeze printing and order saving on busy days or when end-of-day is not run.
Bug A (thread_pool.hh): enqueue_detached() called condition_variable::wait when the queue was full, blocking the main Xt event loop thread until a worker freed a slot. Under heavy load with slow CUPS jobs both workers could stall, filling the queue and freezing the POS completely. Fix: replaced the blocking wait with a non-blocking check that drops the job and logs to stderr if the queue is full. Also increased the thread count 2->4 and the queue cap 64->256 to make the drop scenario much rarer.
Bug B (data_persistence_manager.cc): CheckCUPSHealth() was called from Update(), which runs on the main event loop timer every 500ms. Every 60s it fork()+exec()d 'systemctl is-active cups' and busy-polled for up to 5 seconds; on CUPS failure AttemptCUPSRecovery() added another 10-second busy-poll, all on the main thread. Fix: moved the entire CUPS health check into a dedicated background thread (CUPSMonitorLoop) that sleeps between checks. ProcessPeriodicTasks() now only reads an atomic<bool>.
Bug C (data_persistence_manager.cc): SaveAllChecks() iterated every open check and wrote each to disk synchronously on the main event loop thread every 30 seconds. On a busy shift with 100-200 open checks (and more without end-of-day) this caused hundreds of sequential file opens per cycle, blocking the main thread. Fix: auto-save is now dispatched to the thread pool via enqueue_detached. An atomic save_in_progress_ flag prevents double-dispatch.
Bug D (remote_printer.cc): RemotePrinter::Send() only flushed when buffer_out->size > 4096, but buffer_out is a CharQueue(1024). The threshold was never reached, so intermediate Send() calls were no-ops. Large receipts silently overflowed the ring buffer and dropped bytes, producing garbled or truncated prints. Fix: flush threshold changed to buffer_size/2 (512 bytes), matching CharQueue's send_size design.
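The Bug D failure mode can be reproduced in isolation. The following is a minimal sketch, not the ViewTouch code: FixedBuffer, Sender and bytes_lost are hypothetical names standing in for CharQueue and RemotePrinter::Send(), and the model assumes that bytes arriving at a full buffer are simply discarded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for CharQueue: fixed capacity; bytes that arrive
// while the buffer is full are discarded (an assumption of this sketch).
struct FixedBuffer {
    std::size_t capacity;
    std::string data;

    explicit FixedBuffer(std::size_t cap) : capacity(cap) {}

    void put(char c) {
        if (data.size() < capacity) data.push_back(c);
    }
};

// Hypothetical sender: queues bytes and flushes to `wire` only once more
// than `threshold` bytes are waiting, mirroring the check described above.
struct Sender {
    FixedBuffer buffer;
    std::size_t threshold;
    std::string wire;  // bytes that actually reached the printer

    Sender(std::size_t cap, std::size_t thresh) : buffer(cap), threshold(thresh) {}

    void flush() {
        wire += buffer.data;
        buffer.data.clear();
    }

    void send(const std::string& bytes) {
        for (char c : bytes) {
            buffer.put(c);
            if (buffer.data.size() > threshold) flush();
        }
    }
};

// Number of bytes of a receipt that never reach the wire.
std::size_t bytes_lost(std::size_t capacity, std::size_t threshold,
                       std::size_t receipt_size) {
    Sender s(capacity, threshold);
    s.send(std::string(receipt_size, 'x'));
    s.flush();  // final flush when the job is closed
    return receipt_size - s.wire.size();
}
```

Under these assumptions, a 3000-byte receipt sent through a 1024-byte buffer with a 4096-byte threshold loses everything past the first 1024 bytes, while a threshold of half the buffer size loses nothing.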
It is a good day when such fixes as these are introduced into the ViewTouch code base. This is some mighty fine work! My highest gratitude and appreciation, Ariel!!
Summary
This PR fixes a class of bugs that caused the POS system to stop printing receipts and saving orders after extended runtime or on busy shifts. The symptoms were:
Seven bugs across five files have been fixed in two commits.
Root Causes & Fixes
1. Stale Xt input handler spin loop (remote_printer.cc)
When a remote printer connection failed after 8 retries, PrinterCB() closed the socket and set failure = 999 but never called RemoveInputFn(p->input_id). The X event loop continued polling the closed file descriptor on every tick in an infinite error loop. Over 10-12 hours this CPU waste progressively starved the main event thread.
Fix: Call RemoveInputFn(p->input_id) and set input_id = -1 before closing the socket.
2. Thread pool blocking the main event loop (thread_pool.hh)
enqueue_detached() called condition_variable::wait when the job queue was full (64 jobs, 2 workers). Under heavy load with slow CUPS jobs the queue filled, and the next Printer::Close() call blocked the main Xt event loop thread, freezing printing, order saving, and the UI completely.
Fix: Replaced the blocking wait with a non-blocking check. If full, the job is dropped and logged to stderr. Thread count raised 2→4, queue cap raised 64→256.
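The drop-instead-of-block behaviour can be sketched as follows. This is an illustration under assumptions, not the contents of thread_pool.hh: BoundedJobQueue, try_enqueue, run_one and dropped are hypothetical names, and worker threads are left out so that only the non-blocking enqueue path is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical bounded job queue; the names are illustrative only.
class BoundedJobQueue {
public:
    explicit BoundedJobQueue(std::size_t capacity) : capacity_(capacity) {}

    // Never waits for a free slot: if the queue is full, the job is
    // dropped, the drop is logged to stderr, and false is returned.
    bool try_enqueue(std::function<void()> job) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (jobs_.size() >= capacity_) {
            ++dropped_;
            std::fprintf(stderr, "job queue full (%zu), dropping job\n", capacity_);
            return false;
        }
        jobs_.push_back(std::move(job));
        return true;
    }

    // Worker side: pops and runs one job; returns false if none is queued.
    bool run_one() {
        std::function<void()> job;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (jobs_.empty()) return false;
            job = std::move(jobs_.front());
            jobs_.pop_front();
        }
        job();  // run outside the lock so producers are not held up
        return true;
    }

    std::size_t dropped() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return dropped_;
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::deque<std::function<void()>> jobs_;
    std::size_t dropped_ = 0;
};
```

The caller, here the event loop thread, only ever holds the mutex for the duration of a size check and a push, so a stalled worker can cost it a dropped job but not a frozen UI. The boolean return value gives the caller a place to report the drop to the operator.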
3. CUPS health check running fork()+exec() on the main thread (data_persistence_manager.cc)
CheckCUPSHealth() was called every 60 seconds from Update() on the main event loop timer. It forked a child, ran systemctl is-active cups, and busy-polled for up to 5 seconds. On CUPS failure, AttemptCUPSRecovery() added another 10-second busy-poll, all on the main thread.
Fix: Moved the entire CUPS health check into a dedicated CUPSMonitorLoop background thread. The main thread now only reads an atomic<bool>.
4. SaveAllChecks() running synchronously on the main thread (data_persistence_manager.cc)
Every 30 seconds, SaveCriticalData() iterated the entire open check list and wrote every check to disk (open + write + close) on the main event loop thread. On a busy shift with 100–200 open checks this caused hundreds of blocking file I/O operations per cycle. Without end-of-day, the list grows all day.
Fix: Auto-save is dispatched to the thread pool via enqueue_detached. An atomic<bool> save_in_progress_ flag prevents double-dispatch.
5. Printer::Close() calling blocking system("lpr ...") on the main thread (printer.cc)
The synchronous LPDPrint() path used system(), which blocked the calling thread. If CUPS was slow or unresponsive this froze the entire event loop.
Fix: Close() now routes TARGET_LPD and TARGET_SOCKET through CloseAsync(), which dispatches to the thread pool. Dropped jobs call ReportError() so the operator sees a message on screen.
6. RemotePrinter::Send() flush threshold exceeds buffer size (remote_printer.cc)
Send() only flushed when buffer_out->size > 4096, but buffer_out is a CharQueue(1024). The condition was never true. Large receipts silently overflowed the 1024-byte ring buffer and dropped bytes, producing garbled or truncated prints.
Fix: Flush threshold changed to buffer_size / 2 (512 bytes), matching the CharQueue::send_size design.
7. Archive::SavePacked() silently ignoring I/O errors (archive.cc, data_file.hh)
Write errors after the drawer and check loops were swallowed silently and the archive was marked as successfully saved.
Fix: Added OutputDataFile::HasError() (via ferror/gzerror) and checked it after both write loops with ReportError() and early return on failure.
Files Changed
Commits: 4169ab4, 122af9b, 5964922