fix: stop printing and order saving from freezing after extended runtime or heavy workload #375

Merged
No0ne558 merged 2 commits into ViewTouch:master from No0ne558:master on Jun 9, 2026
No0ne558 commented Jun 9, 2026

Summary

This PR fixes a class of bugs that caused the POS system to stop printing receipts and saving orders after extended runtime or on busy shifts. The symptoms were:

  • System stops printing after ~10-12 hours of uptime
  • System stops printing and saving on busy days, even after shorter runtimes
  • Garbled or truncated receipts on large multi-item orders
  • Both problems are worse when end-of-day is not run (checks accumulate)

Seven bugs across six source files have been fixed in two commits.


Root Causes & Fixes

1. Stale Xt input handler spin loop (remote_printer.cc)

When a remote printer connection failed after 8 retries, PrinterCB() closed the socket and set failure = 999 but never called RemoveInputFn(p->input_id). The X event loop continued polling the closed file descriptor on every tick in an infinite error loop. Over 10-12 hours this CPU waste progressively starved the main event thread.

Fix: Call RemoveInputFn(p->input_id) and set input_id = -1 before closing the socket.
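
The teardown order described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `InputRegistry`, `Connection`, and `MarkFailed` are hypothetical names that stand in for the toolkit's input-source table and the printer state, and none of them are taken from remote_printer.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Toy registry standing in for the toolkit's input-source table (hypothetical).
class InputRegistry {
public:
    // Registers a handler and returns its id.
    int Add(std::function<void()> handler) {
        assert(handler != nullptr);
        int id = next_id_++;
        handlers_.emplace(id, std::move(handler));
        return id;
    }
    void Remove(int id) { handlers_.erase(id); }

    // One pass of the event loop: invokes every handler still registered.
    std::size_t Tick() {
        std::size_t calls = 0;
        for (auto &entry : handlers_) {
            entry.second();
            ++calls;
        }
        return calls;
    }

private:
    std::map<int, std::function<void()>> handlers_;
    int next_id_ = 1;
};

// Hypothetical connection state; field names are illustrative.
struct Connection {
    int input_id = -1;
    bool socket_open = false;
    int failure = 0;
};

// Unregister the handler first, then release the socket and record the failure.
void MarkFailed(InputRegistry &registry, Connection &conn) {
    if (conn.input_id >= 0) {
        registry.Remove(conn.input_id);
        conn.input_id = -1;
    }
    conn.socket_open = false;  // stands in for closing the file descriptor
    conn.failure = 999;
}
```

Once `MarkFailed` has run, a later `Tick()` no longer invokes the handler, so the loop has nothing left to poll for that connection.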


2. Thread pool blocking the main event loop (thread_pool.hh)

enqueue_detached() called condition_variable::wait when the job queue was full (64 jobs, 2 workers). Under heavy load with slow CUPS jobs the queue filled, and the next Printer::Close() call blocked the main Xt event loop thread — freezing printing, order saving, and the UI completely.

Fix: Replaced the blocking wait with a non-blocking check. If the queue is full, the job is dropped and the drop is logged to stderr. Thread count raised 2→4, queue cap raised 64→256.
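
A minimal sketch of the non-blocking enqueue idea, assuming a simple mutex-protected queue. `BoundedPool` and `TryEnqueue` are hypothetical names; the real `enqueue_detached()` in thread_pool.hh may differ in signature and behavior.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded pool; not the class from thread_pool.hh.
class BoundedPool {
public:
    BoundedPool(std::size_t workers, std::size_t max_queue) : max_queue_(max_queue) {
        assert(max_queue > 0);
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { Run(); });
    }

    // Drains any queued jobs, then joins the workers.
    ~BoundedPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto &t : threads_) t.join();
    }

    // Never waits for a free slot: returns false and logs when the queue is full.
    bool TryEnqueue(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            if (stopping_ || jobs_.size() >= max_queue_) {
                std::fprintf(stderr, "pool: queue full, job dropped\n");
                return false;
            }
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
        return true;
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to run
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other threads can enqueue
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> threads_;
    std::size_t max_queue_;
    bool stopping_ = false;
};
```

Only worker threads ever wait on the condition variable here. The caller holds the mutex briefly and returns, so a full queue costs the event loop a dropped job rather than a stall. The boolean result lets the caller surface the drop to the operator.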


3. CUPS health check running fork()+exec() on the main thread (data_persistence_manager.cc)

CheckCUPSHealth() was called every 60 seconds from Update() on the main event loop timer. It forked a child, ran systemctl is-active cups, and busy-polled for up to 5 seconds. On CUPS failure, AttemptCUPSRecovery() added another 10-second busy-poll — all on the main thread.

Fix: Moved the entire CUPS health check into a dedicated CUPSMonitorLoop background thread. The main thread now only reads an atomic<bool>.
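
The pattern of a background monitor publishing its result through an atomic flag might look like the following sketch. `HealthMonitor` and its `probe` callback are assumptions made for illustration; the actual `CUPSMonitorLoop` and its CUPS probing logic are not shown in this PR text.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical monitor; the probe callback stands in for the real CUPS check.
class HealthMonitor {
public:
    HealthMonitor(std::function<bool()> probe, std::chrono::milliseconds interval)
        : probe_(std::move(probe)), interval_(interval) {
        assert(probe_ != nullptr);
        thread_ = std::thread([this] { Loop(); });
    }

    ~HealthMonitor() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        thread_.join();
    }

    // Cheap, non-blocking read intended for the event-loop thread.
    bool Healthy() const { return healthy_.load(std::memory_order_acquire); }

private:
    void Loop() {
        std::unique_lock<std::mutex> lock(mu_);
        while (!stop_) {
            lock.unlock();
            bool ok = probe_();  // may be slow; runs off the main thread
            healthy_.store(ok, std::memory_order_release);
            lock.lock();
            // Sleep until the next check, waking early on shutdown.
            cv_.wait_for(lock, interval_, [this] { return stop_; });
        }
    }

    std::function<bool()> probe_;
    std::chrono::milliseconds interval_;
    std::atomic<bool> healthy_{true};
    std::mutex mu_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::thread thread_;
};
```

The slow probe runs entirely on the monitor thread, and the timer callback only calls `Healthy()`. Waiting on a condition variable instead of a plain sleep lets the destructor stop the thread promptly.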


4. SaveAllChecks() running synchronously on the main thread (data_persistence_manager.cc)

Every 30 seconds, SaveCriticalData() iterated the entire open check list and wrote every check to disk (open + write + close) on the main event loop thread. On a busy shift with 100–200 open checks this caused hundreds of blocking file I/O operations per cycle. Without end-of-day, the list grows all day.

Fix: Auto-save is dispatched to the thread pool via enqueue_detached. An atomic<bool> save_in_progress_ flag prevents double-dispatch.
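
One way to guard against double-dispatch with an atomic flag is sketched below. `AutoSaveGuard`, `RequestSave`, and the `Dispatch` callback are hypothetical; `Dispatch` stands in for the pool's enqueue function and is assumed to return false when a job is dropped.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper; not the code in data_persistence_manager.cc.
class AutoSaveGuard {
public:
    // Dispatch hands a job to a worker and returns false if the job was dropped.
    using Dispatch = std::function<bool(std::function<void()>)>;

    explicit AutoSaveGuard(Dispatch dispatch) : dispatch_(std::move(dispatch)) {
        assert(dispatch_ != nullptr);
    }

    // Returns true if a save was scheduled, false if one is already in flight
    // or the dispatcher rejected the job.
    bool RequestSave(std::function<void()> save) {
        bool expected = false;
        if (!in_progress_.compare_exchange_strong(expected, true))
            return false;  // previous save has not finished yet
        bool queued = dispatch_([this, save = std::move(save)] {
            save();
            in_progress_.store(false);  // allow the next auto-save cycle
        });
        if (!queued)
            in_progress_.store(false);  // a dropped job must not wedge the flag
        return queued;
    }

private:
    Dispatch dispatch_;
    std::atomic<bool> in_progress_{false};
};
```

The compare-and-exchange makes the check and the claim a single atomic step, so two timer ticks cannot both schedule a save. This sketch does not clear the flag if `save` throws; production code would likely want a scope guard for that case.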


5. Printer::Close() calling blocking system("lpr ...") on the main thread (printer.cc)

The synchronous LPDPrint() path used system() which blocked the calling thread. If CUPS was slow or unresponsive this froze the entire event loop.

Fix: Close() now routes TARGET_LPD and TARGET_SOCKET through CloseAsync(), which dispatches to the thread pool. Dropped jobs call ReportError() so the operator sees a message on screen.


6. RemotePrinter::Send() flush threshold exceeds buffer size (remote_printer.cc)

Send() only flushed when buffer_out->size > 4096, but buffer_out is a CharQueue(1024). The condition was never true. Large receipts silently overflowed the 1024-byte ring buffer and dropped bytes, producing garbled or truncated prints.

Fix: Flush threshold changed to buffer_size / 2 (512 bytes), matching the CharQueue::send_size design.
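
The effect of a flush threshold larger than the buffer can be reproduced with a small model. `ByteQueue` and `SendAll` are hypothetical stand-ins, not ViewTouch's `CharQueue` or `RemotePrinter::Send()`. The model drops new bytes when the queue is full, which is an assumption; a real ring buffer might overwrite old bytes instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical fixed-capacity output queue.
class ByteQueue {
public:
    explicit ByteQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }
    // Returns false when the queue is full and the byte is lost.
    bool Put(char c) {
        if (data_.size() >= capacity_) return false;
        data_.push_back(c);
        return true;
    }
    std::size_t Size() const { return data_.size(); }
    // Moves queued bytes to the sink (a string here, a socket in practice).
    void FlushTo(std::string &sink) {
        sink += data_;
        data_.clear();
    }

private:
    std::size_t capacity_;
    std::string data_;
};

// Queues the payload, flushing whenever the queue size exceeds the threshold.
// Returns the number of bytes dropped because the queue was full.
std::size_t SendAll(ByteQueue &q, const std::string &payload,
                    std::size_t threshold, std::string &sink) {
    std::size_t dropped = 0;
    for (char c : payload) {
        if (!q.Put(c)) ++dropped;
        if (q.Size() > threshold) q.FlushTo(sink);
    }
    q.FlushTo(sink);  // final flush at the end of the job
    return dropped;
}
```

With a 1024-byte queue and a 3000-byte payload, a threshold of 4096 is never exceeded, so only the first 1024 bytes reach the sink. A threshold of 512 flushes well before the queue fills and every byte is delivered.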


7. Archive::SavePacked() silently ignoring I/O errors (archive.cc, data_file.hh)

Write errors after the drawer and check loops were swallowed silently and the archive was marked as successfully saved.

Fix: Added OutputDataFile::HasError() (via ferror/gzerror) and checked it after both write loops with ReportError() and early return on failure.
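
A sketch of checking a stream's error state after each write loop, using only the standard C `ferror()` call. `OutFile` and `SavePackedSketch` are hypothetical; the real `OutputDataFile::HasError()` also covers gzip streams via `gzerror`, which is omitted here.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical writer; only the ferror() check mirrors the PR description.
class OutFile {
public:
    // Takes ownership of fp; a null fp models a file that failed to open.
    explicit OutFile(std::FILE *fp) : fp_(fp) {}
    ~OutFile() {
        if (fp_) std::fclose(fp_);
    }
    OutFile(const OutFile &) = delete;
    OutFile &operator=(const OutFile &) = delete;

    void Write(const std::string &s) {
        if (!fp_ || std::fwrite(s.data(), 1, s.size(), fp_) != s.size())
            short_write_ = true;
    }

    // True if the file never opened, a write came up short, or the stream
    // has its error indicator set.
    bool HasError() const {
        return fp_ == nullptr || short_write_ || std::ferror(fp_) != 0;
    }

private:
    std::FILE *fp_;
    bool short_write_ = false;
};

// Writes two groups of records and checks the stream after each loop,
// returning early instead of reporting success on a failed write.
bool SavePackedSketch(OutFile &out, const std::vector<std::string> &drawers,
                      const std::vector<std::string> &checks) {
    for (const auto &d : drawers) out.Write(d);
    if (out.HasError()) {
        std::fprintf(stderr, "archive: error writing drawers\n");
        return false;
    }
    for (const auto &c : checks) out.Write(c);
    if (out.HasError()) {
        std::fprintf(stderr, "archive: error writing checks\n");
        return false;
    }
    return true;
}
```

Because stdio output is buffered, some failures only surface when the buffer is flushed or the file is closed, so a complete implementation would likely also check the result of the final flush or close.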


Files Changed

  • thread_pool.hh
  • data_persistence_manager.hh
  • data_persistence_manager.cc
  • remote_printer.cc
  • printer.cc
  • archive.cc
  • data_file.hh
  • changelog.md

Commits: 4169ab4, 122af9b, 5964922

No0ne558 added 2 commits June 9, 2026 01:48
…ifts

Four bugs combined to freeze printing and order saving on busy days or
when end-of-day is not run.

Bug A (thread_pool.hh): enqueue_detached() called condition_variable::wait
when the queue was full, blocking the main Xt event loop thread until a
worker freed a slot. Under heavy load with slow CUPS jobs both workers
could stall, filling the queue and freezing the POS completely. Fix:
replaced the blocking wait with a non-blocking check that drops the job
and logs to stderr if full. Also increased thread count 2->4 and queue
cap 64->256 to make the drop scenario much rarer.

Bug B (data_persistence_manager.cc): CheckCUPSHealth() was called from
Update() which runs on the main event loop timer every 500ms. Every 60s
it fork()+exec()d 'systemctl is-active cups' and busy-polled for up to
5 seconds; on CUPS failure AttemptCUPSRecovery() added another 10-second
busy-poll, all on the main thread. Fix: moved the entire CUPS health
check into a dedicated background thread (CUPSMonitorLoop) that sleeps
between checks. ProcessPeriodicTasks() now only reads an atomic<bool>.

Bug C (data_persistence_manager.cc): SaveAllChecks() iterated every open
check and wrote each to disk synchronously on the main event loop thread
every 30 seconds. On a busy shift with 100-200 open checks (and more
without end-of-day) this caused hundreds of sequential file opens per
cycle blocking the main thread. Fix: auto-save is now dispatched to the
thread pool via enqueue_detached. An atomic save_in_progress_ flag
prevents double-dispatch.

Bug D (remote_printer.cc): RemotePrinter::Send() only flushed when
buffer_out->size > 4096, but buffer_out is a CharQueue(1024). The
threshold was never reached, so intermediate Send() calls were no-ops.
Large receipts silently overflowed the ring buffer and dropped bytes,
producing garbled or truncated prints. Fix: flush threshold changed to
buffer_size/2 (512 bytes), matching CharQueue's send_size design.

No0ne558 merged commit 035936f into ViewTouch:master on Jun 9, 2026
@GeneMosher

It is a good day when such fixes as these are introduced into the ViewTouch code base. This is some mighty fine work! My highest gratitude and appreciation, Ariel!!