fix: preserve split UTF-8 across PersistentTerminal feed() calls by aarcamp · Pull Request #9 · remorses/ghostty-opentui

aarcamp · 2026-03-06T13:04:12Z

When reading from a PTY, byte chunks arrive at arbitrary boundaries. Multi-byte UTF-8 characters (emoji = 4 bytes, CJK = 3 bytes, accented chars = 2+ bytes) can be split across consecutive read() calls. The old code in PersistentTerminal.feed():

// OLD - each call decoded independently
str = data.toString("utf-8")      // Buffer path
str = new TextDecoder("utf-8").decode(data)  // Uint8Array path

Both of these treat each chunk as a complete, self-contained UTF-8 sequence. When a multi-byte character is split (e.g., the first 2 bytes of a 4-byte emoji in one call, the remaining 2 in the next), each independent decode produces U+FFFD replacement characters (�) instead of the intended character.

- use a persistent TextDecoder in streaming mode for Buffer/Uint8Array input - share binary UTF-8 decoding across ptyToJson, ptyToText, and ptyToHtml - accept Uint8Array in ghostty-terminal ansi/feed typings - add regression tests for split multibyte chunk feeds

remorses · 2026-03-06T13:19:47Z

is there a way to remove hasBinaryDecoderState state and use the decoder directly? cc @codex review

aarcamp · 2026-03-06T15:20:20Z

is there a way to remove hasBinaryDecoderState state and use the decoder directly? cc @codex review

Yes, it's an optimization. We could swap it out for simpler logic as follows -- it just means some extra TextDecoder() allocations in certain conditions:

diff --git a/src/ffi.ts b/src/ffi.ts
index 79bc6a3..6a4244c 100644
--- a/src/ffi.ts
+++ b/src/ffi.ts
@@ -255,7 +255,6 @@ export class PersistentTerminal {
   private _rows: number
   private _destroyed = false
   private _streamDecoder = new TextDecoder("utf-8")
-  private _hasBinaryDecoderState = false

   constructor(options: PersistentTerminalOptions = {}) {
     if (!native) {
@@ -297,14 +296,9 @@ export class PersistentTerminal {
     this.assertNotDestroyed()

     if (typeof data === "string") {
-      if (this._hasBinaryDecoderState) {
-        const flushed = this._streamDecoder.decode()
-        this._streamDecoder = new TextDecoder("utf-8")
-        this._hasBinaryDecoderState = false
-        if (flushed.length > 0) {
-          native!.feedTerminal(this._id, flushed)
-        }
-      }
+      // Discard any partial UTF-8 bytes from prior binary feeds before
+      // crossing into a plain string boundary.
+      this._streamDecoder = new TextDecoder("utf-8")

       if (data.length > 0) {
         native!.feedTerminal(this._id, data)
@@ -312,7 +306,6 @@ export class PersistentTerminal {
       return
     }

-    this._hasBinaryDecoderState = true
     const decoded = this._streamDecoder.decode(data, { stream: true })
     if (decoded.length > 0) {
       native!.feedTerminal(this._id, decoded)
@@ -337,7 +330,6 @@ export class PersistentTerminal {
     this.assertNotDestroyed()
     native!.resetTerminal(this._id)
     this._streamDecoder = new TextDecoder("utf-8")
-    this._hasBinaryDecoderState = false
   }

   /**
@@ -414,7 +406,6 @@ export class PersistentTerminal {
     if (this._destroyed) return
     this._destroyed = true
     this._streamDecoder = new TextDecoder("utf-8")
-    this._hasBinaryDecoderState = false
     native!.destroyTerminal(this._id)
   }

I can either push this update, or add comments to the existing PR so it's clearer what's happening, just lmk. :)

remorses · 2026-03-06T16:03:01Z

Yeah I like this one better. I will publish after you push this commit

- remove _hasBinaryDecoderState bookkeeping - recreate the streaming TextDecoder on string feeds, reset, and destroy - add a regression test for binary-to-string feed boundaries

refactor: simplify PersistentTerminal UTF-8 stream decoder state

6262592

- remove _hasBinaryDecoderState bookkeeping - recreate the streaming TextDecoder on string feeds, reset, and destroy - add a regression test for binary-to-string feed boundaries

aarcamp force-pushed the multibyte-split-fix branch from fcc9405 to 6262592 Compare March 6, 2026 16:09

remorses merged commit 750870e into remorses:main Mar 6, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: preserve split UTF-8 across PersistentTerminal feed() calls#9

fix: preserve split UTF-8 across PersistentTerminal feed() calls#9
remorses merged 2 commits intoremorses:mainfrom
aarcamp:multibyte-split-fix

aarcamp commented Mar 6, 2026

Uh oh!

remorses commented Mar 6, 2026 •

edited

Loading

Uh oh!

aarcamp commented Mar 6, 2026

Uh oh!

remorses commented Mar 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aarcamp commented Mar 6, 2026

Uh oh!

remorses commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aarcamp commented Mar 6, 2026

Uh oh!

remorses commented Mar 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

remorses commented Mar 6, 2026 •

edited

Loading