Skip to content

cmd/loop: CLI session recording and replay tests#1072

Open
starius wants to merge 18 commits into
lightninglabs:masterfrom
starius:cli-tests20
Open

cmd/loop: CLI session recording and replay tests#1072
starius wants to merge 18 commits into
lightninglabs:masterfrom
starius:cli-tests20

Conversation

@starius
Copy link
Copy Markdown
Collaborator

@starius starius commented Feb 5, 2026

This work adds a repeatable, high-level testing path for the Loop CLI by recording real command runs and replaying them in tests. The goal is to make CLI behavior verifiable and regression-resistant without depending on a live loopd or local environment details. The recorded sessions become fixtures that exercise success and error paths across subcommands and flags, while replay ensures deterministic output checks in CI.

What changed

  • Added a session recorder that captures CLI stdin/stdout/stderr and gRPC traffic when LOOP_SESSION_RECORD=true is set.
  • Wired a pluggable gRPC transport, stdio hooks, and a test clock so recording and replay use the same code paths as normal CLI execution.
  • Normalized help output to show home paths as ~ and normalized JSON output and timestamps during replay so sessions are portable across machines.
  • Added replay tests that load fixtures, inject recorded env/stdin, replay the recorded gRPC stream, and compare outputs.
  • Recorded a broad set of CLI scenarios into cmd/loop/testdata/sessions/ and documented how to add more.

Why this approach

The CLI previously relied on manual testing or live daemons, which makes it hard to detect regressions and impossible to run in CI without complex setup. Recording/replay gives us deterministic, reproducible coverage of real behavior, including error messaging, output formatting, and gRPC request/response handling, while still exercising the production code paths.

How it works

Recording wraps the CLI process and intercepts stdio and gRPC calls, saving a JSON session file with events and metadata. Replay loads these events, supplies recorded input, and provides a recorded gRPC connection so commands run without a live daemon. Output is normalized where needed so sessions can be replayed across machines.

Testing

  • go test ./cmd/loop -v
  • To record a session: LOOP_SESSION_RECORD=true loop <command> --network regtest

Pull Request Checklist

  • Update release_notes.md if your PR contains major features, breaking changes or bugfixes

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @starius, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the testing infrastructure for the Loop CLI by introducing a robust session recording and replay mechanism. The primary goal is to establish a high-level, deterministic, and regression-resistant testing path that does not depend on a live loopd daemon or specific local environment details. By capturing and replaying real command executions, the CLI's behavior, including output formatting, error messaging, and gRPC interactions, can be thoroughly verified in a controlled and reproducible manner, greatly improving the reliability of future changes.

Highlights

  • CLI Session Recording: Introduced a session recorder that captures CLI stdin, stdout, stderr, and gRPC traffic when the LOOP_SESSION_RECORD=true environment variable is set. This allows for repeatable, high-level testing of the Loop CLI.
  • Pluggable Transport and I/O Hooks: Implemented a pluggable gRPC transport, standard I/O hooks, and a test clock. This ensures that both recording and replaying CLI sessions utilize the same code paths as normal CLI execution, enhancing test reliability.
  • Output Normalization: Added logic to normalize help output (e.g., home paths as ~) and JSON output/timestamps during replay. This makes recorded sessions portable and ensures deterministic output checks across different environments.
  • Comprehensive Replay Tests: Developed replay tests that load recorded session fixtures, inject environment variables and stdin, replay the recorded gRPC stream, and compare outputs. This provides robust regression resistance for CLI behavior.
  • Extensive Recorded Scenarios: A broad set of CLI scenarios have been recorded into cmd/loop/testdata/sessions/, covering various commands like terms, getinfo, quote in/out, listauth, fetchl402, getparams, setrule, suggestswaps, reservations list, instantout, listinstantouts, static withdraw/listwithdrawals/listswaps, listswaps, swapinfo, loop out, and abandonswap.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • cmd/loop/debug.go
    • Modified getDebugClient to use the new sessionTransport.Dial for gRPC connection, integrating it with the session recording/replay system.
  • cmd/loop/default_path_text_test.go
    • Added a new test file default_path_text_test.go to verify the defaultPathText function's behavior for HOME-based path elision.
  • cmd/loop/loopout.go
    • Replaced time.Now() with cliClock.Now() when setting swapDeadline to ensure deterministic timestamps in recorded sessions.
  • cmd/loop/main.go
    • Added imports for os/signal, syscall, and github.com/lightningnetwork/lnd/clock to support session recording features.
    • Updated loopDirFlag, tlsCertFlag, and macaroonPathFlag to use the new defaultPathText function for normalized help output.
    • Introduced global variables cliClock, sessionRec, and forceDeterministicJSON to manage the CLI's clock, active session recorder, and JSON output determinism.
    • Added installSessionSignalHandler to record signals and manage context cancellation during session recording.
    • Implemented defaultPathText to replace user home directories with ~ in path strings for consistent output.
    • Modified printJSON and printRespJSON to use maybeNormalizeJSON for stable JSON output across different Go versions and environments.
    • Updated the fatal error handling function to finalize the session recorder on exit.
    • Refactored the main function to initialize the sessionRecorder, hook the clock and gRPC transport, and install a signal handler if session recording is enabled.
    • Changed getClient and getClientWithConn to utilize the sessionTransport.Dial method, allowing for gRPC interception during recording/replay.
    • Added hookClock to override the CLI's time source for deterministic testing.
    • Implemented maybeNormalizeJSON to re-encode JSON output using the standard library for stable formatting.
  • cmd/loop/quote.go
    • Replaced time.Now() with cliClock.Now() when setting swapDeadline to ensure deterministic timestamps in recorded sessions.
  • cmd/loop/session_recorder.go
    • Added a new file session_recorder.go containing the sessionRecorder struct and its associated methods.
    • Implemented functionality to create and manage session files, including metadata (args, env, version, duration, error).
    • Developed logEvent to record timestamped payloads for stdout, stderr, stdin, gRPC, exit, and signal events.
    • Provided Start and stopHooks methods to attach and detach standard I/O hooks for capturing CLI interactions.
    • Implemented logExit to record the final outcome and duration of a CLI session.
    • Added Finalize to write the complete recorded session to a JSON file.
    • Included Dial, UnaryInterceptor, and StreamInterceptor methods to integrate gRPC traffic capture into the recording process.
    • Developed utility functions deriveSessionSlug, sanitizeSlug, nextSessionCounter, and parseSessionCounter for managing session file naming and numbering.
  • cmd/loop/session_recorder_test.go
    • Added a new test file session_recorder_test.go to verify the correctness of deriveSessionSlug, sanitizeSlug, and parseSessionCounter functions.
  • cmd/loop/session_replay_test.go
    • Added a new test file session_replay_test.go containing the TestRecordedSessions function to replay all recorded CLI sessions.
    • Implemented recordedSession struct and loadRecordedSessionFS/parseRecordedSession to load recorded data.
    • Developed applyEnv to set environment variables for replay and replayTransport to provide a recorded gRPC connection.
    • Created recordedClientConn and replayStream to simulate gRPC interactions based on recorded events.
    • Included compareMessageWithContext and compareJSONWithContext for detailed comparison of actual vs. recorded gRPC messages.
    • Added normalizeTimestamps to ensure consistent timestamp comparison during replay.
    • Implemented cloneCommandForReplay and related helper functions (cloneCommandStruct, cloneCommands, cloneFlagsWithGroups, cloneFlags, cloneMutuallyExclusiveFlags, cloneFlag, cloneArguments, cloneArgument, cloneStructWithExportedFields, copyExportedFields, cloneValue) to create fresh CLI command instances for each test, preventing state leakage.
  • cmd/loop/session_stdio.go
    • Added a new file session_stdio.go providing hookStdout, hookStderr, and hookStdin functions to intercept and record standard I/O streams.
    • Implemented composeWriter and chunkWriter helpers for flexible output handling and chunking.
  • cmd/loop/session_transport.go
    • Added a new file session_transport.go defining the grpcTransport interface and its directGrpcTransport implementation.
    • Introduced daemonConn interface to abstract gRPC client connection details.
    • Provided hookGrpc to dynamically switch the active gRPC transport for session recording/replay.
    • Implemented dialDirectConn as the standard method for establishing gRPC connections.
  • cmd/loop/stop.go
    • Updated waitForDaemonShutdown to accept the new daemonConn interface, making it compatible with the pluggable transport.
  • cmd/loop/swaps.go
    • Removed the cmd.IsSet("id") check in abandonSwap as the ID is now expected as a positional argument.
  • cmd/loop/testdata/sessions/AGENTS.md
    • Added a new Markdown file documenting how to record sessions, control panel HTTP server helpers, useful regtest flow observations, coverage notes, and replay stability notes.
  • cmd/loop/testdata/sessions/basic-swaps/01_loop-out.json
    • Added a new JSON fixture for a successful loop out command.
  • cmd/loop/testdata/sessions/basic-swaps/02_loop-in.json
    • Added a new JSON fixture for a successful loop in command.
  • cmd/loop/testdata/sessions/basic-swaps/03_loop-monitor.json
    • Added a new JSON fixture for the loop monitor command, including multiple swap statuses and a simulated interrupt.
  • cmd/loop/testdata/sessions/getinfo/01_loop-getinfo.json
    • Added a new JSON fixture for the loop getinfo command.
  • cmd/loop/testdata/sessions/instantout/01_loop-reservations-list.json
    • Added a new JSON fixture for listing instant out reservations.
  • cmd/loop/testdata/sessions/instantout/02_loop-instantout.json
    • Added a new JSON fixture for a successful loop instantout command using 'ALL' reservations.
  • cmd/loop/testdata/sessions/instantout/03_loop-listinstantouts.json
    • Added a new JSON fixture for listing instant outs.
  • cmd/loop/testdata/sessions/instantout/04_loop-instantout-no-confirmed.json
    • Added a new JSON fixture for loop instantout when no confirmed reservations are found.
  • cmd/loop/testdata/sessions/instantout/05_loop-instantout-cancel.json
    • Added a new JSON fixture for a canceled loop instantout command.
  • cmd/loop/testdata/sessions/instantout/06_loop-instantout-invalid-selection.json
    • Added a new JSON fixture for loop instantout with an invalid reservation selection.
  • cmd/loop/testdata/sessions/instantout/07_loop-instantout-channel.json
    • Added a new JSON fixture for loop instantout specifying a channel.
  • cmd/loop/testdata/sessions/instantout/08_loop-instantout-select-index.json
    • Added a new JSON fixture for loop instantout with a specific index selection.
  • cmd/loop/testdata/sessions/l402/01_loop-listauth.json
    • Added a new JSON fixture for listing L402 tokens.
  • cmd/loop/testdata/sessions/l402/02_loop-fetchl402.json
    • Added a new JSON fixture for fetching an L402 token.
  • cmd/loop/testdata/sessions/liquidity/01_loop-getparams.json
    • Added a new JSON fixture for getting liquidity parameters.
  • cmd/loop/testdata/sessions/liquidity/02_loop-setparams-no-flags.json
    • Added a new JSON fixture for loop setparams with no flags, resulting in an error.
  • cmd/loop/testdata/sessions/liquidity/03_loop-setparams-feepercent.json
    • Added a new JSON fixture for loop setparams with feepercent.
  • cmd/loop/testdata/sessions/liquidity/04_loop-setparams-feepercent-conflict.json
    • Added a new JSON fixture for loop setparams with conflicting feepercent and sweeplimit flags.
  • cmd/loop/testdata/sessions/liquidity/05_loop-setrule-missing-threshold.json
    • Added a new JSON fixture for loop setrule with a missing threshold, resulting in an error.
  • cmd/loop/testdata/sessions/liquidity/06_loop-setrule-no-args.json
    • Added a new JSON fixture for loop setrule with no arguments, resulting in an error.
  • cmd/loop/testdata/sessions/liquidity/07_loop-suggestswaps.json
    • Added a new JSON fixture for loop suggestswaps with no rules set, resulting in an error.
  • cmd/loop/testdata/sessions/liquidity/08_loop-setrule-success.json
    • Added a new JSON fixture for a successful loop setrule command.
  • cmd/loop/testdata/sessions/liquidity/09_loop-suggestswaps-success.json
    • Added a new JSON fixture for a successful loop suggestswaps command.
  • cmd/loop/testdata/sessions/liquidity/10_loop-setrule-incoming.json
    • Added a new JSON fixture for loop setrule with an incoming threshold.
  • cmd/loop/testdata/sessions/liquidity/11_loop-setrule-outgoing-type-in.json
    • Added a new JSON fixture for loop setrule with an outgoing threshold and type 'in', resulting in an error.
  • cmd/loop/testdata/sessions/liquidity/12_loop-setrule-clear.json
    • Added a new JSON fixture for loop setrule with the --clear flag.
  • cmd/loop/testdata/sessions/liquidity/13_loop-setparams-many-flags-error.json
    • Added a new JSON fixture for loop setparams with many flags, resulting in a sweep fee rate error.
  • cmd/loop/testdata/sessions/liquidity/14_loop-setparams-many-flags-error-minamt.json
    • Added a new JSON fixture for loop setparams with many flags, resulting in a minimum swap amount error.
  • cmd/loop/testdata/sessions/liquidity/15_loop-setparams-many-flags.json
    • Added a new JSON fixture for loop setparams with many flags, successfully setting parameters.
  • cmd/loop/testdata/sessions/liquidity/16_loop-setparams-destaddr.json
    • Added a new JSON fixture for loop setparams with a destination address.
  • cmd/loop/testdata/sessions/liquidity/17_loop-setparams-account.json
    • Added a new JSON fixture for loop setparams with an account and account address type.
  • cmd/loop/testdata/sessions/liquidity/18_loop-setparams-includeallpeers.json
    • Added a new JSON fixture for loop setparams with easyautoloop_includeallpeers.
  • cmd/loop/testdata/sessions/liquidity/19_loop-setrule-type-out.json
    • Added a new JSON fixture for loop setrule with an incoming threshold and type 'out'.
  • cmd/loop/testdata/sessions/loopin/01_loop-in-invalid-amount.json
    • Added a new JSON fixture for loop in with an invalid amount.
  • cmd/loop/testdata/sessions/loopin/02_loop-in-external-conf-target.json
    • Added a new JSON fixture for loop in with conflicting external and conf_target flags.
  • cmd/loop/testdata/sessions/loopin/03_loop-in-route-hints-private.json
    • Added a new JSON fixture for loop in with conflicting route_hints and private flags.
  • cmd/loop/testdata/sessions/loopin/04_loop-in-external-cancel.json
    • Added a new JSON fixture for a canceled external loop in.
  • cmd/loop/testdata/sessions/loopin/05_loop-in-external-verbose-cancel.json
    • Added a new JSON fixture for a verbose and canceled external loop in.
  • cmd/loop/testdata/sessions/loopin/06_loop-in-external-force.json
    • Added a new JSON fixture for a forced external loop in.
  • cmd/loop/testdata/sessions/loopout/01_loop-out.json
    • Added a new JSON fixture for a forced and fast loop out.
  • cmd/loop/testdata/sessions/loopout/02_loop-out-invalid-amount.json
    • Added a new JSON fixture for loop out with an invalid amount.
  • cmd/loop/testdata/sessions/loopout/03_loop-out-addr-account.json
    • Added a new JSON fixture for loop out with conflicting addr and account flags.
  • cmd/loop/testdata/sessions/loopout/04_loop-out-invalid-account-addr-type.json
    • Added a new JSON fixture for loop out with an invalid account address type.
  • cmd/loop/testdata/sessions/loopout/05_loop-out-amt-channel-maxfee-timeout.json
    • Added a new JSON fixture for loop out with amount, channel, max routing fee, and payment timeout flags.
  • cmd/loop/testdata/sessions/loopout/06_loop-out-addr-flag.json
    • Added a new JSON fixture for loop out with an address flag.
  • cmd/loop/testdata/sessions/loopout/07_loop-out-addr-positional.json
    • Added a new JSON fixture for loop out with a positional address argument.
  • cmd/loop/testdata/sessions/loopout/08_loop-out-account.json
    • Added a new JSON fixture for loop out with an account and account address type.
  • cmd/loop/testdata/sessions/misc/01_loop-terms.json
    • Added a new JSON fixture for the loop terms command.
  • cmd/loop/testdata/sessions/quote/01_loop-quote-out.json
    • Added a new JSON fixture for loop quote out.
  • cmd/loop/testdata/sessions/quote/02_loop-quote-in-help.json
    • Added a new JSON fixture for loop quote in --help.
  • cmd/loop/testdata/sessions/quote/03_loop-quote-out-help.json
    • Added a new JSON fixture for loop quote out --help.
  • cmd/loop/testdata/sessions/quote/04_loop-quote-in.json
    • Added a new JSON fixture for loop quote in with a deposit outpoint.
  • cmd/loop/testdata/sessions/quote/05_loop-quote-in-last-hop.json
    • Added a new JSON fixture for loop quote in with a last hop.
  • cmd/loop/testdata/sessions/quote/06_loop-quote-out-verbose.json
    • Added a new JSON fixture for a verbose loop quote out.
  • cmd/loop/testdata/sessions/quote/07_loop-quote-in-verbose.json
    • Added a new JSON fixture for a verbose loop quote in.
  • cmd/loop/testdata/sessions/static-filters/01_loop-static-listdeposits-deposited.json
    • Added a new JSON fixture for loop static listdeposits --filter deposited.
  • cmd/loop/testdata/sessions/static-filters/02_loop-static-listdeposits-withdrawing.json
    • Added a new JSON fixture for loop static listdeposits --filter withdrawing.
  • cmd/loop/testdata/sessions/static-filters/03_loop-static-listdeposits-withdrawn.json
    • Added a new JSON fixture for loop static listdeposits --filter withdrawn.
  • cmd/loop/testdata/sessions/static-filters/04_loop-static-listdeposits-looping_in.json
    • Added a new JSON fixture for loop static listdeposits --filter looping_in.
  • cmd/loop/testdata/sessions/static-filters/05_loop-static-listdeposits-looped_in.json
    • Added a new JSON fixture for loop static listdeposits --filter looped_in.
  • cmd/loop/testdata/sessions/static-filters/06_loop-static-listdeposits-publish_expired_deposit.json
    • Added a new JSON fixture for loop static listdeposits --filter publish_expired_deposit.
  • cmd/loop/testdata/sessions/static-filters/07_loop-static-listdeposits-sweep_htlc_timeout.json
    • Added a new JSON fixture for loop static listdeposits --filter sweep_htlc_timeout.
  • cmd/loop/testdata/sessions/static-filters/08_loop-static-listdeposits-htlc_timeout_swept.json
    • Added a new JSON fixture for loop static listdeposits --filter htlc_timeout_swept.
  • cmd/loop/testdata/sessions/static-filters/09_loop-static-listdeposits-wait_for_expiry_sweep.json
    • Added a new JSON fixture for loop static listdeposits --filter wait_for_expiry_sweep.
  • cmd/loop/testdata/sessions/static-filters/10_loop-static-listdeposits-expired.json
    • Added a new JSON fixture for loop static listdeposits --filter expired.
  • cmd/loop/testdata/sessions/static-filters/11_loop-static-listdeposits-failed.json
    • Added a new JSON fixture for loop static listdeposits --filter failed.
  • cmd/loop/testdata/sessions/static-loop-in/01_loop-static-new.json
    • Added a new JSON fixture for loop static new.
  • cmd/loop/testdata/sessions/static-loop-in/04_loop-static.json
    • Added a new JSON fixture for loop static (help output).
  • cmd/loop/testdata/sessions/static-loop-in/05_loop-static-listunspent.json
    • Added a new JSON fixture for loop static listunspent.
  • cmd/loop/testdata/sessions/static-loop-in/06_loop-static-l.json
    • Added a new JSON fixture for loop static l (alias for listunspent).
  • cmd/loop/testdata/sessions/static-loop-in/07_loop-static-listdeposits.json
    • Added a new JSON fixture for loop static listdeposits.
  • cmd/loop/testdata/sessions/static-loop-in/08_loop-static-listunspent.json
    • Added a new JSON fixture for loop static listunspent (with confirmations).
  • cmd/loop/testdata/sessions/static-loop-in/09_loop-static-listunspent.json
    • Added a new JSON fixture for loop static listunspent (with more confirmations).
  • cmd/loop/testdata/sessions/static-loop-in/10_loop-static-listdeposits.json
    • Added a new JSON fixture for loop static listdeposits (with a deposited item).
  • cmd/loop/testdata/sessions/static-loop-in/11_loop-static-summary.json
    • Added a new JSON fixture for loop static summary.
  • cmd/loop/testdata/sessions/static-loop-in/12_loop-static-in.json
    • Added a new JSON fixture for loop static in with an unknown quote request error.
  • cmd/loop/testdata/sessions/static-loop-in/13_loop-static-in.json
    • Added a new JSON fixture for loop static in --all with a swap amount too high error.
  • cmd/loop/testdata/sessions/static-loop-in/14_loop-static-in.json
    • Added a new JSON fixture for loop static in --help.
  • cmd/loop/testdata/sessions/static-loop-in/15_loop-static-in.json
    • Added a new JSON fixture for a successful loop static in with amount and all deposits.
  • cmd/loop/testdata/sessions/static-loop-in/16_loop-static-in-duplicate-outpoints.json
    • Added a new JSON fixture for loop static in with duplicate outpoints, resulting in an error.
  • cmd/loop/testdata/sessions/static-loop-in/17_loop-static-in-positional-low-amt.json
    • Added a new JSON fixture for loop static in with a positional low amount, resulting in an error.
  • cmd/loop/testdata/sessions/static-loop-in/18_loop-static-in-positional-payment-timeout.json
    • Added a new JSON fixture for loop static in with positional amount, payment timeout, and last hop.
  • cmd/loop/testdata/sessions/static-loop-in/19_loop-static-in-all-cancel.json
    • Added a new JSON fixture for a canceled loop static in --all command.
  • cmd/loop/testdata/sessions/static/01_loop-static-withdraw-no-selection.json
    • Added a new JSON fixture for loop static withdraw with no selection, resulting in an error.
  • cmd/loop/testdata/sessions/static/02_loop-static-listwithdrawals.json
    • Added a new JSON fixture for loop static listwithdrawals.
  • cmd/loop/testdata/sessions/static/03_loop-static-listswaps.json
    • Added a new JSON fixture for loop static listswaps.
  • cmd/loop/testdata/sessions/static/04_loop-static-withdraw-invalid-utxo.json
    • Added a new JSON fixture for loop static withdraw with an invalid UTXO format.
  • cmd/loop/testdata/sessions/static/05_loop-static-withdraw-all.json
    • Added a new JSON fixture for loop static withdraw --all.
  • cmd/loop/testdata/sessions/static/06_loop-static-withdraw-utxo-destaddr.json
    • Added a new JSON fixture for loop static withdraw with a specific UTXO and destination address.
  • cmd/loop/testdata/sessions/swaps/01_loop-listswaps.json
    • Added a new JSON fixture for loop listswaps.
  • cmd/loop/testdata/sessions/swaps/02_loop-swapinfo.json
    • Added a new JSON fixture for loop swapinfo.
  • cmd/loop/testdata/sessions/swaps/03_loop-abandonswap.json
    • Added a new JSON fixture for loop abandonswap (help output).
  • cmd/loop/testdata/sessions/swaps/04_loop-listswaps-conflicting-filters.json
    • Added a new JSON fixture for loop listswaps with conflicting filters (--loop_out_only and --loop_in_only).
  • cmd/loop/testdata/sessions/swaps/05_loop-swapinfo-invalid-id.json
    • Added a new JSON fixture for loop swapinfo with an invalid ID.
  • cmd/loop/testdata/sessions/swaps/06_loop-abandonswap-invalid-id.json
    • Added a new JSON fixture for loop abandonswap with an invalid ID.
  • cmd/loop/testdata/sessions/swaps/07_loop-listswaps-loop-out-filtered.json
    • Added a new JSON fixture for loop listswaps filtered by loop out, channel, last hop, label, start time, and max swaps.
  • cmd/loop/testdata/sessions/swaps/08_loop-listswaps-loop-in-only.json
    • Added a new JSON fixture for loop listswaps filtered by loop in only.
  • cmd/loop/testdata/sessions/swaps/09_loop-swapinfo-id-flag-parse-error.json
    • Added a new JSON fixture for loop swapinfo with an ID flag that causes a parsing error.
  • cmd/loop/testdata/sessions/swaps/10_loop-swapinfo-id-flag.json
    • Added a new JSON fixture for loop swapinfo with an ID flag, resulting in an invalid swap ID error.
  • cmd/loop/testdata/sessions/swaps/11_loop-abandonswap-success.json
    • Added a new JSON fixture for a successful loop abandonswap.
  • go.mod
    • Added github.com/google/go-cmp v0.7.0 to module dependencies.
Activity
  • The pull request introduces a new testing framework for the CLI, involving significant code additions for session recording, replay, and related utilities.
  • Numerous test data files (JSON fixtures) have been added, representing a wide range of CLI command executions and their expected outputs.
  • The core CLI logic has been modified to integrate with the new session recording and replay mechanisms, including changes to gRPC client connections and standard I/O handling.
  • The changes reflect an initial implementation phase, focusing on establishing the framework and populating it with comprehensive test cases.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive session recording and replay framework for testing the Loop CLI. This is a significant improvement that will make the CLI more robust and regression-resistant. The implementation is well-structured, abstracting away dependencies like the gRPC connection, clock, and stdio to allow for deterministic testing. The recorded test sessions are extensive and cover a wide range of commands and scenarios. The code is of high quality, and I only have one minor suggestion to improve the clarity of the replay logic.

Comment thread cmd/loop/session_replay_test.go Outdated
@starius starius force-pushed the cli-tests20 branch 2 times, most recently from 6fca25d to 763c480 Compare February 5, 2026 07:02
@starius starius marked this pull request as ready for review February 5, 2026 07:02
@starius starius requested a review from hieblmi February 19, 2026 15:23
@bhandras bhandras self-requested a review March 31, 2026 16:04
@starius starius mentioned this pull request Apr 9, 2026
@starius
Copy link
Copy Markdown
Collaborator Author

starius commented Apr 9, 2026

Rebased, add recorded sessions for newly added commands and flags (the last commit).

@kaldun-tech
Copy link
Copy Markdown

kaldun-tech commented Apr 12, 2026

My biggest question about the design for fixture maintenance. With 114 fixture files, what's the plan for bulk updates when CLI output format changes? A few things that would help long-term:

  1. A make record-sessions target, even if it requires manual regtest setup
  2. An update mode for tests in session_replay_test.go (UPDATE_FIXTURES=true go test ...) to regenerate expected output
  3. Consider normalizing metadata.version in comparisons so releases don't invalidate all fixtures

I don't think it's a blocker for this test update. Just want to understand the maintenance story before this grows further. Otherwise seems solid!

}

// finalize writes the recorded session to disk once.
func (r *sessionRecorder) finalize(runErr error) error {
Copy link
Copy Markdown

@kaldun-tech kaldun-tech Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: consider writing session even if stopHooks fails, to preserve debugging data

From Claude, the realistic risk is low. stopHooks() just calls unhook functions that restore original file descriptors. Failures here would be unusual (maybe a closed pipe, or dup2 failure). In normal operation this won't happen.

It's suggestion is:

r.finalizeOnce.Do(func() {
    // Still try to stop hooks, but don't abandon ship on error
    hookErr := r.stopHooks()

    r.logExit(runErr)

    // ... write session to disk ...
    // Return hook error only if write succeeded
    if finalizeErr == nil {
        finalizeErr = hookErr
    }
})

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in cmd/loop: keep sessions on hook errors. Finalization now still records the exit state and writes the session file even if hook cleanup returns an error, and only reports the cleanup error after the file has been persisted. That keeps the debugging artifact without hiding an unusual teardown failure.


// copyExportedFields copies exported fields from src into dst.
func copyExportedFields(dst, src reflect.Value) {
// Iterate fields and copy only exported ones.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moderate concern here: This copies only exported fields from cli.Command, cli.Flag, and cli.Argument structs. Here's the points where it's fragile:

  1. Unexported state is silently ignored

If urfave/cli stores important state in unexported fields, the clone won't have it. For example, if
cli.StringFlag had:

type StringFlag struct {
    Name  string      // exported - copied
    value string      // unexported - NOT copied
    isSet bool        // unexported - NOT copied
}

The test at line 1071 (require.False(t, clonedAlpha.IsSet())) actually relies on this behavior - it expects
IsSet() to return false because the unexported state isn't copied. So currently this works in your favor.

  1. Structural assumptions about cli.Command

The code explicitly handles these fields (line 688-692):

  • Flags
  • MutuallyExclusiveFlags
  • Arguments
  • Commands

If urfave/cli v3.5 adds a new field like ConditionalFlags []cli.Flag, the cloning code won't deep-clone it - it'll share references with the original.

  1. Version pinning mitigates but doesn't eliminate risk

The repo is on urfave/cli v3.4.1. When you upgrade, the test TestCloneCommandForReplayResetsFlagState should catch obvious breaks, but it doesn't comprehensively cover all cli.Command fields.

Suggestions:

  1. Add a comment documenting the urfave/cli version dependency:
// cloneCommandForReplay deep-clones a command tree for deterministic replays.
// NOTE: This relies on urfave/cli/v3 internal structure. If upgrading cli
// versions, verify TestCloneCommandForReplayResetsFlagState still passes.
  1. Consider a simpler alternative: Since this is test-only, you could just call newRootCommand() fresh for each test instead of cloning. Is cloning is actually necessary?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in cmd/loop: document replay cloning. I kept cloning and added the missing explanation: the CLI keeps package-level command pointers with mutable flag state, so reusing one root would leak IsSet and related state across replayed sessions. The comment also calls out that the clone logic depends on the current urfave/cli structure and points future upgrades at TestCloneCommandForReplayResetsFlagState.

Comment thread cmd/loop/session_stdio.go
defer r.Close()

writer := composeWriter(forward, onChunk)
_, _ = io.Copy(writer, r)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor/Nit: You have a lot of error swallowing in this file. Consider logging io.Copy errors in session_stdio.go for debugging incomplete recordings, or documenting why they're intentionally ignored.

Claude suggests:

go func() {
    defer wg.Done()
    defer r.Close()

    writer := composeWriter(forward, onChunk)
    if _, err := io.Copy(writer, r); err != nil {
        // At minimum, log it for debugging
        log.Debugf("session stdout copy error: %v", err)
    }
}()

Or track errors in a struct field and surface them in finalize():

type sessionRecorder struct {
    // ...
    copyErr error  // Set if io.Copy fails
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in cmd/loop: clarify ignored stdio errors. I kept the current behavior but documented why those background io.Copy errors are intentionally ignored. During hook restoration the pipe endpoints are closed on purpose, and that can terminate the copy loops with an expected error. Logging those teardown paths would mostly add noise rather than improve diagnosis.

}

// cloneCommandForReplay deep-clones a command tree for deterministic replays.
func cloneCommandForReplay(cmd *cli.Command) *cli.Command {
Copy link
Copy Markdown

@kaldun-tech kaldun-tech Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be more concise on this method. Do you need to clone at all vs calling newRootCommand()? If there is a reason cloning is needed I recommend to add a comment to document this.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured out why cloning is necessary. The package-level command pointers (loopOutCommand, etc.) have mutable flag state that would bleed otherwise between test runs. Recommend to add a comment explaining this, something like:

// cloneCommandForReplay deep-clones a command tree because the CLI uses
// package-level command pointers with mutable flag state. Without cloning,
// flag state from one replay would bleed into subsequent replays.
//
// NOTE: This uses reflection to copy exported fields only. If upgrading
// urfave/cli versions, verify TestCloneCommandForReplayResetsFlagState
// still passes.
func cloneCommandForReplay(cmd *cli.Command) *cli.Command {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in cmd/loop: document replay cloning. I kept cloning and added the missing explanation: the CLI keeps package-level command pointers with mutable flag state, so reusing one root would leak IsSet and related state across replayed sessions. The comment also calls out that the clone logic depends on the current urfave/cli structure and points future upgrades at TestCloneCommandForReplayResetsFlagState.

Comment thread cmd/loop/main.go

// forceDeterministicJSON is enabled by tests to obtain stable JSON
// output.
forceDeterministicJSON bool
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these package-level variables be encapsulated in a context or config struct? Looks fine for a CLI. May complicate future refactorings.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the package-level CLI hooks as-is in this series. They are scoped to the CLI runtime and the record/replay harness, and moving them behind a config or context object would turn this into a broader refactor than this PR needs.

return false
}

// TestRecordedSessions replays all recorded sessions and compares output.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test parallelism safety (minor): The tests modify package globals (sessionRec, sessionTransport, cliClock, forceDeterministicJSON) without synchronization. This is fine since tests run sequentially. A comment could prevent future breakage:

// TestRecordedSessions replays all recorded sessions and compares output.
// NOTE: Do not add t.Parallel() - this test modifies package-level globals.
func TestRecordedSessions(t *testing.T) {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in cmd/loop: note replay test stays serial. I added an explicit note above TestRecordedSessions that it must not use t.Parallel() because the replay harness swaps package-level globals for transport, clock, recorder state, and JSON normalization.

@lightninglabs-deploy
Copy link
Copy Markdown

@bhandras: review reminder
@hieblmi: review reminder
@starius, remember to re-request review from reviewers when ready

@starius
Copy link
Copy Markdown
Collaborator Author

starius commented May 4, 2026

My biggest question about the design for fixture maintenance. With 114 fixture files, what's the plan for bulk updates when CLI output format changes? A few things that would help long-term:

1. A make record-sessions target, even if it requires manual regtest setup

2. An update mode for tests in session_replay_test.go (UPDATE_FIXTURES=true go test ...) to regenerate expected output

3. Consider normalizing metadata.version in comparisons so releases don't invalidate all fixtures

I don't think it's a blocker for this test update. Just want to understand the maintenance story before this grows further. Otherwise seems solid!

Thanks for suggestion!

I implemented re-generation of all fixtures when only stdout and stderr changed (see commit "cmd/loop: bless text-only session updates", env var LOOP_UPDATE_RECORDED_SESSIONS). If a broader change was deployed, it makes sense to regenerate the fixtures. Fortunately AI is capable of doing it.

I also changed the format of fixtures to have multi-line stdout and stderr outputs actually stored as multiline ([]string in JSON). Then it is easier to spot the changes than when it is all in one long line with \n inside.

I pushed new commits instead of squashing to make it easier to follow the changes. I'll rebase the changes later. Also I'm aware that the CI is broken now - I'll fix it later as well.

@starius
Copy link
Copy Markdown
Collaborator Author

starius commented May 4, 2026

Rebased - that was the only way to fix CI, because it runs against a merged state with master, and there was a change in CLI texts in master recently. Fixed that in "fixup! testdata: add static address and swaps sessions"

@starius starius requested a review from kaldun-tech May 4, 2026 18:37
@kaldun-tech
Copy link
Copy Markdown

Cool, I'm wrapping up my review. This is a really excellent PR with a lot of strong changes including the deterministic replay and bless mode safety. Have a couple more non-blocking suggestions.

Comment thread cmd/loop/session_stdio.go
_ = w.Close()
if !useOrig {
wg.Wait()
}
Copy link
Copy Markdown

@kaldun-tech kaldun-tech May 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I'd like to add a comment here to explain the intentional behavior of the goroutine, that it skips waiting for copy. For example:

// When reading from the original terminal stdin, we skip waiting // for the copy goroutine. It blocks on terminal input indefinitely // and will terminate when the pipe closes or the process exits. // Waiting here would hang the CLI.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I added the comment.

data, err := json.Marshal(payload)
if err != nil {
return
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: consider logging marshal errors here to stderr

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging to stderr may be problematic here, since stderr may be hooked.

I fixed it by saving the error and checking it from Finalize() method.

Copy link
Copy Markdown

@kaldun-tech kaldun-tech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple other things I checked here:

  • The testdata fixtures do not contain sensitive data.
  • Did not find any concerns with race conditions in session_recorder
  • We could add a Makefile target for recording sessions in a follow-up PR, but this would only wrap a one-liner already documented in AGENTS.md. Don't think it's needed.
  • Bless mode is great!

@starius starius force-pushed the cli-tests20 branch 2 times, most recently from f4fff1a to 71d2249 Compare May 10, 2026 04:38
@starius
Copy link
Copy Markdown
Collaborator Author

starius commented May 10, 2026

@kaldun-tech Thanks for review! I addressed the comments and rebased the PR, to make atomic Git commits.

@hieblmi
Copy link
Copy Markdown
Collaborator

hieblmi commented May 12, 2026

@claude please review this

@claude
Copy link
Copy Markdown

claude Bot commented May 12, 2026

Code Review: CLI Session Recording and Replay

This is a well-conceived feature that brings deterministic, CI-friendly regression testing to the Loop CLI. The overall design is sound and the implementation is thorough. Below are observations organized by severity.


Potential Bugs / Behavioral Changes

1. Silent behavioral regression in abandonSwap (swaps.go)

The removal of the cmd.IsSet("id") case silently changes argument priority. Previously, --id <value> took precedence over a positional argument when both were supplied. After this change, a positional argument always wins (since the flag case is gone and only cmd.NArg() > 0 remains). This could break users who rely on --id.

If this is intentional (e.g., the flag is being deprecated in favour of positional args), it should be documented and probably also enforced with a runtime warning when users still pass --id.

2. Hook error silently dropped when file write fails in finalize

Inside finalizeOnce.Do, the function collects hookErr := r.stopHooks() early on, but if os.Create or encoder.Encode fail, the function returns early and hookErr is never surfaced. Consider using errors.Join(err, hookErr) in those early-return paths so hook errors are not silently lost.


Architecture / Design Concerns

3. Package-level global state makes tests non-parallelisable

sessionRec, cliClock, sessionTransport, and forceDeterministicJSON are all package-level globals mutated by the test harness. The note in TestRecordedSessions acknowledges this. This is understandable given the CLI's design, but it means the entire session_replay_test.go suite is inherently sequential. A potential follow-up: wrapping these in a cliEnv struct passed through context could eventually allow parallelism.

4. Reflection-based command cloning is fragile against urfave/cli upgrades

cloneCommandForReplay copies exported fields via reflection and relies on urfave/cli storing all replay-relevant state in exported fields. The existing TestCloneCommandForReplayResetsFlagState provides a safety net, but only covers fields explicitly set in that test. Consider filing an upstream issue requesting a Clone() method on cli.Command for a more stable interface.

5. sessionDefaultDir is a hardcoded relative path

sessionDefaultDir = "cmd/loop/testdata/sessions" only works when the binary is run from the repository root. This is documented in the new AGENTS.md, but a runtime check or warning when the directory does not exist would be more user-friendly.


Minor Issues

6. Redundant contains wrapper

The contains(values []string, v string) bool function only wraps slices.Contains and is called in just two places. Call slices.Contains directly and remove the wrapper.

7. Five separate sessionRec nil checks in main()

The recording setup is scattered across five separate if sessionRec != nil blocks. Consider consolidating into a single helper like installSessionRecorder() that handles clock, transport, context injection, and signal handler in one place, keeping the normal code path cleaner.

8. Double-signal os.Exit bypasses deferred cleanup

In installSessionSignalHandler, a second signal calls os.Exit(130) from a goroutine, bypassing all deferred functions in main(). This is acceptable for a signal handler since sessionRec.Finalize is correctly called before exit, but a comment noting that deferred cleanup will not run would help future maintainers.

9. TestUpdateRecordedSessionsEnabled env var setup

The test saves the existing env var value via t.Setenv and then immediately clears it with os.Unsetenv. The t.Setenv to an empty string is redundant before the immediate manual unset and can be simplified.


Positive Highlights

  • The bless mode (LOOP_UPDATE_RECORDED_SESSIONS=true) is well-designed: it refuses to bless success-to-failure transitions and validates the gRPC interaction is unchanged before updating text output.
  • Timestamp normalization to UTC via regex ensures sessions are portable across timezones.
  • collectSessionEnv limiting capture to LOOPCLI_* variables is a good security practice — no accidental credential leakage.
  • finalizeOnce correctly prevents double-write when both fatal() and the normal completion path call Finalize.
  • The comprehensive fixture coverage map in AGENTS.md is very useful for future contributors.
  • chunkWriter.Write correctly copies the buffer before handing it to the callback, preventing data races with the caller's slice.

Summary

The main concern is the behavioral regression in abandonSwap (point 1) — please confirm whether dropping the --id flag priority is intentional and document it if so. The hook-error drop on write failure (point 2) is a correctness issue worth fixing. The rest are improvements that could be addressed in follow-up PRs.

starius added 18 commits May 13, 2026 00:13
Make the output homedir-independent to facilitate tests.
abandonSwap used to access flag --id which does not exist.
Add an env-gated bless mode to recorded-session replay so
CLI-only text changes can refresh fixtures without a live
recording pass. The replay still uses the recorded gRPC
traffic, stdin, and environment, and it only rewrites
stdout, stderr, and run_error after the command preserves
the recorded success or failure shape.

Keep the updater strict by refusing to bless sessions when
replay leaves recorded gRPC events unconsumed, and add
focused tests for the rewrite rules. Document the bless
workflow next to the session fixtures, including the need
for -count=1 so the Go test cache does not skip updates.
Correct the duplicated article in the static openchannel
help text so the command summary reads cleanly. Refresh the
recorded help fixtures that surface that summary directly,
including the dedicated openchannel help output and the
parent static command listing.
@starius
Copy link
Copy Markdown
Collaborator Author

starius commented May 13, 2026

1. Silent behavioral regression in abandonSwap (swaps.go)

The removal of the cmd.IsSet("id") case silently changes argument priority. Previously, --id <value> took precedence over a positional argument when both were supplied. After this change, a positional argument always wins (since the flag case is gone and only cmd.NArg() > 0 remains). This could break users who rely on --id.

If this is intentional (e.g., the flag is being deprecated in favour of positional args), it should be documented and probably also enforced with a runtime warning when users still pass --id.

This one is not a supported CLI behavior change. loop abandonswap does not define an --id flag, so the removed branch was orphan flag access rather than a user-visible flag priority rule. loop swapinfo has an --id flag and keeps the existing --id precedence; loop abandonswap only accepts the positional ID plus --i_know_what_i_am_doing.

Relevant current shape:

var abandonSwapCommand = &cli.Command{
	Name:      "abandonswap",
	ArgsUsage: "ID",
	Flags: []cli.Flag{
		&cli.BoolFlag{Name: "i_know_what_i_am_doing"},
	},
	Action: abandonSwap,
}

I left the code as-is here because reintroducing the branch would keep dead code around for a flag the command cannot parse.

2. Hook error silently dropped when file write fails in finalize

Inside finalizeOnce.Do, the function collects hookErr := r.stopHooks() early on, but if os.Create or encoder.Encode fail, the function returns early and hookErr is never surfaced. Consider using errors.Join(err, hookErr) in those early-return paths so hook errors are not silently lost.

Fixed in cmd/loop: add session recorder plumbing. finalize now joins write errors with deferred recorder event errors and hook cleanup errors, including the early-return paths for directory creation, file creation, and JSON encoding.

Relevant part:

file, err := os.Create(r.filePath)
if err != nil {
	finalizeErr = errors.Join(err, eventErr, hookErr)
	return
}

encoder := json.NewEncoder(file)
encoder.SetIndent("", "  ")
if err := encoder.Encode(fileContent); err != nil {
	finalizeErr = errors.Join(err, eventErr, hookErr)
	return
}

I also updated the finalize test so it asserts both the write failure and hook failure are returned:

err := recorder.Finalize(nil)
require.ErrorContains(t, err, "is a directory")
require.ErrorContains(t, err, "stdout restore failed")

3. Package-level global state makes tests non-parallelisable

sessionRec, cliClock, sessionTransport, and forceDeterministicJSON are all package-level globals mutated by the test harness. The note in TestRecordedSessions acknowledges this. This is understandable given the CLI's design, but it means the entire session_replay_test.go suite is inherently sequential. A potential follow-up: wrapping these in a cliEnv struct passed through context could eventually allow parallelism.

Leaving this as a follow-up. The replay test already documents that it must stay serial because the CLI uses package-level command/runtime state. Moving this into a cliEnv would be a larger CLI plumbing refactor and is not necessary for this PR’s test feature.

Current guardrail:

// NOTE: Do not add t.Parallel() here; the replay harness mutates package-level
// globals such as the active transport, clock, and JSON normalization mode.
func TestRecordedSessions(t *testing.T) {

4. Reflection-based command cloning is fragile against urfave/cli upgrades

cloneCommandForReplay copies exported fields via reflection and relies on urfave/cli storing all replay-relevant state in exported fields. The existing TestCloneCommandForReplayResetsFlagState provides a safety net, but only covers fields explicitly set in that test. Consider filing an upstream issue requesting a Clone() method on cli.Command for a more stable interface.

Leaving this unchanged in the PR. The code now has an explicit comment explaining why replay clones the command tree and what urfave/cli upgrade risk to watch for, and the regression test covers the mutable flag/argument state that caused the issue here.

Relevant comment:

// NOTE: This relies on urfave/cli storing replay-relevant configuration in
// exported fields. If upgrading urfave/cli, verify that
// TestCloneCommandForReplayResetsFlagState still passes.
func cloneCommandForReplay(cmd *cli.Command) *cli.Command {

An upstream Clone() API would be cleaner, but that is outside this PR and would not remove the need for the local safety test.

5. sessionDefaultDir is a hardcoded relative path

sessionDefaultDir = "cmd/loop/testdata/sessions" only works when the binary is run from the repository root. This is documented in the new AGENTS.md, but a runtime check or warning when the directory does not exist would be more user-friendly.

Fixed in cmd/loop: add session recorder plumbing. When session recording is enabled, the recorder now verifies that cmd/loop/testdata/sessions exists and is a directory before choosing the output file. If it is missing, recording fails with a clear error telling the caller to run from the repository root instead of silently creating a new tree in the wrong directory.

Relevant code:

func ensureSessionBaseDir(baseDir string) error {
	info, err := os.Stat(baseDir)
	if errors.Is(err, fs.ErrNotExist) {
		return fmt.Errorf("%s does not exist; run session recording "+
			"from the repository root", baseDir)
	}
	if err != nil {
		return err
	}
	if !info.IsDir() {
		return fmt.Errorf("%s is not a directory", baseDir)
	}

	return nil
}

I added a unit test for that failure mode:

t.Chdir(t.TempDir())
t.Setenv(sessionEnvVar, "true")

recorder, err := newSessionRecorder([]string{testLoopArg, testOutArg})
require.Nil(t, recorder)
require.EqualError(t, err, "cmd/loop/testdata/sessions does not "+
	"exist; run session recording from the repository root")

6. Redundant contains wrapper

The contains(values []string, v string) bool function only wraps slices.Contains and is called in just two places. Call slices.Contains directly and remove the wrapper.

Fixed in cmd/loop: add session replay tests. The wrapper is gone and the call site now uses slices.Contains directly.

if evt.Event == expected || slices.Contains(alternatives, evt.Event) {
	return &evt, idx, nil
}

7. Five separate sessionRec nil checks in main()

The recording setup is scattered across five separate if sessionRec != nil blocks. Consider consolidating into a single helper like installSessionRecorder() that handles clock, transport, context injection, and signal handler in one place, keeping the normal code path cleaner.

Leaving this unchanged. The separate checks are a little repetitive, but they keep each installed hook close to its lifetime/defer and avoid introducing a broader setup abstraction in this PR. Since all blocks are guarded by sessionRec != nil, the normal CLI path remains straightforward and avoids recorder work entirely.

Representative shape:

if sessionRec != nil {
	restoreClock := hookClock(
		clock.NewTestClock(time.Unix(sessionClockStartUnix, 0)),
	)
	defer restoreClock()

	if err := sessionRec.Start(nil, nil, nil); err != nil {
		fatal(err)
	}
}

8. Double-signal os.Exit bypasses deferred cleanup

In installSessionSignalHandler, a second signal calls os.Exit(130) from a goroutine, bypassing all deferred functions in main(). This is acceptable for a signal handler since sessionRec.Finalize is correctly called before exit, but a comment noting that deferred cleanup will not run would help future maintainers.

Fixed in cmd/loop: add session recorder plumbing. I added a short comment before the second-signal os.Exit(130) path explaining that the session is finalized first because os.Exit skips deferred cleanup.

// A second signal requests immediate shutdown. Finalize the
// session first because os.Exit skips deferred cleanup.
_ = sessionRec.Finalize(fmt.Errorf("signal: %s", sig))
os.Exit(130)

9. TestUpdateRecordedSessionsEnabled env var setup

The test saves the existing env var value via t.Setenv and then immediately clears it with os.Unsetenv. The t.Setenv to an empty string is redundant before the immediate manual unset and can be simplified.

Fixed in cmd/loop: bless text-only session updates. The test setup is simplified and lint-clean. It still covers the unset, true, and invalid cases.

if current, ok := os.LookupEnv(updateRecordedSessionsEnvVar); ok {
	t.Setenv(updateRecordedSessionsEnvVar, current)
} else {
	t.Setenv(updateRecordedSessionsEnvVar, "false")
}
_ = os.Unsetenv(updateRecordedSessionsEnvVar)

enabled, err := updateRecordedSessionsEnabled()
require.NoError(t, err)
require.False(t, enabled)

Copy link
Copy Markdown
Collaborator

@hieblmi hieblmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @starius, I've found a coupe of issues:

  1. cmd/loop/session_replay_test.go:689: normal replay mode does not assert that all recorded gRPC events were consumed. That check only runs in bless mode, so CI can miss regressions where a command stops making a recorded trailing RPC but the visible text still matches.
  2. cmd/loop/main.go:263: live recording replaces the CLI clock with a fixed timestamp. Since that timestamp is January 26, 2026, future loop out / quote out recordings
    send stale swap_publication_deadline values via cmd/loop/loopout.go:251 and cmd/loop/quote.go:209, which can diverge from real CLI behavior or be rejected by a live server.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants