test(e2e): upgrade puppeteer to v25 to fix broken e2e navigation #666

Merged
B4nan merged 2 commits into master from fix/e2e-puppeteer-flakiness
May 29, 2026

@B4nan B4nan commented May 29, 2026

Problem

All 52 puppeteer-backed e2e cases started failing wholesale — every one timing out on navigation, ballooning the E2E tests (Node.js 24) job from ~2 min to ~57 min (run 26636921133).

Root cause is dependency rot triggered by a runner-image bump, not flakiness:

  • Last green master run used GitHub image ubuntu24/20260518.149; the failing runs use ubuntu24/20260525.161.
  • puppeteer was pinned at 19.11.1, whose bundled Chromium (~115, mid-2023) launches but can no longer navigate on the newer OS. curl through the same proxy still passes, confirming the proxy itself is fine and the problem is the stale browser.

Fix

Upgrade puppeteer 19.11 → 25.x, which ships a current Chromium built for the new image. Two renamed launch options:

  • ignoreHTTPSErrors → acceptInsecureCerts
  • headless: 'new' → headless: true (the 'new' value was removed; modern headless is the default true)
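
The two renames can be sketched as a small option-mapping function. This is only an illustration: `migrateLaunchOptions` is a hypothetical helper, not part of puppeteer or of this PR, and the `--no-sandbox` argument is just an example value to show that other options pass through unchanged.

```javascript
// Hypothetical helper (not a puppeteer API): translate puppeteer 19-style
// launch options into the renamed options described in this PR.
function migrateLaunchOptions(legacy) {
  const { ignoreHTTPSErrors, headless, ...rest } = legacy;
  const migrated = { ...rest };

  // `ignoreHTTPSErrors` was renamed to `acceptInsecureCerts`.
  if (ignoreHTTPSErrors !== undefined) {
    migrated.acceptInsecureCerts = ignoreHTTPSErrors;
  }

  // The 'new' value was removed; `true` now selects the modern headless mode.
  if (headless !== undefined) {
    migrated.headless = headless === 'new' ? true : headless;
  }

  return migrated;
}

// Example input in the old style; the values are illustrative.
const before = { headless: 'new', ignoreHTTPSErrors: true, args: ['--no-sandbox'] };
const after = migrateLaunchOptions(before);
console.log(after);
```

In the test suite itself the options would more likely be edited in place; the function form only makes the mapping explicit.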

This also drops an earlier launch/navigate-retry + per-step-timeout experiment that was based on a wrong root cause (assumed flaky spawns) — it didn't help and tripled the failing run's wall-clock time by retrying a browser whose navigation could never succeed.

Verified locally with puppeteer 25 that the new launch options launch and navigate, both directly and through an HTTP proxy.

The puppeteer-backed e2e cases intermittently timed out, with every
puppeteer test in a run hanging for the full 30s mocha timeout (e.g. run
26625946119 took 29m as ~50 tests each burned 30s).

Two root causes:

- The tests opted into `headless: 'new'`, which in the bundled Chromium
  (pptr 19 / Chromium ~115) is markedly slower and flakier to start in CI
  than the legacy `headless: true` mode, for no functional gain here.
- The launch retry only caught thrown errors, not hangs. puppeteer's
  default launch/navigation timeouts equal the mocha timeout, so a stuck
  launch or navigation killed the test before the retry could help.

Switch to legacy headless, and retry the whole launch -> navigate -> read
cycle with a fresh browser, bounding each step with an explicit timeout so
a hang surfaces as a catchable rejection. Puppeteer test timeout is raised
to fit the retry attempts plus backoff.
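
For reference, the "bound each step with a timeout, then retry" idea from this commit could look roughly like the sketch below. It is an assumption-laden illustration, not the code from the commit: `withTimeout`, `retry`, and `fakeCycle` are hypothetical names, and `fakeCycle` stands in for the real launch -> navigate -> read cycle so the example runs without a browser. As the PR description notes, this approach was later dropped because the root cause was a stale Chromium rather than flaky spawns.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds, so that a
// hang surfaces as a catchable rejection instead of a test-level timeout.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run `attempt` up to `attempts` times, waiting `backoffMs` between failures.
async function retry(attempt, { attempts = 3, backoffMs = 100 } = {}) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await attempt(i);
    } catch (err) {
      lastError = err;
      if (i < attempts) {
        await new Promise((resolve) => setTimeout(resolve, backoffMs));
      }
    }
  }
  throw lastError;
}

// Stand-in for launch -> navigate -> read: hangs on the first attempt only.
const fakeCycle = (i) => (i === 1 ? new Promise(() => {}) : Promise.resolve('page content'));

retry((i) => withTimeout(fakeCycle(i), 50, 'cycle'), { attempts: 3, backoffMs: 10 })
  .then((result) => console.log(result));
```

Note that retrying only helps when a later attempt can succeed; if every navigation hangs, each test pays the full timeout once per attempt.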
@B4nan B4nan added the adhoc Ad-hoc unplanned task added during the sprint. label May 29, 2026
@github-actions github-actions Bot added this to the 141st sprint - Tooling team milestone May 29, 2026
@github-actions github-actions Bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels May 29, 2026

B4nan commented May 29, 2026

@bliuchak I'd reconsider running a CI check that takes ~30 minutes. We usually run e2e tests on master on a schedule once a day, so it makes little to no sense to block PRs with this.

@B4nan B4nan requested a review from bliuchak May 29, 2026 12:30
The puppeteer e2e tests started failing wholesale (all 52 cases timing
out on navigation) after the GitHub-hosted runner image bumped from
ubuntu24/20260518 to ubuntu24/20260525. puppeteer was pinned at 19.11.1,
whose bundled Chromium (~115, mid-2023) launches but can no longer
navigate on the newer OS, while curl through the same proxy still works.

Upgrade puppeteer to v25, which ships a current Chromium built for the
new image. Adjust the two renamed launch options:
- `ignoreHTTPSErrors` -> `acceptInsecureCerts`
- `headless: 'new'` -> `headless: true` (the `'new'` value was removed; the
  modern headless mode is now the default `true`).

This also reverts the earlier launch/navigate retry + per-step timeout
experiment: it was based on a wrong root cause (assumed flaky spawns),
didn't help, and tripled the failing run's wall-clock time by retrying a
browser whose navigation could never succeed.

Verified locally with puppeteer 25 that the new launch options launch and
navigate, both directly and through an HTTP proxy.
@B4nan B4nan changed the title from "test(e2e): stabilize flaky puppeteer e2e tests" to "test(e2e): upgrade puppeteer to v25 to fix broken e2e navigation" May 29, 2026

B4nan commented May 29, 2026

Huh, so this was all about a dated puppeteer. This PR bumps it to resolve the flakiness, and the job now runs in ~3 minutes, so let's leave it in PR checks; that's completely reasonable.

@B4nan B4nan merged commit 217e77b into master May 29, 2026
9 checks passed
@B4nan B4nan deleted the fix/e2e-puppeteer-flakiness branch May 29, 2026 15:05