How to fix flaky Playwright crashes and random browser exits in CI?

Random browser crashes in CI, where tests pass locally but fail unpredictably on CI runners, are most often resource exhaustion problems. Each Playwright worker launches its own browser, and Chromium in turn spawns several processes (browser, GPU, and renderers), each consuming RAM and shared memory. When the container's memory limit is hit, the kernel's OOM killer terminates one or more browser processes, producing a stream of TargetClosedError ("Target page, context or browser has been closed") or Protocol error failures with no consistent pattern across runs. Other triggers include shared CI agents being recycled mid-test, GPU process crashes in headless mode, and sandbox permission failures on hardened Linux kernels.

Common mistake

// playwright.config.ts — all defaults, no CI-aware limits
import { defineConfig } from '@playwright/test';

export default defineConfig({
  fullyParallel: true,
  // workers defaults to half the logical CPU cores: on a 16-vCPU runner that is 8 workers
  // Each worker launches its own browser: 8 browsers × ~200 MB ≈ 1.6 GB RAM as a rough baseline
});

Running with the default worker count on a memory-constrained CI container is a common cause of intermittent browser crashes, because the default scales with CPU cores rather than with available memory.

The fix

Apply CI-specific limits and diagnostic settings:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 2 : undefined,
  timeout: 60000,
  use: {
    trace: 'on-first-retry',
    video: 'on-first-retry',
    screenshot: 'only-on-failure',
    launchOptions: {
      args: [
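        '--disable-dev-shm-usage',    // Chromium: write shared memory files to /tmp instead of /dev/shm
        '--no-sandbox',               // Needed in some CI containers (e.g. running as root); disables the Chromium sandbox, so only load trusted content
        '--disable-setuid-sandbox',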
      ],
    },
  },
});
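
The launch flags above are Chromium switches, and a top-level `use.launchOptions` applies to every project. If the config also runs Firefox or WebKit (an assumption; the config above defines no projects), one option is to scope the flags to the Chromium project only. A minimal sketch, with the project names chosen for illustration:

```typescript
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 2 : undefined,
  projects: [
    {
      name: 'chromium',
      use: {
        ...devices['Desktop Chrome'],
        // Chromium-only switch, kept out of the other browsers' launch options
        launchOptions: { args: ['--disable-dev-shm-usage'] },
      },
    },
    {
      // Illustrative second project; it launches without any extra args
      name: 'firefox',
      use: { ...devices['Desktop Firefox'] },
    },
  ],
});
```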

For persistent crashes in a specific test suite, isolate it into its own shard with a lower worker count:

# Run the heavy suite with 1 worker in isolation
npx playwright test tests/heavy-suite/ --workers=1

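# Run the rest in parallel. The CLI has no --ignore flag, so this assumes a
# project named "rest" whose testIgnore excludes tests/heavy-suite/
npx playwright test --project=rest --workers=4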
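
Excluding a directory is done in the config rather than on the command line. The sketch below shows one way to split the suite into two projects with `testMatch` and `testIgnore`. The project names `heavy` and `rest`, the `./tests` directory, and the `.spec.ts` naming are assumptions for illustration.

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  projects: [
    // Only the memory-hungry tests (assumes files are named *.spec.ts)
    { name: 'heavy', testMatch: '**/heavy-suite/**/*.spec.ts' },
    // Everything else: default test file matching, minus the heavy directory
    { name: 'rest', testIgnore: '**/heavy-suite/**' },
  ],
});
```

With this in place, the two runs would be `npx playwright test --project=heavy --workers=1` followed by `npx playwright test --project=rest --workers=4`.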

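Monitor memory usage in CI logs and pick a runner or container memory limit that leaves headroom for every worker's browser:

# GitHub Actions example: choose a runner with enough memory
runs-on: ubuntu-latest
# For larger jobs, switch to a larger runner. Larger-runner labels are set up
# in your organization's settings; the label below is only an example:
# runs-on: ubuntu-latest-4-cores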
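
One way to get memory figures into the CI log is to read the container's cgroup counters from Node. The sketch below assumes a Linux runner with cgroup v2, where `/sys/fs/cgroup/memory.current` and `/sys/fs/cgroup/memory.max` are readable; on other systems it reports that the stats are unavailable. The helper names are hypothetical and not part of Playwright.

```typescript
import * as fs from 'node:fs';

// cgroup v2 paths; assumed to be readable inside the CI container
const CGROUP_CURRENT = '/sys/fs/cgroup/memory.current';
const CGROUP_MAX = '/sys/fs/cgroup/memory.max';

// Parses a cgroup v2 memory file: a byte count, or the literal "max" meaning no limit
export function parseCgroupBytes(raw: string): number | null {
  const text = raw.trim();
  if (text === '' || text === 'max') return null;
  const value = Number(text);
  return Number.isFinite(value) ? value : null;
}

function readBytes(path: string): number | null {
  try {
    return parseCgroupBytes(fs.readFileSync(path, 'utf8'));
  } catch {
    return null; // file missing: cgroup v1 host, macOS, Windows, ...
  }
}

// Formats one log line from the current usage and the limit (both in bytes)
export function formatMemoryLine(current: number | null, limit: number | null): string {
  const mib = (n: number): string => `${Math.round(n / (1024 * 1024))} MiB`;
  if (current === null) return 'memory: cgroup v2 stats unavailable';
  if (limit === null) return `memory: ${mib(current)} used, no cgroup limit`;
  const pct = Math.round((current / limit) * 100);
  return `memory: ${mib(current)} / ${mib(limit)} (${pct}%)`;
}

// Hypothetical hook-up: call once (for example from a global setup file)
// to print a memory line at a fixed interval while the test run is alive.
export function startMemoryLogging(intervalMs = 10_000): NodeJS.Timeout {
  const timer = setInterval(() => {
    console.log(formatMemoryLine(readBytes(CGROUP_CURRENT), readBytes(CGROUP_MAX)));
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return timer;
}
```

These counters cover the whole container, so they include every browser process, not just the Node test runner.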

Why it works

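--disable-dev-shm-usage makes Chromium write its shared memory files to /tmp instead of /dev/shm, which Docker limits to 64 MB by default. /tmp is normally backed by the container's filesystem, so it is not subject to that cap. Reducing workers in CI cuts the peak browser memory footprint roughly in proportion, because each worker runs its own browser: going from 8 workers to 2 cuts it to about a quarter. trace: 'on-first-retry' records a trace when a failed test is retried, which helps distinguish memory crashes from test logic bugs. It captures the retry rather than the original failure, so a crash that does not recur on retry leaves only a passing trace; trace: 'retain-on-failure' records every run and keeps the traces of the failed ones.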
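
The proportional relationship between workers and memory can be turned into a simple worker budget. The sketch below is a hypothetical helper, not a Playwright API. The per-worker and reserved figures are assumptions to replace with numbers measured on your own suite.

```typescript
import * as os from 'node:os';

const GiB = 1024 ** 3;

// Hypothetical helper: the largest worker count that fits both the memory budget
// and Playwright's default CPU rule (half the logical cores), never below 1.
export function pickWorkerCount(
  availableBytes: number,
  cpuCount: number,
  perWorkerBytes: number,
  reservedBytes: number,
): number {
  const byMemory = Math.floor((availableBytes - reservedBytes) / perWorkerBytes);
  const byCpu = Math.max(1, Math.floor(cpuCount / 2));
  return Math.max(1, Math.min(byMemory, byCpu));
}

// Assumed figures: 1 GiB per worker (browser plus pages), 2 GiB reserved for the
// OS and the test runner. Caveat: inside a container os.totalmem() may report the
// host's memory rather than the container limit, so pass the limit explicitly there.
export const ciWorkers = pickWorkerCount(os.totalmem(), os.cpus().length, 1 * GiB, 2 * GiB);
```

The result could then replace the hard-coded value, as in `workers: process.env.CI ? ciWorkers : undefined`.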

Tips

Check the runner's kernel log (for example with dmesg, where you have access) for Linux OOM killer messages such as oom_kill_process, "invoked oom-killer" or "Killed process". They confirm the crash is memory-related, not a Playwright bug.
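For GitHub Actions, the standard ubuntu-latest runner has offered roughly 7 GB RAM for private repositories and 16 GB for public ones; check GitHub's current specifications. On the 7 GB size, 2 to 4 Playwright workers is a reasonable starting point, and going beyond 6 risks OOM crashes on test suites that load heavy pages.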
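If traces show the crash happens consistently after the same test, that test may be leaking browser contexts. Verify that every context or page it creates manually (browser.newContext(), browser.newPage()) is closed, for example in afterEach or afterAll hooks.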
Playwright Test has no built-in flag for randomizing test order. To distinguish true flakes from order-dependent failures that look like random crashes, rerun the suspect file on its own with --workers=1 and --repeat-each.
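
The OOM check from the first tip can be automated against a saved kernel log. The sketch below assumes the log (for example the output of dmesg, stored as a CI artifact) is available as a string. The function name is hypothetical, and the patterns cover common kernel message forms rather than every possible one.

```typescript
// Fragments the Linux kernel logs around an OOM kill
const OOM_PATTERNS: RegExp[] = [
  /invoked oom-killer/,
  /Out of memory: Kill(ed)? process \d+/,
  /Memory cgroup out of memory/,
  /oom_kill_process/,
];

// Returns the log lines that look like OOM killer activity
export function findOomLines(log: string): string[] {
  return log
    .split(/\r?\n/)
    .filter((line) => OOM_PATTERNS.some((pattern) => pattern.test(line)));
}
```

An empty result does not rule out memory pressure, since hosted runners do not always expose the kernel log.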