How to fix Playwright shard imbalance and long-tail CI jobs?

Playwright's --shard=N/M splits the test suite across M runners, but the split ignores runtime. By default whole test files are the unit of distribution, so one shard can end up with a few large, slow spec files while another runs dozens of fast tests. The slow shard becomes the bottleneck for the entire CI pipeline, with the other shards sitting idle while they wait for it. Fixing shard imbalance requires either measuring and redistributing test runtimes or restructuring large spec files.

Common mistake

# Files are distributed without regard to runtime: shard 1 might get 3 files at 5 min each = 15 min
# Shard 4 might get 30 files at 10 seconds each = 5 min
npx playwright test --shard=1/4
npx playwright test --shard=2/4
npx playwright test --shard=3/4
npx playwright test --shard=4/4

Pipeline wall time is determined by the slowest shard, so imbalance directly extends total CI time.
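
To make the arithmetic concrete, here is a minimal TypeScript sketch that computes pipeline wall time and idle runner time from per-shard durations. The 15 and 5 minute figures come from the example above; the durations for shards 2 and 3 are assumed values chosen only for illustration.

```typescript
// Per-shard durations in minutes. Shards 1 and 4 come from the example
// above; the values for shards 2 and 3 are assumptions for illustration.
const shardMinutes: number[] = [15, 7, 6, 5];

// The pipeline finishes only when the slowest shard finishes.
function wallTime(durations: number[]): number {
  return Math.max(...durations);
}

// Total runner time spent waiting for the slowest shard.
function idleTime(durations: number[]): number {
  const slowest = wallTime(durations);
  return durations.reduce((sum, d) => sum + (slowest - d), 0);
}

// Wall time if the same total work were spread perfectly evenly.
function idealWallTime(durations: number[]): number {
  const total = durations.reduce((sum, d) => sum + d, 0);
  return total / durations.length;
}

console.log(wallTime(shardMinutes));      // 15
console.log(idleTime(shardMinutes));      // 0 + 8 + 9 + 10 = 27
console.log(idealWallTime(shardMinutes)); // 33 / 4 = 8.25
```

With these numbers the pipeline takes 15 minutes even though an even split of the same work would take a little over 8.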

The fix

Step 1 — Identify slow specs using Playwright's built-in reporter:

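# Write the report outside test-results/, which Playwright clears at the start of each run
PLAYWRIGHT_JSON_OUTPUT_NAME=results.json npx playwright test --reporter=json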

For a quick look at per-test durations without parsing the JSON, filter the list reporter output:

# The list reporter prints each test's duration in parentheses, e.g. (1.2s) or (523ms)
npx playwright test --reporter=list 2>&1 | grep -E "\([0-9.]+(ms|s|m)\)"
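
The JSON report from Step 1 can also be aggregated per file with a short script. The sketch below assumes the report shape the JSON reporter is expected to produce (nested suites containing specs, each with tests whose results carry a duration in milliseconds); the interface names are made up for this example, and the field names should be checked against the output of your Playwright version.

```typescript
// Minimal subset of the JSON report shape this sketch relies on (assumed).
interface JsonResult { duration: number }            // milliseconds
interface JsonTest { results: JsonResult[] }         // one result per attempt
interface JsonSpec { file: string; tests: JsonTest[] }
interface JsonSuite { specs?: JsonSpec[]; suites?: JsonSuite[] }
interface JsonReport { suites: JsonSuite[] }

// Sum result durations per spec file (retries included), slowest file first.
function durationByFile(report: JsonReport): Array<[string, number]> {
  const totals = new Map<string, number>();
  const visit = (suite: JsonSuite): void => {
    for (const spec of suite.specs ?? []) {
      for (const test of spec.tests) {
        for (const result of test.results) {
          totals.set(spec.file, (totals.get(spec.file) ?? 0) + result.duration);
        }
      }
    }
    for (const child of suite.suites ?? []) visit(child);
  };
  report.suites.forEach(visit);
  return Array.from(totals.entries()).sort((a, b) => b[1] - a[1]);
}

// Hand-written sample data, not real reporter output.
const sample: JsonReport = {
  suites: [
    { specs: [{ file: "login.spec.ts", tests: [{ results: [{ duration: 1500 }] }] }] },
    {
      suites: [
        {
          specs: [
            { file: "billing.spec.ts", tests: [{ results: [{ duration: 4000 }] }] },
            { file: "billing.spec.ts", tests: [{ results: [{ duration: 5000 }] }] },
          ],
        },
      ],
    },
  ],
};

console.log(durationByFile(sample));
```

To run it against a real report, replace the sample with the parsed contents of the report file written in Step 1. The files at the top of the list are the first candidates for splitting.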

Step 2 — Split large spec files that contain more than a handful of tests:

# Before: one file with 40 tests
tests/
  billing.spec.ts       # 40 tests × 3s each = 2 minutes

# After: split by logical group
tests/billing/
  subscription.spec.ts  # 15 tests
  invoices.spec.ts      # 15 tests
  payment-methods.spec.ts # 10 tests

Step 3 — Configure shard count to match actual runtime distribution:

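// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // The shard count itself is set on the command line, not here:
  // run with --shard=1/8 through --shard=8/8 in CI.
  workers: 2, // Parallel workers within each shard
  // Lets Playwright shard individual tests rather than whole files, which
  // gives it finer granularity. Tests in a file must be independent.
  fullyParallel: true,
});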

Step 4 — Use GitHub Actions matrix to parallelize shards:

strategy:
  fail-fast: false # Let the remaining shards finish when one shard fails
  matrix:
    shard: [1, 2, 3, 4, 5, 6, 7, 8]

steps:
  - name: Run Playwright tests
    run: npx playwright test --shard=${{ matrix.shard }}/8

Why it works

Playwright distributes work across shards without looking at runtime. The unit of distribution is a whole file by default, or an individual test when fullyParallel is enabled. A higher shard count means each shard gets fewer units, which shrinks the largest possible gap between shards. Splitting large spec files gives the scheduler smaller units to distribute. When all files have similar test counts and per-test runtimes, the distribution becomes naturally balanced without manual intervention.
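
A rough way to see why splitting helps, independent of how any particular scheduler assigns units: no shard can finish faster than its largest indivisible unit, and no assignment can beat the total work divided by the shard count. The sketch below computes that lower bound for the billing.spec.ts example, using the 30 fast files at 10 seconds each from the earlier comment. It is a bound on the best case, not a simulation of Playwright's actual assignment.

```typescript
// No assignment of indivisible units to shards can beat this bound: some
// shard must run the largest unit in full, and the total work is shared
// by all shards at best evenly.
function wallTimeLowerBound(unitSeconds: number[], shards: number): number {
  const total = unitSeconds.reduce((sum, s) => sum + s, 0);
  return Math.max(Math.max(...unitSeconds), total / shards);
}

// 30 fast files at 10 s each, as in the earlier example.
const fastFiles: number[] = new Array<number>(30).fill(10);

// billing.spec.ts as a single file: 40 tests x 3 s = 120 s.
const beforeSplit: number[] = [120, ...fastFiles];

// billing split into 15, 15 and 10 tests at 3 s each.
const afterSplit: number[] = [45, 45, 30, ...fastFiles];

console.log(wallTimeLowerBound(beforeSplit, 8)); // 120
console.log(wallTimeLowerBound(afterSplit, 8));  // 52.5
```

Before the split, the 120 second file caps how fast the pipeline can be no matter how many shards are added. After the split, the bound drops to the even share of the total work.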

Tips

Run npx playwright test --list to list every test with its file. Files with 50+ tests are candidates for splitting.
Use the HTML reporter after a full test run to review test durations and identify the 5 slowest tests. They are often fixable with better waits rather than being inherently slow.
Consider a dedicated "smoke" shard that runs a small critical subset on every PR, with the full sharded run only on merge. This shortens the feedback loop for most commits.
Playwright's --shard does not balance by duration, and without fullyParallel it only distributes whole files. This is a known limitation. If imbalance is severe, consider external test orchestration tools that support dynamic sharding.
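
Short of adopting an orchestration tool, one way to approximate duration-aware sharding is to plan the shards yourself from recorded durations and pass each shard's file list to npx playwright test as positional arguments instead of using --shard. The following is a sketch of a greedy "longest first" assignment. It is not something Playwright provides, and the file names and durations are hypothetical.

```typescript
interface ShardPlan { files: string[]; totalMs: number }

// Greedy "longest first": take files from slowest to fastest and put each
// one on the shard that currently has the least work assigned.
function planShards(durationsMs: Record<string, number>, shardCount: number): ShardPlan[] {
  const shards: ShardPlan[] = Array.from(
    { length: shardCount },
    (): ShardPlan => ({ files: [], totalMs: 0 }),
  );
  const slowestFirst = Object.entries(durationsMs).sort((a, b) => b[1] - a[1]);
  for (const [file, ms] of slowestFirst) {
    const target = shards.reduce((min, s) => (s.totalMs < min.totalMs ? s : min));
    target.files.push(file);
    target.totalMs += ms;
  }
  return shards;
}

// Hypothetical recorded durations in milliseconds.
const recorded: Record<string, number> = {
  "tests/billing.spec.ts": 120000,
  "tests/checkout.spec.ts": 90000,
  "tests/search.spec.ts": 60000,
  "tests/login.spec.ts": 30000,
  "tests/profile.spec.ts": 30000,
};

const plan = planShards(recorded, 2);
plan.forEach((shard, i) => {
  console.log(`shard ${i + 1}: ${shard.totalMs} ms -> ${shard.files.join(" ")}`);
});
```

Each CI job would then run npx playwright test followed by its own file list. The greedy approach is not guaranteed to be optimal, but it keeps the slowest files apart, and the plan only stays accurate if the recorded durations are refreshed regularly.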