SavePage Team

Integrating Screenshots into CI/CD Pipelines

ci-cd · testing · github-actions

Adding visual testing to your CI/CD pipeline catches design regressions before they reach production. With a screenshot API, you do not need to install browsers or manage rendering infrastructure in your CI environment.
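A single capture is just one HTTP request. Here is a minimal sketch in Python using the requests library; the endpoint, parameters, auth header, and the `image` response field mirror the curl call in the workflow below, but treat the exact response shape as an assumption to verify against the API documentation:

```python
import requests

API_ENDPOINT = "https://api.savepage.io/v1/"

def capture(page_url: str, api_key: str, width: int = 1440, height: int = 900) -> str:
    """Request a screenshot and return the URL of the rendered PNG."""
    resp = requests.get(
        API_ENDPOINT,
        params={"url": page_url, "width": width, "height": height, "format": "png"},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["image"]
```

No headless browser and no Chrome binary in the CI image: the rendering happens on the API side.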

GitHub Actions example

A complete workflow that captures screenshots of key pages after deployment and compares them to baseline images:

name: Visual Regression Test
on:
  pull_request:
    branches: [main]

jobs:
  visual-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Deploy to preview
        id: deploy
        run: |
          # Your deployment step here
          echo "preview_url=https://preview-${{ github.sha }}.example.com" >> $GITHUB_OUTPUT

      - name: Capture screenshots
        env:
          API_KEY: ${{ secrets.SAVEPAGE_API_KEY }}
          PREVIEW_URL: ${{ steps.deploy.outputs.preview_url }}
        run: |
          mkdir -p screenshots/current

          for page in "/" "/pricing" "/docs" "/about"; do
            filename=$(echo "$page" | sed 's/\//_/g' | sed 's/^_//')
            [ -z "$filename" ] && filename="home"

            curl -s "https://api.savepage.io/v1/?url=${PREVIEW_URL}${page}&width=1440&height=900&format=png" \
              -H "Authorization: Bearer $API_KEY" \
              | jq -r '.image' \
            | xargs curl -s -o "screenshots/current/${filename}.png"
          done

      - name: Compare with baseline
        run: |
          pip install Pillow numpy
          python scripts/compare_screenshots.py

      - name: Upload diff artifacts
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: visual-diff
          path: screenshots/diff/

The comparison script

A Python script that compares current screenshots against baseline images:

import sys
from pathlib import Path
import numpy as np
from PIL import Image

BASELINE_DIR = Path("screenshots/baseline")
CURRENT_DIR = Path("screenshots/current")
DIFF_DIR = Path("screenshots/diff")
THRESHOLD = 0.01  # 1% difference tolerance

DIFF_DIR.mkdir(parents=True, exist_ok=True)

failures = []

for current_file in CURRENT_DIR.glob("*.png"):
    baseline_file = BASELINE_DIR / current_file.name

    if not baseline_file.exists():
        print(f"NEW: {current_file.name} (no baseline)")
        continue

    current = np.array(Image.open(current_file))
    baseline = np.array(Image.open(baseline_file))

    if current.shape != baseline.shape:
        print(f"FAIL: {current_file.name} (size changed)")
        failures.append(current_file.name)
        continue

    diff = np.abs(current.astype(int) - baseline.astype(int))
    diff_ratio = np.count_nonzero(diff > 10) / diff.size

    if diff_ratio > THRESHOLD:
        print(f"FAIL: {current_file.name} ({diff_ratio:.2%} changed)")
        failures.append(current_file.name)

        # Save diff image
        diff_image = Image.fromarray(
            np.clip(diff * 5, 0, 255).astype(np.uint8)
        )
        diff_image.save(DIFF_DIR / f"diff_{current_file.name}")
    else:
        print(f"PASS: {current_file.name} ({diff_ratio:.2%} changed)")

if failures:
    print(f"\n{len(failures)} visual regression(s) detected")
    sys.exit(1)
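To sanity-check the 1% threshold, the same diff computation can be exercised on synthetic arrays; random pixels stand in for real captures here, and the noise levels are illustrative assumptions:

```python
import numpy as np

THRESHOLD = 0.01  # same tolerance as the comparison script

def diff_ratio(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of values that differ by more than 10 intensity levels."""
    diff = np.abs(a.astype(int) - b.astype(int))
    return np.count_nonzero(diff > 10) / diff.size

rng = np.random.default_rng(0)
baseline = rng.integers(0, 256, size=(900, 1440, 3), dtype=np.uint8)

# Rendering noise: nudge ~0.3% of values by up to +/-30 levels,
# roughly what anti-aliasing differences between captures look like
noisy = baseline.astype(int)
mask = rng.random(noisy.shape) < 0.003
noisy[mask] += rng.integers(-30, 31, size=int(mask.sum()))
noisy = np.clip(noisy, 0, 255).astype(np.uint8)

# A real regression: the whole layout shifts five pixels sideways
shifted = np.roll(baseline, 5, axis=1)
```

`diff_ratio(baseline, noisy)` stays well under the threshold while `diff_ratio(baseline, shifted)` blows far past it, which is exactly the separation the tolerance is meant to provide.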

Updating baselines

When visual changes are intentional (new design, new feature), update the baseline images:

# After verifying the changes look correct
cp screenshots/current/* screenshots/baseline/
git add screenshots/baseline/
git commit -m "Update visual regression baselines"

GitLab CI example

visual-test:
  stage: test
  image: python:3.11
  script:
    - pip install requests Pillow numpy
    - python scripts/capture_screenshots.py $CI_ENVIRONMENT_URL
    - python scripts/compare_screenshots.py
  artifacts:
    when: on_failure
    paths:
      - screenshots/diff/
  only:
    - merge_requests
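The capture script referenced above is not shown. A minimal sketch that mirrors the curl loop from the GitHub Actions workflow (same page list, filename rule, and API parameters) might look like this; the script layout and the SAVEPAGE_API_KEY environment variable are assumptions for illustration:

```python
import os
import sys
from pathlib import Path

import requests

API_ENDPOINT = "https://api.savepage.io/v1/"
PAGES = ["/", "/pricing", "/docs", "/about"]
OUT_DIR = Path("screenshots/current")

def page_to_filename(page: str) -> str:
    """Mirror the shell loop's naming: "/" -> "home", "/pricing" -> "pricing"."""
    name = page.replace("/", "_").lstrip("_")
    return name or "home"

def capture_all(base_url: str, api_key: str) -> None:
    """Capture each page and store the PNG under screenshots/current/."""
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    for page in PAGES:
        resp = requests.get(
            API_ENDPOINT,
            params={
                "url": base_url + page,
                "width": 1440,
                "height": 900,
                "format": "png",
            },
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=60,
        )
        resp.raise_for_status()
        # The API responds with JSON containing the rendered image URL
        image = requests.get(resp.json()["image"], timeout=60)
        image.raise_for_status()
        (OUT_DIR / f"{page_to_filename(page)}.png").write_bytes(image.content)

if __name__ == "__main__" and len(sys.argv) > 1:
    capture_all(sys.argv[1].rstrip("/"), os.environ["SAVEPAGE_API_KEY"])
```

In the GitLab job above, $CI_ENVIRONMENT_URL is passed as the base URL; the API key would come from a masked CI/CD variable.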

Best practices

Capture key pages, not every page. Focus on pages with distinct layouts: homepage, pricing, documentation, login, and the most-visited content pages. 10-20 pages provide good coverage without excessive API usage.

Use consistent viewport dimensions. Always use the same width, height, and scale factor. Inconsistent dimensions produce false positives in comparisons.

Allow a small tolerance. Font rendering and anti-aliasing can produce 0.1-0.5% pixel differences between captures. A 1% threshold avoids false positives while catching real regressions.

Store baselines in the repo. Baseline images should be committed alongside the code. This keeps them in sync with the codebase and provides a history of visual changes.

Run on pull requests. Visual tests are most useful as a PR check, before changes merge to main.