·SavePage Team

Website Archival with Automated Screenshots

archivalautomationresearch

Websites change. Pages are updated, redesigned, or removed entirely. For journalism, legal compliance, academic research, and historical documentation, preserving a visual record of web pages at specific points in time has real value.

Why screenshots for archival

Text-based web archives (like the Wayback Machine) preserve the HTML and assets of a page. This is thorough but can miss visual context: the layout, the typography, the advertising, the overall design language of a specific era.

Screenshots capture the page as it appeared to a user. The visual record includes everything: the layout, the colors, the advertising, the cookie banners, and the overall design. Combined with metadata (URL, timestamp, viewport), screenshots create a visual timeline.

Building a visual archive

A basic archival system captures screenshots at regular intervals and stores them with metadata.

import requests
import json
from datetime import datetime

API_KEY = "YOUR_API_KEY"

URLS = [
    "https://news.example.com",
    "https://competitor.example.com",
    "https://policy.gov.example.com/guidelines",
]

archive = []

for url in URLS:
    response = requests.get(
        "https://api.savepage.io/v1/",
        params={
            "url": url,
            "width": 1440,
            "height": 900,
            "fullpage": "true",
            "format": "png",
        },
        headers={"Authorization": f"Bearer {API_KEY}"},
    )

    data = response.json()
    record = {
        "url": url,
        "image": data["image"],
        "captured_at": datetime.utcnow().isoformat(),
        "width": data["width"],
        "height": data["height"],
        "size": data["size"],
    }
    archive.append(record)

with open("archive.json", "w") as f:
    json.dump(archive, f, indent=2)

Use cases

Journalism. Capture the state of a website before publishing a story about it. If the site changes after publication, the screenshot serves as evidence of what it looked like at the time of reporting.

Legal compliance. Regulated industries need to document their public communications. Automated screenshots of company websites, social media pages, and advertising ensure compliance records are complete.

Competitive intelligence. Track how competitors change their pricing, messaging, and product offerings over time. Monthly screenshots of key pages create a visual history.

Academic research. Web studies benefit from consistent, timestamped visual records. Screenshots standardize the capture process and ensure reproducibility.

Brand monitoring. Verify that partners, resellers, and affiliates display your brand correctly. Automated screenshots flag unauthorized modifications.

Storage considerations

Full-page PNG screenshots can be large (2-8 MB each). For a daily archive of 100 pages, that is 200-800 MB per day. Over a year, storage adds up.

Strategies to manage storage:

  • Use JPEG at quality 80 for archival (60-70% size reduction)
  • Compress and deduplicate (if a page has not changed, do not store a new copy)
  • Tier storage: recent screenshots on fast storage, older ones on cold storage
  • Use thumbnail versions for browsing, full resolution for detailed inspection

Building a timeline view

With timestamped screenshots of the same URL, you can build a timeline that shows how a page evolved. A simple viewer shows screenshots side by side or in sequence, allowing users to scrub through time.

This is useful for tracking gradual changes (A/B testing evolution, design iterations) and sudden changes (crisis response, policy updates, rebranding).