SavePage Team

API Rate Limiting Best Practices for Screenshot Services

api · engineering · rate-limiting

Rate limiting protects API services from abuse and ensures fair access for all users. For a screenshot API, this is especially important because each request consumes significant server resources: a browser instance, CPU time, memory, and storage.

Why rate limiting matters

Without rate limiting, a single user could consume all available rendering capacity, leaving other users unable to take screenshots. Rate limiting enforces boundaries that keep the service healthy for everyone.

The goals are:

  1. Prevent any single user from monopolizing resources
  2. Protect against accidental loops (a bug that sends thousands of requests)
  3. Enforce plan-level quotas (free vs. paid tiers)
  4. Maintain consistent response times

Common algorithms

Token bucket

The token bucket algorithm adds tokens to a bucket at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected. This allows short bursts of traffic while enforcing an average rate.

For example, a rate of 60 requests per minute with a bucket size of 10 means a user can send 10 rapid requests, but then must wait for tokens to refill.
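A minimal in-memory sketch of that refill-and-consume logic (class and parameter names are illustrative, not part of any particular library):

```python
import time

class TokenBucket:
    """Token bucket: holds at most `capacity` tokens, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start full, so bursts are allowed immediately
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# 60 requests/minute = 1 token per second, with a burst allowance of 10.
bucket = TokenBucket(rate_per_sec=1.0, capacity=10)
```

Starting with a full bucket is what permits the initial burst; after those 10 rapid requests, throughput settles to the refill rate.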

Sliding window

The sliding window algorithm counts requests in a rolling time window. Unlike fixed windows (which can allow 2x the rate at window boundaries), sliding windows provide a smoother rate limit.

For a 60 requests/minute limit, the algorithm counts all requests in the past 60 seconds. If the count is at or above 60, new requests are rejected.
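One way to implement this exactly is a sliding window log: keep a timestamp per request and count only those still inside the window. This sketch assumes in-memory state for a single process (production services typically keep these counts in Redis or similar):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Allow at most `limit` requests in any rolling `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # monotonic timestamps of accepted requests

    def allow(self):
        now = time.monotonic()
        # Evict timestamps that have fallen out of the rolling window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

The log is exact but costs memory per tracked request; a common compromise is the sliding window *counter*, which interpolates between two fixed-window counts.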

Fixed window counters

The simplest approach: count requests in fixed time periods (per minute, per hour, per day). Reset the counter at the start of each period. Simple to implement but allows bursts at window boundaries.
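The fixed-window counter fits in a few lines, which is why it is often the first thing teams ship (again an in-memory sketch with illustrative names):

```python
import time

class FixedWindowCounter:
    """Count requests per fixed `window`; the counter resets at each boundary."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # A new window has begun: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

The boundary problem follows directly from the reset: a client can spend its full limit at the end of one window and again at the start of the next, briefly doubling the effective rate.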

Response headers

Good rate limiting is transparent: the caller should always know their current status. The de facto standard headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1680000000
  • X-RateLimit-Limit -- Maximum requests in the current window
  • X-RateLimit-Remaining -- Requests remaining in the current window
  • X-RateLimit-Reset -- Unix timestamp when the window resets

When a request is rejected, return HTTP 429 with a Retry-After header:

HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1680000000
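On the server side, building these headers can be centralized in one helper so every response, accepted or rejected, reports the same fields. A framework-agnostic sketch (the helper name and signature are hypothetical):

```python
def rate_limit_headers(limit, remaining, reset_ts, retry_after=None):
    """Build the rate-limit response headers; add Retry-After only on rejection."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_ts),  # Unix timestamp of the window reset
    }
    if retry_after is not None:
        headers["Retry-After"] = str(retry_after)  # seconds the client should wait
    return headers
```

A 429 response would then carry `rate_limit_headers(60, 0, 1680000000, retry_after=30)`, while a successful one omits `Retry-After`.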

Multi-tier rate limiting

Screenshot APIs typically have multiple tiers:

| Tier       | Monthly | Per Minute | Per Second |
|------------|---------|------------|------------|
| Free       | 100     | 5          | 1          |
| Pro        | 10,000  | 60         | 5          |
| Enterprise | Custom  | Custom     | Custom     |

Each tier has its own limits at different time scales. A Pro user might be within their monthly quota but exceed their per-minute limit during a burst.
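Enforcing this means checking every time scale and rejecting on the first one that is exhausted. A sketch using the table above (the tier structure and `check_limits` helper are illustrative; real counters would live in shared storage):

```python
# Per-tier limits at each time scale, mirroring the plan table.
TIER_LIMITS = {
    "free": {"per_second": 1, "per_minute": 5, "per_month": 100},
    "pro":  {"per_second": 5, "per_minute": 60, "per_month": 10_000},
}

def check_limits(tier, usage):
    """Return the first exceeded time scale, or None if the request may proceed."""
    for scale, limit in TIER_LIMITS[tier].items():
        if usage.get(scale, 0) >= limit:
            return scale
    return None
```

So a Pro user with `{"per_minute": 60, "per_month": 500}` is rejected at the per-minute scale even though their monthly quota is barely touched.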

Client-side handling

API consumers should handle rate limits gracefully:

import time
import requests

def capture_with_retry(url, max_retries=3):
    """Request a screenshot, honoring Retry-After when rate limited."""
    for attempt in range(max_retries):
        response = requests.get(
            "https://api.savepage.io/v1/",
            params={"url": url},
            headers={"Authorization": "Bearer YOUR_KEY"},
        )

        if response.status_code == 429:
            # Wait as long as the server asks before retrying.
            retry_after = int(response.headers.get("Retry-After", 30))
            time.sleep(retry_after)
            continue

        response.raise_for_status()
        return response.json()

    raise RuntimeError("Rate limit exceeded after retries")

The key practices for clients:

  1. Read the Retry-After header and wait that long before retrying
  2. Implement exponential backoff for repeated failures
  3. Track X-RateLimit-Remaining to proactively slow down before hitting limits
  4. Queue requests and process them at a controlled rate
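Point 2 deserves a concrete shape: on repeated failures, double the wait each attempt and add jitter so many clients do not retry in lockstep. A small sketch of that schedule (function name and defaults are illustrative):

```python
import random

def backoff_delays(base=1.0, cap=60.0, retries=5):
    """Exponential backoff with full jitter: each delay is uniform in
    [0, min(cap, base * 2**attempt)]."""
    delays = []
    for attempt in range(retries):
        delays.append(random.uniform(0, min(cap, base * 2 ** attempt)))
    return delays
```

These delays would be slept between attempts in a retry loop; when a `Retry-After` header is present, it should take precedence over the computed backoff.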