API Rate Limiting Best Practices for Screenshot Services
Rate limiting protects API services from abuse and ensures fair access for all users. For a screenshot API, this is especially important because each request consumes significant server resources: a browser instance, CPU time, memory, and storage.
Why rate limiting matters
Without rate limiting, a single user could consume all available rendering capacity, leaving other users unable to take screenshots. Rate limiting enforces boundaries that keep the service healthy for everyone.
The goals are:
- Prevent any single user from monopolizing resources
- Protect against accidental loops (a bug that sends thousands of requests)
- Enforce plan-level quotas (free vs. paid tiers)
- Maintain consistent response times
Common algorithms
Token bucket
The token bucket algorithm adds tokens to a bucket at a fixed rate, up to a maximum capacity (the bucket size). Each request consumes one token. If the bucket is empty, the request is rejected. This allows short bursts of traffic while enforcing an average rate.
For example, a rate of 60 requests per minute with a bucket size of 10 means a user can send 10 rapid requests, but must then wait for tokens to refill at one per second.
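The refill-and-consume logic described above can be sketched in a few lines. This is a minimal single-process, in-memory illustration, not a production implementation; the class name `TokenBucket` and the injectable `clock` parameter are assumptions made for the example.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.clock = clock              # injectable so the logic can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def allow(self):
        """Consume one token and return True, or return False if empty."""
        now = self.clock()
        # Credit the tokens earned since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# 60 requests per minute (1 token per second) with a burst of 10.
bucket = TokenBucket(rate=1.0, capacity=10)
```

Refilling lazily on each call avoids a background timer: the bucket only needs the time of the last update to work out how many tokens have accumulated.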
Sliding window
The sliding window algorithm counts requests in a rolling time window. Unlike fixed windows, which can admit up to twice the limit in a short span straddling a window boundary, a sliding window enforces the limit over every interval of the window's length.
For a 60 requests/minute limit, the algorithm counts all requests in the past 60 seconds. If the count is at or above 60, new requests are rejected.
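One way to implement this count is a sliding window log that stores the timestamp of each accepted request. The sketch below assumes a single process and keeps everything in memory; `SlidingWindowLimiter` and its `clock` parameter are illustrative names, not part of any particular library.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Sliding window log: at most `limit` requests per rolling window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.timestamps = deque()   # accepted request times, oldest first

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the rolling window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        # Reject when the count is at or above the limit.
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True


# 60 requests in any rolling 60-second window.
limiter = SlidingWindowLimiter(limit=60, window_seconds=60)
```

The log costs memory proportional to the limit for each client, which is the usual trade-off against the simpler counter-based approaches.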
Fixed window counters
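The simplest approach: count requests in fixed time periods (per minute, per hour, per day) and reset the counter at the start of each period. It is simple to implement but allows bursts at window boundaries: a client can use a full quota at the end of one period and another full quota at the start of the next.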
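A fixed window counter needs only one integer per client and period. The following sketch is an in-memory illustration under that assumption; the class name `FixedWindowLimiter` and the `clock` parameter are hypothetical.

```python
import time


class FixedWindowLimiter:
    """Fixed window counter: at most `limit` requests per aligned period."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.current_window = None   # index of the period being counted
        self.count = 0

    def allow(self):
        # Periods are aligned to multiples of the window length.
        window_index = int(self.clock() // self.window)
        if window_index != self.current_window:
            # A new period has started: reset the counter.
            self.current_window = window_index
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


# 5 requests per calendar-aligned minute.
limiter = FixedWindowLimiter(limit=5, window_seconds=60)
```

With this limiter, five requests at second 59 and five more at second 60 are all accepted, which is the boundary burst this approach is known for.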
Response headers
Good rate limiting is transparent: the caller should always know their current status. These headers are a widely used convention rather than a formal standard:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1680000000
- X-RateLimit-Limit -- Maximum requests in the current window
- X-RateLimit-Remaining -- Requests remaining in the current window
- X-RateLimit-Reset -- Unix timestamp when the window resets
When a request is rejected, return HTTP 429 with a Retry-After header:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1680000000
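On the server side, these headers can be built from the limiter's state. The helper below is a sketch: the function name `rate_limit_headers` and its parameters are assumptions for illustration, and it assumes the client may retry as soon as the window resets.

```python
import math


def rate_limit_headers(limit, remaining, reset_epoch, now_epoch):
    """Build rate limit headers; add Retry-After once the quota is used up."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if remaining <= 0:
        # Seconds until the window resets, rounded up, and at least 1.
        wait = max(1, math.ceil(reset_epoch - now_epoch))
        headers["Retry-After"] = str(wait)
    return headers


# A rejected request 30 seconds before the window resets.
headers = rate_limit_headers(60, 0, 1680000000, 1679999970)
```

Deriving Retry-After from the same reset time keeps the two values consistent, so a client that reads either header waits the same amount of time.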
Multi-tier rate limiting
Screenshot APIs typically have multiple tiers:
| Tier | Monthly | Per Minute | Per Second |
|------|---------|------------|------------|
| Free | 100 | 5 | 1 |
| Pro | 10,000 | 60 | 5 |
| Enterprise | Custom | Custom | Custom |
Each tier has its own limits at different time scales. A Pro user might be within their monthly quota but exceed their per-minute limit during a burst.
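Checking several time scales at once can be sketched as below. The limits are copied from the tier table (Enterprise is left out because its limits are custom); the class `TieredLimiter`, the rolling-window counting, and the manually tracked monthly counter are assumptions made for the example.

```python
import time
from collections import deque

# Limits from the tier table; Enterprise is omitted because it is custom.
TIER_LIMITS = {
    "free": {"per_second": 1, "per_minute": 5, "monthly": 100},
    "pro": {"per_second": 5, "per_minute": 60, "monthly": 10_000},
}
WINDOW_SECONDS = {"per_second": 1, "per_minute": 60}


class TieredLimiter:
    """Checks one client against every limit of its tier."""

    def __init__(self, tier, clock=time.monotonic):
        self.limits = TIER_LIMITS[tier]
        self.clock = clock
        self.recent = deque()    # accepted request times from the last minute
        self.monthly_used = 0    # assumed to be reset elsewhere each month

    def check(self):
        """Return None if allowed, else the name of the limit that was hit."""
        now = self.clock()
        # Forget requests older than the longest rolling window (60 s).
        while self.recent and self.recent[0] <= now - 60:
            self.recent.popleft()
        if self.monthly_used >= self.limits["monthly"]:
            return "monthly"
        for name, window in WINDOW_SECONDS.items():
            in_window = sum(1 for t in self.recent if t > now - window)
            if in_window >= self.limits[name]:
                return name
        # Every limit passed: record the request against all of them.
        self.recent.append(now)
        self.monthly_used += 1
        return None
```

Returning the name of the limit that was hit lets the server report which quota the client exceeded, which matters because the right waiting time differs widely between a per-second and a monthly limit.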
Client-side handling
API consumers should handle rate limits gracefully:
    import time

    import requests


    def capture_with_retry(url, max_retries=3):
        """Request a screenshot, waiting out HTTP 429 responses."""
        for attempt in range(max_retries):
            response = requests.get(
                "https://api.savepage.io/v1/",
                params={"url": url},
                headers={"Authorization": "Bearer YOUR_KEY"},
                timeout=30,  # avoid hanging forever on a stalled connection
            )
            if response.status_code == 429:
                # Retry-After may also be an HTTP date; fall back to 30 s then.
                try:
                    retry_after = int(response.headers.get("Retry-After", 30))
                except ValueError:
                    retry_after = 30
                time.sleep(retry_after)
                continue
            response.raise_for_status()
            return response.json()
        raise RuntimeError("Rate limit exceeded after retries")
The key practices for clients:
- Read the Retry-After header and wait that long before retrying
- Implement exponential backoff for repeated failures
- Track X-RateLimit-Remaining to proactively slow down before hitting limits
- Queue requests and process them at a controlled rate
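The waiting logic behind these practices can be kept in two small pure functions, sketched below. The names `backoff_delay` and `pacing_delay`, the default base and cap values, and the choice of randomized ("full jitter") backoff are assumptions for illustration rather than anything the API prescribes.

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0,
                  rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)       # the server's hint takes priority
    # Exponential growth: base, 2*base, 4*base, ... limited by `cap`.
    ceiling = min(cap, base * 2 ** attempt)
    # Randomize so many clients do not retry at the same moment.
    return ceiling * rng()


def pacing_delay(remaining, reset_epoch, now_epoch):
    """Spread the remaining quota evenly over the rest of the window."""
    seconds_left = max(0.0, reset_epoch - now_epoch)
    if remaining <= 0:
        return seconds_left             # nothing left: wait for the reset
    return seconds_left / remaining
```

A client would sleep for `backoff_delay(...)` after a failed attempt and for `pacing_delay(...)` between successful requests, using the values from the most recent response headers. Keeping the calculations free of network calls makes them easy to test on their own.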