SavePage Team

Web Scraping Ethics and Legal Considerations in 2024

ethics, legal, scraping

Screenshot APIs and web scraping tools are powerful. They can also be misused. As operators of a screenshot service, we think about these issues constantly. This post covers the practical ethical and legal considerations.

Legal considerations

Web scraping legality varies by jurisdiction and use case. A few key principles have emerged from court cases and data protection law:

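Publicly accessible data is generally fair game under anti-hacking law. If a page is publicly visible without authentication, capturing a screenshot of it is similar to viewing it in a browser. In the United States, the Ninth Circuit's 2022 ruling in hiQ Labs v. LinkedIn reaffirmed that scraping publicly available data likely does not constitute access "without authorization" under the Computer Fraud and Abuse Act (CFAA). That ruling addressed a preliminary injunction, however, and hiQ was later found to have breached LinkedIn's user agreement, so contract claims remain a real risk.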

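Terms of service matter. If a website's ToS explicitly prohibits automated access, scraping that site may constitute a breach of contract. Whether such a claim is enforceable depends on the jurisdiction and on how the terms were presented: terms a user actively accepted (clickwrap) are generally easier to enforce than terms that are merely linked from a page (browsewrap).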

Copyright applies to content. A screenshot of a page contains copyrighted content (text, images, layout). Using that screenshot for purposes that fall outside fair use, which typically covers uses such as commentary, research, and news reporting, may infringe copyright.

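Personal data has additional protections. Under the GDPR, collecting personal data through scraping requires a lawful basis and compliance with data protection principles such as data minimization and purpose limitation. The CCPA and similar laws impose their own notice and consumer-rights obligations.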

Ethical guidelines

Beyond what is legal, here is what we consider ethical:

Respect robots.txt. If a site asks automated tools not to access certain paths, honor that request. It is not legally binding in most jurisdictions, but it is a clear signal from the site operator.
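Checking robots.txt before capturing a page is easy to automate. The sketch below uses Python's standard-library urllib.robotparser. The rules are parsed from an inline string so the example runs offline; a real tool would call set_url() with the site's /robots.txt URL and then read(). The user-agent name ExampleScreenshotBot and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. Against a live site you would instead use:
#   parser.set_url("https://example.com/robots.txt"); parser.read()
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

USER_AGENT = "ExampleScreenshotBot"  # hypothetical user-agent name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_capture(parser: RobotFileParser, url: str) -> bool:
    """Return True if robots.txt allows our user agent to fetch this URL."""
    return parser.can_fetch(USER_AGENT, url)


parser = build_parser(ROBOTS_TXT)
print(may_capture(parser, "https://example.com/about"))           # True
print(may_capture(parser, "https://example.com/private/report"))  # False
print(parser.crawl_delay(USER_AGENT))                              # 5
```

Because robots.txt is advisory, this check only helps if your tool actually skips URLs for which can_fetch() returns False.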

Do not overwhelm servers. Rapid-fire requests can degrade a site's performance for real users. Rate limit your requests. A few seconds between requests is a reasonable default.
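A fixed minimum gap between requests is the simplest form of rate limiting. Below is a minimal Python sketch, assuming a 3-second default as one value within "a few seconds". Throttle and capture_all are hypothetical names, and capture stands in for whatever function actually fetches or screenshots a page. The clock and sleep functions are injectable so the logic can be tested without real waits.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")

DEFAULT_INTERVAL = 3.0  # seconds; an assumed default within "a few seconds"


class Throttle:
    """Enforce a minimum gap between consecutive requests (hypothetical helper)."""

    def __init__(
        self,
        min_interval: float = DEFAULT_INTERVAL,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock  # injectable so tests need not really wait
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval seconds have passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def capture_all(urls: Iterable[str], capture: Callable[[str], T], throttle: Throttle) -> List[T]:
    """Call `capture` (your own fetch or screenshot function) on each URL, one at a time."""
    results: List[T] = []
    for url in urls:
        throttle.wait()
        results.append(capture(url))
    return results
```

Typical use would be capture_all(urls, take_screenshot, Throttle()), where take_screenshot is your own function. If the target's robots.txt declares a Crawl-delay, using the larger of that value and your default interval is a conservative choice.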

Do not capture private content. Pages behind logins, paywalls, or access controls are not meant for automated access. Even if you have legitimate credentials, automated capture may violate the service's terms.

Disclose automation. If you use screenshots in a product or publication, be transparent about how they were captured.

Do not misrepresent. Screenshots should not be modified to misrepresent what a site looks like. If you crop or annotate a screenshot, make that clear.

Common legitimate use cases

  • Visual regression testing -- Comparing screenshots of your own site before and after changes
  • Archival and research -- Capturing public web pages for academic or journalistic purposes
  • SEO monitoring -- Tracking how your site appears in search results
  • Competitive analysis -- Reviewing publicly available competitor pages
  • Documentation -- Including website screenshots in presentations or reports
  • Social media previews -- Generating link previews for sharing

How SavePage.io handles this

Our terms of service prohibit using the API to capture illegal content, circumvent access controls, or engage in phishing. We do not monitor the content of captured screenshots, but we respond to abuse reports and may suspend accounts that violate our terms.

We rate limit all API keys to prevent any single user from using the service to overwhelm a target website. The free tier's limit of 5 requests per minute is designed to be useful for legitimate purposes while being too slow for aggressive scraping.
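Staying under a quota such as 5 requests per minute means never sending more than 5 requests in any 60-second window, which averages out to one request every 12 seconds. Below is a minimal client-side sketch of a sliding-window limiter in Python. The class and its parameters are hypothetical; it illustrates the general technique and is not a description of how SavePage.io enforces limits on its side.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any rolling `window`-second period (hypothetical helper)."""

    def __init__(
        self,
        limit: int = 5,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self._clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent requests, oldest first

    def _evict(self, now: float) -> None:
        # Forget requests that have aged out of the rolling window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits within the limit, else return False."""
        now = self._clock()
        self._evict(now)
        if len(self._sent) < self.limit:
            self._sent.append(now)
            return True
        return False

    def retry_after(self) -> float:
        """Seconds until the next request would be allowed (0.0 if one is allowed now)."""
        now = self._clock()
        self._evict(now)
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])
```

A client would call try_acquire() before each API request and, when it returns False, sleep for retry_after() seconds before trying again.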

The bottom line

The technology is neutral. A screenshot API is a tool, like a camera. The ethics depend on what you point it at and what you do with the results. When in doubt, apply the test: would the site owner be surprised or harmed by what you are doing? If yes, reconsider.