Table of Contents
- Introduction
- Is It Legal to Scrape a Web Page and Use Its Data?
- Types of Data You Are Scraping
- Frequent Web Scraping Legal Issues
- Existing Legislation
- Precedent Cases
Introduction
Being as old as the internet, web scraping is today the backbone of many marketing and lead-generation strategies. But the question “Is web scraping legal?” arises more and more often, especially in connection with some high-profile cases. There is no single answer to it, and web scraping law has its gray areas. That’s why we have decided to explain the matter in more detail.
The types of data you are scraping vary widely, and though scraping is legal in most cases, your activities can be classified as illegal or fall into a gray area in particular situations.
Scraping publicly available sites is generally legal. The logic is quite clear: a visit from a scraping bot does not differ from a visit from a browser, and only open data is served in both cases. Though many site owners try to technically protect their open information from competitors’ crawlers, scraping such sites is, legally, neither theft nor illegal conduct.
The two types of data that merit concern and caution are personal data and copyrighted data.
Personally identifiable information (PII) is any data that can be used to identify a specific person. Name, date of birth, address, contact and employment information, and financial and medical details are just some of the items on the list. Personal data is a hot topic, and different jurisdictions regulate it differently. In general, it’s illegal to collect, store, and use someone’s data without that person’s consent or another legal reason for doing so.
As a rule, when scraping data from websites, we do not have the consent of the person the data belongs to, and it is difficult to establish another lawful basis for collecting it. This makes scraping personal data legally risky and often unlawful, so it’s better not to extract any personal data. Keep in mind that EU and Californian legislation is the strictest in this respect.
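The advice not to extract personal data can be sketched as a filtering step applied to each scraped record before it is stored. This is only a minimal illustration: the field names in `PII_FIELDS` and the helper `strip_pii` are hypothetical, the email pattern is deliberately simple, and a real pipeline would need a far more thorough review of what counts as personal data.

```python
import re

# Hypothetical field names treated as personal data; adjust to your own schema.
PII_FIELDS = {"name", "email", "phone", "address", "date_of_birth"}

# Deliberately simple email pattern, good enough for a sketch.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def strip_pii(record):
    """Return a copy of a scraped record without PII fields and with emails masked."""
    cleaned = {}
    for key, value in record.items():
        if key.lower() in PII_FIELDS:
            continue  # drop the whole field instead of storing it
        if isinstance(value, str):
            # mask email addresses that appear inside free text
            value = EMAIL_RE.sub("[redacted]", value)
        cleaned[key] = value
    return cleaned
```

Dropping fields before storage, rather than filtering at read time, means the personal data never lands on disk in the first place.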
Copyrighted data is owned by an individual or business that has full control over its reproduction and use. It includes articles, images, songs, databases, and more, and even though this data is openly available online, it’s illegal to use it without the consent of the owner. Therefore, while the scraping itself is not illegal in such a case, any further use of the data might be, depending on a country’s laws. Instead of replicating a piece of writing in full after you have scraped a website, you can, for instance, use snippets of the original text or provide a reference to the source of the image, table, or video you use.
Factual data, however, is not copyrightable. The names, prices, and features of products or services aren’t covered by copyright laws, and it’s legal to scrape them. Be aware of database rights, though. It is often illegal to scrape and reproduce a full database from the web, but using pieces of information without replicating the original database structure does not violate most copyright regulations.
Let’s talk a bit about the most frequent legal issues, and then proceed to a brief explanation of existing legislation and regulations in a few different jurisdictions.
“All rights reserved.” This phrase is pretty familiar to most of us. What is its relation to web scraping? The key aspect that matters is how the parsed information is used.
If you have harvested copyright-protected data, you cannot publish it or use it for commercial purposes if you want to stay within the legal framework. It’s not, for instance, forbidden to search YouTube for videos, but it’s illegal to repost them on other sites, since they are covered by copyright legislation. In general, copyright infringement involving media files can be prosecuted regardless of how the data was obtained.
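The earlier suggestion of using a snippet with a reference to the source, rather than replicating a scraped text in full, could look roughly like the sketch below. The helper `make_snippet` and the default of 40 words are assumptions made for illustration only; the word limit is not a legal threshold, and what counts as acceptable use depends on the jurisdiction.

```python
def make_snippet(text, source_url, max_words=40):
    """Return a short excerpt of `text` together with a reference to its source."""
    words = text.split()
    excerpt = " ".join(words[:max_words])
    if len(words) > max_words:
        excerpt += " [...]"  # mark that the original text continues
    return {"excerpt": excerpt, "source": source_url}
```

Keeping the source URL next to the excerpt makes it easy to attribute the material wherever the snippet is later displayed.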
Back in 1986, the Computer Fraud and Abuse Act (CFAA) was passed in the US to protect specific computers that contained military, fiscal, or other sensitive data from hacking and unauthorized access. Later on, in 1996, it was extended to protect private information. Because scraping tools typically reach only publicly available information, the CFAA generally does not apply to web crawlers that stay within public pages.
The law has nothing to do with data scraping as long as the scraping is used for harmless collection of public data.
A trespass to chattels occurs when a website is interfered with or when its server is harmed by any means. It’s easy to forget about this possible issue, since it does not look like a legal one at first glance. However, frequent web crawler requests can degrade the target site’s performance and slow down or stop its server.
If scraping disrupts the normal operation of the website, something is wrong with the crawler software, and the site owner may treat it as an intentional attack and take legal action.
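One common way to keep a crawler from overloading a target server is to enforce a minimum pause between requests to the same host. The sketch below shows the idea under stated assumptions: the class name `Throttle` and its parameters are made up for illustration, a suitable delay depends on the site, and the clock and sleep functions are injectable only so the logic can be checked without actually waiting.

```python
import time
from urllib.parse import urlparse


class Throttle:
    """Enforce a minimum delay between consecutive requests to the same host."""

    def __init__(self, min_delay_s, clock=time.monotonic, sleep=time.sleep):
        self.min_delay_s = min_delay_s
        self._clock = clock
        self._sleep = sleep
        self._last_hit = {}  # host -> time of the most recent request

    def wait(self, url):
        """Pause if the URL's host was hit too recently; return seconds slept."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_hit.get(host)
        pause = 0.0
        if last is not None:
            pause = max(0.0, self.min_delay_s - (now - last))
        if pause > 0:
            self._sleep(pause)
        # record when the upcoming request is actually sent
        self._last_hit[host] = now + pause
        return pause
```

A crawler would call `throttle.wait(url)` right before each request, for example with `Throttle(2.0)` to leave at least two seconds between hits on one host.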
As you can see from the above, it is important to know and understand the laws and regulations of the jurisdictions you are scraping in.
Let’s proceed to the existing legislation.
In May 2018, the General Data Protection Regulation (GDPR) came into effect, with a “one-stop-shop” mechanism that lets a company deal with a single lead supervisory authority. It applies to the personal data of people within the European Economic Area. However, anonymized data is not covered by the GDPR.
Though the document is over a hundred pages of legal language, just a few articles make up its key IT-related parts. The GDPR sets rules for protecting personal information when it is gathered by data controllers and passed to data processors, including those in the cloud.
Besides, there is a breach notification requirement—data authorities and consumers should be notified when there’s been a data exposure. Companies are required to specify the nature of the breach, the categories and amount of information affected, as well as the measures initiated to mitigate the breach.
What is more, the GDPR provides that any company, even one with no physical presence in the EU, is subject to the Regulation when it collects data about European data subjects.
Unlike the European Union, the US does not have a single federal privacy law, just several sector-specific federal laws covering privacy for finance (GLBA), health care (HIPAA), and children’s data (COPPA), plus consumer-oriented acts adopted by individual states.
The Privacy Act, passed by Congress in 1974, is the most significant one. It remains the first legal reference in most court cases. It confirms the right of American citizens to access, copy, and correct data held by governmental bodies. But the Privacy Act has no impact on data collected online by private companies.
At present, the internet remains a largely unregulated area where social media and tech companies practice an anything-goes approach. However, American states are starting to step in with their own laws on data privacy, and California has taken the leading position.
It’s tricky to compare the two approaches to whether web scraping is legal in 2020. While the European Union has its GDPR combined with data security laws, the US does not have any federal-level regulation for consumer data privacy in force, and only California, Maine, and Nevada have privacy laws in effect. The other American states apply only data breach notification rules.
Nevertheless, California has taken the lead and adopted the California Consumer Privacy Act (CCPA). At the moment, it’s the most comprehensive internet-focused legislative document. The CCPA became the cue for other American states to draft their own data privacy laws. It contains a long list of protected personal information identifiers: biometric data, geolocation, browsing history, employee information, email, and more.
Both the GDPR and the CCPA allow people to access their data, have it removed, and opt out of data processing at any moment. Unlike EU consumers, Californians cannot correct their inaccurate personal information. The EU Regulation requires user consent, while the CCPA only asks for a privacy notice on websites.
The New York Privacy Act is currently on hold, yet it bears the hallmarks of the CCPA and gives a user the right to correct, delete, and request PII, coming close to the European GDPR. The Act is quite strict, providing a right of action for any violation against all companies that have divisions in the US and are therefore subject to it.
eBay v. Bidder’s Edge was the first web scraping-related case heard in a US court. On December 10, 1999, Bidder’s Edge made more than 100,000 requests to the eBay site without due authorization. eBay argued that this traffic harmed its computer systems.
Though the dispute between the companies was settled out of court in 2001, for an undisclosed sum and an agreement that Bidder’s Edge would not access eBay’s data, the case became the first legal precedent.
In 2012, the Federal Trade Commission (FTC) finalized a settlement with Facebook over charges that it violated the privacy of the user data it collects. In its applications and privacy notices, Facebook assured users that it would not sell anybody’s personal information and claimed that users could limit access to their information. However, the FTC accused the network of deceiving people about how their data was handled, and the social media giant was required to revamp its data privacy practices. When the FTC later found that Facebook had violated that 2012 order, the company was fined US$5 billion in 2019, a good lesson about the cost of violating privacy regulations.
After several years of tolerance, LinkedIn sent a cease-and-desist letter to hiQ Labs, a data analysis company. hiQ automatically collected information from open LinkedIn profiles and used it to advise employers whose workers had posted resumes on the site. After receiving the letter, hiQ Labs sued LinkedIn. Both the court of first instance and, in late 2019, the Ninth Circuit Court of Appeals determined that the CFAA does not apply to data available to the general public, and so prohibited LinkedIn from interfering with hiQ’s web scraping. This case became a historic moment that effectively declared the scraping of public LinkedIn profiles legal, and it has fundamentally changed the balance of power in legal cases over data regulation. However, unlimited use of the scraped data for commercial purposes is still prohibited. It’s worth mentioning that LinkedIn continues to petition the Supreme Court to review the decision, while hiQ files oppositions. We are going to keep an eye on this confrontation in the future.