Personal Data Leaks and Web Data Scraping | Karageorgiou & Associates

On April 4, 2021, a database containing records of 533 million Facebook users from 106 countries was leaked online, on several Telegram channels and hacker forums. According to recent reports from the Hellenic Data Protection Authority, the records of more than 600,000 Greek Facebook users were leaked. The leaked database mainly contains users’ full names, gender, location, phone numbers, e-mail addresses, marital status, service provider, Facebook profile UIDs, Facebook profile links, education and occupation details, and other information from their “About me” section.

This latest Facebook database leak was not the result of hacking the Facebook site, nor were Facebook’s servers breached to steal users’ data. Instead, it was the result of web data scraping. Data scraping is a technique in which a bot extracts data from a website and organizes it into structured files. The bot may also simulate a visit to a website and perform actions such as logging into an account. Compared to manual data collection, data scraping is a fast, powerful and low-cost way to collect data from websites. It is also a fairly common way to extract users’ personal information from websites, especially social networks. Data compiled through scraping is usually then sold to third-party companies, to be used for purposes such as advertising, market research, or even fraudulent activities.
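
To make the "extract and organize" step concrete, the following is a minimal sketch in Python, using only the standard library. It does not contact any website: the sample page, the `profile`, `name` and `city` class names, and the `ProfileParser` and `scrape_to_csv` helpers are all assumptions made up for this illustration, and real scrapers are typically far more elaborate.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup invented for this illustration; no real site is contacted.
SAMPLE_PAGE = """
<ul>
  <li class="profile"><span class="name">Alice Example</span><span class="city">Athens</span></li>
  <li class="profile"><span class="name">Bob Example</span><span class="city">Patras</span></li>
</ul>
"""


class ProfileParser(HTMLParser):
    """Collects the text of the name and city spans for each profile item."""

    FIELDS = ("name", "city")

    def __init__(self):
        super().__init__()
        self.records = []            # one dict per <li class="profile">
        self._current_field = None   # field whose text is being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "profile":
            self.records.append({})
        elif tag == "span" and css_class in self.FIELDS and self.records:
            self._current_field = css_class

    def handle_data(self, data):
        # Only text inside a recognized span is stored.
        if self._current_field is not None:
            self.records[-1][self._current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._current_field = None


def scrape_to_csv(html):
    """Parse the page and return (records, CSV text)."""
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=ProfileParser.FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(parser.records)
    return parser.records, out.getvalue()


records, csv_text = scrape_to_csv(SAMPLE_PAGE)
print(csv_text)
```

The point of the sketch is only that page markup meant for human readers can be turned into rows of a structured file with very little effort, which is why scraped personal data is so easy to compile and trade.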

Although data scraping is not an illegal practice per se, several legal issues may arise from it. First, it may be unlawful in certain jurisdictions, for example where there is no legal basis for processing personal data under the GDPR, or where unauthorized access violates the U.S. Computer Fraud and Abuse Act. Secondly, even if the scraping itself is lawful, processing, using or selling data obtained from websites without permission may be considered copyright infringement. Thirdly, websites’ Terms of Service and Privacy Policies commonly restrict the collection and processing of data included in the website. In this case, the data scraper may be violating such terms, giving rise to claims from the website’s owner. Finally, bots usually send a large number of requests to the website, causing heavy load, slowdowns and malfunctions, and sometimes even taking the website offline for a period of time. In that case, the scraper may be liable for damage or loss.

So, what does this mean for the recent Facebook data leak described above? The most common argument in favor of data scraping on social media is that the collected data is publicly available. However, Facebook’s policies state that data collection through automated means, such as bots or scrapers, as well as the use of such data, is not permitted without Facebook’s express written consent. Furthermore, automated processing of Facebook users’ personal data, including profiling, especially where there is no legal basis for such processing, is a violation of the GDPR. In that context, Facebook is expected to take legal action against scrapers, as it has done in the past.

Nevertheless, the victims of data leaks are almost always unsuspecting social network users, who may be targeted by telephone scams, phishing, smishing attacks, or even identity theft carried out using publicly available photos or other information shared on their profiles. The best defense against data leaks is prevention. Websites should consider using anti-bot protection software and effective measures to prevent unauthorized access, and website users should be careful when sharing their personal data on social networks.
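
As one illustration of what anti-bot protection can involve, the sketch below shows a simple per-client request limiter in Python. It is a hypothetical example, not a description of any particular product: the `SlidingWindowLimiter` class, the limit of three requests per ten seconds, and the client address (taken from a range reserved for documentation) are all assumptions, and commercial anti-bot tools generally combine many more signals.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most max_requests per client within the last window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of allowed requests

    def allow(self, client_id, now):
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many recent requests: likely automated traffic
        hits.append(now)
        return True


# Hypothetical traffic: five requests from one client, with timestamps in seconds.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
decisions = [limiter.allow("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0, 12.5)]
print(decisions)
```

In this run the fourth request is refused because three requests were already accepted within the window, while the fifth is accepted again once the earlier ones have expired. A limiter of this kind slows down bulk collection, but on its own it does not stop a scraper that spreads its requests across many addresses.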

The editorial team