
Online data scraping: an illegal practice?

Updated: Dec 8, 2021

Written by Claire Leonelli and Claire Denoual, Avocats à la Cour

Published on 23.03.2021 - Paperjam


©Noah Sussman - Licence CC BY 2.0

The recent OpenLux case involving the Grand Duchy provides an opportunity to examine the legality of data "scraping", i.e. the automated extraction and re-use of large data sets accessible online.


It should be remembered that the OpenLux affair did not originate with a leak of documents transmitted to the press, as was the case in the "Panama Papers" or "LuxLeaks" affairs, but with the massive extraction of information from public databases, namely the Trade and Companies Register and the Register of Beneficial Owners, and their reconstitution within a gigantic database comprising some 3.3 million documents.

Before data can be exploited, however, it must first be collected. This can be done manually, which is very time-consuming when the volume of data is large, or by means of computer programs that automatically target, retrieve and store data contained on websites, a technique known as web scraping, screen scraping, web data mining or web harvesting. This was the method used in the OpenLux case.
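To make the three steps just described more concrete (targeting, retrieving and storing data), here is a minimal sketch in Python. It is an illustration only and not the tooling used in OpenLux: the HTML fragment, the company names and the identifiers are all hypothetical, and the page is an inline string rather than a real website, so nothing is downloaded.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><td>Alpha S.A.</td><td>B000001</td></tr>
  <tr><td>Beta S.A.</td><td>B000002</td></tr>
</table>
"""


class CellExtractor(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # target: each table row is one record
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self.rows:
            self.rows[-1].append(data.strip())  # retrieve: keep the cell text


def scrape(html):
    """Return the table rows found in the given HTML as lists of strings."""
    parser = CellExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def store(rows):
    """Serialise the rows as CSV text (store: here in memory, not on disk)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


records = scrape(SAMPLE_HTML)
csv_text = store(records)
```

Run in a loop over thousands of pages, a program of this kind is what turns individually accessible records into a single large database, which is precisely where the legal questions discussed in this article arise.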


However, web scraping has many other applications: analysing the prices of competing e-commerce sites, monitoring a brand's reputation, obtaining information on data made public by a person (e.g. on social networks), etc. But is it always legal?


Just because data is freely available online on websites does not mean that it is freely reusable.

A website generally contains texts, photographs, drawings, videos, etc., all of which can be individually protected by copyright, provided they are original. In principle, these elements may not be reproduced or communicated to the public without the author's permission. Only in limited cases is it possible to dispense with such prior consent (e.g. the right to quote, private copying, etc.).


A website may also contain elements that are not protected by copyright, such as contact data, advertisements, statistics or other "raw" information.


However, these data are not always free of rights: the website publisher may hold specific (sui generis) rights in them as a database producer. For this protection to apply, the obtaining, verification or presentation of the contents of the database must reflect a substantial investment, assessed qualitatively or quantitatively. If this condition is met, the producer of the database has the exclusive right to authorise or prohibit the extraction or re-utilisation of all or a substantial part of its contents. The producer may also object to the extraction or re-utilisation of insubstantial parts of the contents if this is done repeatedly and systematically.


This means, a contrario, that an isolated, non-systematic extraction of an insubstantial part of a website is not unlawful. By way of exception, databases belonging to the State and lawfully made public may in principle be reproduced, under conditions to be laid down by Grand-Ducal regulation. No such regulation has been enacted to date, but the law of 4 December 2007 on the re-use of public sector information expressly allows the bodies concerned to make the re-use of their data subject to conditions. Data scraping is therefore not illegal per se, but it must respect the intellectual property rights of third parties.


Regardless of this point, whenever the data collected and processed through web scraping are personal data, the applicable regulations, in particular the GDPR, must be respected. This means, among other things, informing the data subjects of the processing of their personal data and of its purposes, and even obtaining their prior consent where the data are used for electronic direct marketing.


Favouring data under an open licence or asking the right questions.


The most legally secure way to scrape data on a large scale is to favour public data distributed under a free and open licence, such as the data published on the Open Data Portal. The vast majority of the documents published on this portal are under a Creative Commons Zero (CC0 - public domain) licence, which means that the producer of the data has expressly waived copyright and the sui generis database right in the information, and that the information can be exploited as widely as possible.


Beyond this particular case, any web scraping process requires asking the right questions beforehand about the rules to be respected in terms of intellectual property, personal data or even competition law.



The image above is under license CC BY 2.0
