Recommendations to the European Commission
Our online environment today suffers from enormous information asymmetry: online platforms assemble information about us, while we know little about them. And while they share data with commercial third parties, the researchers who would hold them accountable to society or monitor societal concerns have had limited data access at best, and at worst, have faced technical barriers and legal threats. The voluntary data sharing between platforms and the research community has been fraught and fragile; research projects and essential watchdog efforts can crumble at a company’s whim.
The EU’s Digital Services Act (DSA) hopes to change this. In April, the European Commission named the first “Very Large Online Platforms and Search Engines”, those reaching over 45 million users in the European Union. These platforms will be required to share publicly accessible data with researchers for the purpose of understanding “systemic risks” in the European Union, such as negative effects on data protection and privacy, electoral processes, hate speech, or public health. This dovetails with the Strengthened Code of Practice on Disinformation, in which major signatories, including platforms like Facebook, YouTube, and TikTok, commit to sharing public data with researchers. But even as the European Commission prepares to implement the legislation, companies are taking steps in the opposite direction, moving to restrict public data sharing or deliver it on unworkable terms.
Historically, the terms of sharing data with the public have been set by the companies, not by their users or the public interest research community. Much of the data that is made publicly available by platforms is designed to serve advertisers and marketers. Meanwhile, data access for researchers is often depicted as an unacceptable risk to user privacy. But privacy-protecting research is itself necessary to understand and address harmful data practices and abuse of personal data. Similarly, data access is needed to protect consumers by allowing for scrutiny of a company’s practices beyond their promises. And data access forms the bedrock of evidence gathering for enforcement action. In other words, data access is not just in the interest of the research community - it is central to accountability.
We, the undersigned civil society organisations and independent public interest researchers, make the following recommendations to the European Commission and the designated platforms directly as they move to implement Article 40, paragraph 12, of the Digital Services Act.
1. Public data should be complete, comprehensive, and include historical data
- Even the best practices in data sharing are extremely limited, for instance omitting key metrics related to platform functionalities (as with Facebook’s “Reels”) and mitigations (such as labels or fact-checks applied to content), or skewing towards certain languages or countries. Often, shared public data proves incomplete or inaccurate (e.g. scattershot or poorly labelled ad archives).
- Publicly accessible data should be understood to include metadata and data that could have been captured historically over time. The term "publicly accessible", like the term "manifestly made public", suggests information that is accessible to any member of the public without their having to create an account on the service. For the public interest purposes of Article 40.12, this term should also include any information that would be accessible to a user of the platform with an account on the service, since these data are essential for monitoring and comparison.
- Regulators and researchers need to know what is in fact “publicly accessible in their online interface.” This changes constantly, and keeping track of these changes as an external observer is impossible. Platforms should therefore be required to share a taxonomy of their publicly accessible data with researchers.
2. Data must be usable, accessible and verifiable, for which multiple access methods are needed
- Usability is a critical factor. Platforms must deliver data in a way that has real-world impact, meets researchers' needs, and fulfils the spirit of the data access and scrutiny obligation.
- API (Application Programming Interface) access is a minimum viable method for permissioned access to public data, but multiple methods of access are needed. No single mode of permitted access should preclude the use of other research methods necessary to verify the integrity of the public data that platforms share formally, for instance automated data collection (scraping), data donation or the use of unofficial APIs.
- Platforms should also provide visual interfaces themselves to facilitate research and cross-platform analysis and to empower a more general public interest research audience.
3. Permissioned access must come on fair and reasonable terms
- Permissioned access should be free or at a nominal cost. Higher costs risk discouraging the use of the DSA’s data access provisions and perpetuating inequity among less well resourced research organisations.
- Restrictions and mitigation measures to address privacy concerns, for instance those related to combining data and sharing data with research partners, should be modelled on the GDPR.
- Researcher access requests under Article 40.12 should be approved on a researcher or research organisation basis and not project by project. Access to continuous, real-time data is necessary for exploratory research, and repeated vetting should not be applied to particular project-by-project research questions.
- A standardised data access request process should be considered for all VLOP/SEs.
- Approved researchers should receive sustained access for a minimum of three years along with a streamlined, expedited renewal process.
- Provision of access must be timely and transparently communicated as any delay could compromise the research.
- The process for approving access requests must be transparent and must allow researchers to appeal to an independent third party, such as the independent advisory body foreseen by the DSA Article 40.13.
4. Platforms must not hinder independent, public interest research
- Many platforms actively hinder research through their terms of service, through technical measures, or through intimidation and threats of legal action. In particular, platform efforts to prevent scraping have had a chilling effect on researchers, even though platforms may ignore the same data-gathering techniques when they are used by marketers or “social listening” tools.
- The DSA Article 40.12 should be understood as a safe harbour for research addressing systemic risks and should at a minimum require non-interference with public interest, GDPR-compliant research.
5. Data sharing should include a diversity of researchers
- It should be clarified that researchers physically located outside of the EU will have access to data needed to conduct research related to systemic risks in the European Union. The current lack of clarity could also make commonplace international research collaboration difficult by limiting the potential research partners of EU-based researchers.
- Access to public data should also be possible for journalists, who have historically played a role in holding these very companies to account for these very concerns and have made impressive use of platforms’ publicly accessible data as part of their watchdog function.
- We understand well the responsibility accompanying privileged data access, in particular in relation to security and privacy. The past years have seen a flourishing of efforts to align on best practices to protect privacy and mitigate risks. While working with this data will never be zero risk, the implementation of the Digital Services Act provides a much-needed, structured avenue to further align on best practices as a public interest research community.
AMO Association for International Affairs
Institute for Strategic Dialogue (ISD)
Democracy Reporting International
Check First oy
Stiftung Neue Verantwortung (SNV)
The Institute for Data, Democracy & Politics, George Washington University
Algorithmic Transparency Institute / National Conference on Citizenship
The Coalition for Independent Technology Research
The Forum on Information and Democracy
Brandon Silverman, Former CEO & Co-Founder of CrowdTangle
Louis Barclay, Founder of Unfollow Everything
Transparency International EU
BEUC - The European Consumer Organisation
Friends of the Earth
The Union of Concerned Scientists (UCS)
Irish Council for Civil Liberties
European Center for Not-for-Profit Law Stichting (ECNL)
KaskoSan Roma Charity
Estonian Human Rights Centre
The Centre for Research on Multinational Corporations (SOMO)
Open Markets Institute
Stichting the London Story
Center for Countering Digital Hate (CCDH)
The Green Web Foundation
Global Forum for Media Development (GFMD)
Reporters Sans Frontières (RSF)
Counter Extremism Project (CEP)
Corporate Europe Observatory
The European Fact-Checking Standards Network (EFCSN)
Prague Security Studies Institute (PSSI)
Who Targets Me
Danes je nov dan
Aufstehn – Verein zur Förderung zivilgesellschaftlicher Partizipation (#Aufstehn)
7amleh - The Arab Center for the Advancement of Social Media
European Federation of Journalists (EFJ)
The Good Lobby Italy
International Network Against Cyber Hate (INACH)
Latvian Centre for Human Rights
Greek Helsinki Monitor
Multi Kulti Collective
Women in AI Austria
European Centre for Press and Media Freedom (ECPMF)
Civil Liberties Union for Europe (Liberties)
If you would like to sign or support this letter, you can contact [email protected]
This is an updated version of the open letter. The previous letter was published on May 31, 2023 and was a response to the European Commission’s call for evidence for a Delegated Regulation on data access.