The EU’s Digital Services Act mandates greater researcher data access from platforms; Mozilla and the National Conference on Citizenship investigated 19 platforms’ responses

(BRUSSELS, BELGIUM | AUGUST 8, 2024) – Tech platforms’ efforts at greater transparency, many of them made in the context of the EU’s Digital Services Act (DSA), vary widely, according to new research by Mozilla. The findings follow Mozilla’s recent, narrower investigation into platforms’ ad transparency libraries. This latest assessment offers a snapshot of each platform’s offerings as of May 2024, recognizing that many programs are new and evolving.

The research, titled “Public Data Access Programs - A First Look”, is the first initiative to systematically evaluate the data access programs of 19 major platforms based on the needs of the public interest research community. It includes a detailed scorecard ranking each platform, informed by 47 criteria grouped into five categories: Quality, Ease of Use, Accessibility, Terms of Use, and Privacy and Security.

“Data access for researchers, journalists, and NGOs is critical to ensuring that threats to public security and civic discourse are identified,” says Claire Pershan, EU Advocacy Lead at Mozilla. “These platforms must be held accountable, and greater transparency is an essential first step.”

Cameron Hickey, CEO of the National Conference on Citizenship, says: “We have witnessed a significant effort to increase researcher data access by some platforms, but there is still a long way to go before researchers actually have the access they need. Platforms already engaged in transparency efforts can do more to expand the kinds of data they provide, and the platforms just getting started can make a more concerted effort to move beyond just taking email requests for data.”

These findings come amid a historic election year, with dozens of democratic contests happening globally and billions of people eligible to vote.

The 19 platforms investigated vary greatly in how they provide data access and transparency. For example:

  • Five platforms offer formal programs to researchers: Facebook/Instagram (Meta), TikTok, AliExpress, LinkedIn, and Google Search.
  • Five platforms make existing APIs and access programs available to researchers: Bing, Google Maps, Wikipedia, X (Twitter), and YouTube.
  • Several platforms grant researchers permission to scrape, either explicitly or implicitly: Alphabet (Google Play, Shopping, Search, and YouTube) as well as Booking.com, Amazon Store, and Pinterest.
  • Three platforms enable researchers to request data: LinkedIn, Pinterest, and Snapchat.

In addition to scoring platforms against this set of criteria, the authors provide several concrete recommendations for improvement. This research complements Mozilla’s other work around the 2024 elections, such as examining platforms’ election integrity policies and grading synthetic content detection techniques.

Key Findings and Recommendations

Public data access programs are new and often lack visibility and documentation. Many of these large tech platforms have only recently introduced data access programs. Few researchers have gained access to them, and fewer still have systematically probed the data to understand its benefits and limitations. Few programs offer researchers direct technical support, and documentation for new programs is limited. Further, many platforms’ programs proved challenging to find. Better visibility for these programs, as well as additional documentation about what they offer, would benefit researchers.

Regulators, platforms, and researchers lack a shared definition of “public data.” Platforms say their programs make “publicly accessible data” available to researchers. However, since this term is not clearly defined, there is no universal standard. A more formal definition of “public data” would enable a better understanding of whether or not platforms are doing as they say.

Platforms could improve the quality of data they offer. Platforms do not provide information about how public data changes over time, such as engagement and account growth statistics. Meta has previously offered this through CrowdTangle, set to be terminated on August 14th, but this is not currently replicated in Meta’s replacement tool, the Meta Content Library. Platforms also do not generally make clear guarantees about the consistency of data over time, a key reproducibility concern for researchers.