Datasets behind the largest-ever crowdsourced research peering into YouTube’s inscrutable algorithm are now publicly available.


Avenues to access and investigate quality data on some of the largest online platforms are dwindling and becoming pricier. What data remains available is sparse and often barricaded behind strict conditions that researchers must comply with. This leaves researchers to rely on creative strategies to catch even a glimpse of how platforms' algorithms work. In some cases, this necessitates the development of novel tools such as Mozilla’s YouTube RegretsReporter browser extension.

Mozilla’s RegretsReporter catalysed two of the largest-ever crowdsourced research studies scrutinizing YouTube’s recommender algorithm: “YouTube Regrets: A crowdsourced investigation into YouTube's recommendation algorithm” and “Does this button work? Investigating YouTube’s ineffective user controls.”

For these studies, conducted in collaboration with the University of Exeter Institute of Data Science and Artificial Intelligence, over 60,000 people across 191 countries donated their user data by reporting unwanted recommendations, called “regrets.” The Viu Política project, a collaboration between Mozilla, the University of Exeter and Instituto Vero, also investigated targeted political propaganda in Brazil.

Today, Mozilla publicly releases these datasets, which encapsulate real-time experiences of users’ interaction with YouTube’s recommender algorithm. The goal behind this release is to encourage vigorous research into the platform and further advocate for similar data accessibility.

Says Brandi Geurkink, Mozilla senior fellow and board member at the Coalition of Independent Researchers: “Interventions like RegretsReporter are very resource-intensive and not a viable option for independent researchers or journalists operating on shoestring budgets who study the internet’s impact on society.”

Says Jesse McCrosky, principal data scientist at Thoughtworks Finland: “With the release of this dataset, we want to encourage more research into platforms. But this sort of release should not be seen as an adequate end, but one that presses platforms to avail more data to researchers.”

There are two datasets from the studies carried out in 2021 and 2022, featuring “regrets” and “recommendations” respectively. Videos included in the datasets are predominantly in English, but span more than 100 languages. There are also the Viu Política datasets, generated in collaboration with the University of Exeter and Instituto Vero.