YouTube

Datasets from Mozilla's YouTube RegretsReporter project are available to support further investigations into YouTube's recommendation algorithm.


RegretsReporter Dataset

The RegretsReporter dataset allows people around the world to conduct their own investigations into YouTube's recommendation algorithm. This data has already powered impactful research from Mozilla that has shaped public policy and led to YouTube releasing more information about how their algorithm works. By making anonymous RegretsReporter data available to more researchers, investigators and journalists around the world, we can continue to monitor the impact that YouTube's recommendation algorithm has on local communities and help to hold YouTube accountable.

Learn more about our YouTube RegretsReporter work.

FAQ

What is RegretsReporter?

RegretsReporter is a browser extension that powers the world’s largest crowdsourced investigation into YouTube’s recommendation system. YouTube’s recommendation system is one of the biggest AI systems in the world, yet there is almost no transparency into how the system works and what kinds of videos it surfaces to people. To date, more than 60,000 people from 191 countries have installed RegretsReporter to donate their YouTube data to researchers at Mozilla who are advocating for more trustworthy AI systems. This data has helped researchers at Mozilla uncover dangerous flaws in YouTube’s recommendation system—flaws which policymakers in Europe and the US have since taken notice of.

What does the dataset contain?

The dataset includes information about videos that our respondents either “regretted” (in our first study) or pressed “don’t recommend” on (in our second study). In both cases, these videos represent content our participants did not want. For our second study, we also report the complete set of videos that were recommended to our participants. The dataset also contains data from the Viu Política project, a research collaboration between the University of Exeter and Instituto Vero investigating political propaganda on YouTube in Brazil.
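To make the structure concrete, here is a minimal sketch of how the records described above might be handled in Python. The field names and sample rows below are hypothetical illustrations only; the actual schema is defined in the technical data spec linked below.

```python
import pandas as pd

# Hypothetical sample rows illustrating the kinds of fields described above
# (video id, which study the report came from, report date, country).
# The real field names are defined in Mozilla's technical data spec.
df = pd.DataFrame([
    {"video_id": "abc123", "study": "regret",         "date": "2021-03-01", "country": "BR"},
    {"video_id": "def456", "study": "dont_recommend", "date": "2022-06-15", "country": "DE"},
    {"video_id": "ghi789", "study": "dont_recommend", "date": "2022-07-02", "country": "BR"},
])

# Split reports by study: "regretted" videos (first study) vs.
# "don't recommend" presses (second study).
regretted = df[df["study"] == "regret"]
dont_recommend = df[df["study"] == "dont_recommend"]

print(len(regretted), len(dont_recommend))  # -> 1 2
```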

Why are you releasing data?

We are releasing RegretsReporter data so that more people can research YouTube’s recommendation system and the impact that it may have on their communities. Our aim is to create more transparency around AI systems, which can lead to accountability for harms and ultimately, safer and more trustworthy products on the market. We believe this data, which is anonymous and does not include any identifying information, will be valuable to independent researchers, journalists and responsible technologists who are working towards this vision.

Who can access this data?

The data is available to anyone under a CC0 1.0 licence.

What can this data be used for?

The RegretsReporter dataset can be used for investigations and research into YouTube videos that the RegretsReporter volunteer community finds objectionable, offensive or otherwise harmful. For example, Mozilla’s own investigation using this data revealed that YouTube’s algorithm recommended videos with misinformation, violent content, hate speech, and scams, and that people in non-English speaking countries were far more likely to encounter these videos than others. Since this dataset includes data from people spanning 191 countries, including data collected specifically for a research project on political propaganda in Brazil, we envision that it could be analysed by researchers with more localised context to produce impactful research. In addition, the recommendations data contained in the dataset can be used to study the behaviour of YouTube’s recommendation engine over time.
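As a sketch of the “over time” analyses mentioned above, one could bucket recommendation records by month. Again, the field names and sample data here are hypothetical, not the dataset’s actual schema:

```python
import pandas as pd

# Hypothetical recommendation records; the real field names are in the
# technical data spec.
recs = pd.DataFrame({
    "video_id": ["a", "b", "c", "d"],
    "date": pd.to_datetime(["2022-01-05", "2022-01-20",
                            "2022-02-03", "2022-02-28"]),
    "country": ["BR", "BR", "US", "BR"],
})

# Count recommendations per calendar month -- a starting point for
# studying how the recommendation engine's output shifts over time.
per_month = recs.set_index("date").resample("MS").size()
print(per_month.tolist())  # -> [2, 2]
```

The same grouping could be done per country to compare how recommendation patterns differ across regions.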

How are you protecting people's privacy?

The data shared is not linked to the participants who contributed it; it consists only of information about videos, the date on which they were regretted or recommended, and the country in which the activity took place.

Can you provide support for researchers who want to use this data?

Our team has limited capacity to provide individualized support to people and teams looking to use RegretsReporter data, but we are happy to help out where we can. Please review our detailed technical data spec for more information about the dataset and how to use it. If you have a more specific request, please write a brief email to [email protected] telling us who you are and what you want to do with RegretsReporter data, and we will get back to you if we can support your request.


Questions? Contact us at [email protected].
