Our Recommendation to YouTube
On October 15, Mozilla is publishing YouTube users' #YouTubeRegrets — videos that triggered bizarre and dangerous recommendations from the platform's algorithm. Read those stories at mzl.la/youtuberegrets.
This blog post takes a deeper look at why Mozilla is running this campaign and provides insight into our recent meetings with YouTube.
Over the last year, there have been a number of news stories about extreme content coming from YouTube’s recommendation system, and how it is leading users down harmful pathways — including child exploitation or radicalisation.
Meanwhile, YouTube’s recommendations can be incredibly effective: they drive 70% of total viewing time on the site, a staggering 250 million hours per day. And so users are seeing the impact of the recommendation algorithm first-hand. One thing that surprised us when we recently collected users' #YouTubeRegrets was how easily many users understood exactly what we meant when we asked them to share stories of YouTube recommendations that were far different (and not in a good way) from what they had originally searched for.
Today’s major content platforms have a dangerous mix of characteristics: they are driven by algorithms, highly targeted to specific users, and typically optimised for engagement, which can amplify harmful but highly engaging content. As a result, platforms give users their own private, addictive experience that can quickly become filled with extreme content. This approach undoubtedly drives the production of a lot of great, engaging content. But it also means that any harm that platforms are enabling is easily magnified and happens in private, not easily seen or understood by regulators, watchdog groups, or researchers.
In August, we wrote YouTube a letter raising this issue and urging the company to work openly with independent researchers who are trying to understand the scale of this harm and identify solutions. About a month later, we met with YouTube to discuss our letter and understand more about what they are doing to tackle this problem. They acknowledged this problem and told us about what they are doing to fix it, including removing content that violates their community guidelines and reducing recommendations of "borderline" content. They indicated that they are working towards doing this in a more open manner, but we’re still waiting for them to release data to independent researchers working to understand this problem, and to provide evidence to substantiate their claims about the progress they’ve made so far.
The era of “trust us” is over. We can no longer simply take major tech companies at their word that they are working to solve content problems. The problems are too serious. The consequences are too serious.
Major tech companies need to provide tools and data to allow independent third parties to understand what kind of content they are amplifying through recommendations, and how that impacts what people see and believe on and offline. That’s why we’re calling on the leadership of YouTube to release a plan that shows us how they will change instead of telling us. YouTube’s plan needs to:
Commit to working with independent researchers to understand this problem
YouTube’s plan should lay out a clear timeline and process for how they will work closely and collaboratively with social science researchers from around the world, delivering to them the datasets that their work relies on.* They should move beyond approaches to simply “audit the algorithm,” and instead work meaningfully with researchers who study the interplay between the content that is uploaded, the algorithms that decide what to suggest, and the consequent impact of this amplification on communities. These researchers will not only help YouTube change, but will be able to assess if the changes that they are making are solving bigger real-world problems.
Provide evidence to back up the claims they’ve made about progress on this issue
YouTube’s plan must detail the efforts that they are undertaking within the company to solve these issues and provide evidence that backs their findings so far, acknowledging the challenges that they are currently facing in solving this problem. Any future claims that YouTube makes about progress towards solving these issues should also be backed up by verifiable evidence.
Researchers need access to meaningful data, including:
- Data to determine the scale of potential issues: A large-scale, representative sampling of public videos and channels, with their associated data. Data should include impression data (e.g. number of times a video is viewed, number of times a video is recommended, number of views as a result of a recommendation), engagement data (e.g. number of shares, upvotes/downvotes), text data (e.g. creator name, video description, transcription and other text extracted from the video), and/or other annotations (e.g. geographic origin of the video, whether a video was reported or considered for removal, tags or categories assigned to the video, whether the video was demonetized).
- Data that enables network analysis of the recommendation system: Data and/or sampling that enables the study of network structures (e.g. “nearest neighbor” or clustering of videos, how videos are introduced to a community via recommendations).
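To make the two requests above concrete, here is a minimal sketch of what a sampled dataset could let researchers compute. Everything in it is an assumption for illustration: the `VideoRecord` field names, the sample values, and the edge list are hypothetical and do not reflect any real YouTube data format or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class VideoRecord:
    """Hypothetical per-video record; field names are illustrative only."""
    video_id: str
    views: int                       # impression data
    times_recommended: int           # impression data
    views_from_recommendations: int  # impression data
    shares: int = 0                  # engagement data
    tags: list[str] = field(default_factory=list)  # annotations
    demonetized: bool = False        # annotations


def recommendation_share(records):
    """Fraction of all sampled views that came from a recommendation."""
    total = sum(r.views for r in records)
    from_recs = sum(r.views_from_recommendations for r in records)
    return from_recs / total if total else 0.0


def in_degree(edges):
    """Count how many distinct sampled videos recommend each video."""
    sources = defaultdict(set)
    for src, dst in edges:
        sources[dst].add(src)
    return {dst: len(srcs) for dst, srcs in sources.items()}


# Made-up sample data, not real measurements.
sample = [
    VideoRecord("a", views=100, times_recommended=40, views_from_recommendations=70),
    VideoRecord("b", views=300, times_recommended=90, views_from_recommendations=210),
]
# Each pair means "the first video recommends the second".
edges = [("a", "b"), ("c", "b"), ("b", "a")]

share = recommendation_share(sample)
degrees = in_degree(edges)
```

With a representative sample, simple aggregates like these would let independent researchers check claims about how much viewing recommendations drive and which videos the system funnels viewers towards.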
Researchers need better simulation tools, including:
- The ability to mimic user pathways through the recommendation algorithm: Researchers should be able to create research accounts that allow them to simulate user pathways through the recommendation system by populating their viewing history with sampling data.
- A tool simulating personalized recommendations: A “pathways” tool that allows researchers to study forward and backward recommendations. In the forward direction, a researcher would provide a video or a sequence of video views (like a playlist), and the tool would return the next set of recommendations, either as a global average or for some population segment. In the backward direction, a researcher would provide a video, and the tool would return other videos that lead to that video.
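As a rough illustration of the forward and backward queries such a “pathways” tool might answer, here is a small sketch over a toy recommendation graph. The video names, the `RECS` mapping, and both function names are hypothetical. Following only the top-ranked recommendation is a simplifying assumption, not a description of how YouTube's system behaves.

```python
# Hypothetical toy data: each video maps to its ranked list of recommendations.
RECS = {
    "cooking_basics": ["diet_tips", "knife_skills"],
    "knife_skills": ["cooking_basics"],
    "diet_tips": ["extreme_fasting"],
    "extreme_fasting": ["diet_tips"],
}


def forward_path(start, steps, recs=RECS):
    """Follow the top-ranked recommendation from `start` for up to `steps` hops."""
    path = [start]
    for _ in range(steps):
        next_videos = recs.get(path[-1], [])
        if not next_videos:
            break  # no recommendations recorded for this video
        path.append(next_videos[0])
    return path


def backward(video, recs=RECS):
    """Return, in sorted order, the videos whose recommendations include `video`."""
    return sorted(src for src, dsts in recs.items() if video in dsts)


pathway = forward_path("cooking_basics", steps=2)
leads_to_diet_tips = backward("diet_tips")
```

In this toy graph, a viewer who starts on a cooking video reaches `extreme_fasting` in two hops, which is the kind of drift a real pathways tool would let researchers measure at scale.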
Researchers need tools that empower, not limit, large-scale research and analysis. This means that:
- YouTube should not place restrictive rate limits on researchers doing large-scale analysis of the platform (e.g. it should relax its existing API rate limit for research use).
- YouTube should provide researchers with access to a historical archive of videos so that they can perform bulk analysis of these videos.