Meager and inadequate: A quantitative analysis of YouTube’s user controls

SUMMARY

Overview

Our qualitative research revealed that YouTube’s current feedback tools leave people feeling frustrated, unable to control what they see. We wanted to learn whether the kinds of experiences our survey participants described were backed up with data: How does using these user controls impact the kinds of recommendations people get?

To answer this question, we ran a randomized controlled experiment across our community of RegretsReporter participants that could directly test the impact of the user control options that YouTube offers. This experiment allowed us to test how YouTube's user control features impact what videos people are recommended.

The extension

The current version of our RegretsReporter extension is designed to answer the quantitative research questions posed in this study:

How effective are YouTube’s user controls at preventing unwanted recommendations?
Can adding a more convenient button for user feedback increase the rate at which feedback controls are used?

After installing the extension, a “Stop Recommending” button is added to every video player or recommendation on participants’ YouTube.

Pressing that button sends a signal to YouTube that the participant doesn't want recommendations similar to that video. Depending on which experiment group the participant is part of, clicking the button will send one of many different types of feedback to YouTube (e.g. “Do Not Recommend Channel”, “dislike”, etc.), or it will send no feedback at all if the participant is in the control group.

For these participants who’ve opted into our research, the extension keeps track of which videos the “Stop Recommending” button is pressed for and what videos YouTube subsequently recommends.

Throughout this report, we use the following terms that describe aspects of the study:

Terms used in the study

Rejected video

In our study, to reject a video is to press the “Stop Recommending” button on it. This allows the participant to express that they do not want to see recommendations like this in the future and will (except for the control group) send a user control signal to YouTube to express this.

Video pair

A video pair is made up of a rejected video and a video that YouTube subsequently recommended. After a video is rejected, all following recommended videos will be paired with that rejected video for analysis. For example, if a participant rejects one vaccine skepticism video, and is later recommended a cat video, a music video, and another vaccine skepticism video, each of these recommendations will represent a pair with the rejected vaccine skepticism video.

Bad recommendation

A bad recommendation is a video pair for which the rejected and recommended videos are too similar according to our policy. Whether a video pair is a bad recommendation or not may be assessed by one of our research assistants, or by our machine learning video similarity model.

Bad recommendation rates

Our analysis is based on a metric that we refer to as a bad recommendation rate. When we analyze the video pairs for each experiment group, the proportion of those pairs that are classified as bad recommendations is referred to as the bad recommendation rate for that group.

Findings

1. YouTube’s user controls are inadequate tools for preventing unwanted recommendations.

Our study found that YouTube’s user controls do have a measurable impact on subsequent recommendations. But contrary to what YouTube suggests, this effect is small and inadequate to prevent unwanted recommendations, leaving people at the mercy of YouTube’s recommender system.

Research setup

In order to understand what kinds of recommendations people see after using YouTube’s controls, we designed the experiment so that there were different experiment groups to compare: a control group (with users for whom no feedback was sent) and four treatment groups (groups of users for which the “Stop Recommending” button sent different types of feedback signals to YouTube). People who signed up for RegretsReporter were randomly assigned to one of these five groups.

Our control group helped us set the baseline. People who were part of our control group had the option to reject videos by clicking the “Stop Recommending” button, but no feedback would be sent to YouTube. Using the data collected from this group, we were able to calculate the baseline “bad recommendation rate” — YouTube’s normal recommendation behavior without user feedback. By comparing the results of other experiment arms against this baseline rate, we were able to measure the effectiveness of YouTube’s user controls.

Our treatment groups were then compared against that baseline. For them, clicking the button sent one of four different types of feedback to YouTube, either “Dislike”, “Don’t recommend channel”, “Not interested”, or “Remove from watch history.”

When we compared the bad recommendation rates for each of these four groups to the baseline rate from the control group, we found that each control does slightly reduce the bad recommendation rate relative to the baseline. However, participants are still served many bad recommendations. This demonstrates that YouTube’s user controls are not very effective at doing what people might expect them to do.

Data analysis

In order to compare the experiment arms against one another, our research assistants reviewed about 40,000 pairs of videos and labeled them according to similarity. The goal was to determine whether the videos participants were being recommended were similar to videos they had rejected in the past, so that we could calculate the “bad recommendation” rate. We also used this data to train a machine learning model to analyze similarity for the rest of the video pairs.

What does a “bad recommendation” look like in practice? Below are some examples of videos that participants rejected (on the left), alongside videos that were subsequently recommended (on the right).^[1] These examples demonstrate that YouTube continues to recommend videos that people have clearly signaled they do not want to see, including disturbing content like war footage and gruesome horror clips.

Trigger warning: Gruesome and disturbing images appear in the following section.

Examples of reject/recommend video pairs

Rejected video

Title: Tucker: Justin Trudeau is attacking human rights
Channel: Fox News

Interpretation of findings

In our analysis of the data, we determined that YouTube’s user control mechanisms are inadequate as tools to prevent unwanted recommendations.

To illustrate how we came to that conclusion, we’ll walk through the various analyses we carried out and how we interpreted the data. Specifically, we’ll look at how recommendations are impacted not just by the type of feedback signal (e.g. “dislike” versus “do not recommend channel”), but also things like channel, recommendation types (homepage versus sidebar), and time since feedback (1 week versus 4 weeks). Overall, a consistent theme is that some of these tools have a small effect on improving recommendations but are inadequate as tools for exercising meaningful control.

Analysis: Type of feedback signal

Summary: YouTube’s “don’t recommend channel” and “remove from history” controls work better than others — but still don’t work very well at all.

Bad Recommendation Rate: Overall

For this analysis, we looked at the different feedback signals people can send YouTube. In the graph above we see bad recommendation rates for the five different experiment groups: control, “dislike,” “don’t recommend,” “not interested,” and “remove from history.” After calculating the bad recommendations rate^[2] for each group, we compared them against one another and determined that:

Using the “dislike” and “not interested” buttons seemed to slightly decrease the bad recommendation rate, so we can say they are marginally effective. But the impact on bad recommendations was very small.
The “don’t recommend channel” and “remove from history” buttons had slightly greater effectiveness, but our data still showed that the tools are inadequate and that people were still being served many bad recommendations after using these tools.

We don’t know how YouTube handles this feedback internally, but it is interesting to note that the more “effective” methods might be interpreted by users as specific instructions, whereas the “less effective” controls might be interpreted as expressions of user preferences:

“Don’t recommend channel” might be interpreted by users as a relatively clear instruction: don’t show me this channel (although people might also consider it an expression of preference). We can attempt to confirm or disprove how effective the control is by comparing how it performs against those expectations.
“Remove from history” might also be interpreted as a fairly clear instruction, and can be confirmed by the user (using YouTube’s history browser). However, it’s not completely clear how this signal should influence future recommendations.
“Dislike” and “not interested” might be interpreted as expressions of user preferences. These signals are less clear about how that preference will be accommodated.

As “don’t recommend channel” and “remove from history” are more effective, our assumption is that they send a stronger signal, whereas “not interested” and “dislike” send a weaker signal to YouTube. However, we do not know exactly how YouTube’s algorithm interprets various feedback signals because the platform does not make specific information available about the recommendation system’s parameters, inputs, and how people can adjust them.

Analysis: Channel

Summary: The “don’t recommend channel” control does have some impact even on similar videos from other channels, but does not consistently prevent recommendations from the unwanted channel.

Bad Recommendation Rate: Different Channels

For this analysis, we looked at only those video pairs where the two videos came from different channels. For instance, someone clicked “don’t recommend” on a video from Jordan Peterson’s channel and then got recommended a video from the Fox News channel.

In the graph above, we visualized the differences in bad recommendation rates across those video pairs. As you can see, the “don’t recommend channel” button is still the most effective tool, even when it’s different channels that are being recommended. There could be many reasons for this: Perhaps people are seeing fewer videos from that channel so there are fewer clicks on those types of videos, changing YouTube’s understanding of the user’s interests over time. Or perhaps YouTube interprets “don’t recommend channel” as more generalized negative feedback.

Handling of “Don’t recommend channel”

Compared to the other user controls we tested, people might have a clear idea about what the “don’t recommend channel” button is meant to do — block a channel from recommendations. We can analyze whether the same channel continues to pop up after a participant clicks this button.

For people in our control group, about 0.4% of subsequent recommendations after a rejected video were from the same channel as the rejected video. Meanwhile in the “don’t recommend channel” group, we see this rate drop to about 0.1%. In other words, telling YouTube to stop recommending a channel seems to have the impact you might expect in most cases: fewer videos from that channel.

However, this feedback doesn’t appear to be consistently respected. In about 0.1% of the video pairs we analyzed, a recommendation was made from the same channel. Even when we limited our analysis to just a one-week time period between rejected video and recommendation, we saw the same pattern persist.

Our data does not allow us to confirm that the participant has not given YouTube a reason to ignore the “don’t recommend channel” feedback, perhaps watching a video from that channel that shows up in a search result, but it seems unlikely that this explains all the cases that we see. It appears very likely that “don’t recommend channel” doesn’t always work. This echoes one of the themes that emerged in our qualitative research: Many people said that they continued to get recommended similar videos from different channels even after clicking the button. Our participants feel that they don’t have much control over their recommendations, and our data backs up these experiences.

Analysis: Recommendation types

Summary: YouTube’s controls are slightly more effective for homepage recommendations and slightly less effective for sidebar recommendations.

For this analysis, we looked at the impact on recommendations in different locations: the sidebar and the homepage.

Comparison of Youtube's Homepage UI vs Youtube's Sidebar UI

Bad Recommendation Rate: Homepage vs. Sidebar

As illustrated in the graph above, homepage recommendations seem to be impacted more by user feedback than sidebar recommendations. One possible interpretation could be that sidebar recommendations are made in a higher-information context (YouTube knows what video you are currently watching) and so it is easier to optimize for engagement and there is less weight given to user feedback. For homepage recommendations, YouTube has less ability to know what might be engaging at the moment and so considers user feedback slightly more.

Analysis: Time between reject and recommendation

Summary: It does not appear that the effectiveness of YouTube’s controls changes over time; however, changes in popular content over time or a user interest model that assumes interests decay over time result in lower bad recommendation rates over time.

Bad Recommendation Rate: Time

For this analysis, we looked at the amount of time that had passed between when a video was rejected and when another video was recommended. There were three time periods we looked at: within one week, within four weeks, and anything more than four weeks.

Although the impact of user feedback doesn’t significantly change based on the time between rejection and recommendation, we do see that the overall bad recommendation rates do drop as time between rejected video and recommendation increases. This might be explained by a general shift over time in the kind of content YouTube recommends, as well as YouTube using an interest model that decays over time: If YouTube’s algorithm determines that a user is interested in a topic, but that user doesn’t watch many videos on that topic, YouTube may slowly decrease its estimate of the user’s interest in that topic as time passes.

2. An alternative UX can double the rate of user feedback on YouTube.

One of the problems identified in Simply Secure’s usability audit of YouTube was that the platform’s tools are not easy to use. There are very few options available for people to “teach” the algorithm beyond the handful of tools we’ve discussed, which are all reactive. Ideally, there should be tools made available to YouTube users that would allow them to define their interests, express their preferences for recommendations, and more actively shape their overall experience.

Even when people try to use YouTube’s user controls, there are very basic obstacles to giving feedback. In this next section, we’ll talk about how the UI of RegretsReporter was designed as an experiment to test how a different interface might encourage more feedback.

YouTube offers its feedback tools through a couple different user interfaces:

“Don’t recommend channel” and “Not interested” are available through the three dots menu on a recommendation.
“Dislike” is available only in the video player screen.
Removing a video from watch history requires navigating to the history tab on YouTube, finding the appropriate video, and then clicking on the “X” icon next to it.

Our extension makes submitting feedback easier. Feedback can be provided with a single click on a recommendation — YouTube’s normal methods all require at least two clicks from a recommendation. In our study we investigated the degree to which this improved design increased the rate of user feedback submission.

To analyze this, we included a special UX-control group. For participants in the group we do not show our “Stop Recommending” button and they are thus unable to reject videos – but they can still use YouTube’s native user controls. For these participants, the extension has no apparent effect, but we still collect standard data. This allows comparisons of the frequency of user feedback submitted between those that do and do not see the “Stop Recommending” button.

We find that participants that see our button submit about 80 pieces of feedback per 1,000 videos watched, while those that do not see the button submit only about 37. This difference is not statistically significant due to enormous variation between participants, but if we restrict the analysis to participants with 50 or fewer feedback submissions to reduce variance, we still see a similar relationship (50 and 23 feedback submissions per 1,000 videos watched respectively) and a strong statistical significance. It is clear that adding our button more than doubles the rate of feedback submission.

In order to anticipate arguments that our button might be obtrusive and reduce video watch rate on YouTube, we also analyzed the video watch rates per participant in these two groups and found no statistically significant difference. Actually, the watch rate was higher among participants that saw the button.

Takeaway

Through our controlled experiment, we were able to measure the effectiveness of YouTube’s user control tools for preventing unwanted recommendations. While some effectiveness was observed for each tool, even the most effective tools were inadequate for preventing unwanted recommendations. Our research suggests that YouTube is not really that interested in hearing what its users really want, preferring to rely on opaque methods that drive engagement regardless of the best interests of its users.

Footnotes

[1]

View a selection of video pairs here: https://drive.google.com/file/d/19bqoM6YIJttNt_4x14ZW-DixcAYtC1cl/view or download them from our JSON endpoint: https://public-data.telemetry.mozilla.org/api/v1/tables/telemetry_derived/regrets_reporter_study/v1/files/000000000000.json.
[2]

Note that these rates were calculated over video pairs assessed by our RAs and thus the absolute rates are not representative of all recommendations. However, the comparisons that our analysis is based on are still valid.

Keep Scrolling For

Recommendations