Content Moderation

Anti-Defamation League
Avaaz
Decode Democracy
Mozilla
New America's Open Technology Institute

Today, most social media companies engage in content moderation to enforce their content policies, which determine what content, individuals, and groups are permitted on their services. While overbroad content moderation raises freedom of expression concerns, content moderation is important for addressing misinformation, disinformation, harassment, and racist or hateful content online. Generally, companies rely on a combination of human moderators and AI and ML-based tools to carry out their content moderation efforts, which include flagging, reviewing, and making determinations about content.

Companies typically deploy AI and ML-based tools during two stages of the content moderation process: pre-moderation and post-moderation. During the pre-moderation stage, content is reviewed before it is published on a platform. In this situation, if a user drafted a post containing misinformation or disinformation, a company’s content moderation tools could flag the content as a violation and prevent the user from publishing the post. However, this approach is best deployed when categories of content are clearly defined. This is because automated tools are often unable to accurately determine context and make subjective decisions, and so the risk of error and overbroad content removals increases when it comes to categories of content that have fluid definitions. As a result, pre-moderation of misleading content is less common.

During the post-moderation stage, companies moderate content that has already been published on a platform. For example, if a user shared a post containing misinformation, and another user or the company’s automated tools flagged the content as potentially violating the company’s content policies, the post could then be routed to a human moderator for review—or it could be automatically removed, depending on how automated content moderation tools are used.
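The two stages described above can be sketched as a simple routing function. This is a minimal, hypothetical illustration of the decision flow, not any platform's actual implementation: the classifier, thresholds, and outcome labels are all assumptions made for the example.

```python
# Hypothetical sketch of pre-moderation vs. post-moderation routing.
# The classifier, thresholds, and labels are illustrative assumptions.

def classify(text: str) -> float:
    """Stand-in for an ML model returning a violation probability (0-1)."""
    flagged_terms = {"misinfo-example"}  # hypothetical watchlist
    return 0.9 if any(t in text.lower() for t in flagged_terms) else 0.1

def pre_moderate(post: str) -> str:
    """Review BEFORE publication: block likely violations outright."""
    return "blocked" if classify(post) > 0.8 else "published"

def post_moderate(post: str) -> str:
    """Review AFTER publication: auto-remove only high-confidence cases,
    route uncertain ones to a human moderator, leave the rest alone."""
    score = classify(post)
    if score > 0.95:
        return "removed"
    if score > 0.5:
        return "human_review"
    return "no_action"
```

Note how the same underlying score leads to different outcomes at each stage: pre-moderation blocks before anyone sees the post, while post-moderation can defer borderline cases to human review, which is one reason it is the more common approach for loosely defined categories like misinformation.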

Using AI and ML-based tools to identify and remove misinformation and disinformation online has certain benefits for both social media companies and consumers of their products and services. The first and most obvious benefit is efficiency. With billions of pieces of content surfacing on platforms, technology companies can use automated tools to scan content at scale and determine its nature and whether it should be permitted on a service. Another benefit of AI and ML-based content moderation tools is that they allow companies to offload content moderation work from human moderators to automated systems. This can give human content moderators a more manageable volume of content to review and protect them from gratuitous exposure to harmful content. As research and recent reporting have indicated, human moderators often experience adverse consequences from reviewing vast amounts of harmful content online, including mental health problems and increased susceptibility to conspiracy theories.

However, if bias is incorporated into the design of AI and ML-based content moderation tools, they can amplify harmful content and generate discriminatory outcomes. Automated systems learn to make decisions from training data, but that data often reflects societal and institutional inequities and can be influenced by human prejudices. This can embed bias against certain communities or forms of content in an algorithmic system. There are numerous examples of internet platforms relying on biased AI and ML-based tools to make content moderation decisions that have resulted in harmful and discriminatory outcomes offline. For example, throughout 2020, Instagram’s content moderation system flagged and removed the hashtag #Sikh from its platform for an extended period of time. While Instagram alleged that the block was a mistake due to a report “inaccurately reviewed” by Instagram’s teams, large-scale implementation of this decision was ultimately fueled by AI. This is discriminatory, problematic, and perpetuates systemic racism by erasing the voices of religious minorities, particularly as the incident occurred in the midst of protests in support of farmers in India, many of whom are Sikh. Instances such as these outline why it is so critical for companies to provide adequate notice to users who have had their content or accounts impacted by moderation, and to give these users access to a timely, scalable, and robust appeals process.

Another example of biased AI and ML-based tools producing harmful outcomes occurred in 2017, when Microsoft released an AI-powered “teenage” chatbot that was inadvertently programmed to shut down any conversation about religious identity or the Middle East. For example, if a human told the bot “I get bullied sometimes for being Muslim,” the bot would respond “so I really have no interest in chatting about religion,” or “For the last time, pls stop talking politics..it’s getting super old.” The bot gave a similar response when the words “Jew,” “Middle East,” “Hijabs,” and “Torah” were used, although it would not respond this way when a user discussed Christianity.

Additionally, social media companies’ content moderation efforts are limited in that platforms have consistently had unequal content moderation support and capabilities for non-English content. Facebook, for example, claims to be available in over 100 languages; however, its content moderators speak only about 50 languages, and Facebook’s automatic moderation tools are only able to flag hate speech in about 30 languages. Having fewer human moderators (and for some languages none at all) means there is less accurate training data for the ML systems used to detect disinformation in non-English languages.

This is a self-perpetuating problem: non-English violative content is less likely to be seen by human moderators for review; therefore, someone who uses a social media platform in a language other than English may be more likely to be exposed to harmful disinformation.

This issue was clearly illustrated during the recent U.S. presidential election when APIAVote, a nonpartisan organization that mobilizes AAPI individuals in electoral and civic participation, expressed serious concern about how voters with limited English proficiency were vulnerable to voting disinformation spread on social media. In one article, APIAVote told Vox, “It appeared that certain communities were more vulnerable and targeted, with the information translated into their language and posted onto WeChat or Facebook.” Similarly, in 2021 the Ya Basta! Facebook coalition was formed after several nonprofit organizations discovered rampant disinformation campaigns targeting Latinx communities in the U.S. There are approximately 41 million Spanish speakers in the U.S. and millions consume Spanish language content on social media; however, several reports have found that social media platforms are falling short in addressing online disinformation campaigns that target Latinx communities. This is a gap that those companies must work to address.

Many automated content moderation tools are also limited in their ability to effectively moderate certain categories of content. Theoretically, if an automated content moderation tool operates using clear definitions for a category of content and is trained on a sufficiently diverse and robust dataset, it should be able to flag violating content more easily and effectively than a human moderator. This is because these automated systems are easily scalable: once an AI system is trained, it can be duplicated, whereas training new human content moderators can be difficult, costly, and time intensive. However, the vast majority of automated content moderation tools are deployed against categories of content that have fluid definitions, such as misinformation, disinformation, hate, and extremism. These categories of content often require context and subjective understanding in order to determine the meaning of a word, image, or video, without relying on specific terms (for moderation of words or phrases) or hashes (for moderation of images and videos). Additionally, companies often change the parameters for these categories of content in response to real-world events. As a result, the effectiveness and accuracy of these systems is limited, as they can fail to flag and remove violating content or erroneously take action against content or accounts that do not violate a platform's policies.
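The hash-based matching mentioned above works well precisely because it needs no context: a piece of media either matches a known-violating fingerprint or it does not. The sketch below illustrates the lookup pattern with an exact SHA-256 digest; production systems instead use perceptual hashes (such as PDQ or PhotoDNA) that tolerate re-encoding and cropping, and the example database contents here are invented for illustration.

```python
# Illustrative sketch of matching uploads against a database of known
# violating media. Real deployments use perceptual hashes that survive
# re-encoding; an exact SHA-256 digest is used here only to show the
# lookup pattern. The database entry is a made-up placeholder.

import hashlib

KNOWN_VIOLATING_HASHES = {
    hashlib.sha256(b"known-violating-image-bytes").hexdigest(),
}

def matches_known_media(media_bytes: bytes) -> bool:
    """Return True if the upload's fingerprint is in the known-bad set."""
    digest = hashlib.sha256(media_bytes).hexdigest()
    return digest in KNOWN_VIOLATING_HASHES
```

Contrast this with misinformation: there is no fixed fingerprint for a misleading claim, which is why fluidly defined categories resist this kind of deterministic matching.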

The consequences of these limitations have been profound, often further marginalizing already vulnerable populations. On numerous platforms, disinformation about the November 2020 U.S. elections circulated widely, with bad-faith actors posting inaccurate information about voting logistics, hindering individuals’ ability to vote. Disinformation campaigns also cast doubt on election security in the United States, with claims about tampered votes and a stolen election rapidly gaining steam online. When companies fail to moderate misleading information in this context, it can prevent people from participating or trusting in the U.S. election process. This has a grave effect on democracy and reinforces systems of oppression. In order to help combat this, policymakers should clarify that offline anti-discrimination statutes, such as the Voting Rights Act, apply in the digital environment.

The limitations of automated content moderation tools were also especially visible in the wake of the COVID-19 pandemic. In March 2020, many companies, including Facebook, were initially unable to make use of their large human content moderator workforces due to COVID-19 work-from-home requirements. Instead, many platforms increased their reliance on AI and ML-based tools for content moderation purposes. As a result, more content was flagged and removed than before, including posts featuring legitimate news articles about the pandemic, which were incorrectly flagged as spam. Simultaneously, numerous posts containing misleading information slipped through the cracks and continued to circulate online, including conspiracy theories about COVID-19’s existence, COVID-19 vaccines, testing, and symptoms.

When users share misleading information, it can establish echo chambers online, which can amplify and fuel extremism and hate. Additionally, since headlines around COVID-19, including disinformation, have dominated mainstream and social media for over a year, there has been a significant spike in hate and racism directed at the Asian American and Pacific Islander (AAPI) community—both offline and online. For example, in the days immediately following then-President Trump’s COVID-19 diagnosis, there was a significant spike in anti-Asian sentiment and conspiracy theories about COVID-19 on Twitter. Asian-Americans have experienced the largest single rise in severe online hate and harassment year-over-year in comparison to other groups. In this way, limited and flawed content moderation systems can exacerbate hate by amplifying misinformation and disinformation.

Although some platforms partner with third-party fact-checkers to identify potentially misleading content, the scale of these fact-checking efforts often varies. In addition, some platforms remove content that has been fact-checked and deemed to be misleading, while others opt to algorithmically reduce or label it. While these efforts could be helpful in combating the spread of misleading information, there is a great deal of inconsistency in how fact-checking and alternative moderation techniques are applied, and there is a fundamental lack of transparency around what policies guide the implementation of these practices. This makes it difficult to monitor how platforms are combating misleading information online and to hold them accountable for these efforts.

As content moderation tactics become more complex, some bad actors aiming to spread disinformation have also identified ways to circumvent both pre-moderation and post-moderation practices. For example, some individuals attempt to evade automated content moderation systems and obfuscate their messages by typing “C0v1D” instead of “Covid.” In response to such evasion efforts, many companies have invested significant resources in training their AI and ML-based systems to identify altered and duplicated versions of text, images, and other forms of communication, with the aim of augmenting their misinformation and disinformation moderation efforts. Even so, as bad actors adapt their evasion techniques, content may still slip through the cracks. While some platforms currently publish transparency reports outlining the scope and scale of their content policy enforcement efforts, very few publish data on their efforts to moderate misleading content. Those that do share this data do so in a disparate manner that renders it hard to track and find, making it difficult to hold these platforms accountable.
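One common countermeasure to the “C0v1D”-style substitution described above is to normalize text before matching it against a watchlist. The sketch below shows the idea; the substitution map and keyword list are illustrative assumptions, not any platform's actual ruleset.

```python
# Hedged sketch of normalizing common character substitutions (leetspeak)
# back to letters before keyword matching. The mapping and watchlist are
# illustrative assumptions only.

SUBSTITUTIONS = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a",
})

FLAGGED_KEYWORDS = {"covid"}  # hypothetical watchlist entry

def normalize(text: str) -> str:
    """Lowercase the text and undo common digit/symbol substitutions."""
    return text.lower().translate(SUBSTITUTIONS)

def contains_flagged_term(text: str) -> bool:
    """Check the normalized text against the watchlist."""
    norm = normalize(text)
    return any(keyword in norm for keyword in FLAGGED_KEYWORDS)
```

In practice this is an arms race: as soon as a substitution is added to the map, evaders move to new variants, which is one reason companies keep retraining their systems on altered and duplicated content.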

Given the limitations of AI and ML-based content moderation tools, the best form of online content moderation is a combination of AI and human review. Decreased human oversight increases the risk of errors from automated systems, which can result in the amplification of hate, extremism, systemic biases, discrimination, and misleading information. For these reasons, it is essential that social media companies invest in their content moderation systems, increase resources for both human and AI content moderation, and work to decrease the harmful impact of biased AI systems in order to reduce disinformation on social media platforms. In addition, given that there is often a spike in misleading information surrounding major world events, companies should invest more resources in preparing for events that could result in the spread of more misleading information.