This is a profile of Dr. Abeba Birhane, a Mozilla Senior Fellow in Trustworthy AI
By Shandukani Mulaudzi
Many of the AI systems we interact with every day, like voice assistants and news feeds, have a polished feel - just what you’d expect from a product by a major tech company. But these sleek exteriors can hide something ugly.
The training datasets powering these systems are frequently riddled with harmful stereotypes about the most marginalized groups in our society. And tech companies do little to confront them. For example, a recent New York Times article revealed that searching for the word "gorilla," or any term related to apes, in Google's or Apple's photo apps returns no results. This issue traces back to an incident eight years ago, when a user discovered that Google's image recognition algorithm mislabeled Black people as gorillas. Despite the public outcry and Google's apology, the company chose to address the problem by simply removing all references to apes.
Dr. Abeba Birhane — a recent Mozilla Senior Fellow, current Mozilla consultant, and an Adjunct Professor at Trinity College Dublin in Ireland — would have taken a far different approach. She believes the matter should be addressed by combing through the data to correct racist and dehumanizing labels. And, just as importantly, by engaging with Black scholars’ work on the ape trope association.
“These problems are not just a matter of ‘fixing’ training data,” Birhane explains. “They are rooted in society and history, and grounded in context. You really have to go back in history to understand why we have such issues permeating datasets and models. But going back in history to investigate is not something people often do.”
Birhane is the exception: She spends much of her time analyzing datasets and uncovering the stereotypes and biases within them. She then shines a spotlight on these issues, stressing that there are no quick-fix solutions. Until models are built with the most marginalized in mind, she notes, nothing will change.
These problems are rooted in society and history, and grounded in context. You really have to go back in history to understand why we have such issues permeating datasets and models.
Dr. Abeba Birhane, Mozilla Fellow in Trustworthy AI
Although she holds a PhD in cognitive science, Birhane is not confined by specific academic disciplines. She believes pivotal insights are gained by bridging various disciplines and approaches. And so her work sits at the intersection of cognitive science, AI ethics, data auditing, and decolonial studies - particularly Afro-feminism.
“Investigating datasets from the perspective of Afro-feminism is not a common combination,” she says, “but it's something I find very helpful and most fruitful.”
During her 15-month Mozilla fellowship, which concluded in May, Birhane published a number of papers and articles about her work. Most recently, Birhane and her colleagues Vinay Uday Prabhu, Sang Han, and Vishnu Naresh Boddeti released a study titled “On Hate Scaling Laws for Data Swamps.” It reveals how expanding AI training datasets leads to disproportionately more bias and discrimination.
Specifically, Birhane and her co-authors found that scale exacerbates negative racial stereotypes and dehumanization of Black bodies. An AI model trained on the larger dataset was twice as likely to associate Black females with "criminal" and five times as likely to associate Black males with "criminal."
“Dataset size is a key element of the current AI arms race, with ‘scale the data’ as the common drive,” Birhane explains. “Some in the field claim scale is a solution to bias and discrimination - a way to drown out the noise. But this research shows the polar opposite is true: Scale only degrades the datasets further, amplifying bias and causing real-world harms.”
She continues: “Given how critical datasets are, we are not paying enough attention to cleaning them, detoxifying them, and making sure that various cultures, identities, and genders are represented in a just way. We have to look at our datasets because data is the backbone of AI models.”
During her fellowship, Birhane also assessed saliency cropping algorithms - the AI tools that automatically crop photos in social media feeds. This work was inspired by a Twitter user who experimented with the algorithm and watched it crop a Black man out of a photo while prioritizing a white man. To investigate further, Birhane and her collaborators examined the cropping tools used by Twitter, Apple, and Google.
They discovered that the Twitter user's experience was not an isolated incident; the cropping tools consistently favored white individuals over Black individuals. In addition, they observed a tendency to objectify women by emphasizing their bodies rather than their faces, reflecting the "male gaze." Birhane wasn’t surprised: She believes it is essential to assume inherent biases in datasets until they are thoroughly investigated and proven otherwise.
It’s not just tech that overlooks the most marginalized, Birhane explains. It’s academia, too. When she shifted the focus of her PhD from embodied cognitive science to AI, Birhane quickly noticed that the field overlooked crucial social factors like the influence of power dynamics and historical injustices on human thinking.
Birhane faced resistance to her work and grew frustrated with the tendency to sideline these factors in the understanding of how people reason and engage with the world. This only motivated her to further apply a critical perspective to AI and ethics.
The path Birhane has chosen is not glamorous: She often describes it as cleaning up the mess created by big corporations. The images she encounters when auditing datasets can be extremely sexual or violent in nature, taxing her mental health and emotions. But she’s driven by the impact it has on marginalized communities.
“It’s a push and pull,” Birhane says. “Somebody has to do this sort of work. And if we don't do this kind of work, people – often people that look like me – end up being harmed. People that are at the margins of society. People that are most disenfranchised.”