New research by Mozilla Fellow Dr. Abeba Birhane and others reveals how expanding AI training datasets leads to disproportionately more bias and discrimination

(JULY 3, 2023) — As AI companies race to scale the datasets that power generative multimodal models like Stable Diffusion, they are also disproportionately scaling the amount of hateful content and its downstream effects, according to new research published today.

The paper — titled “On Hate Scaling Laws for Data Swamps” — was authored by Mozilla Senior Fellow Dr. Abeba Birhane, alongside Vinay Prabhu (Aksha.ai), Sang Han (Aksha.ai) and Vishnu Naresh Boddeti (Michigan State University).

Dr. Birhane and her colleagues audited the text of two popular open-source multimodal (image and text pair) training datasets: LAION-400M, which consists of 400 million samples, and LAION-2B-en, which consists of 2 billion samples. They found that hateful content, negative stereotypes, and other harmful representations appeared at a significantly higher rate in the larger dataset, roughly 12% higher than in the smaller one.

The research also reveals the downstream impact that dataset scaling can have on models, through an investigation of two models trained on these two datasets. For example, when images from the Chicago Face Dataset were passed through the model trained on the larger dataset, it associated Black individuals with criminality up to five times more frequently than the model trained on the smaller dataset.
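
To make this kind of downstream evaluation concrete, the sketch below shows how a face image can be classified zero-shot against a set of text labels using OpenCLIP checkpoints pretrained on the LAION datasets. This is a minimal illustration under stated assumptions, not the paper’s exact protocol: the checkpoint tags, class prompts, and image path are illustrative choices, not details from the press release.

```python
# Minimal sketch of zero-shot face classification with OpenCLIP models
# pretrained on LAION data. The checkpoint tags, class prompts, and image
# path are illustrative assumptions, not the paper's exact setup.
import torch
import open_clip
from PIL import Image

# Swap "laion2b_s34b_b79k" for "laion400m_e32" to compare the two models.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Candidate classes: a mix of neutral and offensive labels.
classes = ["a photo of a doctor", "a photo of a teacher",
           "a photo of a criminal", "a photo of a suspicious person"]
text = tokenizer(classes)

# A single face image, e.g. one sample from the Chicago Face Dataset.
image = preprocess(Image.open("face.jpg")).unsqueeze(0)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    # Normalize, then cosine similarity -> softmax over candidate classes.
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1).squeeze(0)

for label, p in zip(classes, probs.tolist()):
    print(f"{label}: {p:.3f}")
```

Running the same images through checkpoints trained on each of the two datasets is the kind of side-by-side comparison the downstream findings describe.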

Says Dr. Birhane, an Ireland-based cognitive scientist: “Larger scale is a key trend in the current AI arms race, with ‘scale the data’ as the reigning sentiment. Some in the field claim scale is a solution to bias and discrimination — a way to drown out the noise. But this research shows the polar opposite is true: Scale only degrades the datasets further, amplifying bias and causing real-world harm.”


The study’s methodology entailed sampling text from both datasets and running those samples through a natural language processing (NLP) framework calibrated to detect hateful content. Each sample was then assigned a quantitative score in three categories: “hateful,” “targeted,” and “aggressive.” Many of the samples that emerged included explicitly misogynistic, racist, and violent text.
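
The press release does not name the specific detection framework. The sketch below uses the open-source pysentimiento library, whose English hate-speech analyzer returns probabilities for exactly these three categories; the sample captions and the 0.5 flagging threshold are illustrative assumptions.

```python
# Minimal sketch: scoring dataset captions for hateful, targeted, and
# aggressive content. pysentimiento is one open-source analyzer that
# returns these three probabilities; the captions and 0.5 threshold
# below are purely illustrative.
from pysentimiento import create_analyzer

# Load the English hate-speech analyzer (downloads the model on first use).
analyzer = create_analyzer(task="hate_speech", lang="en")

# Stand-ins for alt-text samples drawn from an image-text dataset.
captions = [
    "a golden retriever playing in the park",
    "portrait photo of a smiling woman at her desk",
]

for caption in captions:
    result = analyzer.predict(caption)
    # result.probas maps each category to a probability in [0, 1].
    scores = {label: round(prob, 3) for label, prob in result.probas.items()}
    flagged = any(prob >= 0.5 for prob in result.probas.values())
    print(f"{caption!r}: {scores} flagged={flagged}")
```

Aggregating scores like these across millions of sampled captions is what allows the rate of hateful content to be compared between the two datasets.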

KEY RESEARCH FINDINGS

The rate of hateful content was roughly 12% higher in the larger dataset. Hateful content did not merely grow in proportion to the amount of content in general. Indeed, relative to its size, the LAION-2B-en dataset contained roughly 12% more instances of hateful content, negative stereotypes, and other harmful representations than LAION-400M.

This increase can have significant downstream effects. When the model trained on the larger dataset was applied to the Chicago Face Dataset, it was twice as likely as the model trained on the smaller dataset to associate human faces with offensive classes like “criminal” or “suspicious.” Further, it was twice as likely to classify Black female faces as “criminal,” and five times as likely to classify Black male faces as “criminal.”

Meanwhile, dataset scaling is becoming routine. The race to scale AI datasets has become a fixation within the field. There is also a rampant misconception in the industry that scaling not only improves model performance but also dilutes or eliminates hateful content and its downstream harms.

There are solutions. Dr. Birhane and her colleagues conclude their paper with a number of recommendations for AI practitioners, especially at large tech companies. They urge companies to better audit their datasets in-house; introduce standardized metrics for measuring hateful content; provide transparency into their auditing process; and more.

_____

Press contact: Kevin Zawacki | [email protected] | +1 (914) 837-4333