Examining the Human Labor Behind AI

http://trk.network/ is a tool and advocacy initiative by Caroline Sinders spotlighting unjust labor in the machine learning pipeline

Caroline Sinders is a 2019 Mozilla Fellow.

There’s a persistent myth about the AI systems that power our social networks, video platforms, and online shopping sites: that AI is built, trained, and maintained exclusively by highly paid engineers.

While companies like Amazon and Facebook do rely on highly paid computer scientists, they also rely on a sprawling workforce that internet users don’t hear about. Around the world, many thousands of gig workers label data and train algorithms that power consumer technology. And these gig workers are rarely paid a fair wage or provided with basic benefits.

In short: There’s an invisible, ill-treated workforce that makes today’s AI possible.

When I began my Mozilla Fellowship more than a year ago, I set out to help pull back the curtain on this invisible workforce and to build tools that address these inequalities. I spent months interviewing the gig workers who train algorithms and face disenfranchisement and obfuscation.

Today, I’m launching http://trk.network/. It’s part tool, part advocacy campaign, and part research report, focusing on the wage inequality and labor injustice that permeate the machine learning pipeline. The project entails:

A wage calculator that reveals how little gig workers are paid per hour

This tool doesn’t just calculate a price for tasks. Instead, it reveals how underpriced data labeling and data training tasks really are. Unlike other tools, this calculator takes into account workdays and a living wage rather than just pricing a batch of tasks in aggregate. The calculator defines a living wage by the standards of Washington State, where Amazon is headquartered. Washington’s minimum wage is around $11, much more than Amazon Mechanical Turk workers (“Turkers”) are generally paid.

This tool can be used by journalists to determine whether a company is paying fair wages. It can be used by workers to determine if they’re being fairly compensated. And it can be used by employers who want to offer a fair wage.
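To make the arithmetic concrete, here is a minimal sketch of the kind of calculation such a tool performs. It is not the code behind the calculator at trk.network: the function name, the example task price, the time per task, and the eight-hour workday are all hypothetical, and the $11 threshold is simply the Washington figure cited above.

```python
# Hypothetical sketch: this is NOT the actual trk.network calculator.
LIVING_WAGE_PER_HOUR = 11.00  # USD; the Washington figure cited in this article
HOURS_PER_WORKDAY = 8         # assumption made for illustration only


def effective_hourly_wage(price_per_task: float, minutes_per_task: float) -> float:
    """Convert a per-task price into the hourly wage a worker actually earns."""
    if minutes_per_task <= 0:
        raise ValueError("minutes_per_task must be positive")
    tasks_per_hour = 60 / minutes_per_task
    return price_per_task * tasks_per_hour


# Made-up example: a labeling task that pays $0.05 and takes 2 minutes.
wage = effective_hourly_wage(price_per_task=0.05, minutes_per_task=2)
daily_earnings = wage * HOURS_PER_WORKDAY

print(f"Effective wage: ${wage:.2f} per hour")
print(f"Earnings over a {HOURS_PER_WORKDAY}-hour workday: ${daily_earnings:.2f}")
print("Meets living wage" if wage >= LIVING_WAGE_PER_HOUR else "Below living wage")
```

Expressing a per-task price as an hourly and daily figure is what makes underpricing visible: a few cents per task can sound reasonable until it is set against a full workday.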

Design research investigating the labor injustice and wage inequality that permeate the gig economy

I spoke with more than a dozen gig economy workers from CrowdFlower, Mechanical Turk, and Fiverr, as well as researchers and professors who analyze labor and worker inequality on Mechanical Turk.

Furthermore, after interviewing artists, researchers in labs, people at startups, and employees of big technology companies, I came away with a major takeaway: even those who wanted to be “ethical” or create equity in their practices and pipelines still had a hard time understanding pricing on Mechanical Turk. Even when they wanted to, they weren’t pricing fairly, because tools like Mechanical Turk don’t show what a task’s price works out to relative to the time spent on it. People were accidentally underpricing work, even when they were trying to pay a good wage.
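For someone posting tasks, the missing calculation runs in the other direction: start from a target hourly wage and the time a task takes, and work out what each task has to pay. The sketch below is one hypothetical way to do that, not a feature of Mechanical Turk or of the trk.network tool. The function name and the example numbers are assumptions.

```python
def fair_price_cents(target_hourly_wage_cents: int, seconds_per_task: int) -> int:
    """Smallest per-task price, in whole cents, that meets the target hourly wage."""
    if seconds_per_task <= 0:
        raise ValueError("seconds_per_task must be positive")
    # Integer ceiling division: any rounding goes in the worker's favor.
    return -(-target_hourly_wage_cents * seconds_per_task // 3600)


# Made-up example: a 2-minute task priced against an $11.00 hourly target.
price = fair_price_cents(target_hourly_wage_cents=1100, seconds_per_task=120)
print(f"Price per task: ${price / 100:.2f}")
```

The estimate is only as good as the time figure fed into it, so it is worth timing the task honestly, including the time spent reading instructions.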

As an AI artist and researcher, I believe that analyzing the entire AI pipeline, from data gathering and data set structure to data labeling and model training, as well as the structure and creation of algorithms, is important in confronting harmful AI and bias. All of these steps should be analyzed both separately and together when studying AI’s potentially negative impacts on society.

This work is part of a bigger research initiative of mine called Feminist Data Set. Feminist Data Set is a multi-year project that interrogates every step of the AI process: data collection, data labeling, data training, selecting an algorithm to use, the algorithmic model, and then designing how the model is placed into a chatbot (and what the chatbot looks like). Every step exists to question and analyze the pipeline of creating with machine learning: Is each step feminist? Is it intersectional? Does each step have bias, and how can that bias be removed? Pedagogically, Feminist Data Set operates in a similar vein to Thomas Thwaites’s “Toaster Project,” a critical design project in which Thwaites builds a commercial toaster from scratch. Feminist Data Set, however, takes a critical and artistic view on software, particularly machine learning. What does it mean to thoughtfully make machine learning, to carefully consider every angle of making, iterating, and designing? Every step of this process needs to be thoroughly re-examined through a feminist lens.

Often the tools I need to make Feminist Data Set don’t exist. For example, what is a feminist data training platform? Mechanical Turk and CrowdFlower, which underpay their workers, are not intersectionally feminist, so I can’t use them.

A free and open-source training tool

This tool can be used to label and train image and text datasets for machine learning. It can be used by gig workers themselves, but also by artists and researchers who would prefer an open-source alternative to Amazon’s Mechanical Turk.

As a creator, artist, and designer, I care a lot about the structure and intention of my own tools: who makes them, how they were made, and what power is associated with the tool I’m using. How does the maintainer or company behind this tool interact with society writ large? For me, using open source when I can is as much a political statement as a personal one. Open source isn’t perfect, and it isn’t a panacea when it comes to technology inflicting harm on society, but a focus on better-designed, more accessible, and more usable open source tools can help provide alternatives to for-profit tools maintained by companies with questionable values. As a designer, I believe that what I make should be useful and usable for others, not just a tool for myself. Especially as a designer who works in open source, when I make something, it’s important that it’s easy for anyone to use, regardless of technical background. This was my inspiration for making a browser-based tool that anyone could use to train and label data sets. The point here isn’t to just offer up a codebase and hope that it’s a solution. With usable interfaces and good design, anyone now has a browser-based, ready-to-go tool for data labeling and training. Design is as political as code and policy: it helps make things useful for all users, not just engineers.

Inequality in the AI space is an urgent issue that needs addressing as AI technologies become more and more common. And it’s work that’s especially relevant now, when millions of unemployed and quarantined individuals are turning to remote gig work.