This is a profile of Reward Reports, a Mozilla Technology Fund awardee.

Anticipating how Artificial Intelligence (AI) systems will behave is a challenge for many designers. We are seeing generative AI applications like chatbots do things their designers didn’t anticipate or account for: Microsoft’s Tay, a Twitter chatbot meant to hold conversations with people, began posting racist and offensive comments within hours of its launch.

This is an issue that better documentation of AI systems could help address.

Reward Reports, a project by Graduates for Engaged and Extended Scholarship in Computer Science and Engineering (GEESE), is a documentation framework for tracking an AI system's behavior over time. Instead of looking only at the technical dimensions of AI, the project examines how these systems affect human contexts through repeated interactions.

“Rather than just showing any particular time step, [like] what an algorithm has done or why it has shown you something on social media, Reward Reports will track something like how the user's mental health is changing when they're on social media for hours a day. How traffic is changing as self-driving cars become more common. How electricity usage is changing as parts of the electrical grid are automated using machine learning. Those sorts of system-wide behavioral shifts are the effects that we're tracking,” said Thomas Krendl Gilbert, Project Lead for Reward Reports.

Currently, AI system documentation consists of tracking the particular technical components of the design. This information includes details on algorithms, models, data sources, and other technical details that determine how the AI system operates. The documentation of self-driving cars, for example, would include testing how the car's hardware and software are running: are the sensors, cameras, and GPS working as expected?
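To make the contrast concrete, a static record of this kind might resemble a model-card-style snapshot. The sketch below is purely hypothetical and not drawn from any particular documentation standard; the field names and values are illustrative only.

```python
# Hypothetical, static documentation for a self-driving perception model:
# a one-time snapshot of how the model was built and evaluated.
# All names and numbers below are illustrative, not from a real system.
static_documentation = {
    "model": "perception-net",                     # hypothetical model name
    "algorithm": "convolutional neural network",
    "data_sources": ["camera frames", "lidar point clouds", "GPS traces"],
    "evaluation": {
        "object_detection_accuracy": 0.94,         # measured once, at release time
        "sensor_fusion_latency_ms": 35,
    },
    # Nothing here captures how the deployed system behaves over time,
    # or how it interacts with drivers, pedestrians, and other systems.
}
```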

“The current paradigm of documentation has two limitations. One is that documentation is currently static. In other words, it tracks just how a model was trained, and how accurate or inaccurate that model is at representing specific features. That’s all it does,” Gilbert said.

This is a problem, as it says nothing about the performance or behavior of the model when integrated into larger systems.

“The second [limitation] is related to that. There is a difference between documenting a model and documenting the entire system. AI systems are comprised of many different components: the model, the data, the algorithm, implicitly the users, and all sorts of other things. And it's often the relationships between these components that are not documented, but are also the source of the most pernicious types of effects and harms,” Gilbert said.

This is what Reward Reports seeks to address: empowering designers and users to document the entire system over time, so that the feedback loops between users, between system components, and between the company and regulators can be understood and made legible.

How a user interacts with Reward Reports depends on the role they play. A designer would regularly update the reports by tracking what the system was designed to do and what it is optimizing for, mapping that, and then recording how the system actually ended up behaving. If that behavior is surprising, the designer can update the definition of “reward” or change how the system is optimized, and then see what happens next.

“If you're a [chatbot] user and it does something weird, or it spits out something odd, you might document that by going onto some kind of Reward Reports interface and putting in your experience. Later, a range of user feedback is aggregated and distilled into the next updated report. These reports will be issued regularly, in order to see exactly how this behavior manifests over time,” Gilbert said.
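The project does not prescribe a single schema in this profile, but as a rough, hypothetical sketch, one entry in a periodically issued report might combine the designer's tracking with aggregated user feedback along these lines (the class and field names are invented for illustration):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RewardReportEntry:
    """One hypothetical entry in a periodically issued Reward Report."""
    report_period: str                        # e.g. "2023-Q2"
    intended_objective: str                   # what the system was designed to optimize
    observed_behavior: str                    # how the system actually behaved
    user_feedback: List[str] = field(default_factory=list)   # aggregated user submissions
    reward_definition_updated: bool = False   # did designers change the reward after review?

def issue_next_report(previous: RewardReportEntry, new_feedback: List[str],
                      period: str, observed: str) -> RewardReportEntry:
    """Fold newly collected user feedback into the next report in the series."""
    return RewardReportEntry(
        report_period=period,
        intended_objective=previous.intended_objective,
        observed_behavior=observed,
        user_feedback=previous.user_feedback + new_feedback,
    )
```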

Procedures for documenting human behavior during interactions with AI include using biometrics (which come with a range of questions and challenges, such as privacy), conducting ethnographies of a social media platform, or interviewing users and soliciting self-reported feedback about how they think and feel when they log on or off every day.

The idea for Reward Reports crystallized when Gilbert and his collaborators wrote a whitepaper on the types of risks coming out of Reinforcement Learning (RL). Reinforcement Learning is a subfield of Machine Learning (ML) where “an agent tries to figure out the best actions to take in order to achieve a specific goal, by learning from the consequences of its actions. It's like a trial-and-error process, where the agent tries different actions and learns from the rewards or penalties it receives for each action,” Gilbert explained. RL is used in a wide range of applications, including game-playing systems, apps, and recommendation systems.
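To make that trial-and-error loop concrete, here is a minimal, self-contained sketch of a very simple reinforcement-learning agent (an epsilon-greedy bandit). It only illustrates the general idea Gilbert describes; it is not part of Reward Reports, and the action names and payoffs are made up.

```python
import random

# A minimal trial-and-error loop: an epsilon-greedy agent chooses between two
# actions and updates its value estimates from the rewards it receives.
true_rewards = {"action_a": 0.3, "action_b": 0.7}      # hidden payoff probabilities
estimates = {action: 0.0 for action in true_rewards}   # the agent's learned estimates
counts = {action: 0 for action in true_rewards}
epsilon = 0.1                                           # how often to explore at random

for step in range(1000):
    # Explore occasionally; otherwise exploit the current best estimate.
    if random.random() < epsilon:
        action = random.choice(list(true_rewards))
    else:
        action = max(estimates, key=estimates.get)

    # The environment returns a reward (1) or a penalty (0) for the chosen action.
    reward = 1 if random.random() < true_rewards[action] else 0

    # Learn from the consequence: update the running average for that action.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # the estimates should approach the hidden payoff probabilities
```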

“People who work in AI safety have been really paying attention to RL for some time, but we didn't see sufficient work being done on risks in different human domains. That really became the seed for this larger idea, which is, you can think of any AI system, any machine learning system, whether or not it is RL specifically, as in effect, reinforcing people, reinforcing contexts, and reinforcing domains,” said Gilbert.

He became interested in how AI affects human beings in 2016, when he worked on an interdisciplinary doctorate in Machine Ethics and Epistemology. Gilbert drew on the interdisciplinary and humanist training he had already received through his background in philosophy and political theory. This synergy between social science and computer science training is similar to what Mozilla’s Responsible Computing Challenge seeks to achieve.

“I was in grad school originally for training in social science, and applying that to the world of AI. So I was inherently motivated to bridge different types of understanding and different types of human contexts and semantics for thinking about what AI could be,” said Gilbert.

No other framework currently exists for what Reward Reports is trying to do, which makes the undertaking challenging, to say the least.

“It's very ambitious, what we're trying to do. We're making the case to document AI differently. We're building industry collaborations and partnerships, at the same time that we're creating tools that permit a greater democratization of what we think AI could be, and how it could be written and talked about. We have to have many different kinds of balls in the air simultaneously,” Gilbert said.

The Mozilla Technology Fund (MTF) supports open-source technologists whose work furthers promising approaches to solving pressing internet health issues. The 2023 MTF cohort will focus on an emerging, under-resourced area of tech with a real opportunity for impact: auditing tools for AI systems.

