These eight projects — all focusing on auditing tools for AI systems — comprise the second-ever Mozilla Technology Fund (MTF) cohort


(FRIDAY, MARCH 3, 2023) — About one year ago, we welcomed our inaugural Mozilla Technology Fund cohort, focused on reducing the bias in and increasing the transparency of artificial intelligence (AI) systems. We funded research labs, art projects, and other awardees that did everything from measuring bias in voice assistants to exposing the inner workings of social media recommendation engines.

Building on that momentum, we’re excited to announce the 2023 Mozilla Technology Fund cohort. These awardees will focus on an emerging, under-resourced area of tech with a real opportunity for impact: auditing tools for AI systems. Auditing processes help build accountability for the AI systems that play an increasingly important role in our daily lives.

The 2023 cohort includes eight open-source projects, each receiving up to $50,000 to build tools and provide support to AI auditors. These projects will help fuel the growing AI transparency space, and also contribute to the learnings and community of the Open Source Audit Tooling (OAT) Initiative, a project developed and led by Mozilla Fellow Deb Raji.

Says Mehan Jayasuriya, Senior Program Officer at Mozilla: “We are excited to have this group of open-source technologists joining us. Our goal is to help these teams with the resources necessary to unlock their full potential and make their projects sustainable in the long run.”


Read about the projects:

Reward Reports | U.S. | by Graduates for Engaged and Extended Scholarship in Computer Science and Engineering (GEESE)

Current AI documentation tools can account for model outputs, but not for the outcomes a system generates over time. For the past year, the team at Graduates for Engaged and Extended Scholarship in Computer Science and Engineering (GEESE) has been developing Reward Reports to address this gap. Reward Reports is an audit tool for comparing a system’s observed behaviors with the assumptions and expectations of its designers.

Inspired by reinforcement learning, Reward Reports interprets the sequential decisions that guide system optimization and the distinct types of feedback that make that optimization possible. As such, Reward Reports helps designers audit their own assumptions and biases about the terms of deployment and the domain in which the AI system operates.


Evaluation Harness | U.S. | by Big Science

Evaluation Harness is an open-source tool for evaluating large language models. The framework defines a flexible API for implementing both models and evaluations, and handles all of the work of orchestrating evaluations internally. Historically, the team has used existing datasets and proposed prompts for zero-shot and few-shot evaluation of language models. Their current focus is to provide additional datasets with human judgements for summarization in multiple languages, allowing richer evaluation of multilingual language models.
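To give a flavor of how such a framework is used, below is a minimal sketch based on the EleutherAI lm-evaluation-harness codebase this project builds on; the model adapter name, task identifiers, and function signature are assumptions and may differ between releases of the framework.

    # Minimal sketch (assumed API): zero-shot evaluation of a small causal
    # language model with lm-evaluation-harness. Adapter and task names are
    # assumptions and may vary between versions of the framework.
    from lm_eval import evaluator

    results = evaluator.simple_evaluate(
        model="hf-causal",                      # Hugging Face causal-LM adapter (assumed name)
        model_args="pretrained=gpt2",           # any Hugging Face checkpoint identifier
        tasks=["lambada_openai", "hellaswag"],  # example benchmark tasks (assumed identifiers)
        num_fewshot=0,                          # zero-shot evaluation
    )

    # Each task maps to a dictionary of metrics (accuracy, perplexity, etc.).
    for task, metrics in results["results"].items():
        print(task, metrics)

In the same spirit, the new multilingual summarization datasets could then be exposed as additional tasks, letting auditors compare models across languages within the same workflow.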


AI Risk Checklists | by Responsible AI Collaborative

A common step across all risk audit processes is risk identification. Once risks are identified, auditors determine their probability and impacts. Several firms dedicated to auditing AI have built their own risk checklisting artifacts, but these commercial checklists are considered proprietary and confidential. In short, the core processes and technologies behind AI audits are beginning to close off. The Responsible AI Collaborative believes far greater social impact can be achieved if risk checklisting is shared more broadly by both audit practitioners and the general public. Their plan for enabling open-source AI risk checklisting is to build on top of the code, data, and metadata of their collaborative open-source project, the AI Incident Database.


CrossOver | Finland | by Check First

CrossOver simulates social media and big-platform users to collect data about recommended content, then compares it to data provided by the platforms’ own APIs where available. Its monitoring devices are hosted by volunteers at residential addresses. Monitored platforms include Twitter, YouTube, Facebook pages and groups, Google search predictions, Google News, Reddit, Odysee, and Mastodon.

Building on their pilot project in Belgium, the team will add 14 monitoring devices across seven French-speaking countries around the world: Canada (Québec), the Democratic Republic of the Congo, France, Mali, Morocco, Senegal, and Switzerland (Romandie). With the help of partners from the investigative media sector, they will initially focus on Russian influence, the energy crisis and costs, inflation, the war in Ukraine, and the COVID pandemic and health issues.


AI Forensics | France | by Algorithms Exposed

AI Forensics is building a free-software toolkit to make investigations of TikTok’s and YouTube’s recommendation engines accessible to a wider and more diverse community of researchers.

The team has been pioneering methodologies to investigate social media recommender systems for more than seven years and has developed various tools to support this research. Although most of these tools are already released as free software, the codebase is difficult to work with and is not well tested or documented. The goal of this project is to make the tools more accessible, in order to increase the audit capacity of the research community and hold platforms more accountable to their users and to the law.

In particular, they will promote their tools to communities in the Global South, who are often left out of research on algorithmic biases despite being most directly impacted.


Zeno | U.S. | by Carnegie Mellon University

Zeno is a framework for evaluating and auditing AI models. It combines an extensible Python API with an interactive UI to empower anyone to explore how complex AI systems behave.


Countering Tenant Screening | U.S. | by Wonyoung So

Countering Tenant Screening Using Tenant Screening Data aims to develop a crowdsourcing tool and/or campaign to audit the tenant screening services used by property owners. It seeks to reveal patterns in these services’ underlying algorithms, data structures, and representations by collecting tenant screening reports as well as denied renters’ experiences.


Gigbox | U.S. | by The Workers’ Algorithm Observatory (WAO)

The Workers’ Algorithm Observatory (WAO) is a crowdsourced auditing collaboration for investigating black-box systems in the platform economy. Investigations of platform economy algorithms, such as those used by Uber and DoorDash, face major challenges in accessing the data necessary for a meaningful audit. To address this, the team behind WAO is developing an algorithm observatory where workers and allies can audit the black-box algorithms behind these platforms through crowdsourced data collection among peers.


Press contact: Kevin Zawacki | [email protected]