How to Make AI Audits Societal, Not Just Technical

Lessons from the Data Nutrition Project’s 2023 convening

By Kasia Chmielinski, Sarah Newman, and Matt Taylor

Increased reliance on AI and ML systems brings with it increased scrutiny of how they work. Yet many auditing proposals are largely technical and lack social, qualitative, and domain-specific context, so a model may score differently in a technical review than it performs in the wild. There is growing awareness that algorithmic harms are sociotechnical in nature. In other words, a system's societal context (and its actual versus intended use) combines with its technical capacity to shape its impacts.

What might audit practices that are both societal and technical in nature look like? What are the major challenges and opportunities faced by the diverse and rapidly growing auditing community? To answer these questions, the Data Nutrition Project, with the support of the Mozilla Data Futures Lab, designed and hosted a multistakeholder convening to build community and to surface best practices and current challenges around the sociotechnical auditing of algorithmic and other statistical systems.

The virtual convening took place over two days in September 2023 and included over thirty practitioners with representation from academia, industry, civil society, and foundations. This cross-sector mix is unusual in the siloed tech ecosystem, and it enabled conversations that included respectful agreement and disagreement, which we feel is critical to surfacing real issues and opportunities in the space.

The first day was designed to build community, spark ideas, and surface known challenges and existing interventions in sociotechnical auditing. Some of the major challenges surfaced in the group whiteboard session included:

Representation issues

Under-representation of specific communities in the audit process (developing economies, non-English speakers, historically under-represented races and ethnicities, gender representation)
Audits and evaluations do not highlight all experiences equally (deployers, users, downstream stakeholders)

Audit standards

The push for a one-size-fits-all audit is unnecessary or unrealistic
There is currently no consensus or clear regulatory mandate on what an audit or evaluation ought to consist of, so what audits become will be shaped by the audits people conduct and by public contestation over whether they are adequate. This is an opportunity to expand what an audit can and should be.

Contents of an audit

We don’t need to start from scratch; we can leverage existing mechanisms, artifacts, or shared components
Inflexible auditing practices leave no room to iteratively audit the features and values that vulnerable users care about as those emerge (with users themselves identifying the values)
Lack of red-teaming best practices at scale; adversarial testing norms would allow for more transparency and a greater ability to audit system design and mitigations
Audits need to cover multiple levels of analysis, including HCI and societal/structural factors that co-determine whether an AI system in a particular context is safe

Output and utility of audits

Audits and documentation can help during development, not just after the fact. Artifacts produced throughout development, which record the decisions and processes followed, can be reviewed to inform launch decisions

Incentives for audits

There is currently no external regulatory framework in place that calls for audits. This leaves it to the organizations building AI systems to decide whether to audit their systems internally, an added cost that not all organizations choose to take on.
Without significant transparency, the public does not trust organizations to audit themselves, which prevents internal audits from seeming like a robust solution.

On day two, we focused on three areas that emerged from the previous day's work, broadly around mechanisms and workflows, potential collaborations, and designing audits for impact. Some of the highlights are described below:

What mechanisms & development workflows exist that we can apply to audit frameworks? How do we use these to build audit frameworks that scale?

audit decomposition along the AI lifecycle,
workflow integration,
use of datasheets / Dataset Nutrition Labels (DNL) / model cards / common AI components

Which collaborative efforts are necessary to support holistic socio-technical audits? How do we enable this sort of collaboration?

regulator + industry + civil society + communities,
harms registries,
third party bodies,
regulatory approaches to supporting collaboration

What happens during and after the audit? How do we design audits, in process and output, that are usable and meaningful (i.e., that motivate behavior change)?

audit decomposition along the AI lifecycle,
domain specificity,
user experience,
transparency of audit results (or existence of audit)

We are grateful to the dedicated participants whose expertise and insights made our recent convening a success. The valuable lessons gleaned from this gathering will play a pivotal role in shaping the next phase of our work at the Data Nutrition Project. As we investigate dataset documentation validation with the aid of subject matter expertise, we invite you to stay connected and learn more about our ongoing endeavors at datanutrition.org. Feel free to reach out – the conversation continues, and we welcome your involvement!