Over the last few years, Mozilla has increasingly turned its attention to the question of ‘how do we build more trustworthy AI?’ Data is at the core of this question. Who has our data? What are they using it for? Do they have my interests in mind, or only their own? Do I trust them?

We decided earlier this year that ‘better data stewardship’ should be one of the three big areas of focus for our trustworthy AI work.

One part of this focus is supporting the growing field of people working on data trusts, data cooperatives and other efforts to build trust and shift power dynamics around data. In partnership with Luminate and Siegel, we launched the Mozilla Data Futures Lab in March as a way to drive this part of the work.

At the same time, we have started to ask ourselves: how might Mozilla itself explore and use some of these new models of responsible data governance? We have long championed more responsible use of data, with everything we do living up to the Mozilla Lean Data Practices. The question is: are there ways we can go further? Can we engage more actively with people around their data in ways that build trust -- and that help the tech industry shift the way it thinks about and uses data?

This post includes some early learning on these questions. The TLDR: (1) the list of possible experiments is compelling -- and vast; and (2) we should start small, looking at how emerging data governance models might apply to areas where we already use data in our products and programs.

Digging into more detail: we started looking at these questions in 2020 by asking two leading experts -- Sarah Gold from Projects by IF and Sean McDonald from Digital Public -- to generate hypothetical scenarios where Mozilla deployed radically new approaches to data governance. These scenarios included three big ideas:

  • Collective Rights Representation: Mozilla could represent the data rights of citizens collectively, effectively forming a ‘data union’. This could include negotiating better terms of service or product improvements, or enforcing rights held under regimes like GDPR or CCPA.
  • Data Donation Trust: As Mozilla projects like Rally, Regrets Reporter and Common Voice demonstrate, there can be great power in citizens coming together to donate and aggregate their data. We could take these platforms further by creating a data trust or coop to actively steward and create collective value from this data over time.
  • Consent Management via a Privacy Assistant: A digital assistant powered by a data trust could mediate between citizens and tech companies, handling real-time ‘negotiations’ about how their data is used. This would give users more control -- and ultimately more leverage -- over how individuals and companies manage data.

Other scenarios included Mozilla acting as a consumer steward, building an advocacy infrastructure platform, or managing an industry association. Sarah and Sean have each written up their work and shared it in these blog posts: Bringing better data stewardship to life; and A Couch-to-5K Plan for Digital Governance.

This reflective process was at once exciting and sobering. The ideas are compelling -- and include things we might do one day (and that we’re even doing now in small ways). But, by their nature, they are without context, leadership or products. Reading these scenarios, we couldn’t see a clear path from a ‘big data governance idea’ to something real in the world.

As Sean pointed out in his post: “There isn’t ‘a’ way to design data governance - as a system or as a commercial offering. Beyond this point, the process relies a lot on context, and the unique value a person or organization brings to a process.”

For me, this was really the key ‘aha’ (even though it should have been obvious). We need to start from the places where we have data and context and leaders -- not from big ideas. With this in mind, Mozilla Foundation Data Lead Jackie Lu and Data Futures Lab Lead Champika Fernando have offered to take over this internal exploration, identifying practical ways Mozilla can improve how we collect and use data today.

They will begin this work later this year with a review of data governance practices and open questions within Mozilla Foundation, where our trustworthy AI work is housed. This will include a look at data-centric projects like Common Voice and YouTube Regrets Reporter, as well as programs like online campaigning and MozFest that rely heavily on the Foundation’s CRM. The work will explore questions like: what would it look like for Mozilla Foundation to more fully “walk the talk” when it comes to data stewardship? And what kinds of processes might we need to put in place so that our own organization’s use of data becomes a learning opportunity for how we shift power back to people and imagine new ways to act collectively through, and with, our data? They are starting to explore those questions and more.

In parallel, the Data Futures Lab and Mozilla’s Insights team will be working on a Legal and Policy Playbook for Builders outlining existing regulatory opportunities that can be leveraged for experimentation across the field in various jurisdictions. While the primary audience of this work is external, we will also look at whether there are ways to apply these practices to the Mozilla Foundation’s work internally.

Personally, I believe that new models of responsible data governance have huge potential to shift technology, society and our economy -- much like open source and the open web shifted things 20 years ago. I also think that the path to this shift will be driven by people who just start building things differently, inventing new models and shifting power as they go. I’m hoping that looking at new, more responsible ways to steward data every day will set Mozilla up to again play a significant role in this kind of innovation and change.

This is part one of a four-part series on how we approach data at Mozilla. Read the others here: Part 2; Part 3; Part 4.