Part III: Entrenchment
Will data stewardship concepts lead to more data-driven exploitation?
Power asymmetries persist and grow worse. Entrenched interests co-opt, outcompete, or crush alternative models.
There is a potential mismatch between the goals Mozilla articulated in commissioning this brief and its subsequent analysis of data stewardship. While Mozilla seeks to shift economic and social power, many of the data stewardship initiatives the landscape analysis cites have more modest goals: to make data available, to improve privacy, to afford individuals more situational agency over how data is used.
This serves to underscore that data stewardship is an insufficient frame to truly shift power. Worse, if data stewardship is deployed as an antidote to platform dominance and surveillance economies, it threatens to validate the exploitation of human lives and relationships — the true source of data — for financial gain or “value.” The idea that we can conduct this exploitation “responsibly” is fiction, and avoids the more important question of whether we should do it at all. There is nothing about search, social networks, or our online lives that requires our consent to data-driven exploitation. It is merely the price that has been set for us.
At a minimum, shifting power around data would require uprooting the targeted advertising model that much of big tech relies on, directly or indirectly. Doing so would also pose an existential threat to Mozilla’s current funding model, which relies heavily on commercial search engines.
This part reviews failure models that relate to entrenchment: how existing power asymmetries can co-opt, resist, and crush data stewardship initiatives.
Platforms and governments may choose not to engage with data stewardship initiatives, or may actively undermine those initiatives' efforts to force change. Data availability alone may not be enough to force accountability where none exists.
Story: Rideshare company plays hardball
Rideshare drivers in the UK use subject access requests (SARs) to build an alternative database of driver wage data, and hope to use that to negotiate better terms with a rideshare company on behalf of all drivers.
Borrowing from litigation and insurance company playbooks, the rideshare company begins taking measures to increase the effective cost of the SARs. Requests are routinely delayed or lost. Data is delivered via PDF or mail courier. Data formats change at random. Location data and ride data are hashed and deidentified, limiting the data eligible for disclosure.
In the meantime, the world has changed. Ridership since the pandemic has cratered. Some of that drop turns out to be permanent, as people work from home, go out less, and have less money to spend, while many of the businesses people would take a rideshare to — restaurants, bars, entertainment venues, clubs, retail — have closed. Many drivers start looking for new work, shrinking the pool of potential participants. Moreover, the data from before the pandemic are less useful, because they reflect an entirely different set of market conditions and social habits. With its financial losses mounting, the rideshare company decides to leave the UK rather than negotiate with drivers.
Story: City government ignores citizens and their data
A group of neighborhood commissions begin to adopt a software tool for citizens to report potholes, broken streetlights, and other non-emergency maintenance issues. They hope to pressure the city into taking a more proactive approach to maintenance in their neighborhoods. The city government ignores the software. Six months and one unsuccessful lawsuit later, more than 90% of the citizen-reported issues are still unfixed. Use of the tool begins to decline.
Many data stewardship models are new, untested, and relatively unknown. This leaves them vulnerable to co-opting by technology companies or other hostile actors. Even within a “successful” data stewardship initiative, big tech companies may seize a disproportionate share of the economic benefit.
Story: An urban data trust free-for-all
A technology company sponsors a data trust for a smart city. The default rules of the data trust make city sensor data publicly available, including from the technology company's competitors. The trust does not say whether software developed from those datasets must remain open. Because of its size, the technology company is better positioned to develop proprietary software based on the open data, and sells it back to the city.
Story: Profiting from school data analytics
A company manages an analytics platform and data collaborative for a dozen school districts across the American Southeast. The collaborative is designed to allow school districts and their partners to better understand long-term student outcomes. While most access to student data is governed by the collaborative's IRB, a loophole in the agreement allows the managing company to analyze the student data independently without going through the IRB, develop algorithms based on their analysis, and retain all of the intellectual property related to those algorithms. The company later sells itself for $350 million. The school districts get nothing.
Story: Infiltrating a driver co-op
A rideshare driver data co-operative allows anyone who has driven a minimum number of rides to join. The co-op allows members to vote on major decisions. A bloc of rideshare company employees pose as drivers to join the co-op, and begin pushing company-friendly policies.
Data stewardship models that entrust control to a fiduciary or other trusted party are vulnerable to abuse or poor judgment. If conducted on a platform, this abuse may be difficult to identify. Legal uncertainty about how fiduciary duties are applied to data may make it difficult to distinguish between negligence and (allowable) bad judgment.
This highlights a broader challenge: trust doesn’t scale. What works in a smaller community or network may not scale up to a national or global level.
Story: Intermediary fraud in benefits system
A national government develops a system to help citizens manage where their personal data is used. The system is integrated with a national identification and benefits system, and requires a mobile phone to use. People with low literacy or poor access to mobile phones must rely on intermediaries to access services on the system. An intermediary could be a (usually male) head of household, a neighbor, or even a complete stranger. Intermediary fraud becomes common, and difficult to uproot: because most information about the system is distributed via phone, victims are either unaware of issues or unable to report them and seek accountability.
Story: Contact-tracing workarounds
A privacy-preserving contact tracing app begins to see wide use in a country. Fearful of infections, businesses begin to require citizens to show app-based confirmation of "no exposure" before they are allowed to shop. A secondary market for phones and "clean" apps becomes rampant, undermining trust in the application.
Data stewardship initiatives are unlikely to reduce the large-scale exploitation of data. In part, this is because data is a nonrival good: one person’s use of data does not prevent anyone else’s use of it. This means that, unlike stewards of natural resources or other physical goods, data stewardship initiatives may not be able to maintain a monopoly on the data they protect. The practice of building “data doubles” — a model of a consumer based on indirect inferences about them — may continue unabated.
By attempting to build “guardrails” on the surveillance economy, data stewardship initiatives may actively make things worse. Allowing individuals to buy and sell access to their data warps privacy from a right into a commodity, and may enable new forms of exploitation.
We have succeeded in building an economy where it is normal to buy and sell information about how a person feels, about who they love and hate, about where they go, and what they hope and fear. Building a just world will require far more than haggling over the price.
Story: Data doubles, subprime data
A data co-op for individual consumer data is disrupted because it cannot maintain a monopoly on user data. Whether through phone applications, internet browsing history, subscription services, or financial transactions, data brokers have more than enough sources of information to target and market to users, and most decline to purchase access to individual data.
Ultimately, the only users who make money from the co-op are those who sell extremely sensitive data that is difficult to get otherwise, such as emotional and health data. These “subprime” data purchases are targeted towards people with low income and poor credit. In some cases, personal data is used as collateral against a payday loan. Ultimately, data is re-sold to tenants, employers, businesses, and governments.
The arguments in the introduction to this section are drawn from, among others: Nick Couldry and Ulises A. Mejias's "The Costs of Connection"; Lina Khan and David Pozen's paper on information fiduciaries and the related symposium on the Law and Political Economy Project (in particular Julie Cohen's contribution); and Ruha Benjamin's work on "Informed Refusal."
The urban data trust story in 3B is inspired by one of Sidewalk Labs' proposed "trusts" for Waterfront Toronto.
The school data analytics story in 3B is very loosely inspired by the sale of Flatiron Health.
The intermediary fraud story in 3C is inspired by Anita Gurumurthy, Deepti Bharthur, and Nandini Chami's research on exclusionary practices related to the Jandhan Aadhaar Mobile (JAM) payment platform in Rajasthan, India.
The data doubles argument and story in 3D are inspired by Kevin D. Haggerty and Richard V. Ericson's work, "The surveillant assemblage."
As often as not, data are stand-ins for people: for their desires, their anxieties, their livelihoods, their secrets. This has made it easy to frame the exploitation of people as a benign product feature: “we will identify depressed people and target them with advertising” becomes “we will analyze social media data for emotional affect.”
Data stewardship initiatives risk falling into a similar trap, as they attempt to productize trust, good governance, and stewardship. Initiatives may use the language of data stewardship for branding purposes (see, e.g., "data trust"), make promises that are easily broken or dodged ("we'll never sell your data"), or overstate their replicability and scalability.
In truth, nothing scales like exploitation. It's possible — even likely — that none of the data stewardship initiatives described in Mozilla’s research will scale, and that few success stories will ever be replicable. This isn't a sign that the initiatives are flukes, or that their underlying models are defective. Rather, human relationships — trust between people — can't be copied from one initiative to the next. And a good data stewardship model can only take an initiative so far. Regardless of their form, data initiatives must make difficult choices about whether to collect and keep data; about who can access data; and about how to use and analyze data. They must do the hard work to continuously earn and re-earn people's trust, as technology advances and grows more difficult to understand. And even then, it may not be enough to stave off failure, or prevent harm.
Asking how to shift power through data governance, then, is not enough. The challenge before us is much bigger: to remake our relationships with one another, with the digital communities we join, and with the leaders we trust. To successfully meet the moment requires far more than innovative models for managing data. Truly shifting power requires — demands — a policy and popular movement that rebuilds public understanding of data, rehabilitates digital communities, and redefines our digital economy from the ground up.