Getting started with data governance

This is the first in a series of posts that will cover the various design elements that underpin robust, participatory data governance models. In this first installment we will provide an introduction to the whats and what-nots of data governance. Future installments will delve into the nuts and bolts of the ‘how to’ of data governance.

Wouldn't it be great if we could all derive value from our data? And what if we all benefited from the systems we train on our data? What if it was not just up to those with the technical skills to decide how we use data? What if we could all have a say?

After more than a decade of news reports and academic investigations into data-related harms and automated decisions threatening our human rights, we are getting increasingly clear on what we do not want. No more platform dictatorships in which a few dictate the fates of many. No more machines making decisions about us based on faulty information or bad (if not racist) assumptions. No more data collection for the sole purpose of selling us things and manipulating our behaviors.

But what we do want remains far less clear. Faced with the vast power imbalances between those collecting data and those reflected in the data, the response has been a proliferation of proposals for alternative data stewardship models, ranging from data trusts and fiduciaries to data commons and cooperatives. What they have in common is a desire to hold those taking decisions about data accountable, to realign the interests of those stewarding our data with those of the people the data is about, to facilitate greater participation in decisions about data, and to share the benefits of that data more widely.

But while these concepts are sorely needed, getting from idea to implementation is far from straightforward. As it turns out, getting data governance right is hard and highly context-dependent. Governing financial data in the context of banking is a different thing from governing data pertaining to the milk production of cows, which is altogether different from the governance of health-related data by our medical practitioners. And in each of these contexts, the specific use of the data and the problem it attempts to solve further affect how it needs to be governed.

In the words of data governance expert Sean McDonald: “Data governance systems that ignore the politics and context of their data’s use are weaker and less effective and, ultimately, suffer.” Consequently, abstract data governance models only take us so far; the real challenge is in understanding the context in which they are to be implemented, as well as the needs and interests of all the various humans involved.

We can take this a step further and say that data governance is more about governance than about data. In many ways the surveillance enabled by the digitization of platforms and services has laid bare and amplified the inequities and power imbalances that pre-dated mass data collection. Digitization has allowed economic exploitation to play out at a larger scale. It allowed for a further decoupling of economic activities from local conditions, while externalizing the harms to those most vulnerable and least visible. But let’s be clear: subliminal manipulation of consumers pre-dates the internet, just as the struggle between workers and their employers pre-dates the gig economy.

Take Uber drivers as an example. Many battles are currently being fought by organisations like Worker Info Exchange (WIE) to get Uber drivers access to, and control over, the data the platform has collected about them. This is sorely needed to understand how automated decisions are made about drivers and to subsequently investigate the fairness and legality of those decisions. And in fact, WIE has so far succeeded in having automated firing decisions reversed in court. But what would better data governance look like in this case? Surely it would involve workers having a say in what data is collected about them. But it should also involve their input into how that data is subsequently used to decide about work performance or how they go about executing their job. This clearly moves us beyond the world of data into the realm of the governance of work itself. And the inherent issue there has less to do with data and far more to do with the power imbalance between Uber and its drivers. Unless we tackle these power dynamics (for instance by starting and funding driver co-ops), we end up forever on the defensive.

In other words, the renegotiation of our economic relationships and our struggle for (some semblance of) self-determination go beyond data. Governance questions related to the collection of data or its use might help us open the door to those other conversations, and they could even be an opportunity to challenge underlying inequities, but they are rarely themselves the conversation.

That’s bad news for anyone looking for a quick solution to a complex problem. And it gets worse. One of the core principles of good governance design is that those who are subjected to the rules have a say in formulating them: ‘nothing about us without us’. This principle stands opposed to most of the data governance we see today, in which a few technologists (or technocrats) make decisions that impact many without consulting or otherwise involving them. And of course, allowing for greater participation in our governance processes introduces friction — that’s kind of the point. Suddenly we find ourselves having to negotiate the often opposing needs and interests of many, while simultaneously ensuring their voices are heard. Anyone who has ever been to a town hall or trade union meeting can likely attest to the inherent messiness of these governance processes. Good governance also means we need to build in real accountability mechanisms to ensure that the agreements we reach are adhered to, and we are able to continuously trust the systems we help create. Not an easy task. Where corporate tech demands speed and efficiency, participatory governance processes require us to slow down and get our hands dirty.

You cannot (data) governance your way out of a bad plan

In the next few blog posts, we will look into the different components and design questions relevant to building robust and democratic data governance models. But before we do, there’s one more issue I would like to get out of the way: All the data governance in the world will not turn a bad idea into a good one.

Before you get into the nitty-gritty of designing the rules of the data game, you need to ask yourself the hardest question of all: should I even be doing this? You cannot governance your way out of a bad plan, and even the best-intentioned plans carry risks. This is especially true when your plan involves the collection and use of data. As we have witnessed time and time again, data collected for the best of reasons might still end up in the wrong hands or be used in ways that harm vulnerable populations. Therefore, before you get started, you need to answer these two crucial questions:

Should anyone do this?

Some ideas are net negative. Facial recognition is a great example of this. It’s a technology that has few positive impacts and lots of negative ones in the form of increased surveillance. It is for this reason that various governments now ban its use in public spaces.

What are the potential risks and harms of doing this?

You can answer this question with the help of a risks, harms, and benefits analysis that shows what the benefits of your plan are and who they accrue to, and gives you a clear sense of the risks and harms involved and who will bear them. UN Global Pulse created a two-step tool for exactly this purpose. In addition to your team conducting this assessment, we highly recommend instituting a so-called red team. This is a group of well-informed individuals and experts who are not involved in the development of your project and whose only role is to find flaws in your designs.

Finally, the goal of this exercise is not merely to determine whether the potential benefits outweigh the potential risks and harms. It is also a way to determine which risks and harms are simply unacceptable (and should motivate a change of plans), to understand the nature of the harms, and to work out how you might mitigate them.

Should we do this?

Even if this is a good idea, are you best placed to execute this vision? Do you have the resources to do so safely? Do you have enough buy-in from your community to do this well? For instance, if your plan depends on a high level of data encryption, but you do not have the resources to attract a skilled security engineer, you may want to rethink your plan!

Want to know more?

In the next installment, we will address the core components of a robust data governance system. We’ll look at decision-making, transparency around common agreements and how to build accountability mechanisms.

In the meantime, if you are struggling with these issues and want help, we’re here. We offer a workshop to help you get started with data governance! Want to know more? Let us know you’re interested by filling out this short form!
