A Research Year in Review
As part of our first phase of developing a set of commitments to data subjects, I spent time reading, researching, and connecting with people to understand the current state of data governance. My goal was to identify existing and proposed practices that could be adopted to enable trust in data sharing and that would be practical to implement. Jim Fruchterman, Steve Francis, and I decided to start from the perspective of a “pragmatic idealist,” looking for tools and practices that people would actually use (and actually be able to use).
I was fortunate to be working in community with Tech Matters’ broader network. OpenTEAM (which supported this research) convened throughout the year to talk about design and trust in Ag Tech. When you read the six reports, you’ll see that I visited themes from data in agriculture, such as transparency, fairness, portability, security, compliance, ownership/stakeholder access, public production & use of Ag Data, control, sovereignty, privacy, erasure, informed consent, benefit, and correction.
What I found, across the board – both in Ag Tech and beyond – is a challenging balance between orienting towards principles and orienting towards implementation. As a result, the Better Deal for Data proposes a set of commitments that define the minimum requirements for putting a basic set of principles into practice.
I looked at six key concepts that are both principles- and implementation-oriented, and tried to include as many summaries, resources, examples, and references as possible in both short- and long-form papers. I looked at research on emerging governance models and at initial research on the resilience of these models (see the Data Economy Lab). I found one key gap that confirms the BD4D’s problem statement (see our white paper) and approach: there is no lightweight, practical solution that builds trust in data sharing for data that cannot or should not be open.
What did I learn, and where do I see progress towards a Better Deal? The following is a brief roadmap to the reports and what I think shows promise and progress towards better uses of data.
1. Better data stewardship can facilitate participation and empowerment around data for public good
A starting point for BD4D research was to look at emerging methods for community-centric governance, or participatory models of engagement around data. I discovered that the term data stewardship is used in various ways to describe the processes and people involved in making decisions about data, and that its definitions differ widely. Some themes that came up in my reading:
- Going beyond matters of compliance
- Participatory and rights-preserving frameworks
- Control (over data)
- Long-term care and sustained ability to use data
2. There are tools for sharing and reusing data in the form of open licenses and data sharing agreements
As a next step, our team recognized that sharing data is (mostly) good; the more shared knowledge we have in the world, the more good can be done through efficient re-use. At the same time, we recognized the importance of protecting people’s privacy and ensuring that there are safeguards for how, when, why, and to whom data is being shared. Also: if agreements are too complicated, it becomes impractical or impossible for people to understand whether their rights are protected, and they are less likely to share at all.
So, I compared open data, which can be shared under freely available licenses, with non-open data, which requires an agreement between parties before sharing can occur. I looked at models of inspiration such as the Creative Commons licenses for copyright, the Open Database Licenses, and the requirement that Digital Public Goods use an approved open license.
For non-open data, I looked at some sample data sharing agreements, as well as principles such as the Five Safes, for when it is or is not appropriate to share. Here, three themes for sharing emerged across agreements:
- Keeping people safe
- Building trust
- Collaborating well
3. Data governance involves repairing trust and protecting rights around collections of data
There is a growing number of emerging mechanisms for data governance that respond to the loss of trust in the current data sharing status quo, which involves one-sided online contracts and Terms of Service that fail to enable informed consent for data use (see, e.g., Zuboff, 2019). Looking at the zeitgeist of data governance, Marcucci et al.’s (2023) analysis shows that three themes emerge across a large set of governance documents. Data governance is meant to enable the following:
- Trust
- Protecting citizen and user rights
- Use of data for public interest
In thinking about sociotechnical systems, it is also important to recognize that data’s value lies not in an individual’s data points, but rather in a collection of data, which puts people in relationship with others in a dataset. For example, advertising inferences take an individual datapoint and situate that person within a group of people with certain characteristics. These population-level relations need to be considered in systems of data governance, where a focus on individual rights alone may fail to recognize them (Viljoen, 2021).
4. Easier ways to implement emerging and existing data governance models are needed
I explored several existing typologies of data governance models that define and compare data trusts, data collaboratives, and data collectives (among several others). Lots of people are exploring how to classify data governance and which models are emerging (I see similar reports, references, and typologies as I continue reading, even today). The goals and concerns that emerging models address vary, both in whether they are more theoretical or practical in nature and in what level of governance they address:
- When comparing data collaboratives to data trusts, the focus is on establishing who the decision-makers/negotiators are for data sharing and use, and where the accountability lies.
- When focusing on personal data stores and marketplaces, there is an assumption that most of the decision-making aspects of governance are retained by the individual user, and then the focus is on technical solutions for creating ways to interact and share that data (Solid, blockchain, etc.) as well as on security of the information.
- When you look at data sovereignty, it is about who decides who decides – empowering communities to create governance frameworks that work for their data and context.
It seems to me that many of these models solve different parts of the problem or address different levels of governance, and that they could be effectively combined. For example, an organization could create a data trust that enables Indigenous data sovereignty, with a data repository that lets individual users set certain permissions on the use of their data even while the trustee negotiates the general use of the repository. Each piece addresses a different aspect of the puzzle of data governance.
Even the complexity of the typologies themselves makes the question of implementation all the more challenging. What does it mean to have lightweight data governance? Are there models that are low cost? Low barrier to implementation? Data governance is needed, it is understood at a principled level, and there are checklists. But: it’s still hard to implement.
5. We need to do the work in context
I had the opportunity to explore context through two lenses: Data Feminism and Privacy. Putting these side by side, I found similarities that both approaches reinforced. First, data is context-dependent: any analysis, evaluation, or use is incomplete without an understanding of context. Second, it is critical to consider data use as part of context. Even given this, it does seem possible to develop an understanding of which parameters are relevant to contextual evaluation, à la Nissenbaum (2004, 2019), in order to create a framework for evaluating a set of principles against situations involving data. The goal of carefully developing a practical set of tools that allows us to evaluate data in context does not seem completely out of reach.
6. People want to understand how their data will be collected and used: Consent
But what does it mean to give consent, for example, to participating in or using a service? How is consent managed, both by institutions and via technologies? I had an opportunity to look at a variety of ethical and legal perspectives on consent, which (for the most part) involves asking an individual to agree to the use of their personal data. Definitions of informed consent involve autonomy or voluntariness, and knowledge or understanding. Arguably, most of the time there is no informed consent when a service or technology is used. How often do you read (and understand) a terms of service agreement before clicking through to a webpage or downloading an app?
In attempts to enable more informed consent, it is useful to consider approaches that rely on one or both of the following: (i) changing which technologies are used to collect and manage data and consent, or (ii) changing people’s decision-making power (governance).
In some ways, we could consider moving “beyond consent” into thinking about a social license to operate (Verhulst et al., 2023) – allowing for judgment calls and a level of trust in the decision-making that works at a community level.
7. Now What?
I hope you will join us in developing a Better Deal for Data: our next steps involve collecting use cases and building connections amongst the Coalition of the Willing (individuals and organizations endorsing the need for a Better Deal for Data). One of the primary goals of a Better Deal for Data is to identify a minimum set of commitments that would build immediate trust between data providers and data users or, perhaps, within a data ecosystem. One outcome would be that two parties could communicate more rapidly and efficiently about expectations to enable data sharing and use, while protecting data subjects’ rights and, where possible, providing compensation or spreading the economic value of data to all involved individuals. Most importantly, it would unlock the value of data reuse in the social sector, so that we can better understand and improve our human systems.
References
Marcucci, S., Alarcón, N. G., Verhulst, S. G., & Wüllhorst, E. (2023). Informing the Global Data Future: Benchmarking Data Governance Frameworks. Data & Policy, 5, e30.
https://doi.org/10.1017/dap.2023.24
Nissenbaum, H. (2004). Privacy as Contextual Integrity. Washington Law Review, 79(1), 119.
Nissenbaum, H. (2019). Contextual Integrity Up and Down the Data Food Chain. Theoretical Inquiries in Law, 20(1), 221–256.
https://doi.org/10.1515/til-2019-0008
Verhulst, S. G., Sandor, L., & Stamm, J. (2023). The Urgent Need to Reimagine Data Consent. Stanford Social Innovation Review.
https://ssir.org/articles/entry/the_urgent_need_to_reimagine_data_consent
Viljoen, S. (2021). A Relational Theory of Data Governance. The Yale Law Journal.
Zuboff, S. (2019). The Age of Surveillance Capitalism. PublicAffairs.
https://www.hachettebookgroup.com/titles/shoshana-zuboff/the-age-of-surveillance-capitalism/9781610395694/?lens=publicaffairs
