Amrita Nanda (Senior Manager, Aapti Institute) and Rattanmeek Kaur (Senior Research Analyst, Aapti Institute)

What are health data infrastructures?

The increased availability of high-quality health data and health-related data has created a growing impetus to develop data infrastructure for health care and research. Concerted efforts are being made to develop networks of interoperable and accessible health data repositories at national and global levels. So far, the health sector has been slow to use data for research and innovation; at the same time, no unified understanding of health data infrastructure and its components is discernible in theory or in the literature. In practice, however, it is being envisaged as an ecosystem of digital and non-digital systems, processes, and organisations that facilitate the capture, use and sharing of health-related data. These systems may range from consent forms and biosample storage units to FAIR APIs and wearable monitoring devices. Furthermore, health-related data repositories typically contain large-scale or population-level datasets on health information, which may range from biosamples or electronic health records to data from wearables or social media platforms. Health data infrastructures are considered critical not only for the furtherance of the scientific enterprise but also for “safe, better and more efficient health systems and healthier populations”.

Challenges presented by health data infrastructures

Health systems are intricately related to existing social, economic, and political structures. It is therefore imperative to consider the functioning and governance of health data repositories within these existing structures and contexts, since data infrastructures are superstructures built on top of them. Consequently, existing fault lines of social hierarchy, systemic inequality, and exclusion are manifested in data infrastructure. This is evident in bias and exclusion in healthcare and research, the disenfranchisement of vulnerable populations, and extractive business or research practices, among other things.

Given the bio-colonial legacy of healthcare and medical research, several further legal, social, economic, and fundamentally ethical challenges arise. Data colonialism, for instance, manifests in health data infrastructures in myriad forms. One of the most significant is digital dependence on technologically advanced countries, especially through digital platforms that further reify hegemonies and colonial power dynamics. This is further exacerbated where data is extracted without consent and used in ways that prioritise biased hierarchies or compound systemic disparities. Additionally, epistemic colonialism manifests in data infrastructures and health research through (i) the dominance of the English language and (ii) a heightened focus on the knowledge and contexts of developed countries. This form of colonial prioritisation results in the active suppression of local languages, cultures, knowledge, and ideologies. Furthermore, the research and advances built on these data infrastructures are so far removed from local populations’ contexts that they return no value to those populations, and may even lead to further discrimination.

These systemic issues reflect a failure to prioritise the needs and narratives of impacted communities when developing and using data infrastructures for health research. To mitigate this, it is key to adopt governance mechanisms that place community rights and interests at the heart of decision-making. Such mechanisms must embed community representation meaningfully, working to democratise decision-making power rather than merely to secure representation. In doing so, governance channels can collectivise control over data, so that the articulation and distribution of ‘value’ are directed by, and grounded in, community interests and lived realities.

Participatory data governance

Participatory data governance, in particular, presents a pathway to operationalise collective control over data. It entails integrating patients and/or impacted individuals into the decision-making process at each stage of the data and research lifecycle by creating an enabling environment for engagement. Given the crucial role that health data repositories play in health outcomes and the experience of health, a participatory approach is instrumental in achieving people-centric governance of data infrastructures. A participatory approach can also be adopted in law-making for data infrastructures. This allows legal frameworks to remain alive and responsive to the needs of impacted and marginalised communities, and helps ensure downstream benefits for them.

It is important to think more critically about how the multifaceted experiences of impacted communities should shape the functioning and governance of data infrastructure. Crucially, experiences of healthcare and of interaction with health-related technology are heavily influenced by the social and economic identities of different groups. Structurally marginalised groups and people with intersectional identities are affected in granular, complex and unique ways, and decision-making must reflect these experiences. Data activities have real, direct, and long-term impacts on various communities, and these impacts often fall disproportionately on marginalised and vulnerable communities. The meaningful and sustained representation of diverse and intersectional groups in decision-making across data activities helps mitigate harms, while also ensuring that data activities are just and reflective of community preferences.

Challenges to adopting participatory data governance

While participatory data governance helps mitigate the challenges and risks presented by the use of health data, adopting and implementing participation in data governance brings challenges of its own. At present, there are few examples of participatory data governance in health data infrastructure, and the literature dealing with its challenges is even more limited. This piece therefore attempts to bring forth some of the challenges we have observed so far while building out these practices:

  1. Need for a structural shift in the understanding of participation from informed consent to collective control.
  2. Lack of cultural specificity and understanding of social norms in contemporary data governance models.
  3. Epistemic, linguistic, access, and resource barriers to adequate representation of diverse and intersectional identities.
  4. Historical experiences of violated trust around health data.

First and foremost, it is important to highlight a broader issue: the need for a structural shift within the health data governance and data ethics discourse. So far, participation in health data governance has been limited to consent, which has evolved into granular and dynamic forms to give patients and data contributors more autonomy over their data. Nevertheless, individual consent still fails to safeguard patients, and communities at large, from the other systemic harms and ethical challenges highlighted above. Health data governance discourse therefore needs to centre on participatory approaches and to reimagine governance structures accordingly. This means that participatory data governance must be a foundational requirement, and the integration of participatory mechanisms must be envisaged at the planning and design stage itself.

While building out participatory mechanisms, one of the most significant challenges is a lack of understanding of social norms and cultural nuances. For a databank to truly embed itself within the community it aims to work with, it is essential to understand the social fabric of that community and build participatory mechanisms accordingly. More often than not, these norms are not easily understood and are difficult to translate into practical participatory methods and data governance standards. This can be observed in the case of the CARE principles developed for the governance of Indigenous health data, which are based on the social and cultural norms of Indigenous populations. However, there is still a long way to go in developing practical measures to embed these principles into other data governance standards and to actually engage Indigenous populations in decision-making processes.

Another significant challenge is ensuring that all social identities are represented in participatory mechanisms and that engagement with the community does not result in further marginalisation. This challenge is even more consequential where large and heterogeneous populations are involved. There are several barriers to enabling diverse identities to participate in decision-making processes, which may be resource-related, epistemic, cultural, or related to accessibility. For example, some groups may be unable to participate meaningfully because of linguistic barriers or researchers’ use of jargon and technical language. Other examples include the lack of adequate compensation for participation, or geographical inaccessibility. Such barriers pose significant day-to-day challenges for researchers seeking to actively involve communities in data governance. This points to a further challenge faced by researchers: resource limitations and dependencies, which may leave them without the resources required to overcome these barriers, such as providing adequate compensation for participation or producing multilingual materials.

Additionally, the critical role played by civic trust in building participatory mechanisms is often overlooked. While the importance of participation for building trustworthy data infrastructure is well recognised, it is crucial to understand that patient trust and participation go hand in hand. It is therefore essential to involve communities from the beginning, to understand their needs and anticipate risks. This becomes even more critical when engaging with historically marginalised communities, given their lack of trust in healthcare systems. The All of Us study illustrates the importance of building trust with, and integrating, marginalised communities in research. Its researchers found that it is important not only to engage with community leaders but also to acknowledge past abuses and inequality through transparent conversations in order to build trust. They also worked intensively with trusted community members to understand the community’s needs and concerns and how best to communicate with them.

Towards good practice in health data governance

While much remains to be done in theory and practice to tackle these challenges, there are numerous promising instances of participatory methods worth exploring. One of the leading examples of a health databank using participatory governance mechanisms is the Native BioData Consortium, which is led by Indigenous scientists and tribal community members. The databank demonstrates the importance of building trust and accountability with the impacted community by thoroughly integrating its members into decision-making bodies. Tribal members and leaders decide on several key questions, such as the research agenda and community benefit, culturally consistent research practices, and the structure and function of the databank.

With respect to large-scale health infrastructure, a significant example is BBMRI-ERIC, a consortium of European biobanks, whose stakeholder forum includes patient representation organisations from across Europe. The forum facilitates bidirectional discourse between biobanks and their stakeholders on key issues such as informed consent in health research, health research priorities, and the secondary use of healthcare data. Similarly, large-scale health data repositories such as All of Us have embedded participatory mechanisms across different stages and bodies of governance: tribal advisories, participant partners, and diversity in the Institutional Review Board.

Ultimately, there is still a long way to go in finding pragmatic pathways and building scalable solutions for embedding sustainable participatory practices in the governance of health data. There is a need to build open, transparent, and accountable data repositories that keep patients and communities at the heart of their decision-making. It must also be understood that there is no ‘one size fits all’: solutions must be oriented to each community, context, and region. Thus, it is important to focus on 1) building representative participatory mechanisms, 2) transparent functioning of the databank and honest communication with not only stakeholders but also rightsholders, and 3) processes that layer accountability and place a greater obligation on databanks and infrastructural actors to justify their actions.

Part of the SLSA Blog Series, Exploring the Intersections of Technology, Health, and Law, guest edited by Prof. Sharifah Sekalala and Yureshya Perera. Written as part of the project There is No App for This! Regulating the Migration of Health Data in Africa, funded by the Wellcome Trust (grant number: 224856/Z/21/Z).