Getting our Data Ducks in a Row

At SIMLab, we’re intent on examining the challenges of setting up communications technology projects in international aid and development from a practitioner’s perspective. The first few projects have arisen from our work with clients and Frontline users, many of whom are asking similar questions. For example, we’ll be looking at how to isolate and effectively monitor the impact of the use of technology on an intervention and on an organization’s functioning—and we’ll have more to share on that in the coming weeks.

We’re puzzling our way through another of those projects at this week’s Ethics of Data conference at Stanford in Palo Alto, California. Clients and partners are asking how they can safeguard themselves, their staff and their end clients or ‘beneficiaries’, as their food distribution or HIV support projects increasingly involve gathering and storing potentially sensitive, sometimes tightly regulated information, in environments and systems where their legal and ethical positions as data collectors are not always clear or stable. This year’s Open Knowledge Festival saw packed sessions on mitigating data risks for project participants and implementers, and we’ll also be at the Responsible Data event in Budapest next month. As often happens, there’s a flurry of interest in this issue right now.

The Ethics of Data conference has shared a wealth of material, from the excellent Provocation paper and other suggested reading to Lucy Bernholz’ very helpful blog posts ahead of the event. Rather than repeat the Provocation’s wonderful analysis of the issues, which I thoroughly recommend, I’ll just contribute a couple of examples from our practice and refer you to the hashtag, for now.

One of our partners recently shared one very live problem, one which predates digital data, but which evolving understandings of the ethics of data acquisition, storage and manipulation, together with the demands of an information-rich global society, have brought to the fore. Child sponsorship is a controversial but still relatively common fundraising mechanism, allowing long-term relationships between communities and NGOs doing slow development work over many years. In this model, each sponsor supports one child, receiving updates, letters and photographs. How involved the sponsored child or family is, how broadly the sponsorship funds are applied across communities, and how far the sponsor is informed about how the funds are spent vary between organizations; you can read thoughtful posts from aid specialists and comments by sponsors reflecting on the model here and here. I don’t want to debate the model here, but instead to highlight the issues it raises for the ethics of data.

By its very nature, this model requires collecting data about certain children and families and repackaging it or sending it directly to people elsewhere in the world. As some organizations work in small villages and settlements, supporting whole communities but with perhaps only a few sponsored children in them, spread across different ages and genders, it is possible for those families to be identified if the data is shared beyond the respective sponsors. In the case of our partner, parents and guardians review and give (or refuse) consent to the types of information about their child that will be shared with that child’s individual sponsor.

In this example and many others, how is truly ‘informed’ consent to be obtained, when the person giving consent has a completely different level of digital connectivity from the person obtaining it? How can you accurately describe what you’ll be doing with that person’s data, in a way that’s completely clear to someone who isn’t online, and doesn’t understand ‘cloud storage’ or ‘data visualization’? How do you explain effectively that sharing or not sharing more detailed information does not affect a child’s sponsorship benefit, when child sponsorship is as individualized as development funding can be? When that organization examines how it uses those datasets more broadly, for learning and research beyond program improvement, the questions become more tangled: anonymizing data is variably successful; those one shares data with might make decisions acting on inadvertently biased data; and, as our partner says, ‘how do we empower people to make these decisions for themselves, instead of assuming the position of deciding for them or patronizing them in our efforts to protect (them)?’ Additionally, there are potential legal issues which may not have been confronted before, some of which are discussed below.

Some of the issues around informed consent are also of concern to researchers seeking to conduct research in a respectful way. MIT and Tufts are leading an initiative to explore the notion of ‘lean research’, beginning with a convening last month in Boston, which you can read about here. This is a challenge that goes beyond aid and development practice, a notion mentioned in the Provocation paper. Don’t we need good guidance on ethical dealings that applies across sectors, which can then be adapted for specific needs? For example, SIMLab might focus on applying and considering such principles for cases where the data belongs to someone who is interacting with a digital system through SMS, through another person, or through another channel such as radio, but is not themselves connected.

Regulation and liability

FrontlineSMS, in addition to being free and open source, has often attracted users simply because the data collected is hosted locally. As this can also be a liability, its web-hosted sister service, FrontlineCloud, has proven helpful for organizations wanting safe back-ups and global access to their data. But FrontlineCloud’s data is currently stored on Amazon Web Services servers, which are located in the US.

This raises, for example, the following hypothetical jurisdictional conundrum: with which country’s law should a UK-registered charity undertaking a health project in Kenya, and managing patient data via FrontlineCloud, comply? Multiple jurisdictions are involved here: Kenya, which regulates against removing health data from the country; the UK, which has relatively fierce data protection legislation; and the US, whose government might be monitoring this type of data. Frontline is committed to ensuring that users retain control over the data stored for them, and is even investigating how practical it might be to offer a variety of geographic hosting options. While there are a number of legal theories of jurisdiction that apply, none of them adequately reflect how much easier it is now to have potential compliance issues in multiple jurisdictions, because of the multiple ways technologies can connect each player in this hypothetical.

But there’s no substitute for the user themselves understanding and questioning their own liability, and further, what their own values and ethos would lead them to prioritize in questions of data ownership and use. Very few users and clients raise these types of questions with us on their own, and getting the kind of legal advice that would be required to understand their position would be prohibitive for many organizations. Indeed, how many of us working in NGOs could look down our open-plan offices or peruse our shared drive and say, hand on heart, that we can even identify the location of every copy of every database containing beneficiary data? Our Capture the Ocean project hopes to kickstart debate and collaborative analysis to help examine some of these questions, but realistically, practitioners have a ways to go to even understand the scale of the problem in their own organizations.

Oftentimes in discussing both of these debates, the word ‘lean’ comes up: lean data, lean research. As principles supporting these two concepts start to emerge, I keep being struck that ‘lean’ here is just a synonym for ‘good’. Let’s make both research and data gathering right-sized, respectful and rigorous, as the MIT/Tufts Lean Research principles suggest, and perhaps we’ll be in a better place.