A few weeks back, I facilitated a meet-up for one of my favourite non-profits. DataKindUK applies the power of data science to the problems faced by frontline charities in the UK (full disclosure: my partner Emma leads DataKindUK). It works with charities to shape pressing data science questions about their work, while developing a community of volunteer data scientists to help answer them. DataKindUK has to deal with many challenging responsible data considerations. These range from how to share charity data responsibly between volunteers to how volunteers and charity partners can protect the privacy of individual beneficiaries when new insights may reveal information that those beneficiaries didn't share themselves.
To address those considerations and inform future decision-making, DataKindUK decided to develop a set of principles to guide its work, through a process led (in true DataKindUK fashion) by its network of volunteers.
What is the goal of the principles?
DataKindUK wants its volunteers to build a shared understanding of how values influence practice. As the largest network of data scientists in the country, it’s also in a uniquely strong position to influence the corporate sector. As such, the principles are designed to be portable, so that volunteers can take them back to colleagues and shape their companies’ approach to using data for profit.
How we did it
We started with case studies of past DataKindUK (DKUK) projects, so that volunteers could share their experiences of working on particular challenges. We weren't trying to write a set of principles for all of data science, so we used the case studies to narrow the focus of the principles to actual examples.
Then we split into groups to review existing sets of relevant principles (e.g. Ten Simple Rules for Responsible Big Data Research and the UK National Statistician's Data Ethics Advisory Committee Principles). The idea was to see how other organisations and researchers were developing principles, both to inform DKUK's approach and to adapt and build on good ideas already in the public domain.
Finally, we combined everything into a shared document and deduplicated it, consolidating points that overlapped. The next step was for DKUK's volunteer Programmes Committee to develop a draft to share with the community for feedback, which you can read here [Updated 11 Jan 2018].
Decisions hold power
An obvious principle would be not to discriminate against historically oppressed groups. But volunteers had a fascinating discussion about how to reconcile that principle with the fact that data science is, in a sense, all about discriminating: it uses data and numbers to segment groups and make choices. The discussion reached no firm conclusions, but there was general agreement that the important considerations when using data science to segment and make choices were the relative power a group holds in relation to other groups, and the way demographics are used in prediction.
Consider who is missing
Data can be used both to include and to exclude. This ties into the previous point, but it is an important insight in its own right. Data science can be used to identify and segment groups that are otherwise left out in order to include them intentionally, but it can also be used to exclude. Being excluded from a data set can be a powerful assault on a community, and that exclusion can happen intentionally or unintentionally (for example, when certain demographics are less likely to contribute data because of existing inequalities).
Break the rules, with humility
It's important to know when to break the rules. This was suggested as a principle because the principles weren't designed to be prescriptive, and volunteers need to keep their critical wits about them. That said, one volunteer pointed out that technical expertise can bring hubris, and that encouraging data scientists to be comfortable breaking the rules was both unnecessary and risked erring on the wrong side of the challenge. In other words, there's no need to explicitly encourage rule breaking.
When inaction is not harmless
There was an interesting debate about the wording of a Do No Harm principle, and a concern that a poorly articulated version of it might lead to inaction in the face of uncertain risk, even when inaction might cause more harm than acting would.
I'm excited to see what the Programmes Committee does with the set of principles, and will be sure to share once a final version goes live! Thanks to DataKindUK and its volunteers for including The Engine Room and for bringing responsible data thinking into their approach.
If you face responsible data challenges in your work, have ideas to contribute, or want to listen in on these kinds of discussions, be sure to join the Responsible Data discussion list!