Questions to guide the work of data investigators

Earlier this summer, I spent six days in Montenegro with 60 ‘data investigators’ – or people working on data investigation – from around the Balkans and around the world. At this Data Investigation Camp, hosted by Tactical Tech and Share Lab, our shared goal was to focus on ‘how we can use data to understand and expose misconduct and abuses of power within a human rights, investigative journalism and anti-corruption framework’. There were three main tracks: methods, questions and strategy, and storytelling. I spent most of my time thinking about questions and strategy.

The six days were rich with projects and discussion. As often happens, I left the event with more questions than I arrived with, and I’m excited to see how they shape my work and that of the field at large. Many of my questions centred on ways in which the work of data investigators can continue to be improved.

On context

Data investigation is at its most powerful when it tackles social, political and technical issues in tandem. But sometimes, there seems to be a risk of only focusing on the technical – on the data – and not taking into account the system within which that data is situated, and how it could lead to change.

This misses out many critical steps, including the analysis of the broader context, politics, and systems in which the problem is situated. I don’t mean to say that data-focused projects should be discouraged, but that adding context and thoughtfulness could be valuable. What are the best ways that we can do this as a sector, so as to encourage both quick creativity and a slower consideration of context?

This speaks to a bigger question: once we’ve gathered data to prove an issue, problem, corruption or malpractice – how do we use that data to push for social change? For some, going after policy change is a concrete way of making sure it doesn’t happen again. But for others, keeping that data and evidence out of the hands of governments is a crucial part of the investigation. At the Data Investigation Camp, it was fascinating to see what “success” meant for each of us, coming from different backgrounds and contexts.

On doing the less glamorous work

As a sector, we take for granted a lot of the ‘maintenance’ work that goes on. Whether it’s maintaining databases or gathering data regularly, there’s a big need here for accurate, high-quality data, and it will only continue to grow. As data gets put online in even more random places, the importance of that maintenance will continue to grow, too. While it’s not glamorous work, and is by nature quite repetitive, I wonder: how can we best support this work on an ongoing basis?

I’ve written before about the importance of valuing maintenance in digital infrastructure and others have argued similar points more eloquently, such as this essay on why maintenance matters more than innovation. When it comes to data, too, there are a lot of hidden costs and ongoing burdens that come from being the one (person, organisation or institution) who maintains certain databases.

In a similar vein, it’s much easier for people – whether they’re advocates, human rights defenders, supporters, or funders – to get excited about the data or tech aspect of a project than it is to take a deep look at what’s already been done. For example, cross-referencing existing projects in related fields could bring huge benefits, but it’s rarely done. Reading existing literature (often behind paywalls, when it comes from academia) or speaking to people who have run similar initiatives often gets de-prioritised in the face of limited resources. To take another example, groups can spend dozens of hours developing a ‘new’ project, only to find out that it’s already been done.

Can we better incentivise the less glamorous work, so that we can produce even more exciting ‘new’ work? How can we encourage more people to support and carry out the maintenance work that supports others?

On how we produce – and use – resources

One topic that’s been on my mind for a long time has been the varying levels of energy we spend on producing guides and resources, how we think about best sharing them, and then, how we actually use them. On the production side, I’m as guilty as anyone: it’s easy to focus mostly on getting attention on launch day, and then move on to the next project rather than working on long-term engagement strategies.

Here at The Engine Room, we try through our direct support to make sure we recommend resources regardless of their publication date, as long as they are accurate and useful for the partner in question.

We also think about how we can boost engagement around useful resources for the audiences that need them the most. Our current activities in this realm include sharing our own research in our Library rather than in large PDFs; curating other people’s resources to make them easier to find, like with our Investigative Web Research library entry, which includes tools and resources to support data investigations; and translating our research so communities most affected by the topic in question can also benefit from it – like our Participatory Budgeting library entry, available in Portuguese, Spanish, and English.

We’ve been trying out different, non-written, formats of community engagement over the past months, such as our human rights + technology community calls. (We’re batting around some other ideas too, so watch this space! We’d love to hear ideas on how we could do this better – drop me a line on zara[at]theengineroom.org if you’d like to talk more about it.)

On identity

Working in a space as new and quickly changing as ‘data investigation’, or, more broadly for The Engine Room, ‘data and tech for social change’, means that it can be hard to describe who we are, and what we do. There are many fields we draw inspiration from, and others that we collaborate with, and these might change over time.

I have to admit, though, that I hadn’t thought so carefully about the implications of that question of professional identity until a particular discussion at the camp got me thinking. The ethics of who we consider ourselves to be make a big difference to the standards we seek to uphold for ourselves. For example, it is considered bad practice for a journalist to lie outright to a source in an investigation, but it is far more accepted for some artists. For activists who go undercover for their investigation, where do those boundaries lie? And what about when those boundaries of identity are even more blurry – activist/artists, artists who end up collaborating with journalists – which standards should we aim for?

Questions, questions, questions…

I’m grateful to have had the opportunity to work through some of these questions with the fantastic crew at the Data Investigation Camp, and look forward to continuing to think through them with many communities doing this work. Much as with our Responsible Data work, forming these questions is almost as important as finding the answer – and sometimes, the answers can change quickly.

Zara Rahman

Zara was a member of The Engine Room team for seven years, and was Deputy Director until April 2022.
