This week sees The Engine Room’s next replication sprint take place, as part of our Matchbox Program. Replication sprints are designed to reuse technology to help new partners, while also helping with strategic design based on their unique needs.
What is a replication sprint?
The idea is simple: we take a former Matchbox project, and spend a week with a few organisations who are facing similar challenges and are interested in developing similar projects. The teams will walk away with tools they can use right away and with end products that are complete in design and features, developed around their specific needs.
Read more about the replication sprint process in this post.
The teams we’re working with during this week’s replication sprint – Opora and K-Monitor – are based in Eastern Europe and the Western Balkans, fighting for more transparency and accountability in their own contexts. Each also has a large number of messy documents about a specific issue, such as asset disclosures and political donations.
During the week, the participants of the replication sprint will get access to a tool and method that helps them easily categorise large numbers of documents (like scanned PDFs or paper copies).
Perhaps they’ve already tried to analyse these documents using technology, but bumped into challenges and now think they can do more with help from the wider community. And that’s where The Engine Room and our team of experts come in.
They have the PDFs. We have the methods for pulling information out of them.
Machines and algorithms are very efficient at certain functions, such as identifying patterns in an image or performing advanced calculations. However, other tasks – including identifying the contents of an image or recognising obscure text – are still best performed by humans.
The tool we’ve developed with former Matchbox partner ¿Quién Compró? is an online platform to simplify the process of liberating data from documents through microtasking – the process of splitting a large job into small tasks that can be distributed, over the Internet, to many people.
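To make the idea of microtasking concrete, here is a minimal sketch in Python. It is not the code of the platform itself: the file names, volunteer names and the `redundancy` parameter are all assumptions made up for illustration. It splits a set of documents into one task per page and hands each page to more than one volunteer, so that answers could later be compared.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Microtask:
    document: str   # hypothetical file name
    page: int       # 1-based page number within the document
    volunteer: str  # person asked to transcribe this page


def split_into_microtasks(page_counts, volunteers, redundancy=2):
    """Create one small task per page and volunteer.

    Each page is given to `redundancy` different volunteers, chosen in
    round-robin order. Assumes volunteer names are unique.
    """
    if redundancy > len(volunteers):
        raise ValueError("redundancy cannot exceed the number of volunteers")
    rotation = cycle(volunteers)
    tasks = []
    for document, pages in page_counts.items():
        for page in range(1, pages + 1):
            for _ in range(redundancy):
                tasks.append(Microtask(document, page, next(rotation)))
    return tasks


# Illustrative input: two made-up documents and three made-up volunteers.
page_counts = {"report_a.pdf": 3, "report_b.pdf": 2}
volunteers = ["ana", "ben", "cleo"]
tasks = split_into_microtasks(page_counts, volunteers, redundancy=2)

for task in tasks:
    print(f"{task.volunteer}: {task.document}, page {task.page}")
```

Giving every page to two people is one common design choice in crowdsourced transcription, since disagreements between volunteers point to pages that need a second look. Whether the platform does this is not stated here.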
Here are the two organisations that will be taking part in the replication sprint this week, along with details of their projects.
Opora
A non-governmental network of public activists whose mission is to enhance public participation in the political process in Ukraine, by increasing citizens’ influence on what the government does – both on the national and local levels.
The Opora team want to start working with financial electoral reports to create a database for election donations starting in 2014. Their challenge is that not all of these reports are published online, and information is missing in some of the reports.
K-Monitor
A grassroots anti-corruption initiative from Hungary started by a group of enthusiastic students in 2007. K-Monitor operates the biggest news database on corruption cases in Hungary with over 20,000 visitors a month.
The K-Monitor team want to develop a tool to process new asset declarations that will be published in Hungary in February 2017. They want to make the assets and incomes of decision makers transparent and comparable over time, develop a system of signals to detect suspicious enrichment, and advocate for better data publication and legal reform.
Updates from the event
We’ll be updating this blog post as the week goes on with all the action from each day, so make sure you check back during the week to see how the teams are getting on.
And we’ll produce a wrap-up report at the end of the week, so you can see what went well and what we’d improve for next time.
Disclaimer: Our Global Matchbox Lead, Julia Keseru, is a former employee of K-Monitor. She was not part of the selection committee for this replication sprint.
Day 1 Update
The goals of Day 1:
- For participants to get to know each other
- For participants to get to know the projects and the platform we’re replicating
- To create a feature set, needs and goals for each project
- Set up the technical development environment
Opora and K-Monitor gave in-depth presentations of their projects, followed by questions and answers. Participants asked detailed questions to better understand the country contexts, the political environment, and the challenges the teams have faced so far.
In the afternoon, each team dove into creating complete feature lists. Participants compared the existing features of our technology to their team’s needs and goals, and developed aspirational lists of additional features. We then discussed all the features to understand technical and time constraints, and at the end of the day we had a clear vision for success – what we want to build by Friday.
In parallel, we also worked on setting up the development environment: from deciding how best to handle communication and task management, to locally installing the codebase, to setting up local file servers.
We are now ready to deep dive into the design and development process: Tuesday morning is dedicated to defining specific tasks we will be working on for the rest of the week.
Day 2 Update
The goals of Day 2:
- Develop user stories and outline tasks needed to achieve them
- Begin UX interaction flow for K-Monitor and data modelling for Opora
- Update our “Progress Wall”, our Kanban board with tasks that are “In Progress” and assign them to domain experts
The morning of Day 2 was the most intense part of the replication sprint so far: the entire group brainstormed the tasks needed to build our features, and we organised those tasks into user stories that will guide our progress during the sprint.
In the afternoon, K-Monitor worked on user experience and design, while Opora dug into the intricacies of microtasking hundreds of pages of PDFs. We also explored options for integrating OCR (optical character recognition) and semantic text analysis.
On Day 3, we’ll deep dive into the heart of the work: finalising the UX discussions and data modelling, preparing the server, collecting design assets, and scraping documents for both projects.
Day 3 Update
The goals of Day 3:
- Work through our priority user stories: design and implement the websites’ flow and landing page, and prepare the platform for data imports
- Refactor codebase, remove dependencies
- Develop advocacy strategies for each project
Day 3 morning started with a “stand-up”: each team member described their work for the morning, flagged any “blockers”, and placed their tasks in either To-Do, In Progress, or Q/A. While everyone made awesome progress, moving new tasks into In-Progress and Q/A, the morning also brought new challenges. We hit a snag downloading Opora’s documents, and more generally, our internet connection was slowing us down.
The afternoon was split between a session on advocacy strategy with Seember Nyager and continued progress on design and backend development. Seember worked with both Opora and K-Monitor to sharpen their mission statements, define the audience for their platform, and craft a realistic outreach plan. Tamara Puhovski, an open government expert from Croatia, joined us for the session to provide valuable insight from the local perspective.
With only 1.5 days left to build and test the platform, we’re picking up the pace. On Day 4 we’ll start finalising the platform design, writing the copy for the landing page, and finishing all data preparation: structuring, modelling and scraping.
Day 4 Update
The goals of Day 4:
- GET. IT. DONE.
- Finish designs for entire platform
- Begin the front-end development based on finalised landing page design
- Develop localised content for landing page and form
- Automate PDF data processing to the fullest extent possible
Day 4 was the longest and most productive day: all participants dug deep into code, design, content creation, data modelling, and testing. In addition to developing the microtasking platform for both organisations, we also spent time with Opora extracting tables from PDFs automatically, since about half of Opora’s PDFs are machine-readable. Our resident data expert Vanja from web.burza is creating automatic scripts for Opora to liberate future PDFs with the minimum amount of hassle.
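The split between machine-readable and scanned PDFs can be pictured as a simple routing step. The sketch below is only an illustration under stated assumptions, not the scripts written during the sprint: it assumes the text of each page has already been pulled out with some PDF library, and the 50-character threshold and file names are invented. Documents with a usable text layer go to automatic extraction, and the rest go to microtasking.

```python
def has_text_layer(page_texts, min_chars_per_page=50):
    """Heuristic: a document counts as machine-readable if the text
    extracted from it averages at least `min_chars_per_page` characters
    per page. The threshold is an assumption, not a measured value."""
    if not page_texts:
        return False
    total = sum(len(text.strip()) for text in page_texts)
    return total / len(page_texts) >= min_chars_per_page


def route_documents(extracted):
    """Split documents into those suited to automatic table extraction
    and those that need human microtasking."""
    automatic, manual = [], []
    for name, page_texts in extracted.items():
        if has_text_layer(page_texts):
            automatic.append(name)
        else:
            manual.append(name)
    return automatic, manual


# Illustrative input: per-page text for two made-up files.
extracted = {
    "donations_2014.pdf": ["Donor  Amount  Date\n" * 10, "Donor  Amount  Date\n" * 8],
    "scan_017.pdf": ["", " "],  # scanned images yield little or no text
}
automatic, manual = route_documents(extracted)
print("automatic:", automatic)
print("microtasking:", manual)
```

A character count is a crude signal. A real pipeline would likely also check whether the extracted text contains recognisable table rows before trusting it.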
In the afternoon, following a high-stakes game of rock, paper, scissors, we jumped back into data modelling, design and development of the platform. Teamwork and cohesiveness were at an all-time high. Everyone needed to work concurrently, so we mapped out the dependencies and bottlenecks to make sure no one was waiting on anyone else to finish critical tasks. The sprint continued well into the night, with a brief respite around 23:00 when all the electricity in the neighbourhood went out.
We are nearing the last day, so we are kicking our work into hyperdrive to finish development of the platform. Day 5 will include a jam-packed morning of backend and front-end development and user testing, ending with a presentation of the platform. In the afternoon, we’ll discuss next steps to achieve each organisation’s goals and reflect on the progress and challenges of the week.
Day 5 Update
The goals of Day 5:
- Finish script that scrapes Opora’s 115-page PDFs
- Finish localisation of all content
- Finalise front-end of the K-Monitor platform
- Implement verification functionality
- Plot next steps for the platform
Although we had worked late into the night on Day 4, we awoke on Day 5 with a good amount of work left. K-Monitor needed the platform ready for an event on January 31st, so we prioritised the implementation of their platform. For Opora, we focused on finishing an automatic PDF liberation system that will let Opora crunch more than 60% of their PDFs.
When the afternoon rolled around, we assessed how much time we had left and what was mission critical for K-Monitor’s launch. We decided to forgo the discussion on next steps in order to continue building and testing the platform. We planned, instead, to have a follow-up call in the first week of February to discuss what the support will look like going forward.
The rest of the day was spent building scripts to process the PDFs faster, designing assets for Opora’s platform, and implementing the front-end for K-Monitor. K-Monitor spent the day localising and testing the platform.
At the end of Day 5, the K-Monitor platform was almost ready. We had completed a total of 52 tasks from Tuesday to Friday. Marit, our marvellous scrum master, put all the remaining user stories and tasks into a Kanban tool called Taiga, so we don’t lose track of them.
We’re nearing the finish line for K-Monitor and will be stress testing the platform today.
It’s been an exhausting but productive week. We couldn’t have done it without our spectacular domain experts, Marit, Krzysztof, Dimitri, Alan, and Vanja, or our incredibly helpful project leads: Levente and Attila with K-Monitor, and Grigorii with Opora.
Stay tuned for more reflections on the entire replication sprint process!
just have come back to UA from #replicationsprint to say that i was awesome – to say nothing. it was awesome-awesome. people, U are great)
— Grygorii (@from_chernivtsi) January 28, 2017