The engine room collected data from 101 Global Voices community members in Nairobi at the 2012 Citizen Media Summit. But who has time to manually input data from 200 pages of surveys?
Lots of valuable information gets collected on paper and trapped there until a dedicated data entry person takes the time to free it by making it machine-readable. There is so much more that can be done with data once it is in digital form: it can be moved seamlessly, analyzed by computers, protected, and more. We wanted to speed up this process, and that’s where a new tool called Captricity comes in.
The idea behind Captricity is pretty simple. When a human tells a computer where the answers on a survey should be and gives it a general idea of what each answer will look like (a detailed answer set for multiple-choice or check-all-that-apply questions, an open text field, etc.), the computer should be able to read the answers and pump out a CSV for a researcher.
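To make that idea concrete, here is a minimal sketch of the field-plus-answer-set model, in Python. The field names and structure are made up for illustration; this is not Captricity's actual API or data model, just the shape of the problem: field definitions on one side, extracted answers on the other, a CSV row out the back.

```python
import csv
import io

# Hypothetical field definitions: each field says what kind of answer
# to expect (a fixed choice set vs. free text).
fields = [
    {"name": "role", "type": "choice", "options": ["Author", "Editor", "Translator"]},
    {"name": "country", "type": "text"},
]

# Pretend these are the answers the software read off one scanned survey.
extracted = {"role": "Editor", "country": "Kenya"}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=[f["name"] for f in fields])
writer.writeheader()
writer.writerow(extracted)
print(buf.getvalue())
```

One row per completed survey, one column per field, and a researcher can open the result straight in a spreadsheet.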
Yesterday I sat down to process the scanned copies of paper surveys filled out by the GVers. The two-page survey was designed on the fly (though with hours and hours of last-minute survey crunching by Christopher Wilson and Zeynep Tufekci). In that kind of working environment, printing and distributing was a lot easier (and likely resulted in a higher response rate) than circulating an online survey – we all know how hard it is to muster the energy to fill out post-conference surveys. The surveys were scanned and I was ready to try the software out.
So how does Captricity work?
First we uploaded a PDF template of the survey that wasn’t filled out. This lays a foundation so that the software can recognize which parts of the text are answers and which are questions. And then comes the fun part.
The green boxes in this screenshot are individual question fields. For each field selected, there is a corresponding tool box that builds the question and the answer set so that the software knows whether it’s looking for a multiple choice selection or open text. This can be cumbersome for a question like the one in this screenshot, which asks for a “Please check all that apply” response and includes a free text “Other” field. These have to be done separately even though they are the same question. But the user interface is still pleasant to work with.
Another feature, the ToolBox, is even easier to work with. It is intuitive and quick. One suggestion: when we got our data back in CSV, it would have been much easier had I put a little more time into exactly recreating the survey’s answer set. Otherwise there’s a lot of time spent with Find and Replace, and it’s more challenging to get a quick overview of the data set.
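For anyone facing the same cleanup, the Find-and-Replace step can be scripted. This is a hedged sketch with made-up answer strings (not our actual survey values): map each variant in the raw CSV column to one canonical form, and leave anything unrecognized alone so you can spot it later.

```python
# Hypothetical cleanup table: raw variants -> the canonical answer
# as it should have appeared in the marked-up answer set.
canonical = {
    "yes": "Yes",
    "y": "Yes",
    "no": "No",
    "n": "No",
}

def normalize(answer: str) -> str:
    """Map a raw CSV cell to its canonical answer, leaving unknowns as-is."""
    key = answer.strip().lower()
    return canonical.get(key, answer.strip())

raw_column = ["yes", "Y", "No ", "maybe"]
print([normalize(a) for a in raw_column])  # ['Yes', 'Yes', 'No', 'maybe']
```

Of course, the better fix is the one suggested above: get the answer set right in the markup stage so the CSV comes back clean in the first place.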
Once the entire template is marked up with question boxes, the next step is to upload the completed surveys.
The uploading was kind of a pain, though the site did use a few tricks to make it easier. We merged all the scanned PDFs into a single document with 202 pages of 2-page surveys. Unfortunately, that is a 100 MB file, and Captricity doesn’t tell you about that limit until you try. So I broke the merged document into 5 pieces and uploaded those. A dialogue box kindly asked if I wanted the bigger file to be broken into 20 2-page surveys (which I did). The files uploaded quickly after the issue with size limits was handled, and I was at the confirmation page, ready to pay and submit the documents. We paid US$36 for the data to be processed, which seems very reasonable compared to the time, energy, and money we would have poured into the other options for data entry.
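If you have to do this batching yourself, the one constraint worth encoding is that a single survey must never be split across two uploads. Here is a small hypothetical helper (not anything Captricity provides) that turns a merged scan into 1-indexed page ranges, keeping each 2-page survey intact:

```python
def batch_surveys(total_pages: int, pages_per_survey: int, surveys_per_batch: int):
    """Split a merged scan into page ranges, never splitting one survey
    across two batches. Returns (first_page, last_page) tuples, 1-indexed."""
    if total_pages % pages_per_survey != 0:
        raise ValueError("scan length is not a whole number of surveys")
    batch_pages = surveys_per_batch * pages_per_survey
    return [
        (start, min(start + batch_pages - 1, total_pages))
        for start in range(1, total_pages + 1, batch_pages)
    ]

# 202 pages of 2-page surveys, in chunks of 20 surveys (40 pages) each:
print(batch_surveys(202, 2, 20))
```

Feed each range to whatever PDF tool you use for splitting, and every chunk stays a whole number of surveys.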
My one major complaint with the site is that I wasn’t allowed to test out the field selection markup to see how it performed. Instead, I submitted a first try, paid US$36, and crossed my fingers. As we crunch the data, we’ll update you on how well the software performed. Ultimately, in this test run, we won’t be comfortable trusting the output without comparing the paper surveys to the data churned out by Captricity. More soon on both how Captricity performed and the results of the survey!