Bringing a text to life: 6 platforms for annotating text online

Posted May 19, 2015 by Zara Rahman

(Credit: POP)

In September, a group of us got together to write a book on Responsible Data in international development. As we’ve written about already, it was a great experience, and we’re pretty proud of the product too – a collaboratively written book, from start to finish (including design!) – in 5 days. But, as we’ve noticed in the last few months, few people among our desired target audience have enough time to read an entire book, and – perhaps more importantly – we want the book to be the start of a conversation with a broader group of people, too.

So, the engine room asked me to take a look at how the book could be put online to facilitate this conversation, and to try to make the book a little more engaging for people we’d love to learn from and hear from. This blog post is a summary of what we learned along the way.

For reference: the book is 31,471 words long, including a reasonably long ‘further resources’ section, and thanks to the PubSweet platform that we wrote it on, we have it available in PDF, OpenOffice, .doc, and raw html.

What do we want to be able to do?

To provide background to the initial scoping, here are three user stories I had in mind while looking at the various options:

  1. As an engaged reader of the book, I want to comment on a certain sentence or section, so that I can discuss certain sections of the book with others with similar interests.
  2. As an advocate for responsible data in my organisation, I want to link to/send certain sections of the book to others in my organisation, so that they can read just the section that is most relevant to them. I also want to be able to download certain sections of the book so that we can use them offline, too.
  3. As an engaged member of the responsible development data community, I want to edit/adapt sections of the book based on people’s feedback, so that the book is a living, updated source of knowledge.

Platform final recommendation: Hypothesis

I recommended that the engine room use Hypothesis, built on Annotator.js, running on a static Github Pages site. Benefits of this platform:

  • The text itself (via the site) would be hosted on Github, a reliable service, with a strong community already present, and with lots of functionality around exporting it in different formats, editing it easily and by various people, accessing it in raw formats, and with version control.
  • The appearance of the site is very easily customisable, and lots of good looking templates are available already.
  • There is a cost free option!
  • Different chapter headings could easily have different slugs on the site, and having the actual text available on Github would provide alternatives for community interaction.

This would then have two options for community interaction:

  1. The ‘low tech skills’ option: using the built in annotation tools in the browser to comment, annotate and have conversations (satisfying user story 1), and simply linking to relevant sections via the chapter heading slugs (satisfying user story 2).
  2. The ‘higher tech skills’ option: submitting pull requests and edits via Github, which also provides version control, forking and remixing totally new versions of the content with a low barrier to entry (satisfying user story 3).
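
As a rough illustration of the chapter heading slugs mentioned above, here is a minimal Python sketch of how a heading could be turned into a URL slug. The function names, the base URL and the example headings are all hypothetical; a static site generator would normally produce slugs for you, and its exact rules may differ.

```python
import re
import unicodedata


def slugify(heading: str) -> str:
    """Turn a chapter heading into a lowercase, hyphen-separated slug."""
    # Drop accents and other non-ASCII characters (e.g. "é" becomes "e").
    ascii_text = (
        unicodedata.normalize("NFKD", heading)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")


def section_url(base_url: str, heading: str) -> str:
    """Build a link to one section; base_url is a hypothetical site root."""
    return f"{base_url.rstrip('/')}/{slugify(heading)}/"


if __name__ == "__main__":
    # Hypothetical heading, used only to show the shape of the output.
    print(section_url("https://example.org/book/", "Why Responsible Data?"))
```

A link built this way can be emailed to a colleague so that they land directly on the section that matters to them.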

I would also recommend splitting the book into PDF chapters, and having a downloadable PDF per chapter linked to from the relevant section of the site. We’ve put the book on this platform, so you can see for yourself what this looks like in action!

Note: Genius seems to also do a lot of what we’re looking for, but from first exploration it doesn’t seem to offer tools for others to use to annotate on their own sites. One to watch for the future, though!

The 6 platforms I explored

Considering our main need – wanting to annotate online – there is quite a selection of platforms out there already. For reference, here are a few others that I found, and the reasoning behind not choosing them. Note: if there are any errors in the summaries below, please drop me an email and we’ll correct them!

1. is a WordPress plugin that offers paragraph-level commenting in the margins of a text. is geared toward in-depth discussions of longer documents: article, essay or even book-length. Comments are attached to separate paragraphs.

Implementation options:

  1. As a WordPress plugin on an already set up WordPress site
  2. Sign up for a ready made site on

Advantages:

  • Works across various WordPress sitefarms, so different document sites could be set up if needed.
  • Open source


Conclusion: No go – seems to be no longer maintained nor working on any sites that I could find: because of this, ruling it out.

2. DocumentCloud

DocumentCloud is a tool for organizing and working with large documents and document collections, a document viewer that makes it easier for reporters to share source material with readers and a publicly accessible repository of primary source documents that were used in journalists’ investigations. It includes automatic entity analysis through Open Calais.

Advantages:

  • Every note has a unique URL, so you can point readers right to the passage you want to highlight.
  • Annotations can remain public or private.
  • Documents can be embedded within other websites

Disadvantages:

  • Annotations can be made, seemingly, only by the user uploading the document and members of their team.
  • Very focused on the use case of an investigative journalist; much less so on our use case.

Conclusion: No go – annotations can’t be made by anyone but the user uploading the document/members of their team, meaning that no conversations/feedback can be drawn from the community.

3. Co-ment

Co-ment® is the reference Web service for submitting texts to comments and annotations. Co-ment is an open Web service powered by free software licensed under the GNU Affero GPL license.

Advantages:

  • Use cases given sound similar to our needs: being able to Annotate, Discuss, and re-write
  • Exports to various (printable) formats
  • Allows for annotations, responses to those annotations or comments, and ‘like’ or ‘plus ones’ by other users.
  • Annotations can be nested, and users can be notified when someone responds to their particular annotation
  • Allows version management, and comparison between different versions
  • Has different user rights control: good for creating a specific working group around the text (observer, commentator, moderator, editor or manager)
  • Provides analytics on number of comments, users, etc, as a standard part of the dashboard. (also could be considered as a disadvantage!)
  • Each comment has a distinct URL.

Disadvantages:

  • Name and email address of person doing the annotating seems to be ‘required’
  • Text would be hosted by them (ie. the book would be imported into their system, and the site hosted on a domain owned by them)
  • Costs:
    • 12 months (365 days) 84 €
    • 1 month (31 days) 10 €

Conclusion: Looks pretty good in terms of features – test case up here. Only concern would be around actually hosting the text with them (though it can be exported) – and also the visual appearance: the design doesn’t look great and seems to have limited options for customisation. Examples: the GPL v3 was written together with comments and annotations from the community. More documentation on using it is here, from that community.

4. CommentPress

CommentPress is an open source theme and plugin for the WordPress blogging engine that allows readers to comment paragraph-by-paragraph, line-by-line or block-by-block in the margins of a text.

Advantages:

  • Allows threaded conversations
  • Conversations appear in the margins, not above or below the main text (easier to read)

Disadvantages:

  • Comment by paragraph rather than by selection
  • No URLs given for specific highlights
  • Fairly restrictive on structure of text – needs to fit into assigned ‘CommentPress’ structure as the plugin has been restructured into a CommentPress Core blog theme.

Conclusion: No go – too restrictive on text structure, having to use the CommentPress WordPress theme, doesn’t seem very widely used.

5. HyLighter

HyLighter is built for intensive collaboration with virtually any size group on a single document or a large collection of related documents including Office files, PDFs, images, and other file types.

Advantage: Has an unusual (potentially confusing, potentially useful) colour-coded system for highlights

Disadvantage: Confusing sales pitch – unclear what they offer

Conclusion: No go – seems to be about collaboration on offline document formats, rather than an in-browser solution, and prices are vague (potentially expensive)

6. Hypothesis / Annotator

Annotator.js is an open-source JavaScript library to easily add annotation functionality to any webpage. Annotations can have comments, tags, links, users, and more. Hypothesis is built on Annotator.js.

Advantages:

  • Annotations can have tags
  • Annotations can be private or public, editable/viewable to different people
  • Simple/lightweight to add to a website (couple of lines of script)
  • Actively maintained (and we/I know the maintainers pretty well!) – and actively used
  • Annotations can also be added to images (not just text)
  • Can also be used to annotate local + web based PDFs (as well as websites)
  • Can have threaded conversations, and be notified when people reply
  • Lots of plugins for extra functionality (though unsure how maintained/good these all are!)
  • Can host the text on a site ourselves, and simply add Annotator to it.

Disadvantages:

  • The book would need to be set up on a website first – so extra tech skills needed.
  • Setting up Hypothesis also seems to require a higher level of tech knowledge than the other options listed above.

Conclusion: Seems really good for this use-case; lots of functionality, nice and lightweight, looks good. We might need a plugin or two to make it do exactly what we want (eg. having URLs per annotation) – and it would require someone with a little bit of technical knowledge to be able to set the site up and add the script. Also has the benefit that we could style the site as we like.
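
To give a feel for the "couple of lines of script" step, here is a minimal Python sketch that adds an annotation embed tag to a static HTML page just before the closing body tag. It assumes the publicly documented Hypothesis embed script URL, which should be checked against the current Hypothesis documentation before use; the function name and the sample page are hypothetical, and this is not necessarily how the engine room set up its own site.

```python
import re

# Assumed embed tag; verify the URL against the Hypothesis documentation.
EMBED_TAG = '<script src="https://hypothes.is/embed.js" async></script>'

# Matches a closing body tag regardless of letter case.
_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def add_hypothesis_embed(html: str) -> str:
    """Return the page with the embed tag placed before the last </body>."""
    if "hypothes.is/embed.js" in html:
        return html  # Already present, so leave the page untouched.
    matches = list(_BODY_CLOSE.finditer(html))
    if not matches:
        # No closing body tag found: append the tag at the end instead.
        return html + "\n" + EMBED_TAG + "\n"
    cut = matches[-1].start()
    return html[:cut] + EMBED_TAG + "\n" + html[cut:]


if __name__ == "__main__":
    page = "<html><body><p>Chapter 1</p></body></html>"
    print(add_hypothesis_embed(page))
```

On a Github Pages site the same tag would more naturally live once in the shared page template; the function above is only meant for a folder of already generated HTML files.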

From election monitoring to POINT: the engine room does the Balkans

Posted May 15, 2015 by Tin Geber

We’ve been spending quite a lot of time lately in the Balkans: more precisely Bosnia & Herzegovina, Serbia, Macedonia and Montenegro. And let me tell you, sibling: the Balkans are teeming with activists for transparency, accountability and civic engagement.

The story so far

It all started in August 2014, when the Bosnian organization Zasto Ne called us to participate in a hackathon to try and make sense of the Bosnian election with data:

For the love of bureaucracy. Courtesy of the Bosnian Election Hackathon team

Our pattern master Tin Geber, project manager and resident Balkan, joined an international team of coders and storytellers for a 5-day sprint and deep dive into the most complicated electoral system ever invented (School of Data wrote a great blog post about it). While in Bosnia, we got to know the awesome folks from Zasto Ne and learned about ActionSEE, a coalition of organizations involved in tech for transparency and accountability spanning the Western Balkan region. Evolved out of CommunityBoostr and with support from TechSoup Europe, ActionSEE was just kickstarting the Balkan Data Academy, a region-wide project for transparency and accountability: we hopped on the chance to support it like a feline on a cardboard box.

If I fits, I sits. Courtesy of Reddit.

Balkan Data Academy? Tell me more!

The Data Academy started at Christmas 2014 with an inception workshop in Sarajevo, where we defined what an Academy workshop would look like: 2.5 days of intense hands-on brain share and skill learning with activists, researchers, and coders, with a healthy mix of domain experts and new people.

The Academy’s goal is to build connections and provide concrete skills by working from scratch on data-driven transparency and accountability projects. From bringing participants up to speed on concepts of technology for advocacy, open data, and digital ethics, the sessions dive right into participant-led projects with expert support in data scraping, data wrangling, and visualizing information.

In the month of April 2015, we supported the Data Academy with four pilot Data Academy workshops in four countries: Macedonia, Montenegro, Serbia and Bosnia & Herzegovina. The trainings followed the agenda template, but each training was heavily adapted to fit the local context, and participants’ desires on what skills they wanted to learn, and what project they wanted to work on. In Macedonia, participants investigated crowd-sourced reporting tools for health and safety of public toilets, safer biking lanes, and accountability of universities. In Montenegro and Serbia participants concentrated on making government-released financial data on public officials more usable, by scraping and cleaning the released datasets. In Bosnia, participants dug deep into visual storytelling on election data (because honestly, we can’t get enough of Bosnian elections).

Not a hackathon

While hackathons are a great way to do strong pushes towards answering specific problems (and there are some amazing ones out there), the Data Academy is all about creating lasting connections. That is why each Academy session is just a first step in a bigger march towards a techno-activist critical mass: by building capacity and strengthening networks, the Data Academy is betting on enabling long-term social change. In fact, the Academy projects will continue their work in the Data Academy datathon, a two-day sprint during the POINT conference.

What is the POINT conference, you say?

the engine room to the POINT – internet privacy, Data Academy, and the Matchbox project

The POINT conference (Sarajevo, 20-24 May 2015) is a regional conference of CSOs from across Southeast Europe and international experts, all of whom intensively use technologies in their work. POINT will investigate how tech is used for the social good, how it can be improved, and how it reflects on government policy. We will be there to talk digital security and privacy on the internet, our work with the Matchbox project, and the Data Academy datathon.

Let’s get to know each other

We’re really excited to meet new people: if you’re attending POINT and want to know more about the engine room, make sure to find Tin (he has a frosty beard), buy him a Bosnian coffee and pour your heart out.

Vidimo se u Sarajevu!

What we’re learning about using WhatsApp in advocacy initiatives

Posted May 12, 2015 by Lesedi Bewlay

(Credit: Jan Persiel)

For advocacy groups, NGOs and activists, SMS has been the go-to solution for tech-assisted data collection and citizen reporting. With smartphones getting cheaper and more accessible, and data costs following the same route, chat apps like WhatsApp are being used for general communication more than ever. This also means there is a potential for apps like WhatsApp to be used in outreach for advocacy or campaigning purposes. We wanted to share some of the things we’ve been learning as we explore the use of WhatsApp for our Matchbox partners.

Using WhatsApp for advocacy in Zimbabwe

The engine room recently completed a three-month partnership with Kubatana, an NGO based in Zimbabwe. Kubatana wanted to use its WhatsApp subscriber base (of around 6000) to collect data on water availability throughout the city and oversee the city of Harare’s implementation of water rehabilitation infrastructure projects. Kubatana sends weekly requests via WhatsApp asking subscribers for their neighborhood and the water availability for that week. The information collected would allow civil society to hold the city accountable for 24/7 water supply for Harare by the end of 2015.

WhatsApp has advantages in terms of data collection and cost. Mobile data costs are generally significantly lower than the cost of using SMS. WhatsApp adds a new dimension by not restricting the length of the message to the traditional limitations of 160 characters. Audio and video can now also be shared, which adds more detail and engagement to conversations.

What we helped build

We worked with Kubatana to find a way to easily process all of the information received in their WhatsApp conversations. Finding a way to pull data from WhatsApp into a more usable format is something that many organizations are currently struggling with. Someone often has to manually retype the information into a spreadsheet to aggregate and carry out analysis. For Kubatana, we helped build a workaround solution that included a web-based tool that would take an encrypted version of the WhatsApp database and import it into an online database, and then export it into an Excel file for more efficient analysis.
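
To illustrate the general idea of moving chat messages into a spreadsheet-friendly format, here is a minimal Python sketch. It is not the tool we built for Kubatana: instead of reading the encrypted WhatsApp database, it assumes a plain-text chat export in which each message starts with `DD/MM/YYYY, HH:MM - Sender: text`. That line format is an assumption (exports vary by phone and language settings), and the function names and sample messages are made up for illustration.

```python
import csv
import io
import re

# Assumed export line format: "12/05/2015, 09:14 - Sender: message text"
MESSAGE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), (?P<time>\d{1,2}:\d{2}) - "
    r"(?P<sender>[^:]+): (?P<text>.*)$"
)
# A timestamped line with no "Sender:" part is treated as a system notice.
TIMESTAMP = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2} - ")


def parse_chat(export_text: str) -> list:
    """Split an exported chat into one dict per message."""
    rows = []
    for line in export_text.splitlines():
        match = MESSAGE.match(line)
        if match:
            rows.append(match.groupdict())
        elif TIMESTAMP.match(line):
            continue  # Skip system notices such as "X left".
        elif rows and line.strip():
            # A line without a timestamp continues the previous message.
            rows[-1]["text"] += "\n" + line
    return rows


def to_csv(rows: list) -> str:
    """Serialise parsed messages as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["date", "time", "sender", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample messages, not real subscriber data.
    sample = (
        "12/05/2015, 09:14 - Alice: Avondale, no water\n"
        "12/05/2015, 09:20 - Bob: Mbare\nwater since Monday\n"
    )
    print(to_csv(parse_chat(sample)))
```

Free-text replies would still need cleaning by hand (for example, matching neighborhood spellings), but getting one row per message removes the retyping step.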

Other things to consider

One challenge we faced during this work with Kubatana was overcoming the fact that WhatsApp is proprietary. We were unable to build new functionality directly into the app, which resulted in the development of a workaround that could quickly break once the platform is updated.

Another consideration when exploring the use of WhatsApp for your initiative is the inherent insecurity of collecting and storing data on a mobile phone. WhatsApp uses an encryption protocol (though the strength of the implementation has been questioned), but an experienced hacker could figure out a way to intercept and decrypt messages with root access to the phone being used to send the messages. WhatsApp is also likely to have the capability to intercept communications.

As more people migrate from SMS to apps like WhatsApp, it’s important that advocacy organizations are able to keep up with these technology changes, and meet their audience where they are. Just be sure to understand the challenges and risks that you might face along the way so that you can mitigate them as much as possible.

What’s your experience?

Are you using WhatsApp to collect or share information with your audience? What challenges did you face in implementing this tool? What solution did you find to move the data from WhatsApp to a spreadsheet? Please share any stories, examples, questions and advice in the comments below!

Research finds: Do online civic engagement platforms mostly involve the privileged – and does that matter?

Posted May 1, 2015 by Tom Walker

The engine room has been collecting and summarizing research on ‘what works’ in projects that use technology to solve complex problems. We’re sharing research that we find useful on our blog, in the hope that a wider range of people can use it to inform their work.

Jonathan Mellon, Tiago Peixoto and Fredrik Sjoberg at the World Bank’s Digital Engagement Evaluation Team (DEET) have assessed four online civic engagement initiatives in Brazil, the UK, Uganda and the US, finding that citizens who engaged online were “systematically more privileged than the population or offline participants.”

However, the authors also found that these initiatives didn’t necessarily produce demands that reflected these inequalities. Neither did the outcomes uniformly reflect the priorities of these more privileged groups. In other words, the fact that these initiatives engaged more privileged citizens might not be a problem in itself – the structure of an initiative and the government’s response to it could play much more significant roles. The authors suggest that this means “a new methodological approach that looks beyond the profile of users and also to the institutional design and the government’s response” is needed.

The presentation is here (slides marked ‘preliminary’ represent initial analysis where more work will be done later). Here’s a quick summary of each of the case studies:

  • Participatory budgeting in Rio Grande do Sul (Brazil): “Online voters are from traditionally privileged groups” but there was “no significant difference in voting behaviour between online and offline communities”. The authors suggested that this might be because the participatory process vets any proposal before it is issued. This was the same when it came to results: government implementation was designed to systematically favour poorer areas.
  • Fix My Street (a platform for reporting street problems to local authorities in the UK): Preliminary findings suggest that “requests for help are coming from more privileged areas”, but that the demands that come from those areas are “less unequal than the participants’ profiles”. However, they also suggested that “government response replicates unequal demands.”
  • U-Report (an SMS platform run by UNICEF in Uganda): The typical participant profile is “highly unequal”, but preliminary findings suggest that the effect this has on responses varies depending on the question being asked. The authors are still assessing outcomes related to U-Report.
  • (a global online petition platform): “Women are systematically underrepresented in petition creation across nearly all countries,” though preliminary findings indicated that men and women tend to sign petitions in roughly equal numbers. Government response “seems to generally follow the categories that get most signatures.”