Collecting 50 million data points across 200 projects in 40 countries

January 22nd, 2015 by Waylon Brunette

Brian Chu is an Associate Program Director with the Neglected Tropical Diseases Support Center (NTD-SC) where he provides managerial, statistical, and technical leadership to multiple operational research projects and monitoring and evaluation efforts. In his guest post on Nafundi's blog, he explains how Nafundi and ODK helped NTD-SC collect 50 million data points across 200 projects in 40 countries.

Brian Chu writes:

Mobile data collection using Open Data Kit (ODK) has allowed global health staff, scientists, and researchers to rapidly gather and process data in the global effort to control and eliminate neglected tropical diseases (NTDs). NTDs are a group of 17 diseases that cause substantial morbidity and chronic suffering to over 1 billion people in 149 countries worldwide.

A child in Ethiopia being examined for trachoma, one of many neglected tropical diseases that affect the world's poor. Photo courtesy of Dominic Nahr, Magnum Photos, and Sightsavers.

NTDs most commonly affect the world's poor through manifestations such as blindness, anemia, stunting, swollen limbs, and skin deformity that perpetuate cycles of poverty and stigma. These diseases were originally termed 'neglected' because of low public awareness and low-profile funding, but recent years have seen incredible commitments by leading pharmaceutical companies, global health organizations, private foundations, and donor and endemic country governments towards combating NTDs.

Need for instantaneous data in push to eliminate neglected tropical diseases

The NTD-SC at the Task Force for Global Health is part of this effort to eliminate and control NTDs through operational research initiatives funded primarily by the Bill and Melinda Gates Foundation and USAID. With a plethora of research projects being deployed simultaneously in multiple countries, it made considerable sense for our organization to collect data electronically.

Early NTD-SC (formerly Lymphatic Filariasis Support Center) efforts used PDAs, but we quickly found them expensive, bulky, unreliable, and hard to learn. We therefore switched to mobile phones, and since 2009 the NTD-SC has worked with Nafundi to incorporate ODK as our primary data collection platform. The ease of setting up and using ODK, along with the flexibility and scalability of the system in delivering instantaneous data, provided considerable advantages for data speed, security, and accuracy.

Customization of ODK scales data collection for NTD-SC's research partners

ODK has become so indispensable for NTD-SC's global portfolio of projects that we worked with Nafundi to develop a customized branch of ODK called LINKS. LINKS meets specific NTD-SC requirements such as cascading selects, additional language support, and navigation options.

A woman enters survey data into ODK-based LINKS for a trachoma prevalence tracking project. Photo courtesy of Dominic Nahr, Magnum Photos, and Sightsavers.

LINKS has essentially revolutionized data collection for the NTD-SC and has even spawned LINKS System, a small mHealth department within the organization that services partners in the NTD research community including the World Health Organization, CDC, RTI International, Washington University in St. Louis, University of Georgia, Liverpool School of Tropical Medicine, and many others. The success and inherent benefits of LINKS have also led to NTD-SC supporting mobile data collection for non-NTD organizations such as CARE and the Children's Investment Fund Foundation.

Collecting 50 million data points across 200 projects in 40 countries

LINKS is truly setting new standards and benchmarks for what is possible with mHealth in the global environment. We estimate that the NTD-SC has supported approximately 200 projects in 40 different countries and collected over 50 million data points using the LINKS system. A couple of project highlights:

  • The Global Trachoma Mapping Project, which is the world's largest single disease mapping initiative ever. In the last 24 months, LINKS has allowed more trachoma data to be captured and analyzed than probably all previous trachoma data combined.
  • Transmission Assessment Surveys, which in a few short years have moved from an operational research project to a standardized WHO protocol for lymphatic filariasis monitoring and evaluation. This would not have been possible without data being processed and analyzed so quickly using LINKS.

The NTD-SC has supported several other projects ranging from water and sanitation assessments to vaccine knowledge and practices to mosquito DNA analysis. The flexibility of LINKS to support all types of projects has been truly phenomenal. Moreover, increased use of LINKS in so many countries by so many organizations and local health teams has built great mHealth capacity for future projects.

The NTD-SC is really proud to work with Nafundi and leverage the power of mobile technology and dynamic ODK solutions to tackle NTDs and other global health challenges.

ODK 1.4.5 released - major enhancement to ODK Collect

January 13th, 2015 by Mitchell Sundt

New v1.4.5 versions are available on the downloads page.

Many thanks to SurveyCTO for working with the ODK core team to achieve vast improvements in the correctness and speed of the ODK Collect form evaluation logic, particularly in the handling of repeat groups.

ODK Collect

  • numerous bug fixes and extreme performance improvements to the form evaluation logic resulting from a close collaboration between the ODK core team and SurveyCTO.
  • Added Admin Setting for "Form Processing Logic" to select among different form evaluation logic implementations:
    • Recommended form evaluation logic (default - whatever logic is the current best going forward)
    • January 2015 (fastest) form evaluation logic
    • January 2015 (safest) form evaluation logic
    • Mid 2014 form evaluation logic (ODK Collect 1.4.4 and 1.4.3)
    • Early 2014 form evaluation logic (ODK Collect 1.4.2 and earlier)
  • update to newest GME API (October 2014)
  • new function: enclosed-area() (or area()) contributed by SurveyCTO
  • new Japanese translation and numerous translation updates
  • see the release notes for additional changes.
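
The notes above do not describe how enclosed-area() is computed. As a rough illustration of the general idea of measuring the area enclosed by a set of GPS points, here is a minimal Python sketch. It is not the JavaRosa code: the function name enclosed_area_m2, the Earth radius constant, and the choice of a local flat projection combined with the shoelace formula are all assumptions made for this example, and the approximation is only reasonable for small plots.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an assumption of this sketch


def enclosed_area_m2(points):
    """Approximate area in square metres of a polygon of (lat, lon) degrees.

    Hypothetical helper, not part of ODK. Points are projected onto a flat
    plane around their mean latitude, then the shoelace formula is applied.
    """
    if len(points) < 3:
        return 0.0
    lat0 = math.radians(sum(lat for lat, _ in points) / len(points))
    xy = [
        (EARTH_RADIUS_M * math.radians(lon) * math.cos(lat0),
         EARTH_RADIUS_M * math.radians(lat))
        for lat, lon in points
    ]
    total = 0.0
    # Pair each vertex with the next one, wrapping around to close the shape.
    for (x1, y1), (x2, y2) in zip(xy, xy[1:] + xy[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A square plot with sides of 0.001 degrees (roughly 111 m) near the equator.
plot = [(0.0, 0.0), (0.0, 0.001), (0.001, 0.001), (0.001, 0.0)]
area = enclosed_area_m2(plot)
print(round(area), "square metres")
```

For large shapes or shapes near the poles, a proper spherical or ellipsoidal area calculation would be needed instead of this flat approximation.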

ODK Aggregate

  • Fix: mark-as-complete on encrypted submissions (requires ODK Briefcase v1.4.5 or higher).
  • Fix: add a server preference to ignore partially inserted/deleted submissions. Logs them but ignores them so that you can access all other rows in your dataset. Disabled by default. By default, all actions fail upon encountering any malformed submission. You should not ignore these failures but should correct them as soon as is practical.
  • incompatible 2.0 Data model and Sync protocol changes. Incompatible with device releases: rev 122 and earlier. See the release notes for upgrade steps.
  • updated javarosa jar (supporting enclosed-area() and area() functions).
  • see the release notes for additional changes and upgrade steps.

ODK Briefcase

  • Fix: mark-as-complete on encrypted submissions (requires ODK Aggregate v1.4.5 or higher). Encrypted submissions that were marked-as-complete while running earlier ODK Aggregate releases cannot be accessed without hand editing.
  • updated javarosa jar (supporting enclosed-area() and area() functions).
  • see the release notes for additional changes

xlsform.exe for Windows
ODK Validate
ODK FormUploader
ODK ClearBriefcasePreferences

  • updated javarosa jar (supporting enclosed-area() and area() functions).

ODK CsvConverter

  • no changes, just updating the version to 1.4.5

Using ODK in Nahr el Bared camp, Lebanon

January 6th, 2015 by Waylon Brunette

A guest blog post from Alia Aghajanian, a PhD candidate in the Conflict, Violence and Development Cluster of the Vulnerability and Poverty Reduction Team at the Institute of Development Studies. This was also cross-posted to Alia's blog.

Alia writes:
As part of the research for my PhD thesis, I conducted a household survey of 600 Palestinian refugee households from Nahr el Bared camp in Lebanon in 2012. My thesis looks at the consequences of returning home, and my case study was Nahr el Bared camp, a Palestinian refugee camp that witnessed a destructive war in 2007. All residents of the camp were evacuated and displaced to a nearby camp as the Lebanese military fought with Fatah el Islam. The camp was almost completely destroyed and required a large reconstruction effort, leaving around 80% of households still displaced five years after the war, at the time of my field work.

I have written a previous blog post about how this data was collected.

Data entry
The main advantage of using tablet PCs was that data was entered only once. With paper and pencil interviewing (PAPI), responses are first recorded on paper and then entered in digital format into a computer. Entering the data twice increases the likelihood of data entry error, especially when written responses are illegible or misinterpreted, and even more so if the data is not entered by the same person who wrote the responses, as is usually the case. Using tablets also sped up the data collection process by skipping the data entry stage, which for a 600-household survey was estimated to take up to two weeks.

Data Validation
Another benefit of entering data directly onto the tablet was that we were only a simple step away from inspecting it. At the end of each day of interviewing, the data was transferred from the tablet PCs to my laptop (through a USB cable) and then compiled into an Excel sheet using ODK Briefcase (this could have been done through a server if internet access had been available, but that would have been too expensive in Lebanon).

Once I had the Excel file I could look for outliers among the variables. I also looked out for any logical inconsistencies, such as those concerning education levels and age, or employment status and age. At the end of each day I prepared a report of these inconsistencies and met with the teams early the next morning before they set out for the next day of data collection.
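
Checks of this kind can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the field: the column names (age, education_years, employed), the sample rows, and the thresholds are all assumptions made up for the example.

```python
import csv
import io

# Hypothetical export; the column names and values are invented for this sketch.
SAMPLE_CSV = """household_id,member_id,age,education_years,employed
H001,1,34,12,yes
H001,2,6,10,no
H002,1,9,3,yes
H003,1,140,8,no
"""


def find_inconsistencies(rows, min_school_age=5, min_work_age=15, max_age=110):
    """Return (household_id, member_id, message) tuples for suspicious rows."""
    issues = []
    for row in rows:
        age = int(row["age"])
        education_years = int(row["education_years"])
        key = (row["household_id"], row["member_id"])
        # Outlier check: an age above any plausible value.
        if age > max_age:
            issues.append(key + ("age is an outlier",))
        # Logical check: years of schooling cannot exceed years since school age.
        if education_years > max(age - min_school_age, 0):
            issues.append(key + ("more years of education than age allows",))
        # Logical check: employment reported for a young child.
        if row["employed"] == "yes" and age < min_work_age:
            issues.append(key + ("employed but below working age",))
    return issues


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = find_inconsistencies(rows)
for household, member, message in report:
    print(f"{household}/{member}: {message}")
```

A list like this, printed at the end of the day, is the sort of report that can be taken to the next morning's team meeting.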

Skipping patterns
The questionnaire used for this household survey was quite complicated, as the sequence and inclusion of questions depended on the responses to previous questions. With ODK, the skip codes can be built into the form so that they are applied automatically. Data collectors saved time because they did not have to think carefully about which question to ask next. In addition, mistakes where data collectors missed a question or asked the wrong one were avoided.
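
To make the idea concrete, here is a minimal Python sketch of skip logic, assuming each question can carry a rule that decides, from the answers given so far, whether it should be asked. In an XLSForm this kind of rule goes in the relevant column. The question names and rules below are hypothetical and are not taken from the actual survey.

```python
# Each question may carry a "relevant" rule: a function of the earlier answers.
# The question names and rules are invented for illustration.
QUESTIONS = [
    {"name": "displaced"},
    {"name": "return_year",
     "relevant": lambda answers: answers.get("displaced") == "no"},
    {"name": "current_camp",
     "relevant": lambda answers: answers.get("displaced") == "yes"},
    {"name": "household_size"},
]


def questions_to_ask(responses):
    """Walk the form in order and return the names of the questions shown.

    `responses` stands in for the respondent: a dict of answers by name.
    """
    answers = {}
    asked = []
    for question in QUESTIONS:
        rule = question.get("relevant")
        # Skip the question when its rule says it does not apply.
        if rule is not None and not rule(answers):
            continue
        answers[question["name"]] = responses.get(question["name"])
        asked.append(question["name"])
    return asked


still_displaced = questions_to_ask(
    {"displaced": "yes", "current_camp": "Beddawi", "household_size": "5"}
)
returned = questions_to_ask(
    {"displaced": "no", "return_year": "2011", "household_size": "4"}
)
print(still_displaced)
print(returned)
```

Because the form decides which question comes next, the data collector never sees the questions that do not apply.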


Battery life
The tablets that we used had a short battery life and needed to be recharged after three to four hours of interviewing. While we had set up two offices in Nahr el Bared camp and Beddawi camp for data collectors to recharge their tablets, they found that if electricity was available, respondents allowed them to charge their tablets while conducting the interview. In settings where electricity is not readily available (or respondents are not as hospitable and cooperative), battery life can be problematic, and investing in long-lasting batteries or extra batteries might be necessary. As I was taking the tablets back with me each day, I ended up charging all the tablets overnight as well. Lots of electricity sockets are a must if this is how you do it!

Backup
Data collectors were also worried that there was no backup of their interviews other than the digital files on the tablets. A slip of the hand might delete a questionnaire, or the loss or theft of a tablet might mean the completed questionnaires would be lost forever. To avoid losing data like this, we had to transfer data to my laptop as soon as possible, and then back it up on Dropbox or a hard drive. Luckily no questionnaires were lost this time, but it might be worth having an internet connection so that forms can be sent to the server as soon as the interview is over.
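
A daily copy of this kind is easy to script. The sketch below is an assumption-laden illustration rather than what was done in the field: the function name back_up_instances, the folder names, and the timestamp format are hypothetical, and temporary folders stand in for the tablet and the laptop.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def back_up_instances(source_dir, backup_root, now=None):
    """Copy source_dir into a new timestamped folder under backup_root."""
    now = now or datetime.now()
    target = Path(backup_root) / now.strftime("backup_%Y%m%d_%H%M%S")
    # copytree creates the target folder (and missing parents) and copies
    # every file; it raises an error if the target already exists.
    shutil.copytree(source_dir, target)
    return target


# Demonstration with temporary folders standing in for the real devices.
with tempfile.TemporaryDirectory() as tmp:
    tablet = Path(tmp) / "tablet_instances"
    tablet.mkdir()
    (tablet / "household_001.xml").write_text("<data/>")
    laptop = Path(tmp) / "laptop_backups"
    copy = back_up_instances(tablet, laptop, now=datetime(2012, 6, 1, 18, 30, 0))
    backup_name = copy.name
    copied_names = sorted(p.name for p in copy.iterdir())
    print(backup_name, copied_names)
```

Keeping each day's copy in its own timestamped folder means an accidental deletion on a tablet never overwrites an earlier backup.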

A justifiable concern prior to fieldwork was that respondents, or potential respondents, might be intimidated by the electronic devices. While several respondents initially asked if they were being recorded, data collectors did not feel that respondents were intimidated. Interestingly, respondents were quite curious to know more about the devices and how they would be used in the interview. This curiosity served as an icebreaker in many interviews, and in others a starting point to explain the research project.

Unfortunately, being on a tight budget as a PhD student meant that I could not afford very good quality tablets, and I relied on second-hand tablets that had been used for data collection by my PhD supervisors in Maharashtra, India. After shipping the tablets from Mumbai to Beirut and uploading my form, I realised that the tablets could not display Arabic script! After a couple of days of worrying about this, I decided that we could use Latin characters transliterated from Arabic. This was not too much of a problem, as most of the enumerators were used to using these characters on their mobiles or computers for sending messages, emails or chatting. In fact, one enumerator said she found it much easier.

The consensus among the field team, and specifically the data collectors, was that the use of tablet PCs was a huge (if not necessary) bonus on many levels. The main advantage for the data collectors was that they did not have to struggle with complicated skip codes during the interview, but I found the main advantage to be in aiding data validation.