Rough Guide to rural data collection with ODK

By Chris Wilson on 05 December 2011

This post has three purposes, which I think overlap sufficiently to combine them:

  • A User Guide for the system that we developed for UNICEF, IDS and RuralNet Zambia
  • A Developers' Guide for anyone wishing to build something similar
  • Notes on lessons learned that may assist future implementers

Project goals

Automate the data entry part of a long paper-based survey, by replacing the paper forms with electronic devices.

Hardware and application selection

The survey has several long and complex questions, and long sets of multiple-choice answers. The data collection needs to be done in dusty rural Zambia, and the devices might need to be used for a full day without power. Collected data should be sent wirelessly to a secure data repository at some time after collection.

Text entry is required for many fields. That means either a real keyboard with keys, or a sufficiently large touch screen to type comfortably on. Use of the device camera, and presentation of reports and graphs on the same device, might be required in future.

Two possible hardware platforms were identified:

  • Tablet laptops with touch screens
  • Tablet mobile devices (iPad or Android tablet)
We selected the latter for this project due to lower cost, lighter weight, better usability and longer battery life.

The available software options that we identified were:

  • EpiSurveyor (Java J2ME, partly closed source, we have used before and fixed bugs)
  • OpenXdata (Java J2ME, open source, developed and supported by an Aptivate alumnus among others)
  • Open Data Kit (ODK) (Android, open source, active community)
  • Bespoke online/offline survey in HTML5
Of these, we eliminated EpiSurveyor and OpenXdata due to lack of compatibility with the hardware platform(s) we had chosen.

We chose ODK over a bespoke system due to limited time available for development, and ability to easily take photos and record GPS coordinates using the device's hardware.

Of the available Android tablet devices, we chose the Samsung Galaxy Tab for the pilot project, due to its high quality construction. For future projects we would probably use a lower cost device; see the lessons learned for details.

Form creation

Since the survey is quite long (about 230 questions), we wanted an easy way to enter the questions. The ODK application requires the form to be in XForms format. We identified the following tools for creating XForms:

We decided to use XLS2XForm, which enabled us to enter the large number of questions easily in Excel. The others all have graphical builders, which have advantages and disadvantages for less technical users:
  • More visually appealing
  • All available options presented visually (types of controls, groups, etc.)
  • Less likely to make a mistake and produce an invalid form
  • Cumbersome user interface slows down data entry
Unfortunately, none of these designers were able to import an existing form in XForms format, which means that the modifiable "source code" of the form must be maintained in a "proprietary" format in each case, and it's difficult to switch between tools.

You can download the conversion tools, and the Excel spreadsheet with the completed questionnaire as we delivered it to RuralNet, here. RuralNet staff, please use the latest version of the spreadsheet that you can find locally. To use the tools, you will need to download and install Python 2.7 and Java (JRE). Then download the tools as a ZIP file and extract it somewhere. I recommend that you keep the master copy of the spreadsheet in Dropbox to ensure that it's backed up, and it's always clear what the latest version is.

For help in building surveys using XLS2XForm, please see the documentation. In addition to the question types listed there, we have used the following shortcuts, which also work in this customised version of XLS2XForm:

  • text is short for add text prompt (a text field, such as a person's name)
  • note is short for add note prompt (a read-only field, providing additional information for the user)
  • time is a time field without a date (for example, survey start and end times)
To compile the spreadsheet into an XForms form, run the build_and_validate.py script by double-clicking on it. If it works, it will show the message "Success!", otherwise it will show an error message, usually caused by a mistake in the Excel spreadsheet. If it works, it will create (replace) the file called zambia-ranq-round3.xml in the same directory. If your spreadsheet has a different name, you can create a shortcut to call build_and_validate_custom.py with the name of the spreadsheet on the command line.

Software components

ODK Aggregate is the software that powers the Internet server. It is a repository for blank forms (designs) and completed forms (data). Our server is located at http://partimob.appspot.com/. This server is currently paid for by us, and will need to be transferred to RuralNet at some point.

ODK Collect is the application that runs on the device, and users interact with it to complete the survey. It's essentially a user interface for XForms. It can download blank forms (designs) from an ODK Aggregate server, and upload completed forms (data) to the Aggregate server as well.

ODK Briefcase is the software that downloads completed forms (data) from the Aggregate server and converts them into CSV (spreadsheet) format, which can be loaded into

Customised ODK Collect

We are using a custom version of ODK Collect. You can download the source code for it here, or the compiled application here. You can also find it in the ZIP file download. If you prefer, you can use the latest official version of ODK Collect. The two are compatible, but our version adds the following useful features:
  • Use supplied login and password by default to save a round trip and a prompt.
  • Add keyboard navigation, useful for form filling on android-x86 because the mouse interface is pretty clunky.
  • Restore ability to modify completed and submitted forms on the device, which was removed from the official version in 1.1.7.
  • Improved error messages and progress indication during form uploads.
  • Allow setting the instance name on the first page of the survey.
  • Allow saving incomplete surveys on required questions (in case a survey is interrupted; almost all of our questions are required).
There are several ways to install ODK Collect on a device:
  • Download it from the Android Market (official version only, not our customised version)
  • Copy the APK file onto a microSD card, insert the card into the device, and use the My Files application to find and open it from the SD card.
  • Attach the USB cable from the device to a computer, enable mass storage mode on the device, and on the computer, drag and drop the APK file onto the device's internal memory, then use the My Files application to find and open it.
  • Attach the USB cable from the device to a computer, and use ADB's install command to install the APK file.
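
The ADB route in the last bullet can be scripted if you have several tablets to set up. This is only a minimal sketch, not part of the tools in the ZIP file: it assumes Python 3.5 or later (not the Python 2.7 that the conversion tools need), that adb is on your PATH, and that the device is attached with USB debugging enabled. The function names are our own invention for illustration.

```python
import subprocess


def adb_install_command(apk_path, reinstall=True):
    """Build the argument list for `adb install`."""
    cmd = ["adb", "install"]
    if reinstall:
        # -r replaces an already installed copy and keeps its data
        cmd.append("-r")
    cmd.append(apk_path)
    return cmd


def install_apk(apk_path):
    """Run adb; raises CalledProcessError if adb exits with an error code."""
    subprocess.run(adb_install_command(apk_path), check=True)
```

Calling install_apk with the path to the APK file that you downloaded should then install or upgrade Collect on the attached device.
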
It's useful to put the application onto the device's desktop. To do that, open the Applications list, find ODK Collect, and press and hold it with your finger for a few seconds. The background will change to the desktop; release your finger to drop the application there.

It's also useful to remove all the other junk from the desktop. For each icon and widget on the desktop, press and hold it with your finger for a few seconds, until the trashcan icon appears, then drag your finger to the trashcan and release it there.

Form management on the device

There are several ways to put blank forms (designs) onto the tablets:

  • Download them from the ODK Aggregate server using ODK Collect.
  • Copy them onto a microSD card, insert the card into the device, and use the My Files application to copy them from the SD card to the /sdcard/odk/forms directory.
  • Attach the USB cable from the device to a computer, enable mass storage mode on the device, and on the computer, drag and drop the form into the /sdcard/odk/forms directory.
  • Attach the USB cable from the device to a computer, and use ADB or DDMS to push the file onto the device, into the /sdcard/odk/forms directory.
Of these methods, ADB or DDMS is recommended for rapid development, and using the Aggregate server is recommended for production use, since the form must be installed on the Aggregate server for it to be able to accept submissions.

Similarly there are several ways to copy completed forms (data) off the device:

  • Upload them to the ODK Aggregate server using ODK Collect.
  • Use the My Files application to copy them from /sdcard/odk/instances to a microSD card, then remove the card and connect it to the computer, and drop the files into the ODK Briefcase data directory.
  • Attach the USB cable from the device to a computer, enable mass storage mode on the device, and on the computer, drag and drop the files from the /sdcard/odk/instances directory to the ODK Briefcase data directory.
  • Attach the USB cable from the device to a computer, and use ADB or DDMS to pull the file from the device's /sdcard/odk/instances directory to the ODK Briefcase data directory.
Of these methods, using ODK Aggregate is recommended for development and production use.
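
For the ADB options in the two lists above, a small script saves typing the device paths each time. Again this is a sketch under assumptions: adb is on your PATH, the helper names are hypothetical, and the destination that you pass for the Briefcase data directory is whatever you use locally. Depending on the adb version, a pulled directory may arrive either as an instances subdirectory or as its contents, so check where the files land.

```python
import os
import posixpath
import subprocess

FORMS_DIR = "/sdcard/odk/forms"          # blank forms (designs) on the device
INSTANCES_DIR = "/sdcard/odk/instances"  # completed forms (data) on the device


def push_form_command(local_form):
    """Command that copies one blank form onto the device."""
    remote = posixpath.join(FORMS_DIR, os.path.basename(local_form))
    return ["adb", "push", local_form, remote]


def pull_instances_command(destination_dir):
    """Command that copies all completed forms off the device."""
    return ["adb", "pull", INSTANCES_DIR, destination_dir]


def run(cmd):
    """Run an adb command; raises CalledProcessError if adb reports failure."""
    subprocess.run(cmd, check=True)
```

For example, run(push_form_command("zambia-ranq-round3.xml")) would put the compiled form where Collect looks for it.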

Since the Aggregate server is on the Internet, this method requires that the device have Internet access. So it needs either a valid SIM card installed, with credit and a data bundle, or a connection to a WiFi network. We had many problems with using SIM cards for data, so WiFi is preferred if possible.

The directories mentioned above will not exist until ODK Collect is installed on the device and run for the first time. Forms downloaded from the Aggregate server will also be placed in the /sdcard/odk/forms directory. Forms completed on the device will be placed in the /sdcard/odk/instances directory.

Configuring ODK Collect

Collect needs to know the details of the ODK Aggregate server to log into it, download blank forms and upload completed forms.

Open the ODK Collect application, press the Settings button and click on Change Settings. Click on URL and enter https://partimob.appspot.com. Similarly, complete the Username and Password using the details that you've been given by the Aggregate server operator, or the account that you've created on the Aggregate server. This account should only have Data Collector permissions, no more. Press the Back key to get back to the main menu of ODK Collect.

Downloading forms using ODK Collect

Open ODK Collect on the device, and click on the Get Blank Form button. Collect will try to log into the Aggregate server using the details that you've provided, and get a list of forms on the server that have the Downloadable box ticked. This is on by default for newly uploaded forms.

Tick the box next to all the forms that you want to download, and click on the Get Selected button.

Filling forms on the device

Open ODK Collect on the device, and click on the Fill Blank Form button. All the forms in the device's /sdcard/odk/forms directory should be listed. Choose the form that you want to complete.

You will see an introductory screen showing how to move between questions by swiping your finger across the screen, from right to left or left to right. This screen has a text box at the bottom, which you can use to name the form that you're completing. Naming forms is useful if your data collection is interrupted and you need to resume it later. It's much easier to identify the form using its name, rather than opening it and flicking through to find some identifying information. You might name the form based on the household code that you're surveying.

Depending on your answers to some questions, others may be hidden, or their text might change.

At the end of the form there is another chance to Name this form, and a tickbox to Mark form as finalized. Before you can upload the form to the Aggregate server, this box must be ticked, and you must press the Save Form and Exit button. Otherwise Collect will consider that the form is incomplete.

Sending completed forms to Aggregate

Open ODK Collect on the device, and click on the Send Finalized Form button on the main menu. Tick the box next to all the forms that you want to upload to Aggregate, and click on Send Selected. After the upload is complete, you should see the Upload Results message. Every form should have "Success" next to it; any form without it was not sent successfully.

Downloading forms using Briefcase

We are using a customised version of ODK Briefcase with the following changes:
  • Fix the export of repeated groups, which before only worked for the first row (issue 461).
  • Shorten exported column names, to allow the CSV file to be imported into Access.
  • Allow the server name, username and password to be provided on the command line (or via a shortcut).
You can find the source code here and the pre-compiled version here, as an executable JAR file. You can also find it in the ZIP file download. If you make changes to the source and want to build the executable JAR again, install Maven and use the mvn package command.

To download the completed forms, open Briefcase by double-clicking on the briefcase-1.0-jar-with-dependencies.jar file. On the Transfer tab, click on the Connect button. For the URL, enter https://partimob.appspot.com, and for the user name and password, give the details of an ODK Aggregate account with Data Viewer permissions.

Then you should see a list of forms appear under the heading Forms to Transfer. Tick the box next to the one that your users have been completing, and then click on the Transfer button. If you do this after all the completed forms (data) have been submitted to the ODK Aggregate server, you will not need to do it again for that form template (design).

Now switch to the Transform tab and see if the form appears in the Form list. If it doesn't, then exit and restart the Briefcase application (issue 464).

For Output Type, choose .csv and media files. For Output Directory, choose the directory where you'd like to save the CSV files. Note that any previous files exported to that directory from the same form will be overwritten without warning, even if they have been modified (cleaned). Click on the Output button to write the CSV files.

Cleaning data in Excel

You can find the Excel spreadsheet that we use for data storage and cleaning here. Note that Excel is a long way from the best way to store and manipulate data like this. Microsoft Access would be far more appropriate. Yet again I wish there was a sufficiently powerful open source alternative desktop application to Access, allowing ordinary people (not developers) to develop and maintain their databases themselves.

Because the spreadsheet contains cleaned data, which is "better" than the raw data which is included in the CSV export, we don't want to overwrite existing rows. For the main section of the questionnaire (the so-called Single Responses) you can include only the new data like this:

  • Open the main spreadsheet and switch to the Single Responses tab
  • Highlight all rows from 3 down to the bottom, and Sort them by the SubmissionDate column.
  • Note the last submission date on this spreadsheet.
  • Open the newly exported CSV file for the single responses (something like RANQ-2011-Round-4-v5.csv).
  • Sort this file by the SubmissionDate column as well.
  • Highlight and copy all the rows whose submission date is later (more recent) than the last one in the main spreadsheet.
  • Paste them at the bottom of the Single Responses tab of the main spreadsheet, below the other data.
For the other tables, this process needs to be done completely manually at present.
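
The "copy only the newer rows" step above could also be done by a short script instead of by hand. This is a sketch of the idea rather than something we used in the field. The date format is an assumption (check what your exported CSV actually contains and adjust DATE_FORMAT), and the HouseholdCode column in the sample data is hypothetical.

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumption: adjust to match the export


def rows_newer_than(csv_file, last_seen, date_column="SubmissionDate",
                    date_format=DATE_FORMAT):
    """Return the rows submitted after last_seen, oldest first."""
    def submitted(row):
        return datetime.strptime(row[date_column], date_format)

    reader = csv.DictReader(csv_file)
    new_rows = [row for row in reader if submitted(row) > last_seen]
    new_rows.sort(key=submitted)
    return new_rows


# Made-up sample standing in for the exported single-responses CSV
sample = io.StringIO(
    "SubmissionDate,HouseholdCode\n"
    "2011-11-02 09:15:00,H001\n"
    "2011-11-04 16:40:00,H003\n"
    "2011-11-03 11:30:00,H002\n"
)
# The last submission date already present in the main spreadsheet
last_in_spreadsheet = datetime(2011, 11, 2, 12, 0, 0)
new_rows = rows_newer_than(sample, last_in_spreadsheet)
print([row["HouseholdCode"] for row in new_rows])  # ['H002', 'H003']
```

The rows returned are the ones that you would paste at the bottom of the Single Responses tab. Rows that are already in the spreadsheet are left alone, so cleaned data is not overwritten.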

You can then check and clean the data by viewing and modifying it in Excel. Note that each sheet has one or two columns at the end, which are filled by formulae that look up values from the Single Responses sheet, such as the Household Code.

Using the Android x86 Emulator

To be written.

Lessons learned

Project Goals

The actual aim of the project was never clear, because all the stakeholders wanted different things. But if it was to help our partners work more efficiently, then we could have attacked other parts of the process that would have yielded bigger improvements more quickly, such as helping with the output processing (data analysis and report writing) rather than data collection. If the project had incorporated a systems analyst early on, we would have identified and addressed these needs better.

Nobody wanted to be the product owner, to take responsibility for setting priorities for the project. We normally refuse to do this ourselves, because we see our role as assisting someone else to achieve their goals, and that person will need to maintain ownership of the project after our work is done. But in this case we had no choice but to become the product owners, because we couldn't function without one.

Development Process

We found that workflow elicitation was useful for generating user stories in the agile sense, but we did not have a clear map of the workflow during the development process.

During development we collectively discounted the need for data cleaning, because we thought, incorrectly, that all the errors were introduced during transcription. Data cleaning might have been less necessary if we had collected more data, since the errors would tend to average out. But it was still essential, it was not planned for, and there was no workflow to make it happen, so we had to hack something together in the field.

Procurement

We had difficulty actually purchasing equipment in country. Samsung Zambia could not accept credit card payments over the phone. In the end, we had to bring large amounts of cash into Zambia and change it locally. This resulted in late and risky procurement of the equipment.

We received disappointing service from the retailer of the tablets, including repeated failure to deliver the tablets to their shop for purchase despite prior agreements and assurances, and supplying us with devices in unsealed boxes. We suspect that some of the tablets had been used as demonstration models in shops, or returned by other customers.

The supplier's warranty was only one week, after which we would have to return the tablets to Samsung in China.

In future we would:

  • Ensure equipment is new (in sealed boxes)
  • Have a trusted in-country agent pick up and test the equipment -- this agent must take responsibility for functioning goods
  • Ensure that equipment is in warranty, and if it fails tests it must be returned

We expected that Samsung would supply high quality, reliable tablets at a high price. We were disappointed with their reliability and performance. We could have spent much less money on the tablets for a similar level of performance, and had more spares.

Hardware

We had only a limited number of hardware devices, and our user experience developer did not even have access to an Android phone or an emulator.

Two tablets had hardware problems: short battery life and failure to connect to the mobile network. These were discovered in the field, too late to return the devices to the supplier for replacement.

We had reports from enumerators that the touch screen became less sensitive as the battery drained.

Most tablets failed to submit data over the wireless network at all. We had many problems with data communications in Mufulira, which we expected, but we thought we would at least be able to submit small amounts of data wirelessly, and this turned out not to work.

Survey Instrument

The survey instrument itself was unclear, complex, and not designed for electronic use. The wording of the questions was unclear and we had many arguments over it. Some questions would have benefited from having a calculator embedded, to help with subtraction and conversion of units, such as calculating change in land area owned by a household.

We had many problems with the form logic, in particular skipping questions depending on the answers to previous questions. This was very difficult to get right, and took us a long time, especially when using XLS2XForm to input the survey. It would have been easier if we could have visualised the flows through the form. We suspect that we did not have time to provide enough training for the local staff to be comfortable maintaining the survey in the XLS2XForm spreadsheet by themselves.

ODK Software

We had problems with ODK Collect crashing many times while enumerators were entering data. This always resulted in loss of the completed survey, unlike paper forms. We debugged and patched several bugs in Collect in the field. This required us to have an Android and ODK development environment already set up, because there was no way to download that software in Mufulira. At least we were able to fix the bugs, as this was open source software; some other products would have left us powerless.

Enumerators reported that Collect would sometimes register a different response than intended. Perhaps touching the screen in the wrong place, between questions, or an unclear/lazy touch, might activate the wrong answer. I noticed several times that Collect did not register an answer at all, and I found myself repeatedly jabbing at the screen in frustration. I put this down to hardware problems with the tablets.

For some questions, a grid with three columns (Question, Yes and No checkboxes) would have been a faster way to enter data than repeatedly touching an answer and then dragging left to flip pages. This was not a component that we had available to us in Collect.

Data Analysis

We had intended to load the survey output into Microsoft Access for data processing and storage for future use. However we were not able to make the data load into Access successfully. ODK Briefcase outputs CSV files with field names (column headings) too long for Access to import. Importing 20 separate CSV files into Access was very painful. I strongly recommend building much better integration between ODK and Access in future. We have an open question as to whether Access is the best data storage and management solution, but it would fit the needs and fit within the comfort zone of the project members who would be carrying this process forward in future.
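
To illustrate the column heading problem: Access limits field names to 64 characters. One way to make headings fit is sketched below. This is not the code from our customised Briefcase, just an illustration of the idea. Keeping the end of each name rather than the beginning is an assumption, based on the question name coming last in a nested heading, and the function name is hypothetical.

```python
ACCESS_MAX_FIELD_NAME = 64  # maximum length of a field name in Access


def shorten_headers(headers, limit=ACCESS_MAX_FIELD_NAME):
    """Cut each heading to at most `limit` characters, keeping them unique."""
    seen = set()
    result = []
    for header in headers:
        # Keep the tail of the name (assumed to hold the question name)
        candidate = header[-limit:]
        counter = 1
        # Access ignores case in field names, so compare in lower case
        while candidate.lower() in seen:
            suffix = "_%d" % counter
            candidate = header[-(limit - len(suffix)):] + suffix
            counter += 1
        seen.add(candidate.lower())
        result.append(candidate)
    return result
```

Headings that are already short enough pass through unchanged, and two long headings that end the same way get numbered suffixes so that they do not collide.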

We identified the need to check that we could import multiple spreadsheets into SPSS during the development phase, but we ran out of time and did not actually complete this task. When we got around to processing the data, we discovered that it was not possible, which seriously disrupted our assumptions and plans for the data analysis.

In addition, our backup plan to process the data and produce graphs using Google Fusion Tables also failed, because Aggregate was unable to export the data successfully to Fusion Tables, and editing (cleaning) the data online turned out to be too difficult (complex, inconvenient and awkward).

Positive Outcomes

All the enumerators reported that they would prefer to use tablets rather than paper forms in future.

We also realised that interviewing people can create social change by encouraging them to think in new directions, and question previous assumptions about how things worked. Being listened to is also empowering.

One of our enumerators took photos of the people he interviewed, using the tablet camera, without even being asked! Well done that man. I think the photos were one of the most valuable outputs from the project, more than the data collected. Politicians, like all of us, can relate to photos; they tell a much more powerful and personal story than numbers.

I would really like to see technology used to empower local people to reach out to their politicians and hold them to account.

May 12, 2012, 9:38 a.m. - GOwin

Thanks for sharing this information. Looking forward to reading the lessons learned section soon. Could you elaborate on what you meant by "Yet again I wish there was a sufficiently powerful open source alternative" when you explained your use of Excel? There's Libre Office Calc and Gnumeric which could easily handle CSVs as well as Excel. And there's SQLite (implemented using C) or Libre Office Base's Java implementation: hsqldb/hypersqldb if you want a *proper* database, even on a desktop. Would love to hear more about your use case and why these alternatives are not sufficient?

May 12, 2012, 11:22 a.m. - Chris Wilson

Hi GOwin, thanks for commenting! I have written the "lessons learned" section now. I was talking about Access, not Excel, when asking for open source alternatives. I have updated the blog post to make that clearer. I know there are good open source SQL databases, but I don't consider any available front-end application good enough to allow non-developers to maintain their own database, in the same way that Access does. Do you have a suggestion that has a nice GUI suitable for Office and SPSS users to be able to manage themselves?

May 19, 2012, 6:23 a.m. - GOwin

Hi Chris. Thanks for updating your site. Again, I mentioned there's Base in LibreOffice which is a direct replacement for Access. Of particular interest is the sub-section on ODK software re crashes and registering a different answer to the one selected. Has this severely affected the accuracy of your data? How was this rectified and at what point? SPSS is *expensive*. I've tried R before, and the learning curve is steep (started from scratch, no previous experience in SPSS) but have recently discovered SOFA: Statistics Open For All, a multi-platform and open source statistical package promoted as a potential replacement for SPSS (depends on use case). Check the features at their website: http://www.sofastatistics.com/features.php

May 21, 2012, 2:18 p.m. - Chris Wilson

Hi GOwin, I believe that Base is nothing like as full-featured as Access. I remember trying it a few years ago and struggling to build even the most simple application. Many things that should work do not, and I think programmability is severely lacking compared to Access. I don't have time to do a feature-for-feature comparison, but I will quote from the discussion at http://user.services.openoffice.org/en/forum/viewtopic.php?t=14060:
Access can create single-file databases from scratch by using the MS JET engine. Base pretends to do the same thing when it wraps HSQLDB into a zip-archive. (that means that the performance is abysmal). Base is hardly more than a bridge to import data sources regardless of file formats into office documents. Reporting is based on the Writer component. All input forms are attached to office documents. Access has a strong focus on ease-of-use for the database developer. This is almost non-existent in Base. When it comes to developing a new database or input forms for existing databases, Base is best used by some SQL-literable developer able to avoid all of it's graphical tools.
OOo-Base does not have the funcionality that you require, such as interlinking tables and forms as per your description and particularly calculating fields with automatic updating of the corresponding table cell. Even calculating between form fields is not as easy as in FM (if it can be done at all); I guess you'd need to write a macro for that. I know, maybe these functions are there, but even after looking around, I did not find them... I came to this thread looking for information on the possibility of importing/converting FileMaker databases into/to OOo-Base... so far I've found nothing directly useable and have come to the conclusion better to stay with FM.
Obviously Base is a lightweight tool compared with Access and others, but for my purposes it's perfect. The developers did not add too many useful features since version 2.0.
ODK crashes didn't appear to cause data accuracy problems for us, but only because the enumerators were sufficiently patient to complete the survey all over again when it did happen. I don't know about registering incorrect answers, as I haven't been involved with the data cleaning that wasn't supposed to be necessary. I did patch some crashing bugs in the field, and reported the problems and fixes to the ODK developers, so they should be fixed in future release versions. Thanks for the recommendation of SOFA, I will have to try that out. I really dislike SPSS, and yes it is expensive and no replacement for a real database, queries and reports.

May 25, 2012, 4:47 a.m. - GOwin

Hi Chris. I'm not sure what you mean by "full-featured" but this side-by-side comparison here doesn't lead to that conclusion: http://database-management-systems.findthebest.com/compare/15-24/HSQLDB-vs-Access I strongly recommend LO/OOo in orgs where piracy is an issue (often without the money to go legit) and there are local Linux/OSS champions who can support them through the transition. And in cases where we are starting database training from scratch, the learning curve is the same as Access, and in most cases where the features that are needed are there. This of course is not a recommendation against Access. Whatever will work out best for the client's needs gets the vote, and not always my personal software philosophy. I look forward to reading your experience about SOFA. Cheers

May 25, 2012, 11:08 a.m. - Chris Wilson

Hi GOwin, I'm afraid that you are not comparing apples with apples. I have no issue with HSQLDB's feature set, however it is NOT Access. The largest difference is summed up in that single feature: "X GUI" (no GUI). Access allows non-experts to manage and maintain their data. HSQLDB does not, since it requires an in-depth knowledge of SQL. Open/LibreOffice Base, for all its merits, is lacking many useful features compared to Access. I do not argue against the benefits of using free/libre software, but please try to build a real (not toy) application in Access and LibreOffice side by side and you'll discover for yourself. I don't really want to debate this much more on the blog, since it feels that we're arguing at cross purposes. Cheers, Chris.