Data Transfer

ODK Aggregate supports three primary mechanisms for data transfer:

  • Exporting - one time snapshot
  • Publishing - continuous streaming of submissions
  • APIs - programmatic access to the data

Since ODK supports complex data structures such as question grouping, repeating questions, and multimedia, compromises have been made for each mechanism in regards to these data structures. This document explains each mechanism and what each supports.

The document also touches on other important aspects of data transfer:

Exporting data

The easiest way to get data from Aggregate is by using its ‘Export’ feature. Export allows the user to manually export all of the data (or filtered data) at any time into one of the following formats:

CSV (comma-separated values)

CSV files are a text only, tabular representation of the data. Multimedia files are represented by including standard web links to the files. Repeats are also represented with links to the underlying data. Grouping information is not preserved. Metadata is only preserved if a filter is created with the metadata.

To download CSV files of forms with repeats, consider using ODK Briefcase instead of ODK Aggregate's export functionality. Instead of repeats that are represented with links to the underlying data, ODK Briefcase will export a set of CSV files, one for each repeating group. ODK Briefcase will also export any multimedia as files (e.g., pictures will get exported as JPEGs). The only metadata that is preserved is the submission date in the CSV is preserved, but the XML files that ODK Briefcase downloads has all form metadata.

Note: To import CSVs into Excel, you cannot download and open in one step; nor can you double-click on the CSV. You must open Excel and choose Import. If you are asked, the file origin or encoding is UTF-8.

JSON

JSON is a text only representation of the data in a key:value format. Multimedia files are represented by including standard web links to the files. Repeating data is preserved, but grouping information not related to repeats is not. Metadata is only preserved if a filter is created with the metadata.

KML (Keyhole Markup Language)

KML is a text only representation of the data similar to XML, but used for mapping applications like Google Earth. Multimedia files are represented by including standard web links to the files hosted on the server. Pictures will appear embedded in popup windows (when pushpins are clicked) in programs that render .kml files. Repeats and grouping information is not preserved. Metadata is only preserved if a filter is created with the metadata.

Export data summary

Format Groups Repeats Multimedia Metadata
CSV (Aggregate) not preserved links to links to preservable
CSV (Briefcase) not preserved split into separate CSVs exported as files only submission date
JSON not preserved preserved links to preservable
KML not preserved not preserved links to preservable

Publishing data

Aggregate provides mechanism for either bulk publishing data to another service, or for streaming incoming data to a service as it is received, or both. Aggregate currently supports publishing data to the following services or in the following formats:

Please note that under certain failure conditions, the downstream service can receive multiple copies of a given submission. This is known, expected, behavior. It is your responsibility to detect and eliminate these duplicates should they occur (they will always have exactly the same information in all fields). Duplicates typically occur if the downstream service is slow to respond or acknowledge a request. If ODK Aggregate aborts its publishing attempt before it gets an acknowledgement, it will re-send the submission a short time later. If the downstream server successfully processed the first request, the re-send of that same submission can cause a duplicate record of it to appear in the downstream system. The instance ID of the submission will appear as the metainstanceid column in Google Sheets and as the *meta-instance-id* column in Google Fusion tables.

When the downstream server fails to respond or responds with an error code, ODK Aggregate first delays a re-send for 60 seconds, and, if that also fails, it then backs off its publishing attempts, and will re-send at either 7.5-minute or 15-minute intervals until the downstream service successfully responds, after which ODK Aggregate will resume its normal publishing behavior.

Google Fusion Tables

Fusion Tables is an experimental data visualization web application to gather, visualize, and share larger data tables. Multimedia files are represented by including standard web links to the files. Repeating groups of questions are split into separate tables. All metadata is preserved. Fusion Tables is hosted in Google’s cloud infrastructure. We know of no row limit, but one may exist. Fusion Tables also has an API that can be used to export/publish data.

Non-repeating grouping information is not preserved! This implies that if your form has a non-repeating group "purchaser" and a second non-repeating group "supplier" and within each of these groups, you have a "name" field, then when you publish to Fusion Tables, there will be two columns called "name." Unfortunately, that badly confuses Fusion Tables, and generally results in data not transferring successfully into Fusion Tables. The solution is to design your forms with field names like "purchaser_name" and "supplier_name"

Google Spreadsheets

Spreadsheets is Google’s cloud-hosted spreadsheet solution. Multimedia files are represented by including standard web links to the files. Repeats are also represented with links to the underlying data. All metadata is preserved. Spreadsheets has a 400,000 cell limit, above which errors will be reported. Spreadsheets also has an API that can be used to export/publish data.

Non-repeating grouping information is not preserved! This implies that if your form has a non-repeating group "purchaser" and a second non-repeating group "supplier" and within each of these groups, you have a "name" field, then when you publish to Google Spreadsheets, there will be two columns called "name." Unfortunately, that badly confuses Google Spreadsheets, and generally results in data not transferring successfully into Google Spreadsheets. The solution is to design your forms with field names like "purchaser_name" and "supplier_name"

JSON server

JSON preserves grouping and repeat structures. The user can choose to let multimedia files be represented as web links or embedded as base64 encoded strings. All metadata is preserved.

More at Aggregate Publishers Implementation Details

Google Maps Engine

Maps Engine is a cloud-hosted service to easily create and share maps. Multimedia files are represented by including standard web links to the files. Grouping information is preserved in the variable names. Repeating groups are forbidden and submissions must have at least one geopoint. Maps Engine also has an API that can be used to export/publish data.

More at Google MapsEngine Integration Instructions

RedCap Server

RedCap servers ignore grouping and repeating information. Multimedia files are sent as binary form elements in POST (currently broken, see note in information page).

More at Aggregate Publishers Implementation Details

Ohmag JSON server

Ohmag JSON preserves grouping and repeat structures. Multimedia files are sent as binary form elements in POST.

More at Aggregate Publishers Implementation Details

Publish data summary

Service Groups Repeats Multimedia Metadata
Google Fusion Tables not preserved split into separate tables links to preserved
Google Spreadsheets not preserved links to links to preserved
JSON server preserved preserved links to or embedded as base64 preserved
Google Maps Engine preserved forbidden links to not preserved
RedCap Server not preserved not preserved binary elements in POST
Ohmag JSON Server preserved preserved binary elements in POST

APIs

ODK has public APIs defined for sending data to and from its various components. This section gives references to those API implementations that can be used for connecting new applications to the ODK ecosystem.

Briefcase CLI

ODK Briefcase 1.4.4 introduces a new Command Line Interface (CLI) to automate downloading forms from ODK Aggregate (or ODK Collect) and exporting the forms to an Excel-compatible format like CSV.

To run the CLI, go to your command line (sometimes known as the Terminal), and run the following.

java -jar ODK\ Briefcase\ v1.4.4\ Production.jar --help;

Briefcase-Aggregate API

ODK Briefcase uses an API that external applications can use to pull all data from and push data to ODK Aggregate. The API is documented at Briefcase Aggregate API.

Collect-Aggregate API

Collect and Aggregate define APIs for pulling form information from Aggregate to Collect, and for sending collected form data from Collect to Aggregate.

https://bitbucket.org/javarosa/javarosa/wiki/OpenRosaAPI

https://bitbucket.org/javarosa/javarosa/wiki/FormListAPI

https://bitbucket.org/javarosa/javarosa/wiki/FormSubmissionAPI

Direct database connection

If you are running on a Tomcat server (not Google App Engine), you have access to the underlying MySQL or PostgreSQL tables. The structure of those tables are documented at Aggregate Database Structure.

Note this is a fragile way to pull data from Aggregate because table structure could change between versions. Moreover, changing any of this data could corrupt your Aggregate install.

Accessing Media

Many of the export and publishing options provide a URL to the media (image, audio or video) without providing the content itself. To enable the viewing (following) of this link without requiring a log-in:

  1. Go to the 'Site Admin / Permissions' sub-tab
  2. Check the checkbox for:
    "Allow anonymous retrieval of images, audio and video data (needed for GoogleEarth balloon displays)"
  3. Click the 'Save Changes' button

This checkbox is below the list of users and permissions, but above the 'Save Changes' button
(and remember to click that button after you check the checkbox).

This allows anyone to view the media files on your server. Even though you are granting anyone access to this information, it is still quite secure because the users would need to have a valid URL.