ODK Aggregate supports three primary mechanisms for data transfer:
- Exporting - one time snapshot
- Publishing - continuous streaming of submissions
- APIs - programmatic access to the data
Since ODK supports complex data structures such as question grouping, repeating questions, and multimedia, compromises have been made for each mechanism in regards to these data structures. This document explains each mechanism and what each supports.
The document also touches on other important aspects of data transfer:
- Accessing Media - how to enable access to media via URLs
The easiest way to get data from Aggregate is by using its ‘Export’ feature. Export allows the user to manually export all of the data (or filtered data) at any time into one of the following formats:
CSV files are a text only, tabular representation of the data. Multimedia files are represented by including standard web links to the files. Repeats are also represented with links to the underlying data. Grouping information is not preserved. Metadata is only preserved if a filter is created with the metadata.
To download CSV files of forms with repeats, consider using ODK Briefcase instead of ODK Aggregate's export functionality. Instead of repeats that are represented with links to the underlying data, ODK Briefcase will export a set of CSV files, one for each repeating group. ODK Briefcase will also export any multimedia as files (e.g., pictures will get exported as JPEGs). The only metadata that is preserved is the submission date in the CSV is preserved, but the XML files that ODK Briefcase downloads has all form metadata.
Note: To import CSVs into Excel, you cannot download and open in one step; nor can you double-click on the CSV. You must open Excel and choose Import. If you are asked, the file origin or encoding is UTF-8.
JSON is a text only representation of the data in a key:value format. Multimedia files are represented by including standard web links to the files. Repeating data is preserved, but grouping information not related to repeats is not. Metadata is only preserved if a filter is created with the metadata.
KML is a text only representation of the data similar to XML, but used for mapping applications like Google Earth. Multimedia files are represented by including standard web links to the files hosted on the server. Pictures will appear embedded in popup windows (when pushpins are clicked) in programs that render .kml files. Repeats and grouping information is not preserved. Metadata is only preserved if a filter is created with the metadata.
|CSV (Aggregate)||not preserved||links to||links to||preservable|
|CSV (Briefcase)||not preserved||split into separate CSVs||exported as files||only submission date|
|JSON||not preserved||preserved||links to||preservable|
|KML||not preserved||not preserved||links to||preservable|
Aggregate provides mechanism for either bulk publishing data to another service, or for streaming incoming data to a service as it is received, or both. Aggregate currently supports publishing data to the following services or in the following formats:
Please note that under certain failure conditions, the downstream service can receive multiple copies of a given submission. This is known, expected, behavior. It is your responsibility to detect and eliminate these duplicates should they occur (they will always have exactly the same information in all fields). Duplicates typically occur if the downstream service is slow to respond or acknowledge a request. If ODK Aggregate aborts its publishing attempt before it gets an acknowledgement, it will re-send the submission a short time later. If the downstream server successfully processed the first request, the re-send of that same submission can cause a duplicate record of it to appear in the downstream system. The instance ID of the submission will appear as the metainstanceid column in Google Sheets and as the *meta-instance-id* column in Google Fusion tables.
When the downstream server fails to respond or responds with an error code, ODK Aggregate first delays a re-send for 60 seconds, and, if that also fails, it then backs off its publishing attempts, and will re-send at either 7.5-minute or 15-minute intervals until the downstream service successfully responds, after which ODK Aggregate will resume its normal publishing behavior.
Fusion Tables is an experimental data visualization web application to gather, visualize, and share larger data tables. Multimedia files are represented by including standard web links to the files. Repeating groups of questions are split into separate tables. All metadata is preserved. Fusion Tables is hosted in Google’s cloud infrastructure. We know of no row limit, but one may exist. Fusion Tables also has an API that can be used to export/publish data.
Non-repeating grouping information is not preserved! This implies that if your form has a non-repeating group "purchaser" and a second non-repeating group "supplier" and within each of these groups, you have a "name" field, then when you publish to Fusion Tables, there will be two columns called "name." Unfortunately, that badly confuses Fusion Tables, and generally results in data not transferring successfully into Fusion Tables. The solution is to design your forms with field names like "purchaser_name" and "supplier_name"
Spreadsheets is Google’s cloud-hosted spreadsheet solution. Multimedia files are represented by including standard web links to the files. Repeats are also represented with links to the underlying data. All metadata is preserved. Spreadsheets has a 400,000 cell limit, above which errors will be reported. Spreadsheets also has an API that can be used to export/publish data.
Non-repeating grouping information is not preserved! This implies that if your form has a non-repeating group "purchaser" and a second non-repeating group "supplier" and within each of these groups, you have a "name" field, then when you publish to Google Spreadsheets, there will be two columns called "name." Unfortunately, that badly confuses Google Spreadsheets, and generally results in data not transferring successfully into Google Spreadsheets. The solution is to design your forms with field names like "purchaser_name" and "supplier_name"
JSON preserves grouping and repeat structures. The user can choose to let multimedia files be represented as web links or embedded as base64 encoded strings. All metadata is preserved.
Maps Engine is a cloud-hosted service to easily create and share maps. Multimedia files are represented by including standard web links to the files. Grouping information is preserved in the variable names. Repeating groups are forbidden and submissions must have at least one geopoint. Maps Engine also has an API that can be used to export/publish data.
RedCap servers ignore grouping and repeating information. Multimedia files are sent as binary form elements in POST (currently broken, see note in information page).
Ohmag JSON preserves grouping and repeat structures. Multimedia files are sent as binary form elements in POST.
|Google Fusion Tables||not preserved||split into separate tables||links to||preserved|
|Google Spreadsheets||not preserved||links to||links to||preserved|
|JSON server||preserved||preserved||links to or embedded as base64||preserved|
|Google Maps Engine||preserved||forbidden||links to||not preserved|
|RedCap Server||not preserved||not preserved||binary elements in POST|
|Ohmag JSON Server||preserved||preserved||binary elements in POST|
ODK has public APIs defined for sending data to and from its various components. This section gives references to those API implementations that can be used for connecting new applications to the ODK ecosystem.
ODK Briefcase 1.4.4 introduces a new Command Line Interface (CLI) to automate downloading forms from ODK Aggregate (or ODK Collect) and exporting the forms to an Excel-compatible format like CSV.
To run the CLI, go to your command line (sometimes known as the Terminal), and run the following.
java -jar ODK\ Briefcase\ v1.4.4\ Production.jar --help;
ODK Briefcase uses an API that external applications can use to pull all data from and push data to ODK Aggregate. The API is documented at Briefcase Aggregate API.
Collect and Aggregate define APIs for pulling form information from Aggregate to Collect, and for sending collected form data from Collect to Aggregate.
If you are running on a Tomcat server (not Google App Engine), you have access to the underlying MySQL or PostgreSQL tables. The structure of those tables are documented at Aggregate Database Structure.
Note this is a fragile way to pull data from Aggregate because table structure could change between versions. Moreover, changing any of this data could corrupt your Aggregate install.
Many of the export and publishing options provide a URL to the media (image, audio or video) without providing the content itself. To enable the viewing (following) of this link without requiring a log-in:
- Go to the 'Site Admin / Permissions' sub-tab
- Check the checkbox for:
"Allow anonymous retrieval of images, audio and video data (needed for GoogleEarth balloon displays)"
- Click the 'Save Changes' button
This checkbox is below the list of users and permissions, but above the 'Save Changes' button
(and remember to click that button after you check the checkbox).
This allows anyone to view the media files on your server. Even though you are granting anyone access to this information, it is still quite secure because the users would need to have a valid URL.