Data Preloading

Beginning with ODK Collect 1.4.3, you can pre-load data into a new submission.

If you have pre-existing data from a baseline survey, a listing survey, or any other source, you may want to reference that data in your survey form. For example, you might already have data on a household's agricultural plots when you go to collect follow-up data about those plots.

When you want to reference pre-existing data in a survey form, the first step is to upload one or more .csv files as support files when you upload your form definition (the same way you upload media support files in the Form Upload window). The first row of each .csv file should be a header that includes short, unique names for each column; subsequent rows should contain the data itself.

Columns in your .csv files must be separated by commas. Depending on your regional settings, Excel may save .csv files with semi-colons or other characters between columns, instead of commas; if that's the case, you will need to manually replace the column separators with commas, or save your data using a different program.

Each .csv file should contain at least one column that can be used to uniquely identify each row. Such columns will be used, at survey time, to look up which row's data to pull into the survey. For the columns that will be used for looking up rows, you should add "_key" to the end of the column name in the first row (as in "hhid_key"). Any columns with names ending in "_key" will be indexed for faster look-ups on your survey devices.

Once you have included one or more .csv files with your form, you can pull data from those .csv files at survey time. For each data field that you want to pull into your survey, you add a calculate field to your survey. You give that field a name, then in its calculation column, you call the pulldata() function, indicating which field to pull from which row of which .csv file. Here is an example:

pulldata('hhplotdata', 'plot1size', 'hhid_key', ${hhid})

This example will pull the value from the plot1size column of the hhplotdata.csv file, using the form's hhid field to identify the appropriate row in the .csv file's hhid_key column.

Once you have loaded .csv data into a survey field using the pulldata() function, you can reference that field in later relevance conditions, constraints, and labels, just as you would reference any other field that was filled in by the user.

For an example, see the pre-loading sample form here.

There are just four additional notes on usage of pre-loaded data:

1. If your .csv file is very large, you can compress it into a .zip archive before uploading it. Just compress one or more of your form's support files into a .zip file, and upload that .zip file as you would normally upload the .csv or other support files. When your form is loaded in Collect, the support files will be automatically unzipped.

2. If your pre-loaded data contains non-English fonts or special characters, you will need to save your .csv file in UTF-8 format in order for your Android device to render the text correctly. If you cannot directly save or export your .csv file in UTF-8 format, you can use ODK Briefcase to re-encode it; just choose Re-encode .csv file from the Tools menu, select your file and the current encoding for which your file's text appears correctly in the preview window, and then save the re-encoded .csv. UTF-8-encoded .csv text should appear properly on your Android device, provided that your device supports the font or language that you're using.

3. Even when they are numbers, data fields pulled from a .csv file are considered to be text strings. Thus, you may sometimes need to use the int() or number() functions to convert a pre-loaded field into numeric form. For example, it is necessary to use the int() function to convert a pre-loaded field into an integer before using it as a repeat_count (as in "int(${nplots})"); this is because the repeat_count column always expects an integer, not a string.

4. If your .csv files contain very sensitive data, you may not want to upload that data to your server. Instead, you can upload a blank .csv file as part of your form, then replace it with the real .csv file by hand-copying that file onto each of your devices. Once you've downloaded your form onto each device, just locate the /odk/forms directory on that device's storage, then, within that directory, locate the media subdirectory associated with your form; into that media subdirectory, you can hand-copy your .csv file. That way, sensitive pre-loaded data never passes through the server.

Finally, you can also use pre-loaded .csv data to dynamically populate the choice lists for select_one and select_multiple fields. See the next help topic for details.

Dynamically-Loaded Multiple-Choice Data

How do I dynamically load multiple-choice options from pre-loaded data?

See the previous help topic for details on how to pre-load .csv data with your survey forms. Once your form has one or more pre-loaded .csv files, you can dynamically pull the choice lists for select_one and select_multiple fields from those .csv files.

Multiple-choice fields with dynamic choice lists follow the same general syntax as regular, static select_one and select_multiple fields: you specify "select_one listname" or "select_multiple listname" in the type column (where "listname" is the name of your choice list), you specify any special appearance styles in the appearance column, and you include one or more rows for your listname on the choices worksheet. There are just three differences when the choice list should be pulled from one of your pre-loaded .csv files:

1. In the appearance column, you should include a search() expression that specifies which .csv rows to include in the choice list. If the field should use a non-default appearance style, that goes into the appearance column first, followed by a space, followed by the search() expression (as in "quick search()"). Syntax for the search() expression itself is included below.

2. On the choices worksheet, a row should indicate which .csv columns to use for the label and selected value. In the list_name column, specify the name of your choice list as you normally would. In the name column, however, include the name of the .csv column to use for uniquely identifying selected choices (this is normally an ID column of some sort). In the label column, include the name of the .csv column to use for labeling the choices; if you wish to include multiple columns in the labels, just include a comma-separated list of all columns to include. In essence, the name column will be dynamically populated based on the column name you put there, and the label column will be dynamically populated based on the column name(s) you put there.

3. In your choices worksheet row, you may also include a .csv column name in the image column. If you do, the image filename to use will be pulled from the specified .csv column. If you refer to image files in this way, you must always upload those image files as media file attachments when you upload your form to the server.

For the search() expression itself, there are a series of options to indicate which .csv rows to include in the choice list:

1. "search(csvName)": The single-parameter search expression includes all distinct rows as choices (e.g., "search('hhplotdata')"). All rows in the specified .csv file will be considered as choices, but only distinct rows -- those with unique selection values -- will be listed for the user. In other words, duplicates will be automatically filtered from the list shown to users.

2. "search(csvName, 'contains', columnsToSearch, searchText)": This search expression includes all distinct rows that contain the specified text in the specified column(s) (e.g., "search('hhplotdata', 'contains', 'respondentname', ${nametofind})"). The third parameter specifies either a single column name to search, or a comma-separated list of column names to search. Rows with matches in any specified column will be included.

3. "search(csvName, 'startswith', columnsToSearch, searchText)": This search expression includes all distinct rows that start with the specified text in the specified column(s) (e.g., "search('hhplotdata', 'startswith', 'respondentname', ${nameprefix})"). The third parameter specifies either a single column name to search, or a comma-separated list of column names to search. Rows with matches in any specified column will be included.

4. "search(csvName, 'endswith', columnsToSearch, searchText)": This search expression includes all distinct rows that end with the specified text in the specified column(s) (e.g., "search('hhplotdata', 'endswith', 'respondentname', ${namesuffix})"). The third parameter specifies either a single column name to search, or a comma-separated list of column names to search. Rows with matches in any specified column will be included.

5. "search(csvName, 'matches', columnsToSearch, searchText)": This search expression includes all distinct rows that exactly contain the specified text in the specified column(s) (e.g., "search('hhplotdata', 'matches', 'respondentname', ${nametofind})"). The third parameter specifies either a single column name to search, or a comma-separated list of column names to search. Rows with exact matches in any specified column will be included.

6. "search(csvName, searchType, columnsToSearch, searchText, columnToFilter, filterText)": Finally, any of the four search types above can be further filtered to only include a subset of .csv data. Simply add two extra parameters to any of the search types above, with the first extra parameter being the column name to filter and the second extra parameter being the exact value to filter. For whichever search is specified in the first four parameters, only rows exactly containing the sixth parameter value in the column named by the fifth parameter will be included (e.g., "search('hhplotdata', 'contains', 'respondentname', ${nametofind}, 'villageid', ${villageid})" to list all matching names within a particular village).

For an example, see the search-and-select sample form here.

There are just two additional notes on usage:

1. Choices will be ordered, by default, in the order that they appear in your .csv file. If you want to specify a different ordering, include a numeric column in your .csv file named sortby; choices will be ordered numerically, according to the sortby column (if present).

2. You can include one or more static choice options, in addition to the dynamic ones loaded from your .csv file. Simply include static choices, as you normally would, on the choices worksheet. These can appear before and/or after the row that indicates the columns to use for your dynamic choices. The one restriction is that the values you specify for your static choices in the name column must be numeric.