Data Cleaning
In surveys, interviewers typically enter some data incorrectly, or issues with recipes are discovered only after the interviews have been conducted. Respondent and consumption data will therefore likely need correction as a post-processing step.
Dita implements an interview-set transformer that corrects interview data based on uploaded correction files. These correction files have the following structure:
respondents:
- alias: "EB_9999"
  #...
# and so on ...
foodByName:
- name: "Maiswaffeln"
  sid: "at.gd/2.0:food/02423"
# and so on ...
composites:
- coordinates:
    #...
  rename: # optional: new name or empty or omitted
  deletions:
    #...
  additions:
    #...
# and so on ...
Respondent Correction
Corrects defective data entered during the interview, such as the respondent ID, sex, or date of birth. It also allows removing the interviews of respondents who later wished to withdraw from the survey.
Some examples:
respondents:
- alias: "EB9999"
  newAlias: "EB_9999"

respondents:
- alias: "EB_9999"
  dateOfBirth: 1999-11-22
  sex: MALE

respondents:
- alias: "EB_9999"
  withdraw: true
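How such entries might be applied can be sketched in Python. This is only an illustration under stated assumptions, not Dita's implementation: the `Interview` and `RespondentCorrection` classes and the `apply_respondent_corrections` function are hypothetical, and the sketch assumes at most one correction entry per alias. Only the field names `alias`, `newAlias`, `dateOfBirth`, `sex` and `withdraw` are taken from the correction file format.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Interview:
    """Hypothetical, heavily simplified interview record."""
    alias: str
    date_of_birth: date
    sex: str


@dataclass(frozen=True)
class RespondentCorrection:
    """Mirrors one entry under `respondents:` in a correction file."""
    alias: str
    new_alias: Optional[str] = None
    date_of_birth: Optional[date] = None
    sex: Optional[str] = None
    withdraw: bool = False


def apply_respondent_corrections(interviews, corrections):
    """Return corrected interviews; interviews of withdrawn respondents are dropped."""
    # Assumption: at most one correction per alias (a later one would win here).
    by_alias = {c.alias: c for c in corrections}
    result = []
    for interview in interviews:
        c = by_alias.get(interview.alias)
        if c is None:
            result.append(interview)  # no correction for this respondent
            continue
        if c.withdraw:
            continue  # respondent withdrew: remove the interview
        result.append(replace(
            interview,
            alias=c.new_alias or interview.alias,
            date_of_birth=c.date_of_birth or interview.date_of_birth,
            sex=c.sex or interview.sex,
        ))
    return result
```

Fields that a correction entry omits are left unchanged, which matches the way the examples above each set only the values that need fixing.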
Consumption Correction
Food
Corrects food consumptions that have only a name and no identifier (sid).
foodByName:
- name: "Maiswaffeln"
  sid: "at.gd/2.0:food/02423"
Composite
Correction of composite consumptions supports 3 basic changes:
- Renaming of the composite consumption entry
- ADD Ingredient:
  - requires identifier (sid) of food to add
  - requires amountGrams of food to add
  - requires facets of food to add
- DELETE Ingredient:
  - requires identifier (sid) of food to remove
After those changes are applied, all the ingredient amounts are recalculated such that the composite’s total amount consumed stays the same (as compared to before the correction).
composites:
- coordinates:
    sid: "at.gd/2.0:recp/00514"
    respondentId: "EB_9999"
    interviewOrdinal: 1
    mealHourOfDay: "13:00:00"
    source: "wave1/Interview-12345.xml"
  deletions:
  # DELETE food/02280 Fond, Fleisch {assocRecp=465} 413.56g (82.71%)
  - sid: "at.gd/2.0:food/02280"
  additions:
  # ADD food/01399 Wasser, Leitung 302,72g
  - sid: "at.gd/2.0:food/01399"
    amountGrams: 302.72
    facets: ""
  # ADD food/01581 Streuwürze 6,05g
  - sid: "at.gd/2.0:food/01581"
    amountGrams: 6.05
    facets: ""

composites:
- coordinates:
    sid: "at.gd/2.0:recp/00514"
    respondentId: "EB_9999"
    interviewOrdinal: 1
    mealHourOfDay: "13:00:00"
    source: "wave1/Interview-12345.xml"
  rename: "New Name"
  deletions: []
  additions: []
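The recalculation step, in which ingredient amounts are adjusted so that the composite's total stays the same, can be illustrated with a small Python sketch. It assumes that the recalculation is a uniform proportional rescaling of all remaining and added ingredients. The text above only states that the total is preserved, so the scaling rule, the `Ingredient` class, the `apply_composite_correction` function, the amounts, and the `example:food/other` identifier are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ingredient:
    """Hypothetical ingredient of a composite consumption."""
    sid: str
    amount_grams: float


def apply_composite_correction(ingredients, deletions, additions):
    """Delete and add ingredients, then rescale so the total amount is unchanged.

    deletions: set of sids to remove; additions: list of Ingredient to add.
    Assumption: amounts are rescaled proportionally (not confirmed by the docs).
    """
    total_before = sum(i.amount_grams for i in ingredients)
    kept = [i for i in ingredients if i.sid not in deletions]
    merged = kept + list(additions)
    total_after = sum(i.amount_grams for i in merged)
    if total_after <= 0:
        raise ValueError("no ingredient amount left to rescale")
    scale = total_before / total_after
    return [Ingredient(i.sid, i.amount_grams * scale) for i in merged]


# Made-up amounts: the composite weighs 500 g before the correction.
before = [
    Ingredient("at.gd/2.0:food/02280", 400.0),
    Ingredient("example:food/other", 100.0),  # hypothetical identifier
]
after = apply_composite_correction(
    before,
    deletions={"at.gd/2.0:food/02280"},
    additions=[Ingredient("at.gd/2.0:food/01399", 300.0)],
)
# Remaining 100 g + added 300 g = 400 g, so scale = 500 / 400 = 1.25:
# the amounts become 125 g and 375 g, which again sum to 500 g.
```

Under this assumption, the `amountGrams` values given for additions act as relative weights rather than as final amounts, since they are rescaled together with the remaining ingredients.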
Consumption Identification
Consumption entries have no identifier per se, so we use multiple coordinates to narrow down specific entries:
- sid: SemanticIdentifier of the recipe in question
- respondentId
- interviewOrdinal
- mealHourOfDay
- source: path of the interview source file in question
Special care needs to be taken when uploading new interview data, as this may render those coordinates invalid. It may also render any of the above corrections invalid!
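The idea of locating entries by coordinates can be sketched as follows. This is a minimal Python illustration, not Dita's actual lookup: the `Coordinates` and `Consumption` classes and the `find_matching` function are hypothetical, and the sketch assumes that all five coordinates must match exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinates:
    """The five coordinates listed above (Python-style field names)."""
    sid: str
    respondent_id: str
    interview_ordinal: int
    meal_hour_of_day: str
    source: str


@dataclass(frozen=True)
class Consumption:
    """Hypothetical consumption entry carrying its coordinates."""
    name: str
    coordinates: Coordinates


def find_matching(consumptions, target):
    """Return all consumption entries whose coordinates equal the target."""
    # Dataclass equality compares all five fields at once.
    return [c for c in consumptions if c.coordinates == target]
```

An empty result would indicate that the coordinates no longer point at any entry, for example because the `source` path changed after new interview data was uploaded.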
Working with multiple Correction Files
Multiple correction files can be uploaded, each representing a Correction24 data structure. Dita automatically collects these into a single Correction24 object for interview data post-processing.
Here are some templates:
respondents:
- alias: "EB_9999"
  #...
# and so on ...
foodByName: []
composites: []

respondents: []
foodByName: []
composites:
- coordinates:
    #...
  rename: # optional: new name or empty or omitted
  deletions:
    #...
  additions:
    #...
# and so on ...
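Collecting several correction files into one object could look like the following Python sketch. It assumes that merging simply concatenates the three lists in upload order. How Dita actually resolves overlapping or conflicting entries is not described here, and the `Correction` class is only a hypothetical stand-in for Correction24.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    """Hypothetical stand-in for Correction24 with its three sections."""
    respondents: list = field(default_factory=list)
    food_by_name: list = field(default_factory=list)
    composites: list = field(default_factory=list)


def merge_corrections(corrections):
    """Concatenate the sections of all given corrections into a new object."""
    merged = Correction()
    for c in corrections:
        merged.respondents.extend(c.respondents)
        merged.food_by_name.extend(c.food_by_name)
        merged.composites.extend(c.composites)
    return merged
```

With this approach, a file that only corrects respondents and a file that only corrects composites, as in the two templates above, combine without touching each other's sections.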