Chapter 4 Data cleaning

4.1 VHIO cohort

Enrolled patients completed the questionnaires by hand, and the responses were then entered into a REDCap database. The data were exported from REDCap in an R-compatible format and imported here.
The information was divided into three datasets: demographics, pre-genomic testing, and post-genomic testing.

The demographics dataset is imported from a CSV file.

Next, the pre-genomic testing dataset is imported from an RDS file exported from REDCap. This file was built by combining the raw data and the labels from the database, so it contains both the values and the attributes that improve how the questions are displayed. The first 66 variables are coded as integers and carry their labels as attributes. The second set of variables (67 to 129) contains the same variables coded as factors, with the labels as their values. The variable “pre_conocim_q12_g”, which describes the “other” source of information in question 12, is not present in this second set.
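The two-block layout described above can be illustrated with a minimal sketch in base R. The toy data frame, its column names, its label texts, and the ".factor" suffix are all hypothetical; only the idea of an integer-coded block followed by a factor-coded block comes from the text. In the real dataset the index vectors would be 1:66 and 67:129.

```r
# Toy stand-in for the pre-genomic testing data (all names and values hypothetical).
pre_toy <- data.frame(
  q1 = c(1L, 2L, 1L),
  q2 = c(0L, 1L, 1L)
)

# Integer-coded variables carry the question text as a "label" attribute.
attr(pre_toy$q1, "label") <- "Question 1 (hypothetical wording)"
attr(pre_toy$q2, "label") <- "Question 2 (hypothetical wording)"

# Factor-coded copies of the same variables, with the labels as values.
pre_toy$q1.factor <- factor(pre_toy$q1, levels = c(1, 2), labels = c("Yes", "No"))
pre_toy$q2.factor <- factor(pre_toy$q2, levels = c(0, 1), labels = c("Unchecked", "Checked"))

# Column positions of each block (assumption: 1:66 and 67:129 in the real data).
int_cols <- 1:2
fct_cols <- 3:4

# List-style indexing keeps the column attributes intact.
pre_int <- pre_toy[int_cols]
pre_fct <- pre_toy[fct_cols]

# Collect the question labels from the integer-coded block.
# This assumes every column in the block has a "label" attribute.
labels_int <- vapply(pre_int, function(x) attr(x, "label"), character(1))

# Confirm that the second block really is factor-coded.
all_factors <- all(vapply(pre_fct, is.factor, logical(1)))
```

Keeping the two blocks as separate objects makes it possible to use the integer codes for computation and the factor version for tables and plots.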

Similarly, the post-genomic testing dataset is imported from an RDS file. Variables 5 to 36 are the questions coded as integers, with their labels as attributes, and variables 37 to 67 (the last one) are the same items coded as factors, with the labels as their values. The item post_exp_cumpl_q2 is not included in this second set.
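Columns 5 to 36 hold 32 items and columns 37 to 67 hold 31, which is consistent with a single item lacking a factor-coded copy. A quick way to find such an item by name is sketched below. It assumes that factor copies are named by appending ".factor" to the original variable name, which is an assumption and not something stated here; apart from post_exp_cumpl_q2, the column names are hypothetical.

```r
# Names of the integer-coded items (hypothetical except post_exp_cumpl_q2).
int_names <- c("post_exp_q1", "post_exp_cumpl_q2", "post_exp_q3")

# Names of the factor-coded copies (assumed ".factor" suffix).
fct_names <- c("post_exp_q1.factor", "post_exp_q3.factor")

# Strip the suffix so both sets of names are comparable.
fct_base <- sub("\\.factor$", "", fct_names)

# Items that exist as integers but have no factor-coded counterpart.
missing_factor <- setdiff(int_names, fct_base)
```

With the real data, int_names and fct_names would be taken from names() of the imported data frame at positions 5:36 and 37:67.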

4.2 HOPE cohort

Enrolled patients completed the questionnaires online. The pre-genomic testing dataset is imported from an RDS file. The first 66 variables are coded as integers and carry their labels as attributes. The second set of variables (67 to 129) contains the same variables coded as factors, with the labels as their values. The dataset is then split in two: one part for demographics and the other for expectations.
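The split into demographics and expectations could look like the following sketch. The data frame, the column names, and the choice to keep an identifier column in both parts are illustrative assumptions, not the actual HOPE variables.

```r
# Toy stand-in for the HOPE pre-genomic testing data (all names hypothetical).
hope_toy <- data.frame(
  record_id  = 1:3,
  age        = c(54L, 61L, 47L),
  sex        = factor(c("F", "M", "F")),
  pre_exp_q1 = c(1L, 3L, 2L),
  pre_exp_q2 = c(2L, 2L, 1L)
)

# Columns treated as demographics; everything else except the id is expectations.
demo_vars <- c("age", "sex")
exp_vars  <- setdiff(names(hope_toy), c("record_id", demo_vars))

# Keep the identifier in both parts so they can be joined again later.
hope_demo <- hope_toy[c("record_id", demo_vars)]
hope_exp  <- hope_toy[c("record_id", exp_vars)]
```

Selecting columns by name instead of by position keeps the split stable if the column order of the export changes.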