Chapter 12 Factor analysis
At this stage, the main objective is to perform an exploratory factor analysis that builds on the previous results. The analysis of the pre-genomic test data made it possible to identify which items are less relevant and which are the most informative, which questions behave adequately, and how they group. The descriptive analysis of the post-genomic test questionnaire highlighted several points in common with the pre-genomic test information, detected a few conflicting items, and confirmed the utility of the items selected in the previous step. Having identified a few key approaches in the pre-genomic test exploratory factor analysis, the aim now is to replicate only those factor analyses, using the chosen set of items, to explore and define how they group in this setting.
12.1 Expectations and concerns domain
One approach is considered, including the final set of expectations and concerns items. The expectations questions included are Q2, Q5, Q6, Q7.i, Q7.iv, and Q7.v, together with all the concerns items. This approach had the best performance in both cohorts; however, the matching between VHIO and HOPE was not complete.
12.1.1 Excluding items Q1, Q4, Q7.ii, Q7.iii and Q3.
As defined previously, Q1, Q4, and Q7.ii were problematic: conflicting, overlapping (Q1 with Q2), carrying no relevant information, or study dependent (Q7.ii). Q7.iii and Q3 collect interesting data, but they are probably not related to the other questions in any particular domain; thus, while these items will be kept in the questionnaire, they can be excluded from the factor analysis.
Bartlett’s sphericity test is performed.
The p-value is reported as 0 (that is, below the printed precision), so H0 is rejected, which supports applying a factor analysis to this dataset.
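As a rough illustration of what the sphericity test computes, the statistic can be sketched in base R. The correlation matrix `R_toy`, the sample size, and the helper name `bartlett_sphericity` are made up for illustration and are not the study data or the original analysis code; the formula is the usual chi-square approximation for Bartlett’s test (the `psych` package provides `cortest.bartlett()` for the same purpose).

```r
# Bartlett's test of sphericity: H0 states that the correlation matrix
# is the identity (items completely uncorrelated).
# R: correlation matrix, n: number of observations.
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Hypothetical 3-item correlation matrix (illustration only).
R_toy <- matrix(c(1.0, 0.6, 0.5,
                  0.6, 1.0, 0.4,
                  0.5, 0.4, 1.0), nrow = 3, byrow = TRUE)

res <- bartlett_sphericity(R_toy, n = 30)
res$df                     # 3 degrees of freedom for 3 items
format.pval(res$p.value)   # a small p-value leads to rejecting H0
```

A p-value printed as 0 is a rounding effect; `format.pval()` is one way to display very small values without suggesting that they are exactly zero.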
Because Bartlett’s test almost always rejects H0 (the null hypothesis of completely uncorrelated items is too extreme a scenario), the KMO measure is also examined to determine how well the data suit a factor analysis and how useful each item is.
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = post_test_Q_ExpConcern_facAn_4_corr)
## Overall MSA = 0.5
## MSA for each item =
## q2 q5 q6 q7_i q7_iv q7_v q8 q9 q10 q11
## 0.51 0.40 0.49 0.41 0.47 0.61 0.81 0.59 0.49 0.54
According to these results, five items are below 0.5 (Q5, Q6, Q7.i, Q7.iv, Q10), and three more are below 0.6 (Q2, Q9, Q11).
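The classification of items by their MSA can be reproduced directly from the printed values. The sketch below copies the per-item MSA from the KMO output; the 0.5 and 0.6 thresholds are the ones used in the text, and the variable names are illustrative.

```r
# Per-item MSA values copied from the KMO output above.
msa <- c(q2 = 0.51, q5 = 0.40, q6 = 0.49, q7_i = 0.41, q7_iv = 0.47,
         q7_v = 0.61, q8 = 0.81, q9 = 0.59, q10 = 0.49, q11 = 0.54)

below_05 <- names(msa)[msa < 0.5]                    # weakest items
between_05_06 <- names(msa)[msa >= 0.5 & msa < 0.6]  # borderline items

below_05        # "q5" "q6" "q7_i" "q7_iv" "q10"
between_05_06   # "q2" "q9" "q11"
```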
To explore the number of factors, PCA is applied. The table with the PCA results is shown below:
| | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Standard deviation | 1.7228 | 1.3391 | 1.2212 | 1.0590 | 0.8612 | 0.7740 | 0.6918 | 0.6848 | 0.5140 | 0.2711 |
| Proportion of Variance | 0.2968 | 0.1793 | 0.1491 | 0.1121 | 0.0742 | 0.0599 | 0.0479 | 0.0469 | 0.0264 | 0.0073 |
| Cumulative Proportion | 0.2968 | 0.4761 | 0.6252 | 0.7374 | 0.8116 | 0.8715 | 0.9193 | 0.9662 | 0.9926 | 1.0000 |
Then, the scree plot for this PCA analysis is displayed.
According to these results, 3 factors could be the best number, with a cumulative proportion of 63%. However, the sharpest elbow is observed at the second component.
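The proportions in the PCA table follow from the standard deviations alone, which gives a quick consistency check. The sketch below uses only the printed standard deviations; the variable names are illustrative.

```r
# Standard deviations of the components, copied from the PCA table above.
sdev <- c(1.7228, 1.3391, 1.2212, 1.0590, 0.8612,
          0.7740, 0.6918, 0.6848, 0.5140, 0.2711)

eigenvalues <- sdev^2                        # variance of each component
prop_var <- eigenvalues / sum(eigenvalues)   # proportion of variance
cum_var <- cumsum(prop_var)                  # cumulative proportion

round(prop_var[1:3], 3)   # 0.297 0.179 0.149
round(cum_var[3], 2)      # 0.63
```

The eigenvalues add up to approximately 10, the number of items, as expected for a PCA on standardized variables.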
Then, another approach is implemented, in which several analyses are combined and depicted in the same figure. The tools implemented are the Kaiser rule (which drops the components with eigenvalues < 1), the parallel analysis, the usual scree test (plotuScree), and the acceleration factor (which indicates where the elbow of the scree plot appears).
Considering the current analysis, four factors seem to be the best choice; however, two factors were identified in the pre-genomic test. Therefore, both approaches will be considered.
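Two of these criteria can be approximated from the printed PCA standard deviations. The sketch below is a simplified illustration, not the original analysis code: it applies the Kaiser rule and takes the acceleration as the second difference of consecutive eigenvalues (an assumed but common definition). It does not reproduce the parallel analysis, which needs simulated data.

```r
# Standard deviations of the components, copied from the PCA table above.
sdev <- c(1.7228, 1.3391, 1.2212, 1.0590, 0.8612,
          0.7740, 0.6918, 0.6848, 0.5140, 0.2711)
eigenvalues <- sdev^2

# Kaiser rule: keep the components whose eigenvalue exceeds 1.
n_kaiser <- sum(eigenvalues > 1)
n_kaiser   # 4

# Acceleration: second difference of consecutive eigenvalues.
# Entry i of the result is centred on component i + 1, so the largest
# value marks the component where the scree plot bends most sharply.
accel <- diff(eigenvalues, differences = 2)
elbow_at <- which.max(accel) + 1
elbow_at   # 2
```

Under these assumptions the Kaiser rule points to four components, while the sharpest bend sits at the second component, in line with the two candidate solutions discussed above.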
Running the factor analysis with 10 items and 4 factors, first the communalities are explored:
## Warning in cor.smooth(mat): Matrix was not positive definite, smoothing was
## done
## In factor.stats, I could not find the RMSEA upper bound . Sorry about that
## post_exp_preoc_q2 post_exp_preoc_q5 post_exp_preoc_q6
## 0.5019 0.8587 0.9950
## post_exp_preoc_q7_i post_exp_preoc_q7_iv post_exp_preoc_q7_v
## 0.9950 0.9950 0.6126
## post_exp_preoc_q8 post_exp_preoc_q9 post_exp_preoc_q10
## 0.4786 0.3207 0.9950
## post_exp_preoc_q11
## 0.3662
Considering the values of the communalities, all are above 0.3.
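The same check can be written down explicitly from the printed communalities. The 0.3 cutoff is the one used in the text. The second flag, for values at 0.995, is an added assumption: a communality of 0.995 corresponds to a uniqueness of about 0.005, which looks like a boundary value of the estimation rather than a freely estimated one, and may deserve a closer look.

```r
# Communalities copied from the 4-factor output above.
h2 <- c(q2 = 0.5019, q5 = 0.8587, q6 = 0.9950, q7_i = 0.9950,
        q7_iv = 0.9950, q7_v = 0.6126, q8 = 0.4786, q9 = 0.3207,
        q10 = 0.9950, q11 = 0.3662)

low <- names(h2)[h2 < 0.3]           # items poorly explained by the factors
at_bound <- names(h2)[h2 >= 0.995]   # items sitting at the apparent upper bound

length(low)   # 0
at_bound      # "q6" "q7_i" "q7_iv" "q10"
```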
Then, the whole output is displayed.
## Factor Analysis using method = ml
## Call: fa(r = post_test_Q_ExpConcern_facAn_4, nfactors = 4, rotate = "oblimin",
## fm = "ml", cor = "poly")
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML4 ML1 ML2 ML3 h2 u2 com
## post_exp_preoc_q2 0.67 0.50 0.4982 1.4
## post_exp_preoc_q5 0.96 0.86 0.1413 1.1
## post_exp_preoc_q6 0.82 1.00 0.0049 1.4
## post_exp_preoc_q7_i 0.98 1.00 0.0050 1.0
## post_exp_preoc_q7_iv 0.85 1.00 0.0049 1.3
## post_exp_preoc_q7_v 0.78 0.61 0.3872 1.2
## post_exp_preoc_q8 0.43 0.48 0.5214 3.1
## post_exp_preoc_q9 -0.43 0.32 0.6784 3.2
## post_exp_preoc_q10 0.94 1.00 0.0050 1.1
## post_exp_preoc_q11 0.58 0.37 0.6337 1.4
##
## ML4 ML1 ML2 ML3
## SS loadings 2.13 1.77 1.69 1.52
## Proportion Var 0.21 0.18 0.17 0.15
## Cumulative Var 0.21 0.39 0.56 0.71
## Proportion Explained 0.30 0.25 0.24 0.21
## Cumulative Proportion 0.30 0.55 0.79 1.00
##
## With factor correlations of
## ML4 ML1 ML2 ML3
## ML4 1.00 0.30 0.15 0.06
## ML1 0.30 1.00 0.11 0.30
## ML2 0.15 0.11 1.00 0.10
## ML3 0.06 0.30 0.10 1.00
##
## Mean item complexity = 1.6
## Test of the hypothesis that 4 factors are sufficient.
##
## df null model = 45 with the objective function = 26.18 with Chi Square = 650.2
## df of the model are 11 and the objective function was 19.85
##
## The root mean square of the residuals (RMSR) is 0.07
## The df corrected root mean square of the residuals is 0.15
##
## The harmonic n.obs is 30 with the empirical chi square 14.38 with prob < 0.21
## The total n.obs was 30 with Likelihood Chi Square = 440 with prob < 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000002
##
## Tucker Lewis Index of factoring reliability = -2.278
## RMSEA index = 1.14 and the 90 % confidence intervals are 1.068 NA
## BIC = 402.6
## Fit based upon off diagonal values = 0.95
## Measures of factor score adequacy
## ML4 ML1 ML2 ML3
## Correlation of (regression) scores with factors 1.00 1.00 1.00 1.00
## Multiple R square of scores with factors 0.99 0.99 0.99 0.99
## Minimum correlation of possible factor scores 0.98 0.98 0.99 0.99
There are 4 factors:
* Q5, Q6, Q8, Q9.
* Q7.iv, Q7.v.
* Q2, Q7.i.
* Q10, Q11.
Here, items are split differently than in the pre-genomic test, and also differently than in the VHIO post-test (where the number of factors was three).
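Some of the fit figures in the output can be recomputed by hand, which helps when reading them. The sketch below uses the standard degrees-of-freedom formula for an exploratory factor model. The BIC expression (chi-square minus df times log n) is an assumption: it was chosen because it reproduces the printed value, not taken from the original analysis code. The helper name `efa_df` is illustrative.

```r
# Degrees of freedom of an exploratory factor model with p items and m factors.
efa_df <- function(p, m) ((p - m)^2 - (p + m)) / 2

efa_df(p = 10, m = 4)   # 11, as reported in the output
10 * (10 - 1) / 2       # 45, degrees of freedom of the null model

# BIC recomputed from the reported likelihood chi-square, df and n.
chisq <- 440
df <- 11
n <- 30
round(chisq - df * log(n), 1)   # 402.6
```

With 10 items and 4 factors only 11 degrees of freedom remain, and the sample has 30 observations, so the fit statistics should be read with that in mind.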
Finally, plots show the relationship between the items and the factors.
Then, the factor analysis is performed with 2 factors. The communalities are explored:
## Warning in cor.smooth(mat): Matrix was not positive definite, smoothing was
## done
## post_exp_preoc_q2 post_exp_preoc_q5 post_exp_preoc_q6
## 0.14223 0.65429 0.99500
## post_exp_preoc_q7_i post_exp_preoc_q7_iv post_exp_preoc_q7_v
## 0.20531 0.99500 0.52082
## post_exp_preoc_q8 post_exp_preoc_q9 post_exp_preoc_q10
## 0.41338 0.21837 0.34485
## post_exp_preoc_q11
## 0.00913
Considering the values of the communalities, several items show low communalities, below 0.3 (Q2, Q7.i, Q9, and Q11).
Then, the whole output is displayed.
## Factor Analysis using method = ml
## Call: fa(r = post_test_Q_ExpConcern_facAn_4, nfactors = 2, rotate = "oblimin",
## fm = "ml", cor = "poly")
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML1 ML2 h2 u2 com
## post_exp_preoc_q2 0.1433 0.8567 2.0
## post_exp_preoc_q5 0.84 0.6544 0.3456 1.0
## post_exp_preoc_q6 0.93 0.9951 0.0049 1.1
## post_exp_preoc_q7_i 0.49 0.2050 0.7950 1.2
## post_exp_preoc_q7_iv 0.92 0.9950 0.0050 1.1
## post_exp_preoc_q7_v 0.75 0.5208 0.4792 1.0
## post_exp_preoc_q8 0.57 0.4133 0.5867 1.1
## post_exp_preoc_q9 -0.50 0.2175 0.7825 1.5
## post_exp_preoc_q10 0.62 0.3455 0.6545 1.1
## post_exp_preoc_q11 0.0092 0.9908 1.0
##
## ML1 ML2
## SS loadings 2.49 2.01
## Proportion Var 0.25 0.20
## Cumulative Var 0.25 0.45
## Proportion Explained 0.55 0.45
## Cumulative Proportion 0.55 1.00
##
## With factor correlations of
## ML1 ML2
## ML1 1.00 0.38
## ML2 0.38 1.00
##
## Mean item complexity = 1.2
## Test of the hypothesis that 2 factors are sufficient.
##
## df null model = 45 with the objective function = 26.18 with Chi Square = 650.2
## df of the model are 26 and the objective function was 22.3
##
## The root mean square of the residuals (RMSR) is 0.16
## The df corrected root mean square of the residuals is 0.22
##
## The harmonic n.obs is 30 with the empirical chi square 73.5 with prob < 0.000002
## The total n.obs was 30 with Likelihood Chi Square = 524.1 with prob < 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000036
##
## Tucker Lewis Index of factoring reliability = -0.511
## RMSEA index = 0.798 and the 90 % confidence intervals are 0.753 0.874
## BIC = 435.6
## Fit based upon off diagonal values = 0.75
## Measures of factor score adequacy
## ML1 ML2
## Correlation of (regression) scores with factors 1.00 1.00
## Multiple R square of scores with factors 0.99 0.99
## Minimum correlation of possible factor scores 0.99 0.99
There are 2 factors:
* Q5, Q6, Q7.i, Q8, Q9.
* Q7.iv, Q7.v, Q10.
Two items are left out: Q2 and Q11. The items are split in a completely different pattern than what was seen in the pre-genomic test questionnaire.
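The likelihood chi-square printed for the 4-factor and 2-factor solutions can be related to the objective function reported in each output. The sketch below assumes a Bartlett-type small-sample correction. This form was chosen because it reproduces the printed values up to rounding, and the helper name `efa_chisq` is illustrative.

```r
# Likelihood chi-square from the minimised objective function, with a
# small-sample correction (n observations, p items, m factors).
efa_chisq <- function(objective, n, p, m) {
  objective * ((n - 1) - (2 * p + 5) / 6 - (2 * m) / 3)
}

round(efa_chisq(19.85, n = 30, p = 10, m = 4))   # 440 (4 factors)
round(efa_chisq(22.30, n = 30, p = 10, m = 2))   # 524 (printed as 524.1)
```

The small difference for the 2-factor model comes from the objective function being printed with only one decimal.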
Finally, plots show the relationship between the items and the factors.
12.2 Both domains, the expectations and concerns domain plus the attitudes domain
Now, two other approaches combining expectations and attitudes will be tested: the first includes only the most relevant attitude items (Q2 and Q3), and the second considers the four candidates selected from the attitudes domain. In both, the same list of previously chosen expectations and concerns items is used.
12.2.1 Including all the items selected for expectations plus Q2 and Q3 from attitudes.
Bartlett’s sphericity test is performed.
The p-value is reported as 0 (that is, below the printed precision), so H0 is rejected, which supports applying a factor analysis to this dataset.
Because Bartlett’s test almost always rejects H0 (the null hypothesis of completely uncorrelated items is too extreme a scenario), the KMO measure is also examined to determine how well the data suit a factor analysis and how useful each item is.
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = post_test_Q_Exp_Attit_facAn_1_corr)
## Overall MSA = 0.39
## MSA for each item =
## q2 q5 q6 q7_i q7_iv q7_v
## 0.19 0.45 0.46 0.27 0.36 0.34
## q8 q9 q10 q11 exp_actit_q2 exp_actit_q3
## 0.55 0.40 0.32 0.56 0.64 0.34
According to these results, all items except three (Q8, Q11, and attQ2) are below 0.5.
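To make the meaning of the MSA values more concrete, the measure can be sketched in base R from a correlation matrix. The function `kmo_sketch` and the matrix `R_toy` are hypothetical and used for illustration only. The code follows the standard definition (squared correlations relative to squared correlations plus squared partial correlations) and is assumed, not verified, to agree with the `KMO()` call shown in the output.

```r
# Kaiser-Meyer-Olkin measure of sampling adequacy from a correlation matrix R.
kmo_sketch <- function(R) {
  Q <- cov2cor(solve(R))   # scaled inverse: minus the partial correlations
  diag(Q) <- 0
  diag(R) <- 0
  r2 <- R^2                # squared correlations (off-diagonal)
  q2 <- Q^2                # squared partial correlations (off-diagonal)
  list(overall = sum(r2) / (sum(r2) + sum(q2)),
       per_item = colSums(r2) / (colSums(r2) + colSums(q2)))
}

# Hypothetical 3-item correlation matrix (illustration only).
R_toy <- matrix(c(1.0, 0.6, 0.5,
                  0.6, 1.0, 0.4,
                  0.5, 0.4, 1.0), nrow = 3, byrow = TRUE,
                dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

kmo <- kmo_sketch(R_toy)
kmo$overall    # between 0 and 1; higher means better suited to factoring
kmo$per_item   # one MSA value per item
```

An item gets a low MSA when its partial correlations are large relative to its plain correlations, that is, when what it shares with the other items is not shared broadly.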
To explore the number of factors, PCA is applied. The table with the PCA results is shown below:
| | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 | PC11 | PC12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Standard deviation | 1.7474 | 1.5259 | 1.2398 | 1.1929 | 1.0477 | 0.8210 | 0.7596 | 0.7050 | 0.6284 | 0.4864 | 0.3547 | 0.2348 |
| Proportion of Variance | 0.2544 | 0.1940 | 0.1281 | 0.1186 | 0.0915 | 0.0562 | 0.0481 | 0.0414 | 0.0329 | 0.0197 | 0.0105 | 0.0046 |
| Cumulative Proportion | 0.2544 | 0.4485 | 0.5766 | 0.6952 | 0.7866 | 0.8428 | 0.8909 | 0.9323 | 0.9652 | 0.9849 | 0.9954 | 1.0000 |
Then, the scree plot for this PCA analysis is displayed.
According to these results, 2 or 5 factors could be the best number, judging by the two elbows and the cumulative proportion explained.
Then, another approach is implemented, in which several analyses are combined and depicted in the same figure. The tools implemented are the Kaiser rule (which drops the components with eigenvalues < 1), the parallel analysis, the usual scree test (plotuScree), and the acceleration factor (which indicates where the elbow of the scree plot appears).
Therefore, two factors seem to be a possible approach.
Running the factor analysis with 12 items, first the communalities are explored:
## Warning in cor.smooth(mat): Matrix was not positive definite, smoothing was
## done
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In factor.stats, I could not find the RMSEA upper bound . Sorry about that
## post_exp_preoc_q2 post_exp_preoc_q5 post_exp_preoc_q6
## 0.1205 0.6567 0.9950
## post_exp_preoc_q7_i post_exp_preoc_q7_iv post_exp_preoc_q7_v
## 0.1640 0.4410 0.2459
## post_exp_preoc_q8 post_exp_preoc_q9 post_exp_preoc_q10
## 0.5131 0.1414 0.2286
## post_exp_preoc_q11 post_exp_actit_q2 post_exp_actit_q3
## 0.0311 0.8225 0.9121
Considering the values of the communalities, there are some items with low values (below 0.3): Q2, Q7.i, Q7.v, Q9, Q10, and Q11.
Then, the whole output is displayed.
## Factor Analysis using method = ml
## Call: fa(r = post_test_Q_Exp_Attit_facAn_1, nfactors = 2, rotate = "oblimin",
## fm = "ml", cor = "poly")
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML1 ML2 h2 u2 com
## post_exp_preoc_q2 0.12 0.8783 1.9
## post_exp_preoc_q5 0.82 0.66 0.3433 1.0
## post_exp_preoc_q6 0.99 1.00 0.0049 1.0
## post_exp_preoc_q7_i 0.16 0.8379 1.0
## post_exp_preoc_q7_iv 0.67 0.44 0.5589 1.1
## post_exp_preoc_q7_v 0.25 0.7544 2.0
## post_exp_preoc_q8 0.58 0.51 0.4867 1.6
## post_exp_preoc_q9 0.14 0.8604 1.1
## post_exp_preoc_q10 0.43 0.23 0.7713 1.6
## post_exp_preoc_q11 0.03 0.9699 1.4
## post_exp_actit_q2 0.81 0.82 0.1775 1.3
## post_exp_actit_q3 0.96 0.91 0.0879 1.0
##
## ML1 ML2
## SS loadings 3.09 2.18
## Proportion Var 0.26 0.18
## Cumulative Var 0.26 0.44
## Proportion Explained 0.59 0.41
## Cumulative Proportion 0.59 1.00
##
## With factor correlations of
## ML1 ML2
## ML1 1.00 -0.11
## ML2 -0.11 1.00
##
## Mean item complexity = 1.3
## Test of the hypothesis that 2 factors are sufficient.
##
## df null model = 66 with the objective function = 49.84 with Chi Square = 1204
## df of the model are 43 and the objective function was 45.04
##
## The root mean square of the residuals (RMSR) is 0.17
## The df corrected root mean square of the residuals is 0.21
##
## The harmonic n.obs is 30 with the empirical chi square 116 with prob < 0.000000012
## The total n.obs was 30 with Likelihood Chi Square = 1029 with prob < 1.6e-187
##
## Tucker Lewis Index of factoring reliability = -0.411
## RMSEA index = 0.873 and the 90 % confidence intervals are 0.842 NA
## BIC = 882.3
## Fit based upon off diagonal values = 0.71
## Measures of factor score adequacy
## ML1 ML2
## Correlation of (regression) scores with factors 1.00 0.97
## Multiple R square of scores with factors 0.99 0.94
## Minimum correlation of possible factor scores 0.99 0.88
There are 2 factors:
* Q5, Q6, Q7.iv, Q8.
* Q10, attQ2, attQ3.
Several items are left out, such as Q2, Q7.i, Q7.v, Q9, and Q11.
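The variance rows of the output follow from the SS loadings and the number of items, which makes them easy to verify. The sketch below uses only the printed values; the variable names are illustrative.

```r
ss <- c(ML1 = 3.09, ML2 = 2.18)   # SS loadings from the output above
p <- 12                           # number of items in this analysis

round(ss / p, 2)         # ML1 0.26, ML2 0.18: "Proportion Var"
round(sum(ss) / p, 2)    # 0.44: "Cumulative Var" for both factors
round(ss / sum(ss), 2)   # ML1 0.59, ML2 0.41: "Proportion Explained"
```

"Proportion Var" is relative to the total variance of the 12 items, whereas "Proportion Explained" is relative only to the variance captured by the two factors, so the 0.59 and 0.41 shares refer to just 44% of the total.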
Finally, plots show the relationship between the items and the factors.
12.2.2 Including all the items selected for expectations plus Q2, Q3, Q4, and Q5 inv from attitudes.
Bartlett’s sphericity test is performed.
The p-value is reported as 0 (that is, below the printed precision), so H0 is rejected, which supports applying a factor analysis to this dataset.
Because Bartlett’s test almost always rejects H0 (the null hypothesis of completely uncorrelated items is too extreme a scenario), the KMO measure is also examined to determine how well the data suit a factor analysis and how useful each item is.
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = post_test_Q_Exp_Attit_facAn_2_corr)
## Overall MSA = 0.35
## MSA for each item =
## q2 q5 q6 q7_i
## 0.17 0.40 0.39 0.20
## q7_iv q7_v q8 q9
## 0.37 0.34 0.62 0.26
## q10 q11 exp_actit_q2 exp_actit_q3
## 0.28 0.51 0.53 0.31
## exp_actit_q4 exp_actit_q5_Inv
## 0.60 0.28
According to these results, most of the items are under the threshold of 0.5 (Q2, Q5, Q6, Q7.i, Q7.iv, Q7.v, Q9, Q10, attQ3, and attQ5 inv).
To explore the number of factors, PCA is applied. The table with the PCA results is shown below:
| | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 | PC11 | PC12 | PC13 | PC14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Standard deviation | 1.7940 | 1.5648 | 1.3892 | 1.2284 | 1.0634 | 0.9443 | 0.8838 | 0.7668 | 0.7191 | 0.6267 | 0.4900 | 0.4377 | 0.3504 | 0.1947 |
| Proportion of Variance | 0.2299 | 0.1749 | 0.1378 | 0.1078 | 0.0808 | 0.0637 | 0.0558 | 0.0420 | 0.0369 | 0.0281 | 0.0171 | 0.0137 | 0.0088 | 0.0027 |
| Cumulative Proportion | 0.2299 | 0.4048 | 0.5426 | 0.6504 | 0.7312 | 0.7949 | 0.8507 | 0.8927 | 0.9296 | 0.9577 | 0.9748 | 0.9885 | 0.9973 | 1.0000 |
Then, the scree plot for this PCA analysis is displayed.
According to these results, there is no clear elbow. The cumulative proportion for 3 components is 54%.
Then, another approach is implemented, in which several analyses are combined and depicted in the same figure. The tools implemented are the Kaiser rule (which drops the components with eigenvalues < 1), the parallel analysis, the usual scree test (plotuScree), and the acceleration factor (which indicates where the elbow of the scree plot appears).
Three factors are considered in order to match the number used in the pre-genomic test. Nevertheless, according to these analyses, 1, 4, or 5 could also be adequate numbers of factors.
Running the factor analysis with 14 items, first the communalities are explored:
## Warning in cor.smooth(mat): Matrix was not positive definite, smoothing was
## done
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In factor.stats, I could not find the RMSEA upper bound . Sorry about that
## post_exp_preoc_q2 post_exp_preoc_q5
## 0.33083 0.72843
## post_exp_preoc_q6 post_exp_preoc_q7_i
## 0.99500 0.19401
## post_exp_preoc_q7_iv post_exp_preoc_q7_v
## 0.99500 0.53906
## post_exp_preoc_q8 post_exp_preoc_q9
## 0.51352 0.24376
## post_exp_preoc_q10 post_exp_preoc_q11
## 0.41381 0.03147
## post_exp_actit_q2 post_exp_actit_q3
## 0.72004 0.99500
## post_exp_actit_q4 post_exp_actit_q5_Inverted
## 0.45031 0.30621
Considering the values of the communalities, three items have values below 0.3 (Q7.i, Q9, and Q11).
Then, the whole output is displayed.
## Factor Analysis using method = ml
## Call: fa(r = post_test_Q_Exp_Attit_facAn_2, nfactors = 3, rotate = "oblimin",
## fm = "ml", cor = "poly")
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML3 ML2 ML1 h2 u2 com
## post_exp_preoc_q2 0.47 0.33 0.6690 2.8
## post_exp_preoc_q5 0.87 0.73 0.2716 1.0
## post_exp_preoc_q6 0.86 1.00 0.0049 1.2
## post_exp_preoc_q7_i 0.45 0.19 0.8062 1.2
## post_exp_preoc_q7_iv 0.95 1.00 0.0050 1.0
## post_exp_preoc_q7_v 0.70 0.54 0.4610 1.3
## post_exp_preoc_q8 0.45 0.51 0.4864 2.5
## post_exp_preoc_q9 -0.52 0.24 0.7565 1.5
## post_exp_preoc_q10 0.56 0.41 0.5864 1.6
## post_exp_preoc_q11 0.03 0.9697 1.8
## post_exp_actit_q2 0.77 0.72 0.2800 1.3
## post_exp_actit_q3 0.99 1.00 0.0050 1.0
## post_exp_actit_q4 0.55 0.45 0.5498 1.8
## post_exp_actit_q5_Inverted 0.31 0.6936 2.1
##
## ML3 ML2 ML1
## SS loadings 2.69 2.50 2.26
## Proportion Var 0.19 0.18 0.16
## Cumulative Var 0.19 0.37 0.53
## Proportion Explained 0.36 0.34 0.30
## Cumulative Proportion 0.36 0.70 1.00
##
## With factor correlations of
## ML3 ML2 ML1
## ML3 1.00 -0.07 0.35
## ML2 -0.07 1.00 0.03
## ML1 0.35 0.03 1.00
##
## Mean item complexity = 1.6
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 91 with the objective function = 71.58 with Chi Square = 1682
## df of the model are 52 and the objective function was 64.54
##
## The root mean square of the residuals (RMSR) is 0.13
## The df corrected root mean square of the residuals is 0.17
##
## The harmonic n.obs is 30 with the empirical chi square 90.8 with prob < 0.0007
## The total n.obs was 30 with Likelihood Chi Square = 1388 with prob < 3.4e-256
##
## Tucker Lewis Index of factoring reliability = -0.614
## RMSEA index = 0.925 and the 90 % confidence intervals are 0.899 NA
## BIC = 1211
## Fit based upon off diagonal values = 0.82
## Measures of factor score adequacy
## ML3 ML2 ML1
## Correlation of (regression) scores with factors 1.00 1.00 1.00
## Multiple R square of scores with factors 0.99 1.00 0.99
## Minimum correlation of possible factor scores 0.99 0.99 0.99
There are 3 factors:
* Q2, Q5, Q6, Q7.i, Q8, -Q9.
* Q7.iv, Q7.v, Q10.
* attQ2, attQ3, attQ4.
Two items are left out: attQ5 inv and Q11.
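The RMSEA values printed in the outputs can be recovered from the chi-square, the degrees of freedom, and the sample size. The expression below is an assumption: it was chosen because it reproduces the printed values, and whether it is exactly what `fa()` computes internally has not been checked here. The helper name `rmsea_sketch` is illustrative.

```r
# RMSEA recovered from the printed likelihood chi-square, df and n.
rmsea_sketch <- function(chisq, df, n) {
  sqrt(max(chisq / (df * n) - 1 / (n - 1), 0))
}

round(rmsea_sketch(chisq = 1388, df = 52, n = 30), 3)   # 0.925 (3 factors, 14 items)
round(rmsea_sketch(chisq = 440, df = 11, n = 30), 2)    # 1.14 (4 factors, 10 items)
```

Values of this size are far above the range usually regarded as acceptable (roughly 0.08 or lower), which is consistent with the negative Tucker Lewis Index reported for every model.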
Finally, plots show the relationship between the items and the factors.
12.3 Global conclusions
In this setting, the exploratory factor analysis does not match the analysis of the pre-genomic test: items did not group here in the same way as in the pre-test questionnaire. Although the questions are similar (but not identical), the change between before and after the test could modify the underlying meaning of the items or the general behavior of the tool itself.