Beginning with SEM in lavaan III

This post concludes the translation from LISREL to R of Iacobucci’s paper: Everything You Always Wanted to Know About SEM (Structural Equations Modeling) But Were Afraid to Ask. This post covers the full structural equation model.

The full SEM model

The syntax for the full structural equation model combines the measurement model and the structural model into one.

# model syntax
fullMdlStx = "
   # measurement model
   quality =~ q1 + q2 + q3
   cost =~ c1 + c2 + c3
   value =~ v1 + v2 + v3
   csat =~ cs1 + cs2 + cs3
   repeaT =~ r1 + r2 + r3
   # structural model
   repeaT ~ csat 
   value ~ quality 
   csat ~ quality
   value ~ cost
   repeaT ~ cost
   csat ~ value 
"
# fit the model
completeModel = sem(fullMdlStx, sample.cov = costumerQuality, 
   sample.nobs = n)

To report the fit of the model I can use a function I built myself:
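The original function is not reproduced in this post; as a minimal sketch, a reporting helper of this kind could be built around lavaan's fitMeasures (the name reportFit and the selection of indexes are my own assumptions, not the original code):

```r
library(lavaan)

# hypothetical sketch of a fit-reporting helper; 'reportFit' and the
# chosen indexes are assumptions, not the function used in the post
reportFit = function(fittedModel){
   # collect a handful of common goodness-of-fit indexes
   round(fitMeasures(fittedModel,
      c('chisq', 'df', 'pvalue', 'rmsea', 'cfi', 'srmr')), 3)
}
# usage: reportFit(completeModel)
```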


My output is not identical to Iacobucci’s. Part of the difference stems from the fact that my model has 83 instead of 86 degrees of freedom. Nonetheless, the significance tests are consistent between the two approaches, in spite of the numerical differences in the coefficients’ estimates.

The factor loadings were again all above 0.9. This time, to establish whether the factor loadings were significant, I used the bootstrapped confidence intervals instead of the p-values I used in the previous post. With confidence intervals, a factor loading is significant if the interval does not contain zero. A quick trick to check this in R is to multiply the lower and upper limits of the confidence interval and check whether the result is positive. Confidence intervals that do not contain zero have boundaries that are either both positive or both negative, so the product of the boundaries is positive if zero is not part of the interval and negative otherwise (i.e., -1 * -1 = 1 * 1 = 1; but -1 * 1 = 1 * -1 = -1). To ease the display of the significance information I added a ‘significance’ column to the data.frame containing the results: the column contains a star if the factor loading is significant and is empty otherwise. This addition is straightforward with an ifelse statement that uses the product of the boundaries as input and assigns a star to positive values.

# check the range of the standardized factor loadings
range(estVals[estVals$op == '=~', 'std.all'])
# add a significance column: '*' if the bootstrapped CI excludes zero
estVals$sig = ''
estVals$sig[estVals$op == '=~'] = ifelse(
   (estVals[estVals$op == '=~', 'ci.lower'] 
   * estVals[estVals$op == '=~', 'ci.upper'])
   > 0, '*', '')
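To see the trick in isolation, here is a tiny self-contained example on made-up intervals (the numbers are invented purely for illustration):

```r
# toy data: the first interval excludes zero, the second contains it
ci = data.frame(lower = c(0.85, -0.10), upper = c(0.95, 0.12))
# a positive product of the boundaries marks a zero-free interval
ifelse(ci$lower * ci$upper > 0, '*', '')
# returns "*" ""
```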

I used this approach for the factor loadings only, but it could be extended to the constructs and all the other parameters estimated by the structural model (e.g., identified by op == '~').
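For instance, the same star could be assigned to the structural regressions, assuming (as above) that estVals holds the parameter estimates with bootstrapped confidence intervals:

```r
# sketch: flag the structural ('~') parameters with the same CI trick
structRows = estVals$op == '~'
estVals$sig[structRows] = ifelse(
   estVals$ci.lower[structRows] * estVals$ci.upper[structRows] > 0,
   '*', '')
```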

Squared multiple correlations

The squared multiple correlations indicate how well the independent variables predict the dependent variable. The index ranges between 0 and 1: the closer it is to 1, the better the xs predict y. The squared multiple correlations can be extracted by passing the "rsquare" argument to the function inspect, as in

inspect(completeModel, "rsquare")

Plotting the path diagram

This time I limited the customization of the plot as much as possible: semPaths defaults did a magnificent job. Moreover, reorganizing the path diagram à la Iacobucci would have required quite a bit of coding, since this particular path diagram depicts many variables; re-arranging the observed variables around their respective factors, especially, looked like a rather tedious job. Although Iacobucci’s layout is brilliant in conveying the directionality of the hypotheses, I find it pretty ugly and a little messy. semPlot’s defaults do a great job in organizing the layout (too bad the directionality is lost), and if the plot is rescaled to a bigger size the labels in the paths cost->value and quality->csat no longer overlap. Superb!

semPaths(completeModel, whatLabels = "std", 
   nCharNodes = 0, edge.label.cex = .7)


As in the previous posts, also for the full model there are diagnostics to determine whether the model we are fitting is statistically sound. All the previous diagnostic tests for 1) the measurement model and 2) the structural equations are applicable here. However, an additional diagnostic clue can be extracted from the:

Squared Multiple Correlations
– A variable should be dropped if its R^2 is relatively lower than the others

Identification of variables with squared multiple correlations relatively lower than others could be automated by identifying variables two standard deviations below the mean squared multiple correlations values. For example:

# get squared multiple correlations
rSquared = inspect(completeModel, "rsquare")
# keep only the R^2 of the observed variables (nLatentVariables,
# assumed defined earlier, counts the latent variables that appear
# at the end of the rsquare output)
xVar = rSquared[1:(length(rSquared) - nLatentVariables)]
# which should be excluded?
names(xVar[xVar < (mean(xVar) - (2 * sd(xVar)))])
# returns
# none, in this case

Modification indexes

Iacobucci has two words to describe modification indices: statistics devil. I liked her description very much: “they are seductive because all users seek better fit indexes, but they result in models that are nonsensical … ” (p. 678). Modification indices estimate how much the model would improve if given parameters were included in the model. Since all possible parameters and paths are provided, including some of them might improve the model fit while adding a condition/constraint that does not make sense. But it is still fun to play around with them. In fact, whereas checking the modification indices might not be helpful for the theory we are currently testing, it might reveal relationships that were not considered and thus provide data-driven insights.

In lavaan, one can extract modification indices with the function modindices(). The function returns a data.frame with a column for the constructs and their relation, the modification indices (‘mi’) and a series of expected parameter changes (‘epc’), also available in various standardized (‘s’) forms. The value ‘mi’ represents the improvement in the \chi^2 test if the given parameter/relationship were unconstrained in the model.
To explore how the model changes when including parameters suggested by the modification indices, one can select the ‘mi’ values bigger than 5 and include them in the model syntax. The inclusion in the model specification is straightforward: since the model syntax is specified as a string of characters, we can use the paste function to append constructs and operators to it, for example:

# add a new line to flag the inclusion of new parameters
fullMdlStx = paste0(fullMdlStx, "\n#DATA-DRIVEN PARAMETERS")
mi = modindices(completeModel)
# keep only modification indices higher than 5
mi = mi[mi$mi > 5, ]
# if there are modification indices above 5 then
if (nrow(mi) > 0){
   # order the modification indices so that we can take the 
   # highest one
   mi = mi[order(mi$mi, decreasing = TRUE), ]
   # include the first modification index into the model syntax
   fullMdlStx = paste0(fullMdlStx, '\n', mi$lhs[1], 
      mi$op[1], mi$rhs[1])
   completeModel2 = sem(fullMdlStx, 
      sample.cov = costumerQuality, sample.nobs = n)
}

To determine whether the model fit has improved I extract RMSEA, CFI, and SRMR with the fitMeasures function and bind the output into a data.frame to ease comparison between the old model and the new one. As a reminder, I included the ‘recommended’ values of the goodness of fit for the three indexes on the first row of the data.frame.

fitIndexes = c('rmsea', 'cfi', 'srmr') 
fits = data.frame(rmsea = .05, cfi = .97, srmr = .05)
fits = rbind(fits, 
   mdl = fitMeasures(completeModel)[fitIndexes], 
   newMdl = fitMeasures(completeModel2)[fitIndexes])

This shows that the model improved just by including one of the parameters selected among the possible modification indices, since both RMSEA and SRMR are now below their thresholds for good fit (CFI was already above its threshold even without the parameter suggested by the modification index).

Of course, one could continue until all the modification indices above 5 are included in the model syntax. Below I wrote a recursive function, because I like to include parameters in the model specification one at a time:

improveFits = function(mdlFit, mdlStx, covarianceMatrix, nObservations){
   mi = modindices(mdlFit)
   if (sum(mi$mi > 5) == 0){
      # no more modification indices above 5: return the final syntax
      return(mdlStx)
   } else {
      print("improving the model fit")
      # order the indexes so that we can take the highest one
      mi = mi[order(mi$mi, decreasing = TRUE), ]
      # paste the model improvement into the new model syntax
      mdlStx = paste0(mdlStx, '\n', mi$lhs[1], mi$op[1], mi$rhs[1])
      mdlFit = sem(mdlStx, sample.cov = covarianceMatrix, 
         sample.nobs = nObservations)
      improveFits(mdlFit, mdlStx, covarianceMatrix, nObservations)
   }
}
newStx = improveFits(completeModel2, fullMdlStx, costumerQuality, n)
completeModel2 = sem(newStx, sample.cov = costumerQuality, 
   sample.nobs = n)
# Do NOT forget to check how the new model looks
semPaths(completeModel2, what = "std", residuals = FALSE, 
   nCharNodes = 0, edge.label.cex = 1.2, sizeMan = 8)

The function returns the syntax to fit the model, not the fitted model itself. The modification indices suggest adding a structural equation relating intentions to value, plus some residual covariances. I am not sure whether the added parameters make sense, since I do not know the theory; in particular, the structural equation goes in the opposite direction of Iacobucci’s model. Even though this particular equation was not justified by the theory, it could be plausible that the intention to repurchase contributes to the perception of value. For example, I can think of a couple of situations in which I bought something again because I was satisfied the first time and, by becoming satisfied with my purchase again, I experienced the product as more valuable. That is rather convoluted, though, and I am digressing from lavaan. Exploring modification indices can be insightful, and it is definitely a nice conclusion for this post/series. To the next one.
