Beginning with SEM in lavaan

While reordering some old papers I rediscovered a paper by Dawn Iacobucci introducing structural equation modeling: Everything You Always Wanted to Know About SEM (Structural Equations Modeling) But Were Afraid to Ask. I liked Iacobucci’s introduction to SEM because it is eloquently written, clear, and accessible. Moreover, Iacobucci accompanied the paper with a data set and a step-by-step explanation of the syntax used to analyze it, which allowed any adventurous reader willing to start with structural equation modeling to reproduce her analysis. Iacobucci used LISREL for the data analysis and, at the time, I had no access to LISREL. Now that lavaan is available, encountering the paper again was an opportunity to make her approach reproducible by a wider audience, since lavaan is free to anyone and its syntax is very intuitive. In this post I translate Iacobucci’s LISREL syntax to lavaan.

NOTE: all code is stored in a script file which is available here together with the data file.

The data

The data Iacobucci used are displayed as a covariance matrix in table 2 of her paper. I copied the data into a text file, which I named iacobucci2.txt (it’s available here), rather than pasting them into the post, to avoid cluttering it with meaningless numbers. After loading the data into R, I format them into a covariance matrix with the function lav_matrix_lower2full. Before doing so I transpose the data from a column to a row, because lav_matrix_lower2full expects the data in row format while read.table reads them as a column. Finally, to ease the comparison between R’s and Iacobucci’s output, I rename the rows and columns to the variable names used by Iacobucci.

library(lavaan)

covMat <- read.table('iacobucci2.txt')
costumerQuality <- lav_matrix_lower2full(t(covMat))
rownames(costumerQuality) <- colnames(costumerQuality) <- c('q1', 
  'q2', 'q3', 'c1', 'c2', 'c3', 'v1', 'v2', 'v3', 'cs1', 'cs2', 
  'cs3', 'r1', 'r2', 'r3')
n <- 100 # number of respondents

Another important bit of information is the number of respondents (100). This must be assigned to the sample.nobs argument in calls to the lavaan functions sem and cfa whenever a covariance matrix, instead of raw data, is used as input.
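To see what lav_matrix_lower2full actually does, here is a minimal sketch on made-up toy numbers (not Iacobucci’s data), assuming lavaan is loaded:

```r
library(lavaan)

# the lower triangle of a 3 x 3 covariance matrix, read row-wise
lower <- c(
  1.0,
  0.5, 1.0,
  0.3, 0.2, 1.0)

# lav_matrix_lower2full mirrors the lower triangle into the upper one,
# returning the full symmetric 3 x 3 covariance matrix
toyCov <- lav_matrix_lower2full(lower)
toyCov
```

The returned matrix is symmetric, which is why supplying only the lower triangle (as in Iacobucci’s table 2) is enough.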

The measurement model

Lavaan’s syntax for confirmatory factor analysis is simpler than LISREL’s. In fact, lavaan does not require specifying matrices and the like, only which variables load on which factors. This is as simple as writing equation-like syntax in which a factor is composed of the sum of the variables contributing to it. Below is the model specification:

mdlStx <- "
quality =~ q1 + q2 + q3
cost =~ c1 + c2 + c3
value =~ v1 + v2 + v3
csat =~ cs1 + cs2 + cs3
repeaT =~ r1 + r2 + r3"

Note that in the model specification above the latent variable repeaT has a capital T. This is to distinguish it from the flow-control construct repeat (see here), a reserved word which initiates a repeat loop. In case someone tries: writing the model syntax with a lower-case repeat yields the error ‘unexpected end of input’, because the reserved word repeat expects an expression to follow.
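The clash can be reproduced in base R alone, since the problem lies with R’s parser rather than with lavaan itself (a sketch; the exact error message may differ between R versions):

```r
# 'repeat' is a reserved word, so R's parser rejects it as a name
res <- try(parse(text = "repeat =~ r1 + r2"), silent = TRUE)
inherits(res, "try-error")  # TRUE: the syntax fails to parse

# a different name, such as repeaT, parses without complaint
ok <- parse(text = "repeaT =~ r1 + r2")
inherits(ok, "try-error")   # FALSE
```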

Iacobucci first addresses the fit of the model. Pasting the syntax below into the R console prints the indexes used by Iacobucci. The code may appear convoluted because of the formatting instructions: to increase the similarity between lavaan’s output and the output reported in the paper I limited most of R’s output to 2 digits after the decimal point, and to increase readability I wrapped all the indexes into a single line with the command paste and printed them to screen with the print command:

mdlFit <- cfa(mdlStx, sample.cov = costumerQuality, sample.nobs = n)
print(paste(
  "chi^2_{", fitMeasures(mdlFit, "df"),
  "} =", format(fitMeasures(mdlFit, "chisq"), nsmall = 2, digits = 2),
  ", (p =", format(fitMeasures(mdlFit, "pvalue"), digits = 1),
  "), CFI =", format(fitMeasures(mdlFit, "cfi"), digits = 2),
  ", SRMR =", format(fitMeasures(mdlFit, "srmr"), digits = 2)))

A quick comparison shows that the output printed in the R console and Iacobucci's output are almost identical.

Factor loadings can be extracted using the parameterEstimates function: index the “est” column of its output with the rows where the “op” column equals ‘=~’. With the same strategy, we can extract the significance of the factor loadings from the “pvalue” column.

estVals <- parameterEstimates(mdlFit, standardized = TRUE)
estVals[estVals$op == "=~", "est"]
estVals[estVals$op == "=~", "pvalue"]

Note that p values are NA (i.e., not available) for some variables. These are the variables whose factor loadings were fixed to one, also known as marker variables. By convention, the first variable contributing to each factor is assigned the role of marker variable. Marker variables are constrained to 1.0 to set a scale for the latent variables: because latent variables are not directly observed, they have no scale of their own, and fixing a loading provides them with a unit of measurement. In R, NAs can be filtered out with the function na.omit(). Furthermore, we can check whether the factor loadings are significant by testing whether their p values are below .05.

# show p values of the factor loadings
na.omit(estVals[estVals$op == "=~", "pvalue"])
# flag significant factor loadings
na.omit(estVals[estVals$op == "=~", "pvalue"]) < .05
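As an aside, lavaan offers an alternative to the marker-variable convention: passing std.lv = TRUE to cfa fixes the latent variances to 1 instead of fixing the first loading of each factor, so every loading is freely estimated and gets its own p value. A sketch, illustrated on lavaan’s built-in HolzingerSwineford1939 data since the covariance file is not bundled here:

```r
library(lavaan)

# identification via standardized latent variances (std.lv = TRUE):
# no marker variables, so no NA p values among the loadings
fitStd <- cfa("visual =~ x1 + x2 + x3",
              data = HolzingerSwineford1939, std.lv = TRUE)
peStd <- parameterEstimates(fitStd)
any(is.na(peStd[peStd$op == "=~", "pvalue"]))  # FALSE
```

Both identification strategies yield the same model fit; they only change which parameters are fixed.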

Significance of the factor loadings is important because a nonsignificant loading implies that the variable should not be included in the model. A quick way to check whether all the variables included in the model contributed significantly to their respective factors is to check whether the number of p values below .05 matches the total number of p values estimated. A bit of R trickery comes in handy here: since logical vectors in R are coerced to 0 (FALSE) and 1 (TRUE) when summed, we can check whether the sum of the vector (the number of TRUEs) matches its length (the number of tests). Wrapping this code into an if statement, we can print to the console whether or not all our factor loadings were significant:

# Are all factor loadings significant?
pValues <- na.omit(estVals[estVals$op == "=~", "pvalue"])
print(paste0(
    if(sum(pValues < .05) != length(pValues)){"NOT "},
    "all factor loadings were significant"))

In our case the lines above print that all the factor loadings were significant, confirming Iacobucci's findings.
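The sum-versus-length trick can be seen on a toy vector of made-up p values:

```r
# a toy logical vector: which p values fall below .05?
p <- c(0.01, 0.20, 0.03)
sig <- p < .05           # TRUE FALSE TRUE
sum(sig)                 # 2: each TRUE counts as 1 when summed
sum(sig) == length(sig)  # FALSE: not all tests were significant
all(sig)                 # FALSE: the equivalent, more idiomatic check
```

The built-in all() expresses the same test more directly than comparing sum() to length().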

The range spanned by the factor loadings can be extracted by applying the range function to either the est or the std.all column, for unstandardized or standardized estimates respectively.

print(range(estVals[estVals$op == "=~", "est"]))
range(estVals[estVals$op == "=~", "std.all"])

Iacobucci then presents a table with the factor intercorrelations from the confirmatory factor analysis. These values can be extracted from the model fit with the inspect function by passing it the argument "cor.lv", which extracts the correlations among the latent variables.

inspect(mdlFit, "cor.lv")

Iacobucci syntax also contains directives to plot the path diagram. We can plot the path diagram with the semPaths function from the semPlot package. Below I adapted the call to semPaths to match as best as possible figure 1 in Iacobucci’s paper:

# explanation for parameters choice
#     rotation = 2 # observed variables on left
#     nCharNodes = 0 # do not trim latent variable names on nodes' labels
#     residuals = FALSE # remove recurrent loop on node itself
#     exoCov = FALSE # do not print the covariances among latent variables
#     whatLabels = 'est' # print estimates on paths
semPaths(mdlFit, rotation = 2, whatLabels = 'est', 
  nCharNodes = 0, residuals = FALSE, exoCov = FALSE)
title("Fig. 1. Confirmatory factor analysis.")


Diagnostics are important for establishing whether the models that we are fitting make sense. At this point there are two aspects we can check to determine whether the model (and the theory guiding its specification) is statistically sound:

Fit index values
– CFI > 0.97, SRMR < 0.05, RMSEA < 0.05
Factor loading significance
– Drop variables whose loadings are not significant
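The first check can be automated with a small helper; passesCutoffs below is a hypothetical function of my own (not part of lavaan), demonstrated on lavaan’s built-in HolzingerSwineford1939 data since the covariance file is not bundled here:

```r
library(lavaan)

# hypothetical helper: does a fitted model pass the rule-of-thumb cutoffs?
passesCutoffs <- function(fit) {
  idx <- fitMeasures(fit, c("cfi", "srmr", "rmsea"))
  unname(idx["cfi"] > .97 & idx["srmr"] < .05 & idx["rmsea"] < .05)
}

demoFit <- cfa("visual =~ x1 + x2 + x3
                textual =~ x4 + x5 + x6",
               data = HolzingerSwineford1939)
passesCutoffs(demoFit)  # a single TRUE/FALSE verdict
```

The same helper can be applied to mdlFit from the analysis above.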

This post is becoming quite long, therefore I will continue with the parts on the structural model and the full model in the next post.

