This post extends this previous one on multiple-mediation with lavaan. Here I modeled a ‘real’ dataset instead of a randomly generated one. This dataset we used previously for a paper published some time ago. There we investigated whether fear of an imperfect fat self was a stronger mediator than hope of a perfect thin self on dietary restraint in college women. At the time of the paper’s publication we performed the analysis using the SPSS macro INDIRECT . However,
INDIRECT could not answer all our questions. Namely, we wanted to establish which was the best of the two independent variables’ effect on the dependent variable, as channeled through the mediators. INDIRECT could not perform that contrast, but a software for structural equation modelling like Mplus or Lisrel could. However, I only had access to the demo version of Mplus which allowed to model only two independent variables, but we had three. BMI, which was a control variable, had to be treated as an independent variable, making the demo version of Mplus insufficient to perform the test. At the time we were rescued by a colleague, who modeled the solution in AMOS. Falling back on AMOS provided an imminent solution, but I was unsatisfied, and I have been looking for an alternative since. I now know lavaan is not only an alternative but it is (in my opinion) the best option to perform this type of analysis.
This image shows the multiple mediation model we tested.
I display above depicts the model in hope of simplifying its complexity, but I will not explain it any further. Instead I will get into the R code. To read the data I used the R function read.delim2 because the file contains tab-separated values and the numbers’ decimal part was separated with a comma. Moreover, to spare myself some typing during the specification of the model I renamed the variables to shorter acronyms. Specifically attainabilityfear and attainabilityhope, the mediators, are M1 and M2, respectively. DietTotal, the dependent variable, is Y. BMI, the control variable, is C. Age I will not use, therefore I left as age. PDiscrepency and Pstandards are independent variables recoded as X1 and X2, respectively. Porder I will not use, therefore I left as Porder. To avoid any sort of confusion I will clarify that the digit I added after the letter identifying each variable type do not index order, the digits only distinguish the variables of the same type from one another.
Data <- read.delim2('~/Dropbox/mPlusTests/PerfectMediation2.dat', as.is = TRUE) names(Data) <- c('M1', 'M2', 'Y', 'C', 'age', 'X1', 'X2', 'Porder')
The model specification is straightforward. First all predictors on Y, the dependent variable, then the same with the two mediators. Then indirect effects to the dependent variable through both mediators for each independent variables. Note that there are no indirect effects for the control. The subtraction of indirect effects, representing contrasts, determines which path through the indirect effect is stronger than the other. Total effects must be specified for each independent variables and for the control as well. The full model specification is below:
myModel <- ' Y ~ b1 * M1 + b2 * M2 + c1 * X1 + c2 * X2 + c3 * C M1 ~ a1 * X1 + a2 * X2 + a5 * C M2 ~ a3 * X1 + a4 * X2 + a6 * C #indirect effects indirect1 := a1 * b1 indirect2 := a3 * b2 indirect3 := a2 * b1 indirect4 := a4 * b2 # contrasts con1 := a1 * b1 - a3 * b2 con2 := a2 * b1 - a4 * b2 con3 := (a1-a2) * b1 con4 := (a3-a4) * b2 # total effect total1 := c1 + (a1 * b1) + (a3 * b2) total2 := c2 + (a2 * b1) + (a4 * b2) total3 := c3 + (a5 * b1) + (a6 * b2) # covariates M1 ~~ M2 '
To make the modelling more robust I ask lavaan to estimate the standard error based on an estimate from 5000 bootstrap samples. Beware that drawing 5000 bootstrap samples takes a few minutes on my (old) machine. If you only care to check whether the procedure works, remove (or comment out) the bootstrap and se lines and leave drawing of the bootstrap samples for a proper analysis.
require("lavaan") fit <- sem(myModel, data=Data, se = "bootstrap", bootstrap = 5000)
Mplus also includes the intersects in the model, therefore increasing the number of parameters to estimate. The intersects can be added automatically to the model setting the parameter mimic=”Mplus”, which adds intercepts of mediators and independent variable, this is identical to specify the parameter meanstructure = True (this link explains meanstructure). Note that adding parameters does not necessarily improve the model, in my specific case it worsen the model (both AIC and BIC were higher for the models with intercepts than for the model without them).
The covariance of the independent variables could also be automatically added to the model setting the parameter fixed.x = FALSE in the call to the sem function. Then, the exogenous ‘x’ covariates are treated as random variables and their means, variances and covariances are free parameters which will be estimated. Calling help(lavOptions) shows all the other options available with lavaan.
While experimenting with the model specifications in lavaan I discovered two points on coding style for which I would have appreciate some word of advice. I found the formulations confusing because I could not see that they were yielding identical results in spite of being expressed differently. The first confusion regards the specification of the equations for the model. In practice specifying the model as
Y ~ b1 * M1 Y ~ c1 * X1
is identical to specifying the model as
Y ~ b1 * M1 + c1 * X1
The parameters estimated will be identical independently of the notation used. As an inexperienced user I preferred the first notation because the relationship (or paths) between the variables was clearer than with the second notation. Nowadays I prefer the second notation because it is more concise and parsimonious, it makes the specification of the model more compact and less error prone. Similarly, when specifying model contrasts the two notations below yield identical results:
con1 := a1 * b1 - a2 * b1 con2 := a3 * b2 - a4 * b2 # OR con1 := (a1-a2) * b1 con2 := (a3-a4) * b2
Take home message is that lavaan’s notation is very flexible, one only needs to make sure s/he does not make mistakes when specifying complex equations.
Last but not least, I found a way to reduce the inconvenience of having to copy-paste the model specification from an editor to the R console. Since mistakes and/or typos are common with complex models, copy-pasting might lead to quite a bit of frustration, especially when trying to scroll back to the line which created the mistake. But I found a brilliant solution to this problem in the lavaan manual. It is possible to specify the model on a separate file, read in the model specification from the file into a variable using the readLines function, and then feed that to the sem function. Brilliant! But there is a catch, for this procedure to work the model specification must be stripped off from all the non-model related text. Also comments (i.e., text preceded by #) are not allowed. For the curious ones I posted here the file multipleMediator2.txt which I used to specify the multiple mediation model.
myModel <- readLines("~/Dropbox/mPlusTests/multipleMediator2.txt")
… and then from here on the procedure is the same as with pasting the code from the editor to the R console: fit the model with sem extract the result with summary, change the model specification, read-in again and repeat all the steps until satisfied with the model fit. And if you might be interested in how to automate the plotting of lavaan’s output into a pretty graph read this post. Instead this post describe a neat way to represent lavaan’s summary output and highlight significant paths.