Multiple-mediation example with lavaan

This post extends this previous one on multiple-mediation with lavaan. Here I modeled a ‘real’ dataset instead of a randomly generated one. This dataset we used previously for a paper published some time ago. There we investigated whether fear of an imperfect fat self was a stronger mediator than hope of a perfect thin self on dietary restraint in college women. At the time of the paper’s publication we performed the analysis using the SPSS macro INDIRECT . However,
INDIRECT could not answer all our questions. Namely, we wanted to establish which was the best of the two independent variables’ effect on the dependent variable, as channeled through the mediators. INDIRECT could not perform that contrast, but a software for structural equation modelling like Mplus or Lisrel could. However, I only had access to the demo version of Mplus which allowed to model only two independent variables, but we had three. BMI, which was a control variable, had to be treated as an independent variable, making the demo version of Mplus insufficient to perform the test. At the time we were rescued by a colleague, who modeled the solution in AMOS. Falling back on AMOS provided an imminent solution, but I was unsatisfied, and I have been looking for an alternative since. I now know lavaan is not only an alternative but it is (in my opinion) the best option to perform this type of analysis.

This image shows the multiple mediation model we tested.

Total, direct, indirect effect of the multiple mediation model

I display above depicts the model in hope of simplifying its complexity, but I will not explain it any further. Instead I will get into the R code. To read the data I used the R function read.delim2 because the file contains tab-separated values and the numbers’ decimal part was separated with a comma. Moreover, to spare myself some typing during the specification of the model I renamed the variables to shorter acronyms. Specifically attainabilityfear and attainabilityhope, the mediators, are M1 and M2, respectively. DietTotal, the dependent variable, is Y. BMI, the control variable, is C. Age I will not use, therefore I left as age. PDiscrepency and Pstandards are independent variables recoded as X1 and X2, respectively. Porder I will not use, therefore I left as Porder. To avoid any sort of confusion I will clarify that the digit I added after the letter identifying each variable type do not index order, the digits only distinguish the variables of the same type from one another.

Data <- read.delim2('~/Dropbox/mPlusTests/PerfectMediation2.dat',
    as.is = TRUE)
names(Data) <- c('M1', 'M2', 'Y', 'C', 'age', 'X1', 'X2', 'Porder')

The model specification is straightforward. First all predictors on Y, the dependent variable, then the same with the two mediators. Then indirect effects to the dependent variable through both mediators for each independent variables. Note that there are no indirect effects for the control. The subtraction of indirect effects, representing contrasts, determines which path through the indirect effect is stronger than the other. Total effects must be specified for each independent variables and for the control as well. The full model specification is below:

myModel <- '
Y ~ b1 * M1 + b2 * M2 + c1 * X1 + c2 * X2 + c3 * C
M1 ~ a1 * X1 + a2 * X2 + a5 * C
M2 ~ a3 * X1 + a4 * X2 + a6 * C
#indirect effects
indirect1 := a1 * b1
indirect2 := a3 * b2
indirect3 := a2 * b1
indirect4 := a4 * b2
# contrasts
con1 := a1 * b1 - a3 * b2
con2 := a2 * b1 - a4 * b2
con3 := (a1-a2) * b1
con4 := (a3-a4) * b2
# total effect
total1 := c1 + (a1 * b1) + (a3 * b2)
total2 := c2 + (a2 * b1) + (a4 * b2)
total3 := c3 + (a5 * b1) + (a6 * b2)
# covariates
M1 ~~ M2
'

To make the modelling more robust I ask lavaan to estimate the standard error based on an estimate from 5000 bootstrap samples. Beware that drawing 5000 bootstrap samples takes a few minutes on my (old) machine. If you only care to check whether the procedure works, remove (or comment out) the bootstrap and se lines and leave drawing of the bootstrap samples for a proper analysis.

require("lavaan")
fit <- sem(myModel, 
    data=Data, 
    se = "bootstrap", 
    bootstrap = 5000)

Done!

Mplus also includes the intersects in the model, therefore increasing the number of parameters to estimate. The intersects can be added automatically to the model setting the parameter mimic=”Mplus”, which adds intercepts of mediators and independent variable, this is identical to specify the parameter meanstructure = True (this link explains meanstructure). Note that adding parameters does not necessarily improve the model, in my specific case it worsen the model (both AIC and BIC were higher for the models with intercepts than for the model without them).
The covariance of the independent variables could also be automatically added to the model setting the parameter fixed.x = FALSE in the call to the sem function. Then, the exogenous ‘x’ covariates are treated as random variables and their means, variances and covariances are free parameters which will be estimated. Calling help(lavOptions) shows all the other options available with lavaan.

While experimenting with the model specifications in lavaan I discovered two points on coding style for which I would have appreciate some word of advice. I found the formulations confusing because I could not see that they were yielding identical results in spite of being expressed differently. The first confusion regards the specification of the equations for the model. In practice specifying the model as

Y ~ b1 * M1 
Y ~ c1 * X1

is identical to specifying the model as

Y ~ b1 * M1 + c1 * X1

The parameters estimated will be identical independently of the notation used. As an inexperienced user I preferred the first notation because the relationship (or paths) between the variables was clearer than with the second notation. Nowadays I prefer the second notation because it is more concise and parsimonious, it makes the specification of the model more compact and less error prone. Similarly, when specifying model contrasts the two notations below yield identical results:

con1 := a1 * b1 - a2 * b1 
con2 := a3 * b2 - a4 * b2
# OR
con1 := (a1-a2) * b1
con2 := (a3-a4) * b2

Take home message is that lavaan’s notation is very flexible, one only needs to make sure s/he does not make mistakes when specifying complex equations.

Last but not least, I found a way to reduce the inconvenience of having to copy-paste the model specification from an editor to the R console. Since mistakes and/or typos are common with complex models, copy-pasting might lead to quite a bit of frustration, especially when trying to scroll back to the line which created the mistake. But I found a brilliant solution to this problem in the lavaan manual. It is possible to specify the model on a separate file, read in the model specification from the file into a variable using the readLines function, and then feed that to the sem function. Brilliant! But there is a catch, for this procedure to work the model specification must be stripped off from all the non-model related text. Also comments (i.e., text preceded by #) are not allowed. For the curious ones I posted here the file multipleMediator2.txt which I used to specify the multiple mediation model.

myModel <- readLines("~/Dropbox/mPlusTests/multipleMediator2.txt")

… and then from here on the procedure is the same as with pasting the code from the editor to the R console: fit the model with sem extract the result with summary, change the model specification, read-in again and repeat all the steps until satisfied with the model fit. And if you might be interested in how to automate the plotting of lavaan’s output into a pretty graph read this post. Instead this post describe a neat way to represent lavaan’s summary output and highlight significant paths.

Multiple-mediation example with lavaan

21 thoughts on “Multiple-mediation example with lavaan

  1. MinZ says:

    Hi Laffa, thanks for the helpful tutorial. I’m trying to use Lavvan for multiple mediation with a repeated measures logistic regression since my outcome variable is binary. The treatment (independent variable) and the mediators are continuous.

    Liked by 1 person

      1. MinZ says:

        Hi! I was trying to ask a question.

        How would you specify the correct error term for the logistic regression?

        Like

  2. Sophia says:

    Hi, I tried to run your model without a control (C), so deleted these terms only (* C in the first 3 lines), and replaced the variable names in this model with my own, but ran into the following error:

    “Error in lav_data_full(data = data, group = group, cluster = cluster, :
    lavaan ERROR: missing observed variables in dataset: c3 a5 a6”

    Do you have any idea what this error could be? Thanks very much!

    Like

    1. Hi Sophia, the error says a variable is missing. I am not entirely sure about your model specifications, but the in the one I did for my case c3 was the coefficient that had to be estimated for the control variable. If you deleted the control variable (e.g. “c3*C” in line1) you do not estimate that coefficient and therefore lavaan is “missing observed variables”. The same is true for a5 and a6 that were defined in line 3. Quick fix: Remove the last of the line specifying total effects and you should be good to go. I hope this solves the problem. Good luck!

      Like

      1. Sophia says:

        Hi! I figured it out. It’s silly – I deleted too little of the code when I removed the effect of C.
        I’d delete my comment because it’s such a silly mistake, but maybe others will make similar mistakes. Thanks!

        This is the code without the effect of C that I ended up using:

        myModel <- '
        Y ~ b1 * M1 + b2 * M2 + c1 * X1 + c2 * X2
        M1 ~ a1 * X1 + a2 * X2
        M2 ~ a3 * X1 + a4 * X2
        #indirect effects
        indirect1 := a1 * b1
        indirect2 := a3 * b2
        indirect3 := a2 * b1
        indirect4 := a4 * b2
        # contrasts
        con1 := a1 * b1 – a3 * b2
        con2 := a2 * b1 – a4 * b2
        con3 := (a1-a2) * b1
        con4 := (a3-a4) * b2
        # total effect
        total1 := c1 + (a1 * b1) + (a3 * b2)
        total2 := c2 + (a2 * b1) + (a4 * b2)
        # covariates
        M1 ~~ M2
        '

        Liked by 1 person

  3. Leanna says:

    Hello! Your tutorials and examples are extremely helpful. The science world is so lucky to have people like yourself who devote time to helping others navigate stats. Stats are my weak-point, and your clear demonstrations are enabling me to learn R mediation on my own during this quarantine period.

    I do have a question – the model I am attempting to run is the exact same structure as the model in your example, except I only have one mediator. I am confused which coefficients demonstrate relationships to which mediator (e.g., a1, a,2, a3, a4…), so I am unsure which aspects of the code I must delete. Perhaps an easy way to demonstrate this, and could be helpful for others, would be to post a figure of your model with the paths labeled with their corresponding coefficients?

    I appreciate all you do!
    Leanna

    Like

  4. Lisa says:

    Hey Paolo, I found this tutorial super helpful for my analysis. In my model, I am including more than two mediators – am I correct in assuming that I would have to add covariances for all combinitions of mediatiors on seperate lines? If for example I have four mediating variables I would specificy the covariates as follows?
    # covariates
    M1 ~~ M2
    M2 ~~ M3
    M2 ~~ M4
    M1 ~~ M4
    M3 ~~ M4

    Thank you again!

    Like

  5. Lana says:

    Thank you for your helpful tutorial! I have a dummy variable as the modering variable in my model. R studio doesn’t show if the regression results are for the dummy variable being 0 or 1. Do you have any tips on how to interpret the results correctly?

    Like

    1. Hey Lana, R’s default is to interpret 0 as the control group, so I assume what you see are the results of comparing 1 vs 0. What you could do to check how/if things look different is calling the function `relevel’ on your variable and specify `ref = 1′. The comparison should then reseverse as, in a simple case, turning a `+’ into a `-‘ or something along those lines. Goodluck!

      Like

  6. Lana says:

    Is it possible to include fixed effects in the mediation analysis? I’m using the package “fixest” which includes fixed effects by adding variables after a pipe but it doesn’t seem to work here. Do you have any tips?

    Like

    1. Hi Lana, no idea. I have never used fixest. If it isn’t a derivative of lavaan, as blavaan for example, I doubt the syntax is interchangeable. Try asking the lavaan mailing list

      Like

  7. Hi Paolo! First of all, I would like to say that your page, examples, and explanation are extremely helpful to me.

    I have a mediated mediation and I was wondering whether you could give me some advice (I also have two predictors and one dependent variable).

    My mediators are Empathy and Community feeling which are both mediated by Loyalty and Trust.
    Should I also add # covariates for Empathy ~~ Loyalty, Empathy ~~ Trust, Community Feeling ~~ Loyalty, Community Feeling ~~ Trust
    OR only Empathy ~~ Community Feeling and Loyalty ~~ Trust?

    Moreover, I have several control variables in my model and I was wondering whether I need to include the path from the first mediator to the control variable as well as the second mediator?
    You have: total3 := c3 + (a5 * b1) + (a6 * b2)
    Should I write it as following (to include both mediations)?
    total3 := c3 + (a5 * b1) + (a6 * b2) + (a7*b3) + (a8 * b4)

    Thank you in advance!

    Like

    1. Hi Maite,
      quite a model you have there 😉
      I think modelling all covariates is a good idea. You could try two things to have a data-driven decision about whether that is a good idea: 1) look at the modification indexes – I explained how to use those in another post on structural equation modelling, so I won’t repeat it 😉 – 2) compare the model with all the covariates to one with less and see if the fit worsens or improve, in any case you should chose the most parsimonious model – Besides from the data-driven solution, don’t forget that theory should always drive your structural equation model.
      About the total effect. If the path from mediator one goes through both the other mediators that’s how you write it, yes. But you will need one total path for each path (confusing but I hope you get what I mean).
      I will recommend you to try to plot the model you get with semPaths, just to double check that all the paths you want are specified in the model. Good luck!

      Liked by 1 person

  8. Samuel Choi says:

    Hello, thank you for uploading example. I was wondering if this multiple mediation example works with binary or categorical outcomes?

    Like

Leave a comment