I wrote this brief introductory post for my friend Simon. I want to show how easy the transition from SPSS to R can be. In the specific case of mediation analysis the transition to R can be very smooth because, thanks to lavaan, the R knowledge required to use the package is minimal. A mediation analysis in lavaan requires only the specification of the model; all the other processes are automated by the package. So, after reading in the data, running the test is trivial.

This time, to keep the focus on the mediation analysis, I will skip reading in the data and generate a synthetic dataset instead. Otherwise I would have to spend the next paragraph explaining the dataset and the variables it contains, and I really want to focus only on the analysis.

Create toy data:

# make up some data with one mediator
set.seed(06052017)
X <- rnorm(100)
M <- 0.5*X + rnorm(100)
Y <- 0.7*M + rnorm(100)
Data <- data.frame(X = X, Y = Y, M = M)

# add another mediator
M2 <- -0.35*X + rnorm(100)
Y <- 0.7*M2 - 0.48*M + rnorm(100)
Data <- data.frame(X = X, Y = Y, M1 = M, M2 = M2)

As shown on the lavaan website, performing a mediation analysis is as simple as typing in the code below:

simpleMediation <- '
  Y ~ b * M + c * X
  M ~ a * X
  indirect := a * b
  total    := c + (a * b)
'
require(lavaan)
fit <- sem(model = simpleMediation, data = Data)
summary(fit)

For multiple mediators, one simply needs to extend the model, recycling the code for the first mediator variable:

multipleMediation <- '
  Y ~ b1 * M1 + b2 * M2 + c * X
  M1 ~ a1 * X
  M2 ~ a2 * X
  indirect1 := a1 * b1
  indirect2 := a2 * b2
  total     := c + (a1 * b1) + (a2 * b2)
  M1 ~~ M2
'
fit <- sem(model = multipleMediation, data = Data)
summary(fit)

Note that with multiple mediators we must add the covariance of the two mediators to the model. Covariances are added using the notation below:

M1 ~~ M2

There are two ways to test the null hypothesis that the indirect effects are equal to each other. The first is to specify a contrast for the two indirect effects, defined as the difference between them. If the contrast is significant, the two indirect effects differ.

contrast := indirect1 - indirect2

# including a contrast in the model
contrastsMediation <- '
  Y ~ b1 * M1 + b2 * M2 + c * X
  M1 ~ a1 * X
  M2 ~ a2 * X
  indirect1 := a1 * b1
  indirect2 := a2 * b2
  contrast  := indirect1 - indirect2
  total     := c + (a1 * b1) + (a2 * b2)
  M1 ~~ M2
'
fit <- sem(model = contrastsMediation, data = Data)
summary(fit)

The second option to determine whether the indirect effects differ is to set a constraint in the model specifying the two indirect effects to be equal.

indirect1 == indirect2

Then, with the anova function, one can compare the models and determine which one is better. Including the constraint and comparing the models is simple:

constrainedMediation <- '
  Y ~ b1 * M1 + b2 * M2 + c * X
  M1 ~ a1 * X
  M2 ~ a2 * X
  indirect1 := a1 * b1
  indirect2 := a2 * b2
  total     := c + (a1 * b1) + (a2 * b2)
  # covariances
  M1 ~~ M2
  # constrain
  indirect1 == indirect2
'
noConstrFit <- sem(model = multipleMediation, data = Data)
constrFit   <- sem(model = constrainedMediation, data = Data)
anova(noConstrFit, constrFit)

In my case the test is not significant so there is no evidence that the indirect effects are different.

For these toy models there is no further need to customize the calls to sem. However, when performing a proper analysis one might prefer to have bootstrapped confidence intervals. The bootstrap standard errors and the number of bootstrap samples are controlled in the call to the sem function with the additional arguments se and bootstrap, as in:

fit <- sem(
  model     = contrastsMediation,
  data      = Data,
  se        = "bootstrap",
  bootstrap = 5000 # 1000 is the default
)

(Bootstrap) confidence intervals can be extracted with the function calls 1) summary, 2) parameterEstimates, or 3) bootstrapLavaan:

summary(fit, fit.measures = TRUE, standardize = TRUE, rsquare = TRUE, estimates = TRUE, ci = TRUE)
parameterEstimates(fit, boot.ci.type = "bca.simple")
bootstrapLavaan(fit, R = 5000, type = "bollen.stine", FUN = fitMeasures, fit.measures = "chisq")

NOTE that bootstrapLavaan will re-compute the bootstrap samples, so it takes as long to run as the sem function did when called with the bootstrap option.

Since this post is longer than I wanted it to be, I will leave it as a brief introduction to mediation with lavaan. In a follow-up post I describe multiple mediation with lavaan using an actual dataset. And if you are interested in automating the plotting of lavaan's output into a pretty graph, read this post. The whole code is on GitHub in one .R file.


Your blog is super helpful, thank you for the contents. Could you explain a little bit more why one must add the covariance between the two mediators (M1 ~~ M2)? Could you suggest a paper that explores this? Thank you.


Hi Patricia, I do not know of a paper exploring this specific topic. Preacher and Hayes (2008) discuss it briefly; maybe you can find more references in their paper. That is also the paper I used to translate the Mplus formula to lavaan: see the 'with' clause in the Mplus statements in Appendix A.


Thanks, the paper solves my question!



Thanks for the post.

Is the same approach feasible with three mediators?


Yes, it is. You will have to extend the model to accommodate a third mediator: model the mediator itself, add its indirect effect to the total effect, and model the covariances of the three mediators, which would be M1 ~~ M2, M1 ~~ M3, and M2 ~~ M3. I suggest you then check with semPlot whether you created all the paths… Good luck!
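As a minimal sketch, the steps above could translate to the specification below. The names are hypothetical, and it assumes the data frame contains a third mediator column M3:

```r
# hypothetical extension of the two-mediator model to three mediators;
# assumes Data contains a third mediator column M3
tripleMediation <- '
  Y  ~ b1 * M1 + b2 * M2 + b3 * M3 + c * X
  M1 ~ a1 * X
  M2 ~ a2 * X
  M3 ~ a3 * X
  indirect1 := a1 * b1
  indirect2 := a2 * b2
  indirect3 := a3 * b3
  total     := c + (a1 * b1) + (a2 * b2) + (a3 * b3)
  # covariances of the three mediators
  M1 ~~ M2
  M1 ~~ M3
  M2 ~~ M3
'
fit3 <- sem(model = tripleMediation, data = Data)
```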


Thanks for your help. I did what you recommended and it worked.

Now I'm a little bit confused. For practice, I changed the order of the two mediators in your example and I noticed that the "total effect" is not the same. Is there a reason for that?

Defined Parameters:
               Estimate  Std.Err  z-value  P(>|z|)
    indirect1    -0.237    0.087   -2.720    0.007
    indirect2    -0.287    0.088   -3.257    0.001
    total        -0.328    0.206   -1.596    0.110

Defined Parameters:
               Estimate  Std.Err  z-value  P(>|z|)
    indirect1    -0.287    0.088   -3.257    0.001
    indirect2    -0.237    0.087   -2.720    0.007
    total        -0.429    0.200   -2.139    0.032


I could not reproduce your result. How did you invert the mediators? Using this formula:

Y ~ b2 * M2 + b1 * M1 + c * X

I get the exact same indirect and total effects for both models. If you like, you could mail me your model specification [varati at hotmail dot com] and I can see whether I can figure out what differs between our approaches. By the way, your comment pointed out to me a mistake in the specification of the total effect which I had not spotted before:

total := c + (a1 * b1) + (a1 * b1)

should be:

total := c + (a1 * b1) + (a2 * b2)

Thank you very much for pointing me in the right direction!


I just did

Data <- data.frame(X = X, Y = Y, M1 = M2, M2 = M) instead of

Data <- data.frame(X = X, Y = Y, M1 = M, M2 = M2)


That explains the difference. The reason you got a different total effect is the mistake in the formula for the total effect:

total := c + (a1 * b1) + (a1 * b1)

should be

total := c + (a1 * b1) + (a2 * b2)

With the proper formula, the total effects are identical. The problem was that the wrong formula used the first mediator for both terms, so inverting the order of the mediators yielded different estimates.


Thanks a lot. Now everything is ok.


When I write this syntax for the model, I receive this error:

lavaan ERROR: unknown label(s) in variable definition(s): a1 b1 a2 a3 a4 b2 a5 a6 b6 c1 c2 c3

I have not been using R for long, so this could be a beginner's question. Thank you for posting this! It is really helpful.


Hi Bogdan, could you please provide more context? In my example I use those as labels for coefficients which are then estimated by lavaan. Without more context I cannot tell whether those are your variables' names or coefficients to be estimated…


Thank you so much for your easy to follow directions and code.


Thank you, this is such a clear, concise, and informative article with an elegant set of solutions and helpful links for further information!


Hi,

Your multiple mediator model example seems to be just identified (has 0 degrees of freedom) and thus does not estimate model fit. The same happens with my own two mediator model, which is essentially identical but has a few control variables added. I could live with this, if I knew it was a necessary property of multiple mediator models (with no latent variables). Unfortunately, I could not confirm this anywhere. Indeed, Appendix B in the above cited Preacher & Hayes 2008 paper has an SPSS output which provides fit statistics.

So, I wonder if you could give me any pointers whether I should be worried about the lack of model fit statistics and more broadly, why some multiple mediator SEM models are just identified and others are not.

It’s much appreciated.

Alexander


Hi Alexander, thank you for the question; I think I could develop this into a follow-up post. In the meantime I hope I can give you some elements to get going.

I believe just-identified models are not per se problematic, as long as the theory behind them is solid. To assess the goodness of fit of your model you can look at RMSEA, SRMR, and CFI (e.g. fitMeasures(mdlFit, "cfi")). Schermelleh-Engel, Moosbrugger and Müller (2003) give a nice primer ("Evaluating the fit of structural equation models…"). Another paper I also like is "A meditation on mediation"; it provides a way to estimate the effect size of the mediation effect.

As an extension to that, I also compute the AIC of the given model, then invert the mediator with the independent variable and check how the AIC and the effect size of the mediation effect change. This procedure might provide more insight just by comparing one model with another. Then again, with numbers basically any test is possible; it remains to establish whether the test itself makes sense. The bottom line for me is that the theory has to drive the testing. I therefore prefer a model with a less nice fit that provides a better understanding over a model with a very good fit that makes the theoretical explanation more complex.


p.s.: adding fit.measures = TRUE to the summary call also provides some of the fit indexes I mention above. By adding rsquare = TRUE one can ask for R^2 as well. HTH


Thanks for the reply! I’m looking forward to the new post!

Just to clarify, what I was referring to is that your "multipleMediation" model from above and my own model both result in perfect fit(?): CFI = TLI = 1, RMSEA = SRMR = 0, and the p-value that RMSEA <= 0.05 is NA. It is true, your model estimates AIC & BIC (in my model these are missing for whatever reason). Perhaps I was wrong to assume that this is due to the model being just identified. I mainly assumed this because, if the covariance of the two mediators is not estimated, the fit measures become more meaningful both in your example and in my own model.

Disclaimer: I haven't yet looked into the papers you recommend.


The paper won't help you with the "perfect fit(?)" issue, I am afraid. I am not sure why AIC and BIC would not be estimated in your model, but I know there is a Google group for SEM and one for lavaan, and maybe they will be able to provide more insight than I could.


Hi,

Thank you so much for this helpful post!

So how would control variables fit into this? I have been trying to add 3 controls to all legs of the mediation, but I am wondering how to address the indirect effect. Here is what I have:

model4 <- '
  # DIRECT EFFECT
  Y ~ c*X
  # mediator
  M ~ a1*X + a2*gender + a3*race + a4*SES
  Y ~ b1*M + b2*gender + b3*race + b4*SES
  # INDIRECT EFFECT (a*b)
  ab := a1*b1
  # TOTAL EFFECT
  total := c + ((b1+b2+b3+b4)*(a1+a2+a3+a4))
'
emoabu.ind4 <- sem(model.emoabu4, data = SPIN, se = "bootstrap")
summary(emoabu.ind4)

Do I need to have multiple lines for the indirect and total effects?

Thank you so much!


Hi Sarah, I am sorry but I am afraid I do not understand what you are trying to do. Do you have one mediator and three control variables? Then maybe you are looking for something like this:

# DIRECT EFFECT
Y ~ b1*M + c1*X + c2*gender + c3*race + c4*SES
# mediator
M ~ a1*X + a2*gender + a3*race + a4*SES
# INDIRECT EFFECT (a*b)
ab := a1*b1
# TOTAL EFFECT
total1 := c1 + (a1 * b1)
total2 := c2 + (a2 * b1)
total3 := c3 + (a3 * b1)
total4 := c4 + (a4 * b1)

If you have semPlot installed, try to plot the fit that you get from the sem() function to see whether you successfully represented all the paths that you wanted to include in your analysis.
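As a sketch, assuming semPlot is installed and fit holds the object returned by sem(), the paths can be inspected with:

```r
library(semPlot)
# draw the path diagram, labelling the edges with the estimated coefficients
semPaths(fit, whatLabels = "est")
```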


Thank you so much!


Thanks for this really helpful post. I have a question on why the effect sizes I obtain with lavaan might differ from those I obtain using cumulative link models (clm, R package ordinal), which yield standardized beta estimates for predictor variables.

I have a dependent variable, Y, which is specified as an ordinal categorical variable (with 6 levels, where 1 < 2 < 3 < 4 < 5 < 6). I have my predictor, X, and also 2 mediators (M1 and M2, where M1 ~~ M2), and 3 other 'control' variables (specified as B, C and D, respectively). So my mediation model looks like this:

mediation.model.3 <- '
  Y ~ b1*M1 + b2*M2 + c1*X + c2*B + c3*C + c4*D
  M1 ~ a1*X + a3*B + a5*C + a6*D
  M2 ~ a2*X + a4*B + a7*C + a8*D
  direct := c1
  indirect1 := a1*b1
  indirect2 := a2*b2
  total := c1 + (a1*b1) + (a2*b2)
  M1 ~~ M2
'

The thing I am confused about is that my predictor variable X has a large 'significant' beta coefficient when I run the full model using clm (R package ordinal), but when I look to see if X is mediated by M1 or M2 in lavaan, the effect of X itself is no longer significant (i.e. the CI associated with the total effect includes zero).

I'm wondering if I am violating the assumptions of lavaan by fitting an ordered categorical response term as the DV (Y). Do you know? Thank you!


Hi Nichola, I don't know what the clm function does; I have never used the ordinal package before…

I do not have experience with fitting ordered categorical variables in lavaan; my experience with lmer and glm is that they give an error if things do not make sense. Still, I am not sure one can infer that everything is working fine just because you did not get an error when fitting your model. Asking the lavaan Google group whether lavaan handles ordered categorical response terms will probably give you a better answer than mine.

On the other hand, note that a number of people argue that mediation does not require a significant total effect of X on Y (a list of references can be found in the Preacher and Hayes paper I cite above). Good luck!


Thanks very much – helpful suggestions!
