This posts on multiple mediation on lavaan supplements the two previous ones (1 – introducing multiple mediator analysis with lavaan and 2 – showing an example analysis) by describing how to process lavaan’s output graphically. I discovered the handy package semPlot and I am very positive about it. I will make the example as reproducible as possible, so that each step can be repeated. Also, I am going to try to provide more explanation about the R commands I used because a friend pointed out that the description of the steps was sometimes a bit dry and abstract.
First read in the data. To allow the reproduction of this example I uploaded the dataset into the github repository hosting also the R code. To read in the data I used the function read.delim2 which reads a tab-delimited file which has a header (i.e. a row with names of the variables in the dataset) and separates with a comma a number’s decimal part. If your data has a different format than this, then try importing your data in R using read.table or one of the other options listed in help(read.table).
dat <- read.delim2('~/Dropbox/mPlusTests/PerfectMediation2.dat', as.is = TRUE) names(dat) <- c('fear', 'hope', 'diet', 'bmi', 'age', 'pd', 'ps', 'Porder')
I renamed the variables to shorter names to ease typing the variables name in the model specification (Note that while plotting semPlot shortens the variable names to three-letters acronyms, but I kept them like this because I think they are more informative than as three-letters acronyms). Then I read in the model’s syntax which I stored in the file multipleMediatorCoded_simple.lav. This text file allows easy editing of the model specification. With easy I mean that the model specification can be edited in a text editor which, in my opinion, is more convenient than editing the model specification if it has to be copy-pasted to the R console.
myModel <- readLines("~/Dropbox/mPlusTests/multipleMediatorCoded_simple.lav")
Now I will fit the model using the sem function. I request to estimate bootstrapped standard errors and I set the bootstraps sample to be 5000. I choose these settings based on the paper on multiple mediation analysis of Preacher and Hayes and, moreover, to resemble the closest possible the procedure we used in the paper.
require("lavaan") set.seed(42) fit <- sem(myModel, data=dat, se = "bootstrap", bootstrap = 5000)
A quick note about the function set.seed(). Set.seed() sets the seed of the pseudo-random number generator so that the sequences of random numbers it produces are the same across different users. The reason I am setting it is that, when trying to reproduce someone’s steps it is helpful when the output numbers are the same because if the numbers are the same it is easier to identify what one is looking for. In short, set.seed makes results reproducible in case someone tries following my steps.
The model fit can be extracted using the summary function. As a default the summary function does not return the bootstrapped standard errors. However, these can be requested with the settings:
summary(fit, fit.measures=TRUE, standardize=TRUE, rsquare=TRUE, estimates = TRUE, ci = TRUE)
It took something between 20 minutes and one hour before my computer estimated the model. I think this is quite long and therefore I find it convenient to store the model’s output to a file. This prevents me from re-estimating the model every time I need to check what were the results. The save function allows to export the model results to an external file:
save(fit, file = "modelFit.RData")
One remark about the location where the file is saved. If you are looking at the folder where the data is and there is no new file called modelFit.RData and, after repeating the operation several times, you are still wondering where the files might be, WELCOME! R, as a default, opens in the home directory and does everything there. If you do not know what the home directory is do not worry, you can discover it typing the command getwd() in the R console. If you then move with your file manager into the home folder you will find the file modelFit.RData. If, as me, you are lazy and do not like to move the files from one location to another by dragging and dropping with the mouse, there are at least two alternatives. One option is to set R into the current directory typing in the R console:
The second option is to specify the current directory in the save command as in
save(fit, file = "directory/where/your/files/are/modelFit.RData")
The saved data can be loaded with
Lavaan’s output can be plotted with the package semPlot. SemPlot is very useful because it automates the creation of the model’s plot. Before discovering semPlot, I was doing the model plotting manually, and it was quite frustrating. Already the checking that all the coefficients were on the correct path was a tedious job. I think it is impressive that a solution coming out of the box without having to set many parameters works so well at the first attempt. Big kudos to the developer.
fit <- sem(myModel, data=Data) semPaths(fit, "model", "est", layout = 'spring', node.label.cex=5, edge.label.cex=1.25, fade=FALSE) title("Non-standardized coefficients")
I chose the "spring" layout because it was the only layout in which the coefficients' labels did not overlap. With the other layouts the paths were passing all through the middle and the labels of the coefficients were nicely overlapping at the center. The spring layout solved this issue. The two "cex" settings are specifying the sizes of the labels at the node and edge. In both cases they are increasing the sizes.
SemPlot can also plot standardize coefficients straightaway. This is very handy. One only need to set replace the "est" with "std" (estimates and standardized, respectively) when calling the semPath function, as below:
semPaths(fit, "model", "std", layout = 'spring',
node.label.cex=5, edge.label.cex=1.25, fade=FALSE)
That’s it. Sadly, customizing the plot is more difficult than I thought. So, when unhappy with the given plot, it might be challenging to specify a way to do things differently. I tried customizing the plot by understanding how semPlot works, but I could not figure out what or how to do it. Anyway, when unhappy, the manual solution is still valid: export the graph to a pdf and import the pdf to edit with inkscape, which is also free and awesome. One might want to edit the plot manually anyway to add stars identifying statistically significant paths. To export to pdf include pdf and dev commands in your syntax.
# before pdf(file = 'models.pdf') # plotting semPaths(fit, "model", "std", layout = 'spring', node.label.cex=5, edge.label.cex=1.25, fade=FALSE) title("Standardized coefficients") # after dev.off()
In conclusion lavaan and semPlot provide a fantastic combination of tools for multiple-mediation analysis. In my opinion they are on the way to become the super heroes of multiple mediation analysis and maybe structural equation modelling in general. Even if a picture is worth a thousands words, in the next post I will describe how to digest lavaan’s (quite lengthy) output.
Update July 2018
With time I have become more proficient with semPlot and graph customization. In this post I wrote a section describing how to customize the plot layout among with other features (e.g., variable names, edge labels, colours, …).