Multiple mediation: extracting output

Lavaan’s output is copious. Especially when fitting a complex model, lavaan’s output is literally a mine of information. Finding what is relevant is not always easy, therefore I will try to describe a way to summarize lavaan’s summary output.

First I load the results of the model fit which I saved into an ‘.Rdata’ file the previous time. This will save a bit of waiting time since running the 5000 bootstrap samples of the multiple mediation model took a while. The function load does not show the name of the variable(s) it loads. However, by calling ls(), I can find in the R workspace the name of the variable in which I saved the model fitted the previous time: fit. The results of lavaan’s fit become available through a call to summary:

summary(fit)

The second line of the summary output points out a discrepancy between the ‘Used’ and ‘Total’ number of observations, 222 and 224 respectively. I fear this might be related to having missing values in the dataset. I can check for the presence of missing values using the function is.na. Because is.na returns TRUE/FALSE value(s), when the function is applied to a lot of data it is difficult to spot one TRUE value among many FALSE values. To identify the presence of missing values I use a trick which uses the fact that true and false values are represented as ones and zeros. The trick consists in summing up the values returned by is.na (e.g. a vector of as many 1s as missing values and as many 0s as non-missing values) and check if the sum return a non-zero value. I then apply this trick separately to all the variable in my dataset to discover that…

sapply(dat, function(x) sum(is.na(x)))
>  fear   hope   diet    bmi    age     pd     ps Porder 
>     0      0      0      2      0      0      0      0

… bmi has two missing values. One could run lavaan model again on the dataset removing the missing values first:

dat <- na.omit(dat)

… but the result would not change since lavaan does remove the missing values automatically. This spares us some tedious typing of code and we can be grateful someone was so considerate to think on our behalf.

In the output there are then parameters estimates for the regressions, covariances and defined parameters which include indirect effects, contrasts and total effect. For each of the estimates the output reports the label, the estimate itself, the standard error, the z-value and it’s associated significance. A much more detailed output can be requested by adding parameters to the call of summary(). For example, the output can contain measures of fit (e.g. AIC, BIC), standardized parameter estimates, root mean square error and (bootstrapped) confidence intervals. All these measures are returned simply setting their parameters to TRUE, as I did below:

summary(fit, fit.measures=TRUE,
  	standardize=TRUE,
  	rsquare=TRUE,
  	estimates = TRUE,
  	ci = TRUE)

The R console is now nicely filled with labels and numbers, but choosing which ones to report is a daunting task. The function parameterEstimates helps enormously in extracting output from lavaan. The most helpful advantage of using parameterEstimates instead of summary is that one can assign the output of the call to a variable. This is helpful because it allows to extract the single estimate or significance value needed. For example, this is the call to assign to a variable tmp the 1) fit estimates, 2) standardized estimates, 3) and confidence intervals computed with the adjusted bootstrap percentile method with correction for bias but not for acceleration (bca.simple).

tmp <- parameterEstimates(fit, standardized=TRUE, ci=TRUE,
                          boot.ci.type="bca.simple")

tmp is a data.frame containing all the information extracted from the model fit by parameterEstimates. The first columns of tmp contain the equations specifications in which lhs stands for Left Hand Side of the equation, op stands for operator, rhs stands for Right Hand Side of the equation. With these three columns we can interpret lavaan inner workings by combining the labels of lhs and rhs through the operators, where:

~
implies regressed on
~~
implies:

variance
if lhs and rhs are the same
covariance
if lhs and rhs are different
:=
implies defined as (for the indirect and total effects and for the contrasts).

The second part of the output reports estimates, standard errors, z-values and associated significance, and the bootstrapped upper and lower boundaries of the confidence intervals. The third part of the output displays three types of standardized estimates. Std.lv reports standardized latent variables and the manifest variables in the scale range of their raw scores. Std.all reports standardized estimates of all variables. Std.nox reports standardized estimates for all the variables except for the exogenous manifest variables. In general, the reported standardized parameter estimates are the ones in the std.all column.

Having stored the output of the model fit into a data.frame allows to access all its values. For example one can find which estimates were significant looking for the p-values below .05. Alternatively, using the confidence intervals (c.i.), one can check for significance by identifying c.i. which do not contain zero. A trick to quickly spot significance through inspection of c.i. is by multiplying both lower and upper boundaries and determining whether the multiplication yield a value higher than zero. This trick works because an interval not containing zeros has either both boundaries positive or both negative. Therefore the multiplication of these c.i. boundaries yields a positive number. Eventually one could also compare p- or c.i.-significance tests to see whether they yield the same results, as I did below:

data.frame(round(tmp[,8:10], 2), 
	ciSig = ((tmp[,9] * tmp[,10]) > 0), 
	pSign = (tmp[,8] < .05), 
	ciVsP = ((tmp[,9] * tmp[,10]) > 0) == (tmp[,8] < .05))

In this code snippet I used round to display the output because I do not find helpful the scientific notation typically used by R. I personally find more immediate to grasp the significance of .08 than 8.428792e-02.
It also looks like I have been successful in creating more output rather than reducing the already copious output from lavaan. However, the point I wanted to make will become clear in the few following lines. In fact, I can use the data.frame in which I stored lavaan output to select the output I am interested in and adding a column showing the significance simply by displaying a star, which is very intuitive. Below I create a data.frame properly condensing lavaan’s output. The data.frame contains the names of the variables interested, the estimates, confidence intervals and significance levels:

tableValues = data.frame(tmp[ ,1:3], round(tmp[,c(5,9:10)], 2), 
	ciSig = ifelse((tmp[,9] * tmp[,10]) > 0, '*', ''))
tableValues$ciSig[tmp$op == '~~'] = ''

Note that I select the estimated values instead of the standardized ones (e.g., std.all). This is because the confidence intervals are computed for the estimates, but not for the standardized ones. Therefore I prefer the unstandardized estimates representation since, in general, I find it confusing to see an estimate presented with confidence interval which do not contain it. Alternatively, one could choose for a yet more compact summary displaying only standardized estimates and the stars representing the significance.

To finish this post with a sparkle, here is a little gift to all the latex fan. The tableValues data.frame can be exported to latex directly using the xtable package. It literally takes two lines of code:

if(!require("xtable")) install.packages("xtable",
    repos="http://cran.rstudio.com/", dependencies=TRUE)
xtable(tableValues,	caption="Multiple-mediator analysis example.")

The output of xtable can be directly written to a tex file. This is an handy option if you, like me, do not like to cut and paste back and forth between the R console and your editor of choice. To make the file ‘texable’ I add to the exported file the minimal latex syntax as:

cat('\\documentclass{article}\n', file = "table.tex")
cat('\\begin{document}\n', file = "table.tex", append = TRUE)
resultsTable <- xtable(tableValues,	
	caption="Multiple-mediator analysis.")
print(resultsTable, file="table.tex", append = TRUE)
cat('\\end{document}', file = "table.tex", append = TRUE)

… call pdflatex on the generated tex file and there you have your pretty pdf.

pdflatex /where/your/file/is/table.tex

AWESOME!

Oups, I forgot! The output from the xtable function can be used also by non-latex users. In fact, xtable’s output can be imported/copy-pasted into spreadsheet software like, for example, libre-office’s Calc. Since latex uses $ to separate columns, cut and pasting xtable’s output using $ as separator option when importing into a spreadsheet will work too. You’ll have to manually remove the ‘//’ from the cells in the last column, though.

A condensed, uncommented R file with all the code is available on github. Grab it if you need it, want to spare some typing, do not like cut ‘n paste, have nothing else to do…

Advertisements
Multiple mediation: extracting output

2 thoughts on “Multiple mediation: extracting output

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s