This post extends my previous one on multiple mediation with lavaan. Here I model a ‘real’ dataset instead of a randomly generated one. We used this dataset for a paper published some time ago, in which we investigated whether fear of an imperfect fat self was a stronger mediator than hope of a perfect thin self on dietary restraint in college women. At the time of the paper’s publication we performed the analysis with the SPSS macro INDIRECT. However,

Continue reading “Multiple-mediation example with lavaan”

# Multiple-mediator analysis with lavaan

I wrote this brief introductory post for my friend Simon. I want to show how easy the transition from SPSS to R can be. In the specific case of mediation analysis the transition to R can be very smooth because, thanks to lavaan, the R knowledge required to use the package is minimal. Analysis of mediator effects in lavaan requires only the specification of the model; everything else is automated by the package. So, after reading in the data, running the test is trivial.
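As a hedged sketch of what such a specification looks like (the variable names X, M1, M2, Y and the data frame myData are hypothetical, not from an actual dataset), a multiple-mediator model in lavaan can be written as:

```r
library(lavaan)

# Hypothetical model: X affects Y directly and through two mediators, M1 and M2
model <- '
  # direct effect
  Y ~ c * X
  # paths to and from the mediators
  M1 ~ a1 * X
  M2 ~ a2 * X
  Y ~ b1 * M1 + b2 * M2
  # indirect and total effects, defined from the labelled paths
  ind1 := a1 * b1
  ind2 := a2 * b2
  total := c + ind1 + ind2
'

# Bootstrapped standard errors for the indirect effects
fit <- sem(model, data = myData, se = "bootstrap", bootstrap = 1000)
summary(fit)
```

Everything beyond the model string (estimation, bootstrapping, reporting) is handled by sem and summary.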

Continue reading “Multiple-mediator analysis with lavaan”

# Four dimensions in two dimensions

This scatterplot is one of the best data visualisations I have made. I like it because it concentrates a lot of information into a single visualisation: it displays four-dimensional data (i.e., four variables) in a two-dimensional scatterplot. I made the first implementation in R, but because I wanted to add interactivity I switched to d3.js. Below I describe the choices I made to display the information and how I coded them in d3.js. Continue reading “Four dimensions in two dimensions”
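In base R the same idea can be sketched by mapping two extra variables onto point size and colour (the data here is simulated and the encodings are illustrative; the actual d3.js version may map the dimensions differently):

```r
set.seed(7)
# Four simulated variables: two positions, one magnitude, one group
d <- data.frame(x = rnorm(50), y = rnorm(50),
                size = runif(50), group = sample(3, 50, replace = TRUE))
# x and y give two dimensions; point size and colour carry the other two
plot(d$x, d$y, cex = 1 + 2 * d$size, col = d$group, pch = 19,
     xlab = "Variable 1", ylab = "Variable 2")
```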

# Streamgraphs in base::R [e.II]

Until recently I did not have a practical application for streamgraphs. In fact, I still find the visualisation complex to understand, abstract and a bit too artistic. While I recognise that the strength of streamgraphs is the display of all the time series’ values in one (possibly interactive) plot, the amount of data displayed is massive, with many streams and even more data points. Because of the amount of data displayed Continue reading “Streamgraphs in base::R [e.II]”

# Streamgraphs in base::R [e.I]

This is a very simple script for plotting a streamgraph in R. I wanted to be able to plot a streamgraph in base R, without requiring additional libraries. For example, here I made an interactive streamgraph visualization depicting temperatures measured worldwide over the last 150 years. Since a streamgraph is a fancy version of a stacked bar plot, I thought it should be easy to reproduce by plotting one area on top of another. In other words, the upper limit of one area is the lower limit of the next, stacked on top of one another. This is a simple problem to solve in R. First, make a matrix of random numbers with as many columns as streams and as many rows as time points. Second, sum up the columns of the matrix so that the lines stack on top of each other. Third, use the polygon function to create the stacked graph.

The generation of data is straightforward:

    timePoints <- 100
    nStreams <- 10
    set.seed(09022017)
    values <- rnorm(timePoints * nStreams)

I constrained the data to be all positive values, otherwise the streams would overlap with one another.

    values <- abs(values)
    # reshape into matrix
    dim(values) <- c(timePoints, nStreams)

In the second part, each new column of data should be added to the one before. To check that each subsequent line lies above its predecessor I used the matplot function, which should display stacked lines.

    yy <- matrix(0, timePoints, nStreams)
    yy[, 1] <- values[, 1]
    for (iStream in 2 : nStreams)
      yy[, iStream] <- rowSums(values[, 1 : iStream])
    matplot(yy, type = 'l', lty = 1, bty = 'n')

To make the plot look less peaky I smoothed the values with the smooth.spline function. I think smoothed peaks are also much prettier.

    yy[, iStream] <- predict(smooth.spline(rowSums(values[, 1 : iStream])))$y

Now, the areas between the lines need to be filled. Filled areas can be plotted with the polygon function, which requires x coordinates going from left to right and back again, and y values for all those x coordinates. In its simplest call, polygon works like this:

    plot.new()
    left <- 0
    right <- 1
    up <- 1
    down <- 0
    xx <- c(left, right, right, left)
    yy <- c(down, down, up, up)
    polygon(xx, yy, col = 'red', border = NA)

If instead of two points one uses two arrays, the plot can depict more complex areas. A pass of smooth.spline softens the rough edges and the stream is ready. The graph is a bit weird-looking, but it gives the idea.

    n <- 100
    xx <- c(1:n, n:1)
    y <- c(rnorm(n), rnorm(n))
    yy <- predict(smooth.spline(y))$y
    plot(xx, yy, type = "n", bty = 'n',
         xlab = "Time", ylab = "Smoothed randomness")
    polygon(xx, yy, col = "gray", border = "red")

To keep the data organized and simple to feed to polygon, I put the data into a matrix with twice as many columns as the starting matrix. Each pair of columns then contains the lower and upper boundaries of one stream. In particular, the columns for the first stream are 1) an array of zeros and 2) the previous column plus the values of the first ‘stream’ of data. For the second stream, column three holds the same values as the previous column, and column four the values of column three plus the values of the second ‘stream’ of data. This is then easy to put in a loop and iterate over the number of streams.

    nStreams <- 4
    yy <- matrix(0, timePoints, nStreams * 2)
    for (iStream in 1 : nStreams) {
      if (iStream == 1)
        yy[, iStream * 2] <- predict(smooth.spline(values[, iStream]))$y
      else {
        yy[, iStream * 2 - 1] <- yy[, (iStream - 1) * 2]
        yy[, iStream * 2] <- predict(smooth.spline(values[, iStream]))$y +
          yy[, iStream * 2 - 1]
      }
    }

The resulting matrix can be plotted with a for loop choosing the correct upper and lower boundaries.

    x11()
    xx <- c(1:timePoints, timePoints:1)
    plot(xx, xx, type = "n", main = "Streamgraph",
         xlab = "Time", ylab = "Amplitude",
         ylim = range(yy), bty = 'n')
    for (iStream in 1 : nStreams) {
      y <- c(yy[, iStream * 2], rev(yy[, iStream * 2 - 1]))
      polygon(xx, y, col = iStream + 1, border = NA)
    }

… and trying with actual data I leave for a follow up!

# Clustering Pumps [mlw4]

This is the fourth and last assignment of Machine Learning for Data Analysis by Wesleyan University on Coursera. My assignment diverges quite a bit from the approach taken by the instructor, since I wanted only three clusters to determine pump functionality (functional, functional needs repair, and Continue reading “Clustering Pumps [mlw4]”
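A minimal sketch of the three-cluster idea with base R’s kmeans (on simulated features as a stand-in, not the actual pump data, and not necessarily the clustering method used in the assignment):

```r
set.seed(42)
# Simulated stand-ins for the pump features: 300 pumps, 5 numeric variables
features <- matrix(rnorm(300 * 5), 300, 5)
# Scale the features, then ask for exactly three clusters
km <- kmeans(scale(features), centers = 3, nstart = 25)
table(km$cluster)  # sizes of the three clusters
```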

# Shrinking pumps? [mlw3]

This is the third assignment of Machine Learning for Data Analysis by Wesleyan University on Coursera. I applied the least absolute shrinkage and selection operator (LASSO) to the DrivenData data set pumpItUp. LASSO is a technique that performs variable selection by shrinking the ‘useless’ coefficients (i.e., variables) toward zero. Applying this method Continue reading “Shrinking pumps? [mlw3]”
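A minimal sketch of that shrinkage with glmnet (simulated data rather than the pumpItUp set; glmnet is one common LASSO implementation, not necessarily the one used in the assignment):

```r
library(glmnet)

set.seed(1)
# Ten predictors, of which only the first two actually matter
X <- matrix(rnorm(100 * 10), 100, 10)
y <- 2 * X[, 1] - X[, 2] + rnorm(100)

# alpha = 1 selects the LASSO penalty; lambda is chosen by cross-validation
cvfit <- cv.glmnet(X, y, alpha = 1)
# The 'useless' coefficients are shrunk toward (often exactly to) zero
coef(cvfit, s = "lambda.min")
```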