This post extends a previous one on multiple mediation with lavaan. Here I model a ‘real’ dataset instead of a randomly generated one: the data we used for a paper published some time ago. In that paper we investigated whether fear of an imperfect fat self was a stronger mediator than hope of a perfect thin self on dietary restraint in college women. At the time of the paper’s publication we performed the analysis with the SPSS macro INDIRECT. However,

Continue reading “Multiple-mediation example with lavaan”

# R

# Multiple-mediator analysis with lavaan

I wrote this brief introductory post for my friend Simon. I want to show how easy the transition from SPSS to R can be. In the specific case of mediation analysis the transition can be very smooth because, thanks to lavaan, the R knowledge required to use the package is minimal. A mediation analysis in lavaan requires only the specification of the model; everything else is automated by the package. So, after reading in the data, running the test is trivial.

Continue reading “Multiple-mediator analysis with lavaan”
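To give a taste of how little lavaan syntax a multiple-mediator model needs, here is a minimal sketch. The variable names (X, fear, hope, restraint) and the simulated data frame are hypothetical placeholders, not the coding from the actual paper:

```r
library(lavaan)

# simulate a toy dataset with two mediators (placeholder variables)
set.seed(1)
n <- 200
X    <- rnorm(n)
fear <- 0.5 * X + rnorm(n)
hope <- 0.3 * X + rnorm(n)
restraint <- 0.4 * fear + 0.2 * hope + 0.1 * X + rnorm(n)
dat <- data.frame(X, fear, hope, restraint)

# the whole analysis is just the model specification
model <- '
  restraint ~ b1 * fear + b2 * hope + cp * X   # direct effect
  fear      ~ a1 * X                           # paths to the mediators
  hope      ~ a2 * X
  ind_fear := a1 * b1                          # indirect effects
  ind_hope := a2 * b2
  total    := cp + ind_fear + ind_hope
'
fit <- sem(model, data = dat)
summary(fit, ci = TRUE)
```

The `:=` operator defines the indirect and total effects so lavaan estimates and tests them alongside the regression paths; bootstrapped confidence intervals can be requested with `se = "bootstrap"` in the `sem()` call.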

# Streamgraphs in base::R [e.II]

Until recently I did not have a practical application in which to use streamgraphs. In fact, I still find the visualisation complex to understand, abstract and a bit too artistic. While I recognise that the strength of streamgraphs is the display of all the time series’ values in one (possibly interactive) plot, the amount of data displayed is massive, with many streams and even more data points. Because of the amount of data displayed Continue reading “Streamgraphs in base::R [e.II]”

# Streamgraphs in base::R [e.I]

This is a very simple script for plotting a streamgraph in R. I wanted to be able to plot a streamgraph in base R, without requiring additional libraries. For example, here I made an interactive streamgraph visualization depicting temperatures measured worldwide in the last 150 years. Since a streamgraph is a fancy version of a stacked bar plot, I thought it would be easy to reproduce by plotting one area on top of another. In other words, the upper limit of one area is the lower limit of the next area, stacked on top of it. This is a simple problem to solve in R. First, make a matrix of random numbers with as many columns as streams and as many rows as time points. Second, sum up the columns of the matrix so that the lines add on top of each other. Third, use the polygon function to create the stacked graph.

The generation of data is straightforward:

```r
timePoints <- 100
nStreams <- 10
set.seed(09022017)
values <- rnorm(timePoints * nStreams)
```

I constrained the data to be all positive values, otherwise the streams would overlap with one another.

```r
values <- abs(values)
# reshape into matrix
dim(values) <- c(timePoints, nStreams)
```

In the second part, each new column of data should be added to the one before. To check that each subsequent line lies above its predecessor I used the matplot function, which should display the stacked lines.

```r
yy <- matrix(0, timePoints, nStreams)
yy[, 1] <- values[, 1]
for (iStream in 2:nStreams)
  yy[, iStream] <- rowSums(values[, 1:iStream])
matplot(yy, type = 'l', lty = 1, bty = 'n')
```

To make the plot look less peaky I smoothed the values with the smooth.spline function. I think smoothed peaks are also much prettier.

```r
yy[, iStream] <- predict(smooth.spline(rowSums(values[, 1:iStream])))$y
```

Now the areas between the lines need to be filled. Filled areas can be plotted with the polygon function, which expects x coordinates running from left to right and then back again, together with the y values for all those x coordinates. In its simplest call polygon works like this:

```r
plot.new()
left <- 0; right <- 1
up <- 1; down <- 0
xx <- c(left, right, right, left)
yy <- c(down, down, up, up)
polygon(xx, yy, col = 'red', border = NA)
```

If instead of two points one uses two arrays, the plot can depict more complex areas. A pass of smooth.spline softens the rough edges and the stream is ready. The graph is a bit weird-looking, but it gives the idea.

```r
n <- 100
xx <- c(1:n, n:1)
y <- c(rnorm(n), rnorm(n))
# smooth y over its own index so the fitted values keep length 2 * n
yy <- predict(smooth.spline(seq_along(y), y))$y
plot(xx, yy, type = "n", bty = 'n',
     xlab = "Time", ylab = "Smoothed randomness")
polygon(xx, yy, col = "gray", border = "red")
```

To keep the data organized and simple to feed to polygon, I put it into a matrix with twice as many columns as the starting matrix, so that each pair of columns contains the lower and upper boundaries of one stream of data. For the first stream, column one is an array of zeros and column two is column one plus the values of the first ‘stream’ of data. For the second stream, column three repeats the values of column two, and column four is column three plus the values of the second ‘stream’ of data. This is easy to put in a loop and iterate over the number of streams of data.

```r
nStreams <- 4
yy <- matrix(0, timePoints, nStreams * 2)
for (iStream in 1:nStreams) {
  if (iStream == 1) {
    yy[, iStream * 2] <- predict(smooth.spline(values[, iStream]))$y
  } else {
    yy[, iStream * 2 - 1] <- yy[, (iStream - 1) * 2]
    yy[, iStream * 2] <- predict(smooth.spline(values[, iStream]))$y +
      yy[, iStream * 2 - 1]
  }
}
```

The resulting matrix can be plotted with a for loop choosing the correct upper and lower boundaries.

```r
x11()
xx <- c(1:timePoints, timePoints:1)
plot(xx, xx, type = "n", main = "Streamgraph",
     xlab = "Time", ylab = "Amplitude",
     ylim = range(yy), bty = 'n')
for (iStream in 1:nStreams) {
  y <- c(yy[, iStream * 2], rev(yy[, iStream * 2 - 1]))
  polygon(xx, y, col = iStream + 1, border = NA)
}
```

… and trying it with actual data I leave for a follow-up!

# Citations Network

This post describes the visualisation of a social network I made for a Coursera course on Data Visualisation. For this specific assignment I opted for gathering data on my own rather than using the datasets provided by the course instructor. I wanted to gather the data myself to try to visualise ‘real’ data. By ‘real’ data I mean data that I scrape from the web and then visualise: basically, what other people call dirty data (i.e. data that has not been processed or polished before use). The question was also whether I could Continue reading “Citations Network”

# Color-coded parallel coordinates in R

Parallel coordinates can be very helpful in understanding relationships among more than two variables. The first time I encountered parallel coordinates I did not understand their potential, until I saw Alberto Cairo’s slopegraph. In that slopegraph Cairo color-coded the Continue reading “Color-coded parallel coordinates in R”
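For readers who want to try the idea before reading the full post, here is a minimal sketch of color-coded parallel coordinates in base R. It uses the built-in iris data as a stand-in (the post's own dataset and palette are not reproduced here); each line is one observation, colored by group:

```r
# standardise the numeric variables so the four axes are comparable
vars <- scale(iris[, 1:4])

# one color per group (hypothetical palette, one line per observation)
cols <- c("tomato", "steelblue", "seagreen")[iris$Species]

# t() makes each observation a column, so matplot draws it as one line
matplot(t(vars), type = "l", lty = 1, col = cols,
        xaxt = "n", bty = "n", xlab = "", ylab = "Standardised value")
axis(1, at = 1:4, labels = colnames(iris)[1:4], tick = FALSE)
```

The transpose is the whole trick: matplot draws each column of its input as a separate line, so flipping the data matrix turns rows (observations) into the lines of the parallel-coordinates plot.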

# Custom colormap for image() in R

Creating a custom colormap in R to plot a matrix is simple:

```r
nsamples <- 20
matrix2plot <- 1:nsamples
dim(matrix2plot) <- c(4, 5)
colors2spaceThrough <- c('red', 'white', 'blue')
customColorMap <- colorRampPalette(colors2spaceThrough)(nsamples)
image(1:4, 1:5, matrix2plot, col = customColorMap)
```