Streamgraphs in base::R [e.I]

This is the first in a series of four posts on producing a streamgraph in plain R code. Here I present a very simple R script that plots a streamgraph. In a previous post I made a streamgraph in d3.js, but I wanted to be able to do the same in R, so as not to depend on a webpage or on additional libraries (e.g. the streamgraph htmlwidget is only a wrapper around d3, and does not always work smoothly).

Since a streamgraph is a fancy version of a stacked bar plot, I thought it should be easy to reproduce by plotting one area on top of another. In other words, the upper limit of one area is the lower limit of the following area, and the areas are stacked on top of one another. This is a simple problem to solve in R. First, make a matrix of random numbers with as many columns as streams and as many rows as time points. Second, take the running sum across the columns of the matrix, so that each line sits on top of the previous one. Third, use the polygon function to draw the stacked graph.

The generation of data is straightforward:

timePoints <- 100
nStreams <- 10
set.seed(09022017)
values <- rnorm(timePoints*nStreams)

I constrained the data to be all positive; otherwise the streams would overlap one another.

values <- abs(values)
# reshape into matrix
dim(values) <- c(timePoints, nStreams) 

In the second part, each new column of data is added to the ones before it. To check that each line lies above its predecessor I used the matplot function, which should display stacked lines.

yy <- matrix(0, timePoints, nStreams)
yy[, 1] <- values[,1]
for (iStream in 2 : nStreams)
	yy[, iStream] <- rowSums(values[,1 : iStream])

matplot(yy, type = 'l', lty = 1, bty = 'n')
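
The loop above is a running total across the columns, so the same matrix can presumably be obtained with cumsum applied to each row. This is a minimal sketch, assuming the same random data as above; the names yyLoop and yyCumsum are introduced here only for the comparison.

```r
timePoints <- 100
nStreams <- 10
set.seed(9022017)
values <- abs(rnorm(timePoints * nStreams))
dim(values) <- c(timePoints, nStreams)

# loop version, as in the snippet above
yyLoop <- matrix(0, timePoints, nStreams)
yyLoop[, 1] <- values[, 1]
for (iStream in 2:nStreams)
    yyLoop[, iStream] <- rowSums(values[, 1:iStream])

# apply() returns one column per input row, hence the transpose
yyCumsum <- t(apply(values, 1, cumsum))

# both approaches should agree up to floating-point error
isTRUE(all.equal(yyLoop, yyCumsum))
```

The loop is easier to read step by step, while the cumsum version avoids recomputing the sum of all previous columns at every iteration.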

To make the plot look less peaky I smoothed the values with the smooth.spline function, replacing the line inside the loop with the one below. I think smoothed peaks are also much prettier.

	yy[, iStream] <- predict(smooth.spline(rowSums(values[,1 : iStream])))$y
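
Written out in full, the smoothed loop might look like the sketch below. It assumes the same random data as above. The name yySmooth is introduced here so that the unsmoothed matrix is not overwritten, and smoothing the first column as well is an assumption on my part.

```r
timePoints <- 100
nStreams <- 10
set.seed(9022017)
values <- abs(rnorm(timePoints * nStreams))
dim(values) <- c(timePoints, nStreams)

yySmooth <- matrix(0, timePoints, nStreams)
# smooth.spline() called with a single vector uses 1:length(y) as x,
# so predict() returns one fitted value per time point
yySmooth[, 1] <- predict(smooth.spline(values[, 1]))$y
for (iStream in 2:nStreams)
    yySmooth[, iStream] <- predict(smooth.spline(rowSums(values[, 1:iStream])))$y

matplot(yySmooth, type = 'l', lty = 1, bty = 'n')
```

Because each cumulative line is smoothed independently, neighbouring lines are not guaranteed to keep their order at every time point, so it is worth checking the matplot output before filling the areas.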

Now the areas between the lines need to be filled. Filled areas can be drawn with the polygon function, which takes the vertices of the outline in order: the x values run from left to right along one edge and then back from right to left along the other, with a y value for each of those x coordinates. In its simplest form, a call to polygon works like this:

plot.new()
left <- 0
right <- 1
up <- 1
down <- 0
xx <- c(left, right, right, left)
yy <- c(down, down, up, up)
polygon(xx, yy, col = 'red', border = NA)

If instead of two points one uses two arrays the plot can depict more complex areas. A pass of smooth.spline to soften the rough edges and the stream is ready. The graph is a bit weird-looking, but it gives the idea.

n <- 100
xx <- c(1:n, n:1)
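y <- c(rnorm(n), rnorm(n))
# smooth.spline takes x first, then y: smooth each boundary over 1:n,
# and reverse the second one because it is traced from right to left
yy <- c(predict(smooth.spline(1:n, y[1:n]))$y,
	rev(predict(smooth.spline(1:n, y[(n + 1):(2 * n)]))$y))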
plot(xx, yy, type = "n", bty = 'n',
	xlab = "Time", ylab = "Smoothed randomness")
polygon(xx, yy, col = "gray", border = "red")
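
For a tidier band, one option is to build the upper boundary as the lower boundary plus a positive thickness, so that the two edges can never cross. This is a hedged variation, not part of the original script; lower, thickness and upper are names introduced here for illustration.

```r
set.seed(1)
n <- 100
x <- 1:n

# smooth a random lower boundary and a random thickness
lower <- predict(smooth.spline(x, rnorm(n)))$y
thickness <- predict(smooth.spline(x, abs(rnorm(n)) + 1))$y
thickness <- pmax(thickness, 0.1)  # guard against any spline undershoot
upper <- lower + thickness

# walk along the lower edge left to right, then back along the upper edge
xx <- c(x, rev(x))
yy <- c(lower, rev(upper))

plot(xx, yy, type = "n", bty = "n",
     xlab = "Time", ylab = "Smoothed randomness")
polygon(xx, yy, col = "gray", border = "red")
```

This is the same lower-plus-values construction that the stacked streams rely on.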

To keep the data organized and simple to feed to polygon, I put them into a matrix with twice as many columns as there are streams, so that each pair of columns contains the lower and upper boundaries of one stream. For the first stream, column one is an array of zeros and column two holds the (smoothed) values of the first ‘stream’ of data. For the second stream, column three repeats column two, and column four is column three plus the values of the second ‘stream’ of data. This is easy to put in a loop that iterates over the number of streams.

nStreams <- 4 
yy <- matrix(0, timePoints, (nStreams * 2))
for (iStream in 1 : nStreams)
{
    if (iStream == 1)
	yy[, iStream * 2] <- predict(smooth.spline(values[, iStream]))$y
    else {
	yy[, iStream * 2 - 1] <- yy[, (iStream - 1) * 2]
	yy[, iStream * 2] <- predict(smooth.spline(values[, iStream]))$y + 
		yy[, iStream * 2 - 1]
    }	
}
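
The same lower/upper layout can presumably be built without the if/else branch, by smoothing every stream first and then taking running totals. This is a minimal sketch under the same data assumptions as above; smoothed, upper and lower are names introduced here for illustration.

```r
timePoints <- 100
nStreams <- 4
set.seed(9022017)
values <- abs(rnorm(timePoints * 10))
dim(values) <- c(timePoints, 10)

# smooth each stream on its own (one column per stream)
smoothed <- sapply(1:nStreams, function(iStream)
    predict(smooth.spline(values[, iStream]))$y)

# upper boundaries are the running totals across streams; lower boundaries
# are the upper boundaries of the previous stream, starting from zero
upper <- t(apply(smoothed, 1, cumsum))
lower <- cbind(0, upper[, -nStreams, drop = FALSE])

# interleave as lower1, upper1, lower2, upper2, ...
yy <- matrix(0, timePoints, nStreams * 2)
yy[, seq(1, nStreams * 2, by = 2)] <- lower
yy[, seq(2, nStreams * 2, by = 2)] <- upper
```

Keeping lower and upper as separate matrices also makes it easy to check that every upper boundary lies above its lower one before plotting.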

The resulting matrix can be plotted with a for loop choosing the correct upper and lower boundaries.

dev.new()  # opens a new graphics device; a portable alternative to x11()
xx <- c(1:timePoints, timePoints:1)
plot (xx, xx, type = "n", main = "Streamgraph",
	xlab = "Time", 
	ylab = "Amplitude", ylim = range(yy),
	bty = 'n')
for (iStream in 1 : nStreams)
{
	y <- c(yy[, iStream * 2], rev(yy[, iStream * 2 - 1]))
	polygon(xx, y, col = iStream + 1, border = NA)
}
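
To reuse the plotting loop with other boundary matrices, it could be wrapped in a small function. drawStreams is a hypothetical helper written for this illustration; it is not the more general function from the follow-up post, and it assumes the columns are ordered lower1, upper1, lower2, upper2, and so on.

```r
# hypothetical helper: bounds has columns lower1, upper1, lower2, upper2, ...
drawStreams <- function(bounds, cols = seq_len(ncol(bounds) / 2) + 1) {
    nTime <- nrow(bounds)
    nStr <- ncol(bounds) / 2
    xx <- c(1:nTime, nTime:1)
    plot(xx, xx, type = "n", main = "Streamgraph",
         xlab = "Time", ylab = "Amplitude",
         ylim = range(bounds), bty = "n")
    for (iStream in 1:nStr) {
        # upper edge left to right, then lower edge right to left
        y <- c(bounds[, iStream * 2], rev(bounds[, iStream * 2 - 1]))
        polygon(xx, y, col = cols[iStream], border = NA)
    }
    invisible(nStr)  # number of streams drawn
}

# toy input: two streams of constant thickness 1 and 2 over five time points
bounds <- cbind(0, rep(1, 5), rep(1, 5), rep(3, 5))
drawStreams(bounds)
```

The cols argument defaults to the same palette indices used in the loop above, and any vector of colours with one entry per stream can be passed instead.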

I leave an example with actual data for the follow-up post. Another post describes a more general function to construct a streamgraph.
