Streamgraphs in base::R [e.II]

Until recently I did not have a practical application in which to use streamgraphs. In fact, I still find the visualisation complex to understand, abstract and a bit too artistic. While I recognise that the strength of streamgraphs is the display of all the time series’ values in one (possibly interactive) plot, the amount of data displayed is massive, with many streams and even more data points. Because of the amount of data displayed …

Streamgraphs in base::R [e.I]

This is a very simple script for plotting a streamgraph in R. I wanted to be able to plot a streamgraph in base R, without requiring additional libraries. For example, here I made an interactive streamgraph visualization depicting temperatures measured worldwide in the last 150 years. Since a streamgraph is a fancy version of a stacked area plot, I thought it should be easy to reproduce by plotting one area on top of another. In other words, the upper limit of one area is the lower limit of the following area, stacked on top of one another. This is a simple problem to solve in R. First, make a matrix of random numbers with as many columns as streams and as many rows as time points. Second, sum the columns cumulatively, from left to right, so that the lines add on top of each other. Third, use the polygon function to create the stacked graph.

The generation of data is straightforward:

timePoints <- 100
nStreams <- 10
set.seed(09022017)
values <- rnorm(timePoints*nStreams)

I constrained the data to be all positive values, otherwise the streams would overlap one another.

values <- abs(values)
# reshape into matrix
dim(values) <- c(timePoints, nStreams) 

In the second part, each new column of data should be added to the ones before it. To check that each subsequent line is above its predecessor I used the matplot function, which should display stacked lines.

yy <- matrix(0, timePoints, nStreams)
yy[, 1] <- values[,1]
for (iStream in 2 : nStreams)
	yy[, iStream] <- rowSums(values[,1 : iStream])

matplot(yy, type = 'l', lty = 1, bty = 'n')

To make the plot look less peaky I smoothed the values with the smooth.spline function, replacing the line inside the loop with the one below. I think smoothed peaks are also much prettier.

	yy[, iStream] <- predict(smooth.spline(rowSums(values[,1 : iStream])))$y
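
For reference, here is a minimal self-contained sketch of the smoothed loop, assuming the line above simply replaces the body of the for loop. It regenerates the data so that it runs on its own, and it also smooths the first column (which the loop above leaves raw); the helper name `total` is only illustrative.

```r
timePoints <- 100
nStreams <- 10
set.seed(9022017)
values <- abs(rnorm(timePoints * nStreams))
dim(values) <- c(timePoints, nStreams)

yy <- matrix(0, timePoints, nStreams)
for (iStream in 1:nStreams) {
    # running total of the first iStream columns;
    # drop = FALSE keeps a matrix when iStream is 1
    total <- rowSums(values[, 1:iStream, drop = FALSE])
    # with a single argument, smooth.spline smooths against the index 1:timePoints
    yy[, iStream] <- predict(smooth.spline(total))$y
}

matplot(yy, type = 'l', lty = 1, bty = 'n')
```

Each running total is smoothed independently, so the smoothed lines are not strictly guaranteed to stay in order, although with these data they should sit well apart.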

Now, the areas between the lines need to be filled. Filled areas can be plotted with the polygon function. The function polygon requires x coordinates that run from left to right and then back again, with a y value for each of those x coordinates. In its simplest call polygon works like this:

plot.new()
left <- 0
right <- 1
up <- 1
down <- 0
xx <- c(left, right, right, left)
yy <- c(down, down, up, up)
polygon(xx, yy, col = 'red', border = NA)

If instead of two points one uses two arrays the plot can depict more complex areas. A pass of smooth.spline to soften the rough edges and the stream is ready. The graph is a bit weird-looking, but it gives the idea.

n <- 100
xx <- c(1:n, n:1)
y <- c(rnorm(n), rnorm(n))
# smooth y along its index (smooth.spline takes x first, then y)
yy <- predict(smooth.spline(y))$y
plot(xx, yy, type = "n", bty = 'n',
	xlab = "Time", ylab = "Smoothed randomness")
polygon(xx, yy, col = "gray", border = "red")

To keep the data organized and simple to feed to polygon, I put the data into a matrix with twice as many columns as the starting matrix. Each pair of columns then contains the lower and upper boundaries of one stream of data. For the first stream, column one is all zeros and column two is column one plus the smoothed values of the first ‘stream’ of data. For the second stream, column three repeats column two, and column four is column three plus the smoothed values of the second ‘stream’ of data. This is easy to put in a loop that iterates over the streams of data.

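nStreams <- 4
yy <- matrix(0, timePoints, (nStreams * 2))
for (iStream in 1 : nStreams)
{
    if (iStream == 1) {
	# first stream: the lower boundary (column 1) stays at zero
	yy[, iStream * 2] <- predict(smooth.spline(values[, iStream]))$y
    } else {
	# lower boundary = upper boundary of the previous stream
	yy[, iStream * 2 - 1] <- yy[, (iStream - 1) * 2]
	# upper boundary = lower boundary + smoothed values of this stream
	yy[, iStream * 2] <- predict(smooth.spline(values[, iStream]))$y +
		yy[, iStream * 2 - 1]
    }
}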
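
The same lower/upper layout can also be built without the if/else branch. The sketch below is an alternative, not the script's own method: it smooths each stream, takes cumulative sums across streams, and interleaves the two sets of boundaries. The names `smoothed`, `upper`, `lower` and `bounds` are illustrative, and the data are regenerated so that the block runs on its own.

```r
timePoints <- 100
nStreams <- 4
set.seed(9022017)
values <- matrix(abs(rnorm(timePoints * nStreams)), timePoints, nStreams)

# smooth every stream against its row index
smoothed <- apply(values, 2, function(v) predict(smooth.spline(v))$y)

# upper boundary of stream k = sum of the first k smoothed streams
upper <- t(apply(smoothed, 1, cumsum))

# lower boundary of stream k = upper boundary of stream k - 1 (zero for the first)
lower <- cbind(0, upper[, -nStreams, drop = FALSE])

# interleave: odd columns hold lower boundaries, even columns hold upper ones
bounds <- matrix(0, timePoints, nStreams * 2)
bounds[, seq(1, nStreams * 2, by = 2)] <- lower
bounds[, seq(2, nStreams * 2, by = 2)] <- upper
```

Here `bounds` has the same column layout as the matrix filled by the loop, so it could be passed to the same plotting code.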

The resulting matrix can be plotted with a for loop choosing the correct upper and lower boundaries.

x11()
xx <- c(1:timePoints, timePoints:1)
plot (xx, xx, type = "n", main = "Streamgraph",
	xlab = "Time", 
	ylab = "Amplitude", ylim = range(yy),
	bty = 'n')
for (iStream in 1 : nStreams)
{
	y <- c(yy[, iStream * 2], rev(yy[, iStream * 2 - 1]))
	polygon(xx, y, col = iStream + 1, border = NA)
}

… and trying with actual data I leave for a follow up!

Streamgraph visualization of global warming

Streamgraph of global warming

Streamgraphs are very pretty!

Streamgraphs are a very catchy way to represent stacked area graphs. Streamgraphs are most commonly used to represent time series data. I encountered streamgraphs for the first time during a Coursera data visualization class and I immediately wanted to try to reproduce them.
