Streamgraphs in base::R [e.III]

This is the third post on streamgraph in R. After a simple introduction on how to generate a streamgraphs and an example with actual data it was time for a more general implementation to the creation streamgraphs using R. This post is an attempt at ‘translating’ or interpreting in R the streamgraph’s paper Stacked Graphs – Geometry & Aesthetics by Lee Byron and Martin Wattenberg. Lee Byron original code is in java, and as an additional source of inspiration, I also picked at the javascript code for making streamgraphs by Mike Bostock. I kept the R version as close as possible to these sources, meaning that I did use vectorization for the most part of the ‘translation’. For now it works as such, in the future I might vectorize the code.

Below is the core function generating a streamgraph. It stacks the time-series one on top of the others. Major differences among the four types of streamgraphs I present here are the choice of baseline and how the streams are stacked on top of one another. The simples method to create a streamgraph is the one I coded as zero. Zero referes to the fact that the baseline is set at zero and all the streams are stacked on top of it. In ThemeRiver the baseline is half of the sum of all the values. This is a nice feature of ThemeRiver, because it makes the layout of the graph symmetric. The minimizedWiggle method reproduces the Streamgraph algorithm. It is an extension to ThemeRiver because it improves the readability of the single streams hampering however the readability of the overall silhouette of the graph. The newWiggle method adds to the minimizedWiggle plotting the largest streams in the middle of the graph. This is done very crudely by computing the deviance of each stream and ordering the stream according to their deviance size.

computeStacks <- function(values, method = 'ThemeRiver'){
  timePoints <- dim(values)[1]
  nStreams <- dim(values)[2]
  if (method == "newWiggle"){
    thin2large <- sort(apply(values, 2, FUN=var),
	decreasing = FALSE, index.return = TRUE)$ix
    idxStreams <- c(thin2large[seq(1, length(thin2large), 2)],
	            thin2large[seq(2, length(thin2large), 2)])
  }
  yy <- matrix(0, timePoints, (nStreams * 2))
  for (iStream in 1 : nStreams){
    tmpVals <- values[, iStream]
    if (method == "newWiggle")
	tmpVals  1){
	yy[, iStream * 2 - 1] <- yy[, (iStream - 1) * 2]
	yy[, iStream * 2] <- yy[, iStream * 2 - 1] + tmpVals
    } else {
      switch(method,
	ThemeRiver = {
          yy[, 1] <- -(1/2) * rowSums(values)
	  yy[, 2] <- yy[, iStream * 2 - 1] + tmpVals},
	zero = {
	  yy[, 2] <- tmpVals},
	minimizedWiggle = {
	  baseline <- array(0, timePoints)
	  for (ipoint in 1 : timePoints) {
	    for (jStream in 1 : nStreams) {
	      baseline[ipoint] = baseline[ipoint] +
                 + (nStreams - jStream - .5) * values[ipoint, jStream]
            }
            baseline[ipoint] = baseline[ipoint] / nStreams
          }
          yy[, 1] <- - baseline
          yy[, 2] <- yy[, iStream * 2 - 1] + tmpVals},
	newWiggle = {
	  baseline <- rowSums(matrix((nStreams - 1 : nStreams - .5),
		nrow = timePoints, ncol = nStreams, byrow = TRUE) * values)
          yy[, 1] <- - (baseline / nStreams)
	  yy[, 2]  1){
  }# end: for (iStream in 1 : nStreams){
  return(yy)
}# end: function

Before plotting one needs data to plot. Below I translated to R the functions Byron and Wattenberg used to synthesize their data.

bump <- function (a) {
  x <- 1 / (.1 + runif(1, min=0, max=1))
  y <- 2 * runif(1, min=0, max=1) - .5
  z <- 10 / (.1 + runif(1, min=0, max=1))
  n <- length(a) - 1
  i <- 0 : n
  w <- (i / n - y) * z
  a <- a + x * exp(-w * w)
}

set.seed(19122017)
nStreams = 20 
timePoints = 200 

values = matrix(0, timePoints, nStreams)
for (icol in 1 : nStreams)
{
  a = matrix(0, 1, timePoints)
  for (i in 1:5)
    a <- bump(a)
  values[, icol] <- a
}

Finally the colorRampPalette function takes care of the colour definition. I chose the “rgb” option instead of the “lab” because it gives smoother and more uniform color ramps. As last I randomized the color assignment because the gradual transition in colors hinders the distinction of the different streams.

colorPalette <- colorRampPalette(c("#aaaadd", "#555566"), space = "rgb")
cols <- colorPalette(nStreams)
cols <- cols[sample(nStreams)]

I wrapped the plotting commands into the function streamGraph. It is not necessary to create a function wrapping these very basic commands, but it was convenient to group the plotting commands to call them again later. The plotting procedure is identical to the procedure I presented in the previous posts in which the data are stacked on top of each other:

streamGraph <- function(yy, cols, plotTitle = "Streamgraph"){
xx <- c(1:timePoints, timePoints:1)
plot (xx, xx, type = "n", main = plotTitle,
xlab = "Time",
ylab = "Amplitude", ylim = range(yy),
bty = 'n')
for (iStream in 1 : nStreams)
{
y <- c(yy[, iStream * 2], rev(yy[, iStream * 2 – 1]))
polygon(xx, y, col = cols[iStream], border = NA)
}
}

This last bit of code produces a plot with four quadrants each displaying one of the four types of streamgraphs.

par(mfrow = c(2, 2))
streamGraph(computeStacks(values, 'zero'), cols, 'zero')
streamGraph(computeStacks(values, 'minimizedWiggle'), cols, 'minimized Wiggle')
streamGraph(computeStacks(values), cols, 'Theme river')
streamGraph(computeStacks(values, 'newWiggle'), cols, 'newWiggle')
par(mfrow = c(1,1))

Et voilà

streamgraphs

The code to reproduce this post is on github.

Streamgraphs in base::R [e.III]

4 thoughts on “Streamgraphs in base::R [e.III]

Leave a comment