Streamgraph in R [final]

This post is an update to the previous post, which translated Byron and Wattenberg’s streamgraph algorithm into R. Their algorithm produces beautiful streamgraphs with the synthetic data produced by their stream generator. However, the implementation yields an ugly streamgraph when applied to data that is not as wiggly as the synthetic data. In my attempts I got very peaky wiggles, irregular and not smooth. In short, the graphs did not convey the idea of a stream, but rather of a blurry blob or a peaky primitive bat (the wooden club, not the animal, that would be cool!). In this post I bring up some points to bear in mind when producing a streamgraph.

The first consideration is about getting the streamgraph to look like stacked waves and not like a series of oddly stacked staircases. Smoothing the data can help create waves instead of a staircase or pixelated effect. In previous posts I smoothed the data by fitting splines to it and recreating the smoothed streams using predict(). To fit the splines I used the functions smooth.spline or ns in this and this post respectively. Both functions fit cubic splines and let you set the number of knots as well as the degrees of freedom. The main difference is the amount of detail the two fitting procedures capture. Smoothing splines capture very small differences in the data and might therefore yield a graph that looks like a stacked staircase. If this is the case, one can choose to model the data using a natural spline instead. Which function to use, and with which parameters, is left to experimentation and the desired degree of wobbliness.
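
As a rough sketch of the two smoothing options, the snippet below fits both kinds of spline to a single made-up series and recreates the smoothed stream on a finer grid with predict(). The series, the grid size and the df values are assumptions chosen only for illustration; they are not the settings used in the previous posts.

```r
library(splines)  # ships with R; provides ns()

set.seed(1)
x <- 1:50
y <- cumsum(rnorm(50))  # made-up, jagged series standing in for one stream

# Finer grid on which to recreate the smoothed stream
xNew <- seq(min(x), max(x), length.out = 200)

# Smoothing spline: a higher df follows the data more closely
fitSS <- smooth.spline(x, y, df = 10)
ySS <- predict(fitSS, xNew)$y

# Natural cubic spline fitted by least squares: a lower df gives a calmer curve
fitNS <- lm(y ~ ns(x, df = 5))
yNS <- predict(fitNS, newdata = data.frame(x = xNew))
```

Plotting ySS and yNS against xNew for a few values of df is probably the quickest way to see which fit gives the wobbliness you are after.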

A second consideration to bear in mind is that Byron and Wattenberg’s data generator creates positive values between 0 and 1. A plot of negative values would not yield a proper streamgraph. Therefore, when plotting time series that contain negative values, one might consider constraining their range by normalizing them. Below I wrapped some code into a function that normalizes each column of the data to the range 0 to 1. R probably has this functionality already, but searching for it would have taken me longer than writing this code:

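normalize <- function(dat){
	# Column-wise minimum, repeated down the rows to match the shape of dat
	minV <- apply(dat, 2, min)
	minV <- matrix(rep(minV, each = dim(dat)[1]),
		nrow = dim(dat)[1], ncol = dim(dat)[2])
	# Column-wise maximum, expanded the same way
	maxV <- apply(dat, 2, max)
	maxV <- matrix(rep(maxV, each = dim(dat)[1]),
		nrow = dim(dat)[1], ncol = dim(dat)[2])
	# Rescale every column to the range [0, 1]
	normalized <- (dat-minV)/(maxV-minV)
	return(normalized)
}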
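
For comparison, here is a minimal sketch of the same column-wise min-max scaling written with base R's sweep(). The function name normalizeSweep and the toy matrix are made up for this example, and it assumes one stream per column with no constant columns.

```r
# Hypothetical helper: min-max scaling of each column using sweep()
normalizeSweep <- function(dat){
	dat <- as.matrix(dat)
	minV <- apply(dat, 2, min)              # one minimum per column
	rangeV <- apply(dat, 2, max) - minV     # one range per column
	# Subtract the column minima, then divide by the column ranges
	sweep(sweep(dat, 2, minV, "-"), 2, rangeV, "/")
}

# Toy data: two streams, one of them with negative values
toy <- cbind(a = c(-2, 0, 2), b = c(10, 15, 30))
scaled <- normalizeSweep(toy)
```

A column whose values are all identical has a range of zero, so it would come out as NaN and needs separate handling.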

A last consideration concerns the color choice of the streamgraph. Previously I always randomized the assignment of colors to the streams, believing that this made it easier to distinguish the different streams within the streamgraph. However, in some cases, e.g. when debugging or when trying to group streams, it might be helpful to have a gradual transition between colors. In such cases, removing the random assignment of colors can help.
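
The difference between the two color assignments can be sketched as follows. The number of streams and the three anchor colors are arbitrary assumptions, and colorRampPalette() stands in here for whatever palette the plotting code actually uses.

```r
nStreams <- 8  # hypothetical number of streams

# Gradual transition: neighbouring streams get neighbouring colors
gradualCols <- colorRampPalette(c("steelblue", "khaki", "firebrick"))(nStreams)

# Randomized assignment: same colors, shuffled across the streams
set.seed(42)
randomCols <- sample(gradualCols)
```

Passing gradualCols to the plotting code keeps the stacking order visible, while randomCols increases the contrast between adjacent streams.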

In this R file I updated the previous functions and included all the options discussed in this post. For example, I added the function computeSmoothedStacks, which extends computeStacks by smoothing the data with smooth.spline. It can easily be adapted to fit natural splines instead. No more excuses keeping you from streamgraphing your data!
