Streamgraphs in base::R [e.II]

Until recently I did not have a practical application for streamgraphs. In fact, I still find the visualisation complex to understand, abstract and a bit too artistic. I recognise that the strength of streamgraphs is that they display all the values of every time series in one (possibly interactive) plot, but the amount of data displayed is massive, with many streams and even more data points. Because of this, the visualisation is very complex, and it is difficult to find a practical situation in which plotting time series as a streamgraph gives more insight than plotting the individual lines. Today, however, I rediscovered a data set for which a streamgraph might be just what is needed for a polished, stylish look. Below I present four plots showing the ‘morphing’ from two multiple-line charts (upper panels) to a streamgraph (lower right panel). The improvement from line chart to streamgraph is clear.

First let’s introduce the data set I used. The data set contains eight time series representing proportions (here is an extensive description of 1) the data, 2) how it was measured and 3) how we analyse it). The eight time series represent two conditions with four time series each. Usually I present them divided between two plots with four lines each, exactly as in the two upper panels of the figure. The lower left plot is a multiple-line chart with all eight time series, in which I mirrored the time series of one condition by plotting them below zero. Note that, since a proportion varies between 0 and 1, the scale of the lower half of that plot also runs from 0 to 1 rather than from 0 to -1. The lower right plot is a streamgraph obtained by stacking and smoothing the curves of the lower left plot.

Figure: Morphing of a multiple-line chart into a streamgraph

I made the streamgraph with the method I presented in this post. In short, the post describes how to make a streamgraph in R. Since streamgraphs are stacked time series, a streamgraph is basically an area plot in which the lower boundary of each stream is the upper boundary of the previous stream, and the upper boundary is that lower boundary plus the values of the current time series (Byron & Wattenberg). For the first stream the lower boundary is zero, and from there the stack grows 1) upward if the time series are added, 2) downward if the time series are subtracted. Note that this idea is very similar to the ThemeRiver proposed by Havre et al. but does not include the symmetry element.
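The stacking rule can be sketched in a few lines of base R. This is only a minimal illustration on made-up numbers, not the code used for the figure; the matrix `vals` and the helper `stackBounds()` are hypothetical names introduced here.

```r
# Toy data (made up): 3 time series observed at 5 time points, one per column
vals <- cbind(a = c(1, 2, 3, 2, 1),
              b = c(2, 2, 1, 1, 2),
              c = c(1, 1, 1, 3, 1))

# Hypothetical helper: returns the lower and upper boundary of every stream.
# sign = 1 stacks upward from zero, sign = -1 stacks downward from zero.
stackBounds <- function(vals, sign = 1) {
	upper <- sign * t(apply(vals, 1, cumsum))  # running sum across the series
	lower <- cbind(0, upper[, -ncol(upper), drop = FALSE])  # previous upper boundary
	colnames(lower) <- colnames(vals)
	list(lower = lower, upper = upper)
}

b <- stackBounds(vals)

# Draw each stream as a polygon between its lower and upper boundary
x <- seq_len(nrow(vals))
plot(NA, xlim = range(x), ylim = range(0, b$upper),
     xlab = 'time', ylab = 'stacked value')
for (i in seq_len(ncol(vals))) {
	polygon(c(x, rev(x)), c(b$lower[, i], rev(b$upper[, i])),
	        col = i + 1, border = NA)
}
```

Calling `stackBounds(vals, sign = -1)` gives the same stack growing downward from zero, which is the building block for a mirrored condition.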

Two pieces of code are key for creating this streamgraph. One is the mirroring of the values of the second condition. This can be achieved by changing the sign of the values that are added to or subtracted from the streamgraph. In the code I switch the variable ‘upLowPanel’ between 1 and -1 when the condition changes, using an if-statement. In addition, when the values are mirrored, the lower boundary of the first stream must become zero again. I used the modulo to find the streams whose baseline should be zero. Since I have four possible regions of interest (i.e., target, competitor, distractor 1 and distractor 2) embedded in a for-loop, the modulo allows resetting the baseline when a new condition starts. Below is the implementation of this step.

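itemFixated <- c('target', 'competitor', 'd1', 'd2')
condition <- c('n', 'c')
iCol <- 1  # assumed start value; yy is assumed to be pre-filled with zeros
for (icond in condition) {
	# the streams of condition 'c' are mirrored below zero
	upLowPanel <- 1
	if (icond == 'c') {
		upLowPanel <- -1
	}
	for (item in itemFixated) {
		if (iCol %% length(itemFixated) == 1) {
			# first stream of a condition: here the baseline defaults to zero
			yy[, iCol * 2] <- upLowPanel * dat[, iCol + 1]
		} else {
			# lower boundary = upper boundary of the previous stream
			yy[, iCol * 2 - 1] <- yy[, (iCol - 1) * 2]
			yy[, iCol * 2] <- upLowPanel * dat[, iCol + 1] +
					rowSums(yy[, seq((iCol - 1) * 2, iCol * 2, 2)])
		}
		iCol <- iCol + 1
	}
}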
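The mirroring and the baseline reset can be tried out on made-up data. The sketch below is a simplified, self-contained variant of the loop and not the original script: the layout of `dat` (time bin in column 1, then four columns per condition) and the zero-filled matrix `yy` are assumptions made for the illustration.

```r
# Toy data (made up): column 1 is the time bin, columns 2-9 hold the
# proportions of 4 items in condition 'n' followed by 4 items in condition 'c'
set.seed(1)
nBins <- 6
dat <- data.frame(bin = seq_len(nBins),
                  matrix(runif(nBins * 8, 0, 0.25), nBins, 8))

itemFixated <- c('target', 'competitor', 'd1', 'd2')
condition <- c('n', 'c')

# Two columns per stream: odd column = lower boundary, even column = upper boundary
yy <- matrix(0, nrow = nBins,
             ncol = 2 * length(itemFixated) * length(condition))

iCol <- 1
for (icond in condition) {
	upLowPanel <- if (icond == 'c') -1 else 1  # mirror condition 'c' below zero
	for (item in itemFixated) {
		if (iCol %% length(itemFixated) != 1) {
			# not the first stream of a condition: start at the previous upper boundary
			yy[, iCol * 2 - 1] <- yy[, (iCol - 1) * 2]
		}
		# upper boundary = lower boundary + signed values of the current series
		yy[, iCol * 2] <- yy[, iCol * 2 - 1] + upLowPanel * dat[, iCol + 1]
		iCol <- iCol + 1
	}
}
```

With four items per condition, `iCol %% 4 == 1` holds for streams 1 and 5, so columns 1 and 9 of `yy` keep their zero baseline and the second condition starts stacking downward from zero.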

The second piece of code key to the visualisation is the smoothing. To obtain the smooth curves I used a natural spline with 9 degrees of freedom. I used the ns() function in the package splines instead of smooth.spline() because the smooth.spline() curves were less ‘attractive’ than the curves obtained with ns(). I think this might result from smooth.spline() using a degree of freedom for each point, and therefore smoothing the data to a much finer extent.

library(splines)  # provides ns()
splinesDF <- 9
tmpVals <- predict(
	lm(dat$propFix[dat$cond == icond & dat$item == item] ~
		ns(dat$bin[dat$cond == icond & dat$item == item], splinesDF))
)
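As a rough, self-contained illustration of this smoothing step, the sketch below fits a natural spline with 9 degrees of freedom to made-up data and overlays a smooth.spline() fit for comparison. The data frame `toy` and its columns are hypothetical, and the code is not taken from the original script.

```r
library(splines)  # ns(); the package ships with R

# Toy data (made up): a noisy proportion measured in 40 time bins
set.seed(42)
toy <- data.frame(bin = 1:40)
toy$propFix <- plogis((toy$bin - 20) / 4) * 0.8 + rnorm(40, sd = 0.03)

splinesDF <- 9
fit <- lm(propFix ~ ns(bin, splinesDF), data = toy)

# Predict on a finer grid inside the observed range to get a smooth curve
grid <- data.frame(bin = seq(1, 40, length.out = 200))
smoothVals <- predict(fit, newdata = grid)

# For comparison: by default smooth.spline() chooses its own amount of
# smoothing (generalised cross-validation)
ss <- smooth.spline(toy$bin, toy$propFix)

plot(toy$bin, toy$propFix, xlab = 'bin', ylab = 'proportion')
lines(grid$bin, smoothVals, col = 2, lwd = 2)   # natural spline, 9 df
lines(predict(ss, grid$bin), col = 4, lty = 2)  # smooth.spline()
```

Putting the variables in a data frame and using `data = toy` lets predict() evaluate the fitted spline at new bin values, which is convenient when the curve should be drawn on a finer grid than the data.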

The streamgraph is pretty, but it has some limits. Firstly, the graph looks a bit weird in the lower right corner, where the streams appear to overlap. This is not an error, since the same curve is visible in the lower right corner of the lower left plot and on the green line of the context condition in the upper left plot. The ‘artefact’ is a byproduct of the spline interpolation, which appears to tend to infinity. Secondly, I am not sure yet whether the streamgraph makes it easier to capture what differs between the two conditions. In practice I still have the impression that the display presents too much information and that it is difficult to capture the important bits. Thirdly, when plotting more than two conditions the streamgraph will be as messy as a line chart with many lines in it. In spite of all this, for plotting four time series in two conditions the streamgraph representation is more elegant and compact than a line chart, and it eases the comparison between conditions. Moreover, the visualisation is almost completely in base R (only ns() requires library(splines), and the splines package ships with every R installation). The complete R code for the plot is on github, together with a down-sampled data set to reproduce the figure.
