This scatterplot is one of the best data visualisations I have made. I like it because it concentrates a lot of information into a single visualisation. The scatterplot displays four-dimensional data (i.e., four variables) in two dimensions. I made the first implementation in R, but because I wanted to add interactivity I switched to d3.js. Below I describe the choices I made to display the information and how I coded them in d3.js. Continue reading “Four dimensions in two dimensions”

# data visualization

# Streamgraphs in base::R [e.II]

Until recently I did not have a practical application in which to use streamgraphs. In fact, I still find the visualisation complex to understand, abstract and a bit too artistic. While I recognise that the strength of streamgraphs is that they display all the values of the time series in one (possibly interactive) plot, the amount of data displayed is massive, with many streams and even more data points. Because of the amount of data displayed Continue reading “Streamgraphs in base::R [e.II]”

# Streamgraphs in base::R [e.I]

This is a very simple script for plotting a streamgraph in R. I wanted to be able to plot a streamgraph in base R, without requiring additional libraries. For example, here I made an interactive streamgraph visualization depicting temperatures measured worldwide in the last 150 years. Since a streamgraph is a fancy version of a stacked bar plot, I thought it should be easy to reproduce by plotting one area on top of another. In other words, the upper limit of one area is the lower limit of the following area, and the areas are stacked on top of one another. This is a simple problem to solve in R. First, make a matrix of random numbers with as many columns as streams and as many rows as time points. Second, add up the columns cumulatively, from the first to the last, so that the lines stack on top of each other. Third, use the polygon function to create the stacked graph.

The generation of data is straightforward:

    timePoints <- 100
    nStreams <- 10
    set.seed(09022017)
    values <- rnorm(timePoints*nStreams)

I constrained the data to be all positive values, otherwise the streams would overlap one another.

    values <- abs(values)
    # reshape into matrix
    dim(values) <- c(timePoints, nStreams)

In the second part, each new column of data should be added to the sum of the columns before it. To check that each subsequent line is above its predecessor I used the matplot function, which should display stacked lines.

    yy <- matrix(0, timePoints, nStreams)
    yy[, 1] <- values[, 1]
    for (iStream in 2 : nStreams)
      yy[, iStream] <- rowSums(values[, 1 : iStream])
    matplot(yy, type = 'l', lty = 1, bty = 'n')
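The same stacking can also be written without an explicit loop. The sketch below is an alternative, not the code used in this post. It recreates the positive `values` matrix so that it runs on its own, and it uses `cumsum` applied row by row. The names `stacked` and `allAbove` are chosen here for illustration only.

```r
# Recreate the positive data matrix from the steps above
timePoints <- 100
nStreams <- 10
set.seed(09022017)
values <- matrix(abs(rnorm(timePoints * nStreams)), timePoints, nStreams)

# apply() over rows returns one column per row, so transpose the result back
stacked <- t(apply(values, 1, cumsum))

# Check that every line sits on or above its predecessor at every time point
allAbove <- all(stacked[, -1] >= stacked[, -nStreams])

matplot(stacked, type = 'l', lty = 1, bty = 'n')
```

Because all the values are positive, `allAbove` should be `TRUE`, which is the same property the matplot check shows visually.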

To make the plot look less peaky I smoothed the values with the smooth.spline function. I think smoothed peaks are also much prettier. The assignment inside the loop then becomes:

    yy[, iStream] <- predict(smooth.spline(rowSums(values[, 1 : iStream])))$y
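For reference, here is how the smoothed assignment might fit into the complete loop. This is a minimal sketch that recreates the data so it runs on its own. Smoothing the first column as well is an assumption on my part, made so that all lines are treated the same way.

```r
timePoints <- 100
nStreams <- 10
set.seed(09022017)
values <- matrix(abs(rnorm(timePoints * nStreams)), timePoints, nStreams)

yy <- matrix(0, timePoints, nStreams)
# First stream: smooth the raw values (assumption: smoothed like the others)
yy[, 1] <- predict(smooth.spline(values[, 1]))$y
# Remaining streams: smooth the running total up to and including that stream
for (iStream in 2 : nStreams)
  yy[, iStream] <- predict(smooth.spline(rowSums(values[, 1 : iStream])))$y

matplot(yy, type = 'l', lty = 1, bty = 'n')
```

When smooth.spline receives a single vector, it treats the values as y and their index as x, so `predict(...)$y` returns one smoothed value per time point.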

Now, the areas between the lines need to be filled. Filled areas can be plotted with the polygon function. The polygon function requires the x coordinates of the outline, traced from left to right along one boundary and then back from right to left along the other, together with the y value for each of those x coordinates. In its simplest call polygon works like this:

    plot.new()
    left <- 0
    right <- 1
    up <- 1
    down <- 0
    xx <- c(left, right, right, left)
    yy <- c(down, down, up, up)
    polygon(xx, yy, col = 'red', border = NA)

If one uses two arrays instead of two points per boundary, the plot can depict more complex areas. A pass of smooth.spline to soften the rough edges and the stream is ready. The graph is a bit weird-looking, but it gives the idea.

    n <- 100
    # x coordinates: left to right, then back
    xx <- c(1:n, n:1)
    y <- c(rnorm(n), rnorm(n))
    yy <- predict(smooth.spline(y, xx))$y
    plot(xx, yy, type = "n", bty = 'n',
         xlab = "Time", ylab = "Smoothed randomness")
    polygon(xx, yy, col = "gray", border = "red")
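A tidier band can be obtained by smoothing the two boundaries separately against the time index and then joining them. The sketch below is a variation on the idea, not the code that produced the graph in this post. The names `lower`, `upper` and `thickness`, the seed, and the floor of 0.1 on the thickness are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 100
x <- 1:n

# Smooth the lower boundary against the time index
lower <- predict(smooth.spline(x, rnorm(n)))$y
# Smooth a positive thickness and floor it at 0.1 so the band never collapses
thickness <- pmax(predict(smooth.spline(x, abs(rnorm(n)) + 1))$y, 0.1)
upper <- lower + thickness

xx <- c(x, rev(x))          # left to right, then back from right to left
yy <- c(lower, rev(upper))  # lower boundary, then the upper boundary reversed

plot(xx, yy, type = "n", bty = 'n',
     xlab = "Time", ylab = "Smoothed randomness")
polygon(xx, yy, col = "gray", border = "red")
```

Reversing the upper boundary matters: the polygon outline has to come back along the top in the same order in which the x coordinates return.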

To keep the data organized and simple to feed to polygon, I put the data into a matrix with twice as many columns as there are streams. Each pair of columns then contains the lower and upper boundaries of one stream of data. In particular, the columns for the first stream are 1) an array of zeros and 2) that column plus the values of the first ‘stream’ of data. For the second stream, column three holds the same values as column two, and column four holds the values of column three plus the values of the second ‘stream’ of data. This is easy to put in a loop that iterates over the number of streams of data.

    nStreams <- 4
    yy <- matrix(0, timePoints, (nStreams * 2))
    for (iStream in 1 : nStreams) {
      if (iStream == 1)
        # first stream: lower boundary stays at 0
        yy[, iStream * 2] <- predict(smooth.spline(values[, iStream]))$y
      else {
        # lower boundary is the upper boundary of the previous stream
        yy[, iStream * 2 - 1] <- yy[, (iStream - 1) * 2]
        yy[, iStream * 2] <- predict(smooth.spline(values[, iStream]))$y + yy[, iStream * 2 - 1]
      }
    }
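The same boundary matrix can be built without the if/else by smoothing every stream first and then taking running totals. This is a sketch of an alternative, not the code used in this post. The names `smoothed`, `upper`, `lower` and `bounds` are chosen for illustration, and the data are recreated so that the block runs on its own.

```r
timePoints <- 100
nStreams <- 4
set.seed(09022017)
# Same 10-column positive matrix as above; only the first 4 streams are used
values <- matrix(abs(rnorm(timePoints * 10)), timePoints, 10)

# Smooth every stream separately: one column per stream
smoothed <- sapply(1 : nStreams,
                   function(i) predict(smooth.spline(values[, i]))$y)

# Upper boundaries are the running totals of the smoothed streams
upper <- t(apply(smoothed, 1, cumsum))
# The lower boundary of a stream is the upper boundary of the stream below it
lower <- cbind(0, upper[, -nStreams, drop = FALSE])

# Interleave into the lower/upper column pairs expected by the plotting loop
bounds <- matrix(0, timePoints, nStreams * 2)
bounds[, seq(1, nStreams * 2, by = 2)] <- lower
bounds[, seq(2, nStreams * 2, by = 2)] <- upper
```

Odd columns of `bounds` hold the lower boundaries and even columns the upper ones, which is the same layout as the matrix produced by the loop.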

The resulting matrix can be plotted with a for loop choosing the correct upper and lower boundaries.

    x11()
    xx <- c(1:timePoints, timePoints:1)
    plot(xx, xx, type = "n", main = "Streamgraph",
         xlab = "Time", ylab = "Amplitude",
         ylim = range(yy), bty = 'n')
    for (iStream in 1 : nStreams) {
      y <- c(yy[, iStream * 2], rev(yy[, iStream * 2 - 1]))
      polygon(xx, y, col = iStream + 1, border = NA)
    }

… and trying it with actual data I leave for a follow-up!

# Making up for univariate [DAI IVb]

This post is an extension of this one, which was (supposed to be) the final post of the Coursera course ‘data analysis and interpretation’. This current post extends or complements the previous one because in that assignment I forgot to include univariate graphs in my plot. Since I only had a bivariate graph, the other reviewers failed my assignment. I was quite disappointed by their reaction, but I understood their motives. If univariate graphs get points and the absence thereof does not, I was rightly failed. Therefore, in this post I try to fix my previous mistake by including three univariate graphs. The conclusion one can gather from these graphs remains unchanged and one should Continue reading “Making up for univariate [DAI IVb]”

# Visualizing participants performance [wbwit III]

This is the third post on the development of a web-based word identification task. See this post for the implementation of the word identification task and this post for uploading the participants’ results to the server. This post describes how to plot the Continue reading “Visualizing participants performance [wbwit III]”

# Streamgraph visualization of global warming

Streamgraphs are very pretty!

Streamgraphs are a very catchy way to represent stacked area graphs. Streamgraphs are most commonly used to represent time series data. I encountered streamgraphs for the first time during a Coursera data visualization class and I immediately wanted to try to reproduce them. Continue reading “Streamgraph visualization of global warming”

# Citations Network

This post describes the visualisation of a social network I made for a Coursera course on Data Visualisation. For this specific assignment I opted for gathering data on my own rather than using the datasets provided by the course instructor. I wanted to gather the data myself to try to visualise ‘real’ data. By real data I mean data that I scrape from the web and visualise myself. Basically, by ‘real’ data I mean what other people call dirty data (i.e. data that has not been processed or polished before use). The question was also whether I could Continue reading “Citations Network”