Four dimensions in two dimensions

scatterplot of reaction time differences

This scatterplot is one of the best data visualisations I have made. I like it because it packs a lot of information into a single figure: it displays four-dimensional data (i.e., four variables) in a two-dimensional scatterplot. I made the first implementation in R, but because I wanted to add interactivity I switched to d3.js. Below I describe the choices I made to display the information and how I coded them in d3.js.

First, a short description of the dataset. It contains reaction times (RTs) from a lexical decision task. The data displayed vary along four variables: 1) participant group (categorical: normal-hearing participants vs. cochlear implant users), 2) the word’s neighbourhood density (continuous), 3) the word’s frequency (continuous), 4) type of stimulus (categorical: word vs. non-word).

The first variable is represented as the difference in RT between the group of normal-hearing individuals and the group of cochlear implant users. I decided to use the subtracted values to 1) focus on the difference between the two groups and 2) squeeze the two group levels into a single value. The RT difference is represented by the size of the dots, and its values are in the “diff” column of the tsv file accompanying the javascript code on github. The second variable is neighbourhood density, which reflects the number of words that can be created by adding, deleting, or substituting a single sound in a given word. Neighbourhood density is represented on the y-axis, and its scores are in the column “neighDens” of the tsv file. The third variable is word frequency, which is the number of times a word is heard in a language. Word frequency is log-transformed and represented on the x-axis; its values are in the column “logFreq” of the tsv file. The fourth variable is whether the stimulus presented was a word or a non-word. I used blue and orange to distinguish between words and non-words, respectively. Whether the stimulus is a word or a non-word is stored in the column “word” of the tsv file.

I will start describing the plot from the data import. This tutorial is a fantastic introduction to importing data with d3.js (I wish I had found it earlier). The settings I used to import the data are below:

data.forEach(function(d) {
	d.neighDens = +d.neighDens;
	d["logFreq"] = +d["logFreq"];
	d.diff = +d.diff * 1000;
	d.dotSize = Math.abs(d.diff);
});

The plus before each variable converts it from a string to a number (values read from a tsv file are strings by default). I multiply d.diff by 1000 to convert the time to milliseconds. The variable dotSize is the absolute value of the RT difference. I used absolute values for the dot’s size because I wanted the dot to reflect the magnitude of the difference; a negative value would mean a negative size, which is impossible. Moreover, using absolute values keeps the sizes of negative and positive differences comparable (e.g., -100 gets the same size as 100).
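
To make the conversion concrete, here is a minimal sketch in plain JavaScript that applies the same steps to two made-up rows. The rows and their values are hypothetical (they are not taken from the real tsv file), and the helper name coerceRow is mine. It only aims to show that the values arrive as strings and that a negative and a positive difference of the same magnitude end up with the same dotSize.

```javascript
// Hypothetical rows, shaped like what a tsv parser returns: every value is a string.
var rows = [
  { word: "exampleA", neighDens: "12", logFreq: "2.5", diff: "0.1" },
  { word: "exampleB", neighDens: "3", logFreq: "1.25", diff: "-0.1" }
];

// Same steps as the forEach above, wrapped in a helper (the name is an assumption).
function coerceRow(d) {
  d.neighDens = +d.neighDens;    // "12" -> 12
  d.logFreq = +d.logFreq;        // "2.5" -> 2.5
  d.diff = +d.diff * 1000;       // seconds -> milliseconds
  d.dotSize = Math.abs(d.diff);  // magnitude only, so the sign does not affect the size
  return d;
}

rows.forEach(coerceRow);
```

After this step both rows carry the same dotSize, while diff keeps its sign and can still be shown in the tooltip.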

The x and y scales follow the general pattern of d3, i.e. d3.scaleLinear() with the domain() defined from the minimum and maximum values of the data (padded by 1 on each side) and the width and height used as the range(). I also used d3.scaleLinear() for the size of the dots, with the domain() defined from the minimum and maximum of the absolute RT difference and the range() spanning from 1 to 25. The values 1 and 25 are arbitrary, but they define the radius of the dot (i.e., the attribute r in the svg circle element), and I thought circles with radii from 1 to 25 would represent the range of differences nicely:

svg.selectAll(".dot")
	.data(data)
	.enter().append("circle")
	.attr("class", "dot")
	.attr("r", sizeMap)
	.attr("cx", xMap)
	.attr("cy", yMap)

The scale for the colours is instead ordinal: I used the predefined d3.schemeCategory10, which gives a range of 10 colours. Below is the code defining the scales:

var xValue = function(d) { return d["logFreq"];}, 
    xScale = d3.scaleLinear().range([0, width]),
    xMap = function(d) { return xScale(xValue(d));};

var yValue = function(d) { return d.neighDens;},
    yScale = d3.scaleLinear().range([height, 0]),
    yMap = function(d) { return yScale(yValue(d));};

var cValue = function(d) { return d.words;},
    color = d3.scaleOrdinal(d3.schemeCategory10);

var sizeValue = function(d) { return d.dotSize;},
    sizeScale = d3.scaleLinear().range([1, 25]),
    sizeMap = function(d) { return sizeScale(sizeValue(d));};

// and the domains defined after reading in the data
xScale.domain([d3.min(data, xValue)-1, d3.max(data, xValue)+1]);
yScale.domain([d3.min(data, yValue)-1, d3.max(data, yValue)+1]);
sizeScale.domain([d3.min(data, sizeValue), d3.max(data, sizeValue)]);
// the default declaration of color.domain is okay.		
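
For readers new to d3 scales, the sketch below shows in plain JavaScript what a linear scale computes: it maps a value from the domain interval onto the range interval proportionally. The function linearScale is a hypothetical stand-in written for illustration, not d3's implementation (d3.scaleLinear() offers much more), and the domain values are made up.

```javascript
// Hypothetical stand-in for a linear scale: maps [d0, d1] onto [r0, r1] proportionally.
function linearScale(domain, range) {
  var d0 = domain[0], d1 = domain[1];
  var r0 = range[0], r1 = range[1];
  return function (value) {
    var t = (value - d0) / (d1 - d0); // position of value within the domain, from 0 to 1
    return r0 + t * (r1 - r0);        // same relative position within the range
  };
}

// Made-up domain: absolute RT differences between 0 and 200 ms, radii from 1 to 25.
var sizeSketch = linearScale([0, 200], [1, 25]);
var smallest = sizeSketch(0);   // 1
var middle = sizeSketch(100);   // 13
var largest = sizeSketch(200);  // 25

// A reversed range flips the direction, as with the y scale and [height, 0].
var ySketch = linearScale([0, 10], [400, 0]);
var topPixel = ySketch(10);     // 0, i.e. the top of the plot area
```

The reversed range matters because the svg y coordinate grows downwards: with [height, 0], large neighbourhood densities end up at the top of the plot.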

All the other code to produce the scatterplot is unsophisticated d3.js. One last trick concerns the colours: some dots overlap, which would make them impossible to distinguish, so I added transparency to the dots. Transparency is set through the attribute ‘fill-opacity’, which takes a value between 0 (fully transparent) and 1 (fully opaque).

// append to the circle>dot object
	.attr("fill-opacity", 0.75)

I included a legend to help decipher the colour code. The code for the legend is plain d3.js: it draws a rectangle with some text inside a ‘transform translate’ container, which places it at a given position in the plot.

Finally, I included interactivity. To help interpret the data I added a function that displays information about a given dot when the mouse hovers over it. The information displayed is the name of the word/non-word, the RT difference, the neighbourhood density and the word frequency. This information makes the graph rich, but since it is displayed only when the mouse hovers over a dot it does not clutter the visual display, which keeps the plot simple. Below is the code for the display of the information on each dot:

// initial declaration
var tooltip = d3.select("body").append("div")
    .attr("class", "tooltip")
    .style("opacity", 0);

// appended to the 'circle' svg object
// (d3.event exists in d3 v4 and v5; from v6 the event is passed to the listener instead)
    .on("mouseover", function(d) {
        tooltip.transition()
            .duration(200)
            .style("opacity", .9);
        tooltip.html(d["word"] + " " + d["diff"] + " (ms)" + "<br/>" +
                     "(" + xValue(d) + ", " + yValue(d) + ")")
            .style("left", (d3.event.pageX + 5) + "px")
            .style("top", (d3.event.pageY - 28) + "px");
    })
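
The tooltip text can also be built by a small helper, which makes it easy to round the numbers before showing them. This is only a sketch under assumptions: the function name tooltipHtml, the rounding (whole milliseconds, two decimals for the log frequency) and the example row are mine, not part of the original code.

```javascript
// Hypothetical helper: builds the tooltip markup for one data row.
function tooltipHtml(d) {
  var rt = Math.round(d.diff);      // RT difference in whole milliseconds
  var freq = d.logFreq.toFixed(2);  // log frequency (x value), two decimals
  var dens = d.neighDens;           // neighbourhood density (y value)
  return d.word + " " + rt + " (ms)" + "<br/>" +
         "(" + freq + ", " + dens + ")";
}

// Made-up row, already converted to numbers as in the import step.
var exampleRow = { word: "exampleA", diff: -123.4, logFreq: 2.5, neighDens: 12 };
var html = tooltipHtml(exampleRow);
// html is 'exampleA -123 (ms)<br/>(2.50, 12)'
```

Inside the mouseover handler, tooltip.html(tooltipHtml(d)) could then replace the inline string concatenation.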

Here it is: a simple graph with plenty of information. I like it a lot. So that you can recreate the plot, I uploaded the code and data to github, and the interactive version of the plot is on my webpage.
