I wrote this brief introductory post for my friend Simon to show how easy the transition from SPSS to R can be. For mediation analysis in particular the switch is very smooth: thanks to lavaan, only minimal R knowledge is required. Testing mediated effects in lavaan requires only the specification of the model; everything else is automated by the package. So, once the data are read in, running the test is trivial.
Continue reading “Multiple-mediator analysis with lavaan”
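The excerpt does not include the model itself. As a hedged illustration of what lavaan estimates in a parallel two-mediator model (X → M1, M2 → Y), here is a Python sketch that computes the same point estimates with ordinary least squares: the a paths come from regressing each mediator on X, the b paths and the direct effect c′ from regressing Y on both mediators and X, and each indirect effect is the product a·b. The function names, the simulated data and the path values are hypothetical. In lavaan the indirect effects would instead be defined in the model syntax with the `:=` operator, and bootstrap standard errors can be requested with `se = "bootstrap"`; this sketch returns point estimates only.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(y, predictors):
    """OLS coefficients [intercept, b_1, ..., b_k] from the normal equations X'X b = X'y."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def parallel_mediation(x, m1, m2, y):
    """Point estimates for X -> (M1, M2) -> Y with a direct path X -> Y (hypothetical helper)."""
    a1 = ols(m1, [x])[1]                      # path X -> M1
    a2 = ols(m2, [x])[1]                      # path X -> M2
    _, b1, b2, direct = ols(y, [m1, m2, x])   # paths M1 -> Y, M2 -> Y, X -> Y
    return {
        "ind1": a1 * b1,                      # indirect effect through M1
        "ind2": a2 * b2,                      # indirect effect through M2
        "direct": direct,
        "total": direct + a1 * b1 + a2 * b2,
    }


# Hypothetical simulated data with known paths: a1=0.5, a2=0.3, b1=0.4, b2=0.6, c'=0.2
rng = random.Random(1)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
m1 = [0.5 * xi + rng.gauss(0, 1) for xi in x]
m2 = [0.3 * xi + rng.gauss(0, 1) for xi in x]
y = [0.4 * p + 0.6 * q + 0.2 * xi + rng.gauss(0, 1) for xi, p, q in zip(x, m1, m2)]

est = parallel_mediation(x, m1, m2, y)
print({k: round(v, 3) for k, v in est.items()})
```

With least squares the total effect computed this way is exactly the slope of Y regressed on X alone, so the decomposition into direct and indirect parts is an algebraic identity rather than an approximation.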
This scatterplot is one of the best data visualisations I have made. I like it because it concentrates a lot of information into a single visualisation: it displays four-dimensional data (i.e., four variables) in a two-dimensional scatterplot. I made the first implementation in R, but because I wanted to add interactivity I switched to d3.js. Below I describe the choices I made to display the information and how I coded them in d3.js. Continue reading “Four dimensions in two dimensions”
Until recently I did not have a practical application in which to use streamgraphs. In fact, I still find the visualisation complex to understand, abstract, and a bit too artistic. While I recognise that the strength of streamgraphs is displaying all the time series’ values in one (possibly interactive) plot, the amount of data displayed is massive, with many streams and even more data points. Because of the amount of data displayed… Continue reading “Streamgraphs in base::R [e.II]”
This is the first of a series of four posts on producing a streamgraph in plain R code. Here I present a very simple R script that plots a streamgraph. In another post I made a streamgraph in d3.js, but I wanted to do the same in R, without depending on a web page or on additional libraries (e.g., the streamgraph htmlwidget is only a wrapper around d3 and does not always work smoothly).
Continue reading “Streamgraphs in base::R [e.I]”
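The excerpt does not show the stacking step. As a sketch of the geometry only (the post itself uses R), the Python function below stacks the series around a symmetric baseline: at every time point the first stream starts at minus half of the column total, so the whole stream is centred on zero. This "silhouette" offset is an assumption made here for simplicity; streamgraphs are often drawn with other baselines, such as one that minimises the wiggle of the layers, and drawing and smoothing the polygons is left out.

```python
def stream_layers(series):
    """Return, for each series, a list of (lower, upper) boundaries per time point.

    series: list of equal-length lists of non-negative values, one list per stream.
    Assumed baseline (symmetric "silhouette"): g0(t) = -0.5 * sum_i f_i(t).
    """
    n_times = len(series[0])
    layers = [[] for _ in series]
    for t in range(n_times):
        total = sum(s[t] for s in series)
        lower = -0.5 * total          # symmetric baseline centres the stream on zero
        for i, s in enumerate(series):
            upper = lower + s[t]      # each stream's thickness equals its value at t
            layers[i].append((lower, upper))
            lower = upper             # the next stream starts where this one ends
    return layers


# Usage: two hypothetical streams observed at two time points.
layers = stream_layers([[1, 2], [3, 4]])
print(layers)
# → [[(-2.0, -1.0), (-3.0, -1.0)], [(-1.0, 2.0), (-1.0, 3.0)]]
```

In base R, boundaries like these could then be drawn one stream at a time with `polygon()`, tracing the upper edge forward in time and the lower edge backward.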
This is the fourth and last assignment of the Machine Learning for Data Analysis course by Wesleyan University on Coursera. My assignment diverges quite a bit from the instructor’s approach because I wanted only three clusters to determine pump functionality (functional, functional needs repair, and… Continue reading “Clustering Pumps [mlw4]”
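The excerpt stops before describing the method. Assuming the clustering is k-means (the technique this course assignment covers) with k = 3, a minimal pure-Python version of Lloyd's algorithm looks like the sketch below. The deterministic farthest-point initialisation and the toy two-dimensional points are hypothetical choices made for illustration; the post itself works on the pump data.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, k=3, max_iter=100):
    """Lloyd's k-means (assumed method) with a deterministic farthest-point start."""
    # Start from the first point, then keep adding the point that is
    # farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(mean(members) if members else centroids[j])
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return labels, centroids


# Usage: three hypothetical, well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (0, 10), (1, 10), (0, 11)]
labels, centroids = kmeans(points, k=3)
print(labels)
# → [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The deterministic start keeps the result reproducible. With random initialisation k-means can converge to different local optima, which is why library implementations typically run several restarts and keep the best one.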
This is the third assignment of the Machine Learning for Data Analysis course by Wesleyan University on Coursera. I applied the least absolute shrinkage and selection operator (LASSO) to the DrivenData data set pumpItUp. LASSO performs variable selection by shrinking the coefficients toward zero and setting those of the ‘useless’ variables exactly to zero. Applying this method… Continue reading “Shrinking pumps? [mlw3]”
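To make the shrinkage concrete: LASSO minimises (1/2n)·‖y − Xβ‖² + λ·‖β‖₁, and it is the L1 penalty that lets a coefficient land exactly on zero. The sketch below solves this objective with cyclic coordinate descent and soft-thresholding, one standard solver. It assumes centred data with no intercept, and the penalty value and toy data are hypothetical; the post may have used a different solver or chosen λ by cross-validation.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return exactly 0.0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.

    Assumes the columns of X and y are centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_ss = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual that leaves feature j out.
            rho = sum(
                row[j] * (yi - sum(row[k] * beta[k] for k in range(p) if k != j))
                for row, yi in zip(X, y)
            ) / n
            beta[j] = soft_threshold(rho, lam) / col_ss[j]
    return beta


# Hypothetical toy data: two orthogonal, centred predictors; y depends strongly
# on x1 and only weakly on x2.
x1 = [1, -1, 1, -1]
x2 = [1, 1, -1, -1]
X = [[a, b] for a, b in zip(x1, x2)]
y = [2 * a + 0.3 * b for a, b in zip(x1, x2)]

print(lasso_cd(X, y, lam=0.0))  # plain least squares: about [2.0, 0.3]
print(lasso_cd(X, y, lam=0.5))  # x1 shrunk to about 1.5, x2 set exactly to 0.0
```

With λ = 0 the sketch reproduces least squares; raising λ to 0.5 pulls the x1 coefficient down and removes x2 entirely, which is the variable-selection behaviour LASSO is used for.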
The random forest algorithm is the topic of the second assignment of the Machine Learning for Data Analysis course by Wesleyan University on Coursera. This assignment extends the previous one: besides using a random forest instead of a decision tree, I included more variables than in the previous assignment. In this analysis I also included… Continue reading “The forest and the pump! [mlw2]”