Clustering Pumps [mlw4]

February 11, 2017 | paolotoffanin | machine learning, PumpItUp, random forest

This is the fourth and last assignment of Machine Learning for Data Analysis by Wesleyan University on Coursera. My assignment diverges quite a bit from the instructor's approach because I wanted only three clusters, matching the three levels of pump functionality (functional, functional needs repair, and …
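The excerpt does not name the clustering algorithm, so the following is only a minimal sketch assuming plain k-means (Lloyd's algorithm) with k fixed at 3, one cluster per functionality level. It is pure Python; the helper names and the toy 2-D points are hypothetical and are neither the PumpItUp features nor the post's code.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic initialisation: start from points[0], then keep adding
    the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k=3, max_iter=100):
    """Lloyd's k-means. Returns (centroids, labels), with labels[i] in range(k)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, labels


# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
centroids, labels = kmeans(points, k=3)
print(labels)  # [0, 0, 0, 2, 2, 2, 1, 1, 1]
```

Farthest-first initialisation keeps the sketch deterministic. On real data one would normally standardise the features first and use k-means++ initialisation, which is the default in scikit-learn's `KMeans`.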

Shrinking pumps? [mlw3]

February 8, 2017 | paolotoffanin | LASSO, machine learning, PumpItUp, python

This is the third assignment of Machine Learning for Data Analysis by Wesleyan University on Coursera. I applied the least absolute shrinkage and selection operator (LASSO) to the pumpItUp data set from DrivenData. LASSO performs variable selection by shrinking the coefficients of ‘useless’ variables toward zero, setting some of them exactly to zero. Applying this method …
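To make the shrinkage concrete, here is a minimal pure-Python sketch of LASSO fitted by cyclic coordinate descent. It assumes centred features and response (no intercept) and the objective (1/(2n))·||y − Xb||² + α·||b||₁, which is the scaling scikit-learn's `Lasso` uses. The function names and the tiny orthogonal data set are hypothetical; this is not the post's code or the pumpItUp data.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0): the LASSO shrinkage step."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso(X, y, alpha, n_sweeps=1000, tol=1e-10):
    """Coordinate-descent LASSO for centred data (no intercept).

    Minimises (1 / (2n)) * ||y - X b||^2 + alpha * ||b||_1,
    where X is a list of n rows with p features each.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # y - X b, with b = 0 initially
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Correlation of feature j with the partial residual (b_j added back).
            rho = sum(v * (r + v * b[j]) for v, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, alpha) / z
            delta = new_bj - b[j]
            if delta:
                # Keep the residual in sync with the updated coefficient.
                residual = [r - v * delta for v, r in zip(col, residual)]
                b[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


# Three orthogonal, centred features; y = 3 * x1 + 0.5 * x2 (x3 is useless).
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3.5, 2.5, -2.5, -3.5]
print(lasso(X, y, alpha=1.0))  # [2.0, 0.0, 0.0]
```

Because the soft-threshold step returns exactly 0.0 whenever a feature's correlation with the residual is smaller than alpha, the weakly related second feature is dropped at alpha = 1.0 even though its least-squares coefficient is 0.5; this is the "selection" in LASSO.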

The forest and the pump! [mlw2]

February 1, 2017 / January 30, 2017 | paolotoffanin | machine learning, PumpItUp, random forest

The random forest algorithm is the topic of the second assignment of Machine Learning for Data Analysis by Wesleyan University on Coursera. This assignment extends the previous one: besides using a random forest instead of a decision tree, I included more variables than in the previous assignment. In this analysis I also included …
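The excerpt contrasts a random forest with a single decision tree. As a hedged, pure-Python sketch of the idea (not the post's code), each tree below is trained on a bootstrap sample of the rows and may only split on a random subset of the features, and the forest predicts by majority vote. For brevity every tree is a depth-one stump chosen by Gini impurity; real implementations such as scikit-learn's `RandomForestClassifier` grow much deeper trees and resample features at every split. The helper names and toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels, default=None):
    """Most frequent label, or `default` for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, labels, features):
    """Depth-one tree: the (feature, threshold) split with the lowest
    size-weighted Gini impurity, searched only over `features`."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    _, f, t, left, right = best
    fallback = majority(labels)
    return f, t, majority(left, fallback), majority(right, fallback)


def fit_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    """Bagged stumps, each restricted to a random subset of the features."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap: n rows with replacement
        feats = rng.sample(range(p), max_features)  # random feature subset for this tree
        forest.append(
            fit_stump([rows[i] for i in sample], [labels[i] for i in sample], feats)
        )
    return forest


def predict(forest, row):
    """Majority vote over the stumps' predictions."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


rows = [(1, 1), (2, 2), (1, 2), (2, 1), (8, 8), (9, 9), (8, 9), (9, 8)]
labels = ["a"] * 4 + ["b"] * 4
forest = fit_forest(rows, labels)
print(predict(forest, (1, 1)), predict(forest, (9, 9)))  # a b
```

Bootstrapping and feature subsampling make the individual trees differ from one another, which is why their majority vote is usually more stable than any single tree.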

Pump it up with a decision tree [mlw1]

January 25, 2017 | paolotoffanin | data analysis, DrivenData, machine learning, PumpItUp, python

This post is about the first assignment of Machine Learning for Data Analysis by Wesleyan University on Coursera. Over the past month I have been trying to mine the dataset of the pumpItUp challenge on DrivenData. The challenge requires …

Making up for univariate [DAI IVb]

October 19, 2016 | paolotoffanin | cheating, data visualization, python

This post is an extension of this one, which was (supposed to be) the final post of the Coursera course ‘Data Analysis and Interpretation’. It complements the previous post because in that assignment I forgot to include univariate graphs: I only had a bivariate graph, so my peer reviewers failed the assignment. I was quite disappointed by their reaction, but I understood their motives: if univariate graphs earn points and their absence does not, I was rightly failed. Therefore, in this post I fix my previous mistake by including three univariate graphs. The conclusion one can draw from these graphs remains unchanged, and one should …

Visualizing participants’ performance [wbwit III]

October 16, 2016 / October 9, 2016 | paolotoffanin | d3js, javascript, participants performance, web-based lexical decision task, word-identification task

This is the third post on the development of a web-based word identification task. See this post for the implementation of the task and this post for uploading the participants’ results to the server. This post describes how to plot the …

Streamgraph visualization of global warming

October 4, 2016 / March 27, 2018 | paolotoffanin | d3.js, data visualization, global warming, interactive visualization, streamgraph

Streamgraph of global warming

Streamgraphs are very pretty!

Streamgraphs are an eye-catching variant of stacked area graphs and are most commonly used to represent time series data. I first encountered streamgraphs during a Coursera data visualization class and immediately wanted to reproduce them.
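Since a streamgraph is a stacked area graph whose baseline is shifted, the core computation is just cumulative sums plus an offset. The Python sketch below is only an illustration: the post itself is tagged d3.js, where `d3.stack()` combined with `d3.stackOffsetSilhouette` or `d3.stackOffsetWiggle` does this work, and the `stack_layers` helper and toy data here are hypothetical. The silhouette offset shown is the simplest centred baseline; streamgraphs are often drawn with the "wiggle" offset instead, which also reduces how much the layers slope up and down.

```python
def stack_layers(series, offset="zero"):
    """Lower/upper boundaries of every layer of a stacked area chart.

    series: list of layers, each a list of non-negative values per time step.
    offset="zero": ordinary stacked area chart with a flat baseline at 0.
    offset="silhouette": baseline at -total/2, so the stream is centred on 0.
    """
    n_steps = len(series[0])
    totals = [sum(layer[t] for layer in series) for t in range(n_steps)]
    if offset == "zero":
        baseline = [0.0] * n_steps
    elif offset == "silhouette":
        baseline = [-total / 2 for total in totals]
    else:
        raise ValueError(f"unknown offset: {offset!r}")
    bands = []
    lower = baseline
    for layer in series:
        upper = [lo + value for lo, value in zip(lower, layer)]
        bands.append((lower, upper))
        lower = upper  # the next layer sits on top of this one
    return bands


series = [[1, 2, 3], [3, 2, 1]]  # two toy layers over three time steps
for lower, upper in stack_layers(series, offset="silhouette"):
    print(lower, upper)
# [-2.0, -2.0, -2.0] [-1.0, 0.0, 1.0]
# [-1.0, 0.0, 1.0] [2.0, 2.0, 2.0]
```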

Citations Network

August 8, 2016 / August 24, 2017 | paolotoffanin | iGraph, python, Social network analysis

This post describes the visualisation of a social network that I made for a Coursera course on Data Visualisation. For this assignment I chose to gather data on my own rather than use the datasets provided by the course instructor, because I wanted to visualise ‘real’ data: data that I scrape from the web myself, or what other people call dirty data (i.e., data that has not been processed or polished before use). The question was also whether I could …
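The excerpt does not show how the scraped data were turned into a network, so the sketch below is only an illustration, assuming the scrape yields (citing, cited) name pairs. It builds a directed graph as an adjacency dictionary, applies a few hypothetical clean-up rules of the kind dirty data needs (trimming and lower-casing names, skipping blank entries and self-citations, dropping duplicate edges) and counts citations as in-degree. The names and rules are illustrative, not the post's. With python-igraph, a cleaned edge list like this can be loaded via `Graph.TupleList(edges, directed=True)`.

```python
from collections import Counter, defaultdict


def build_citation_graph(pairs):
    """Directed graph {citing: {cited, ...}} from raw (citing, cited) pairs."""
    graph = defaultdict(set)
    for citing, cited in pairs:
        # Hypothetical clean-up rules for scraped ('dirty') names.
        src, dst = citing.strip().lower(), cited.strip().lower()
        if src and dst and src != dst:  # skip blanks and self-citations
            graph[src].add(dst)  # a set silently drops duplicate edges
    return graph


def citation_counts(graph):
    """In-degree of every cited node, i.e. how many distinct nodes cite it."""
    counts = Counter()
    for cited in graph.values():
        counts.update(cited)
    return counts


raw = [("Smith", "Jones"), ("smith ", "Jones"), ("Lee", "Jones"),
       ("Lee", "Smith"), ("Jones", "Jones"), ("Kim", "")]
graph = build_citation_graph(raw)
print(citation_counts(graph).most_common())  # [('jones', 2), ('smith', 1)]
```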

Do men cheat more than women? [DAI IV]

August 3, 2016 | paolotoffanin | cheating, discrimination and identification, gender differences, on-line questionnaire, python

First of all, let me make clear that this post is about identifying cheaters who fill in questionnaires with fictitious answers. It does not describe how to determine whether your (or your friend’s) lover is cheating on you (or on your friend). Cheater identification will not work with the method I describe below unless …

Cheat Hunt [DAI III]

July 30, 2016 / October 18, 2016 | paolotoffanin | data analysis and visualization, find cheaters, python

The title of this post is inspired by the movie Witch Hunt, which I have not seen, but I do like the sound of its title. I have changed the dataset I am exploring for the Data Management and Visualization course (if you need an introduction, check this previous post), because doing the assignments with an already clean dataset is not interesting. In fact, this week’s assignment is pure data management: 1) identifying and removing missing values, 2) computing new variables, etc. Since my dataset was already clean and had only three variables, there was nothing to do for the assignment. In the previous assignment I had already come up with a new variable, and I was not able to invent another one. But then I got a fantastic idea.
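The two data-management steps named in the assignment, removing missing values and computing a new variable, can be sketched in a few lines of plain Python. This is only an illustration: the column names, the missing-value codes in `MISSING` and the derived `over_40` variable are hypothetical, not the post's questionnaire data. In pandas the same steps would typically be `df.dropna(subset=[...])` followed by assigning a new column.

```python
MISSING = {"", "NA", "N/A", None}  # hypothetical missing-value codes


def drop_missing(records, columns):
    """Keep only records that have a usable value in every listed column."""
    return [r for r in records if all(r.get(c) not in MISSING for c in columns)]


def add_variable(records, name, func):
    """Return copies of the records with a new column computed by func."""
    return [{**r, name: func(r)} for r in records]


rows = [
    {"id": 1, "age": "34", "score": "12"},
    {"id": 2, "age": "", "score": "9"},
    {"id": 3, "age": "NA", "score": "7"},
    {"id": 4, "age": "51", "score": "15"},
]
clean = drop_missing(rows, ["age", "score"])
derived = add_variable(clean, "over_40", lambda r: int(r["age"]) > 40)
print([(r["id"], r["over_40"]) for r in derived])  # [(1, False), (4, True)]
```

Returning new lists and dict copies, rather than editing the rows in place, keeps the raw data available for checking what each cleaning step removed.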
