This post is an extension of this one, which was (supposed to be) the final post of the Coursera course ‘Data Analysis and Interpretation’. The current post complements the previous one because in that assignment I forgot to include univariate graphs in my plot. Since I only had a bivariate graph, the other reviewers failed my assignment. I was quite disappointed by their reaction, but I understood their motives. If univariate graphs earn points and their absence does not, I was rightly failed. Therefore, in this post I try to fix my previous mistake by including three univariate graphs. The conclusion one can draw from these graphs remains unchanged, and one should read the previous post for details (apologies for the redirection). A very short summary is: 1) more women than men completed the questionnaire, 2) no gender differences were apparent.
Below is the code I used to generate the plot above. For more insight into the data and analysis, click here.
import numpy
import matplotlib.pyplot as plt

# data, df, meansMen and meansWomen are defined in the previous post

# count the number of honest people per gender (all participants minus cheaters)
nnCheaters = data['sex'].value_counts() - data['sex'][df['subIdx'].unique()].value_counts()

# specify figure and plotting areas
fig = plt.figure(figsize=(6,4))
fig.subplots_adjust(bottom=0.025, left=0.025, top=0.975, right=0.975)
# each tuple is (number of rows, number of columns, plot number)
X = [ (2,3,1), (2,3,2), (2,3,3), (2,1,2) ]

# subplot one
# (assumption: the subplots use the tuples in the order they are listed)
sub1 = fig.add_subplot(*X[0])
nPercent = 2
index = numpy.arange(nPercent)
barWidth = 0.35
opacity = 0.4
rects1 = sub1.bar(index, meansMen, barWidth, alpha=opacity, color='b', label='Men')
rects2 = sub1.bar(index + barWidth, meansWomen, barWidth, alpha=opacity, color='r', label='Women')
sub1.set_xlabel('Type of proportion')
sub1.set_ylabel('Proportion')
sub1.set_title('Scores by Proportion Type and Gender')
sub1.set_xticks(index + barWidth)
sub1.set_xticklabels(('relative to gr. cheaters', 'relative to ALL participants'))
sub1.legend()

# subplot two
sub2 = fig.add_subplot(*X[1])
sub2.bar(index+.1, data['sex'].value_counts(), .8)
sub2.set_xlabel('Gender')
sub2.set_ylabel('Counts')
sub2.set_xticks(index + .5)
sub2.set_xticklabels(('Female','Male'))

# subplot three
sub3 = fig.add_subplot(*X[2])
sub3.bar(index+.1, (nnCheaters['f'], data['sex'][df['subIdx'].unique()].value_counts()['f']), .8)
sub3.set_xlabel('Females')
sub3.set_ylabel('Counts')
sub3.set_xticks(index + .5)
sub3.set_xticklabels(('Honest','Cheater'))

# subplot four
sub4 = fig.add_subplot(*X[3])
sub4.bar(index+.1, (nnCheaters['m'], data['sex'][df['subIdx'].unique()].value_counts()['m']), .8)
sub4.set_xlabel('Males')
sub4.set_ylabel('Counts')
sub4.set_xticks(index + .5)
sub4.set_xticklabels(('Honest','Cheater'))

plt.tight_layout()
plt.show()
I found the Python code to create the subplots quite elaborate. Although matplotlib takes its inspiration from MATLAB, I must admit I found it a bit challenging to translate the code from MATLAB to Python. The allocation of the plots in particular took me 5 to 10 minutes to understand (I don’t think it took me more than a minute in R). When allocating the subplots, the line:
X = [ (2,3,1), (2,3,2), (2,3,3), (2,1,2) ]
is crucial. The first number in each tuple identifies the number of rows, so there are two rows of subplots. The second number indicates the number of columns of subplots within a row. I found this particularly tricky because the number differs between the first and the second row of subplots: in my case, the first row holds three subplots and the second row holds one. The third number indicates the plot number. However, the fourth tuple has the number two where I expected four, or maybe (4,6). The number two is correct because the tuple (2,1,2) asks for the second (i.e. lower) subplot in a grid with two rows of subplots in one column. Later I found out that (2,3,(4,6)) is also possible. This makes more sense to me because it continues the pattern of the previous three tuples and implies that one is plotting across the positions of subplots 4, 5, and 6. This notation is also more consistent with MATLAB and R, which I still prefer to Python.
Anyway, the goal of this post was to add the univariate graphs missing from my previous post, which investigated whether cheating is affected by gender differences. Read that post, it is fun!
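To make the tuple notation discussed above more concrete, here is a minimal sketch in plain Python that works out which part of the figure each (rows, columns, plot number) tuple refers to. The helper `subplot_region` is hypothetical and not part of matplotlib; it ignores margins and the spacing between subplots, so it only approximates what matplotlib actually draws.

```python
from fractions import Fraction


def subplot_region(nrows, ncols, index):
    """Return (left, bottom, width, height) as fractions of the figure.

    Hypothetical helper, not matplotlib's implementation: margins and
    spacing between subplots are ignored. `index` is 1-based and counts
    row by row from the top-left; it may be a single plot number or a
    (first, last) tuple spanning several cells.
    """
    first, last = index if isinstance(index, tuple) else (index, index)
    # Convert the 1-based plot numbers to 0-based (row, column) pairs.
    row1, col1 = divmod(first - 1, ncols)
    row2, col2 = divmod(last - 1, ncols)
    width = Fraction(col2 - col1 + 1, ncols)
    height = Fraction(row2 - row1 + 1, nrows)
    left = Fraction(col1, ncols)
    # Rows are counted from the top, while "bottom" is measured from below.
    bottom = 1 - Fraction(row2 + 1, nrows)
    return (left, bottom, width, height)


specs = [(2, 3, 1), (2, 3, 2), (2, 3, 3), (2, 1, 2), (2, 3, (4, 6))]
regions = {spec: subplot_region(*spec) for spec in specs}
for spec, region in regions.items():
    print(spec, "->", tuple(str(value) for value in region))
```

Under these simplifications, (2,1,2) and (2,3,(4,6)) describe the same region: the full width of the lower half of the figure. That is why either tuple can be used for the fourth subplot.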