This post describes a visualisation of a social network that I made for a Coursera course on Data Visualisation. For this assignment I opted to gather the data on my own rather than use the datasets provided by the course instructor, because I wanted to try visualising ‘real’ data. By ‘real’ data I mean data that I scrape from the web myself, which is basically what other people call dirty data (i.e. data that has not been processed or polished before use). The question was also whether I could come up with some data that made sense to visualise as a (social) network. As a result, this project may have become a bit lame, since I chose to study my ‘citation network’: the connections among the authors of papers citing my work. Someone might call it a megalomaniac self-promoting project, but since there were not many citations others could just call it pathetic. Anyway, it was fun to do and that’s all I care about.
This is what I did to produce the animation. 1) I gathered the data from ISI Web of Knowledge. 2) I processed the data with Python to make sure that all the authors’ names were handled consistently (e.g. that jolicoeur is treated as the same author as joliceur, etc.). 3) I shortened the names to 5-letter strings. 4) I found the unique authors. 5) I connected the authors who publish together. I then wrote all this data into a graph file, which I read into R and analysed with the igraph package. I exported the plots and combined them into an animation with ImageMagick. The result is below.
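As an aside, steps 2 to 5 above could be sketched in Python roughly as follows. This is only a minimal illustration and not the actual script: the `ALIASES` table, the function names and the sample author lists are all made up for the example.

```python
from itertools import combinations

# Hypothetical alias table: maps known misspellings to one canonical spelling.
ALIASES = {"joliceur": "jolicoeur"}


def normalise(name):
    """Lower-case a surname, fix known misspellings, keep the first 5 letters."""
    key = name.strip().lower()
    key = ALIASES.get(key, key)
    return key[:5]


def coauthor_edges(papers):
    """Return (sorted unique authors, sorted undirected co-author edges)."""
    authors, edges = set(), set()
    for paper in papers:
        # Sorting the names makes (a, b) and (b, a) the same edge.
        names = sorted({normalise(n) for n in paper})
        authors.update(names)
        edges.update(combinations(names, 2))
    return sorted(authors), sorted(edges)


# Made-up author lists, one per citing paper (not the real data).
papers = [
    ["Jolicoeur", "Smith"],
    ["Joliceur", "Smith", "Tanaka"],
    ["Garcia"],
]
authors, edges = coauthor_edges(papers)
print(authors)
print(edges)
```

One caveat with a sketch like this: cutting names down to 5 letters can merge two different authors whose surnames start with the same letters, so the alias table and the truncation both deserve a manual check.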
The network is composed of links between authors who publish together. Each author is represented as a node (or vertex) in the graph. The link between two authors is represented as a line, which is called an edge in igraph terminology. Each node possesses a degree, which is the number of edges incident on that node. I chose to represent the network as undirected, meaning the edges/links have no direction.
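To make the terminology concrete, here is a small Python sketch of an undirected graph stored as an adjacency dictionary, with the degree of each node derived from it. The node names and edges are toy data, and `build_adjacency` is a name invented for this example.

```python
def build_adjacency(nodes, edges):
    """Store an undirected graph as {node: set of neighbours}."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        # Undirected: record the link in both directions.
        adj[a].add(b)
        adj[b].add(a)
    return adj


# Toy data, not the real citation network.
nodes = ["garci", "jolic", "smith", "tanak"]
edges = [("jolic", "smith"), ("jolic", "tanak"), ("smith", "tanak")]

adj = build_adjacency(nodes, edges)
# With at most one edge per pair, degree equals the number of neighbours.
degree = {n: len(neigh) for n, neigh in adj.items()}
print(degree)
```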
The animation shows how the network can be simplified. First, the network is displayed with every node given equal weight. Then I colour-coded all the clusters in the network, which resulted in 48 communities. I then removed the communities with only 1 node (nodes with a degree of 0), which could be compared to solitary persons (antisocial sounds harsh sometimes). I then highlighted the communities with more than 5 nodes and displayed their number of members. The two largest communities have 60 and 65 members respectively. I then displayed the labels of the nodes with more than 5 connections (degree > 5) and gradually thickened the links according to connection strength, from very weak to very strong.
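The filtering steps can be sketched in Python as below. This assumes that the clusters are simply connected components, which is a stand-in: which igraph clustering function produced the 48 communities is not stated here. The graph is toy data and the degree threshold is lowered to suit it.

```python
def connected_components(adj):
    """Group nodes into components with an iterative depth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, members = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            members.append(node)
            for neigh in adj[node]:
                if neigh not in seen:
                    seen.add(neigh)
                    stack.append(neigh)
        components.append(sorted(members))
    return components


# Toy undirected graph as an adjacency dict (made-up data).
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"e"}, "e": {"d"},
    "f": set(),  # isolated node, degree 0
}

components = connected_components(adj)
kept = [c for c in components if len(c) > 1]   # drop single-node communities
sizes = sorted(len(c) for c in kept)           # members per remaining community
# Toy threshold (degree >= 2); the post labels nodes with degree > 5.
labelled = sorted(n for n, neigh in adj.items() if len(neigh) >= 2)
print(kept, sizes, labelled)
```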
At this point I started to wonder who the central persons in this network could be. Social network analysis has various ways to identify these; betweenness and degree are the most common. A node’s degree is the simplest metric: it counts the nodes to which a node is linked, and it reflects a person’s popularity. Betweenness reflects the centrality of a person in the network: it measures how often a node lies on the shortest paths between other nodes, so a node with high betweenness connects many others and has a very far reach within the network. It could be interpreted as a sort of intermediary. I also included eigenvector centrality, a measure of a node’s influence on the network. Eigenvector centrality is interesting because, together with betweenness, it allows two kinds of key nodes to be identified: 1) gatekeepers, characterised by low eigenvector centrality and high betweenness centrality; 2) well-connected people, the heart of the core, who are characterised by low betweenness centrality and high eigenvector centrality. As visible in the plot of betweenness centrality as a function of eigenvector centrality, I identified two gatekeepers and two ‘related-to-power’ persons.
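For readers who want to see what these two measures compute, here is a plain-Python sketch. It is not the igraph code used for the plot: betweenness is computed with Brandes’ algorithm for unweighted graphs, and eigenvector centrality with a simple power iteration on the adjacency matrix plus the identity (a common trick to avoid oscillation), scaled so the top score is 1. The graph is a made-up example of two triangles joined through a bridge node `d`.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


def eigenvector_centrality(adj, iterations=200):
    """Power iteration on A + I, scaled so the largest score is 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        top = max(nxt.values())
        x = {v: val / top for v, val in nxt.items()}
    return x


# Made-up graph: triangles a-b-c and e-f-g, joined through the bridge node d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("e", "g"), ("f", "g")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

bet = betweenness(adj)
eig = eigenvector_centrality(adj)
for node in sorted(adj):
    print(node, len(adj[node]), bet[node], round(eig[node], 3))
```

In this toy graph the bridge node `d` has the highest betweenness even though its degree is only 2, while `c` and `e` score highest on eigenvector centrality. Plotting one measure against the other is what separates gatekeeper-like nodes from the well-connected core.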
In conclusion, it was a fun project. What I wanted to see in this visualisation (or maybe what my megalomaniac self hoped to see) was whether I was the centre of this network. As the last image of the animation clearly shows, I am not central to the network at all. However, I am part of one of the two big communities, even though I thought I would belong to the other one. I realise now that my sense of belonging was based on my theoretical background, by which I would identify more with the other big community. My belonging to this specific community is instead due to the fact that the network reflects something that could be described as a self-citation effect. Basically, the two most influential persons in this graph are influential because they cite their own work very often.
The code to generate this visualisation will be available on GitHub once I have polished it.