Sub-Communities within #PPCChat

Posted on June 28, 2012

My last post about the PageRank of #PPCChat participants was popular and quite quick to put together. I’m not above whoring myself out for social media attention so I thought I’d see what else I could find in this data.

With a social network graph like the one I can generate for the #PPCChat community, one of the first things to look at is the sub-communities and clusters within the original graph.

For a directed graph (where the edges point from one vertex to another and don’t go both ways) we need to look at weakly connected components: groups of vertices that are linked to each other once you ignore the direction of the edges.
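To make the idea concrete, here is a minimal pure-Python sketch of finding weakly connected components. It is not the code used for this post; the function name and the sample mentions are made up for illustration, and it assumes the data is a list of (tweeter, mentioned) pairs.

```python
from collections import defaultdict

def weakly_connected_components(edges, vertices=()):
    """Group vertices into components, ignoring edge direction."""
    neighbours = defaultdict(set)
    for v in vertices:
        neighbours[v]  # register vertices that have no edges at all
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)  # treat every mention as two-way

    seen, components = set(), []
    for start in list(neighbours):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from start
        seen.add(start)
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            component.add(v)
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components

# Hypothetical mentions: (who tweeted, who was mentioned)
mentions = [("alice", "bob"), ("carol", "bob"), ("dave", "erin")]
components = weakly_connected_components(mentions, vertices=["frank"])
sizes = sorted(len(c) for c in components)
```

In this toy data "frank" ends up in a component of his own, "dave" and "erin" form a pair, and the other three are joined through "bob".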

The #PPCChat graph has 47 weakly connected components, meaning there are 47 subgraphs that aren’t connected to each other. Of these subgraphs, 38 contain only one vertex, meaning that person has never mentioned, or been mentioned by, anyone else in #PPCChat. Give them some love:

  • jstatad 
  • yettiebvxl 
  • chiragemadv 
  • xpctechnology 
  • hoppervsagm 
  • adwiserhq 
  • marabelafjpu 
  • amy_garden 
  • whitestick 
  • oskben 
  • brennanbrooks 
  • vir2biz 
  • timesecrets 
  • fisheyedave 
  • rogersikes 
  • teknicks 
  • mariskanrhob 
  • jererpr 
  • sterlinggreen 
  • steveoneclick 
  • tauseefazhar 
  • jezabelhv 
  • samowenppc 
  • peckgmj 
  • rodrigotqueiroz 
  • johnpauldesousa 
  • septembersot 
  • convertable 
  • jackportersmith 
  • wijnandvdvlies 
  • hbekxavier 
  • rodsblog 
  • joshkimber 
  • shazenkhan 
  • tommieplg 
  • samueldjames 
  • websterj 
  • reynaldaiqkh

A further seven components have only two vertices. These happy couples tweet at each other but aren’t mentioned by anyone else:

  • gladue and bcharlesworth
  • onwardatlanta and jenniferbrabson
  • microsoftsmb and farhiyo_hiray
  • mineral_deadsea and deal_rt
  • jmthefourth and linda_irvine
  • web2updates and webmarketingjo
  • stebppc and vickybrowne

Then there is a component with four vertices whom I have dubbed the FastWebMedia collective:

  • brionygunson, go_viva, fastwebmedia and thinkfirstweb

Everyone else (all of the 689 remaining participants) is lumped together in one massive group where everyone is joined in some way to everyone else. This is a large, tightly connected community.

The diameter of this core community is eight meaning that there are at most eight degrees of separation (NB eight is the maximum, the average is only 3.05) between people in this core community.

The two participants who are furthest apart are wtongen and thesearchagency. The connection between them goes from wtongen through mnsearch, ppcmemes, attacat_colleen, michellemsem, lukealley, jbguru and semwisdom before reaching thesearchagency. Hopefully these two can get to know each other a little better in the future.
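For anyone wanting to reproduce the diameter and average figures, the sketch below shows one way to do it with a breadth-first search from every vertex. It assumes edge direction is ignored (the post doesn’t say either way), and the function names and the four-person chain are purely illustrative.

```python
from collections import deque

def bfs_distances(neighbours, source):
    """Shortest hop count from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def diameter_and_average(neighbours):
    """Longest and mean shortest-path length over all connected pairs."""
    longest, total, pairs = 0, 0, 0
    for source in neighbours:
        for target, d in bfs_distances(neighbours, source).items():
            if target != source:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, total / pairs

# Hypothetical chain of four people: a - b - c - d
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
diameter, average = diameter_and_average(chain)
```

Here the two ends of the chain are three steps apart, so the diameter is three, while the average over all pairs is lower because most pairs are closer than that.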

Having eight degrees of separation between two participants makes me think that perhaps there are sub-communities within the hub.

There are many algorithms for identifying communities within a graph. I will use the Girvan-Newman algorithm because it is quite easy to understand and has been recommended to me by Christopher Berry.

The algorithm relies on the idea of “edge betweenness”. The betweenness of an edge is the number of shortest paths between pairs of vertices that run along that edge. An edge that links two communities will be on many such paths, so if we repeatedly remove the edge with the highest betweenness (recalculating betweenness after each removal) we will eventually split the graph into communities.
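The loop described above can be sketched in plain Python as follows. This is an illustrative version rather than the code behind the charts in this post: it treats the graph as undirected with no self-loops, uses a Brandes-style breadth-first search to score the edges, and the function names and the two-triangle example are assumptions of mine.

```python
from collections import deque

def edge_betweenness(neighbours):
    """Count the shortest paths that cross each undirected edge."""
    score = {frozenset((v, w)): 0.0 for v in neighbours for w in neighbours[v]}
    for source in neighbours:
        # Breadth-first search, counting shortest paths to each vertex
        dist, paths, parents = {source: 0}, {source: 1.0}, {source: []}
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbours[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    paths[w], parents[w] = 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    parents[w].append(v)
        # Walk back from the farthest vertices, sharing credit among parents
        credit = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in parents[w]:
                share = paths[v] / paths[w] * (1.0 + credit[w])
                score[frozenset((v, w))] += share
                credit[v] += share
    # Every pair of vertices was counted once from each end
    return {edge: s / 2 for edge, s in score.items()}

def count_components(neighbours):
    """Number of connected pieces in the graph."""
    seen, count = set(), 0
    for start in neighbours:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count

def girvan_newman_step(neighbours):
    """Remove highest-betweenness edges until one more piece splits off."""
    target = count_components(neighbours) + 1
    removed = []
    while count_components(neighbours) < target:
        scores = edge_betweenness(neighbours)  # recomputed after each removal
        if not scores:
            break  # no edges left to remove
        v, w = tuple(max(scores, key=scores.get))
        neighbours[v].discard(w)
        neighbours[w].discard(v)
        removed.append(frozenset((v, w)))
    return removed

# Hypothetical graph: two triangles joined by the single edge c - d
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
removed = girvan_newman_step(graph)
```

In the toy graph every shortest path between the two triangles has to cross the c - d edge, so that edge has the highest betweenness and is the first to go. Each removed edge corresponds to one iteration in the counts quoted in this post.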

We start out with a graph like this:

After 63 iterations we manage to split joesev and arpitsinghi out from the herd:

It isn’t until 229 iterations that we split out another two person community (although lots of single vertices have split in the meantime):

259 iterations:

434 iterations (each snapshot is taken whenever a community of size > 1 splits from the hub):

813 iterations:

I got bored looking for large sub-communities after 878 iterations:

It looks like my plan to steal Matt Umbro’s crown using a divide and conquer strategy is doomed to failure :-(

Running the algorithm until it finishes produces a tree diagram, of which a massive version is linked to below:

This chart allows you to see how close people are to each other. For example, mel66’s closest #PPCChat friend is michellemsem. Going up a level, they are joined by the pair of matt_umbro and realicity, then by chriskos, lukealley and bigalittlea (in that order).

Actionable social media data or just #PPCChat navel gazing?

Doing this kind of analysis for a brand can reveal sub-communities (this didn’t really happen for #PPCChat) who can then form the basis for some data-driven personas. It can also reveal the highly connected individuals who it is worth engaging with directly.

The tree diagram is another way to spot communities and also a way to see how close users are to each other.

What are your thoughts? Could these techniques make social marketing better?