Torrents, Community and Cultural Diversity

Inspired by Philippe Aigrain's Sharing, I got curious about how different media and distribution methods affect the diversity of content. Section 3.2 of his book has an interesting summary of how p2p networks show a much higher diversity of attention than regular (commercial) distribution channels. That is, instead of everyone downloading the same top-5 chart hits or blockbusters, in p2p networks attention goes to a much broader variety of content. To give an example, it is claimed that for a certain "legal single downloading site" 5% of the works generate 90% of revenues. In contrast, on eDonkey the 5% most popular files are responsible for less than 35% of all downloads.

One of the odd conclusions, though, was that for bittorrent this diversity doesn't seem all that great. A study by Balazs-Lakatos on film sharing in Hungary (using bittorrent) showed the 5% most popular files got 58.8% of the attention. For music, a study by Page-Garland indicated as much as 76% of the attention going to the top 5%.

Some possible explanations given by Balazs-Lakatos include trackers forcing users to keep a 1:1 upload/download ratio and torrent sites generally discouraging users from offering their entire individual libraries. Oberholzer-Gee-Strumpf try to explain it by noting that files without seeders tend to get removed from the index sites, causing more popular torrents to get more attention.

Now I like both my torrents and a good bit of obscure film and music, so none of these arguments seem really convincing. Time to have a closer look at the attention diversity on different torrent sites. For now, I have roughly analysed three different sites (more are coming):

  • demonoid, one of the biggest torrent sites, offering pretty much everything (books, music, film, applications, you name it) except for porn.
  • coda.fm, a relatively small site offering music albums.
  • an underground torrent site specialised in archive material (books, music and film), which for obvious reasons will remain anonymous. Notable is their strict ratio policy: users that don't upload enough get banned very easily.

As demonoid was too big to analyse completely (or at least, that'd be a lot of work and take the fun out of all this), I took 10,000 torrents from each of the categories books, music and film, plus 10,000 uncategorised torrents to get a picture of the overall site.
Now let's have a look at how much popularity affects the number of downloads on the sites, looking at the behaviour of each site in total and of the different categories:

[Figure: Attention diversity on demonoid]

[Figure: Attention diversity on an underground site]

[Figure: Attention diversity on coda.fm]

That's an interesting picture. Overall, on demonoid the top 5% gets 64.7% of the attention. On the underground site it's only 34.2%, and on coda.fm it gets as low as 26.9%.
Now that last one (coda.fm) shouldn't be compared one to one with the others, which offer a lot more than just music. Let's look at the music sections: there the top 5% accounts for 54.2% (demonoid) and 26.1% (underground site).
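
For anyone who wants to reproduce this kind of figure on their own data, here is a minimal sketch of how a "top 5% share" can be computed from a list of per-torrent download counts. It is not the code used for this post; the function name `top_share` and the sample numbers are made up for illustration, and it assumes "attention" simply means the download count per torrent.

```python
def top_share(downloads, fraction=0.05):
    """Share of all downloads that goes to the most popular `fraction` of torrents."""
    counts = sorted(downloads, reverse=True)
    total = sum(counts)
    if not counts or total == 0:
        raise ValueError("need at least one torrent with a non-zero download count")
    # Number of torrents in the top slice; always keep at least one.
    k = max(1, round(len(counts) * fraction))
    return sum(counts[:k]) / total


# Made-up download counts for 20 torrents, purely for illustration.
example = [500, 120, 80, 60, 40] + [10] * 15
print(f"top 5% share: {top_share(example):.1%}")
```

With 20 torrents the top 5% is a single torrent, so the result is just the most popular torrent's downloads divided by the total. On a sample of 10,000 torrents the top slice would be the 500 most downloaded ones.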

So what does this tell us about the above-mentioned explanations? The theory that enforcing a good upload ratio has a negative effect on diversity seems completely falsified, given that the underground tracker enforces this very strictly, yet seems to have a level of diversity matching that found on eDonkey.
But what comes to light immediately when looking at the graphs is the difference in diversity between film, music and books. Furthermore, note the difference between a huge tracker like demonoid, which has much lower diversity than the relatively small coda.fm and the underground site.
To explain this, I wanted to have a look at the impact of the community around the sites on the level of diversity. After all, it's not the computers that make the network, but the people sitting behind them. Unfortunately (or maybe it's a good thing), a community is not easily quantifiable into raw data. As an experiment, I checked whether there was any relation between the number of comments on a particular torrent and its number of downloads. Admittedly, taking comments as an indicator of positive community engagement towards a particular torrent is disputable (if only for all the spam/fake reports, unrelated discussions, etc.), but it was worth a shot. Here are the results for the different trackers:

  • demonoid (uncategorised): a correlation of 1.30857336786747e-05 with a t-value of 0.00130824618373437 and 9995 degrees of freedom
  • demonoid movies: a correlation of 6.48935181356329e-06 with a t-value of 0.000648772927290355 and 9995 degrees of freedom
  • demonoid music: a correlation of 5.01183570244519e-05 with a t-value of 0.00499803415724177 and 9945 degrees of freedom
  • demonoid books: a correlation of 5.95491502467066e-05 with a t-value of 0.00595342612033258 and 9995 degrees of freedom
  • underground (uncategorised): a correlation of 0.00126709985798827 with a t-value of 0.378347919160155 and 89158 degrees of freedom
  • underground movies: a correlation of 0.00113434746983279 with a t-value of 0.273001109309952 and 57921 degrees of freedom
  • underground music: a correlation of 0.00949155933205367 with a t-value of 1.04653191278413 and 12156 degrees of freedom
  • underground books: a correlation of 0.00196267940269253 with a t-value of 0.138907507033697 and 5009 degrees of freedom
  • coda.fm: a correlation of 0.000214671008695305 with a t-value of 0.015221976301537 and 5028 degrees of freedom
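
The t-values listed above are consistent with the standard test for a Pearson correlation, t = r * sqrt(df / (1 - r^2)) with df = n - 2. The sketch below shows that calculation. It is an assumption that this is how the numbers were produced (the original code is not shown), and the function names `pearson_r` and `t_statistic` are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, df):
    """t-value for testing r against zero, with df = number of pairs - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


# Plugging in the underground music figures from the list above.
print(t_statistic(0.00949155933205367, 12156))
```

In practice `xs` would be the comment counts and `ys` the download counts per torrent, and `pearson_r(xs, ys)` would feed into `t_statistic` with `df = len(xs) - 2`.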

I found it interesting to see once again the difference in correlation between books, music and film, and between demonoid and the other trackers. Unfortunately, the correlations are pretty damn small, and with t-values this low none of them comes anywhere near statistical significance.
Still, it'd be worthwhile to explore further why certain media and sites differ so much in diversity. To be continued...
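
To put a number on how far these results are from significance, here is a rough sketch that turns the t-values from the list into approximate two-sided p-values. It assumes the t distribution can be treated as a standard normal, which is a reasonable approximation with thousands of degrees of freedom but still an approximation; the helper name `two_sided_p` is made up for illustration.

```python
import math

# t-values copied from the list of results, keyed by tracker and category.
T_VALUES = {
    "demonoid (uncategorised)": 0.00130824618373437,
    "demonoid movies": 0.000648772927290355,
    "demonoid music": 0.00499803415724177,
    "demonoid books": 0.00595342612033258,
    "underground (uncategorised)": 0.378347919160155,
    "underground movies": 0.273001109309952,
    "underground music": 1.04653191278413,
    "underground books": 0.138907507033697,
    "coda.fm": 0.015221976301537,
}


def two_sided_p(t):
    """Approximate two-sided p-value, treating t as a standard normal value."""
    return math.erfc(abs(t) / math.sqrt(2))


for label, t in T_VALUES.items():
    print(f"{label:28s} t = {t:8.5f}   p ~ {two_sided_p(t):.3f}")
```

Every p-value that comes out of this is well above the usual 0.05 threshold, which matches the conclusion that no relation between comments and downloads can be claimed from this data.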

PS: If anyone cares about the code and/or raw datasets, contact me by e-mail at groente [at] puscii [dot] nl