Who’s Most Famous?
A recent study by Young-Ho Eom of the University of Toulouse set out to determine the most influential person on Wikipedia. The result was rather surprising, and at odds with the prevailing view: it concluded that the Swedish botanist Carl Linnaeus was more influential than either Jesus or Hitler. This atypical assessment can be attributed to the approach employed in the study, namely an adaptation of Stanford University’s PageRank (PR) algorithm to calculate the number and value of incoming links to any given article.
What is PageRank?
PageRank is one of the methodologies that Google uses to determine the relevance or importance of a site. The PageRank metric was developed by Google’s co-founders, Larry Page and Sergey Brin, during their time at Stanford University. This ranking procedure has drawn a great deal of attention from researchers in various fields because of its importance in evaluating webpage performance. Most earlier approaches to ranking relied on either a subjective approach, based on expert survey metrics, or an objective approach, based on citation metrics. Both methodologies have their own advantages and disadvantages, and they are usually complementary.
PageRank does indeed provide a good approximation to the importance of a webpage. However, PageRank may not provide an accurate evaluation of new websites, many of which may contain relevant information, because they lack backlinks. Further, since PageRank does not analyse web page content, the inbound links to a particular page may come from pages on topics that are not pertinent to a given query. In simple terms, PageRank is a numeric value assigned to a page according to the number and weight of the links pointing to it, not according to what the page says. Thus, there is no guarantee that a site with a high PR will automatically acquire a high position in terms of relevance to a particular topic or query.
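To make the backlink point concrete, here is a minimal sketch of PageRank computed by power iteration. It is an illustration only, not Google's production method: the damping factor of 0.85 is the commonly quoted default, and the four page names and their links are invented for this example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        for source, targets in links.items():
            for target in targets:
                # Each page splits its rank equally among its outgoing links.
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy web: "new_site" links out, but nothing links to it.
links = {
    "hub": ["established", "blog"],
    "established": ["hub"],
    "blog": ["established", "hub"],
    "new_site": ["established"],
}

ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:12s} {ranks[page]:.4f}")
```

In this toy graph "new_site" never rises above the floor value of (1 - damping) / n = 0.0375, however good its content might be, because no other page links to it.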
Why PageRank went wrong in this study
Something unusual must have happened in the calculation of PageRank in this study, because the result showing a botanist as bigger than Jesus does not seem to hold merit. In particular, the integrity of a link-based algorithm depends, to a large extent, upon no one person or effect being able to unduly unbalance the data. In this case, it appears that the calculations were carried out entirely on pages within Wikipedia, and ignored external links pointing into the site. Whilst this is the only realistic way to do the work without using a global index like Google or Majestic, it demonstrates the need for global data sources when considering a subset or segment of the web universe.
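The effect of ignoring external links can be sketched with a toy graph. Everything below is hypothetical: the page names and link counts are invented purely to show the mechanism and are not taken from the study, from Wikipedia or from any index. The `pagerank` helper is a simple power-iteration approximation with an assumed damping factor of 0.85.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration over a dict of outgoing links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Pages without outgoing links share their rank with everyone.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Invented internal links: many species pages cite the botanist.
internal_links = {
    "wiki/Species_A": ["wiki/Linnaeus"],
    "wiki/Species_B": ["wiki/Linnaeus"],
    "wiki/Species_C": ["wiki/Linnaeus"],
    "wiki/History": ["wiki/Jesus"],
    "wiki/Linnaeus": [],
    "wiki/Jesus": [],
}

# Invented external links: the wider web mostly points at the other entry.
external_links = {f"ext/site_{i}": ["wiki/Jesus"] for i in range(1, 7)}
external_links["ext/site_7"] = ["wiki/Linnaeus"]

subset_ranks = pagerank(internal_links)
global_ranks = pagerank({**internal_links, **external_links})

for label, ranks in (("internal only", subset_ranks), ("whole web", global_ranks)):
    leader = max(("wiki/Jesus", "wiki/Linnaeus"), key=ranks.get)
    print(f"{label:14s} leader: {leader}")
```

With only the internal links, "wiki/Linnaeus" comes out ahead; once the invented external links are included, "wiki/Jesus" leads. The same algorithm gives a different answer depending on how much of the link graph it can see.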
Different algorithms that calculate both incoming and outgoing links can give rise to different effects. Further, the results can be influenced by the cultural and linguistic contexts within which these studies are undertaken. In addition, the constant evolution of Wikipedia’s content can also have a discernible effect upon the outcomes, and therefore upon the conclusion reached. Re-indexing by Google can also have an influence on the current PageRank of a particular site. In particular, PageRank does not provide any indication regarding the content or size of a page, the language it’s written in, or the text used in the anchor of a link.
Comparative Studies
In this article, we revisit the study of the most influential person on Wikipedia. We use two other comparative metrics, namely, our very own MajesticSEO Topical Trust Flow and MOZ’s opensiteexplorer.com. Majestic SEO’s data is developed from the ground up by crawling the entire web (not just Wikipedia) and applying its own proprietary metric instead of the PageRank algorithm. OpenSiteExplorer also uses its own metric, whose derivation is not fully public, but which is believed to be influenced in part by the search engine prominence of a URL and is therefore likely to correlate well with PageRank as calculated by Google on a worldwide index.
Both comparative methodologies return Jesus as the most influential person.
Figure 1 shows the results of the page-specific metrics as computed by MajesticSEO’s Site Comparator Tool for 24 June, 2014. The influence list for Wikipedia is, in order, Jesus, Hitler and Linnaeus, with Trust Flows of 56, 56 and 50 respectively. Indeed, across the metrics shown, the values for the Jesus entry greatly exceed those for the other two.
Figure 1: MajesticSEO Site Comparator Tool Statistics
The MOZ metrics also corroborate the MajesticSEO statistics for the same Wikipedia entries, as displayed by the Page Authority scores in Figure 2.
Figure 2: MOZ Page Specific Metrics
Next, we use MajesticSEO’s Site Explorer Tool to determine the individual Trust Flow (TF) and Citation Flow (CF) for each of the aforementioned Wikipedia entries. Figures 3, 4 and 5 provide details of the inbound link and site summary data for the Wikipedia entries referring to Jesus, Hitler and Linnaeus respectively. Again, the statistics support our ranking order of Jesus, Hitler and Linnaeus. Note the concentration of topics for each of these entries. The general topic “Society” seems to dominate the composition of the Topical Trust Flows for Jesus and Hitler, while “Science” leads that of Carl Linnaeus, which is not surprising, given that he was a botanist, physician, and zoologist.
Figure 3: Site Summary Data for Wikipedia Entry “Jesus”
Figure 4: Site Summary Data for Wikipedia Entry “Hitler”
Figure 5: Site Summary Data for Wikipedia Entry “Linnaeus”
Finally, we compare a composite list of the Topical Trust Flows for these Wikipedia entries, as displayed in Figure 6. Again, the MajesticSEO data gives the ranking as follows:
- Jesus has a TF of 56 and a CF of 55;
- Hitler has a TF of 56 and a CF of 54;
- Linnaeus has a TF of 50 and a CF of 50.
Figure 6: MajesticSEO Bulk Backlink Checker Results
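Since Jesus and Hitler share the same Trust Flow, one way to reproduce the order above is to sort on TF and use CF to break ties. That tie-break is an assumption made for this sketch, not a documented MajesticSEO rule; the scores themselves are the ones listed above.

```python
# (entry, Trust Flow, Citation Flow) as listed above.
scores = [
    ("Linnaeus", 50, 50),
    ("Hitler", 56, 54),
    ("Jesus", 56, 55),
]

# Assumption: rank by TF first, then use CF to separate equal TF values.
ranked = sorted(scores, key=lambda row: (row[1], row[2]), reverse=True)

for position, (name, tf, cf) in enumerate(ranked, start=1):
    print(f"{position}. {name}: TF={tf}, CF={cf}")
```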
Conclusions
This study provides evidence that MajesticSEO’s view of “importance”, based on understanding the whole universe of URLs rather than analysing just one site or subset such as Wikipedia, is a stronger methodology for determining the ranking of Wikipedia’s influence list.
I don’t think I’ve ever seen Jesus and Hitler used to show how good a product performs before now! When measuring inbound links, how is relevancy measured? I understand how trust, citation, authority etc can be measured, but not relevancy. And Sir Google informs us that this is just as important
July 7, 2014 at 2:21 pm
Hi Colin, we have given some more detail about our categorisation (Topical Trustflow) over here: http://blog.majesticseo.com/development/topicaltrustflow/ but I got more technical about it over on Google+ here: https://plus.google.com/+DixonJonesDotCom/posts/YSbwNXhQRpd
July 7, 2014 at 2:35 pm
Thanks Dixon. The accuracy isn’t perfect but it does give a really good indication. Another metric to use!
July 7, 2014 at 6:37 pm
Is it true that your trust flow metric is better than the page rank? so i’ve heard from a co worker. thx!
July 7, 2014 at 9:00 pm
That’s not what this study shows. This study suggests that using connections within an ecosystem is not as accurate as using the entire universe.
However, there are several advantages of Flow Metrics over PageRank:
* an extra degree of granularity
* more regularly updated (compared to Google’s calculation)
* available on demand
* available by topic
* accessible in bulk and by API
* available by keyword as well as by URL
* available at the domain level, not just at the page level.
July 7, 2014 at 10:18 pm
I used majesticSEO in my last organization and truly it is a really good and reliable tool for search engine ranking/analysis. I am not sure whether TF or PR is better, because we always see many websites with low PR or TF overtake high PR/TF sites. Do these all really matter today?
July 10, 2014 at 3:41 pm
I just tried majestic and it’s a great tool. As for your points above about PageRank and the other SEO discussion, it’s informative, keep it up. Thanks
July 14, 2014 at 12:50 am
Hello Dixon,
Today, using your tool, I came to know about this. What should I do if my links are coming from poor sites? I am noticing a sudden huge fall in my incoming traffic, probably due to this. Kindly guide.
July 30, 2014 at 1:36 pm
We do not offer one-to-one consultancy. Here are a few tips that may apply to you:
* Don’t use a free web hosting service if you want to be taken seriously
* Don’t use a free email address if you want to be taken seriously
* Don’t try to put your website in comments on blog posts about link building. I took the liberty of taking your blogspot link off your comment.
July 30, 2014 at 1:49 pm
Makes lots of sense. More evidence that link building is not dead. In fact, it’s even more important today than anything else. However, crappy links are not getting much value. Do not buy links or post links in link directories. Focus on good quality articles that add value.
August 20, 2014 at 9:10 pm
I think that’s right. There is so much more “intel” measurable in a link than most SEOs realize.
August 20, 2014 at 9:23 pm