We thought we would have some fun. Attached is a file showing the top 50,000 backlinks that we have to Google.com’s root domain. Feel free to play with this data. Here are some interesting facts about Google’s own backlink profile in Majestic SEO’s database.

Some observations about Google’s Backlinks

1:  All but 1695 were crawled by us in December or January. That’s about 97%.

2:  Many of Google’s strongest links by ACRank are from widgets – in particular those pulling in newsfeeds. For example, planet.wordpress.org uses Feedproxy to 301 redirect users to inner stories with links like: http://feedproxy.google.com/~r/weblogtoolscollection/UXMP/~3/k5JSiI6IzXI/ . These of course would not help Google, as they redirect straight out of Google. But this might set up the question: “If you have a link from feedproxy.google.com (something we can all do), is this worth more than a link directly within your site?” Majestic SEO is not the place to say what that means in algorithmic terms – but may be the best place to start if you want to set up some experiments of your own.

3:  Cnet, WashingtonPost and Chicago Tribune are all linking to Google from their home page (or at least, they were when we checked).

4:  If you take the file and start to analyze it, you will see that one page can typically link many times to Google in many different ways (and to many different subdomains). This demonstrates the need for care when deciding “which links count” (whatever you choose to mean for the word “count”.)

5: I was surprised to see “mysimon.com” so near to the top of the list. I had never heard of “mysimon”.
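If you want to try observation 4 for yourself, here is a rough Python sketch that groups the rows by linking page and counts how many links, distinct target URLs and distinct Google hostnames each page contributes. The column names "SourceURL" and "TargetURL" are assumptions made for illustration, not the confirmed header of the attached file, so check the first line of the CSV and adjust them. The helper names are hypothetical too.

```python
import csv
import gzip
from collections import defaultdict
from urllib.parse import urlsplit


def links_per_source(rows, source_col="SourceURL", target_col="TargetURL"):
    """Summarise backlink rows per linking page.

    The default column names are assumed, not taken from the real file.
    Returns {source_url: {"links": int, "targets": set, "hosts": set}}.
    """
    summary = defaultdict(lambda: {"links": 0, "targets": set(), "hosts": set()})
    for row in rows:
        source = row[source_col].strip()
        target = row[target_col].strip()
        entry = summary[source]
        entry["links"] += 1                 # every row is one link
        entry["targets"].add(target)        # distinct URLs linked to
        # hostname is None when the target has no scheme; store "" then
        entry["hosts"].add(urlsplit(target).hostname or "")
    return summary


def read_backlinks(path):
    """Yield rows as dicts from a gzipped CSV that has a header line."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)


def busiest_sources(summary, limit=10):
    """Return the pages with the most links to the target, busiest first."""
    ranked = sorted(summary.items(), key=lambda item: item[1]["links"], reverse=True)
    return ranked[:limit]
```

To run it against the download, something like busiest_sources(links_per_source(read_backlinks("google_com_top50k_backlinks_Jan_2011.csv.gz"))) should work once the column names match; the ".gz" file name here is a guess at how the attachment is saved locally. Whether you then count each page once, once per target URL or once per subdomain is exactly the “which links count” decision.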

Getting this data within MajesticSEO is not hard. It is the equivalent of a “standard” report in the system – although the actual number of links you get will depend on your subscription level. So at £10 a month (about US$16) you would be able to get 10 such reports, but they would only have the top 5,000 backlinks for Google.com. You can make this report go a LOT further, though, by running the same report for the subdomain (WWW.google.com) or indeed the home page itself. Platinum subscriptions go to 20,000 depth, although you can get subscriptions that go 50,000 deep in the standard reports by agreement.

Why “Standard” is better than “Advanced”

Using the advanced report will, in theory, return ALL the links we have to Google.com. Now I am telling you here and now that you do not have a large enough subscription to run such a report, and to do so we would need to prepare our servers for the ensuing onslaught as we search through 25,492,660,101 links to 3,666,643,870 URLs on Google.com that we have indexed via 3,180,909 subdomains. That’s an awful lot of data! But that’s why we created the standard report in the first place. It’s SO much more efficient.

Now – this is 50,000 backlinks as we see them raw in our dataset, sorted by ACRank. Our definition of “strongest” is quite specific – so please don’t go saying “But I know of many stronger links than these”. Any data set of this size will have some unusual anomalies. Imagine the anomalies in the other 25 billion 442 odd million that we know of!


Attachment: google_com_top50k_backlinks_Jan_2011.csv (Gzip format).

