At SMX Advanced, Matt Cutts told Danny Sullivan ON CAMERA that links would remain an important search ranking factor for some years to come. He chose the impassioned answer and tried to explain some of the issues surrounding the enormity of the Internet, but stopped short of giving the whole, geeky answer.
We thought we would expand on his comments. When he talks about “the enormity of it all” he leaves you hanging a bit, and we think we have some idea of what he means.
The truth is that to know WHICH page is the most relevant to return for any given user query, you not only need to be able to crawl and index all the content, you also need to know how to order the data set at your disposal. That turns out to be HARD, because most signals are disparate and not easy to compare side by side. Certainly, some signals can influence rankings significantly in certain situations. For example, if you are positively identified as being located in Charlotte, NC, USA, the chances are that when you type in “Chinese Takeaway” you are not looking for our favourite lunch haunt in Birmingham. However, location is not a universal signal and is not always appropriate.
As Matt Cutts says in the interview, paraphrasing that immensely influential theorist of the Infinite Improbability Drive, Douglas Adams: “Space is big, you have no idea how vastly, hugely, mind-bogglingly big space is”. He then runs on with Adams’ chemist line and makes it his own: “You may think a walk down to the chemist is long, but that’s peanuts to space, and the fact is the web is like that”.
He is right. He gives an idea of that scale when he says that the Library of Congress has 151.4 million items which, when scanned through OCR, come to around 235 terabytes of text. For the biggest library in the world, 235 terabytes is all it needs. Here at Majestic, in order to crawl the whole web and JUST record the link data, we have to measure our combined storage in petabytes, and that is before we include any storage used by the crawlers themselves around the world.
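To put those two figures side by side, here is a quick back-of-the-envelope check. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and simply divides the 235 TB quoted above by the 151.4 million items; it says nothing about how the Library of Congress actually stores its collection.

```python
# Back-of-the-envelope scale check, assuming decimal (SI) units.
TB = 10**12  # bytes in a terabyte
PB = 10**15  # bytes in a petabyte

library_bytes = 235 * TB   # OCR'd text size quoted in the interview
items = 151_400_000        # Library of Congress item count quoted in the interview

avg_item_bytes = library_bytes / items
print(f"{avg_item_bytes / 10**6:.2f} MB of text per item on average")
print(f"{library_bytes / PB:.3f} PB for the whole collection")
```

On those assumptions the whole collection averages roughly 1.55 MB of text per item and comes to less than a quarter of a single petabyte.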
Matt uses an example to make a point: once you start to use other signals, you are not going to see a holistic metric for a while yet. When the suggestion is that social is going to take over, or that links are dead, Matt points out that it is premature to reach that conclusion and suggests that links will remain a significant factor for some time. He notes that the share of links on the web that are nofollowed is a single-digit percentage. You can check that easily enough for yourself on Majestic SEO by looking at any website’s headline link stats. MattCutts.com, for example, has a higher nofollow share than most, at around 12%, while CNN.com’s is lower, at around 1%.
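The headline nofollow figure is just a ratio of nofollowed backlinks to all backlinks. A minimal sketch, using made-up link counts: the numbers below are hypothetical, chosen only to reproduce the rough percentages above, and are not Majestic SEO data for either site.

```python
def nofollow_share(nofollow_links: int, total_links: int) -> float:
    """Return the percentage of a site's backlinks marked rel="nofollow"."""
    if total_links <= 0:
        raise ValueError("total_links must be positive")
    return 100 * nofollow_links / total_links


# Hypothetical (nofollow, total) backlink counts, for illustration only.
examples = {
    "site-a (like MattCutts.com)": (120_000, 1_000_000),
    "site-b (like CNN.com)": (10_000, 1_000_000),
}

for site, (nofollow, total) in examples.items():
    share = nofollow_share(nofollow, total)
    label = "single digit" if share < 10 else "double digit"
    print(f"{site}: {share:.1f}% nofollow ({label})")
```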
There is another challenge, though: the ability to store, assimilate and, importantly, retrieve large amounts of disparate data points at scale. Let’s say Google does indeed manage to assimilate us all and we have a Google profile for every person on the planet. Call it 50 MB of data per person, which is pretty conservative if we include an avatar.
So that’s a pretty straightforward calculation: 50,000,000 bytes × 6,973,738,433 = 348,686,921,650,000,000 bytes, or roughly 348,687 terabytes (about 349 petabytes). Now that is doable, but the interrelationships between these 7 billion people are a wholly different scenario. When we looked at including internal links in our Flow Metrics, our link count increased by 800%. The relationship between, say, Dixon Jones and Matt Cutts is complex. Are they personal friends? Well, no. But have they played cards together? Well, yes. Do they personally have friends in common? Hell yes! Would Matt Cutts trust a post from Dixon Jones? Well… hmm… about as much as Dixon would trust one from Matt, maybe a bit less, I expect.
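The storage arithmetic above is easy to reproduce, and it is worth setting it next to a rough count of how many person-to-person relationships there could be. The sketch uses the population figure and 50 MB per profile from the text (decimal units assumed); the pair count n(n-1)/2 is our own illustration of the scale of the relationship problem, not a figure from the interview.

```python
WORLD_POPULATION = 6_973_738_433  # population figure used in the article
BYTES_PER_PROFILE = 50_000_000    # 50 MB per profile, decimal megabytes

# One profile per person: storage grows linearly with population.
profile_bytes = WORLD_POPULATION * BYTES_PER_PROFILE
print(f"{profile_bytes:,} bytes")         # 348,686,921,650,000,000 bytes
print(f"{profile_bytes // 10**12:,} TB")  # 348,686 TB (whole terabytes)

# Possible relationships (unordered pairs of people) grow quadratically: n(n-1)/2.
# Each pair could also carry several facets (friends? played cards? trust?).
possible_pairs = WORLD_POPULATION * (WORLD_POPULATION - 1) // 2
print(f"{possible_pairs:.3e} possible pairs")
```

Even before recording what any relationship means, that is roughly 2.4 × 10^19 possible pairs, or about 3.5 billion potential relationships for every person on the planet.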
So then (and here’s my point): when Dixon Jones searches the internet for “Douglas Adams Quotes”, there are a lot of them out there. Clearly Matt Cutts has some to hand (I doubt that, in context, it was accurate, but Matt, if it was, then the odds were EXACTLY a million to one that it was, as Terry Pratchett could tell you). But there are a lot of Douglas Adams quotes to be had: 30 million pages of them, according to the all-wise one. How are you going to order 30 million results about Douglas Adams quotes based on the views and opinions expressed by a few thousand friends, or friends of friends, in your Circle of Trust? How many of them have an opinion on the subject? And even if they did, I have as many friends who think Douglas Adams should be burnt at the stake as a heretic as have actually worked out the ultimate question. (The ultimate question needs you to read the books, but the answer is, as you may know, 42.)
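A rough way to see the problem: even under the generous assumption that every friend in your circle has an opinion, and each one endorses a different page, a few thousand opinions can only ever touch a tiny fraction of 30 million results. The 3,000 below is a hypothetical stand-in for “a few thousand friends”.

```python
def max_social_coverage(friends_with_opinion: int, candidate_pages: int) -> float:
    """Upper bound on the share of candidate pages that receive any friend signal.

    Generous assumption: every friend has an opinion and each endorses a
    different page, so no two opinions land on the same result.
    """
    if candidate_pages <= 0:
        raise ValueError("candidate_pages must be positive")
    return min(friends_with_opinion, candidate_pages) / candidate_pages


FRIENDS = 3_000       # hypothetical "few thousand" friends
RESULTS = 30_000_000  # result count quoted in the text

coverage = max_social_coverage(FRIENDS, RESULTS)
unscored = RESULTS - min(FRIENDS, RESULTS)
print(f"at most {coverage:.4%} of results get a social signal")
print(f"{unscored:,} results still need some other ordering signal")
```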
By contrast, links win as a ranking signal by virtue of their ubiquity and the relative ease of classifying and ranking them. A link pertains only to a relationship between two entities on the web, and this gives the link context. The number of links pointing into a URL, and the relative strength of those links, give the content itself strength in that context. In short, it is easier to retrieve this data FAST out of a set of 30 million candidates than it would be to order the views of a few thousand friends who may or may not have an opinion on the subject.
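As a toy illustration of why link data is quick to use at query time: once a per-URL score has been computed from inbound links ahead of time, picking the top results out of a large candidate set is a single pass with a bounded heap. The link list, URLs and integer weights below are hypothetical, and the scoring (a plain sum of inbound link weights) is a deliberate simplification, not Majestic’s Flow Metrics or Google’s ranking.

```python
import heapq
from collections import defaultdict

# Hypothetical link graph: (source URL, target URL, weight).
# The integer weight stands in for "link strength"; it is illustrative only.
LINKS = [
    ("a.example/1", "quotes.example/adams", 9),
    ("b.example/2", "quotes.example/adams", 4),
    ("c.example/3", "fan.example/hhgttg", 7),
    ("d.example/4", "blog.example/42", 2),
    ("e.example/5", "fan.example/hhgttg", 3),
]


def inbound_scores(links):
    """Precompute a score per target URL by summing its inbound link weights."""
    scores = defaultdict(int)
    for _source, target, weight in links:
        scores[target] += weight
    return scores


def top_k(candidates, scores, k):
    """Return the k candidates with the highest precomputed link score.

    URLs with no inbound links score 0. heapq.nlargest makes one pass over
    the candidates while holding at most k of them at a time.
    """
    return heapq.nlargest(k, candidates, key=lambda url: scores.get(url, 0))


scores = inbound_scores(LINKS)
candidates = ["blog.example/42", "new.example/x", "fan.example/hhgttg", "quotes.example/adams"]
print(top_k(candidates, scores, 2))  # ['quotes.example/adams', 'fan.example/hhgttg']
```

Because the expensive part (scoring the link graph) happens before any query arrives, the query-time work is the same single scan whether there are four candidates or 30 million.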