Majestic’s Fresh Index currently boasts a whopping 354.72 billion unique URLs crawled. That is dramatic growth: the index is more than double the size it was in September 2016. It has grown steadily thanks to some technical wizardry. Here’s the obligatory screenshot.
Majestic Crawled Index
Before Becoming Too Excited
The magnitude debate (“mine is bigger than yours”) will probably rumble on forever, but it sometimes obscures the underlying data. Majestic can be proud, and is fortunate, to have one of the largest indexes of the web anywhere in the world, let alone among the Linkerati tools. Majestic acknowledges that size is not everything, however. The way link counts are calculated is surprisingly complicated, as this old article on Moz demonstrates. Is a Blogspot site a subdomain or a root domain? Is a 301 from page A to page B a single link, or should every link to page A now be added to the underlying numbers for page B?
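To make those two questions concrete, here is a rough Python sketch. It is not how Majestic, Moz or anyone else actually counts links. The function names, the `HOSTED_PLATFORMS` set, the two policy names and the example URLs are all invented for illustration, and the domain grouping is deliberately naive (it ignores suffixes such as co.uk). It only shows that the same crawl data produces different totals depending on the decisions taken.

```python
from collections import Counter

# Hypothetical policy knob: hosting platforms whose subdomains
# are treated as separate sites rather than one root domain.
HOSTED_PLATFORMS = {"blogspot.com"}


def site_key(host, split_hosted=True):
    """Group a hostname into a 'site'.

    Naive assumption: the last two labels form the root domain.
    Real tools need a public-suffix list to get this right.
    """
    labels = host.lower().split(".")
    root = ".".join(labels[-2:])
    if split_hosted and root in HOSTED_PLATFORMS and len(labels) > 2:
        # Treat e.g. myblog.blogspot.com as its own site.
        return ".".join(labels[-3:])
    return root


def count_links(links, redirects, policy="single"):
    """Count inbound links per target page.

    links:     iterable of (source_page, target_page) pairs
    redirects: dict mapping page A -> page B for each 301
    policy:    "single"       - the 301 itself is one link from A to B
               "pass_through" - every link to A is credited to B
    """
    counts = Counter()
    if policy == "single":
        for _source, target in links:
            counts[target] += 1
        for _page_a, page_b in redirects.items():
            counts[page_b] += 1
    elif policy == "pass_through":
        for _source, target in links:
            seen = set()  # guards against redirect loops
            while target in redirects and target not in seen:
                seen.add(target)
                target = redirects[target]
            counts[target] += 1
    else:
        raise ValueError(f"unknown policy: {policy}")
    return counts


links = [
    ("x.example/1", "a.example/old"),
    ("y.example/2", "a.example/old"),
    ("z.example/3", "b.example/new"),
]
redirects = {"a.example/old": "b.example/new"}

single = count_links(links, redirects, policy="single")
passed = count_links(links, redirects, policy="pass_through")
```

With this toy data, b.example/new ends up with 2 links under the "single" policy and 3 under "pass_through", while `site_key("myblog.blogspot.com")` returns either the full hostname or just blogspot.com depending on the `split_hosted` flag. Neither answer is wrong, which is exactly why two tools can report different numbers for the same page.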
Everything is a compromise at scale. Different technologies make different operational decisions on these questions. Consequently, Majestic certainly does not always report the highest link counts. However, Majestic makes very good use of its index, and currently Majestic Flow Metrics are hard to equal for quality. Majestic also remains focussed on its link graph to develop insights, and therefore avoids rank checking or estimating traffic levels. The upside of that approach for consumers is price. If you want great link intelligence, Majestic has it at the lowest entry price of any major data source. The best advice probably remains the same: multiple data sources for links in your SEO tool set are much better than just one.
Clearly none of your Link Intelligence providers are standing still, which is great for SEOs. It does now look like the different tools are beginning to develop different strategies, though. Points of Differentiation are incredibly important for a healthy market.
Some insights into the technology employed to achieve these results
Not being a TAGFEE company, I am not quite at liberty to give you as much detail as the Moz team would, but I can tell you that this did not happen overnight! We didn’t just add a bunch of servers and watch things double. Instead, we had to look at the entire data stack. Being a distributed crawler, Majestic can generally crawl fast, but there is a difference between crawling quickly and crawling intelligently. Our Flow Metrics certainly help us crawl more intelligently than we could without them, but there is a cost to that: the metrics need a huge amount of crunching to work out. You need to see the whole picture before you can start to do the maths, which means that much of the CPU work is not in the crawl itself, but in the analysis that prepares for the next crawl. We were finding so much spam at the bottom end of the funnel that it was significantly hampering the entire process. If you are an SEO in the West and think you understand spam, you probably haven’t looked East… The new TLD systems have created new nuances and challenges, which I imagine all search engines are having to grapple with.
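For readers who like to see the idea of crawling intelligently rather than just quickly, here is a minimal sketch of a score-ordered crawl queue in Python. To be clear, this is a generic textbook technique and not a description of Majestic’s crawler: the `CrawlFrontier` class, the `min_score` cutoff and the scores themselves are assumptions made up for this example, standing in for whatever quality metric was computed after the previous crawl.

```python
import heapq


class CrawlFrontier:
    """Toy crawl queue that hands out the highest-scoring URL first.

    'score' is a hypothetical quality value from an earlier analysis
    pass. URLs scoring below min_score are never queued, which is one
    simple way to keep low-quality pages from clogging the funnel.
    """

    def __init__(self, min_score=0.0):
        self.min_score = min_score
        self._heap = []      # entries are (-score, url); heapq is a min-heap
        self._seen = set()   # URLs that have already been queued

    def add(self, url, score):
        """Queue a URL; return True if accepted, False if skipped."""
        if url in self._seen or score < self.min_score:
            # Rejected low scorers are not marked as seen, so they can
            # be offered again later if their score improves.
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, (-score, url))
        return True

    def pop(self):
        """Return (url, score) for the best URL still waiting."""
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


frontier = CrawlFrontier(min_score=10)
frontier.add("https://a.example/", 50)
frontier.add("https://spam.example/", 2)    # below the cutoff, skipped
frontier.add("https://b.example/", 80)
next_url, next_score = frontier.pop()       # the 80-scoring URL comes out first
```

The catch described above still applies: the scores have to come from somewhere, and working them out for the whole graph is where most of the CPU time goes.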
Oh – and the Historic Index just went over the Trillion Mark!
As this post was being written, our Historic Index was also updated. Coincidentally, the number of crawled URLs in that index has gone over the trillion mark. Here are the stats for the Historic Index now, which you’ll notice have grown since this morning’s screenshot: