From time to time, we receive queries from users who are surprised when their latest page, blog comment, or other content does not appear in the Majestic Fresh Index. Whilst we are very flattered that our data is considered so comprehensive that people are surprised when they find something that isn’t there, we felt it worthwhile to explain some of the technical reasons why some backlinks are not reported in Majestic Site Explorer.

1 – Majestic SEO reports on the internet.



Majestic SEO is a powerful internet mapping tool with access to huge amounts of data. The creation of a “map” is key to Majestic’s offering: Majestic-12 Ltd, the company behind Majestic SEO, scans the web with crawling software, building up a map. It is this map that gets checked when someone enters a query in Majestic Site Explorer. You could say “Majestic SEO reports on the internet, but Majestic SEO is not the internet”. Just as a map of a major city need not show every newspaper seller to be a valid map, a map of the internet also needs to filter out some data to be effective.

Once the data is gathered, the map building process begins, and maps can take time to build. After years of research, Majestic now aims to update the Fresh Index daily. Our Historic Index contains so much data that its update process takes much longer. This production phase is a batch process, so it lengthens the time between a page being crawled and the finished map being published, which adds an inevitable delay in reporting backlinks.

If you want to learn more about how the crawl data gets transformed into our index, our recent infographic, “How Majestic SEO works”, is a good starting point.

2 – Majestic SEO is massive.

Here at Majestic, our engineers constantly try to find ways of building bigger and bigger maps of the internet in as quick a time as possible. This has enabled us to build up staggeringly huge maps in our Fresh and Historic indexes.



Majestic-12 has massive amounts of storage and substantial processing capability, which facilitate the creation of these indexes, but neither is infinite. Whilst this seems obvious, it has a knock-on effect that is critical to all web crawling. Consider for a moment the number of links that can exist on one web page. Then think about how many web pages can be hosted on one website, and how many websites can be hosted on a single web server: mind-blowing amounts of data! Add the effect of dynamic pages, some of which may be generated by buggy web servers creating near-infinite numbers of pages, and it becomes clear that there will be more links on the internet than are worth storing, even if it were possible to store them all.

Therefore, any search engine has to have some cut-off point for the number of pages that can be visited, and hence the number of links harvested, from any website.
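
To make the idea of a cut-off point concrete, here is a minimal Python sketch. It is an illustration only and not a description of how Majestic’s crawler works: the function names, the `max_pages` and `max_links_per_page` limits, and the “endless calendar” site are all made up for the example. It walks a site breadth-first and simply stops once a page budget is used up, which is one simple way of surviving a buggy site that generates pages forever.

```python
from collections import deque


def crawl_with_budget(links_of, start, max_pages, max_links_per_page):
    """Breadth-first walk of one site that stops at a fixed page budget.

    links_of is any callable that returns the outgoing links of a page.
    Both limits are hypothetical knobs chosen for this illustration.
    """
    visited = []      # pages actually fetched, in order
    harvested = []    # (source, target) link pairs that were kept
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        # Keep only the first few links on each page; the rest are dropped.
        for target in links_of(page)[:max_links_per_page]:
            harvested.append((page, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited, harvested


def endless_calendar(page):
    """Made-up buggy site: every calendar page links to ever later pages."""
    n = int(page.rsplit("=", 1)[1])
    return [f"/calendar?page={n + 1}", f"/calendar?page={n + 2}"]


visited, harvested = crawl_with_budget(
    endless_calendar, "/calendar?page=0", max_pages=5, max_links_per_page=1
)
print(len(visited), "pages visited,", len(harvested), "links harvested")
```

Without the budget the walk would never finish. With it, the crawl ends after five pages, and any link that sits beyond the budget is never seen at all.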

3 – Majestic tries to be a good netizen


At the time of writing, our Historic Index contains references to 3.7 trillion URLs. That’s 35 times more than our Fresh Index. Whilst Majestic has access to a massively powerful distributed crawling network, we have a duty to use that power responsibly. We continue to invest in striking the delicate balance between scanning websites to create fresh reports and not creating undue load on individual web servers. This means that as a matter of etiquette, regardless of logistics, we cannot recrawl every page in our Historic Index every month. Instead we prioritise our crawl, visiting some sites daily and others less frequently.

As a responsible netizen, Majestic also needs to ensure its crawlers obey robots.txt. We encourage webmasters to use the “crawl-delay” setting if they are concerned about the load on an economy or free hosting provider, which may be more sensitive to bandwidth and page requests than a more robust commercial offering.
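
As an illustration of how a polite crawler might read these settings, the sketch below uses Python’s standard `urllib.robotparser` module. The robots.txt content, the example.com URLs and the helper name `crawl_policy` are made up for this example. MJ12bot is the user-agent name commonly associated with Majestic’s crawler, but please check Majestic’s own documentation before relying on it. This is not Majestic’s implementation.

```python
import urllib.robotparser

# Made-up robots.txt: asks MJ12bot to wait 5 seconds between requests
# and to stay out of /private/; every other robot is unrestricted.
ROBOTS_TXT = """\
User-agent: MJ12bot
Crawl-delay: 5
Disallow: /private/

User-agent: *
Disallow:
"""


def crawl_policy(robots_txt, user_agent, url):
    """Return (allowed, crawl_delay) for one user agent and one URL."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


print(crawl_policy(ROBOTS_TXT, "MJ12bot", "https://example.com/blog/post"))
print(crawl_policy(ROBOTS_TXT, "MJ12bot", "https://example.com/private/page"))
```

A crawler that honours the file would skip the second URL entirely and pause for the stated delay between requests to the first site. If the page that links to you sits under a disallowed path, a well-behaved crawler will never see that link.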

Whilst some webmasters (and perhaps some systems administrators acting without the website owner’s knowledge) choose to block robots, the interlinking nature of the internet generally results in articles with content that appeals to real people getting picked up from a variety of sources. It can be helpful to view links as fluid in nature: the odd link, or to extend the analogy, the “odd drop”, matters little when compared to the stream of traffic-enabling backlinks which good content attracts.

4 – All crawlers are different



Some time ago, we attempted to address the question “Why do search engines and backlinks providers show different link counts for the same site”. The brief answer is that most backlink intelligence providers generate their internet map using different techniques. On the most popular websites, there is often a degree of agreement, but as sites become less well linked to, perhaps because they cater for a niche audience or are very new, the relative priority associated with the site will vary between crawlers. This potentially lowers the visit rate and makes the chance of inclusion in search engine data more volatile.
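
The effect can be sketched in a few lines of Python. Everything here is hypothetical: the site names, the figures, and both scoring rules are invented for illustration and do not describe how Majestic or any other provider actually ranks sites. The point is only that two crawlers with the same limited budget but different priorities can agree on the biggest sites and still disagree about the smaller ones.

```python
import heapq

# Made-up sites: (referring domains, days since the site was first seen)
SITES = {
    "big-news.example": (50000, 4000),
    "niche-hobby.example": (40, 900),
    "brand-new.example": (3, 2),
    "old-forum.example": (800, 3000),
}


def by_link_count(ref_domains, age_days):
    """Hypothetical crawler A: the most linked-to sites come first."""
    return ref_domains


def by_freshness(ref_domains, age_days):
    """Hypothetical crawler B: link count is discounted by the site's age."""
    return ref_domains / (1 + age_days)


def crawl_order(sites, score, budget):
    """Pick the `budget` highest-scoring sites, best first."""
    return heapq.nlargest(budget, sites, key=lambda name: score(*sites[name]))


print(crawl_order(SITES, by_link_count, 2))
print(crawl_order(SITES, by_freshness, 2))
```

Both crawlers visit the large news site first, but with room for only two visits they differ on the second choice, so their indexes would report different backlinks for the same period.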

In conclusion

We have seen that for any index of the internet, be it a full text search engine or a specialist search engine like Majestic SEO, it is inevitable that different websites will need to be assigned different crawling priorities; that is, the frequency of visits will vary between websites. For a web page or link to be detected and built into an index, certain requirements must be met. These requirements vary between web crawlers, leading to differences in which pages and links are present in different link indexes.

If you have noticed a link to your site on a third party site, but have not seen the link reflected in our Fresh Index, the first question to investigate is “is the source of the link on a well linked to website?” (you can check this using Majestic Site Explorer). If the source page is well linked to, chances are that Majestic will give it a higher crawl priority, resulting in links from this page being reported faster.

If your link is from a popular page, and it doesn’t appear in Majestic Site Explorer, check robots.txt to see if the source site allows robots to crawl. It’s also worth asking how many pages are hosted on the same site, and how many links there are per page. If the link is from a page which contains huge numbers of competing links, don’t be surprised if it gets overlooked by a search engine. Directories of links may be easy to score backlinks from, but logistically, Majestic and other search engines can only store a finite amount of data about any given page or website, and if there are too many links some will be omitted from indexing.

Crawling the internet is at the heart of what Majestic does, and whilst we are proud of being a global leader in backlinks intelligence data, we will not rest on our laurels. We continue to invest heavily in enhancing our web crawl, with a number of exciting projects planned for this year.

I’ll finish with some good news: whilst there will always be edge cases where the occasional link is absent from any search engine, with frequent updates and tens of billions of URLs crawled per month, once word gets out about your new page or site, we are confident you will be able to follow its growth using Majestic’s Fresh and Historic indexes.