Majestic has done some Spring Cleaning this Autumn.
(It’s spring down under)
We’ve been doing some pretty cool back-end stuff to improve our systems. We didn’t think anyone would notice, but a few did. If you noticed, that might not be a good thing for you if you are tracking your own sites, but we thought it best to give you some of the background. I can’t give it all away though!
Majestic has found a way to improve its index quality, particularly at the sub-domain level. We have achieved this without removing any CRAWLED URLs from our index. Instead, we identified some truly unbelievably spammy PBNs (Private Blog Networks) that went way beyond general link manipulation and into the realm of trying to flood a web index with stuff nobody would ever see or use.
…so we deleted them…
…and banned the sub-domains…
The long and short of it: our Fresh Index “URLs Found” table now shows half as many “Found URLs” as before, but the SAME number of “URLs crawled”.
HALF of the URLs we had found (but never crawled) were simply there to send bots in circles!
How Much Data is That?
There was a spam network… maybe more than one… which was getting absurd at the sub-domain level. We can live with a bit of spam here and there, but these sub-domains were all REALLY bad.
On the 12th of October we had 473,173,079,983 sub-domains in our Fresh Index. By the 17th of October we were down to 57,956,672,821, and we think that when all is said and done, we may have dropped 50% more!
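As a quick sanity check, the share of sub-domains removed between those two dates can be worked out directly from the two counts quoted above. This is plain arithmetic on the published figures, nothing more; the function name is just for illustration. It comes out at roughly 88%, consistent with the "around 90%" figure used later in this post.

```python
def share_removed(before: int, after: int) -> float:
    """Fraction of items removed when a count falls from `before` to `after`."""
    return (before - after) / before


# Fresh Index sub-domain counts quoted in the post
before = 473_173_079_983  # 12th of October
after = 57_956_672_821    # 17th of October

removed = before - after
print(f"Removed {removed:,} sub-domains ({share_removed(before, after):.1%})")
# prints "Removed 415,216,407,162 sub-domains (87.8%)"
```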
Anyway, these are no longer in the Fresh Index. If you aren’t involved in anything shady, you probably didn’t notice… EXCEPT if you compare sites at the sub-domain level (more about that below). The backlink counts for your websites will not have changed, since the pages we deleted were low-value pages we had never crawled anyway. Over time, though, we probably won’t recrawl some of the backlink pages we had previously crawled, because we’ll have a better crawl methodology moving forward. Eventually the worst of the URLs will be confined to the Historic Index, leaving the Fresh Index free to find better data.
If my Link Count didn’t change, why has my TF?
If you are finding that your Trust Flow has dropped wildly, then this MIGHT be because you are looking at the www. version of your site (a sub-domain) rather than the root domain or the home page.
It is up to you which level to choose, but if you CHOOSE www, then you are comparing your site with every single other sub-domain in the world. The TF score is normalized across all of these sub-domains (so very few score 100 and lots score zero). Since we have removed around 90% of all sub-domains, you can imagine the recalibration is pretty significant! Because we removed a lot of REALLY BAD stuff, many sites will actually see TF go DOWN, because they are now being compared against a set that is generally of better quality. Of course, if you are at the top of the pile, you may have found your TF actually going UP at the sub-domain level.
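To see why removing a pile of junk can pull a decent site's score down, here is a toy model. It assumes a simple percentile-style score (the share of the population with a lower raw value, scaled to 0–100). This is NOT Majestic's Trust Flow formula, which isn't described here; the raw values and the `percentile_score` helper are hypothetical and only illustrate how any population-normalized score depends on who is in the population.

```python
from bisect import bisect_left


def percentile_score(raw: float, population: list[float]) -> int:
    """Hypothetical 0-100 score: percentage of the population strictly below `raw`.

    Illustration only; not Majestic's actual Trust Flow calculation.
    """
    ranked = sorted(population)
    below = bisect_left(ranked, raw)  # count of entries strictly less than `raw`
    return round(100 * below / len(ranked))


# Hypothetical raw "trust" values for a few genuine sub-domains...
good = [40.0, 55.0, 60.0, 75.0, 90.0]
# ...and a large number of near-worthless spam sub-domains.
spam = [0.1] * 45

my_site = 55.0
before_cleanup = good + spam  # 50 sub-domains
after_cleanup = good          # 90% of entries removed

print(percentile_score(my_site, before_cleanup))  # prints 92
print(percentile_score(my_site, after_cleanup))   # prints 20
```

Nothing about the example site changed; it simply lost the 45 spam entries it used to outrank, so its relative score fell.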
Why does Majestic have so many versions of TF & CF?
This is a criticism that gets thrown at us all the time. People just want ONE TF score, regardless of whether they enter a page, a subdomain or a root domain. We could, if you like, delete all the other metrics except the page level… but then you wouldn’t have domain-level information. We could just get rid of the subdomain level… but then how would you compare Blogspot sites? Whichever way you look at it, there are times you need each one. The important thing is not to compare different types: never compare a subdomain-level metric with a URL-level or root-domain-level metric.
If you are buying or selling domains, don’t evaluate at the sub-domain level when you are buying a whole root domain. There. You have been informed.
Did you do anything else to your Algorithms?
Since we are here, it is worth saying that we are beginning to understand why the search engines don’t talk about this stuff much. Some of you will have noticed that some third party tools that steal our data don’t work so well now. You’ll have to ask the people stealing our data about that. I would recommend that you only trust tools that are listed either at:
Or ones using your own API keys (which you should keep secure!)
We have also made other improvements lately to our crawl methodology. In theory this won’t change anything quickly, but it should improve our data over time. Let’s see!