How do you test how fresh a search index’s data is?
We decided to check – for ourselves – exactly how fresh (or stale) data is in various indexes around the web. We’ll show you comparison data for an example checked this morning (19th September 2011).
We are going to compare:
- MajesticSEO Fresh Index
- Google
- Yahoo
- Bing
The example website that we will be using at this stage is http://status.aws.amazon.com/.
If you don’t care about the methodology – only the research output… here you go:
Search Index Tested | Date Seen by Index |
Actual: status.aws.amazon.com | 19th September 2011 |
Majestic SEO Fresh Index | 17th September |
Google.com | 19th September |
Yahoo.com | 14th September |
Bing.com | 15th September |
Here’s exactly how we got this data. There are a few steps that you will need to take, which I will take you through right now.
Finding a base line
Right now, http://status.aws.amazon.com/ shows today’s date in its title. That’s a great place to start, because nobody is likely to accuse Amazon of manipulating the date. Every time an index updates, the title it shows carries the date on which the page was actually crawled, regardless of when the new information goes live in that index. Once you have the example web page up, right click and select ‘View Page Source’. A separate window should pop up (like the one in the picture).
There you will need to look for the title that any and every crawler will see. (The title is highlighted in the picture). So in this case, the title for this particular page is ‘AWS Service Health Dashboard – Sep 19, 2011’. The date will change depending on the day that you complete this action.
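If you would rather not read the page source by eye, the same check can be scripted. The following is only a minimal sketch: it assumes you have already saved the page’s HTML into a string, and the names `TitleParser`, `page_title` and `title_date` are made up for this illustration. It pulls out the title and reads the date in the ‘Sep 19, 2011’ format shown above.

```python
import re
from datetime import date
from html.parser import HTMLParser

# Month abbreviations as they appear in the title, e.g. "Sep 19, 2011".
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DATE_PATTERN = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html):
    """Return the page title exactly as a crawler would read it."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def title_date(title):
    """Return the date embedded in the title, or None if there is none."""
    match = DATE_PATTERN.search(title)
    if match is None:
        return None
    month, day, year = match.groups()
    return date(int(year), MONTHS.index(month) + 1, int(day))


# Hypothetical saved copy of the page source; only the title matters here.
sample_html = (
    "<html><head><title>AWS Service Health Dashboard – Sep 19, 2011</title>"
    "</head><body></body></html>"
)
title = page_title(sample_html)
print(title, "->", title_date(title))
```

The same `title_date` function can be pointed at the title that each index displays, which gives you comparable dates for every index you test.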
Testing Majestic SEO Fresh Index
Now that you have found a base line, you just need to check this against all the indexes of the web that you would like. So for Majestic SEO, go to majesticseo.com and enter the address http://status.aws.amazon.com/ into the bar and press explore. You should get this:
You should come to a page that gives information on the website’s Backlink History, Referring Domains and Top Backlinks; basically it is a complete summary of the web domain that you want information on. If you don’t, try logging in first!
The picture above shows the title of the page, its URL, the date that it was last crawled, its External Backlinks and the number of Referring Domains. According to our system, this example website was last crawled one day ago, on 18th September 2011, and has 1,062 Referring Domains. The TITLE, however, says that the crawl date was actually the 17th. It is perhaps our own bad luck that our crawlers use London time and Amazon uses a US time; otherwise the crawl date in our system and the date in the page title would be the same. But we want to compare like with like, and the Amazon title is the base line, so we’ll take the 17th September as the ACTUAL crawl date, using Amazon’s server time.
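To see how one crawl can carry two different dates, here is a small sketch of the time zone arithmetic. The crawl hour below is invented for illustration, and the post does not say which US time zone Amazon’s server uses; the fixed offsets (UTC+1 for London in summer, UTC-7 for US Pacific in summer) are assumptions, not known values.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for September 2011 (both are illustrative guesses).
LONDON_SUMMER = timezone(timedelta(hours=1), "BST")
US_PACIFIC_SUMMER = timezone(timedelta(hours=-7), "PDT")


def date_in_zone(moment, zone):
    """Return the calendar date of an aware datetime as seen in another zone."""
    return moment.astimezone(zone).date()


# Hypothetical crawl time: 3 am on 18th September, London time.
crawl = datetime(2011, 9, 18, 3, 0, tzinfo=LONDON_SUMMER)

london_date = crawl.date()
server_date = date_in_zone(crawl, US_PACIFIC_SUMMER)

print("Crawler's date:", london_date)
print("Server's date: ", server_date)
```

Under these assumptions the crawler records the 18th while the server is still on the 17th, which is the kind of one-day gap described above.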
Testing Google
After testing Majesticseo.com you will need to test Google. Type the same URL into the Google search bar. What you see should match the picture below.
From this picture, the information that Google gives you is spot on with today’s date. In fact, the information was last updated two hours ago (from when this was written). Interestingly, if you were to grab the “cache” of Google’s data set, it suggests the cached information is sometimes older, but again – let’s go like for like, using the independence of Amazon’s title as the baseline.
Testing Yahoo
Once you have tested Google, the next step is to test Yahoo.com. Once you are there, you will again need to type in the same link that has been used in the first, second and third steps. At the very top of the page, you will see the information you will need.
The information that Yahoo.com has given can be seen above. The date given by Yahoo for the same link is September 14th. All of this information was collected on the 19th September, so Yahoo’s copy is five days old!
Testing Bing
The final step in collecting this information is to go onto bing.com. Again, you will need to type in the same link here. Once you have done that, a page should appear that looks like the one below.
The title here is still the same as it is in each of the other steps, but the date given by Bing.com is the 15th September. This is nearer to today’s date than Yahoo’s, but it is not as up to date as Majestic or Google.
In Conclusion
You can see exactly which index gives you the freshest data and which gives you the most out of date, without having to rely on claims. As the table shows, the index that gave the most out of date information is Yahoo, whilst Google.com gave the most up to date information; its copy was updated two hours before this post was written. Majestic SEO gave the second most up to date information, beating Bing by two days and Yahoo by three.
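The ranking above is simple date arithmetic, sketched here using the dates from the table at the top of this post. The function name `staleness_days` is made up for this example.

```python
from datetime import date

# The date shown on the live page when the test was run.
baseline = date(2011, 9, 19)

# The date each index showed in the page title, taken from the table.
seen_by_index = {
    "Majestic SEO Fresh Index": date(2011, 9, 17),
    "Google.com": date(2011, 9, 19),
    "Yahoo.com": date(2011, 9, 14),
    "Bing.com": date(2011, 9, 15),
}


def staleness_days(baseline, seen):
    """Return how many days behind the live page each index is."""
    return {name: (baseline - seen_date).days for name, seen_date in seen.items()}


lag = staleness_days(baseline, seen_by_index)

# Freshest index first.
ranked = sorted(lag.items(), key=lambda item: item[1])
for name, days in ranked:
    print(f"{name}: {days} day(s) behind")
```

Repeat the test on another day, swap in the new dates, and the ranking updates itself.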
I see Yahoo/Bing both showing as the 16th, but interesting I also see this post in Yahoo as being 47 minutes old. I presume Bing employ a historical (previously discovered URLs) and fresh index (previously undiscovered URLs) separate of each other?
September 19, 2011 at 2:06 pm
Ah! You just spotted the very reason why you need to be wary. Just because Bing knows that the page is here does not yet mean that it has crawled it properly for its main index. You can see that (as I type) there is no “cached” link for this page yet… whereas other results further down the Bing page do have cached versions. (Bing LIKES fresh stuff, it seems.)
Using a page that has a date in the title helps us, because to get the TITLE a crawl must have taken place. Of course, you could have your own page with a date and a time in the title, but it won’t be as powerful as Amazon’s, so it will not get crawled anywhere near as regularly by any of the indexes.
September 19, 2011 at 2:14 pm
Ah yes, good point, I hadn’t noticed the lack of a cached link.
September 19, 2011 at 2:54 pm