Today we have updated the Historic Index. You should generally use the Fresh Index for day-to-day analysis, because it contains the most up-to-date links, but the Historic Index remains by far our largest index, as it contains data going back more than five years. We don’t remove dead links from this database, which means it can be used to analyze how a site may have developed links in the past, even if it has since cleaned up its act.
Today’s index contains 3.6 trillion unique URLs, and we have physically crawled 357.6 billion of these (most of them many times).
It is worth explaining why other databases claim similar numbers but do not always show as many external links as we do when put to the test. There are two reasons: we include deleted links in this number, and we ONLY include external links, that is, links coming from another domain or sub-domain to the site being analyzed.
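To make the "external links only" rule concrete, here is a minimal Python sketch of how such a check could look. It is an illustration only, not our crawler's actual code: the function names `host_of` and `is_external_link` are hypothetical, and it assumes that "another domain or sub-domain" simply means the two URLs have different hostnames.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def is_external_link(source_url: str, target_url: str) -> bool:
    """Assumed rule: a link is external when the linking page and the
    linked page sit on different hostnames (another domain or sub-domain)."""
    source_host = host_of(source_url)
    target_host = host_of(target_url)
    # Both URLs need a hostname before the comparison means anything.
    return bool(source_host) and bool(target_host) and source_host != target_host
```

Under this assumption, a link from `https://blog.example.com/post` to `https://example.com/` counts as external because the sub-domain differs, while a link between two pages on `example.com` does not.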
When you compare index sizes, there is a lot of confusion. Indeed, there is confusion even between our own Fresh Index (which only includes links verified as existing within the last 30 days) and our Historic Index.
In short: use “Fresh Data” for day-to-day work. Use “Historic Data” when you want the largest possible numbers or the deepest possible investigation, provided you accept that this data will not include newly discovered links, as these generally take a month or more to migrate into the Historic Index.