After more than twenty years of crawling, it’s fair to say that the Majestic crawler has begun to show its age. When we began, a distributed, community-led crawl was cutting-edge stuff. However, as the web has matured, so have expectations around crawling. In this post we will:
- Share news about a new, complementary crawler
- Revisit our approach to crawling
- Open up our roadmap for crawling and development.
New Crawler
Over the past few months we’ve been refining our crawling stack. People who keep a keen eye on their logs may see entries from a v2 MJ12bot. This reflects many months of development, and a fork in our crawling strategy.
For years Majestic has relied upon a distributed network of crawlers. The aim of this recent development is to add centralised crawling capacity to complement MJ12bot. While a great many webmasters seem content to continue to let MJ12bot crawl, times have changed, and some have concerns about a crawler that cannot support easy verification via reverse DNS.
This marks a significant change of direction for Majestic. Many other firms have operated more than one crawler for some time, whereas Majestic has, for the most part, relied on MJ12bot to gather data. However, in keeping with industry practice, some third-party data sources have been included.
The aim is for the new centralised crawler to be sensitive to webmasters with more limited bandwidth. A centralised service offers greater orchestration and co-ordination, together with support for widely used conventions such as reverse DNS verification.
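For webmasters who want to check that a request claiming to come from a crawler really does, the usual reverse DNS approach is a forward-confirmed lookup: resolve the requesting IP to a hostname, check that the hostname sits under the operator’s published domain, then resolve that hostname back and confirm it matches the original IP. The Python sketch below illustrates the general technique; the domain suffix and IP are placeholders, since the hostnames the new crawler will use have not yet been announced.

```python
import socket

def verify_crawler_ip(ip_address: str, expected_suffix: str) -> bool:
    """Forward-confirmed reverse DNS check for a crawler IP address."""
    try:
        # 1. Reverse lookup: IP -> hostname (PTR record).
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip_address)
    except socket.herror:
        return False  # no PTR record for this IP

    # 2. The hostname must sit under the crawler operator's domain.
    if not hostname.lower().rstrip(".").endswith(expected_suffix.lower()):
        return False

    try:
        # 3. Forward lookup: hostname -> IPs, confirming the round trip.
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except socket.gaierror:
        return False  # hostname does not resolve

    return ip_address in forward_ips

# Placeholder values for illustration only -- not a real crawler IP or domain.
print(verify_crawler_ip("203.0.113.7", ".example-crawler.net"))
```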
As centralised and distributed crawling are somewhat different, Majestic will be introducing a new, distinct user-agent for this new centralised crawler. We’ll be releasing details close to launch.
DON’T PANIC!!! Most webmasters will not need to do anything, at least not just yet. During the final stage of the beta release, and for at least 12 months afterwards, the new user agent will respect all robots.txt directives aimed at MJ12bot.
Details of the new user agent for the centralised crawl, together with its RFC 9309 product token, will be released on a new microsite. Advanced users will then be able to target MJ12bot and the new user agent separately by introducing robots.txt directives aimed at the new user agent.
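To illustrate what that separate targeting could look like, here is a sketch of a robots.txt file that addresses the two crawlers independently. `NewMajesticBot` is purely a placeholder, not the real product token; the actual name will be published on the microsite.

```
# Existing distributed crawler
User-agent: MJ12bot
Disallow: /private/

# Placeholder token for the new centralised crawler -- the real
# product token will be announced on the new microsite
User-agent: NewMajesticBot
Disallow: /
```

Under RFC 9309, each crawler follows the most specific group that matches its user agent, so the rules in one group have no effect on the other.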
Our Approach to Crawling
The primary means of data collection for Majestic has been MJ12bot. However, those familiar with the field of web crawling will be aware that some sites are happy to be indexed, but not so happy to be crawled. An obvious example is Wikipedia, which receives a lot of requests and so asks developers to download archives instead of crawling the site.
There are other archives that tend to be incorporated into web crawls. The use of Common Crawl data is widespread.
We’ve been transparent about our approach to the inclusion of third-party data for a number of years.
However, just as with web crawling, further ways of sharing resources and making crawling more efficient for website hosts have also come online.
Look no further than Ahrefs and Bing co-operating to share information through Bing’s innovative IndexNow program.
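For anyone unfamiliar with it, IndexNow lets a site notify participating search engines when URLs change, rather than waiting to be recrawled. A minimal sketch of a submission is shown below, assuming the shared api.indexnow.org endpoint; the host, key and URL are placeholder values, and the key would also need to be hosted as a text file on the site, as the protocol requires.

```python
import json
import urllib.request

# Placeholder values -- substitute your own host, IndexNow key and changed URLs.
payload = {
    "host": "www.example.com",
    "key": "0123456789abcdef0123456789abcdef",
    "urlList": ["https://www.example.com/updated-page"],
}

request = urllib.request.Request(
    "https://api.indexnow.org/indexnow",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json; charset=utf-8"},
    method="POST",
)

with urllib.request.urlopen(request) as response:
    # A 200 or 202 response indicates the notification was accepted.
    print(response.status)
```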
Given that Majestic is on the brink of becoming a multi-crawler organisation, we felt it was a good time to review our data inclusion policies.
With the advent of AI, webmasters now face the demands of an ever-increasing variety of crawlers. We know from experience that many webmasters and boutique web hosting providers are concerned about bandwidth demands. To try to ensure a level playing field, Majestic has begun trialling a limited evaluation program which will see collaboration with a small number of boutique third-party crawlers. The aim is to share information and to co-ordinate crawling so as to reduce load on webservers. We recognise that this is a bold step, so we are founding this program with the following important guardrails in place:
- Crawling must be RFC 9309 compliant: User Agents must be declared and robots.txt must be respected.
- The third parties must be associated in some way with internet cartography or with research into internet information architecture.
- We want to work with established firms. We have no desire to create a revolving door of ever-changing User Agents from new start-ups.
In the initial stages, this program will be invite-only. There is no waitlist.
We hope that this program will go a small way towards reducing the load on webmasters, while offering benefits to member organisations and, through them, to the wider internet community.
Your Feedback
A new crawler is a significant step. MJ12bot has been operating for over 20 years, and we hope it will continue to operate for at least another 20. However, much has changed on the web since the distributed crawl project was conceived.
We hope that by introducing a new crawler, we can offer a more nuanced crawl, especially to webmasters concerned about the distributed nature of MJ12bot. We’ve had a great deal of feedback and experience over the years and have put much of that into recent developments.
MJ12bot will continue to see enhancements. The two crawlers share a great deal of code and infrastructure. Where possible, enhancements to one user-agent will be made available to the other.
We look forward to sharing details of the new user agent in the weeks to come.
As for the collaborative crawl initiative, statements are somewhat harder to co-ordinate because more parties are involved. However, communications strategies are being discussed and we hope to share more soon.
This strategy has been informed by the feedback and conversations we have had with the community over the last twenty years. We continue to welcome your feedback and dialogue.