After more than twenty years of crawling, it’s fair to say that the Majestic crawler has begun to show its age. When we began, a distributed, community-led crawl was cutting-edge stuff. However, as the web has matured, so have expectations around crawling. In this post we will:
- Share news about a new, complementary crawler
- Revisit our approach to crawling
- Open up our roadmap for crawling and development.
New Crawler
Over the past few months we’ve been refining our crawling stack. People who keep a keen eye on their logs may see entries from a v2 MJ12bot. This reflects many months of development, and a fork in our crawling strategy.
For years Majestic has relied upon a distributed network of crawlers. The aim of this recent development is to add centralised crawling capacity to complement MJ12bot. While a great many webmasters seem content to continue to let MJ12bot crawl, times have changed, and some have concerns about a crawler that cannot support easy verification via reverse DNS.
This marks a significant change of direction for Majestic. Many other firms have operated more than one crawler for some time, whereas Majestic has, for the most part, relied on MJ12bot to gather data. However, in keeping with industry practice, some third-party data sources have been included.
The aim is for the new centralised crawler to be sensitive to webmasters with more limited bandwidth. A centralised service offers greater orchestration and co-ordination, together with support for widely used conventions such as reverse DNS verification.
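For webmasters who want to check that a request claiming to come from a crawler really does, the usual reverse DNS approach is a forward-confirmed lookup: resolve the requesting IP to a hostname, check that the hostname sits under the operator’s published domain, then resolve that hostname back and confirm it matches the original IP. The Python sketch below illustrates the general technique; the domain suffix and IP are placeholders, since the hostnames the new crawler will use have not yet been announced.

```python
import socket

def verify_crawler_ip(ip_address: str, expected_suffix: str) -> bool:
    """Forward-confirmed reverse DNS check for a crawler IP address."""
    try:
        # 1. Reverse lookup: IP -> hostname (PTR record).
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip_address)
    except socket.herror:
        return False  # no PTR record for this IP

    # 2. The hostname must sit under the crawler operator's domain.
    if not hostname.lower().rstrip(".").endswith(expected_suffix.lower()):
        return False

    try:
        # 3. Forward lookup: hostname -> IPs, confirming the round trip.
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except socket.gaierror:
        return False  # hostname does not resolve

    return ip_address in forward_ips

# Placeholder values for illustration only -- not a real crawler IP or domain.
print(verify_crawler_ip("203.0.113.7", ".example-crawler.net"))
```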
As centralised and distributed crawling are somewhat different, Majestic will be introducing a new, distinct user-agent for this new centralised crawler. We’ll be releasing details close to launch.
DON’T PANIC!!! Most webmasters will not need to do anything, at least not just yet. During the final stage of the beta release, and for at least 12 months afterwards, the new user agent will respect all robots.txt directives aimed at MJ12bot.
Details of the new user agent for the centralised crawl, together with its RFC 9309 product token, will be released on a new microsite. Advanced users will then be able to target MJ12bot and the new user agent separately by introducing robots.txt directives aimed at the new user agent.
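To illustrate what that separate targeting could look like, here is a sketch of a robots.txt file that addresses the two crawlers independently. `NewMajesticBot` is purely a placeholder, not the real product token; the actual name will be published on the microsite.

```
# Existing distributed crawler
User-agent: MJ12bot
Disallow: /private/

# Placeholder token for the new centralised crawler -- the real
# product token will be announced on the new microsite
User-agent: NewMajesticBot
Disallow: /
```

Under RFC 9309, each crawler follows the most specific group that matches its user agent, so the rules in one group have no effect on the other.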
Our Approach to Crawling
The primary means of data collection for Majestic has been MJ12bot. However, those familiar with the field of web crawling will be aware that some sites are happy to be indexed, but not so happy to be crawled. An obvious example is Wikipedia, which receives a lot of requests and so asks developers to download archives instead of crawling the site.
There are other archives that tend to be incorporated into web crawls. The use of Common Crawl data is widespread.
We’ve been transparent about our approach to the inclusion of third-party data for a number of years.
However, just as with web crawling, further ways of sharing resources and making crawling more efficient for website hosts have also come online.
Look no further than Ahrefs and Bing co-operating to share information through Bing’s innovative IndexNow program.
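For anyone unfamiliar with it, IndexNow lets a site notify participating search engines when URLs change, rather than waiting to be recrawled. A minimal sketch of a submission is shown below, assuming the shared api.indexnow.org endpoint; the host, key and URL are placeholder values, and the key would also need to be hosted as a text file on the site, as the protocol requires.

```python
import json
import urllib.request

# Placeholder values -- substitute your own host, IndexNow key and changed URLs.
payload = {
    "host": "www.example.com",
    "key": "0123456789abcdef0123456789abcdef",
    "urlList": ["https://www.example.com/updated-page"],
}

request = urllib.request.Request(
    "https://api.indexnow.org/indexnow",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json; charset=utf-8"},
    method="POST",
)

with urllib.request.urlopen(request) as response:
    # A 200 or 202 response indicates the notification was accepted.
    print(response.status)
```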
Given that Majestic is on the brink of becoming a multi-crawler organisation, we felt it was a good time to review our data inclusion policies.
With the advent of AI, webmasters now face the demands of an ever-increasing variety of crawlers. We know from experience that many webmasters and boutique web hosting providers are concerned about bandwidth demands. To try to ensure a level playing field, Majestic has begun trialling a limited evaluation program which will see collaboration with a small number of boutique third-party crawlers. The aim is to share information and to co-ordinate crawling so as to reduce load on webservers. We recognise that this is a bold step, so we are founding this program with the following important guardrails in place:
- Crawling must be RFC 9309 compliant: User Agents must be declared and robots.txt must be respected.
- The third parties must be associated in some way with internet cartography or with research into internet information architecture.
- We want to work with established firms. We have no desire to create a revolving door of ever-changing User Agents from new start-ups.
In the initial stages, this program will be invite-only. There is no waitlist.
We hope that this program will go a small way towards reducing the load on webmasters, while offering benefits to member organisations and, through them, to the wider internet community.
Your Feedback
A new crawler is a significant step. MJ12bot has been operating for over 20 years, and we hope it will continue to operate for at least another 20. However, much has changed on the web since the distributed crawl project was conceived.
We hope that by introducing a new crawler, we can offer a more nuanced crawl, especially to webmasters concerned about the distributed nature of MJ12bot. We’ve had a great deal of feedback and experience over the years and have put much of that into recent developments.
MJ12bot will continue to see enhancements. The two crawlers share a great deal of code and infrastructure. Where possible, enhancements to one user-agent will be made available to the other.
We look forward to sharing details of the new user agent in the weeks to come.
As for the collaborative crawl initiative, statements are somewhat harder to co-ordinate because more parties are involved. However, communications strategies are being discussed and we hope to share more soon.
This strategy has been informed by the feedback and conversations we have had with the community over the last twenty years. We continue to welcome your feedback and dialogue.