
Today we are delighted to announce the launch of OpenRobotsTXT – a project to archive and analyse the world’s robots.txt files.

The first version of the openrobotstxt.org website is now live. This initial release is a slimmed-down site that aims to provide context for the OpenRobotsTXT crawler to begin its operation in the next few days.

(There is a Catch-22 when launching a new crawler: the webmaster community reasonably expects a page describing the crawler before it starts, so that site owners can make an informed decision about whether to allow it.)
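For readers unfamiliar with how that consent works in practice, here is a minimal sketch, in Python, using the standard library's urllib.robotparser, of how a polite crawler consults a site's robots.txt before fetching anything. The user-agent token "OpenRobotsTXTBot" is a placeholder for illustration, not the crawler's confirmed name.

```python
# Minimal sketch of robots.txt-based consent: check the site's rules
# before fetching a page. "OpenRobotsTXTBot" is a hypothetical token.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

url = "https://example.com/some-page"
if rp.can_fetch("OpenRobotsTXTBot", url):
    print("robots.txt allows crawling", url)
else:
    print("robots.txt disallows crawling", url)
```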

Pull-up banner for OpenRobotsTXT which reads "Archiving and analysing the world's robots.txt files" and "search – discover – monitor".

The project has been bootstrapped by a huge data export of robots.txt files collected by the Majestic crawler, MJ12bot. This has enabled us to analyse the user agents referenced in robots.txt files across the web. The initial release of the site focuses on this study, with a free-to-download (Creative Commons licensed) data set detailing the user agents discovered.
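As a rough illustration of what such an analysis involves (not the project's actual pipeline), the following Python sketch counts User-agent tokens across a directory of saved robots.txt files. The directory name robots_archive and the simple line-based parsing are assumptions for the example.

```python
# Simplified sketch: tally User-agent tokens across saved robots.txt files.
# "robots_archive" is a hypothetical directory of downloaded files.
from collections import Counter
from pathlib import Path

counts = Counter()
for path in Path("robots_archive").glob("*.txt"):
    for line in path.read_text(errors="ignore").splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line.lower().startswith("user-agent:"):
            token = line.split(":", 1)[1].strip()
            if token:
                counts[token] += 1

# Print the ten most frequently referenced user agents
for agent, n in counts.most_common(10):
    print(f"{agent}\t{n}")
```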

A range of free tools and features are planned for the OpenRobotsTXT site. Once we have launched the dedicated crawler, further updates will add searchable archives and detailed statistics, delivering greater insight into the world of robots.txt.

Read more at openrobotstxt.org
