Today we are delighted to announce the launch of OpenRobotsTXT – a project to archive and analyse the world’s robots.txt files.
The first version of the openrobotstxt.org website is now live. This initial release is a slimmed-down site that aims to provide context for the OpenRobotsTXT crawler to begin its operation in the next few days.
(There is a Catch-22 when launching new crawlers, as the webmaster community likes to see a page that describes the crawler, to help inform consent.)

The project has been bootstrapped by a huge data export of robots.txt files collected by the Majestic crawler, MJ12bot. This has enabled us to analyse the User Agents reported around the web. The initial release of the site focuses on this study with a free-to-download (Creative Commons) data set that details the User Agents discovered across the web.
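For readers curious what a User Agent study of this kind involves, here is a minimal sketch in Python. It is an illustration only, not the pipeline used by Majestic or OpenRobotsTXT: the function names `extract_user_agents` and `count_user_agents` are hypothetical, and it assumes each robots.txt file is already available as a string.

```python
from collections import Counter


def extract_user_agents(robots_txt: str) -> list[str]:
    """Return the User-agent values declared in one robots.txt body."""
    agents = []
    for line in robots_txt.splitlines():
        # Drop comments (everything after '#') and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are matched case-insensitively.
        if field.strip().lower() == "user-agent" and value.strip():
            agents.append(value.strip())
    return agents


def count_user_agents(files: list[str]) -> Counter:
    """Count how many files mention each User Agent (lower-cased)."""
    counts = Counter()
    for text in files:
        # A set ensures each agent is counted once per file.
        counts.update({agent.lower() for agent in extract_user_agents(text)})
    return counts


if __name__ == "__main__":
    # Hypothetical sample files, for illustration only.
    sample_a = "User-agent: MJ12bot\nDisallow: /private/\n\nUser-agent: *\nDisallow:\n"
    sample_b = "# example\nuser-agent: mj12bot # inline comment\nCrawl-delay: 5\n"
    print(count_user_agents([sample_a, sample_b]).most_common())
```

Counting each agent once per file, rather than once per line, keeps a single site that repeats the same group from skewing the totals.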
A range of free tools and features are planned for the OpenRobotsTXT site. Once we have launched the dedicated crawler, further updates will provide searchable archives, lots of stats, and deliver a greater insight into the world of robots.txt.
Read more at openrobotstxt.org
- Majestic Launches Robots.txt Archive - May 15, 2025
Hm… Interesting stuff. Maybe it would also be possible to collect data on which IP/web host/CDN the robots.txt/domain resides on. That could give some interesting statistics on the market share of web hosting and cloud companies, especially when tracking how it changes over time.
May 19, 2025 at 11:10 pm
Hi Jorge,
Thanks for the comment. Some interesting thoughts. At the moment, we are focussed on working to release the dedicated crawler and searchable archive. It'll certainly be interesting to see what other datapoints we can extract and share.
Steve
May 20, 2025 at 12:19 pm
This is a fantastic initiative.
May 30, 2025 at 7:44 am
Archiving and analyzing robots.txt files at scale fills a much-needed gap in understanding how websites interact with crawlers. The free dataset on User Agents is already a valuable resource for researchers, SEOs, and developers alike.
Thank you Emily. We hope to develop the archive even further in coming months. Watch this space. Ed.
May 30, 2025 at 3:19 pm