
Today we are delighted to announce the launch of OpenRobotsTXT – a project to archive and analyse the world’s robots.txt files.

The first version of the openrobotstxt.org website is now live. This initial release is a slimmed-down site that provides context for the OpenRobotsTXT crawler, which will begin operating in the next few days.

(There is a Catch-22 when launching new crawlers, as the webmaster community likes to see a page that describes the crawler, to help inform consent.)

Pull-up banner for OpenRobotsTXT which reads “archiving and analysing the world’s robots.txt files” and “search – discover – monitor”.

The project has been bootstrapped by a large data export of robots.txt files collected by the Majestic crawler, MJ12bot. This has enabled us to analyse the User Agents reported across the web. The initial release of the site focuses on this study, with a free-to-download (Creative Commons) data set that details the User Agents discovered.
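As a rough illustration of what such an analysis involves (this is a minimal sketch, not the project's actual pipeline), the snippet below tallies the User-agent tokens declared across a hypothetical local dump of robots.txt files; the ROBOTS_DIR path is an assumption made for the example.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical directory of previously fetched robots.txt files, one per host.
ROBOTS_DIR = Path("robots_txt_dump")

USER_AGENT_RE = re.compile(r"^\s*user-agent\s*:\s*(.+?)\s*$", re.IGNORECASE)

def user_agents_in(text: str) -> set[str]:
    """Return the distinct User-agent tokens declared in one robots.txt."""
    agents = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # strip trailing comments
        match = USER_AGENT_RE.match(line)
        if match:
            agents.add(match.group(1).lower())
    return agents

counts = Counter()
for path in ROBOTS_DIR.glob("*.txt"):
    try:
        counts.update(user_agents_in(path.read_text(errors="replace")))
    except OSError:
        continue  # skip unreadable files

# Print the most frequently referenced crawlers across the collection.
for agent, n in counts.most_common(20):
    print(f"{n:8d}  {agent}")
```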

A range of free tools and features are planned for the OpenRobotsTXT site. Once we have launched the dedicated crawler, further updates will provide searchable archives and detailed statistics, and deliver greater insight into the world of robots.txt.

Read more at openrobotstxt.org

Comments

  • Jorge

Hm… Interesting stuff. Maybe it would also be possible to collect data on which IP/web host/CDN the robots.txt/domain resides on. That could give some interesting statistics on the market share of web hosting and cloud companies, especially when tracking how it changes over time.

    May 19, 2025 at 11:10 pm
    • Steve Pitchford

      Hi Jorge,

Thanks for the comment. Some interesting thoughts. At the moment, we are focussed on releasing the dedicated crawler and searchable archive. It'll certainly be interesting to see what other data points we can extract and share.

      Steve

      May 20, 2025 at 12:19 pm
  • Emily

    This is a fantastic initiative.
Archiving and analyzing robots.txt files at scale fills a genuine gap in understanding how websites interact with crawlers. The free dataset on User Agents is already a valuable resource for researchers, SEOs, and developers alike.

    May 30, 2025 at 7:44 am
    • Philip Aggrey

Thank you, Emily. We hope to develop the archive even further in the coming months. Watch this space. Ed.

      May 30, 2025 at 3:19 pm
