
Today we are delighted to announce the launch of OpenRobotsTXT – a project to archive and analyse the world’s robots.txt files.

The first version of the openrobotstxt.org website is now live. This initial release is a slimmed-down site that aims to provide context for the OpenRobotsTXT crawler to begin its operation in the next few days.

(There is a Catch-22 when launching a new crawler: the webmaster community reasonably expects a page describing the crawler before it starts, so that site owners can make an informed decision about whether to allow it.)
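For readers unfamiliar with how that consent works in practice, here is a minimal sketch, in Python, using the standard library's urllib.robotparser, of how a polite crawler consults a site's robots.txt before fetching anything. The user-agent token "OpenRobotsTXTBot" is a placeholder for illustration, not the crawler's confirmed name.

```python
# Minimal sketch of robots.txt-based consent: check the site's rules
# before fetching a page. "OpenRobotsTXTBot" is a hypothetical token.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

url = "https://example.com/some-page"
if rp.can_fetch("OpenRobotsTXTBot", url):
    print("robots.txt allows crawling", url)
else:
    print("robots.txt disallows crawling", url)
```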

Pull-up banner for OpenRobotsTXT which reads "Archiving and analysing the world's robots.txt files" and "search – discover – monitor".

The project has been bootstrapped by a huge data export of robots.txt files collected by the Majestic crawler, MJ12bot. This has enabled us to analyse the user agents referenced in robots.txt files across the web. The initial release of the site focuses on this study, with a free-to-download (Creative Commons licensed) data set detailing the user agents discovered.
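As a rough illustration of what such an analysis involves (not the project's actual pipeline), the following Python sketch counts User-agent tokens across a directory of saved robots.txt files. The directory name robots_archive and the simple line-based parsing are assumptions for the example.

```python
# Simplified sketch: tally User-agent tokens across saved robots.txt files.
# "robots_archive" is a hypothetical directory of downloaded files.
from collections import Counter
from pathlib import Path

counts = Counter()
for path in Path("robots_archive").glob("*.txt"):
    for line in path.read_text(errors="ignore").splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line.lower().startswith("user-agent:"):
            token = line.split(":", 1)[1].strip()
            if token:
                counts[token] += 1

# Print the ten most frequently referenced user agents
for agent, n in counts.most_common(10):
    print(f"{agent}\t{n}")
```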

A range of free tools and features are planned for the OpenRobotsTXT site. Once we have launched the dedicated crawler, further updates will add searchable archives and detailed statistics, delivering greater insight into the world of robots.txt.

Read more at openrobotstxt.org
