Today we are releasing the Majestic Million as a FREE, regularly updated CSV, for anyone to download and use under a free Creative Commons license.

Way back on Christmas Day, we released a copy of the Majestic Million database as a one-off. We decided it was time to do something for FREE again that would help the web-o-sphere, and thought that updating this for you every day and giving it away free would fit the bill 🙂

The Majestic Million is a list of the top 1 million websites in the world, ordered by the number of referring subnets. A subnet is a bit complex – but to a layman it is basically anything within an IP range, ignoring the last number (up to three digits) of the IP address.
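To make the subnet idea concrete, here is a minimal Python sketch. It assumes that "ignoring the last part of the IP" means dropping the final dot-separated number of an IPv4 address; the function names and the sample addresses are made up for illustration and are not how Majestic computes the list.

```python
import ipaddress

def subnet_key(ip: str) -> str:
    """Return an IPv4 address with its final number dropped, e.g. '192.0.2'.

    Assumption: a "subnet" here means everything except the last
    dot-separated number of the address.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    return ".".join(str(addr).split(".")[:3])

def count_referring_subnets(ips) -> int:
    """Count the distinct subnets among the IPs that link to a site."""
    return len({subnet_key(ip) for ip in ips})

# Illustrative addresses only: the first two share a subnet, the third does not.
linking_ips = ["192.0.2.10", "192.0.2.200", "198.51.100.7"]
print(count_referring_subnets(linking_ips))  # 2
```

Under this reading, a thousand links from machines in the same range still count as a single referring subnet.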

When we launched the Majestic Million, it did get a good reaction – but we have not seen it used as prolifically as we would like. We hope that by making the data freely available in a CSV, programmers and API fanatics will download it on a regular basis and use it for interesting new ideas.

If you want to see what data is available in the Majestic Million download, I would urge you first to go and play with our own web implementation of Majestic Million, because a CSV file with 1 MILLION LINES is NOT something you want to download on a whim! However, recent versions of Excel can cope with 1 million lines if you have a computer with enough memory to handle it. Most people, though, would want to import the data into a database first.
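For anyone taking the database route, the following is a rough sketch using Python's built-in csv and sqlite3 modules. It takes the column names from the first line of the file rather than assuming them; the table name and the two-column sample data are hypothetical and only there to show the shape of the code.

```python
import csv
import io
import sqlite3

def load_csv_into_sqlite(csv_file, conn, table="majestic_million"):
    """Load a CSV with a header row into a SQLite table and return the row count.

    Column names come from the header line; the table name is a
    hypothetical choice. Any existing table of that name is replaced.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # executemany pulls rows from the reader one at a time.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

# Tiny made-up sample; the real file's columns may well differ.
sample = io.StringIO("Rank,Domain\n1,example.com\n2,example.org\n")
print(load_csv_into_sqlite(sample, sqlite3.connect(":memory:")))  # 2
```

For the real file, you would pass a handle from open("majestic_million.csv", newline="", encoding="utf-8") and a connection to a database file on disk instead of the in-memory one.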

Regular Updates

We have set up a CRON job so that every day we will recompile the data, which is based on our Fresh Index. It is POSSIBLE that the data is the same on two consecutive days, if the Fresh Index did not update for some reason, but usually the data will change daily. Please do not try to download the data more than once every 24 hours, otherwise you will simply end up getting banned, or we will have to reconsider giving the data away, or put it behind a walled garden.

Download Location

Be wary scrapers… this will do your head in if you weren’t expecting it… The Majestic Million CSV can regularly be downloaded here:

http://downloads.majesticseo.com/majestic_million.csv
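
If you plan to fetch the file from a script, one way to respect the once-every-24-hours request is to check the age of your local copy before downloading. This is only a sketch using Python's standard library; the local filename and the function names are our own invention for the example.

```python
import os
import time
import urllib.request

URL = "http://downloads.majesticseo.com/majestic_million.csv"
ONE_DAY = 24 * 60 * 60  # seconds

def is_stale(path, max_age=ONE_DAY, now=None):
    """Return True if the file is missing or at least max_age seconds old."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) >= max_age

def download_if_stale(url=URL, path="majestic_million.csv"):
    """Download the CSV only when the local copy is missing or a day old.

    Returns True if a download happened, False if the local copy was reused.
    """
    if not is_stale(path):
        return False
    tmp = path + ".part"
    urllib.request.urlretrieve(url, tmp)  # fetch to a temporary name first
    os.replace(tmp, path)                 # then swap it in as the finished file
    return True

# Example call (commented out so that reading this does not trigger a download):
# download_if_stale()
```

Run from a daily scheduled job, this never requests the file more than once in any 24-hour window, even if the job is triggered again by accident.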

If you have made use of this data for a benevolent purpose, please mention it in the comments. We are dying to see what you use it for. We cannot promise to publish all uses, and the CSV is of course provided without warranties or support unless you are on a paid plan for other API options.

Small print:
Creative Commons License
Majestic Million CSV by Majestic 12 is licensed under a Creative Commons Attribution 3.0 Unported License.
Permissions beyond the scope of this license may be available at https://www.majestic.com/support/contact-us.

Dixon Jones

Comments

  • Sean Carlos

    Nice news, Dixon!

    I’m not completely sure I understand the exact criteria for saying these are the top million sites by subnet, i.e. is it determined by the number of incoming links? I ask as many casual readers may assume that traffic (visitors or page views) is the determining criterion.

    As a side note, LibreOffice’s Calc also supports 1 million rows, although I’ve not yet actually put this to the test. I suspect that neither it nor Excel would be happy with column headers + 1 million data lines :-).

    October 1, 2012 at 10:34 am
    • Dixon

      I think there is a blog post about why we chose subnets about a year back – but in short, when looking at which sites may be creating the most “waves” or “influence” in the world, raw link counts are not ideal, because sitewides can distort rankings. Similarly, we found some servers may have thousands of sites/domains linking to a site that are controlled by the same person or group, so there are some false positives there as well. Ranking by IP ranges, rather than by the alternatives, makes the list look more robust.

      On the traffic front, we are measuring something different – and I think that whilst inevitably one would expect a correlation between traffic and this order, it is ultimately probably the outliers (in either direction) that would be of interest. I wonder who will be the first to highlight an interesting aspect of the outliers between our list and (say) Alexa’s list?

      October 1, 2012 at 10:45 am
  • Corey Northcutt

    Thank you Dixon (and all else that made this available)!

    I can’t wait to see how Excel on my laptop reacts to 79MB of data. 🙂

    October 1, 2012 at 3:50 pm
  • George

    Nice work, I will download it and wish you a nice day.

    October 1, 2012 at 6:06 pm
  • Ariel Fauler

    Great product. I got some good ideas from the Majestic Million.

    October 2, 2012 at 8:29 am
  • Wes

    Just threw this in a pivot table. 11 TLDs account for 80% of the top 1 million websites. Pretty cool.

    October 2, 2012 at 2:54 pm
  • Petr Nachtmann

    Is Majestic Million CSV the best kept secret among the analysts?

    www.majesticseo.com/reports/site-explorer?q=http://blog.majesticseo.com/development/majestic-million-csv-daily/

    www.majesticseo.com/reports/site-explorer?q=http://downloads.majesticseo.com/majestic_million.csv

    December 4, 2012 at 7:19 pm
