Today we are releasing the Majestic Million as a CSV, FREE and regularly updated, for anyone to download and use under a Creative Commons license.
Way back on Christmas Day, we released a copy of the Majestic Million database as a one-off. We decided it was time to do something for FREE again that would help the web-o-sphere, and thought that updating this for you every day and giving it away for free would fit the bill 🙂
The Majestic Million is a list of the top 1 million websites in the world, ordered by the number of referring subnets. A subnet is a bit complex, but to a layman it is basically a range of IP addresses: two IPs fall in the same subnet if they match once you ignore the last block of (up to three) digits.
When we launched the Majestic Million, it did get a good reaction, but we have not seen it used as prolifically as we would like. We hope that by making the data freely available as a CSV, programmers and API fanatics will download it on a regular basis and use it for interesting new ideas.
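To make the layman's definition concrete, here is a minimal sketch that counts distinct subnets among a handful of linking IP addresses. It assumes that "ignoring the last block of digits" corresponds to an IPv4 /24 network; that mapping, the function names and the sample addresses are illustrative assumptions, not a description of how Majestic computes the figure internally.

```python
import ipaddress

def subnet_of(ip: str) -> str:
    """Return the /24 network containing an IPv4 address (last block ignored).

    Assumption: a "subnet" in the post's sense is treated here as a /24 range.
    """
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def count_referring_subnets(ips) -> int:
    """Count how many distinct /24 networks a collection of IPs falls into."""
    return len({subnet_of(ip) for ip in ips})

# Hypothetical linking IPs (documentation-only address ranges).
linking_ips = ["192.0.2.10", "192.0.2.200", "198.51.100.7"]
print(count_referring_subnets(linking_ips))  # the first two share one subnet
```

Under this assumption, many sites hosted on neighbouring addresses count only once, which is why the measure is harder to inflate than a raw link count.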
If you want to see what data is available in the Majestic Million download, I would urge you first to go and play with our own web implementation of Majestic Million, because a CSV file with 1 MILLION LINES is NOT something you want to download on a whim! However, recent versions of Excel can cope with 1 million lines if you have a computer with enough memory to handle it. Most people, though, would want to import the data into a database first.
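For anyone taking the database route, the following is a rough sketch of loading the file into SQLite with Python's standard library. It takes the column names from whatever header row the CSV contains rather than assuming a particular layout; the function name, table name and file paths are hypothetical.

```python
import csv
import sqlite3

def load_csv_into_sqlite(csv_path, db_path, table="majestic_million"):
    """Load a CSV with a header row into a SQLite table; return the row count.

    Column names come from the file's own header, so no layout is assumed.
    Any existing table with the same name is replaced.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first line holds the column names
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        con = sqlite3.connect(db_path)
        try:
            with con:  # one transaction keeps a million inserts fast
                con.execute(f'DROP TABLE IF EXISTS "{table}"')
                con.execute(f'CREATE TABLE "{table}" ({columns})')
                con.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', reader
                )
            return con.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        finally:
            con.close()
```

Calling `load_csv_into_sqlite("majestic_million.csv", "majestic.db")` on a downloaded copy would then let you query the list with ordinary SQL instead of opening it in a spreadsheet.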
Regular Updates
We have set up a CRON job so that every day we will recompile the data, which is based on our Fresh Index. It is POSSIBLE that on two consecutive days the data is the same, if the Fresh Index did not update for some reason, but usually the data will change daily. Please do not try to download the data more than once every 24 hours. Otherwise you will simply end up getting banned, or we will have to reconsider giving the data away, or put it behind a walled garden.
Download Location
Be wary scrapers… this will do your head in if you weren’t expecting it… The Majestic Million CSV can regularly be downloaded here:
http://downloads.majesticseo.com/majestic_million.csv
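If you plan to automate the download, a small guard helps you stay within the once-every-24-hours request. The sketch below only fetches the file when the local copy is missing or at least a day old, judged by the file's modification time. The URL is the one given above; the function names, local filename and the modification-time approach are assumptions made for illustration.

```python
import os
import shutil
import time
import urllib.request

CSV_URL = "http://downloads.majesticseo.com/majestic_million.csv"  # from this post
ONE_DAY = 24 * 60 * 60  # seconds

def is_stale(path, now=None, max_age=ONE_DAY):
    """True if the local file is missing or at least max_age seconds old."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) >= max_age

def fetch_if_stale(path="majestic_million.csv", url=CSV_URL):
    """Download the CSV only when the local copy is stale; return True if fetched."""
    if not is_stale(path):
        return False
    tmp_path = path + ".part"  # download to a temp name, then swap in place
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(tmp_path, path)
    return True
```

Running `fetch_if_stale()` from your own daily scheduled job means an accidental second run on the same day does nothing rather than hitting the server again.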
If you have made use of this data for a benevolent purpose, please mention it in the comments. We are dying to see what you use it for. We cannot promise to publish all uses, and the CSV is of course provided without warranties or support unless you are on a paid plan for other API options.
Small print:
Majestic Million CSV by Majestic 12 is licensed under a Creative Commons Attribution 3.0 Unported License.
Permissions beyond the scope of this license may be available at https://www.majestic.com/support/contact-us.
Nice news, Dixon!
I’m not completely sure I understand what the exact criteria are for saying these are the top million sites by subnet, i.e. determined by number of incoming links? I ask as many casual readers may assume that traffic (visitors or page views) is the determining criteria.
As a side note, LibreOffice’s Calc also supports 1 million rows, although I’ve not yet actually put this to the test. I suspect that neither it nor Excel would be happy with column headers + 1 million data lines :-).
October 1, 2012 at 10:34 am
I think there is a blog post about why we chose subnets about a year back – but in short, when looking at which sites may be creating the most “waves” or “influence” in the world, raw link counts are not ideal, because sitewides can distort rankings. Similarly, we found some servers may have thousands of sites/domains linking to a site that are controlled by the same person or group, so some false positives there as well. By ranking based on IP Ranges, rather than others, the list looks more robust.
On the traffic front, we are measuring something different, and I think that whilst inevitably one would expect a correlation between traffic and this order, it is ultimately probably the outliers (in either direction) that would be of interest. I wonder who will be the first to highlight interesting aspects of the outliers between our list and (say) Alexa’s list?
October 1, 2012 at 10:45 am
Thank you Dixon (and all else that made this available)!
I can’t wait to see how Excel on my laptop reacts to 79MB of data. 🙂
October 1, 2012 at 3:50 pm
Nice work, i will download it and wish you a nice day.
October 1, 2012 at 6:06 pm
Great product I got some good ideas from the Majestic Million.
October 2, 2012 at 8:29 am
Just threw this in a pivot table. 11 TLD’s account for 80% of top 1 million websites. Pretty cool.
October 2, 2012 at 2:54 pm
Heh cool. Excel can handle 1 million rows in a PIVOT table 🙂
October 3, 2012 at 12:18 am
Is Majestic Million CSV the best kept secret among the analysts?
www.majesticseo.com/reports/site-explorer?q=http://blog.majesticseo.com/development/majestic-million-csv-daily/
www.majesticseo.com/reports/site-explorer?q=http://downloads.majesticseo.com/majestic_million.csv
December 4, 2012 at 7:19 pm
Yes, it may be!
December 4, 2012 at 9:19 pm