As it is Christmas Day, Majestic SEO is releasing data on the top one million websites as a downloadable CSV file under a Creative Commons ShareAlike license, allowing web users to create derived works and research (subject to attribution). The files are available at the end of this post.
Majestic SEO launched the Majestic Million on May 19th. It has caused ripples of interest from time to time and has found a nice niche powering Buzz League Tables.
We have altered the algorithm behind the Majestic Million, which now generates the list from the Referring C-subnet count rather than the Domain Count. This has resulted in a shift in the top ten, with an increase in the number of well-known domains in the Majestic Million.
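To illustrate the idea, here is a minimal Python sketch of ranking by referring C-subnet count. It assumes that a "C-subnet" means the IPv4 /24 block (the first three octets of an address). The input format, the function names and the sample domains and addresses are hypothetical, and this is not a description of Majestic's actual pipeline.

```python
import ipaddress
from collections import defaultdict


def c_subnet(ip: str) -> str:
    """Return the /24 network containing an IPv4 address (assumed meaning of 'C-subnet')."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def rank_by_referring_subnets(links):
    """Rank target domains by the number of distinct referring /24 subnets.

    `links` is a hypothetical iterable of (target_domain, referring_ip) pairs.
    Ties are broken alphabetically so the result is deterministic.
    """
    subnets = defaultdict(set)
    for domain, ip in links:
        subnets[domain].add(c_subnet(ip))
    counts = {domain: len(found) for domain, found in subnets.items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample data using documentation-only address ranges.
links = [
    ("a.example", "192.0.2.10"),
    ("a.example", "192.0.2.77"),
    ("a.example", "192.0.2.200"),
    ("b.example", "198.51.100.5"),
    ("b.example", "203.0.113.9"),
]

ranking = rank_by_referring_subnets(links)
print(ranking)
```

In this sample, a.example has three referring IPs but they all sit in one /24 block, so it ranks below b.example, which has only two referring IPs spread over two blocks.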
Today though, we thought we would do something different. Majestic has a long history of making its data publicly accessible, and we would like to think that this has bought us a certain amount of goodwill in the wider internet community. So we have a surprise gift for the internet analytics community (and who knows – perhaps some statisticians also): we are making a snapshot of the entire Majestic Million list available to download.
As a sanity check, we ran a couple of plots using the Statistical Computing package “R”:
A graph of referring C-subnet count against Majestic Million Rank:
Again, but just for the top 250:
And a graph of the referring IP address count against the C-subnet count:
We would love to hear about any conclusions you come to using the data – so what are you waiting for? It is downloadable in Excel or TXT format below:
[Download Excel file here. NB: 1,000,000 records in an Excel file is 60 MB. You need a modern version of Excel. Save to disk first.]
[Download full file here. This is the 25 MB .TXT file, zipped and tab delimited. It is much smaller, but it is still a million lines of data!]
This Majestic Million Data is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
If you would like to use this data or re-release it, you should reference Majestic SEO as follows, providing a link to this blog post should the medium support it:
Backlink Data sourced from the MajesticSEO.com public release of Majestic Millions Dataset – generated on 22nd December 2011
Hm, I can’t seem to find the download link?!?
December 25, 2011 at 9:03 am
Thanks for sharing!
The download links are missing. I’m looking forward to starting to play with the data 🙂
December 26, 2011 at 12:01 pm
The download links are not working, aarrgh
December 26, 2011 at 1:17 pm
It would be a great gift if the download links actually worked. 🙂
December 26, 2011 at 4:30 pm
We will get that fixed!
December 26, 2011 at 6:37 pm
OK – All fixed now (we hope). I do apologise for not having these links working on Christmas morning! But it’s all there now 🙂
Dixon.
December 26, 2011 at 6:45 pm
Thanks for sharing the list!
December 27, 2011 at 12:08 pm
I like it … thanks for sharing!
It will be useful!
Sebastien
December 30, 2011 at 8:51 pm
This is a really useful list – Loving the work from the Majestic SEO team!
January 5, 2012 at 1:46 pm
Hmm, where it says ‘This is the 9.7 MB .TXT file’ one actually gets a 23.3MB .ZIP file, containing a 276 MB textfile!
January 10, 2012 at 1:46 pm
You are right. I have amended. Thanks.
Obviously – giving away 1 million lines of data is not a small download. I managed to open the Excel version in the latest version of Excel on my PC though. Earlier versions of Excel will not cope with a million records.
January 10, 2012 at 2:05 pm
Whoop! My own website is listed at 709159! 🙂
January 10, 2012 at 2:48 pm
Heh. OK – we won’t take away your comment link, just in case.
January 10, 2012 at 2:54 pm
That is quality! Data to play with in 2012 – a good start in my mind. Thanks.
January 10, 2012 at 3:16 pm
dmoz.fr is going up! We are in the middle of the list of all TLDs and also well in the middle of the .fr list.
January 10, 2012 at 3:57 pm
I will put the Majestic Badge on our homepage.
That’s great! We’d love to see our Majestic Million badge up on a few more sites. I really should have mentioned it in the post! So if anyone else wants to advertise that they are on the list (and where) they can grab a Majestic Million badge here.
January 10, 2012 at 4:17 pm
Thanks Dixon and the team @ Majestic – great data to help with many, many things.
January 10, 2012 at 4:03 pm
Thanks for giving us access to all this data – going to make a cup of tea whilst Excel decides if it’s going to let me look at it!
January 10, 2012 at 4:21 pm
Thanks for this data. I really need to start using ‘R’ as well. SPSS is just too expensive. Cheers!
January 10, 2012 at 8:24 pm
Yes – I know we use “R” here… But as I also have a machine that can load a million records into Excel, I’ll stay there for a bit longer yet!
January 10, 2012 at 8:32 pm
Great gift – thanks for access!
January 10, 2012 at 9:25 pm
^_^
Thanks for the share!
I thought example.com would be closer to the top.
January 10, 2012 at 9:42 pm
Thanks for the data, I’m going to use this to start building page rank of my sites
January 10, 2012 at 9:48 pm
Thanks for the data!
Looks like it’s UTF-16 encoded. I used
$ iconv -f UTF-16 -t UTF-8 majesticmillion-20111222.txt > utf8.txt
to work with it on a Mac Terminal.
Cheers,
January 10, 2012 at 10:32 pm
Paul
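For readers without iconv to hand, the same UTF-16 to UTF-8 conversion can be sketched in Python. This is only an illustration of the command in the comment above. It assumes the file starts with a byte order mark, which Python's "utf-16" codec uses to detect endianness. The function name and the two-column sample rows are made up for the demo and do not reflect the real file layout.

```python
import os
import tempfile


def utf16_to_utf8(src_path, dst_path, chunk_chars=1 << 20):
    """Re-encode a UTF-16 text file as UTF-8, reading in chunks to limit memory use."""
    # newline="" disables newline translation so line endings are copied unchanged.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)


# Demo on a tiny temporary file with hypothetical rank/domain columns.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample-utf16.txt")
    dst = os.path.join(tmp, "sample-utf8.txt")
    sample = "1\texample.com\n2\texample.org\n"
    with open(src, "w", encoding="utf-16", newline="") as handle:
        handle.write(sample)

    utf16_to_utf8(src, dst)

    with open(dst, "r", encoding="utf-8", newline="") as handle:
        converted = handle.read()
    src_size = os.path.getsize(src)
    dst_size = os.path.getsize(dst)

print(src_size, dst_size)
```

For mostly ASCII content such as domain names, the UTF-8 copy comes out at roughly half the size of the UTF-16 original.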
Great Gift! Thanks… Awesome playing with this data!
Thanks MajesticSEO.
January 10, 2012 at 10:38 pm
Thanks a lot for sharing such valuable data.
January 10, 2012 at 10:53 pm
I was a little surprised how well some of the sites are positioned using your metric.
A great piece of data collection and analysis.
Very interesting stuff… although I see two sites I have worked on in the past, one in a mid-200s spot and one in a mid-300s spot. The one in the mid 200s actually performed quite poorly in terms of non-brand natural search traffic compared to the mid-300s site. Emphasises for me that it’s quality over quantity. Thanks for sharing!
January 10, 2012 at 11:03 pm
It’s a gift. Thanks.
January 11, 2012 at 12:33 am
You could significantly reduce the file size (141MB) by converting to UTF8 or ANSI. And further by removing the redundant dates and standardized links (41MB). Just publish a second file with the data or put it on the site, demonstrating the linking format. The file name is sufficient for determining the date.
Nice to see something coming back to the community for all the pounding your bot has been giving everyone.
January 11, 2012 at 2:09 am
Thanks. It’s a big world out there, so ANSI won’t cut it I’m afraid. We did think about also putting up a shorter list, but at the end of the day, we felt that the whole list was the best. I am sure others can slice and dice the data. You are welcome to do so and pass it on in other formats.
January 11, 2012 at 7:39 am
Thanks guys.
January 11, 2012 at 4:30 am
It is all starting to come together.
Shame, my site is not on the list, I really want to put a badge on my site. At least everybody downloading must have the latest Excel.
Drachsi
January 11, 2012 at 5:16 am
Great list, great work.
January 11, 2012 at 5:26 am
Thanks guys this is a goldmine – really good to be able to see how my (small) sites shape up to the big boys
January 11, 2012 at 8:13 am
Good going, that is a very big list and I find some opportunities over there.
January 11, 2012 at 9:05 am
Thanks. If you think the top Million as a giveaway is big, you should see the computers that handle the rest of it! This is part of it.
January 11, 2012 at 9:16 am
Hey thanks!!!
January 11, 2012 at 9:06 am
I’ll rush to have a look and drop comments if I manage to have them 🙂
This is a massive list. I’m a Market Samurai pro user and regularly check competition using your sources. This list gives me big ammunition, or I can say a weapon 🙂
January 11, 2012 at 10:54 am
I like the guys at Market Samurai so much, I linked your comment 🙂
January 11, 2012 at 10:58 am
I hope this one will be as capable as the previous Yahoo Site Explorer
January 11, 2012 at 11:48 am
Oh – Majestic left Yahoo Site Explorer for dust years ago – but you should use our web interface with a silver subscription to get the full “Site Explorer” experience. This giveaway is “just” a list of the top 1 million sites. Majestic’s Site Explorer is just scratching the surface of the full data you can get from Majestic.
January 11, 2012 at 1:35 pm
Thanks guys, already use MajesticSEO, find it an awesome tool.
January 11, 2012 at 12:13 pm
One of my sites is in the list,
just made the Million Badge
and posted it 🙂
You just made my day Dixon.
Thank you for the data, much appreciated.
January 11, 2012 at 1:33 pm
Thank you!! Micca is very grateful for your services as Costa del Sol’s number 1 Solution provider.
January 11, 2012 at 2:21 pm
Great gift guys! Thanks for sharing this list!
January 11, 2012 at 2:56 pm
I have used Majestic for about a year now including through Market Samurai.
It looks like it’s time to upgrade to a paid subscription.
Mark
January 11, 2012 at 3:12 pm
And there, just for a brief moment, I was so close to deleting the email where you linked to this post. That would have been a big mistake, one that I, luckily, did not make.
January 11, 2012 at 5:47 pm
Thanks for sharing the info. Definitely will come in handy. In so much data I guess one can push and support any theory. I, being a Content Marketing evangelist, most definitely see some links of sites that are content-heavy and will definitely use them in my posts.
(I see that whoever’s in charge of this blog takes good care of the discussion, so I won’t post any links)
Christmas again? Super – nice to see a few of our sites in the list too – very flattering, a million is a lot – but not on the web.
Nice work!
January 11, 2012 at 7:42 pm
Thank you I hope to put this to good use.
January 11, 2012 at 9:03 pm
Thanks guys already use Majestic through Market Samurai. Great tool
January 11, 2012 at 11:25 pm
Thanks so much for the information. It is very usable.
January 12, 2012 at 12:06 am
Hey guys. Thanks for making the data set available. I started working it up using a data package that can produce results similar to what you do with R.
And if I find anything interesting I am happy to share it either here, or perhaps in the link provided above where I post web analytic studies I do.
Anyway, I’m thinking we might uncover some insights if we can add some dummy variables to the data set, such as content vertical. It’d be interesting to see if the rank curves look the same across search sites, social media sites, digital news sites, etc. I believe that theoretically a subset of a power curve like you show above should generate a power curve too. Wonder if it’d work out that way by content vertical? Also curious to see if the tail of the curve has any common characteristics.
Any thoughts on where I might be able to find a data set like that so I can join it to yours? Rather than manually adding the field to 1 million records or some smaller part. 😉
January 12, 2012 at 12:27 am
To find that, you’ll need a massive rank checking system. AuthorityLabs comes to mind. But you also need to select a keyword for each vertical, which would break up any structure to the logic wouldn’t it?
January 12, 2012 at 8:05 am
Lots of options with this data, including a great backlink source (as a starting point). Thanks.
January 12, 2012 at 10:34 am
Great resource. I just wonder how many of the websites listed here can get such a ranking.
Helmuts
January 12, 2012 at 11:39 pm
Great post, useful information is here, thanks Majestic SEO!
January 13, 2012 at 12:01 am
Thanks for the data. Just wanted to let you know that the link in the mail does not open with some email providers. I was not able to open the link sent on hotmail and had to forward it to gmail to open it. I suggest having the full url in the mail rather than just the anchor text link.
January 13, 2012 at 7:32 am
OK thanks. That is useful to know. I guess you also can’t click on the “click here to read the web version” link in the email either. I will put a full link to the web version in future newsletters.
January 13, 2012 at 9:25 am
Wow, this is great! Just downloaded it and have to say thanks!
January 13, 2012 at 8:47 am
Interesting stats.
January 13, 2012 at 4:23 pm
Hi guys,
This is an amazing resource and great stats, thanks a lot. I will make sure I put a Majestic Badge on my site.
January 13, 2012 at 4:25 pm
Thanks a lot.
Hope I find my site on it!
January 19, 2012 at 2:58 pm
Thank you I hope to put this to good use.
January 21, 2012 at 1:08 am
Does anyone know if we can have these filtered by country? Because a lot of the companies that I do SEO for are local (state of Missouri).
January 21, 2012 at 4:34 pm
They are recorded and ranked by TLDs. But you would need to do an IP lookup for that.
The problem is that some of the bigger web sites have multiple IP addresses, across several countries, potentially, to spread the load.
January 22, 2012 at 4:37 pm