As it is Christmas Day, Majestic SEO is releasing data on the top one million websites as a downloadable file under a Creative Commons ShareAlike license, allowing web users to create derived works and research (subject to attribution). The files are available at the end of this post.

Majestic SEO launched the Majestic Million on May the 19th. It has caused ripples of interest from time to time and has found a nice niche powering Buzz League Tables.

 

We have altered the algorithm behind the Majestic Million so that the list is now ranked by referring C-subnet count rather than by domain count. This has shifted the top ten and increased the number of well-known domains in the Majestic Million.
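A referring C-subnet is conventionally the first three octets of a referring IPv4 address (its /24 network), so many referring domains hosted on the same subnet count only once. The Python sketch below illustrates how the three counts can differ on hypothetical input; it assumes each referring domain resolves to a single IPv4 address and is not Majestic's actual implementation.

```python
from __future__ import annotations

import ipaddress


def c_subnet(ip: str) -> str:
    """Return the /24 ("C-subnet") network containing an IPv4 address."""
    # strict=False accepts a host address and returns its enclosing network.
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def referring_counts(referrers: dict[str, str]) -> tuple[int, int, int]:
    """Count distinct referring domains, IP addresses and C-subnets.

    `referrers` maps a referring domain to its IPv4 address (hypothetical input).
    """
    domains = len(referrers)
    ips = len(set(referrers.values()))
    subnets = len({c_subnet(ip) for ip in referrers.values()})
    return domains, ips, subnets


# Hypothetical example: four domains on three IPs, but only two C-subnets.
example = {
    "a.example": "203.0.113.10",
    "b.example": "203.0.113.10",
    "c.example": "203.0.113.77",
    "d.example": "198.51.100.5",
}
print(referring_counts(example))  # (4, 3, 2)
```

Ranking on the last of these numbers rewards links from many separate networks over many domains sitting on one host.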

Today, though, we thought we would do something different. Majestic has a long history of making our data publicly accessible, and we would like to think that this has bought us a certain amount of goodwill in the wider internet community. So we have a surprise gift for the internet analytics community (and, who knows, perhaps some statisticians too): we are making a snapshot of the entire Majestic Million list available to download.

As a sanity check, we produced a few plots with the statistical computing package R:

A graph of referring C-subnet count against Majestic Million rank:

 

Again, but just for the top 250:

And a graph of the referring IP address count against the referring C-subnet count:
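Rank plots like these are often summarized by fitting a power law, count ≈ C · rank^(−α), which appears as a straight line on log-log axes. The post's plots were made in R; the Python sketch below is an illustrative ordinary least-squares fit on log-transformed values, run on synthetic numbers rather than figures from the dataset. Least squares on log-log axes is a quick, rough estimator of α, not a rigorous power-law test.

```python
import math


def fit_power_law(ranks, counts):
    """Least-squares fit of log(count) = log(C) - alpha * log(rank).

    Returns (C, alpha). All ranks and counts must be positive.
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic data from count = 50000 * rank^-0.8 (not real Majestic figures).
ranks = list(range(1, 251))
counts = [50000 * r ** -0.8 for r in ranks]
C, alpha = fit_power_law(ranks, counts)
print(round(C), round(alpha, 3))  # 50000 0.8
```

The same function could be run on only the top 250 rows, or on any other slice of the list, to compare the fitted exponents.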

We would love to hear about any conclusions you draw from the data, so what are you waiting for? The list is downloadable in Excel or TXT format below:

[Download Excel file here. NB: 1,000,000 records in an Excel file take up 60 MB, and you need a modern version of Excel (2007 or later, which supports just over a million rows). Save to disk first.]

[Download full file here. This is the 25 MB zipped, tab-delimited .TXT file: much smaller, but still a million lines of data!]
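Rather than opening a million rows in Excel, the zipped file can also be streamed with Python's standard library. This is a sketch under assumptions: it assumes the archive contains a single tab-delimited text file with a header row, and that the text is UTF-16 encoded (a commenter below reports this; pass a different `encoding` if your copy differs). Column names depend on the file's own header.

```python
import csv
import io
import zipfile


def iter_rows(zip_path, encoding="utf-16"):
    """Yield each row of the first file in the archive as a dict keyed by header."""
    with zipfile.ZipFile(zip_path) as archive:
        member = archive.namelist()[0]  # assumes a single data file in the archive
        with archive.open(member) as raw:
            # Decode on the fly; newline="" is what the csv module expects.
            with io.TextIOWrapper(raw, encoding=encoding, newline="") as text:
                # Tab-delimited, as described in the download note.
                yield from csv.DictReader(text, delimiter="\t")


def count_rows(zip_path, encoding="utf-16"):
    """Count data rows without loading the whole file into memory."""
    return sum(1 for _ in iter_rows(zip_path, encoding))
```

For example, `count_rows("majesticmillion.zip")` would count the data rows; the archive name here is hypothetical.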

Creative Commons License
This Majestic Million Data is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

 

If you would like to use this data or re-release it, you should reference Majestic SEO as follows, providing a link to this blog post should the medium support it:

Backlink Data sourced from the MajesticSEO.com public release of Majestic Millions Dataset – generated on 22nd December 2011

Steve Pitchford

Comments

  • Marco

    Hm, I can’t seem to find the download link?!?

    December 25, 2011 at 9:03 am
  • Ana

    Thanks for sharing!

    The download links are missing. I’m looking forward to starting to play with the data 🙂

    December 26, 2011 at 12:01 pm
  • 512banque

    The download links are not working, aarrgh

    December 26, 2011 at 1:17 pm
  • Greg

    It would be a great gift if the download links actually worked. 🙂

    December 26, 2011 at 4:30 pm
  • Dixon

    We will get that fixed!

    December 26, 2011 at 6:37 pm
  • Dixon

    OK – All fixed now (we hope). I do apologise for not having these links working on Christmas morning! But it’s all there now 🙂

    Dixon.

    December 26, 2011 at 6:45 pm
  • Jan-Willem Bobbink

    Thanks for sharing the list!

    December 27, 2011 at 12:08 pm
  • aem

    I like it … thanks for sharing!

    It will be useful!

    Sebastien

    December 30, 2011 at 8:51 pm
  • James Todd

    This is a really useful list – Loving the work from the Majestic SEO team!

    January 5, 2012 at 1:46 pm
  • Evert

    Hmm, where it says ‘This is the 9.7 MB .TXT file’ one actually gets a 23.3MB .ZIP file, containing a 276 MB textfile!

    January 10, 2012 at 1:46 pm
    • Dixon

      You are right. I have amended. Thanks.

      Obviously, giving away 1 million lines of data is not a small download. I managed to open the Excel version in the latest version of Excel on my PC, though. Earlier versions of Excel will not cope with a million records.

      January 10, 2012 at 2:05 pm
  • Raymond Theakston

    Whoop! My own website is listed at 709159! 🙂

    January 10, 2012 at 2:48 pm
  • Dirk - seorie.net

    That is quality! Data to play with in 2012 – a good start in my mind. Thanks.

    January 10, 2012 at 3:16 pm
  • dmozFR

    dmoz.fr is going up! We are in the middle of the list across all TLDs and also comfortably in the middle of the .fr list.
    I will put the Majestic Badge on our homepage.

    January 10, 2012 at 3:57 pm
    • Dixon

      That’s great! We’d love to see our Majestic Million badge up on a few more sites. I really should have mentioned it in the post! So if anyone else wants to advertise that they are on the list (and where) they can grab a Majestic Million badge here.

      January 10, 2012 at 4:17 pm
  • Dwight Zahringer

    Thanks Dixon and the team @ Majestic – great data to help with many, many things.

    January 10, 2012 at 4:03 pm
  • Rich Rankin

    Thanks for giving us access to all this data – going to make a cup of tea whilst Excel decides if it’s going to let me look at it!

    January 10, 2012 at 4:21 pm
  • BobbyJ

    Thanks for this data. I really need to start using ‘R’ as well. SPSS is just too expensive. Cheers!

    January 10, 2012 at 8:24 pm
    • Dixon

      Yes – I know we use “R” here… But as I also have a machine that can load a million records into Excel, I’ll stay there for a bit longer yet!

      January 10, 2012 at 8:32 pm
  • Petz

    Great gift – thanks for access!
    ^_^

    January 10, 2012 at 9:25 pm
  • Matt

    Thanks for the share!

    I thought example.com would be closer to the top.

    January 10, 2012 at 9:42 pm
  • Kane

    Thanks for the data. I’m going to use this to start building up the page rank of my sites.

    January 10, 2012 at 9:48 pm
  • Paul

    Thanks for the data!
    Looks like it’s UTF-16 encoded. I used
    $ iconv -f UTF-16 -t UTF-8 majesticmillion-20111222.txt > utf8.txt
    to work with it on a Mac Terminal.

    Cheers,
    Paul

    January 10, 2012 at 10:32 pm
  • livingseolife

    Great gift! Thanks… awesome playing with this data!

    Thanks MajesticSEO.

    January 10, 2012 at 10:38 pm
  • Tesch Online Marketing

    Thanks a lot for sharing such valuable data.
    I was a little surprised how well some of the sites are positioned using your metric.
    A great piece of data collection and analysis.

    January 10, 2012 at 10:53 pm
  • Simon

    Very interesting stuff… although I see two sites I have worked on in the past, one in the mid-200s and one in the mid-300s. The one in the mid-200s actually performed quite poorly in terms of non-brand natural search traffic compared to the mid-300s site. For me, it emphasises that it’s quality over quantity. Thanks for sharing!

    January 10, 2012 at 11:03 pm
  • AC

    You could significantly reduce the file size (141 MB) by converting to UTF-8 or ANSI, and further (to 41 MB) by removing the redundant dates and standardized links. Just publish those in a second file, or put them on the site to demonstrate the linking format. The file name is sufficient for determining the date.

    Nice to see something coming back to the community for all the pounding your bot has been giving everyone.

    January 11, 2012 at 2:09 am
    • Dixon

      Thanks. It’s a big world out there, so ANSI won’t cut it I’m afraid. We did think about also putting up a shorter list, but at the end of the day, we felt that the whole list was the best. I am sure others can slice and dice the data. You are welcome to do so and pass it on in other formats.

      January 11, 2012 at 7:39 am
  • Glenn Grifffin

    Thanks guys.
    It is all starting to come together.

    January 11, 2012 at 4:30 am
  • Drachsi

    Shame my site is not on the list; I really want to put a badge on my site. At least everybody downloading must have the latest Excel.

    Drachsi

    January 11, 2012 at 5:16 am
  • Manjul Singh

    Great list, great work.

    January 11, 2012 at 5:26 am
  • Voiliers

    Thanks guys this is a goldmine – really good to be able to see how my (small) sites shape up to the big boys

    January 11, 2012 at 8:13 am
  • Renny

    Good going! That is a very big list, and I can find some opportunities in there.

    January 11, 2012 at 9:05 am
  • charly @ md marketing digital

    Hey thanks!!!
    I’ll rush to have a look and drop comments if I have any 🙂

    January 11, 2012 at 9:06 am
  • Andy @cruisesgalveston

    This is a massive list. I’m a Market Samurai Pro user and regularly check the competition using your sources. This list gives me big ammunition, or should I say a weapon 🙂

    January 11, 2012 at 10:54 am
  • riple

    I hope this one will be as capable as the old Yahoo Site Explorer.

    January 11, 2012 at 11:48 am
    • Dixon

      Oh, Majestic left Yahoo Site Explorer in the dust years ago, but you should use our web interface with a silver subscription to get the full “Site Explorer” experience. This giveaway is “just” a list of the top 1 million sites. Majestic’s Site Explorer is just scratching the surface of the full data you can get from Majestic.

      January 11, 2012 at 1:35 pm
  • CCTV.co.uk

    Thanks, guys. We already use MajesticSEO and find it an awesome tool.

    January 11, 2012 at 12:13 pm
  • neil

    One of my sites is in the list,
    just made the Million Badge
    and posted it 🙂
    You just made my day Dixon.

    Thank you for the data, much appreciated.

    January 11, 2012 at 1:33 pm
  • Micca

    Thank you!! Micca is very grateful for your services as Costa del Sol’s number 1 Solution provider.

    January 11, 2012 at 2:21 pm
  • bellimbusto

    Great gift guys! Thanks for sharing this list!

    January 11, 2012 at 2:56 pm
  • Cozy Web

    I have used Majestic for about a year now including through Market Samurai.

    It looks like it’s time to upgrade to a paid subscription.

    Mark

    January 11, 2012 at 3:12 pm
  • Igor Mateski

    And there, just for a brief moment, I was so close to deleting the email where you linked to this post. That would have been a big mistake, one that I, luckily, did not make.
    Thanks for sharing the info. It will definitely come in handy. With so much data, I guess one can push and support any theory. Being a Content Marketing evangelist, I most definitely see some links to content-heavy sites and will use them in my posts.
    (I see that whoever’s in charge of this blog takes good care of the discussion, so I won’t post any links)

    January 11, 2012 at 5:47 pm
  • Top Search

    Christmas again? Super! Nice to see a few of our sites in there too. Very flattering; a million is a lot, but not on the web.

    Nice work!

    January 11, 2012 at 7:42 pm
  • Patrick Page

    Thank you I hope to put this to good use.

    January 11, 2012 at 9:03 pm
  • Blue Jet

    Thanks, guys. We already use Majestic through Market Samurai. Great tool!

    January 11, 2012 at 11:25 pm
  • John Mauldin

    Thanks so much for the information. It is very usable.

    January 12, 2012 at 12:06 am
  • BESegal

    Hey guys. Thanks for making the data set available. I started working it up using a data package that can produce results similar to what you do with R.

    And if I find anything interesting, I am happy to share it either here or perhaps at the link provided above, where I post the web analytics studies I do.

    Anyway, I’m thinking we might uncover some insights if we can add some dummy variables to the data set, such as content vertical. It’d be interesting to see if the rank curves look the same across search sites, social media sites, digital news sites, etc. I believe that, theoretically, a subset of a power curve like the one you show above should generate a power curve too. I wonder if it would work out that way by content vertical? I’m also curious to see if the tail of the curve has any common characteristics.

    Any thoughts on where I might be able to find a data set like that so I can join it to yours, rather than manually adding the field to 1 million records or some smaller part? 😉

    January 12, 2012 at 12:27 am
    • Dixon

      To find that, you’ll need a massive rank-checking system. AuthorityLabs comes to mind. But you would also need to select a keyword for each vertical, which would break up any structure in the logic, wouldn’t it?

      January 12, 2012 at 8:05 am
  • Warner Robins Homes

    Lots of options with this data, to include a great backlink (starting point) source. Thanks.

    January 12, 2012 at 10:34 am
  • Helmuts

    Great resource. I just wonder how many of the websites listed here can achieve such a ranking.

    Helmuts

    January 12, 2012 at 11:39 pm
  • David Victor

    Great post, useful information here. Thanks, Majestic SEO!

    January 13, 2012 at 12:01 am
  • krishnan

    Thanks for the data. Just wanted to let you know that the link in the mail does not open with some email providers. For example, I was not able to open the link sent to Hotmail and had to forward it to Gmail to open it. I suggest putting the full URL in the mail rather than just an anchor text link.

    January 13, 2012 at 7:32 am
    • Dixon

      OK thanks. That is useful to know. I guess you also can’t click on the “click here to read the web version” link in the email either. I will put a full link to the web version in future newsletters.

      January 13, 2012 at 9:25 am
  • matka

    Wow, this is great! Just downloaded it and have to say thanks!

    January 13, 2012 at 8:47 am
  • honorabili

    Interesting stats.

    January 13, 2012 at 4:23 pm
  • Jonathan - FeelGoodTime

    Hi guys,

    This is an amazing resource with great stats, thanks a lot. I will make sure I put a Majestic Badge on my site.
    Thanks a lot.

    January 13, 2012 at 4:25 pm
  • Pete

    Hope I find my site on it!

    January 19, 2012 at 2:58 pm
  • masarf

    Thank you I hope to put this to good use.

    January 21, 2012 at 1:08 am
  • sms reseller

    Does anyone know if we can have these filtered by country? A lot of the companies I do SEO for are local (state of Missouri).

    January 21, 2012 at 4:34 pm
    • Dixon

      They are recorded and ranked by TLD, but you would need to do an IP lookup to filter by country.

      The problem is that some of the bigger websites potentially have multiple IP addresses across several countries, to spread the load.

      January 22, 2012 at 4:37 pm
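Paul's `iconv` command can also be mirrored in Python, which makes it easy to keep only one TLD in the same pass. As Dixon notes, the list is ranked by TLD, while filtering by country would need an IP lookup, so a TLD is only a rough proxy for country (a .com site could be hosted anywhere). This is a minimal sketch: it assumes a UTF-16, tab-delimited file with a header row and a domain column named `Domain`, which is a hypothetical name; check your copy's header. Re-encoding mostly-ASCII text from UTF-16 to UTF-8 also shrinks it considerably, as AC points out, because ASCII characters take one byte in UTF-8 versus two in UTF-16.

```python
def transcode(src, dst, tld=None, domain_column="Domain", src_encoding="utf-16"):
    """Re-encode a tab-delimited text file as UTF-8, optionally keeping one TLD.

    Like `iconv -f UTF-16 -t UTF-8`, line endings are copied unchanged.
    `domain_column` is a hypothetical header name; adjust it to match your file.
    Returns the number of data rows written (header excluded).
    """
    suffix = "." + tld.lstrip(".").lower() if tld else None
    written = 0
    with open(src, encoding=src_encoding, newline="") as fin:
        with open(dst, "w", encoding="utf-8", newline="") as fout:
            header = fin.readline()
            fout.write(header)
            col = None
            if suffix:
                col = header.rstrip("\r\n").split("\t").index(domain_column)
            for line in fin:
                if suffix:
                    fields = line.rstrip("\r\n").split("\t")
                    # Skip short or blank lines and domains outside the requested TLD.
                    if len(fields) <= col or not fields[col].lower().endswith(suffix):
                        continue
                fout.write(line)
                written += 1
    return written
```

For example, `transcode("majesticmillion-20111222.txt", "fr-only.txt", tld="fr")` would write a UTF-8 copy containing only .fr domains; the output file name is hypothetical.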

Comments are closed.