In 2012, the term “Big Data” was thrown around with as much abandon as “Social Media” was for the previous couple of years.

A one-size-fits-many-things moniker, “Big Data” (like its cousin “Social Media”) has been the subject of many a blog post, conference session and CMO’s sleepless night, as industries realize it matters but are not quite sure what to do with it (a bit like they did – and perhaps still do – with “social media”).

Do a search for the phrase over the last month and you can see a bit of a backlash against startups using the term “Big” in front of their wares and all that information we know and love.

It’s always been “Big” in relative terms to whatever we’ve been mining; it’s just that now (thanks to the web) it’s getting bigger.


Photo: D Sharon Pruitt on Flickr

An article in the New York Times offers:

“Big Data proponents point to the Internet for examples of triumphant data businesses, notably Google. But many of the Big Data techniques of math modeling, predictive algorithms and artificial intelligence software were first widely applied on Wall Street.”

Others suggest that what we really mean is simply the smart use of data, and that marketers are using “Big Data” as a “pernicious” way to rebrand an old adage and start a new fad.

The NYT piece mentions a McKinsey report that concluded that the US will soon need 140,000 to 190,000 workers with “deep analytical” expertise and 1.5 million “data-literate managers”.

It goes on to quote (and here’s the gem) Rachel Schutt, a senior statistician at Google Research, who defines a good data scientist as someone who’s not just savvy with math(s) and computer science, but:

“…someone who has a deep, wide-ranging curiosity, is innovative and is guided by experience as well as data.”

She must be talking about SEOs surely?! 😉

As seasoned professionals in the search industry, we know there’s more to a successful web strategy than a bunch of numbers (just listen to Jim Sterne talk about the importance of asking your customers how they “feel”), but it’s those numbers, coupled with curiosity about human behaviour, that keep us pushing the envelope and wanting to do better.

At Majestic, we have a lot of numbers to talk about. A database with 4 trillion URLs is nothing to sniff at, but (as my wife keeps reminding me) it’s what you do with it that really counts.

So beyond SEO, we’re starting to think of other ways businesses can use all this gorgeous data. We’re talking to the finance industry about how they can pre-screen credit applications from ecommerce sites. We’re talking to social platforms about how they can use the data to improve their algorithms, and to advertising networks about how they could use Majestic for pricing, fraud detection or targeting.

The more we think about it and get “curious”, the more “innovative” we think we could be.

So my question to you is: What DO you or what WOULD you use Majestic data for other than link-building and SEO?

We’re grateful for all the kudos we get for being agile enough in bringing new features, bells and whistles to the platform, but we want to do more and be better.

So we’re curious to know what would you do with a distributed crawler and 4,060,890,688,027 unique URLs?

Let us know in the comments below.

Oh and please keep it useful and “White Hat”!

To slightly misquote Rachel Schutt in that NYT article, “We only worship THAT machine.”


  • mohsin

    Maybe build a better search engine from scratch – learning from Google’s mistakes, i.e. the inability to catch spammy links and the over-monetization of SERPs.

    Develop a way to rank the relative importance of URLs using metrics beyond backlink counts – for example, the age of each backlink: the older the link, the more it is worth.

    Similarly, counting mentions of a URL can give a good clue to its popularity.
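
A minimal sketch of how such an age-weighted link score might work (the function name, decay curve and dates here are illustrative assumptions, not a Majestic metric):

```python
from datetime import date
from math import log

def link_score(backlink_dates, today=date(2013, 1, 24)):
    """Score a URL by its backlinks, weighting older links more.

    Illustrative only: each link contributes 1 + log(1 + age in years),
    so a brand-new link counts about 1.0 and a ten-year-old link about 3.4.
    """
    score = 0.0
    for first_seen in backlink_dates:
        age_years = (today - first_seen).days / 365.25
        score += 1 + log(1 + age_years)
    return score

# Two backlinks, one recent and one first seen a decade ago:
print(link_score([date(2013, 1, 1), date(2003, 1, 1)]))
```

With real data you would likely cap the age bonus and combine it with other signals, but it shows how link age alone can separate two URLs with identical backlink counts.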

    January 24, 2013 at 9:27 am
    • Nitmeare

      Yeah, a search engine would be fine – just not another copy of Google or DuckDuckGo; I think that will not work.

      But a search engine which allows its users to alter search metrics and rankings, so they can create their own search applications or search portals through an API.

      Majestic12 has been very successful in SEO because of its many partnerships with other companies. If Majestic is going to develop a search engine, it should not start its own search portal, but offer the technology through an API to partners who then use Majestic technology to launch their own search applications and portals.

      So anyone could register at Majesticsearch and then start their own search engine. (Pricing could be based on the number of searches.) That way everyone could afford to create their own search site using the Majestic database, and Majestic could help many other people start their own search businesses 😉

      I think within months there would be hundreds of different search engines available which use Majestic technology. That would lead to a broader market share than starting its own search portal, and more money for Majestic to earn with many creative partners 😉

      January 24, 2013 at 10:53 am
  • Mr. 1984

    Full text would be nice, so one could analyse how often specific words appear on the Internet and in which locations.

    For example, one could track how often a company’s name is published on the Internet, and in which locations, to gauge its popularity on the net or see in which communities (forums, blogs, social networks) it is discussed more than in others. Might be valuable data for planning advertising campaigns and so on.

    January 24, 2013 at 11:22 am
  • John Lawson

    The previous posters seem to have missed the point or misread. SEOy and searchy suggestions are not required. In that spirit of blatant disregard for your question, I would like to say that there are tonnes of other things, not only links, that you could provide data on from pages. I know you know this, but wow, if I were you …

    January 24, 2013 at 12:45 pm
    • Henry B.

      Using MajesticSEO to anticipate elections and the success of political campaigns.

      This should also work to analyse the success of advertising campaigns for websites, or to monitor startups or musicians’ websites to see which one is going viral on blogs and social networks. Might be a cool feature for finding the superstars or companies of tomorrow.

      There is no need to use full text; there are many nice things which could be done with nothing but backlinks!

      January 24, 2013 at 8:26 pm
  • R2D2

    Would be interesting to track economic developments.

    For example, .gr domains stand for Greece, which has an economic crisis and debt problems right now. Would it be possible to make a ranking of all .gr domains and sites hosted in Greece, to see if a country’s economic development has a significant influence on its websites and their international ranking?

    Perhaps it would even be possible to track economic networks. Which country gives the most backlinks to .gr domains? And is the country with the most backlinks really the most important economic partner? If so, such backlink-based statistics may serve as a real-time indicator of recent developments in world trade.

    So if MajesticSEO can be used to create rankings for universities (I read about that recently), then it might also be usable for economic rankings of how strong the Internet economy in a specific nation is and how well it is connected to other parts of the world. Developing methods to use such data for economic analysis and statistics could be very interesting.

    January 24, 2013 at 3:11 pm
    • Henry B.

      > Is the top-level domain a good indicator for a country? Look at .to domains, for example, which are used by the million by people who have nothing to do with the country.

      GeoIP might be more precise. But companies in countries with poor net infrastructure will host their stuff somewhere else, not within their own country.

      January 24, 2013 at 8:37 pm
  • Chris Le

    This would be a dream!

    If *I* had Majestic’s URL database, 1) I would calculate indexes (think S&P 500, Dow Jones) based on URL metrics, and 2) allow the same calculations to be run against my own portfolio of sites.

    The point would be to see how my portfolio of clients is doing at a macro scale against my competition and the industry as a whole.

    – Spammy index
    – Industry level indexes
    – Alexa500 index
    – Volatility index by industry

    The information here would be useful for:

    – Measuring the real impact of a Google change in my industry vs the entire Internet
    – Determining the risk of bringing on a new client
    – Calculating the actual work needed to obtain a specific client goal
    – Calculating the spread between my client and the competition.
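
A sketch of what one such index calculation might look like (the metric, the baseline approach and all names here are hypothetical, not Majestic’s actual API or methodology):

```python
def index_value(current, baseline, base=100.0):
    """Compute a stock-market-style index for a portfolio of sites.

    current / baseline: dicts mapping site -> some URL metric
    (e.g. referring domains) now vs. at the index start date.
    The index is the portfolio's average growth, scaled to base 100,
    so values above 100 mean the portfolio has grown overall.
    """
    growth = [current[site] / baseline[site] for site in baseline]
    return base * sum(growth) / len(growth)

baseline = {"a.example": 200, "b.example": 50}  # referring domains at launch
today    = {"a.example": 220, "b.example": 60}  # referring domains now
print(index_value(today, baseline))
```

Per-industry or volatility indexes would follow the same pattern, just with a different choice of site basket or with the variance of the growth figures instead of their mean.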

    January 25, 2013 at 3:50 pm
  • Brian Smith

    I don’t think it’s possible to crawl your way through 4,060,890,688,027 unique URLs…you must be running…

    February 10, 2013 at 11:39 pm
    • Dixon Jones

      Actually – we do not need to crawl ALL of these to know they exist. But yes, we are running and learning to sprint.

      February 11, 2013 at 10:00 am

Comments are closed.