I was asked in the comments of a recent post whether we could get Google’s PageRank® for any given URL. Pulling Google’s number would probably be against their terms of service, but we would not want to do this anyway. Since we crawl the web independently of Google, we have our own data, and arguably we could run the Google PageRank algorithm on our data and work out the PageRank for every URL.

We still would not do this. Even though mathematical formulas are legally interpreted as matters of fact rather than intellectual property in the UK, we look to be legitimate on a world stage and in any event believe we have done better.

PageRank® is a very **old** mathematical formula. It is protected in some countries by a patent owned by Stanford University. (In the UK you cannot patent a mathematical fact… but we still respect the patent.) In the days of “big data”, combined with large computer processing power, newer metrics make for a better understanding of the Internet. Flow Metrics® are much more modern and we conveniently haven’t trespassed on any prior art.

## 5 Benefits of Flow Metrics over PageRank

- Flow Metrics are ten times more granular (running from 0-100 instead of 0-10), and 100 times more granular when you take the two metrics together.
- PageRank does not “FLOW” information beyond a single link, whilst Flow Metrics use a decay algorithm over many iterations of links.**
- We can represent different “quality” concepts by using different data sources. For example, Citation Flow is different to Trust Flow.
- PageRank rarely updates. Flow Metrics are recalculated on every full index update (daily in the Fresh Index).
- PageRank operates at a URL level only, whereas Flow Metrics also wrap up to sub-domain and top level domains.

** [EDIT] A reader has correctly pointed out that the PageRank matrix does indeed effectively “flow” data through iterations, so to maintain five solid reasons, I offer this in position 2: “Backlink data is returned with new metrics already in place instantly, with no need to query PR servers. What’s more, the backlinks returned are already sorted by our metrics as part of our data.”[/EDIT]
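As a rough check on the granularity claim in the first bullet, here is a minimal sketch that simply counts the distinct values each scale can show. It assumes that the published scores are whole numbers with both endpoints included, and that the public PageRank scale runs from 0 to 10; the variable names are mine, for illustration only.

```python
# Assumption: published scores are whole numbers, endpoints included.
pagerank_values = len(range(0, 11))      # public PageRank scale, 0..10
flow_values = len(range(0, 101))         # a single Flow Metric, 0..100
flow_pairs = flow_values * flow_values   # (Citation Flow, Trust Flow) pairs

# How many times more distinct values a single Flow Metric offers.
single_ratio = flow_values / pagerank_values
# How many times more distinct values the pair offers than one Flow Metric.
pair_ratio = flow_pairs / flow_values

print(pagerank_values, flow_values, flow_pairs)
```

Under those assumptions a single Flow Metric offers roughly nine to ten times as many distinct values as the public PageRank scale, and reading the two Flow Metrics together multiplies that by about a hundred again.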

In actual fact, we did see that Citation Flow had over a 0.8 correlation with PageRank when we checked a basket of sites, but recreating an old metric is not our objective. We have something new. Something stronger and something that is rightly ours to be able to share with you.

If you want more details on how we calculate Flow Metrics you can see the original announcement.

October 25, 2012 at 2:16 pm, Sean Golliher:

Hi

1. Not sure how you can claim to be “more granular” than page rank. The number 0-10 is a normalized value after the Google matrix has been calculated. Resolution (accuracy) is determined by number of iterations. The issues are convergence and tolerance. This is all done by matrix calculations.

2. You haven’t escaped any real issues with your “flow metrics”. The reason page rank is a difficult problem is because before you start the calculation you have to know the page ranks of the pages that point to them beforehand. You don’t know the values of page rank beforehand. You have to initialize and you also run into convergence issues. The other main issue is with how large the matrix gets.

3. In claim number 2 you say “PageRank does not “FLOW” information beyond a single link”, when, in fact, this is exactly what the Google matrix does. It is a very large matrix that calculates how page rank should flow to all nodes in the set.

Google’s page rank values, that they calculate internally, are very valuable and still one of the most important calculations they are making to determine how reputable a page is initially. They make these calculations continuously and the original formula can have all kinds of modifications. The original formula is not that “old” as far as formulas go and it has surely been modified and there are lots of variations. The fundamental problem is the same.

You must have a similar matrix format to calculate your values but I don’t see any formulas published. This is a big claim to make since there are lots of papers published about calculating the reputation of a page. Where are the formulas published?

Your indexing and backlink information is impressive and useful. I just disagree with most of the claims being made here. If the data correlates well with google page ranks (which I am not sure how you calculated that data either) then it may be useful enough as a research tool.
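For readers who have not seen the calculation described in points 2 and 3 of the comment above, here is a minimal sketch of the textbook PageRank power iteration: start from a uniform guess, repeatedly pass each page's score along its outgoing links, and stop once the scores change by less than a tolerance. It is not Google's production calculation and it is not how Flow Metrics are computed. The four-page graph, the damping factor of 0.85 and the tolerance are illustrative assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=200):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    if n == 0:
        return {}, 0
    # Initialization: the true ranks are unknown, so start uniform.
    rank = {p: 1.0 / n for p in pages}
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Score held by pages with no outlinks is shared out evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Each page passes an equal share of its score down every link.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        # Convergence check: total change since the previous pass.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank, iteration


# Hypothetical four-page web, used only for illustration.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks, iterations = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph the score that reaches page "a" includes score that first travelled from "d" to "c", which is the multi-link "flow" point 3 refers to. The initialization and the tolerance check are the issues raised in point 2.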

October 26, 2012 at 5:59 pm, Dixon:

Hi Sean,

Thanks for the contribution.

I’ll accept point 3 and have amended the post to reflect the very fair point. In fact it was our previous metric, AC Rank, that did not flow, which was one of the main reasons that we embarked on the Flow Metrics project.

My post is aimed at marketers and I expect you read my post as a mathematician. From a marketer’s perspective, my points remain valid, but I believe I can defend the mathematician’s argument at least to the point of a draw:

On 1. We of course also calculate with many iterations and along the way with way more than 2 decimal places. My point is that Google only ever gave a value between 0 and 10 to us mere mortals. Some clever people found a way to find another point for a time I think, whether through scraping or an added calculation, but for most it was always 0-10, so of course (1-100) X (1-100) is more granular information for marketers.

On 2. We again do not use the PageRank algorithm, but we do start with significant data sets for both Citation Flow and Trust Flow. Bear in mind that we have seen nearly 4 trillion URLs over the last 5-6 years and see 3 billion every day. With Citation Flow we started with a universal data set because we started with our AC Rank metric (Citation Flow’s predecessor). With Trust Flow we were able to get a very large curated set of data to act as an initial base for the algorithm. So given a sizeable sample, we do indeed get enough information to overcome the initial data hurdle.

In the UK, mathematical facts cannot be protected, so for commercial reasons we won’t publish the full maths that we are using. It was a six month headache for us that involved technological as well as computational challenges. However, anyone is free to look at our metrics at reasonable scale via our bulk backlink checker tool, which lets you analyze up to 400 URLs in a second or so via the API. (Unfortunately, checking 400 URLs for PageRank will take somewhat longer and is not a tool that we can build for the user.)

I am a long way from pretending we are cleverer than anyone, although I am immensely proud to be associated with the small team that created metrics at this scale that are updated daily. I am simply saying that collecting our Flow Metrics as a way to evaluate a URL’s value has several advantages over collecting PageRank for the same purpose.

Here’s a 6th reason for free:

6: Collecting PageRank would seem to violate Google’s terms of service, as Google no longer volunteers this in normal use.

Thanks again for the comments. I hope you find Flow Metrics useful.

October 27, 2012 at 7:44 am, Angela:

Based on how your Flow Metrics works, is there an optimum balance between Trust Flow and Citation Flow for inbound links? If there’s a high Citation Flow but low Trust Flow, is that bad? Conversely, if there’s a high Trust Flow but low Citation Flow, is that bad?

Trying to get a better handle on how to interpret the data.

Thanks.

November 14, 2012 at 8:57 pm, Dixon Jones:

Hi Angela. Interesting question. Different sites and verticals can legitimately end up with very different profiles, but here’s what I “feel”. If all links were genuinely natural, then Trust Flow and Citation Flow would theoretically converge, because they are essentially the same maths, using two different starting points. So when I see a large disparity between these numbers, I start to ask myself “why”? If the site has only a few backlinks, then it is less likely for the numbers to come together, but at scale, they normally should.
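To picture what “the same maths, using two different starting points” could look like, here is a toy sketch. It is emphatically not the Flow Metrics formula, which is unpublished. It swaps in a generic seeded, decaying link propagation (in the spirit of personalized PageRank) and runs it twice over the same hypothetical link graph: once seeded from every page, once seeded from a single hand-picked page. The function name, the graph, the seed sets and the decay value are all assumptions for illustration.

```python
def propagate(links, seeds, decay=0.85, iterations=50):
    """Spread score from seed pages along links, decaying at each hop."""
    total_seed = sum(seeds.values())
    if total_seed <= 0:
        raise ValueError("seeds must carry some positive weight")
    pages = set(links) | {t for ts in links.values() for t in ts} | set(seeds)
    base = {p: seeds.get(p, 0.0) / total_seed for p in pages}
    score = dict(base)
    for _ in range(iterations):
        nxt = {p: (1.0 - decay) * base[p] for p in pages}
        for page, targets in links.items():
            for target in targets:
                nxt[target] += decay * score[page] / len(targets)
        score = nxt  # score on pages without outlinks is simply dropped
    return score


# Hypothetical graph: one curated page, one ordinary site, one link farm.
links = {
    "trusted": ["site"],
    "site": ["trusted"],
    "farm1": ["spam"],
    "farm2": ["spam"],
    "spam": [],
}
everything = {page: 1.0 for page in links}   # "universal" starting point
curated = {"trusted": 1.0}                   # hand-picked starting point

citation_like = propagate(links, everything)
trust_like = propagate(links, curated)
for page in sorted(links):
    print(page, round(citation_like[page], 4), round(trust_like[page], 4))
```

In this toy setup the page called "spam" picks up score when every page is a seed, but none when only the curated page is, because no chain of links leads to it from the curated seed. That is the kind of disparity that prompts the “why?” above.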

