This is an advanced post about Topical Trust Flow showing how we calculate it and how to avoid misinterpreting the data. It goes into far more detail than previous posts and videos. Towards the end of the article I will reference some more basic interpretations of Topical Trust Flow and how to use it every day.
Above is a graphic I use frequently, but rarely refer to online. Majestic uses an iterative process to analyse links and calculate our metrics. So if Page A links to Page B, which links to Page C, which in turn links to Page D, then SOME of Page A’s influence will affect the score of Page D. Majestic only shows external links (links from other sites) in the online link data, but most people are unaware that Majestic DOES use internal links to help calculate these Flow Metric scores (as well as to facilitate link discovery). Majestic cannot store every internal link that we crawl, because our data centre would need roughly 800% more storage capacity, which would be pointless when you can use Screaming Frog or URLProfiler to see the internal links on any site anyway. Consequently, the internal link data is used in the maths between the crawl and the Index build, and is then dropped.
Returning to the picture above, the iterative process is important because without it, a search engine could consider the page in orange to have a similar score to the page in green. Looking at the whole picture, it is clear that this is unlikely, because 2 of the links to the page in orange come from orphaned pages. Now imagine the link graph in the picture scaled up to a trillion URLs (and many more links than that), and you have a picture of what Majestic is endeavouring to map.
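To make the iteration concrete, here is a minimal sketch in Python. It is not Majestic’s algorithm, which is not published; it uses a generic PageRank-style update as a stand-in, and the page names, the 0.85 damping factor and the iteration count are all illustrative assumptions. The first run seeds only Page A and shows some of its influence reaching Page D along the chain. The second gives every page the same starting value and shows that a page whose links come mostly from orphaned pages ends up weaker than one whose links come from pages that are themselves linked to.

```python
def propagate(outlinks, seed, damping=0.85, iterations=50):
    """PageRank-style propagation (a stand-in, not Majestic's published maths).

    Each round, every page keeps (1 - damping) of its starting value and
    receives damping * score / out_degree from each page that links to it.
    """
    pages = set(outlinks) | {t for targets in outlinks.values() for t in targets}
    scores = {page: seed.get(page, 0.0) for page in pages}
    for _ in range(iterations):
        new = {page: (1 - damping) * seed.get(page, 0.0) for page in pages}
        for page, targets in outlinks.items():
            for target in targets:
                new[target] += damping * scores[page] / len(targets)
        scores = new
    return scores


# A links to B, which links to C, which links to D; only A starts with any value.
chain = {"A": ["B"], "B": ["C"], "C": ["D"]}
chain_scores = propagate(chain, {"A": 1.0})
print({page: round(score, 4) for page, score in sorted(chain_scores.items())})

# Green and orange each get three links, but two of orange's come from orphans
# (pages that nothing links to). Every page starts with the same value here.
graph = {
    "hub": ["p1", "p2", "p3", "p4"],
    "p1": ["green"], "p2": ["green"], "p3": ["green"],
    "p4": ["orange"], "orphan1": ["orange"], "orphan2": ["orange"],
}
pages = set(graph) | {"green", "orange"}
graph_scores = propagate(graph, {page: 1.0 for page in pages})
print(round(graph_scores["green"], 4), round(graph_scores["orange"], 4))
```

Both pages have exactly three inbound links, so a simple link count would rate them equally; only the iteration separates them.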
Interpreting the High Level Metrics (Trust Flow and Citation Flow)
Before examining Topical Flow metrics, it makes sense to first explain the difference between Citation Flow and Trust Flow. Citation Flow starts with a raw number for every URL we know, based on the number of referring domains to each URL. We then iterate the calculation. Majestic will not explain it fully, but if a page has two links out, some of the value passes through link A and some through link B. This is not exactly how it works, but it explains the principle in a fairly simplistic manner. The result is a numerical score for every page on the Internet, which is then transformed into a metric between 0 and 100. That transformation is not a flat line, because if it were, almost every page up to 99 would be pure spam. One way to visualise this is with Lego:
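Here is a rough sketch of the steps described above, with loud caveats: Majestic does not publish the real formula, so the even split across outlinks, the damping factor and the logarithmic 0-100 curve are assumptions chosen for illustration, and the URLs and referring-domain counts are made up. Each URL starts from its referring-domain count, value is passed along outlinks (a page with two outlinks sends half down each), and the raw results are mapped to 0-100 on a curve rather than a straight line.

```python
import math

# Hypothetical toy data: outlinks per URL and referring-domain counts per URL.
OUTLINKS = {
    "a.example/": ["b.example/", "c.example/"],
    "b.example/": ["c.example/"],
    "c.example/": ["a.example/"],
    "d.example/": ["c.example/"],
}
REFERRING_DOMAINS = {"a.example/": 40, "b.example/": 5, "c.example/": 120, "d.example/": 1}


def iterate_flow(outlinks, start, damping=0.85, iterations=50):
    """Repeatedly pass each URL's value along its outlinks, split evenly."""
    pages = set(outlinks) | {t for targets in outlinks.values() for t in targets}
    scores = {page: float(start.get(page, 0)) for page in pages}
    for _ in range(iterations):
        nxt = {page: (1 - damping) * start.get(page, 0) for page in pages}
        for page, targets in outlinks.items():
            for target in targets:
                # a page with two outlinks sends half of its passed-on value down each
                nxt[target] += damping * scores[page] / len(targets)
        scores = nxt
    return scores


def to_scale(raw, max_raw):
    """Map a raw value onto 0-100 with a logarithmic curve (an assumed transform)."""
    return int(100 * math.log1p(raw) / math.log1p(max_raw))


raw = iterate_flow(OUTLINKS, REFERRING_DOMAINS)
top = max(raw.values())
scores = {url: to_scale(value, top) for url, value in raw.items()}
print(scores)
```

A log curve is only one example of a non-linear mapping; it is used here because it spreads heavily skewed raw values across the whole 0-100 range.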
Imagine every 1x1 block of Lego represents a site. Then imagine the pyramid above represents the entire universe of pages. The ones at the top have the highest Citation Flow (in the 90-100 range) and the ones on the bottom layer have a much lower Citation Flow (in the 0-10 range). This diagram is for illustration purposes only (before someone points out that there are not enough layers), but even here you have 4 blocks at the top of the pyramid and 32x32 = 1,024 at the bottom!
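The pyramid arithmetic is easy to check. This short sketch assumes square layers that grow by two bricks a side per layer (the real picture may differ) and counts how small a share of all the bricks sits in the top layer.

```python
# Toy Lego pyramid (assumed shape): square layers whose sides grow by two
# bricks per layer, from 2x2 at the top to 32x32 at the base.
sides = range(2, 33, 2)
layers = [side * side for side in sides]  # bricks per layer, top layer first
total = sum(layers)
top_share = layers[0] / total

print(f"{len(layers)} layers, {total} bricks; the top layer holds {top_share:.3%} of them")
# prints: 16 layers, 5984 bricks; the top layer holds 0.067% of them
```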
It is hard to get to the top.
Trust Flow is the same calculation as Citation Flow, but we only START with domains that we know were created and curated by humans. I won’t go into detail about how these sites were initially selected, but there are hundreds of thousands of them and they have been peer reviewed. In the diagram at the top of the page, you can think of these as the ones in blue. Mathematically, in a “random” world of link building, Trust Flow and Citation Flow should start to converge at scale. In truth, that is not always the case, because so many links are created by machines. This is why a rise in Citation Flow without a corresponding rise in Trust Flow may be a red flag when it comes to link quality.
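The “same calculation, different starting point” idea can be sketched by running one propagation routine twice over a toy graph. This is an illustration under assumptions, not Majestic’s implementation: the routine is a generic PageRank-style stand-in, and the site names, damping factor and equal starting values are hypothetical. Seeding every page gives a Citation-Flow-like score; seeding only the curated site gives a Trust-Flow-like score.

```python
def propagate(outlinks, seed, damping=0.85, iterations=50):
    """Generic PageRank-style propagation used as a stand-in for Flow Metric maths."""
    pages = set(outlinks) | {t for targets in outlinks.values() for t in targets}
    scores = {page: seed.get(page, 0.0) for page in pages}
    for _ in range(iterations):
        new = {page: (1 - damping) * seed.get(page, 0.0) for page in pages}
        for page, targets in outlinks.items():
            for target in targets:
                new[target] += damping * scores[page] / len(targets)
        scores = new
    return scores


LINKS = {
    "curated.example": ["good.example"],  # hypothetical human-curated seed site
    # a machine-built ring of pages that all point at one boosted target
    "spam1.example": ["spam2.example", "boosted.example"],
    "spam2.example": ["spam3.example", "boosted.example"],
    "spam3.example": ["spam1.example", "boosted.example"],
}
PAGES = set(LINKS) | {"good.example", "boosted.example"}

citation_flow = propagate(LINKS, {page: 1.0 for page in PAGES})  # every page starts equal
trust_flow = propagate(LINKS, {"curated.example": 1.0})  # only the trusted seed starts with value

for page in ("good.example", "boosted.example"):
    print(page, round(citation_flow[page], 3), round(trust_flow[page], 3))
```

The spam ring lifts the boosted page’s citation score above the genuinely linked page, but no trust ever reaches it, which is the red-flag pattern described above.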
Interpreting Topical Flow Metrics
Now we move on to the more detailed “Topical Trust Flow” methodology. Returning to the graphic above, if the pages in blue are about “tennis”, then the green page is much more likely to be about tennis than the page in orange. Because Majestic can identify SOME trustworthy pages about tennis, the algorithm can use similar mathematics to the Trust Flow algorithm to propagate information about how influential a page is about tennis through the links on the Internet. Again, this culminates in a score for every page on “how influential it is about tennis”, which we then convert into a score between 0 and 100 to help humans interpret the data.
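Topical Trust Flow can be pictured as the same propagation run once per topic, each time starting only from pages known to be trustworthy on that topic. The sketch below does exactly that with a generic PageRank-style routine; it is a simplification under stated assumptions (hypothetical site names, topic labels, seed lists and damping factor), not Majestic’s actual method.

```python
def propagate(outlinks, seed, damping=0.85, iterations=50):
    """Generic PageRank-style propagation, seeded only by the pages in `seed`."""
    pages = set(outlinks) | {t for targets in outlinks.values() for t in targets}
    scores = {page: seed.get(page, 0.0) for page in pages}
    for _ in range(iterations):
        new = {page: (1 - damping) * seed.get(page, 0.0) for page in pages}
        for page, targets in outlinks.items():
            for target in targets:
                new[target] += damping * scores[page] / len(targets)
        scores = new
    return scores


LINKS = {
    "tennis-club.example": ["green.example"],
    "tennis-news.example": ["green.example", "orange.example"],
    "dev-forum.example": ["orange.example"],
    "orphan1.example": ["orange.example"],
    "orphan2.example": ["orange.example"],
}
# Hypothetical trusted seed pages per topic.
TOPIC_SEEDS = {
    "Sports/Tennis": ["tennis-club.example", "tennis-news.example"],
    "Computers": ["dev-forum.example"],
}

# One propagation per topic, each starting only from that topic's seeds.
topical = {
    topic: propagate(LINKS, {page: 1.0 for page in seeds})
    for topic, seeds in TOPIC_SEEDS.items()
}
for topic, scores in topical.items():
    print(topic, round(scores["green.example"], 4), round(scores["orange.example"], 4))
```

Green ends up stronger than orange for tennis, while orange, linked from the hypothetical developer forum, is the stronger of the two for computers. The orphan links add nothing to either topic.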
Now to some questions. This post originated from a support ticket from Andreas at Cuponation.com, who asked: “Can the numbers be added together somehow to create the whole number?” The answer is no. To see why, let’s return to the Lego. This time, let’s compare two different topics in Lego pyramid form:
There are far fewer pages about Tennis than there are about (say) Computers. Because the 0-100 scores are built on extremely different volumes at the core of the algorithms, you can only interpret the numbers about tennis against other pages about tennis. To help unravel this, we use the circle around the Trust Flow score in Site Explorer to approximate the percentage of trust relating to each category. A site like eBay, which has influence in many categories, looks like a rainbow, whilst a site like Moz.com has almost all its influence in Computers / Internet / Web Design and Development (SEO fits into this category, as there is currently no category for SEO). If you hover over the circle around the Trust Flow score on the left, you will see the 91.21% appear:
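A small sketch shows why the same underlying strength can produce very different 0-100 numbers in different topics. The assumption here (Majestic does not publish its transform) is that each topic is scaled against its own strongest page on a logarithmic curve; the raw values and site names are invented.

```python
import math


def scale_within_topic(raw_by_page):
    """Convert one topic's raw values to 0-100 against that topic's own maximum."""
    top = max(raw_by_page.values())
    return {page: int(100 * math.log1p(v) / math.log1p(top)) for page, v in raw_by_page.items()}


# Hypothetical raw topical values: a small tennis pyramid and a huge computers one.
RAW = {
    "Sports/Tennis": {"site-x.example": 50.0, "tennis-major.example": 60.0},
    "Computers": {"site-x.example": 50.0, "tech-giant.example": 1_000_000.0},
}
scaled = {topic: scale_within_topic(values) for topic, values in RAW.items()}
print(scaled)
```

Run as written, site-x scores 95 for tennis and 28 for computers, even though its raw value is identical in both. Adding or ranking those two numbers against each other would be meaningless.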
Topical Trust Flow scores are not percentages! Moz.com has a high Trust Flow of 65. Almost all of that Trust Flow is made up of influence in the Web Design and Development category, which is a subset of the Internet category, which is itself a subset of the Computers category. The fact that the Topical Trust Flow for that category is 64 (close to 65) is only incidental. Mathematically, you would need to unravel the calculations to find the percentage contribution to the overall Trust Flow, which is what Majestic approximates in the “hover” area above the primary Trust Flow metric.
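The hover percentage is a share of the whole, which is a different kind of number from a topical score. As a minimal sketch, assume (purely for illustration) that each leaf topic’s raw contribution to a site’s trust were known; the share is then each contribution divided by the total, and the shares sum to 100 in a way that 0-100 topical scores never do. The topic values are hypothetical, and Majestic’s own approximation is not published.

```python
# Hypothetical raw trust contributions per leaf topic for one site. Only leaf
# topics are listed, so parent categories are not counted twice.
RAW_TRUST_BY_TOPIC = {
    "Computers/Internet/Web Design and Development": 900.0,
    "Business/Marketing and Advertising": 60.0,
    "Reference/Education": 40.0,
}

total = sum(RAW_TRUST_BY_TOPIC.values())
shares = {topic: value * 100 / total for topic, value in RAW_TRUST_BY_TOPIC.items()}

# Roll leaf shares up to their top-level category (the part before the first "/").
top_level = {}
for topic, share in shares.items():
    parent = topic.split("/")[0]
    top_level[parent] = top_level.get(parent, 0.0) + share

print(shares)
print(top_level)
```

Because Web Design and Development sits inside Internet, which sits inside Computers, a share for a leaf topic also counts towards every parent above it; summing parents and children together would double count.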
Click on the graphic to see interactive Topics (Illustrative data for Twitter profiles only)
Andreas also asked in the same ticket: “You have a KPI called “SourceTopical_TrustFlow_Topic”, e.g. “reference/education” or “shopping/reference”. What does EACH of the possible categories really mean, and how did you define it?” He was finding these references in the export files that Majestic makes available. The word “source” means the start point of the link. A link, of course, needs two URLs: the one it links from and the one it links to. The “Topic” part of the name will be one of about 800 topics that we are doing the maths on. Because there are quite a few, we put them in a hierarchy, so “Reference/education” means a top-level category of Reference and a LOWER-level category of education. It is by no means perfect, but it is infinitely better than not trying to categorise sites, and Majestic hopes it will eventually help us improve our own search engine. In the meantime, it is there for you to use. Although we do not publicly list the full set of topics, you can pretty much see it graphically in this “proof of concept” that one of our interns built a few years ago (embedded above).
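Because each topic label is a path through the hierarchy, it is easy to work with programmatically. This minimal sketch splits labels into levels and counts linking pages by top-level category. The CSV excerpt, the URLs and the SourceURL/TargetURL column names are hypothetical, and the topic column name is taken from the ticket rather than from any documented export format, so check the header row of your own export.

```python
import csv
import io
from collections import Counter


def split_topic(label):
    """Split a hierarchical topic label into its levels, top level first."""
    return tuple(part.strip() for part in label.split("/"))


# Hypothetical export excerpt; "Source" columns describe the page a link comes FROM.
EXPORT = """SourceURL,TargetURL,SourceTopical_TrustFlow_Topic
https://a.example/page,https://b.example/,Reference/Education
https://c.example/post,https://b.example/,Shopping/Reference
https://d.example/,https://b.example/about,Reference/Education
"""

rows = list(csv.DictReader(io.StringIO(EXPORT)))
top_levels = Counter(split_topic(row["SourceTopical_TrustFlow_Topic"])[0] for row in rows)
print(top_levels)
```

Deeper labels such as “Computers/Internet/Web Design and Development” split the same way, so the top level is always the first element of the tuple.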
The interactive graphic really shows just how many more “Lego blocks” can be involved in calculating one Topical Trust Flow score compared to another.
A Real Brain-Ache to Understand
I hope this has helped to peel back some of the mystery surrounding Topical Trust Flow in particular, and also our two main metrics, Trust Flow and Citation Flow. But there is one area of Site Explorer where even I have trouble interpreting the data. It does my head in and makes for extremely difficult support tickets. If anyone can find a better way to explain it, I welcome suggested text in the comments. The brain-ache is in the Topics tab of Site Explorer. Here is an example:
It can be really hard to comprehend why NASA’s highest Topical Trust Flow in the Topics tab is Science / Astronomy with a score of 78, while in the summary tab (inset) this is only the SECOND strongest topic, with a score of 85, behind Science / Technology at 91. It’s mind-blowing at times. Why does this happen? There are several factors here, again based on the understanding that this is a comparison of apples and oranges. The inset score (85) represents the strength of the NASA website in the topic of Science/Astronomy. The Topics tab score (78) shows the cumulative strength of the web pages LINKING TO NASA in the topic of Science/Astronomy. Because NASA has more authority than most of the sites that link to it, the content on its own website, and the way its internal links propagate information around that website, can significantly change the scores. This effect is in addition to the Lego brick size issue.
One way to interpret this is to understand that links have a start URL and an end URL. If a page about Tennis links to a page about Astronomy, this does not reduce the target page’s influence in astronomy, but it does increase the target page’s overall Trust Flow. The target is a stronger page as a result, so any links out of it will not have the same context or value as the links in.
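The tennis-to-astronomy example can be checked with the same kind of toy propagation. In this sketch (a generic PageRank-style stand-in with hypothetical page names and seeds, not Majestic’s method), overall trust starts from both a tennis seed and an astronomy seed, while astronomy flow starts only from the astronomy seed.

```python
def propagate(outlinks, seed, damping=0.85, iterations=50):
    """Generic PageRank-style propagation, seeded only by the pages in `seed`."""
    pages = set(outlinks) | {t for targets in outlinks.values() for t in targets}
    scores = {page: seed.get(page, 0.0) for page in pages}
    for _ in range(iterations):
        new = {page: (1 - damping) * seed.get(page, 0.0) for page in pages}
        for page, targets in outlinks.items():
            for target in targets:
                new[target] += damping * scores[page] / len(targets)
        scores = new
    return scores


def target_scores(links):
    """Return (overall trust, astronomy flow) for the hypothetical astronomy page."""
    overall = propagate(links, {"astro-society.example": 1.0, "tennis-club.example": 1.0})
    astronomy = propagate(links, {"astro-society.example": 1.0})
    return overall["astro-page.example"], astronomy["astro-page.example"]


before = {"astro-society.example": ["astro-page.example"], "tennis-club.example": []}
after = {**before, "tennis-club.example": ["astro-page.example"]}  # add the tennis link

overall_before, astro_before = target_scores(before)
overall_after, astro_after = target_scores(after)
print(overall_before, overall_after, astro_before, astro_after)
```

Adding the link from the tennis site raises the target’s overall score but leaves its astronomy score exactly where it was, because the tennis page carries no astronomy flow to pass on.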
Mind blown? Then you can now consider the three columns on the right. Why does column 1 say 2.4 Million, column 2 say 27.7 Million and column 3 show just 14.5 Thousand? This is because there are 2.4 Million pages linking to NASA that themselves have some Topical Trust Flow in the Astronomy category. However, if you look at domain level, there are 27.7 Million domains that link to NASA which have SOME Astronomy Trust Flow, even though not all of the individual links have Astronomy Trust Flow. So what is column 3? It is a subset of column 1: the number of domains that make up column 1’s scores.
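Columns 1 and 3 are simple to reproduce on toy data, and doing so shows why column 3 can never exceed column 1: many linking pages can sit on the same domain. The backlink list below is invented, and hostnames stand in for domains as a simplification. Column 2 is not modelled, because it depends on domain-level flow values that this example does not have.

```python
from urllib.parse import urlsplit

# Hypothetical backlinks to one target: (linking page, that page's Astronomy flow).
BACKLINKS = [
    ("https://blog.a.example/moon", 12),
    ("https://blog.a.example/mars", 3),
    ("https://b.example/recipes", 0),
    ("https://c.example/telescopes", 7),
]

astro_pages = [url for url, astro_flow in BACKLINKS if astro_flow > 0]
column_1 = len(astro_pages)  # linking pages that themselves have some Astronomy flow
# Distinct hosts behind those pages (real registrable-domain grouping would need
# extra logic, such as the Public Suffix List).
column_3 = len({urlsplit(url).hostname for url in astro_pages})
print(column_1, column_3)  # prints: 3 2
```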
- Flow metrics are not about “ranking visibility” or even about “link counts” to a page. They are far more fundamental and result from an iterative algorithm.
- Topical Flow Metrics help to identify how influential pages or sites are for a given topic. A page can be influential about many topics, but each 0-100 score compares it with other pages or sites on the Internet about that topic. The scores do not directly compare with the scores for other topics on the same page.
- I am sure you will agree that the maths is complicated. Currently it takes about 24 hours to calculate these scores. We can bring you raw link data much faster, but our customers are paying for insight.
- The Topical Trust Flow of a link is a misnomer. It is URLs (or sites) that have Flow Metrics, so a link must connect two different points with two different sets of Flow Metrics.
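One way to keep this straight in your own tooling is to model a link as two endpoints, each carrying its own metrics. The class names, field names and numbers below are hypothetical and are not Majestic’s export schema.

```python
from dataclasses import dataclass, field


@dataclass
class FlowMetrics:
    """Flow Metrics belong to a URL or site, never to a link."""
    trust_flow: int
    citation_flow: int
    topical_trust_flow: dict = field(default_factory=dict)  # topic label -> 0-100 score


@dataclass
class Link:
    """A link only joins two endpoints, each with its own Flow Metrics."""
    source_url: str
    target_url: str
    source: FlowMetrics
    target: FlowMetrics


link = Link(
    source_url="https://tennis-club.example/results",
    target_url="https://astro-page.example/",
    source=FlowMetrics(30, 35, {"Sports/Tennis": 28}),
    target=FlowMetrics(45, 40, {"Science/Astronomy": 44}),
)
# What is often called "the link's Topical Trust Flow" is really the source page's.
print(link.source.topical_trust_flow)
```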
This Lego brick visualisation of flow metrics also helps to explain another common confusion: why you cannot compare a subdomain’s Flow Metrics with a root domain or page-level Flow Metric score. The building blocks are entirely different. There is frequent criticism that Majestic shows so many variations of Flow Metrics, but they are all valid and useful as long as you always compare like with like.
Other takes on Flow Metrics:
If you would prefer to hear some of our Brand Ambassadors speak about all things Flow Metric, take a look at the 3 videos linked below:
Author: Dixon Jones