This week, Majestic have made some “under the bonnet” improvements to the way in which we record and return link data. The changes improve tracking of secure URLs (https) and change the way in which we show counts for sites that have both forms of a URL, with and without trailing slashes. Perhaps the most visible change is the ability to see all the links within a specific directory, or all links to any set of URLs starting with the same URL syntax.
New Wildcard Option
Do you want to see all the links to a given directory on a site? Or all links to a page regardless of the variables added to the URL? We have now introduced this as an option within Site Explorer. So take the URL: http://uk.weather.com/weather/today-Bristol-UKXX0025:1:UK… Now you can see all the links that link to the weather directory OR you can see all the links that start with: http://uk.weather.com/weather/today EVEN THOUGH that URL as shown is a 404, you can still see all the links to pages that start with that syntax.
This will greatly help in the analysis of many types of websites, particularly eCommerce sites where many URLs add variables to amend things like product colour or size.
Some affiliate programs may also be easier to analyze with this syntax.
To use the new wildcard option, select the radio button we have added to the Site Explorer syntax options.
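The idea behind the wildcard option can be sketched in a few lines of Python. This is only an illustration of prefix matching and not how Majestic stores or queries its index. The function name `links_matching_prefix`, the record layout and the sample URLs are all made up for the example.

```python
def links_matching_prefix(backlinks, prefix):
    """Return every backlink whose target URL starts with the given prefix."""
    return [link for link in backlinks if link["target"].startswith(prefix)]


# Made-up sample data: each record is one link from a source page to a target URL.
backlinks = [
    {"source": "http://blog.example/a", "target": "http://uk.weather.com/weather/today-example-1"},
    {"source": "http://blog.example/b", "target": "http://uk.weather.com/weather/today-example-2"},
    {"source": "http://blog.example/c", "target": "http://uk.weather.com/weather/tomorrow-example"},
    {"source": "http://blog.example/d", "target": "http://uk.weather.com/news/"},
]

# All links into the weather directory.
directory_links = links_matching_prefix(backlinks, "http://uk.weather.com/weather/")

# All links to URLs that start with the same syntax, even if the prefix itself is a 404.
today_links = links_matching_prefix(backlinks, "http://uk.weather.com/weather/today")

print(len(directory_links), len(today_links))
```

The prefix never has to be a page that exists. It is simply a string that the target URLs are compared against, which is why a 404 can still be a useful starting point.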
Show me an Example
Sure – a good example of how this is insightful is analysing www.nfl.com/teams/ at the wildcard level and then looking at the Pages tab. Now we can see the relative influence of each team in the league!
The list is in link count order by default, but put it into Excel (from the link at the bottom of the screenshot) and you can change that default easily enough. The New York Jets are WAY bigger than the Dallas Cowboys… (Excuse us while we put on our tin foil hats.)
Is this a game changer?
This is something we developed because our customers and partners requested this functionality. It has any number of uses and we’ll be talking about a few of these in the coming weeks. If you find an interesting use for this functionality, why not talk about it on your own blog? I am sure our audience would love you to share!
So are there any OTHER New Announcements?
Yep. We made some other under the bonnet improvements, some of which were already live but which we had not got around to mentioning yet. For example…
Improved interpretation of “Trailing Slash” URLs
Many pages can be reached through two versions of a URL, one with a trailing slash and one without. An excellent example is Twitter:
Now these show the same content, but they are technically different URLs. Although the two versions do not HAVE to serve the same content, very few instances occur where the content is different. Most sites (including modern WordPress installations) redirect to a single version of the URL, but when linking, or indeed when using Majestic, we can easily enter the other version inadvertently, only to be surprised to see small numbers.
We have now improved the system so that we can show you the combined number WITHOUT duplication. So whether you enter the URL WITH or WITHOUT the trailing slash, we will return a consistent number. But we have achieved this without simply merging the URLs into a single record, so if you are looking at Top Pages data, for example, you can still see how many people link to the trailing slash vs non slash versions of the URL. This also means advanced reports can continue to work as previously in giving you the detail you need to fully analyze sites.
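As a rough illustration of “combined WITHOUT duplication”, here is a small Python sketch. It is an assumption about how such a count could be computed and is not a description of our internal system. The helper names and the sample links are hypothetical.

```python
from collections import defaultdict


def strip_trailing_slash(url):
    """Map the slash and non-slash versions of a URL to one shared key."""
    return url[:-1] if url.endswith("/") else url


def summarise(links):
    """Count distinct linking sources per combined URL and per exact variant."""
    combined = defaultdict(set)
    per_variant = defaultdict(set)
    for source, target in links:
        # A source that links to both variants is only counted once here.
        combined[strip_trailing_slash(target)].add(source)
        # The exact variant is kept separately so the detail is not lost.
        per_variant[target].add(source)
    return (
        {url: len(sources) for url, sources in combined.items()},
        {url: len(sources) for url, sources in per_variant.items()},
    )


# Made-up sample: (source page, target URL) pairs.
links = [
    ("http://a.example/", "http://example.com/page"),
    ("http://b.example/", "http://example.com/page/"),
    ("http://a.example/", "http://example.com/page/"),
]

combined_counts, variant_counts = summarise(links)
print(combined_counts)
print(variant_counts)
```

Adding the two variant counts together would give three, but the combined figure is two, because one source links to both versions. Keeping the per-variant sets alongside the combined ones is what allows both views to be reported.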
Better Coverage for http and https analysis
Resolving the trailing slash problem has also helped us to give better coverage for https URLs. Again, we previously treated both variations as entirely different, but the data is often the same. So now you can still see the difference between secure and non-secure URLs as you drill down into the data, but at the top level the assumption is that you are looking at the combined number. Over the last year or two, more and more websites have been moving towards SSL, and this creates significant disparity when analyzing data unless you are extremely aware of the differences. Previously on these sites, you would probably have needed to search both options and combine the totals to see all of our data for a site. Most users didn’t do this, of course. Our contemporaries don’t really handle this situation any better, so we believe we now have the best way of presenting the numbers moving forward.
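The same principle can be sketched for http and https. The Python below is illustrative only: the key format and function names are assumptions made for the example, and it uses the standard library's `urllib.parse.urlsplit` to separate the scheme from the rest of the URL.

```python
from collections import Counter
from urllib.parse import urlsplit


def scheme_neutral_key(url):
    """Build a key that ignores whether the URL is http or https."""
    parts = urlsplit(url)
    key = parts.netloc.lower() + parts.path
    if parts.query:
        key += "?" + parts.query
    return key


def count_links(targets):
    """Return a combined count per URL plus a breakdown by scheme."""
    combined = Counter(scheme_neutral_key(t) for t in targets)
    by_scheme = Counter((scheme_neutral_key(t), urlsplit(t).scheme) for t in targets)
    return combined, by_scheme


# Made-up sample: the target URL of each link found.
targets = [
    "http://example.com/page",
    "https://example.com/page",
    "https://example.com/page",
]

combined, by_scheme = count_links(targets)
print(combined["example.com/page"])
print(by_scheme[("example.com/page", "https")])
```

The combined counter answers the top-level question, while the per-scheme counter keeps the secure and non-secure figures available for drilling down.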
How Does This Compare To The Competition?
The Wildcard is something that our users have requested for some time and is available immediately within the API. Oddly enough, one of our competitors currently defaults back to the URL* variant of a URL WHETHER YOU WANT IT OR NOT. This tends to inflate numbers by simply not returning what the user requested. We only give you the wildcard number if you specifically request it.
Our handling of the trailing slash issue is now much more in line with all the major engines. It is likely to increase the numbers for some URLs, but will not affect overall link counts to a domain.
There is less uniformity amongst search engines and crawlers on the SSL/non-SSL question. Some cannot count SSL links, but Google seems to count http and https as similar, and as such we have followed the leader in this respect.
Integrity of Data is Important to us
We are striving hard to continually improve the data that we give to you, whilst also being as transparent as possible about how the data is calculated and interpreted. Sometimes these changes seem subtle and unimportant to users, but I urge the more enlightened SEOs out there to help educate others as to the validity of data that can be found across the web. Far too often, users are given numbers, statistics or interpretations which simply do not withstand rigorous analysis.
Posted by Dixon Jones