Website Crawling for SEO (Webinar)

How often should you be crawling your website and what are the key issues that you should be looking to identify with a site crawler?

Website crawling is the topic that we’re covering on episode 22 of OLD GUARD versus NEW BLOOD, hosted by Dixon Jones, with Mark Thomas from Botify, Izabela Wisniewska from Creatos Media, and Richard Lawther from Screaming Frog.

Transcript

Dixon Jones

Hello, everyone, and welcome to Old Guard vs New Blood Episode 22. This time we’re talking about the SEO crawlers, search crawlers, and I’ve got another great panel. I’ve got Izzy, Richard, and Mark here. Guys, welcome to the show. Thank you very much for coming on, I really appreciate it. Why don’t we start with you, Izzy? Tell us about yourself. Who are you and where do you come from?

Izabela Wisniewska

Hi. Yes, my name is Izzy. I am managing director of a small digital marketing agency called Creatos Media. We’re based near Birmingham. I have been in the industry for eight years now. Apart from SEO, I’m also a fitness instructor and I love martial arts.

Dixon Jones

Cool, excellent. I’m obviously incredibly fit as you can tell by the double chin and stuff like that. No, I wish, so well done, and thank you very much for coming on the show, Izzy. Richard, tell us about yourself. Where are you? Where do you come from?

Richard Lawther

Yeah, thanks for having me. My name is Richard Lawther. I’m the senior technical manager at Screaming Frog. Been in the industry about six-and-a-half years now, which for me, feels like I’m one of the old guard, but very much the new blood, I think. Split across agency and software, so I do a lot of web crawling basically.

Dixon Jones

Yeah, we’re calling you new blood in this crowd because it’s me and Mark you’re up against, and Mark I’ve known for a long, long time. Mark, tell us more about who you are and where do you come from?

Mark Thomas

Great. Thanks, Dixon. Yeah, I’m Mark Thomas. I have been knocking around a while now, so old guard I’ll take as a compliment, I think, but my role currently is the vice president of customer experience. Basically, managing all customers for the European region for Botify, which is obviously a web analytics software company focused on SEO, and a large part of that product is a crawler. I’ve worked in one or two different crawling businesses in the last 10 years, and I’m very passionate about the industry, so looking forward to a good discussion this evening.

Dixon Jones

That’s brilliant, and thanks very much all of you for coming along. I think we’ve got a good balance of new blood versus old school, old blood… old guard versus new blood. Obviously, the whole event’s sponsored by Majestic, who are themselves a bloody great big crawler, albeit a somewhat specialised one, built around links. Thanks to Majestic for sponsoring the event. We really, really appreciate it, and just before we get underway, can I just introduce David and make sure that I’ve not missed anything important. David, how are you?

David Bain

I’m very good, thanks. I just want to share something towards the end of the episode about what we’re going to be doing next month. But if you’re listening at the moment on Apple Podcasts or Spotify, sign up to watch live next time (and yes, if you’re watching live, we are a podcast as well). Majestic.com/webinars is where to go to make sure you’re here for the next live audience. If you’re watching live, try and add comments, interact a bit, ask some questions, and we’ll try and involve you as part of the show.

Dixon Jones

That’s great. Thanks a lot, guys. Okay, so 45 minutes on crawlers starting now, but before we just jump into my questions, Mark, maybe I’ll start with you. If people haven’t got time to stay around for 45 minutes, what one tip would you say people should think about when they’re doing crawls on websites?

Mark Thomas

Well, that depends, of course, Dixon. Look, I think there are usually two different types of website that we deal with. One is the smaller kind, with fewer than 100,000 pages, and I think if you’re dealing with that size of website, then the more frequently you can crawl, the better. I mean that goes for any website, but obviously, when you get into millions of URLs, it’s more difficult to check it as frequently.

But I think it’s extremely clear from all of the different Google representatives that the technical quality of your website and the performance of it is just as important as the content that’s going into it as well. Crawlers will help you with both sets, but certainly, checking the technical health of your site is extremely important. Crawling as frequently as you can is right up there.

I think for a larger website, it’s about trying to get a really significant view of the performance and the detail of how the site’s structured so that you can convince stakeholders that you have an authoritative view of exactly how the site’s information architecture is designed. Sometimes we try and use samples or smaller amounts, but I think it is really important to show just a very comprehensive view to any developer if you’re hoping to get change implemented, which is the number one challenge, I think, for most of us. So, yeah, does that help?

Dixon Jones

Yeah. No, it’s more than one, but we’ll get through it. But thank you. Good thoughts. Good place to start the show. Richard, what about you? One takeaway. One thing that they can’t miss in the next few minutes.

Richard Lawther

Yeah, so I think my one would be understanding the differences between a web crawler like Screaming Frog or Botify and a search engine like Google, and how it goes about its crawling process. I get quite a lot of exposure to the support emails that get sent in, and most of the time, the number one question we get asked is why our software hasn’t found a particular page or a particular set of pages when they’re clearly indexed in Google. “Why is that? What’s going wrong with your software?”

This one has a very straightforward answer. The page either isn’t being linked to, or you’re maybe linking to it in a nonstandard manner, with JavaScript linking and things like that. Google, on the other hand, is much more sophisticated in the way it goes about crawling. It’s crawling millions of URLs a day, and it’s crawling across domains, from one domain to another, so it’s going to discover pages from various different sources. If you’re using a web crawler like Screaming Frog and you’re trying to find out why it’s not picking up a certain page, look at it like that: the crawler isn’t necessarily worse because it hasn’t found those pages. Google is simply taking a different approach, and you can use that to your advantage. If a page isn’t being properly linked to, its signals aren’t being passed to it properly, and you can go and optimize your site from there.
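
Richard’s point about discovery can be illustrated with a minimal sketch of how a link-following site crawler works. It assumes a hypothetical in-memory `site` (a dict of URL to HTML) so it runs without network access. It follows only `<a href>` links and does not execute JavaScript, so a page reachable only through an `onclick` handler is never discovered, even though a search engine might still find it through other sources. The names are illustrative, not any tool’s actual implementation.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags only; JavaScript is never executed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover(pages, start):
    """Breadth-first crawl over an in-memory site {url: html}, starting at `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating
            target, _fragment = urldefrag(urljoin(url, href))
            if target in pages and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: /landing is only reachable via a JavaScript click handler
site = {
    "https://example.com/": (
        '<a href="/blog/">Blog</a>'
        "<span onclick=\"location.href='/landing'\">Offer</span>"
    ),
    "https://example.com/blog/": '<a href="/blog/post-1">Post 1</a>',
    "https://example.com/blog/post-1": "<p>No links here</p>",
    "https://example.com/landing": "<p>Only linked via JavaScript</p>",
}
found = discover(site, "https://example.com/")
```

Here `found` contains the home page, the blog index and the post, but not `/landing`, which is exactly the kind of "why didn’t your crawler find this page?" case Richard describes.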

Dixon Jones

That’s a good tip. We’ll come back into that, I think, a little bit later, especially something like JavaScript rendering and stuff because I think that’s an interesting aspect. Izzy, you’re looking at things from the user side as opposed to the technology side a little bit. What tip have you got for people?

Izabela Wisniewska

Yeah, so I’ve got sort of two in one. First, definitely use the crawlers, because they’re amazing, but please do remember to analyze your results. It might sound like a waste, but I’ve seen so many times that we just rely so much on the technology, and as amazing as the crawlers are, they’re just tools. We still have to analyze the results and prioritize the tasks and the hints they give us.

Coming out of that, try to go to your development team or the web management team with solutions rather than just a list of issues like, “Here is everything that is wrong. Please, go do something with it.” If you can, give them the solutions like, “This is what we should be doing to make it better.” Trust me, the conversation is a lot easier if you do it this way.

Dixon Jones

That’s a great tip, yeah. Approach developers with solutions, not problems: that’s a fundamental one that I think we should all take on board as SEOs, because we’re very good at dumping problems on a web developer’s desk without a solution, so I love that one. Let me follow on from that, Izzy. The last time you crawled a site, however you did it really, what was it that you were looking for on that particular occasion? Not in general, but specifically, as long as it’s not confidential. You don’t have to give away any secrets about your customers at all, but what were you looking for last time you did a crawl?

Izabela Wisniewska

The very, very last time, I was just looking at technical performance overall, because one of our clients has been declining in the rankings a little bit and we’ve been trying to find out if there are any technical issues. But more generally, recently we’ve been looking into core web vitals quite a lot, with speed being very, very important.

Dixon Jones

I guess over recent months a lot of people will have been looking at core web vitals because Google made such a big thing of it, so yeah. Richard and Mark, I appreciate you guys are more about building the tool than using it, and I know from my Majestic days that sometimes I was guilty of not actually using my own technology as much as I should, just talking about it a lot. But the last time you crawled a site, Richard, what were you looking for?

Richard Lawther

Yeah, so I think the last time was probably analyzing the site structure and the site architecture. In terms of the internal linking and how the site’s built up, is it structurally sound? Were all the important category pages we were looking at properly linked to and as close to the home page as they could be, or were there outliers linked quite deep in the site when they shouldn’t be?

And, as I was saying earlier, are there any unlinked pages? Are they being properly linked and properly crawled? Is all that PageRank, and all those signals, being passed throughout the site as evenly as you want it to be?
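
The click-depth question Richard describes (are important category pages close to the home page?) can be sketched as a breadth-first search over an internal link graph. This assumes the graph has already been exported from a crawl as a mapping of URL to the URLs it links to; the `graph` data and the depth threshold of 3 are hypothetical values chosen for illustration.

```python
from collections import deque


def click_depth(links, home):
    """Minimum number of clicks from `home` to every reachable URL.

    `links` maps each URL to the list of URLs it links to (internal links only).
    URLs missing from the result are unreachable from `home` (orphans).
    """
    depth = {home: 0}
    queue = deque([home])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in depth:  # BFS: first visit is the shortest path
                depth[target] = depth[url] + 1
                queue.append(target)
    return depth


# Hypothetical graph: a key category is only reachable deep in pagination
graph = {
    "/": ["/shoes/", "/about/"],
    "/shoes/": ["/shoes/page-2/"],
    "/shoes/page-2/": ["/shoes/page-3/"],
    "/shoes/page-3/": ["/boots/"],
}
depths = click_depth(graph, "/")
MAX_DEPTH = 3  # assumed threshold; pick one that suits the site
deep_pages = sorted(url for url, d in depths.items() if d > MAX_DEPTH)
```

In this example `/boots/` sits four clicks from the home page, so it would be flagged as an outlier worth linking from somewhere shallower.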

Dixon Jones

Excellent. Excellent. Mark, I don’t know if you remember the last time you did a crawl yourself, but if you can remember what you were looking for, then great. But also, move on to the general things: what are the most common things people get out of using a crawler, if you like?

Mark Thomas

Scandalous, Dixon, that you might suggest I haven’t been crawling personally. I know what you mean though, yeah.

Dixon Jones

You’re in such a large organization that you have to spend all your time on this big, big sort of technology. That’s what I was going for.

Mark Thomas

I think, yeah, I mean, I guess for all of us, and I didn’t coin the phrase, but I always think crawling is about discovery. This will sound very sad to anybody that’s not in our industry, but I don’t think you’ll ever lose that enthusiasm to start a crawl and discover things you didn’t know, whether that’s how large a website is, the structures that Richard’s talking about, or a specific need you’re looking for.

It’s always extremely fascinating and I can always remember… Actually, I had a friend that I really wanted to join our business at one point, and I said to him, “Look, just come and listen to a call I’ll make to show a client a first crawl of their website.” We’d worked together previously, and he was just staggered by the prospective customer’s reaction back in those days about what we were able to start to show them. You’d be surprised about how little people do know about their site architecture.

So, it begins with just trying to build this rich picture of exactly what’s going on. To not veer from your question too much, I think the last time… I mean, the crawls and the discussions we have vary significantly. There are, as we know, thousands of factors. I’ve just left a call with a big group where they’re all comparing status code distributions and how they’re handling different issues at scale across multiple properties. That could be something we’re looking at.

I’d say, and it’s maybe a Botify leaning, that we are very much looking at crawl budget distribution. Similar to what Richard’s saying, it’s a question of, “Well, how much of my inventory am I getting crawled at a really quick rate? Is Google seeing as much as possible?” And so, normally, the crawls are pretty broad. You want to go as broad as possible to find every link, so you haven’t got that kind of, “Well, was it not linked to, or did we just never go deep enough?”

You’re trying to get that really holistic view of the link architecture as it is, and then comparing that to all of the sources from Google to see where Google is actually going. Normally, that is the big thing, and that ties into a lot of the speed element. It’s not directly about core web vitals, but it’s still about speed: people are super interested to see how they can get as much as possible done within whatever Google’s going to give them. Typically, I’d say it’s the crawl budget focus as a general [crosstalk].
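
Mark’s comparison of the crawled link architecture against where Google actually goes amounts to a set comparison. A minimal sketch, assuming you already have a list of URLs found by a site crawl and a list of URLs Googlebot requested (for example, extracted from server logs); the function name and sample URLs are hypothetical.

```python
def compare_crawl_and_logs(crawled_urls, googlebot_urls):
    """Split URLs by whether the site crawl found them and whether Googlebot requested them."""
    crawled = set(crawled_urls)
    requested = set(googlebot_urls)
    return {
        # Reachable through internal links, but Googlebot never requested them
        "linked_not_requested": sorted(crawled - requested),
        # Requested by Googlebot, but the site crawl never reached them
        # (orphans, old URLs, parameter variants, pages only linked from other sites)
        "requested_not_linked": sorted(requested - crawled),
        "both": sorted(crawled & requested),
    }


site_crawl = ["/", "/shoes/", "/shoes/red/", "/boots/"]
log_hits = ["/", "/shoes/", "/old-sale-page/", "/shoes/?sort=price"]
report = compare_crawl_and_logs(site_crawl, log_hits)
```

The two difference buckets are where crawl budget questions usually start: linked pages Google ignores, and crawl activity spent on URLs the site no longer links to.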

Dixon Jones

I’d just like to say that you don’t have to worry about people who aren’t in the industry watching this show. As geeky as we get, everybody watching this show is in the industry, apart from maybe my wife, and she wouldn’t watch this show, I can tell you that. Okay, so crawl budget. We’re coming away with crawl budget as something that’s important, especially for large sites, from your point of view. What about you, Richard? What sort of things do you think people commonly look for using crawlers?

Richard Lawther

In terms of crawl budget or in general-

Dixon Jones

No, no, no, no, no, no, no that was Mark’s one. What things people might be using crawlers like Screaming Frog for to find solutions to?

Richard Lawther

I think that’s the thing with all crawlers: from one crawl of your site, there’s so much information you can glean. Maybe you’re focused on content and you want to make sure all your metadata is properly aligned and keyword-focused as it should be. Or you’re going down the crawl budget avenue, like Mark was talking about, or looking at the internal linking, as I said earlier. There really is an endless plethora of use cases. And when we kind of start-

Dixon Jones

I was going to say, do you think that people just want to get them all, or… I mean, I would have thought, and please correct me if I’m wrong, that nobody is ready to fix everything at any one time. You might be saying, “Right, I’ve got to focus on checking that none of the descriptions are duplicated,” or, “I need to check that the links are working,” or, “I need to check the 404 pages,” or whatever. But you can’t fix everything at the same time, so do you think… I guess that’s a point more than a question, really. I’m a little concerned that some of these reports can be overwhelming, and it’s probably a better approach to know what you’re looking for before you start your report.
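
Dixon’s example of checking for duplicated descriptions is the kind of narrow, question-first check a crawl export supports. A minimal sketch, assuming a hypothetical export mapping each URL to its meta description; normalising whitespace and case before grouping is a design choice for this sketch, not a rule any particular tool applies.

```python
from collections import defaultdict


def duplicate_descriptions(descriptions):
    """Group URLs that share a meta description.

    `descriptions` maps URL -> meta description text from a crawl export.
    Empty or missing descriptions are skipped (that is a separate issue).
    """
    groups = defaultdict(list)
    for url, text in descriptions.items():
        key = " ".join((text or "").split()).lower()  # normalise whitespace and case
        if key:
            groups[key].append(url)
    return {key: sorted(urls) for key, urls in groups.items() if len(urls) > 1}


# Hypothetical crawl export
crawl_export = {
    "/red-shoes": "Buy shoes online.",
    "/blue-shoes": "Buy  shoes online.",
    "/boots": "Waterproof boots for winter.",
    "/contact": "",
}
dupes = duplicate_descriptions(crawl_export)
```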

Richard Lawther

Yeah, I think… For us, when we do technical audits, we try and gather as much information as we can, and obviously, as we’re going through the audit, we’ll go stage-by-stage and tackle different sections at a time, as you’re saying, if you want to look at your meta descriptions or whatever it is. The question then is, when you have all that information, how do you get those recommendations implemented? That’s where the battle with your developers really starts. If you give them a document of 6,000 things they need to fix, hardly any of those are going to get done. Really, it’s a case of what you think is going to have the most impact and how easy each change is to make.

Dixon Jones

Izzy, do you think that’s true, or would you tend to dive into individual things when you’re using the tools, or do you take an audit, a full audit, and then try and work out which things you need to prioritize? How does it work in practice?

Izabela Wisniewska

I was dying to say something.

Dixon Jones

Go on.

Izabela Wisniewska

Okay, can I answer a few of the questions you asked before? You asked Richard what people might be using Screaming Frog for, so I just wanted to say that I’m using it to audit core web vitals, because they’re doing a brilliant job with this. And on the last question you asked, I was dying to say this as well: it depends. As with everything in SEO, it depends.

Let’s take an SEO audit, for example. If I’ve got a new client, I usually start with an audit, just to see overall what’s been going on with the website, because let’s face it, even if we ask the client questions, they might not even be familiar enough with the work that has or hasn’t been done previously to brief us properly. So I always like to start with an audit.

And then, as we said, we usually end up with this huge pile of hints or tasks… things that are technically wrong, and what do we do? I think the key, and I think Richard mentioned this, is prioritization. I’ve honestly done a lot of website audits, and I haven’t met a development team that would just say, “Yeah, okay, no problem. We’re just going to fix all of that. It’s going to be all good, and then your crawler is going to come back with nothing else to fix.” It just doesn’t happen. We don’t live in an ideal world.

I would say it’s a matter of prioritizing: how much resource is it going to take, and how much impact is it going to have? How big is the scale of the problem? Is it two URLs that are not that important, which maybe we don’t have to worry about much at the start? Or is the whole website affected, for example the whole website is very slow, and we really have to tackle that because speed is becoming more and more important? So, prioritization is key here, even if we’ve got this huge pile of things to fix.

Dixon Jones

Excellent, okay. Trying to prioritize these things is always a challenge, I’m sure, but that’s cool. Right, okay. Let’s go back a little to the point Richard made that when Google crawls a site, it often gets different information from what Screaming Frog or Botify or Majestic would get when they crawl it, or Sitebulb and all the other tools out there. I don’t mean to lean towards one over the other.

But how does that work then, Richard? I guess I’m playing devil’s advocate a little bit here, but how does that work? Surely, a crawler’s a crawler and it should get the same results every time. What goes wrong?

Richard Lawther

We always try and mimic Google as much as we can, particularly when it comes to things like JavaScript rendering. We use evergreen Chromium, like they do. So, we try and keep in step with them as much as we can, but even then, we still often find differences between how content is presented to Googlebot and what a web crawler might see, and it can be really difficult to diagnose, particularly at scale.

The way I would always go about it is to look in Search Console. See the rendered HTML that Google is seeing for a URL there, compare it against what you’re seeing in a crawler or in a browser or something like that, and then you can make a decision: is this website doing something a bit odd, like serving Googlebot different content from what it serves users?
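
Richard’s comparison of Google’s rendered HTML with a crawler’s can be done with a plain text diff. A minimal sketch using Python’s standard `difflib`, assuming you have saved the rendered HTML shown by Search Console’s URL inspection and the HTML rendered by your crawler as strings; the function name and the sample snippets are hypothetical.

```python
import difflib


def rendered_html_diff(google_html, crawler_html):
    """Line-by-line unified diff between Google's rendered HTML and a crawler's."""
    return list(difflib.unified_diff(
        google_html.splitlines(),
        crawler_html.splitlines(),
        fromfile="googlebot_rendered.html",
        tofile="crawler_rendered.html",
        lineterm="",  # input lines carry no newline characters
    ))


google_html = "<h1>Shoes</h1>\n<p>In stock</p>"
crawler_html = "<h1>Shoes</h1>\n<p>Loading...</p>"
for line in rendered_html_diff(google_html, crawler_html):
    print(line)
```

Lines prefixed with `-` appear only in Google’s version and lines prefixed with `+` only in the crawler’s. On real pages it tends to help to pretty-print both documents first, so the diff is not dominated by whitespace differences.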

Dixon Jones

Cool. Mark, have you got any other tips and thoughts about that? I guess, for a start, Google is trying to look at the mobile version of the site rather than the desktop version, but I’m sure there’s more.

Mark Thomas

Yeah. I think ultimately most of the crawlers on the market are going to be within a small margin of difference. It should be very similar, I think. In terms of the systematic request beginning at a particular location and then trying to follow links as Richard said, I mean there may be slight nuances with the slightly more complex rendering of the pages and then discovering further links.

I mean I know some great SEOs that you would all know swear by using two or three sources of crawled data just to be certain of what they’re telling a client. I think that’s extreme for a lot of people, but it’s definitely done by some ex-Googlers that work in the industry these days. But yeah, I don’t think there’s a lot of difference there. I would suggest coming back to… It starts to become really how you can work with the data and surface the biggest issues.

I don’t think replicating a crawl is the more complex element of it. Of course, at scale, it depends on how you want to pay for that and run it, whether locally or in the cloud, but ultimately, it then becomes about, I think as Izzy was suggesting, looking at the scale and cross-referencing a lot of the volumes to make sure you focus on the biggest things.

I think, Dixon, it goes back to something you were saying a minute ago. There’s a real blend between coming to the crawler with a set of questions that the business has, or you have, and then actually trying to look for the answers, and I think that’s a super important thing that separates the best SEOs that I might encounter and some that are still finding their way, let’s say. Some that have got some kind of hypothesis that they want to challenge, and that’s where a crawler is super useful to then start to probe those areas.

But then the crawler software companies like us have a job to do to then surface as easily as possible some other issues to go and look at, and it often really requires some debate amongst different SEOs. I think globally, depending on where people might be watching this from, typically, we see in the US these really well-resourced teams of numerous people that are going to be debating in-house amongst themselves. In the UK, I tend to notice the teams are much smaller. It might only be one or two people, perhaps with the support of an agency.

It’s kind of about having that debate and that relationship. I do envy our American cousins their ability to have such large resources, and I think we’re probably all very passionate about this. It’d be great to see that kind of resourcing given to search in EMEA. But yeah, sorry, I’m waffling on a bit, but generally, you can come with some good hypotheses to challenge, and the tools will also surface things. It’s never as simple as getting a list of five things you’ve got to go and fix, and that’s your sprint for the next period.

Dixon Jones

Izzy, anything you want to jump in with there?

Izabela Wisniewska

Yeah, I’ve been using a few crawlers and jumping between them, and as I’ve already said, I’m a big advocate of a tool being just a tool: we actually have to analyze the data and not rely 100% on it. They’ve all been designed and developed by different people with different things in mind, so they might be looking at things slightly differently.

As Mark said, I think, there are many people who look into different crawlers just to make sure and to cross-reference. Sometimes you’ll get differences between them, and that might just be because of the way they work and the way they’ve been designed. But if I understood Mark correctly, I would definitely say yes, do cross-check and cross-reference, but once we’ve got the bigger picture, try to go with it.

Don’t be like, “Oh, Google found this one little link and this crawler didn’t find this one little link, so my work is totally screwed now because this one little link wasn’t there.” Obviously, analyze the data and be specific, but try to see the bigger picture as well.

Mark Thomas

Yeah, I’m just jumping in, Dixon, before you can. Following Izzy’s point, I think checking that you’ve got consistent data is something you would do quite infrequently, but definitely worth doing just to build the confidence. It’s always about confidence and making sure you’ve got a clear message to work with the engineering teams that you’re going to need to influence.

I think from that point onwards, once you’re happy that you’ve got good data, it’s about making that a source of truth that feeds dashboards and information into the business. That’s really where the crawler is also extremely useful, because it’s so factual. There’s really no doubt about how the data’s been gathered, unlike, say, data scraped for ranking performance, where those kinds of questions can be a little bit murkier.

I think crawling data is very, very factual, and from really any of the tools, you will be able to go in and closely inspect to see the issue in real-time normally. That’s going to be very possible. I think making it that source of truth that then is feeding into the business is also very important.

Dixon Jones

There’s also, to add to that, the fact that Google doesn’t necessarily start at the same point as the site-specific crawlers, because Google is crawling the whole web and gets a lot of its signals from deep links from other websites, which is often a different starting point, I hope you’d agree. That can have the effect, for example, of giving each page a different weighting. So, they’ve got PageRank information from the whole web, as opposed to PageRank information just from within your website.

That might affect the speed at which they crawl different pages. In any case, the point at which Google crawls a particular web page will never be exactly the same point at which you or I crawl it, so there’s going to be a difference there. And of course, especially on a big site, but even on a small WordPress site out of the box, the links change over time. As you write a new blog post, that changes the emphasis, because WordPress gives preference to new posts.

So, I think there’s something there. Would you agree that because your site is dynamic and changing over time, crawls will not always be the same from day to day? Is that a fair point, Richard?

Richard Lawther

Yeah, so jumping in on that, one of the main advantages as well with crawlers, they’re so configurable. The amount of settings you can adjust if you want to run a particular test or a particular experiment or say you want to only crawl a certain area of the site, you can really throttle it down to exactly what you want in that time. Whereas obviously, with Google, it’s just crawling everything as much as it can, and there’s always a delay in that.

So, use those configurations and settings to your advantage to get that data, and then, like Mark was saying, feed it back into your reporting where you can.
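
The configurability Richard and Izzy describe, such as restricting a crawl to one area of a site, usually comes down to include and exclude rules applied to each discovered URL. A minimal sketch, assuming regex-based rules; the `CrawlScope` class and the patterns are hypothetical and do not mirror any specific tool’s settings.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CrawlScope:
    """Hypothetical include/exclude rules applied to every discovered URL."""

    include: list = field(default_factory=list)  # regexes; empty means "everything"
    exclude: list = field(default_factory=list)  # regexes; any match drops the URL

    def allows(self, url):
        if self.include and not any(re.search(p, url) for p in self.include):
            return False
        return not any(re.search(p, url) for p in self.exclude)


# Crawl only the blog, skipping comment-reply parameter URLs
scope = CrawlScope(
    include=[r"^https://example\.com/blog/"],
    exclude=[r"[?&]replytocom="],
)
```

A crawler would call `scope.allows(url)` before queueing each link, so the same site can be audited in slices without changing anything on the server.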

Izabela Wisniewska

I totally agree. I was thinking of saying something similar. With Google, we’ve only got some sort of control over what we allow it to crawl and what we don’t. With the crawlers, we’ve got so many configurations that we can ask them to consider this, not consider that, and see how it would look. That, I would say, is the main difference: we can control it so much, and we can see how things would look if we do things this way or that way.

Dixon Jones

So, I remember, I mean I’m so old I started with Xenu Link Sleuth, where you pressed a button and off it went, and that was it really. Things have moved on a long, long way with tools like Botify and Screaming Frog and the others out there, so it’s amazing. Okay, right, so we’re in Majestic territory. We’re sponsored by Majestic, so we’ve got to have a question on links, guys.

Majestic users can export a list of links into a website from other web pages on the internet. We’ve got a source URL and a target URL. Not internal links, but links in. Majestic will pretty much check all these links every day or every few days, so they’ve got that, but not everybody’s a Majestic customer. Say you’ve got this data from Google Search Console or from another link provider, so you’ve got a list of links from different websites: are crawler tools able to check those links on the fly? Mark, is that something that’s in Botify’s tool, or is Botify more focused on crawl budget and looking at things on the website?

Mark Thomas

Well, it’s certainly something we will be looking to add, hopefully sooner rather than later. It’s possible now: like with most things, it’s very feasible to pull the data via API and use linking data to create very useful reports. We do it as custom work at the moment, and there is a lot of work in our… We’re just trying to…

This goes back a little bit. Sorry to swing away from Majestic for one second, but it goes back to the previous point about crawlers. We say, “Oh, we’re trying to crawl like Google.” Well, as Richard was correctly saying, we use the same infrastructure and browser settings and so on to replicate it as closely as we can. But that’s a bit of a misunderstanding, because obviously Google very much isn’t going to start at the home page, systematically line up links, and conveniently crawl through an architecture.

I’m doing this kind of pyramid here, through the depths of it. Google will understand the popular parts of the site, where new content is found, and perhaps be checking those more frequently. And so, bringing in third-party data becomes a really important part of trying to understand, from other sources, why Google will come there. At the moment, we do a lot with log analysis, so we can see forensically where Google has made requests and try to understand why.

Probably there’s a strong correlation then with the backlink profile to that page as well. It’s going to be-

Dixon Jones

Well, you seem to have frozen a little bit there. Oh, okay.

Mark Thomas

We use the log data as a proxy for that. Oh, sorry. The log data can be a bit of a proxy for popular elements of the website we understand Google will go to, but certainly, bringing in the backlink data is something we see a lot of clients of ours doing and something we want to integrate further in the roadmap really in 2022. So, yeah, very important.
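
Log analysis of the kind Mark mentions starts with counting which URLs Googlebot requests. A minimal sketch, assuming access logs in the common Apache/Nginx "combined" format; the sample lines and names are made up. Matching on the user-agent string alone is only a first pass, because the string can be spoofed; Google recommends verifying Googlebot with a reverse DNS lookup.

```python
import re
from collections import Counter

# "combined" format: ip identd user [time] "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests per path whose user agent claims to be Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # User-agent check only; verify genuine Googlebot IPs separately (reverse DNS)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


UA_GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [10/Oct/2021:13:55:36 +0000] "GET /shoes/ HTTP/1.1" 200 5120 "-" "{UA_GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Oct/2021:13:56:02 +0000] "GET /shoes/ HTTP/1.1" 200 5120 "-" "{UA_GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Oct/2021:13:57:10 +0000] "GET /boots/ HTTP/1.1" 301 0 "-" "{UA_GOOGLEBOT}"',
    '203.0.113.7 - - [10/Oct/2021:14:01:00 +0000] "GET /shoes/ HTTP/1.1" 200 5120 '
    '"https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
hits = googlebot_hits(sample_log)
print(hits.most_common())
```

The resulting counts are the "where is Google actually going" side of the crawl-versus-logs comparison; the browser request in the sample is ignored.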

Dixon Jones

Richard, is that something that you do on Screaming Frog?

Richard Lawther

Yeah, so I guess if you have the list of backlinks already from another source, like Search Console, or say you have Majestic or another backlink tool, you can grab that list, import it as a standalone list, and crawl those individual URLs to check whether they’re still alive, and check their current status code: whether they’re redirecting elsewhere or returning a 200. If you want to just pull in direct-

Dixon Jones

That checks whether the pages are still alive. This doesn’t check whether the link on the page is still alive.

Richard Lawther

Yes, so then you’d need to do a bit of extra work in that sense. At least with Screaming Frog, you can set up a custom search, for your domain name in particular, and it will scan the HTML to see whether that link is still present on the page.
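
The check Dixon asks about, whether the link itself is still on the source page, can be sketched by parsing the anchors in the fetched HTML. A minimal sketch, assuming the source page’s HTML has already been fetched; `links_to_domain` is a hypothetical helper, not Screaming Frog’s custom search feature. Parsing `<a href>` values rather than searching the raw text means a plain-text mention of the domain is not counted as a link.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_to_domain(html, domain):
    """Return hrefs on the page that point at `domain` (with or without www.)."""
    parser = AnchorCollector()
    parser.feed(html)
    wanted = {domain, "www." + domain}
    # Relative hrefs have no hostname, so they never match another site's domain
    return [href for href in parser.hrefs if urlparse(href).hostname in wanted]


# Hypothetical HTML from a backlink source page
source_html = (
    '<p>Thanks to <a href="https://www.example.com/guide">this guide</a> '
    'and <a href="/about">our team</a>. Also see example.com.</p>'
)
print(links_to_domain(source_html, "example.com"))
```

An empty result for a page that is still returning a 200 is the "page alive, link gone" case that a status-code check alone would miss.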

Dixon Jones

Oh, that’s clever. Okay. Okay.

Richard Lawther

And then you can take that out and then say, “Okay, the link is there-“

Dixon Jones

Okay, so it sounds like with both your tools, a little bit of setting up, tweaking, and stuff, and you can get there, but you’ve got to know your tools really.

Richard Lawther

Yeah, exactly that.

Dixon Jones

That’s fair enough. Izzy, is that something you do at all, or is that something that’s not really a part of your world?

Izabela Wisniewska

I do, and I was going to say I’m probably a bad person to ask this question, because I’m a Majestic customer, so I mainly use Majestic. I mean, there are other tools you can use to check that kind of thing, but as I said, I’ve been using Majestic pretty much since I started working in the industry, and I’m loyal.

Dixon Jones

Well, it sounds to me like all three tools can do the job for that particular [inaudible 003344], so that’s good. But I think checking links is an interesting use of crawlers, at least for Majestic customers and in general, because clearly, if people are losing links, they want to know about it at some point. Thanks very much. I didn’t mean to pit all the tools against each other there. That was my usual, “Dixon put his foot in it again. There we go. That’s what he does.”

Let’s talk about something a bit more out there. I don’t know who’s going to take this question, but I’ll put it out there: CDNs, the Cloudflares of this world. It’s certainly something I think some crawlers can have a problem with, because CDNs have a tendency to try to block non-human traffic, which I think is a mistake, and there’s a whole philosophy around that. We can ask about that too, but do you think CDNs cause problems for crawlers, and do they deliver different kinds of content on different crawls? Are there any particular problems with CDNs that we should know about? I don’t know if anybody wants to take that one.

Richard Lawther

I’ll jump in. I’d go the same route as you, really. CDNs are fine most of the time; the problem is when they block that bot behavior. So, if you’re trying to audit your website and you’re using Cloudflare or something similar, it will often prevent you from crawling, returning a 403 Forbidden response. It really just comes down to your setup. You can make allowances within the server settings for either your IP address or a particular user agent, such as the Spider’s, and then you should be able to crawl the site fine. In terms of different content, that shouldn’t be a problem, but again, the setup might change how that behaves.

Dixon Jones

Okay.

Izabela Wisniewska

I can talk a lot about my issues with Shopify, for example, and crawling Shopify websites, because-

Dixon Jones

Do it. Tell us, what are your problems? We’ve got a lot of Shopify users who listen.

Izabela Wisniewska

Yeah, I’ve had so many issues crawling Shopify websites because they’ve been blocking all my efforts, and it pretty much comes down to the settings in the crawlers. The majority of crawlers have some specific settings you can use. For these, I’ve specifically been using Sitebulb and DeepCrawl, and both have specific settings that you can dig around for and use to be able to crawl the sites.

But if you don’t do that, for example with computer-based crawlers, I would put a crawler on, and it would just go and go and go and go and give me back… Even if it finished at some point, the results weren’t as good as they should be. So, yeah, if there are known issues with that content delivery network, definitely make sure you’re using a crawler that can handle it, because I did have those issues, and trust me, it’s no fun.

Dixon Jones

Mark, anything you want to add in there?

Mark Thomas

To be honest, web crawlers used for this kind of technical SEO auditing typically don’t encounter too many problems… or shouldn’t, really, because when the client wants to crawl their website, the authentication will have been set up ahead of time, so it’s not really a topic for us. I think the interesting thing with CDNs now, and this is a bit of a shameless plug for Botify, is tools moving into the space of actually working with the CDNs more, to give bots a better experience of the web page than perhaps the way that the-

Dixon Jones

Than Cloudflare likes to give.

Mark Thomas

Exactly. So, we’re very much looking at working down that route a lot more now. In fact, they’re our friends rather than any kind of problem, I think, yeah.

Dixon Jones

I think it’s all fine as long as you have somebody in the organization who’s able to pop into the CDN account and whitelist the IP address the crawler uses. That’s not always an easy thing to do, especially if it’s a bank or something like that. Presumably the CDN level there is well and truly locked down, and letting unknown IP addresses in causes them more issues, but-

Izabela Wisniewska

Or it’s a different story; it doesn’t have to be a bank. It can be a client that just had a web agency build the website, and that’s all they know: they’ve got the website, and they want their SEO to do something. And then if you tell them there’s an issue, they’re just like, “We don’t know why.”

Dixon Jones

Exactly, right? Yeah, and there’s loads and loads and loads of them. I suppose as you get sort of-

Izabela Wisniewska

Yes.

Dixon Jones

With a small client, they don’t know, and with a large client, they can’t get access.

Izabela Wisniewska

Yeah, or it takes so long asking this person and that person to actually get to someone who’s got access. It takes ages.

Dixon Jones

Yeah, and I think that’s an SEO’s perennial nightmare, really: getting the right person at the right time to say yes to the right things. Sorry, Richard, you looked like you wanted to jump in.

Richard Lawther

There are also some things you can sometimes do to avoid that. A lot of the time, CDNs kick in when you’re crawling too fast, so if you throttle the crawl back and slow down the rate at which you’re making requests, sometimes they’re happy for you to crack on crawling the site. You don’t always need to whitelist, but it does make things a lot easier.
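Throttling just means spacing requests out. A minimal Python sketch, assuming a fixed requests-per-second budget, could look like the following; the `throttled` generator and its parameters are hypothetical and are not any crawler’s actual setting. The clock and sleep functions are injectable so the pacing can be checked without really waiting.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled(
    urls: Iterable[str],
    requests_per_second: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Yield URLs no faster than requests_per_second (hypothetical helper)."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    interval = 1.0 / requests_per_second
    next_allowed = clock()
    for url in urls:
        now = clock()
        if now < next_allowed:
            # Too early: wait until the next request slot opens.
            sleep(next_allowed - now)
        yield url
        # The next slot opens one interval after this request was released.
        next_allowed = max(now, next_allowed) + interval
```

Used as `for url in throttled(url_list, requests_per_second=1): ...`, each URL is handed to the fetch code at most once per second; lowering the rate is the sketch’s version of "throttling the crawl back".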

Izabela Wisniewska

Yeah, those are the specific settings I was mentioning as well. They usually imitate human traffic more than bot traffic, so it’s slower, but at least they can get around the restrictions, let’s call it.

Dixon Jones

So, guys, I mean, 40 or 45 minutes goes really, really quickly in these things, certainly for me, because I geek out on this stuff as well. We’re already pretty close to our time, but I do want to give you guys a chance to talk about your individual products: what’s coming out, what’s new, and what can our listeners reach out and try? I use all the products and I think they’re great, but let’s start with you, Izzy, on the creative side. Tell us what you’ve got and what you can do.

Izabela Wisniewska

Okay, so unfortunately I don’t have a product for you guys like the rest of the panel do. But definitely come to me if you need any digital marketing services. Other than that, I can tell you that we at Creatos Media recently started a mentorship program that is open to absolutely anyone who is new, who needs some guidance in the industry, or who’d like to get into the industry. It’s totally free of charge; we just want to give something back. Head to our website and sign up and-

Dixon Jones

And can you spell the website for everyone?

Izabela Wisniewska

Spell, oh my God. Creatos Media, C-R… let me just go through it so I don’t make a mistake. C-R-E-A-T-O-S M-E-D-I-A dot co dot uk.

Dixon Jones

Excellent, thank you. Richard, any new products or features out?

Richard Lawther

Yes, we just released version 16 of the Spider, and there are some really cool features in there. There’s JavaScript auditing, where you can compare the differences between the rendered HTML and the raw, non-rendered HTML, so you can make sure you’re not relying on something that’s only in the JavaScript-rendered version and not in the other version.
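The rendered-versus-raw comparison can be illustrated with a small Python sketch. It assumes you already have both versions of a page, for example the raw HTML from a plain HTTP fetch and the rendered HTML from a headless browser, and it only compares `<a href>` values, whereas the SEO Spider’s own comparison covers more than links. `link_diff` is a hypothetical helper.

```python
from html.parser import HTMLParser
from typing import Dict, List, Set


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.add(href)


def hrefs_in(html: str) -> Set[str]:
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


def link_diff(raw_html: str, rendered_html: str) -> Dict[str, List[str]]:
    """Links that exist only after JavaScript rendering, and links rendering removed."""
    raw, rendered = hrefs_in(raw_html), hrefs_in(rendered_html)
    return {
        "only_in_rendered": sorted(rendered - raw),
        "only_in_raw": sorted(raw - rendered),
    }
```

Anything listed under `only_in_rendered` depends on JavaScript executing, which is exactly the kind of difference worth checking before assuming a search engine will see it.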

There’s also Data Studio integration, so you can pull crawl reports straight into Google Data Studio, which makes it really easy to monitor changes to your site: whether anything has gone wrong, or anything’s gone up or down, et cetera. There’s loads more; have a look at our blog and our Twitter for the full list.

Dixon Jones

And Mark, what have you got in the Botify feature list?

Mark Thomas

Well, it’s a very exciting program. We probably relatively… We took some good funding, actually, earlier in the summer to-

Dixon Jones

Congratulations on that, by the way.

Mark Thomas

Yeah. Well, it’s tremendous for the industry to see tools getting money that can really help us kick things on. I think a big topic for probably a lot of tools, but certainly for us, is automation. As I hinted at before, working with CDNs and starting to support web managers in giving bots a great experience is something we’ve been quite public about over the last six to 12 months. We’re taking that journey towards more automation of audits, so that we can hopefully start making changes for site owners and help them get around the blocker of needing to get a developer on board.

Once you’re in a strong position to start influencing that copy and that experience for the bot, it’s about making sure we can deliver it. If someone listening is interested, I think that would be a really interesting conversation to have with us now. There are lots of buzzwords like machine learning and so on, but the key thing is that most things at this scale require machine learning now, so it goes with the territory. We’re definitely looking at how you take the learnings and actually start to…

Just getting the SEO to be more of the decision-maker on which things proceed, rather than digging through the data or trying to make a recommendation. It’s more about controlling the stop and go of what we release and getting those changes made. I think time goes fast, as today has, and as the last 10 years have, Dixon, as we were talking about.

Dixon Jones

Yeah, absolutely.

Mark Thomas

Over the next three to five years, this stuff isn’t going to be everywhere overnight, but I do think we’re reaching a point where the SEO will have a really interesting job controlling that flow of new ideas, getting them out to the site and making those changes much more quickly. It’s been a terrible couple of years, but a great couple of years for search marketing.

Dixon Jones

Honestly, Mark, if I can be part of that conversation with my Majestic hat on, I agree with you that giving that experience to the bots is much more important than a lot of technologies realize. If I can lend a positive voice in that debate, reach out; I’m here as well, so it’s great. Thank you very much, guys. I really do appreciate it. David, what have we got? You promised some information at the end of the show. What have we got?

David Bain

I did indeed. So, in December, which is just next month, we’re going to be publishing a big series here at Majestic: a video series, a podcast series, and a book called SEO in 2022. If you go to seoin2022.com, you can sign up for information about it. We’ll tell you exactly when it’s going to be published, but the book and all the content will be released at some point in December.

Before that’s published, we’ll do a special book preview show next month: the next edition of Old Guard New Blood, on the 1st of December at 5:00 PM GMT, 12:00 PM Eastern Standard Time. Already booked for that are Aiala Icaza Gonzalez, Jono Alderson, and Michael Bonfils, and we’ll possibly have one or two more. It should be a great episode previewing the book. Just go to seoin2022.com to sign up for more information about it.

Dixon Jones

I’ve seen the table of contents, guys; it’s absolutely storming, so thanks very much, David. Brilliant. Guys, just one last thing before I send you all on your way, and thank you very much. If people want to get hold of you on Twitter or somewhere else, how do they reach out to you, Izzy?

Izabela Wisniewska

My Twitter handle is just Izzy_CM, as you can see on your monitor, so yeah, you can ping me.

Dixon Jones

The podcast people can’t see that.

Izabela Wisniewska

Oh, they can’t see that. Yeah.

Dixon Jones

Izzy underscore CM, yeah.

Izabela Wisniewska

Yeah. Well, I’m sorry. Yes, as the guys can see on the screens. Yeah, it’s just Izzy_CM. Yeah, @Izzy_CM on Twitter.

Dixon Jones

Excellent. Richard, how do they find you?

Richard Lawther

My handle on Twitter is @RichLawther. You can find me there. I didn’t put it as my name on the screen.

Dixon Jones

Can you spell your surname?

Izabela Wisniewska

Which they can’t see anyway.

Dixon Jones

Spell your surname.

Richard Lawther

My surname is L-A-W-T-H-E-R.

Dixon Jones

You see, the iTunes people just don’t have this advantage, really. And Mark, how do they find you? How do they track you down?

Mark Thomas

Yeah, unfortunately, with such a common name it’s not very obvious, but mine is @SearchMATH, so M-A-T-H.

Dixon Jones

Excellent. Brilliant.

Mark Thomas

Nice double meaning there. Yeah, you can find me on Twitter at @SearchMATH, and you can also easily reach out to us through the botify.com website.

Dixon Jones

Brilliant. Guys, thank you very much for coming on Old Guard New Blood. I really do appreciate it. Please, when it comes out, let your audience know that it’s out. I’ll say goodbye now and say thank you properly after the show’s over, but for everybody out there in the cyber world, thank you very much for coming along to the show.

Previous Webinars

Follow our Twitter account @Majestic to hear about more upcoming webinars!

Or if you want to catch up with all of our webinars, you can find them on our Digital Marketing Webinars page.

Author: Majestic