Right. It has been a busy few months and it seems some blogging is due. Since the results of the latest batch of experiments I have been busy with are inconclusive, I thought I would do a non-experimental post in the meantime, just to keep my ~500 RSS subscribers (whoohooo) happy. (The count briefly dropped to 380 overnight, was-it-something-I-said, but it turned out to be some kind of a glitch and the numbers came back.)
So as many of you have probably heard, Google Webmaster Tools has started reporting the referrers of the 404 errors it encounters while crawling your site. Matt was very surprised that no one from the SEOsphere was writing about it (although since then there have been a number of posts on the issue), and rightly so, since this report can be very useful for several purposes:
- As a webmaster, you want to know which pages on your site are being requested and from where, and act on that information. One way of acting is fixing the URL mistakes you may have on your site. Another is creating a customized 404 page that channels your incoming traffic to more useful places on your site. Just imagine getting tons of requests from a bad link on a forum about a certain topic and being able to offer those visitors a customized 404 page that casually offers links to the same-topic page on your site. Or 301ing the visitors to a landing page offering related products. The possibilities are numerous.
- As an SEO, you do not want those links to go to waste by sending their link juice to your default/customized 404 page. You can redirect those links through .htaccess (or any other form of redirection) to your landing page.
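Both ideas above can be expressed in a few lines of Apache configuration. This is a minimal sketch, assuming an Apache server with .htaccess overrides enabled; the file paths and URLs are hypothetical placeholders you would replace with your own:

```apache
# Serve a customized 404 page instead of the server default
# (the path is a hypothetical placeholder)
ErrorDocument 404 /custom-404.html

# 301 a specific bad incoming URL to the relevant landing page,
# so its link juice is not wasted (both paths are hypothetical)
Redirect 301 /mistyped-forum-link /related-products-landing-page
```

The ErrorDocument directive handles the broad case (every miss gets a useful page), while a per-URL 301 handles the valuable case of a known bad link pointing at you.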
Sounds great. The question that started bugging me when I read about this service was whether the information I am getting from Google Webmaster Tools (WMT) is correct, up to date and comprehensive. So I dug into my log files covering the same time period as the WMT data for this blog and, lo and behold, Google was showing me only a fraction of the referrals to the 404 page.
In Google WMT I could see only two bad referrals, while digging through the log files turned up 22 different bad referrals. That is roughly a 9% fraction, which is pretty bad.
To be fair, not all of the referrals found in the WMT report were included in the log file report, for the simple reason that for a referral to appear in your log files, the link needs to be clicked. Google’s spider, however, does not pass a referrer, nor does it crawl a link immediately upon discovery, so the WMT report will show some links that will not be found in the log files.
There are a number of reasons for the discrepancy between WMT and the log files, and not all of them require putting on a tinfoil hat:
- Google is showing only those bad referrals that its spider found. It is possible that the page with the bad link was removed, so as far as Google is concerned the linking page does not exist and there is no need to report it.
- The page that contains the bad link has not been crawled yet.
- The page that contains the bad link is not being crawled due to a password requirement, robots.txt or meta tag blocking (although we all know how well that works), duplicate content issues found on the linking page, or any other possible issue.
- As with the incoming link data, Google is withholding some of the information from webmasters (yes, this is the tinfoil one, which doesn’t make it less of a possibility, though it sounds farfetched to me).
In any case, for all the reasons mentioned earlier, you want to know about all the links causing a 404 on your site, even if Google does not currently count them. One scenario that comes to mind: a page is not being crawled by Google for whatever reason, so a bad link from that page is not reported in WMT. If that page gets scraped and the scraped copy is counted by Google, you have a 404 problem which you could turn to your benefit.
Now, I know that the number of bad referrals I am showing here is pretty low, but it must not be forgotten that the audience that reads and links to this blog is very web savvy and does not make many mistakes (even though Search Engine Roundtable is one of the bad referrals in both the log files and the WMT report <looks in Barry’s direction>). When it comes to websites in other niches, however, I would assume the share of unreported bad referrals can mount up to significant numbers, so due diligence should be applied by exploring your log files.
And finally, here are several tips that can assist your digging through the log files:
- First and foremost, use a good log file analysis tool. I prefer Nihuo, which gives great visualization of your log file data, while giving you a great deal of flexibility in defining the analysis parameters, such as tracking single files, tracking advertising campaigns, setting up filters for a plethora of parameters, etc. Of course, a custom made log analysis tool is better, if you have the skills/resources to get one.
- Filter out No Referral 404s. They are good for fixing your missing pages on the site, not so useful for redirecting your link juice.
- The majority of requests resulting in a 404 on my site were requests for favicon.ico from the time I did not have a favicon. Another very popular file whose requests result in a 404 is robots.txt. Filter those out, since they are of no interest for this purpose.
- Filter out all of the referrals coming from your own domain.
This should leave you predominantly with the 404 referrals coming from broken links outside of your site.
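If you have the skills/resources for a custom tool, the filtering steps above can be sketched in a short script. This is a minimal sketch, assuming the Apache "combined" log format; the `MY_HOST` domain, the regex, and the function name are assumptions you would adjust to your own server:

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Regex for the Apache "combined" log format (an assumption; adjust it to
# your server's actual LogFormat). Captures request path, status, referrer.
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "[^"]*"'
)

MY_HOST = "www.example.com"               # hypothetical: your own domain
NOISE = ("/favicon.ico", "/robots.txt")   # requests to ignore

def external_404_referrals(lines):
    """Count (requested path, referrer) pairs for 404s caused by
    broken links on OTHER sites."""
    hits = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("status") != "404":
            continue                                  # keep only 404s
        path, ref = m.group("path"), m.group("referrer")
        if path in NOISE:
            continue          # tip: drop favicon.ico / robots.txt noise
        if ref in ("", "-"):
            continue          # tip: drop no-referral 404s
        if urlparse(ref).netloc == MY_HOST:
            continue          # tip: drop referrals from your own domain
        hits[(path, ref)] += 1
    return hits
```

Feed it the lines of your access log and the most common pairs are your best redirect candidates.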
As I said earlier, the WMT 404 report is a step in the right direction; however, do not rely on it completely, and complement your 404 research with your log file data.