I recently installed a directory script on an add-on domain of my website, and since the script was kind enough to create sitemaps for me, I decided I may as well let Google know about them.
I’ve done this a few times before… basically all you have to do is put the sitemap on your website, log in to your Google Sitemaps account, and enter the URL of the sitemap. To make sure the website is really yours, Google tells you to create an empty file with a cryptic name like google2378798124.html, which you must put in the same directory as the sitemap. The logic, of course, is that if you have enough access to the site to upload a file, it’s probably fine to grant you access to some of Google’s information about that site.
So I created the file, and uploaded it with FileZilla. Then I clicked the verify button.
Unfortunately it wouldn’t verify, and instead I was presented with an unpleasant message… the kind you really don’t need at 4am when most of your brain is already asleep. The message was as follows:
We’ve detected that your 404 (file not found) error page returns a status of 200 (OK) in the header.
I haven’t committed all the status codes to memory, but it was pretty easy to work out what was going on. For those of you who have somehow managed never to click a broken link, a 404 page is what usually shows up when a link no longer exists. It also shows up if you mistype a URL, or basically any time your web browser tries to load a page that just isn’t there. For whatever reason, Google was getting a 200 code instead of the 404.
Now, this didn’t concern me. If my error page is returning a status code of 200 at 4am, that’s perfectly fine with me. It can return a negative error code for all I care. It’s an error page, and all I wanted was to get Google to fetch my darn sitemap. Unfortunately, clicking VERIFY over and over again in frustration wasn’t convincing Google Sitemaps to change its mind, so I copied the entire error message and googled it.
Sometimes you just get lucky, and this was one of those times: the first page of results was full of forum threads from people with the same problem. Skimming through them quickly, I narrowed the probable causes down to the following:
- Web host issue. I ruled this out almost immediately, because my main domain and add-ons were all functioning fine except for this one. Unless something funky was going on, it probably wasn’t the host.
- Error page is being served, but returning the wrong code. The suggested fixes are to make sure the title of the page contains “404” somewhere, or to add the following (remove the space between the ? and <>):

to the front of the code on the error page. This wasn’t it in my case. In fact, I didn’t even have an error page set up, although I was pretty sure the server would still send a 404 code even without one.
- Redirects in .htaccess. This can happen when someone doesn’t want visitors to see error pages and would rather redirect them to the main page. As it turns out, this was the problem: the .htaccess file included with the directory script I was using redirected any request for a nonexistent page to the main page.
Regarding the third cause, if you want users to see the home page instead of an error page, someone posted a neat little trick:
- Create an error page, for example 404.php
- Add the following code (remove the space between the ? and <>):

This still uses the 404 error page and sends the 404 code, but the content of the page ends up being whatever is in index.html. Of course, replace index.html with whatever your main page is.
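The same idea isn’t specific to PHP. Here’s a rough sketch of it as a tiny Python server using the standard library’s http.server, assuming an in-memory site where `PAGES`, `HOME`, `respond` and `HomeOn404Handler` are all hypothetical names I made up for illustration. The point it shows: the visitor gets the home page content, but the status line still says 404, so Google can tell the page is missing.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, Tuple

# Hypothetical in-memory "site"; a real server would read files from disk.
HOME = b"<html><body><h1>Welcome to the directory</h1></body></html>"
PAGES: Dict[str, bytes] = {"/": HOME, "/index.html": HOME}


def respond(path: str, pages: Dict[str, bytes],
            home: str = "/index.html") -> Tuple[int, bytes]:
    """Return (status, body) for a request path."""
    path = path.split("?", 1)[0]  # ignore any query string
    if path in pages:
        return 200, pages[path]
    # Missing page: reuse the home page's content, but keep the 404 status
    # so crawlers (and Google Sitemaps) can still tell the page is missing.
    return 404, pages[home]


class HomeOn404Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = respond(self.path, PAGES)
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, something like `HTTPServer(("127.0.0.1", 8000), HomeOn404Handler).serve_forever()` (with `HTTPServer` imported from http.server) should do; a redirect would instead send a 3xx and hand out a 200 on the home page, which is exactly what tripped up the verification.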
Since in my case it was the .htaccess file, I simply renamed it to something else, clicked VERIFY in Google Sitemaps (which then worked), and renamed the file back to .htaccess afterwards. Keep in mind that in your own situation, depending on what role your .htaccess file plays, renaming it could leave your website temporarily vulnerable, inaccessible, or broken in some other way; in that case, editing the file would probably be a better option. Anyway, after Google verified the sitemap, I changed everything back to normal and was good to go.
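If you want to confirm that a catch-all redirect really is the culprit before renaming or editing anything, one option is to request a page that can’t exist and look at the raw response. Below is a minimal Python sketch of that check (Python is just my pick here; it isn’t part of the directory script or of Google Sitemaps). `probe_missing_page` and `diagnose` are hypothetical helper names, and I’m assuming a random file name won’t happen to exist on your site.

```python
import http.client
import uuid
from typing import Optional, Tuple
from urllib.parse import urlsplit


def probe_missing_page(base_url: str) -> Tuple[int, Optional[str]]:
    """GET a made-up path and return (status, Location header).

    http.client never follows redirects, so a catch-all rule in .htaccess
    shows up here as a 301/302 instead of as the 200 of the page it points to.
    """
    parts = urlsplit(base_url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    # A random file name that almost certainly does not exist on the site.
    path = f"/no-such-page-{uuid.uuid4().hex}.html"
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body before closing the connection
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def diagnose(status: int, location: Optional[str]) -> str:
    """Turn the probe result into a short explanation."""
    if status == 404:
        return "ok: missing pages return 404"
    if status in (301, 302, 303, 307, 308):
        return f"redirect to {location}: look for a catch-all rule in .htaccess"
    if status == 200:
        return "soft 404: the error page is sent with status 200"
    return f"unexpected status {status}"
```

Something like `print(diagnose(*probe_missing_page("http://www.example.com/")))`, with your own domain in place of the placeholder, would tell you which of the three causes you’re dealing with.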
So why does Google Sitemaps care???
Actually, it makes perfect sense. When Google looks for that cryptic file you put on your site, it needs to know that the file really exists. If your website returns 200 (OK) for every request, Google has no way of knowing for sure. So when you click VERIFY, in addition to checking for that file, it also requests another, nonexistent file and expects a 404 error. If it gets one, it knows the 200 from your cryptic file is legitimate.
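To make that reasoning concrete, here’s a rough Python reconstruction of the two-request check. This is my guess at the logic, not Google’s actual code: the order of the two requests and the names `fetch_status` and `verify_site` are assumptions. The `fetch` parameter lets you swap in a fake site for testing; the default follows redirects the way a browser would, which is why a .htaccess redirect to the home page looks like a 200.

```python
import urllib.error
import urllib.request
import uuid
from typing import Callable


def fetch_status(url: str) -> int:
    """Final HTTP status after following redirects, as a browser sees it."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def verify_site(base_url: str, token_file: str,
                fetch: Callable[[str], int] = fetch_status) -> str:
    """Assumed two-request ownership check."""
    base = base_url.rstrip("/")
    # A 200 for the token file only means something if the server can say 404,
    # so first ask for a random file that should not exist.
    probe = f"{base}/{uuid.uuid4().hex}.html"
    if fetch(probe) != 404:
        return "rejected: missing pages do not return 404"
    # Only now is a 200 for the verification file trustworthy.
    if fetch(f"{base}/{token_file}") == 200:
        return "verified"
    return "rejected: verification file not found"
```

A site that answers 200 for everything (my situation, thanks to the redirect) fails at the first step, no matter whether the cryptic file is actually there.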
Is there anything Google could do to make things easier?
Probably. Off the top of my head, I would guess they could give you the option of putting some cryptic text inside the cryptic file, which Google could then read to verify ownership. They’re a pretty smart company, and I’m sure they could come up with plenty of other options.
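Here’s a small Python sketch of why content-based verification would sidestep the problem, assuming a purely hypothetical scheme in which the service hands you a random token and the file must contain exactly that token. `issue_token` and `content_verifies` are made-up names; nothing here describes a real Google API.

```python
import secrets


def issue_token() -> str:
    """Hypothetical: the service generates a random token for the owner."""
    return secrets.token_hex(16)  # 32 hex characters


def content_verifies(status: int, body: bytes, token: str) -> bool:
    """A 200 alone proves little if the server answers 200 for every URL,
    so also require the body to be exactly the issued token."""
    return status == 200 and body.strip() == token.encode("ascii")
```

With a check like this, a redirect that serves the home page with a 200 would simply fail because the body is the home page, not the token, so the extra 404 probe wouldn’t be needed.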
Is it something I can see Google changing?
Actually, I could be wrong, but I don’t think they’ll change it. I wouldn’t be surprised if they intend it to work this way. If there’s a reason they dislike error pages that don’t show up as 404s, this is of course a way to discourage them. I would imagine their spiders have an easier time removing dead links when those links return 404s. A dead link that is simply redirected is at the very least more work for the spider, and it’s possible the spider can’t tell at all. If that’s the case, even the website owner should be concerned, because if 15 nonexistent pages all get redirected to the main page, it’s going to look like 15 pages of duplicate content, which isn’t exactly healthy for your ranking.
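To illustrate the duplicate-content worry, here’s a toy Python sketch that groups URLs by a hash of the page body they finally return. It is not how Google’s crawler works (that isn’t public); `duplicate_groups` is a hypothetical helper, and the page data in the usage note is made up.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def duplicate_groups(pages: Dict[str, bytes]) -> List[List[str]]:
    """Group URLs whose final page bodies are byte-for-byte identical."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, body in pages.items():
        groups[hashlib.sha256(body).hexdigest()].append(url)
    # Only groups with more than one URL look like duplicate content.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

Feed it the home page plus 15 dead links that all redirect to it, and you get one group of 16 identical pages, which is the situation described above.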

