Archive for 'April, 2009'

GoDaddySucks.com

I don’t really hate GoDaddy.

I do confess that I don’t like them all that much. For the supposed leader – at the least, one of the most recognizable names – in domain registration, I find the experience of dealing with them pretty bad: bloated admin pages, opaque admin interfaces, interminable payment processes.

I even got bitten by one client who was using them for Windows hosting, but the account didn’t support PHP. What basic hosting account these days doesn’t support PHP? Goodness!

Still, I can usually get done what I want, though in a clearly sub-optimal way. Like I said, I don’t really hate them, at least not with the white-hot intensity of a thousand suns, a level of fury that comes easily to me when the topic turns to IE6.

So, I was surprised to find that the domain name godaddysucks.com redirects to the godaddy.com home page. A whois check shows that GoDaddy themselves have the domain name.

A bit of pro-active defense there, a good lesson for all of us.

[ As a side note, GoDaddy was not so pro-active as to secure certain hyphenated variants of the godaddysucks.com theme, some of which lead into a distinctly "red-light" area of the web. 'Nuff said. ]

Google news sitemaps and article url requirements

One customer of mine has been trying to get his content listed in the Google News index (as distinct from Google’s main web index).

Google’s general guidelines for news publishers include a collection of Technical Requirements for Article URLs.

One of those requirements strikes me as just odd:

Display a three-digit number. The URL for each article must contain a unique number consisting of at least three digits. For example, we can’t crawl an article with this URL: http://www.google.com/news/article23.html. We can, however, crawl an article with this URL: http://www.google.com/news/article234.html. Keep in mind that if the only number in the article consists of an isolated four-digit number that resembles a year, such as http://www.google.com/news/article2006.html, we won’t be able to crawl it. Please note, this rule is waived with News sitemaps.

What the heck is that for? As long as each article has its own unique URL, why must it have a unique number as part of the URL? It sounds like a pointless hoop through which news sites are simply forced to jump in order to demonstrate that they are “big” enough to jump through it. I would be grateful to anyone who can provide a definitive answer to what purpose is served by this requirement.

Anyway, rather than change the entire CMS, I opted to develop a Google News sitemap, a creature that uses a different format than the standard sitemap protocol. Google Webmaster Tools confirms that the sitemap is being spidered and is error free.
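For the curious, here is a rough sketch of what generating such a file might look like in Python. It is not the code running on the customer's site. The element names follow Google's News sitemap documentation as I understand it, and the schema may have differed at the time of writing, so check the current docs before relying on it. The function name `build_news_sitemap`, the article dictionary keys, and the example.com URL are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces: the standard sitemap protocol plus Google's News extension.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles, publication_name, language="en"):
    """Return a News sitemap as an XML string.

    `articles` is a list of dicts with the (hypothetical) keys
    'loc', 'published' and 'title'.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["loc"]

        # The news-specific block is what sets this apart from a plain sitemap.
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = article["published"]
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]

    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_news_sitemap(
        [{"loc": "http://example.com/news/some-story.html",
          "published": "2009-04-10",
          "title": "Some story"}],
        publication_name="Example News",
    ))
```

Note that the article URL in the example has no digits at all, which is the whole point of going the sitemap route.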

Google representatives confirmed to me via email on several occasions that the 3-digit-url requirement is waived for sites that submit a news sitemap, and the spider is still successfully pulling and parsing the news sitemap file. Even so, the customer has yet to see any content show up in Google News.

It actually occurred to me later that since Google will probably only spider/add new content to the news index (as opposed to the standard index), we really could get away with simply changing the scheme for new URLs only. So, we bit the bullet and changed the CMS to add the digits.
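To sanity-check new URLs against the rule as quoted above, something like the following Python sketch would do. It encodes my reading of the published wording only; I have no idea what Google's crawler actually does. The function name `satisfies_three_digit_rule` is made up, and treating "resembles a year" as "four digits starting with 19 or 20" is purely my assumption.

```python
import re
from urllib.parse import urlparse


def satisfies_three_digit_rule(url):
    """Return True if the URL path contains a number of three or more
    digits that is not just an isolated year-like four-digit number."""
    path = urlparse(url).path
    for number in re.findall(r"\d+", path):
        if len(number) < 3:
            continue  # too short to count
        # Assumption: "resembles a year" means 19xx or 20xx.
        if len(number) == 4 and number[:2] in ("19", "20"):
            continue
        return True
    return False


if __name__ == "__main__":
    # The three example URLs from Google's own guideline text.
    for example in (
        "http://www.google.com/news/article23.html",
        "http://www.google.com/news/article234.html",
        "http://www.google.com/news/article2006.html",
    ):
        print(example, satisfies_three_digit_rule(example))
```

Of the three examples from the guideline, only the `article234.html` URL passes, which matches what the guideline says.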

It will be interesting to see if it makes a difference. If so, it undermines Google's claim that the URL rule is waived for sites implementing a news sitemap.

Stay tuned.

Update 2009-04-13: Content is now appearing in Google News. Apparently, the three-digit rule waiver is unreliable.