So, I was thinking about pages that are heavily shaped by JavaScript and how they might not do so well with web crawlers.

For example, if the comments on a blog are loaded via Ajax (which I know many blogs do), then those comments might not be indexed by web crawlers. This is a sad loss for us web developers, because all the keywords and content in those comments never make it into the index.
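To make the problem concrete, here's a minimal sketch of how such comments are typically loaded; the URL and element id are made up for illustration, not taken from any real blog:

```javascript
// Comments are fetched after the page loads and injected into the DOM,
// so they never appear in the HTML source that a crawler downloads.
fetch('/blog/posts/42/comments.json')        // hypothetical endpoint
  .then(function (response) { return response.json(); })
  .then(function (comments) {
    var list = document.getElementById('comments');  // hypothetical element
    comments.forEach(function (comment) {
      var item = document.createElement('li');
      item.textContent = comment.author + ': ' + comment.text;
      list.appendChild(item);
    });
  });
```

Everything the visitor ends up reading exists only after this script has run, which is exactly what a crawler that doesn't execute JavaScript will miss.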

Another example, one I'm facing myself, is when you have pages with what one could call JavaScript applications. An Ajax-based chat, or a fun game of JavaScript Tetris, if you will. These pages are generated entirely in JavaScript, and thus, when a web crawler tries to index one, it will find absolutely nothing. Or even worse, the infamous "You appear to have turned off JavaScript in your browser, please enable it" message.

So, I thought about ways to deal with this issue. The first possibility I thought of was to put a message in the HTML document and then remove it with JavaScript as soon as the page loads. If I have a JavaScript-based Tetris game, I could write in the document's body "JavaScript Tetris is a fun application based on the classic addictive game we all know", and have it removed instantly with JavaScript. This way, the web crawler would pick up the description when indexing the page, but visitors would never see it.
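A minimal sketch of what I have in mind (the element id is just an example I made up):

```html
<!-- Fallback description that crawlers, and users without JavaScript, will see -->
<div id="crawler-description">
  JavaScript Tetris is a fun application based on the classic addictive game we all know.
</div>

<script>
  // Remove the description as soon as the DOM is ready,
  // so regular visitors only ever see the game itself.
  document.addEventListener('DOMContentLoaded', function () {
    var description = document.getElementById('crawler-description');
    if (description) {
      description.parentNode.removeChild(description);
    }
  });
</script>
```

A nice side effect is that the text doubles as a fallback for visitors who really do have JavaScript turned off.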

Another idea I had was to use .htaccess to quietly redirect web crawlers to a fake copy of the document that contains nothing but the description. The web crawler would then index the page with that text, but when users (who are not web crawlers) access the page, they are not redirected and thus get the real application instead. This solution scares me a little, though, as I don't think most search engines would look kindly upon being misled about the content of a webpage (I believe this is what's usually called cloaking). Indeed, this trick could be used for malicious purposes, and it wouldn't surprise me if big search engines like Google punished you by refusing to index your page at all if they found out.
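For completeness, this is roughly what such a rule could look like with Apache's mod_rewrite in an .htaccess file; the bot names and file names are only examples, not a complete or recommended setup:

```apache
# Rough sketch only: send requests from known crawlers to a static description page.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot) [NC]
RewriteRule ^tetris\.html$ /tetris-description.html [L]
```

It relies entirely on the User-Agent header, which is trivially spoofed in both directions, and that's part of why it feels like the kind of trick search engines would frown upon.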

What are your thoughts on this issue?