robots.txt - allow W3C link checker / new site


CarolW
01-29-2006, 10:41 AM
I have two related questions.

1) I really don't understand the robots.txt file. I put one in recently to prevent my images from being indexed (I think). I just parroted what I could find, without understanding it.

2) Currently I have a site I'm putting together for a friend sitting in a subdirectory on my site. We're working on transferring my friend's domain registration; when it's transferred, I'll then put her site on her own domain. I will need a robots.txt file for her, too; it will probably be similar to mine.

Here's my current robots.txt file:


User-agent: Googlebot-Image
Disallow: /
User-agent: *
Disallow: /coflip/
Disallow: /cohorse/
Disallow /newsite/
Disallow: /images/


In the disallow list, "/newsite/" is the one we're transferring. As long as I leave /newsite/ files on my site, I should surely keep that statement in my robots.txt file. Presumably I can delete that statement when I delete the /newsite/ directory and all its contents from my own site. Right?

Let's assume I use this same robots.txt file (changed, though, so that it works properly) for /newsite/ when its own domain is ready and I can FTP its files to it.

How can I change this to allow the W3C link-checker access? What it can't access at the moment is the images.

Unfortunately, despite my continuing studies, I really have no idea what these statements actually mean. If somebody can point me to a resource I might have a chance of comprehending, or perhaps even try to explain it yourself, I'd be very grateful!

Thanks in advance!
Sun, 29 Jan 2006 08:37:56

Fang
01-29-2006, 01:31 PM
Try this (http://www.searchengineworld.com/robots/robots_tutorial.htm). It also has a validator for robots.txt

CarolW
01-29-2006, 02:48 PM
Dear Fang,

You Again! To the rescue AGAIN! Thanks a million! I followed the links, and also checked my robots.txt file with the validator - wonderful things, those are! Learned a few things!

I note there's no "Allow," but there are ways around that.

So here's my current robots.txt file after going through the validator:


User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /coflip/
Disallow: /cohorse/
Disallow: /newsite/
Disallow: /down/
Disallow: /images/


Note the addition of a blank line after the first User-agent statement! Yikes! So, I'm starting to catch on!
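
For anyone else reading along, here is a rough sketch in Python of the kind of checks a robots.txt validator might make. This is an assumption on my part, not the actual logic of the validator linked above; the function name `lint_robots` is made up for illustration. It looks for two slips only: a field line with no colon, and a new User-agent record that starts without a blank line before it.

```python
def lint_robots(text):
    """Return (line_number, message) pairs for two common robots.txt slips."""
    problems = []
    in_rules = False  # True once the current record has at least one rule line
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            in_rules = False  # a blank line ends the current record
            continue
        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue  # comment-only line, does not end the record
        if ":" not in line:
            problems.append((number, "missing colon after field name"))
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field == "user-agent":
            if in_rules:
                problems.append((number, "no blank line before new record"))
                in_rules = False
        else:
            in_rules = True
    return problems


# The file from the first post, exactly as it was posted.
ORIGINAL = """\
User-agent: Googlebot-Image
Disallow: /
User-agent: *
Disallow: /coflip/
Disallow: /cohorse/
Disallow /newsite/
Disallow: /images/
"""

for number, message in lint_robots(ORIGINAL):
    print(number, message)
# 3 no blank line before new record
# 6 missing colon after field name
```

Run against the file from the first post, it flags the missing blank line before "User-agent: *" and the missing colon on the /newsite/ line.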

Now what I need to know is how to give W3C permission to access my images, or even every directory and file (well, some of them, anyway).

So, I'd have to put that record before the single record that starts with "User-agent: *" and continues with its Disallow fields.

I invoked the link-checker on my site just now, as it's been a while since I've used it, and I hoped I could see the User Agent in my web logs - tomorrow. But I'd like to let them in today, if anybody knows how to specify the W3C validators as User Agents (if you follow me). Thanks again!

Sun, 29 Jan 2006 12:46:58
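
A possible answer, offered as a sketch and not a definitive fix: as far as I know, the W3C link checker identifies itself as "W3C-checklink", and a record with an empty Disallow field means "nothing is disallowed" for that agent. It is worth confirming the name against the web logs, as planned above. The Python below uses the standard library's `urllib.robotparser` to check how such a record would behave; the path `/images/logo.png` and the agent name "ExampleBot" are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The validated file from the post above, with one extra record added at the
# top for the W3C link checker. An empty Disallow value allows everything.
ROBOTS_TXT = """\
User-agent: W3C-checklink
Disallow:

User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /coflip/
Disallow: /cohorse/
Disallow: /newsite/
Disallow: /down/
Disallow: /images/
"""


def build_parser(text):
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The link checker matches its own record, so images are reachable.
print(parser.can_fetch("W3C-checklink", "/images/logo.png"))    # True
# Any other robot falls through to the "*" record and is kept out of /images/.
print(parser.can_fetch("ExampleBot", "/images/logo.png"))       # False
# Googlebot-Image is still shut out of the whole site.
print(parser.can_fetch("Googlebot-Image", "/index.html"))       # False
```

Only the first record is new. A robot uses the record that names it and ignores the "*" record, which is why the link checker gets in while everything else stays blocked.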