Tracing 404 errors??
Frost
06-26-2003, 08:20 AM
My error log keeps showing errors for files such as robots.txt, 404.shtml, favicon.ico, and ikonboard.
I don't see a way to trace which files or actions are requesting these and triggering the errors. Is there a way to find what is calling these files besides searching every file on my server?
PeOfEo
06-26-2003, 08:27 AM
Yes. You could make a custom 404 page that sends you an email telling you whenever a 404 occurred. How does your error log work, though? It seems to be tracing them already.
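A minimal Python sketch of the core of such a 404 page, assuming the host runs the custom error page as a CGI script. It only builds the email; the addresses are hypothetical placeholders, and the variable names (REDIRECT_URL, REQUEST_URI, HTTP_REFERER) are what Apache usually provides, so check what your host actually sets:

```python
from email.message import EmailMessage

# Hypothetical addresses: replace with your own.
REPORT_FROM = "webserver@example.com"
REPORT_TO = "webmaster@example.com"


def build_404_report(environ):
    """Build (but do not send) an email describing one 404 request.

    `environ` is a CGI-style mapping of server variables. When Apache runs
    a script as an ErrorDocument, the originally requested path is usually
    in REDIRECT_URL; REQUEST_URI is used as a fallback (assumption: verify
    which variables your host sets).
    """
    uri = environ.get("REDIRECT_URL") or environ.get("REQUEST_URI", "(unknown)")
    referer = environ.get("HTTP_REFERER", "(none)")
    agent = environ.get("HTTP_USER_AGENT", "(none)")
    client = environ.get("REMOTE_ADDR", "(unknown)")

    msg = EmailMessage()
    msg["Subject"] = f"404 Not Found: {uri}"
    msg["From"] = REPORT_FROM
    msg["To"] = REPORT_TO
    # One line per variable, so a missing referrer shows up as "(none)".
    msg.set_content(
        f"Requested: {uri}\n"
        f"Referrer: {referer}\n"
        f"User agent: {agent}\n"
        f"Client: {client}\n"
    )
    return msg
```

A real script would pass os.environ and hand the message to smtplib.SMTP(...).send_message(), assuming the host allows outgoing mail from scripts. The referrer line is the part that tells you which page linked to the missing file.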
Frost
06-26-2003, 08:32 AM
This is exactly how my error log looks:
[Thu Jun 26 01:21:30 2003] [error] [client 63.121.**.***] File does not exist: /public_html/robots.txt
It shows which file doesn't exist but not what file called the nonexistent file.
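Since each line carries only the time, the client address and the missing path, the most the error log alone can give you is a tally of which files are missing and who asked for them. A minimal Python sketch, assuming every line follows exactly the format shown above (other Apache versions word this message differently):

```python
import re
from collections import Counter

# Matches Apache error-log lines of the form:
# [Thu Jun 26 01:21:30 2003] [error] [client 1.2.3.4] File does not exist: /path
ERROR_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[error\] \[client (?P<client>[^\]]+)\] "
    r"File does not exist: (?P<path>\S+)"
)


def missing_files(lines):
    """Yield (timestamp, client, path) for each 'File does not exist' line."""
    for line in lines:
        m = ERROR_RE.match(line)
        if m:
            yield m.group("when"), m.group("client"), m.group("path")


def tally_missing(lines):
    """Count how often each missing path was requested."""
    return Counter(path for _, _, path in missing_files(lines))
```

Note that nothing in these lines names the referring page, which is why the error log cannot answer the original question by itself.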
PeOfEo
06-26-2003, 08:34 AM
You can try tracing the referrer for that page, which shows which page the user was on beforehand. That might work.
PeOfEo
06-26-2003, 08:41 AM
Yes, I just checked: the referrer is a server variable (HTTP_REFERER). I am not quite sure whether it works for links within the same server, or whether it only tracks an outside server that refers you.
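The browser sends the referrer as the HTTP Referer header, and a CGI script sees it as HTTP_REFERER. Browsers generally send it for links within the same site as well, though it is optional and robots often leave it out. A small Python sketch that sorts a referrer value into none, internal or external; the host name is a hypothetical placeholder:

```python
from urllib.parse import urlparse


def classify_referrer(referer, own_host):
    """Return 'none', 'internal' or 'external' for an HTTP_REFERER value.

    own_host is your site's host name, e.g. "www.example.com" (hypothetical).
    An empty value or "-" (how access logs record a missing header) is 'none'.
    """
    if not referer or referer == "-":
        return "none"
    host = (urlparse(referer).hostname or "").lower()
    if host == own_host.lower():
        return "internal"
    return "external"
```

An "internal" result points straight at the page on your own site that contains the broken link.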
Frost
06-26-2003, 08:49 AM
In my Webalizer stats, robots.txt shows no referrer page, as if someone were typing it in manually, but I don't think that is the case. The other files don't appear in Webalizer at all.
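Webalizer builds its referrer and agent reports from the access log rather than the error log. If the host writes that log in Apache's "combined" format (an assumption; the plain common format has no referrer field), every 404 entry carries the Referer and User-Agent headers, and "-" means the client sent no referrer, which is usual for robots. A minimal Python sketch that lists each 404 with its referrer and agent:

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def not_found_requests(lines):
    """Yield (path, referer, agent) for every 404 in a combined-format log."""
    for line in lines:
        m = COMBINED_RE.match(line)
        if not m or m.group("status") != "404":
            continue
        # The request looks like "GET /robots.txt HTTP/1.0"; take the path.
        parts = m.group("request").split()
        path = parts[1] if len(parts) >= 2 else m.group("request")
        yield path, m.group("referer"), m.group("agent")
```

A 404 whose referrer is "-" and whose agent is a robot is not being "called" by any page on your site.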
PeOfEo
06-26-2003, 08:51 AM
Hmm, well, that assumes you made the 404 page yourself using a server-side language, of course. If you did not code it yourself, you have no way to do this. You will just need to check your pages when you receive an error report.
Frost
06-26-2003, 09:04 AM
Ok, I'll check with the tech when he gets back from being out of town. I didn't make it myself. I'm using a hosting service and it came with the error logs and error pages. I was just hoping there was an easy way to do this without having to wait for the tech.
Thanks.
Aronya1
06-26-2003, 03:26 PM
I have the same situation. My best guess is that it's a default location requested by the search engines' bots when they index a site. Maybe this is where we should be storing a file specifically for the bots, telling them which pages to index or not index, etc.
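That guess matches the Robots Exclusion Standard: before crawling, a robot asks for /robots.txt at the top level of the host, whatever page it is after, which is why the log shows /public_html/robots.txt (apparently the document root) rather than a file in a subdirectory. A tiny Python sketch of how a crawler derives that URL; the example URLs are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL a crawler checks before fetching page_url.

    The file always lives at the top level of the host, so only the scheme
    and host (including any port) are kept; path and query are dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
```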
Frost
06-26-2003, 04:54 PM
http://www.inktomi.com/slurp.html
I found this URL in the AGENT field of the stats. It has something to do with the requests for robots.txt.
I read the page but I'm still not sure what the hell Slurp is and why it's calling robots.txt.
Robert Wellock
06-27-2003, 08:23 AM
Slurp is a robot and is probably looking for a robots.txt file to see whether you allow it access. It didn't find one, so you got a 404.
Frost
06-27-2003, 08:43 AM
I've discovered that much. I still don't understand what a robot is for, though. I checked out www.robotstxt.org and it helped a LITTLE. All I did to get rid of the error was create an empty robots.txt file. I haven't decided whether or not to disallow them access. I don't know why they are accessing it.
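Python's standard library includes urllib.robotparser, which reads robots.txt the way a well-behaved robot does. The sketch below (the Slurp rule and the URLs are hypothetical examples) shows that an empty file, like the one just created, lets every robot fetch everything, while a Disallow line under one User-agent only affects that robot:

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, agent, url):
    """Return True if robots.txt content (a list of lines) lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# An empty robots.txt disallows nothing.
EMPTY = []

# Hypothetical file: keep Slurp out of /cgi-bin/, allow everything else.
BLOCK_SLURP_CGI = [
    "User-agent: Slurp",
    "Disallow: /cgi-bin/",
]
```

Only robots that choose to honor robots.txt are affected; it does not block anything by itself.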
Robert Wellock
06-27-2003, 09:01 AM
If you want to be indexed by that search engine, then allow them.
Robots can be automated programs that trawl websites to gather information.
Search engines use robots because it would require too many resources to employ thousands of people to index websites.
Also, some robots are bad, like spambots, which harvest e-mail addresses to send people spam...
PeOfEo
06-27-2003, 08:30 PM
Why can't you just check your code manually and browse your site? Does your server have less than 99.9% uptime, or what? If you have the right path, the link should work.
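Checking by hand can be partly automated. A minimal Python sketch, assuming you can supply each page's HTML, its path on the site, and the set of paths that actually exist (gathering those, for example by walking the document root, is left out). It collects href and src attributes and reports site-local links that point at missing paths. Requests with no calling page at all, such as robots fetching /robots.txt or browsers asking for /favicon.ico on their own, will not show up this way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def broken_local_links(page_path, html, existing_paths):
    """Return site-local link targets in `html` that are not in `existing_paths`.

    page_path is the page's own path (e.g. "/forum/index.html") and is used
    to resolve relative links. Links are resolved against a placeholder
    "http://localhost" base; anything that ends up on another host or
    scheme (external sites, mailto:, javascript:) is skipped.
    """
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    missing = []
    for link in collector.links:
        parts = urlsplit(urljoin("http://localhost" + page_path, link))
        if parts.scheme != "http" or parts.netloc != "localhost":
            continue  # not a link into this site
        if parts.path not in existing_paths:
            missing.append(parts.path)
    return missing
```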