Okay, I'm new to the forums (thank you) and I have a simple, or at least seemingly simple, question about the robots.txt file for web sites.
Based on my sample below, does the user-agent that is to have access to all of the site need to be listed last in order for the whole site to be indexed? In other words, are the contents of a robots.txt file hierarchical? I'd appreciate an answer from anyone who knows; I have found nothing on the web that addresses this.
User-agent: googlebot # all services
Disallow: /private/ # disallow this directory
User-agent: googlebot-news # only the news service
Disallow: / # on everything
User-agent: * # all robots
Disallow: /something/ # on this directory
10-03-2013, 12:39 PM
No. Robots.txt is an exclusion protocol, i.e. the default is for all bots to index everything unless an exclusion applies.
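To illustrate, here's a rough Python sketch (mine, not from any spec or real parser) of how a crawler decides which group of your example applies to it: it obeys the single most specific User-agent group that matches its name, and falls back to * only when no named group matches. Real parsers match agent tokens a bit differently, so treat this as a simplification.

```python
def pick_group(groups, bot):
    """Pick the Disallow list for a crawler: the longest matching named
    User-agent group wins; '*' applies only when no named group matches.
    Simplified sketch -- real robots.txt token matching differs slightly."""
    bot = bot.lower()
    best, best_len = None, -1
    for agent, rules in groups.items():
        a = agent.lower()
        if a == "*":
            continue  # wildcard is only a fallback
        if bot.startswith(a) and len(a) > best_len:
            best, best_len = rules, len(a)
    if best is None:
        best = groups.get("*")
    return best

# The groups from the example robots.txt above.
groups = {
    "googlebot": ["/private/"],
    "googlebot-news": ["/"],
    "*": ["/something/"],
}

assert pick_group(groups, "Googlebot-News") == ["/"]          # its own group, not googlebot's
assert pick_group(groups, "Googlebot") == ["/private/"]
assert pick_group(groups, "SomeOtherBot") == ["/something/"]  # falls back to *
```

So the position of the * group in the file makes no difference: a bot that matched a named group never reads the * group at all, and anything not disallowed by its one chosen group is crawlable by default.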
10-03-2013, 12:44 PM
Jedaisoul, thanks for your reply. However, given that the last User-agent listed encompasses all bots, does it have to be listed last in the example provided?
10-03-2013, 04:46 PM
The order of the User-agent groups themselves doesn't matter, since each bot obeys only the one group that matches it. Order only matters among the rules within a group, and only if "Allow" is used, in which case the "Allow" should precede the "Disallow".
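To show why, here's a minimal sketch (my own, assuming a parser that applies rules in listed order with first match winning, as in the original robots.txt draft; Google's parser instead uses the longest matching rule, so order doesn't matter there):

```python
def first_match_allowed(rules, path):
    """Resolve a path against an ordered list of (directive, prefix) rules
    using first-match-wins semantics. With this scheme an 'Allow' must
    appear before a broader 'Disallow' to carve out an exception."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "allow"
    return True  # no rule matched: allowed by default (exclusion protocol)

# Allow listed first: /private/public/ is carved out of the disallow.
rules_ok = [("allow", "/private/public/"), ("disallow", "/private/")]
assert first_match_allowed(rules_ok, "/private/public/page.html") is True
assert first_match_allowed(rules_ok, "/private/secret.html") is False

# Allow listed after the Disallow: the broader Disallow matches first,
# so the Allow never gets a chance to apply.
rules_bad = [("disallow", "/private/"), ("allow", "/private/public/")]
assert first_match_allowed(rules_bad, "/private/public/page.html") is False
```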