May 18, 1998
DR. WEBSITE:
Using the Robots META Tag; Preloading Images on a Site
By David Fiedler and Scott Clark
Dear Dr. Website®:
I read the article on your site about Robot Exclusions, but I am still a bit unclear as to their proper use. We've figured out how to keep our site at the top of the search engine listings but are experiencing difficulty controlling which pages within our site are ranked first on engines such as Infoseek, which currently tops its listing of our pages with the lamest page on our site. We'd prefer to have our home page listed first, with all of the other pages sublisted, but we seem destined to have the most insignificant and boring pages listed first. Not a good introduction to would-be customers.
We're not trying to block a search engine from listing unfinished or private pages, just ones that don't represent a good welcome for potential customers.
There are about 25 pages that we don't want mentioned. To prevent a spider from listing these pages in our site, do we place the exclusion on our home page, or do we put an exclusion at the beginning of each page we don't want the spider to register? My belief is that we put the exclusion above the <HEAD> tag. Would the code essentially look like the following?
User-agent:*
Disallow: /alienation/
Disallow: /annivrings/
Disallow: /billofrights/
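The code you've written belongs in a robots.txt file, not inside any HTML page. That file goes in the top-level (document root) directory of your Web server, where spiders look for it before they fetch anything else.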
Since we've covered this in the past, perhaps this is a good time to talk about the "robots" META tag.
You would place this tag inside the <HEAD> section (not above it) of each page that you do not want indexed. Below is the general format of the robots META tag:
<META NAME="robots" CONTENT="all | none | index | noindex | follow | nofollow">
The default value of the CONTENT attribute is "all," which lets the spider index the page and follow its links. "None" tells the spider neither to index the page nor to follow the hyperlinks on it. "Index" indicates that this page may be indexed by the spider, while "follow" means that the spider is free to follow the links from this page to other ones. The "no" forms mean the opposite; thus this META tag
<META NAME="robots" CONTENT="noindex">
would tell the spider not to index this page, but would still allow it to follow the page's links and index the pages they lead to. "Nofollow" would allow the page itself to be indexed, but its links would not be followed. You can learn more about this tag at the W3C's robot paper. For more information on META tags, you can refer to the original article.
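To make the meaning of each value concrete, here is a minimal sketch of how a spider might interpret the CONTENT value. The function name parseRobotsContent is hypothetical, and the sketch assumes that values may be combined with commas (as in "noindex, nofollow"), which goes beyond the single-value format shown above. Real spiders may behave differently.

```javascript
// Hypothetical helper: turns a robots CONTENT value into two yes/no answers.
// Assumption: several values may be combined with commas.
function parseRobotsContent(content) {
  // Start from the default, "all": index the page and follow its links.
  var result = { index: true, follow: true };
  var tokens = String(content || "").toLowerCase().split(",");
  for (var i = 0; i < tokens.length; i++) {
    var token = tokens[i].replace(/^\s+|\s+$/g, ""); // trim whitespace
    if (token === "all") {
      result.index = true;
      result.follow = true;
    } else if (token === "none") {
      result.index = false;
      result.follow = false;
    } else if (token === "index") {
      result.index = true;
    } else if (token === "noindex") {
      result.index = false;
    } else if (token === "follow") {
      result.follow = true;
    } else if (token === "nofollow") {
      result.follow = false;
    }
    // Unrecognized values are ignored, leaving the defaults in place.
  }
  return result;
}
```

For example, parseRobotsContent("noindex") leaves follow set to true, which matches the behavior described above: the page is skipped, but its links are still explored.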
Dear Dr. Website®:
How can I load graphics that appear on successive pages in a site while the home page is loading, without increasing the initial load time?
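You can preload images with JavaScript, the same way you would for mouseovers: create an Image object and assign its src, and the browser fetches the file even though it isn't displayed on the page. The images then sit in the browser's cache, ready for use on the next few pages.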
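<SCRIPT LANGUAGE="JavaScript">
<!--//
// Preload images: assigning .src starts the download into the browser's cache.
// The arguments to Image() are the width and height in pixels.
var onimage = new Image(8, 14);
onimage.src = "spiranim.gif";
var offimage = new Image(1, 1);
offimage.src = "spirit.gif";
//-->
</SCRIPT>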
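If you have more than a couple of images, the same idea can be wrapped in a small function. The following is only a sketch: the name preloadImages and its optional second argument are our own inventions for illustration, and waiting for the page's load event before starting is an assumption about how best to keep the preloading from competing with the home page's own graphics.

```javascript
// Hypothetical helper: starts loading each URL so the browser can cache it.
// ImageCtor is optional and exists only so the function can be tried outside
// a browser; in a page it defaults to the browser's own Image object.
function preloadImages(urls, ImageCtor) {
  var Ctor = ImageCtor || Image;
  var images = [];
  for (var i = 0; i < urls.length; i++) {
    var img = new Ctor();
    img.src = urls[i];   // assigning src is what triggers the download
    images.push(img);    // keep a reference to each Image object
  }
  return images;
}

// Usage in a page: wait until the home page itself has finished loading.
if (typeof window !== "undefined") {
  window.onload = function () {
    preloadImages(["spiranim.gif", "spirit.gif"]);
  };
}
```

Note that assigning window.onload replaces any handler already set there, so combine the two if your page uses one.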