SSI and search engines


hypnoseeker
08-20-2004, 01:54 PM
I am a bit puzzled about how search engines deal with SSI pages. I understand that one can lift the navigation section out of an HTML page, place it in a nav.html document, and then include that code by putting
<!--#include file="nav.html" --> in place of the original code.

So far so good.

Now, the confusing part. Since the navigation code is no longer there, what happens when the search engines come to spider your page? Do they see the navigation code (links) or are the links invisible to them?

Very interested in an answer to this puzzle.

rsd
08-20-2004, 04:42 PM
It would depend on the search engine algorithm itself. Each search engine is different, so they don't all crawl the same way. If you have a meta robots tag set to nofollow, then engines that honour it definitely wouldn't follow the links, although some engines ignore this tag. You might be able to find some information on this topic at http://searchenginewatch.com/
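For illustration, here is a minimal sketch of the nofollow check rsd mentions, assuming a crawler that honours the meta robots tag (as rsd notes, not every engine does). RobotsMetaParser and may_follow_links are hypothetical names; only html.parser comes from Python's standard library.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from any <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            for directive in content.split(","):
                self.directives.add(directive.strip().lower())


def may_follow_links(html):
    """Return False if the page asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "nofollow" not in parser.directives


print(may_follow_links('<meta name="robots" content="noindex, nofollow">'))  # False
print(may_follow_links('<a href="pg1.html">page 1</a>'))                     # True
```

Note that this check has nothing to do with SSI as such: it runs on whatever HTML the server sends.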

philaweb
08-21-2004, 03:48 AM
Originally posted by hypnoseeker
Now, the confusing part. Since the navigation code is no longer there, what happens when the search engines come to spider your page? Do they see the navigation code (links) or are the links invisible to them?

One would then ask: "Why use SSI if the code will not show?"

SSI (Server Side Includes) are processed on the server. There is no such thing as "prior" to SSI: the server expands the include commands before the page is sent, so every visitor, human or robot, receives the finished HTML. Either you've got the right include commands or you don't. If the included content is "invisible" to the robot, the tags are not correct. Since the include tags are never presented to the robot, there is no way it can "unsee" the content they pull in.
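To make the "no prior to SSI" point concrete, here is a toy Python server that expands `<!--#include file="..." -->` before responding, and a client that requests the page the way a robot would. This is a sketch under stated assumptions, not how Apache's mod_include is implemented: PAGES, INCLUDES, SSIHandler and fetch_as_robot are hypothetical names, in-memory dicts stand in for files on disk, and only the `file=` form of the directive is handled.

```python
import re
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-ins for files on disk.
PAGES = {"/page.shtml": '<body>\n<!--#include file="nav.html" -->\n</body>'}
INCLUDES = {"nav.html": '<a href="pg1.html">page 1</a>'}

INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


class SSIHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        page = PAGES.get(self.path)
        if page is None:
            self.send_error(404)
            return
        # The include is expanded here, on the server, before anything is sent.
        # Toy behaviour: an unknown include file expands to nothing.
        expanded = INCLUDE.sub(lambda m: INCLUDES.get(m.group(1), ""), page)
        body = expanded.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_as_robot(path):
    """Start the toy server, request one page, and return what the client receives."""
    server = HTTPServer(("127.0.0.1", 0), SSIHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        # Bypass any proxy settings so the request goes straight to the toy server.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}{path}") as resp:
            return resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


print(fetch_as_robot("/page.shtml"))
```

Whatever client makes the request, the response already contains the nav.html link; the include comment never leaves the server.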

hypnoseeker
08-21-2004, 04:52 AM
Thank you for the answers. That has cleared up a lot for me.

In fact, this question was not an easy one to answer, because the meagre picture I painted was inadequate: at what level of detail should anyone respond?

Mea Culpa :) What I should have said was this.

==================================================
SSI is a mystery to many. In a book of 200 pages, maybe one or two pages touch on this subject... if at all.

To a complete novice, it is disconcerting to see all the inter-related links removed and replaced by a single line of code.

Faced with such a situation, the complete novice reasons that if THEY can't see the links maybe the search engines can't either.
==================================================

Had I asked the question this way, perhaps it would have been easier to answer.

In any event, your response has been spot on and I do appreciate it.


Now it is much easier to see why a complete novice would reason their way to a wrong conclusion. And that happens only because one vital bit of information is never explained clearly.

A novice does not realize that, as far as viewing the page goes, there is no such thing as "prior" to SSI.

They also do not realize that a search engine must first "open" the page (request it from the server) before it can begin indexing the contents. That request forces the server to replace the tag with the actual content the tag represents.

In other words, the search engine bot will never see the SSI tag itself... only the content that SSI tag was pointing to.

Am I correct in thinking this way?

philaweb
08-21-2004, 08:21 AM
Originally posted by hypnoseeker
In other words, the search engine bot will never see the SSI tag itself... only the content that SSI tag was pointing to.

Am I correct in thinking this way?

Bullseye! :)
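The conclusion hypnoseeker reached can be sketched with a rough stand-in for a crawler's link extractor. LinkCollector and links_seen_by_robot are hypothetical names, and real engines do far more than this; the `on_disk` string is included only to show that the include comment itself carries no links, since robots never receive that version.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Gather href values from <a> tags, roughly as a crawler would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_seen_by_robot(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# What the server sends once the include has been expanded.
served = '<body>\n<a href="pg1.html">page 1</a>\n<a href="pg2.html">page 2</a>\n</body>'
# What the file on disk contains; an HTML comment as far as any parser is concerned.
on_disk = '<body>\n<!--#include file="nav.html" -->\n</body>'

print(links_seen_by_robot(served))   # ['pg1.html', 'pg2.html']
print(links_seen_by_robot(on_disk))  # []
```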

diamonds
08-21-2004, 09:07 AM
Server-side scripting, like SSI, PHP and CGI scripts, is a bit complex to understand, but most people get it after a while ;)

Let's say this is the content on the server:

page.html:
<html><head><title>example</title></head>
<body>
<!--#include file="toc.html" -->
</body></html>

toc.html:
<a href="pg1.html">page 1</a>
<a href="pg2.html">page 2</a>
<a href="pg3.html">page 3</a>

Now, before the server sends any HTML, it parses the server-side scripting, and then it sends the resulting HTML to the client:

HTML the client receives:
<html><head><title>example</title></head>
<body>
<a href="pg1.html">page 1</a>
<a href="pg2.html">page 2</a>
<a href="pg3.html">page 3</a>
</body></html>
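The parsing step diamonds describes can be reproduced with a few lines of Python. This is a simplified sketch, assuming the Apache-style `<!--#include file="..." -->` directive and an in-memory dict in place of files on disk; expand_includes is a hypothetical helper, not part of any server, and it handles neither nested includes nor the `virtual=` form.

```python
import re

# Matches the Apache-style directive <!--#include file="..." -->
INCLUDE_DIRECTIVE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(page, files):
    """Replace each include directive with the named file's contents."""
    return INCLUDE_DIRECTIVE.sub(lambda m: files[m.group(1)], page)


files = {
    "toc.html": (
        '<a href="pg1.html">page 1</a>\n'
        '<a href="pg2.html">page 2</a>\n'
        '<a href="pg3.html">page 3</a>'
    ),
}

page_html = (
    "<html><head><title>example</title></head>\n"
    "<body>\n"
    '<!--#include file="toc.html" -->\n'
    "</body></html>"
)

print(expand_includes(page_html, files))
```

The printed result is the same HTML listed under "HTML the client receives", with no trace of the directive.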

By the way: ASP is out. SSI is out. PHP is in...

hypnoseeker
08-21-2004, 09:51 AM
It's becoming clearer and clearer by the minute :) Many thanks, diamonds.

One small point about PHP being in... true... so long as your host has register_globals turned ON.

If they have register_globals turned OFF, then it's a different ballgame altogether.

I've lost count of how many different php scripts I tried that would not work because of this restriction.

I guess it is still early days and there is no set standard for hosting companies to follow. The only solution is to find a reliable host that happens to have its PHP configuration set so that most of your PHP scripts will work.

Finding a reliable... inexpensive... PHP-compatible host is not always easy. If your host happens to fit this profile, then send them a bottle of beer at Christmas and do all you can to make them your best friend :)