Can web searches find xml data files?


wikiman
07-01-2008, 12:04 PM
I am trying to find out whether search engines can index sites that store their data in XML files and then display it using JavaScript, CSS, and master pages on ASP.NET, or by some other display method.

Thanks.

bogocles
07-03-2008, 02:09 PM
From what I understand of search engine web crawlers, I'd have to say yes. For instance, if you were to navigate to:

http://www.some-domain.com/index.xml

And that .XML page contained a processing instruction like:

<?xml-stylesheet type="text/xsl" href="index_style.xsl"?>

And the referenced XSLT's job was to turn the XML into XHTML complete with links, images, and whatnot, then yes, it should work.

I believe web crawlers, once they arrive at a page, parse the HTTP response (the page's HTML), build a list of every link they find, and then follow those links one by one, building a new list of links at each page they reach. You can imagine this could lead to a huge nested mess of links and sub-links, but as long as your site is not overly complex, a web crawler should be able to catalog it completely and fairly quickly.
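To make that concrete, here is a rough sketch of the link-following loop in Python. It is only my guess at the general idea, not how any real search engine works. The three-page site in SITE is made up, and the pages are held in a dictionary instead of being fetched over HTTP, so the example runs on its own.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Visit pages breadth-first.

    `fetch` is a stand-in for an HTTP request: it takes a URL and
    returns the page's HTML, or None if the page cannot be retrieved.
    """
    seen = {start_url}          # URLs already queued, so loops are harmless
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            absolute = urljoin(url, href)   # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical site used only for illustration.
SITE = {
    "http://www.some-domain.com/index.html":
        '<a href="a.html">A</a> <a href="b.html">B</a>',
    "http://www.some-domain.com/a.html":
        '<a href="index.html">Home</a> <a href="b.html">B</a>',
    "http://www.some-domain.com/b.html":
        "<p>No links here.</p>",
}

visited = crawl("http://www.some-domain.com/index.html", SITE.get)
```

The `seen` set is what keeps the "nested mess" under control: each URL is queued only once, even when pages link back to each other.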

As long as your XML is being turned into HTML or XHTML, this should work fine. Again, I'm not completely sure, so you might want to check. I'd also recommend reading up on web crawlers in general. The net could definitely benefit from web developers coding in a more bot-friendly manner, though.

wikiman
07-05-2008, 10:22 AM
Thanks, bogocles.

So could this work well on an extensive encyclopedic wiki site where there are thousands of data files?

bogocles
07-08-2008, 03:24 PM
Sorry my response took so long.

I would say yes, as long as the data files are being incorporated into the final page and sent as one document through the web server. In reality it doesn't matter how or where things are arranged on the server. It matters only what the client can see and access. The client can be me, you, or a Google spider. Just make sure that the final product (what is sent over HTTP) can be accessed by anybody (e.g. none of the data files sit behind some sort of authentication).
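As an illustration of "incorporated into the final page", here is a small Python sketch that reads one XML data file and produces the finished HTML a crawler would receive. The element names (article, title, body, related) are invented for the example, and on your site the same job would be done by ASP.NET, so treat this as a picture of the idea and nothing more.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data file for one wiki entry.
ARTICLE_XML = """<article>
  <title>Web crawler</title>
  <body>A program that follows links &amp; indexes pages.</body>
  <related href="/wiki/search-engine">Search engine</related>
</article>"""


def render_article(xml_text):
    """Turn one XML data file into a complete HTML document."""
    root = ET.fromstring(xml_text)
    title = escape(root.findtext("title", default=""))
    body = escape(root.findtext("body", default=""))
    # Each <related> element becomes an ordinary link a crawler can follow.
    links = "".join(
        '<li><a href="{}">{}</a></li>'.format(
            escape(item.get("href", "")), escape(item.text or "")
        )
        for item in root.findall("related")
    )
    return (
        "<html><head><title>{0}</title></head>"
        "<body><h1>{0}</h1><p>{1}</p><ul>{2}</ul></body></html>"
    ).format(title, body, links)


page = render_article(ARTICLE_XML)
```

The crawler never sees the XML file itself here, only `page`, which is plain HTML with real links in it.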

Check out this URL, too, for help in getting your site to be "crawlable".

https://www.google.com/webmasters/tools/docs/en/protocol.html
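Assuming that page describes the Sitemap protocol (an XML file listing the URLs you want crawled), here is a minimal Python sketch that builds one. The wiki URLs are made up, and only the required loc element is written for each entry.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol at sitemaps.org.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical wiki pages used only for illustration.
sitemap = build_sitemap([
    "http://www.some-domain.com/wiki/web-crawler",
    "http://www.some-domain.com/wiki/search-engine",
])
```

On a wiki with thousands of data files, the list of URLs could be generated from whatever index of entries the site already keeps.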