serious help needed!


sirosman
01-13-2007, 12:26 AM
Ok,

I think this is the best place for me to ask for some advice… so here we go:

My company has recently signed an agreement with the federal government on some web content syndication. As exciting as this might be, I am looking for ways to use this information on our company site.

I am a physician by training and I do a lot of medically related web design and programming (as a hobby really)… but I think I have finally gone over my head this time!

The information we are given is in XML format. We receive well over 15,000 XML files (on a monthly basis) and we need to incorporate this data into our site. The problem is I am not sure where to start.

The XML files are of course complex and not all share the same fields. I was originally planning to write some code to dump the XML data into MySQL… well that isn’t going to work because each XML file is unique with its field names and so forth… To give you an example: if you have a cancer information page, each line or paragraph is inserted into its own XML field… stuff like -<ItemizedList Style="bullet">text…

So, given that there are 15,000 of these files, and we need to make them searchable and all, can some of you give me some ideas as to how you’d go about cataloging them?

What I was thinking so far was as follows:
Write some code that would go through each XML file, pull out the identifying information (article/page id, title and so forth) and dump it into a MySQL table. Then, whenever a page is requested, access the XML file and display it using CSS.
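
A rough sketch of that indexing step, in Python. Everything specific here is an assumption rather than something known about the actual files: that the document id is an `id` attribute on the root element, that the title lives in a `Title` element, and the table and function names. SQLite stands in for MySQL only so the example is self-contained; the same INSERT logic would apply with a MySQL driver.

```python
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path


def build_index(xml_dir, db_path, id_attr="id", title_tag="Title"):
    """Record (doc_id, title, path) for every XML file under xml_dir.

    id_attr and title_tag are guesses about the file layout; adjust them
    once the real structure is known.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "doc_id TEXT PRIMARY KEY, title TEXT, path TEXT)"
    )
    skipped = []  # files that could not be parsed, for manual review
    for path in sorted(Path(xml_dir).rglob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            skipped.append(str(path))
            continue
        # Fall back to the file name when the root carries no id attribute.
        doc_id = root.get(id_attr) or path.stem
        title_el = root.find(f".//{title_tag}")
        title = "".join(title_el.itertext()).strip() if title_el is not None else ""
        conn.execute(
            "INSERT OR REPLACE INTO articles VALUES (?, ?, ?)",
            (doc_id, title, str(path)),
        )
    conn.commit()
    return conn, skipped
```

The page script would then look up `path` by `doc_id` and render that one XML file on request, so the bulk of the content never has to be forced into fixed columns.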

(I realize this is probably very elementary!)

Any suggestions and ideas would be greatly appreciated. Thank you in advance…

Kor
01-14-2007, 02:05 AM
I guess you need a server-side application to insert XML into MySQL. Which languages does your server support? PHP? Java? Perl?

Anyway, I will move your thread to the SQL Forum, as I think you will find more support there. But I will also leave a redirect link here in the XML Forum.

chazzy
01-14-2007, 07:51 AM
I don't think you need a database at all :-D If these files change often, you're better off reading them from the files as needed.

If you use a well-equipped server-side language that can keep the XML files in memory (such as Java), you can easily make them searchable via the web. I would not recommend PHP at all for this, since it's stateless code.

If you are only a web designer in practice, I'd probably say it's a bit above you (not to say you're not capable, but I can't do surgery, it's above me).

NightShift58
01-14-2007, 11:29 PM
You may or may not need a database. I haven't seen the XML files so I won't venture too far.

Nonetheless, it would be worth the effort to determine if these XML files - in spite of their different field names - perhaps do share a common structure. If that were the case, even with exceptions, one could use these commonalities as a basis for building a database. It's done every day. An oversimplified example would be an address: we lump PO Box #, Suite ZZZ and R.R. 5 into fields called "address1" and "address2".
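
One way to check for that common structure is simply to count how many of the files contain each element name. A minimal sketch in Python; the function name `tag_coverage` is made up for illustration, and it takes the XML as strings only to keep the example short (in practice you would read the files from disk).

```python
from collections import Counter
import xml.etree.ElementTree as ET


def tag_coverage(xml_strings):
    """Return {tag: fraction of documents that contain that tag}."""
    counts = Counter()
    total = 0
    for text in xml_strings:
        root = ET.fromstring(text)
        # A set, so each tag counts once per document however often it repeats.
        counts.update({el.tag for el in root.iter()})
        total += 1
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}
```

Tags with coverage near 1.0 are candidates for real database columns; the long tail of rarely seen tags is where the exceptions live.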

It is also not uncommon to design database tables with fields that are seldom used, just to cover eventualities. Again, I refer to "address1" and "address2", where one doesn't create a table for records with 2 address lines and another one for records with only one address line.

I wouldn't give up on PHP too quickly, either. With PHP, your application will not be any more or less stateless than any other HTTP application. You could say it's in the nature of the beast. But that beast can be tamed.

Having said that, I don't think that the process of integrating these files into your application can or should be handled entirely online. Some of the work will have to be done in the background, on the server and outside the realm of HTTP, to pre-process the files for online consumption.
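
As an illustration of what such a background step might look like, here is a small Python sketch that strips the markup from each document and builds a word-to-document index for searching. It is a simplified stand-in, not a recommendation over a database's own full-text search; the function names and the `{doc_id: xml_string}` input shape are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_inverted_index(docs):
    """docs: {doc_id: xml_string}. Returns {word: set of doc_ids}."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        # itertext() yields the text of every element, ignoring the tags.
        text = " ".join(root.itertext())
        for word in WORD.findall(text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Run once per monthly delivery, the index could be saved to disk or to a table, so the online part only has to do lookups.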

Finally, I wouldn't let myself get discouraged. Even though 15,000 files sounds like too much to handle, once a basic concept is in place, you will only have to deal with perhaps a few hundred true exceptions and those can be solved as well.

You can do it, Doc!