sirosman
01-13-2007, 12:26 AM
Ok,
I think this is the best place for me to ask for some advice… so here we go:
My company has recently signed an agreement with the federal government on some web content syndication. As exciting as this might be, I am looking for ways to use this information on our company site.
I am a physician by training and I do a lot of medically related web design and programming (as a hobby really)… but I think I have finally gotten in over my head this time!
The information we are given is in XML format. There are well over 15000 XML files delivered to us (on a monthly basis) and we need to incorporate this data into our site. Problem is, I am not sure where to start.
The XML files are of course complex and not all share the same fields. I was originally planning to write code to dump the XML data into MySQL… well, that isn’t going to work because each XML file is unique in its field names and so forth… To give you an example: if you have a cancer information page, each line or paragraph is inserted into its own XML field… stuff like -<ItemizedList Style="bullet">text…
So, given the fact that there are 15000 of these files, and we need to make them searchable and all, can some of you give some ideas as to how you’d go about cataloging them?
What I was thinking so far was as follows:
Write code that would go through each XML file, pull out the identifying information such as article/page id, title and so forth, and store it in a MySQL table. Then, whenever a page was requested, access the corresponding XML file and display it using CSS.
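Roughly, the idea could be sketched like this (Python, purely for illustration). The `id` attribute, the `Title` element, the table layout and the function names are all placeholders, since the real field names differ from file to file, and sqlite3 stands in for MySQL only so the snippet runs on its own:

```python
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_record(xml_source):
    """Return (doc_id, title, body_text) for one XML document (str or bytes)."""
    root = ET.fromstring(xml_source)
    doc_id = root.get("id", "")                             # placeholder attribute name
    title = root.findtext(".//Title", default="").strip()   # placeholder element name
    # itertext() visits every element, so text is collected
    # no matter what the individual fields are called.
    body = " ".join(t.strip() for t in root.itertext() if t.strip())
    return doc_id, title, body


def build_catalog(xml_dir, db_path):
    """Index every *.xml file in xml_dir into a small catalog table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS catalog ("
        "path TEXT PRIMARY KEY, doc_id TEXT, title TEXT, body TEXT)"
    )
    for path in sorted(Path(xml_dir).glob("*.xml")):
        doc_id, title, body = extract_record(path.read_bytes())
        # INSERT OR REPLACE lets the monthly delivery overwrite older rows.
        conn.execute(
            "INSERT OR REPLACE INTO catalog VALUES (?, ?, ?, ?)",
            (str(path), doc_id, title, body),
        )
    conn.commit()
    return conn
```

Displaying a page would then mean looking up `path` in the catalog and loading that XML file, while the `body` column gives a search something to match against.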
(I realize this is probably very elementary!)
Any suggestions and ideas would be greatly appreciated. Thank you in advance…