get xml tag names (distinct)
adarshyam
07-02-2010, 10:54 AM
Hey guys,
I just wanted to know how to get the tag names of an XML document.
I want to migrate XML data into a database, so I need the unique XML tag names to create the columns in the database table. Below is an example. Please help me out. Thank you.
<address>
<city>LA</city>
<state>CA</state>
<street>33 Downey Ave</street>
<zipCode>90001</zipCode>
</address>
<address>
<city>Hollywood</city>
<state>CA</state>
<street>13 hollywood blvd</street>
<zipCode>90002</zipCode>
</address>
So I want the result to be:
address
city
state
street
zipCode
Charles
07-02-2010, 11:47 AM
Well, what tools do you have? PHP? Perl? JScript?
adarshyam
07-02-2010, 11:55 AM
I have access to JavaScript only. Actually, I have converted the XML to XSD, so now I am looking for a tool that will get the column names for me.
Please let me know if you know of any. Thank you.
Charles
07-02-2010, 11:57 AM
Are you using Windows? Then give me a few minutes and I'll write you something.
But first I need to find some XML.
adarshyam
07-02-2010, 12:06 PM
Yes, I use Windows. Thank you so much.
Charles
07-02-2010, 12:10 PM
Here's the no-frills version:
<?xml version="1.0" encoding="iso-8859-2"?>
<job>
<script language="JScript">
<![CDATA[
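// Load synchronously so that load() returns only once the document is parsed.
var dom = new ActiveXObject("msxml2.DOMDocument.3.0");
dom.async = false;
dom.validateOnParse = true;
dom.resolveExternals = false;
// The keys of this object serve as the set of distinct tag names.
var list = new Object();
// Walk through every file name given on the command line.
var e = new Enumerator (WScript.arguments.unnamed);
while (!e.atEnd()) {
  if (dom.load (e.item())) {
    var f = new Enumerator (dom.getElementsByTagName ('*'));
    while (!f.atEnd()) {
      list [f.item().tagName] = true;
      f.moveNext();
    }
  } else {
    // Report a missing or malformed file (StdErr is available under cscript only).
    WScript.StdErr.WriteLine (e.item() + ": " + dom.parseError.reason);
  }
  e.moveNext();
}
for (var p in list) WScript.echo (p);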
]]>
</script>
</job>
Name it "elements.wsf" and invoke it from the command line with cscript elements.wsf filename.xml
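For anyone reading along without Windows Script Host, here is a rough sketch of the same "object keys as a set" idea in plain JavaScript. The function name distinctTagNames is made up for illustration, and it uses a regular expression instead of a real XML parser, so treat it as an approximation: it does not check well-formedness and would also pick up tag-like text inside comments or CDATA.

```javascript
// Hypothetical helper (not part of the script above): collects the distinct
// element names found in an XML string, in first-seen order.
// Assumption: a regular expression is "good enough" here; it is not a parser.
function distinctTagNames(xml) {
  var seen = {};   // object keys act as a set of names already collected
  var names = [];  // distinct names in the order they first appear
  // An opening tag: "<", a name (optionally prefixed, e.g. a:b), then
  // whitespace, ">" or "/". Closing tags and "<?xml ...?>" do not match.
  var re = /<([A-Za-z_][\w.\-]*(?::[\w.\-]+)?)[\s>\/]/g;
  var m;
  while ((m = re.exec(xml)) !== null) {
    if (!Object.prototype.hasOwnProperty.call(seen, m[1])) {
      seen[m[1]] = true;
      names.push(m[1]);
    }
  }
  return names;
}
```

Calling distinctTagNames("<address><city>LA</city></address>") gives the names address and city, each listed once no matter how often the element repeats.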
adarshyam
07-02-2010, 12:39 PM
Thank you very much for the script, I appreciate it. But I am not able to understand a word of it, and I'm not sure how to execute it. I am using Dreamweaver. Where do I add this script, and how do I invoke it? I can hardly understand the code, but the most important thing is how to use it, so please let me know. Thank you very much again.
Charles
07-02-2010, 01:11 PM
You should have disclosed that you were a Dreamweaver user; I had assumed that you had some basic understanding of the concepts.
Open a text editor: Notepad, not Word, and certainly not Dreamweaver. (In fact, do yourself a favor and uninstall that abomination right now.) Copy and paste that script into Notepad and then save it as "elements.wsf" in the same folder as your XML file.
Now in Windows open up the Explorer window for that folder and then highlight and copy the address to the clipboard. You may have to turn on the address bar. (View->Toolbars->Address Bar)
Click on Start->Run and then run "cmd". Once the DOS shell has opened up type "cd ", making sure you trail with a space. Then right click next to that space and select Paste and hit return. That should navigate you to that folder painlessly. Then type "cscript elements.wsf filename.xml" where filename.xml is the name of your XML file and then hit return.
adarshyam
07-02-2010, 01:11 PM
Hey, I figured out how to invoke it. Thank you so much for the help. For those who visit this thread in the future:
I copied and pasted the script into Notepad and saved it as elements.wsf, and then
Run -> cmd -> cscript elements.wsf filename.xml (from the folder where these files are located)
Charles
07-02-2010, 01:14 PM
And if you invoke it with cscript //NoLogo elements.wsf filename.xml > headings.txt the result will be saved to the file "headings.txt".
adarshyam
07-02-2010, 01:15 PM
Wow, that's cool. Great stuff to learn. Thanks.
adarshyam
08-19-2010, 12:30 PM
Hi Charles,
Hope you are doing well. This thread came to mind when I was given a new task: I have to find all the URL references that are NOT under, say, www.games.com.
For example, zynga.games.com is a URL reference that points to a separate sub-domain hosted outside of www.games.com (it resides on Zynga's server).
I have all the pages on my local machine, and I need to run a regex program over them and put the results into either Excel or a flat file.
The expected result is something like this:
sub-domain URL     page it appears on
zynga.games.com    index.html
zynga.games.com    contact.html
.
.
.
Thank you
Charles
08-19-2010, 01:06 PM
Where are these URL references?
adarshyam
08-19-2010, 01:27 PM
They are in different pages under www.games.com. For example, there may be an image in www.games.com/index.html that links to zynga.games.com.
Charles
08-19-2010, 01:35 PM
So you want to spider the file system? Or are these listed in some XML file?
adarshyam
08-19-2010, 01:37 PM
Spider the file system, pull out the results, and save them in a flat file or an Excel sheet.
Charles
08-19-2010, 01:43 PM
How do you know which files go with what sub domain?
adarshyam
08-19-2010, 01:57 PM
That's exactly what I'm trying to find out. :)
The only option I know of is to open each .html file and look for sub-domains that do not reside on the server (e.g., zynga.games.com).
A friend of mine suggested writing a regex program that will spider the file system and find all the sub-domain links.
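The "spider the file system" half of that can be sketched in JavaScript if Node.js is available. That is an assumption: nothing in this thread uses Node.js, and this will not run under Windows Script Host. The function name findHtmlFiles is made up for illustration; it only collects the .html (or .htm) file paths under a folder and leaves the link matching to a separate step.

```javascript
var fs = require("fs");
var path = require("path");

// Hypothetical helper: returns the paths of all .html/.htm files under
// `dir`, descending into subfolders. Assumes Node.js 10.10 or later for
// the `withFileTypes` option of fs.readdirSync.
function findHtmlFiles(dir) {
  var found = [];
  fs.readdirSync(dir, { withFileTypes: true }).forEach(function (entry) {
    var full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Recurse into the subfolder and keep whatever it contains.
      found = found.concat(findHtmlFiles(full));
    } else if (/\.html?$/i.test(entry.name)) {
      found.push(full);
    }
  });
  return found;
}
```

Each returned path can then be read with fs.readFileSync(file, "utf8") and handed to whatever does the matching.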
Charles
08-19-2010, 01:59 PM
OK, if you were to have a human being do this what would the steps be?
adarshyam
08-19-2010, 02:03 PM
Open each .html page in the system (inside a folder), then look for and retrieve the links that look like xyz.games.com/abc.html or abc.games.com/xyz.html and NOT http://www.games.com/abc/contact.html
I hope that makes sense. Thank you.
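Those steps can be roughed out in plain JavaScript as a regex pass over each page's text. The names externalSubdomains and report and their parameters are made up for illustration, and a regex cannot tell a real link from a host name that merely appears in text or in a comment, so this is only an approximation of what a human would do.

```javascript
// Hypothetical helper: returns the distinct host names under `baseDomain`
// (e.g. "games.com") mentioned in `html`, leaving out `mainHost`
// (e.g. "www.games.com"). Host names are compared in lower case.
function externalSubdomains(html, baseDomain, mainHost) {
  // Escape regex metacharacters in the domain (the "." in particular).
  var escaped = baseDomain.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // One or more "label." parts followed by the base domain.
  var re = new RegExp("\\b((?:[a-z0-9-]+\\.)+" + escaped + ")\\b", "gi");
  var seen = {};
  var hosts = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    var host = m[1].toLowerCase();
    if (host !== mainHost && !seen[host]) {
      seen[host] = true;
      hosts.push(host);
    }
  }
  return hosts;
}

// Hypothetical helper: builds tab-separated "sub-domain<TAB>page" lines
// from an object that maps page names to their HTML text.
function report(pages, baseDomain, mainHost) {
  var lines = [];
  for (var page in pages) {
    if (!Object.prototype.hasOwnProperty.call(pages, page)) continue;
    externalSubdomains(pages[page], baseDomain, mainHost).forEach(function (host) {
      lines.push(host + "\t" + page);
    });
  }
  return lines;
}
```

The tab-separated lines can be written to a flat file as they are, and Excel opens such a file as two columns.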
Charles
08-19-2010, 02:04 PM
Now I get it. You can't do that with JScript unless the files are actually XHTML. For HTML you'll need Perl or PHP.
adarshyam
08-19-2010, 03:17 PM
Oh, OK. Damn, I thought there would be some magical Windows script like the one you gave me that day for the XML thing. Anyway, thank you very much.