Xml reader queries..?


Dragonkai
10-02-2007, 09:26 PM
I have a big XML file, and I found out about XMLReader. I looked at my phpinfo() output and it lists libxml, but I don't see anything about XMLReader. Do I have to install it? If so, how? I'm new to extensions. By the way, I have the latest version of PHP, a standard installation with no gimmicks.
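
For what it's worth, a quick way to check from a script rather than from phpinfo() might look like the sketch below. It relies only on the built-in extension_loaded() and class_exists() functions; the variable name $hasXmlReader is just for illustration. As far as I know, XMLReader has been bundled with PHP since 5.1, so on a standard recent installation it is probably already there unless the build disabled it.

```php
<?php
// Report whether the XMLReader extension is available in this PHP build.
// $hasXmlReader is an illustrative name, not part of any API.
$hasXmlReader = extension_loaded('xmlreader') && class_exists('XMLReader');

echo $hasXmlReader
    ? "XMLReader is available\n"
    : "XMLReader is not loaded\n";
```

If this reports that the extension is not loaded, the PHP build was most likely configured without it, and the fix depends on how PHP was installed.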

Anyhow here's my second question:

Here is an example of the layout of the xml:

"
<page>
<title>thesaurus</title>
<id>20</id>
<revision>
<id>2517225</id>
<timestamp>2007-06-08T01:12:40Z</timestamp>
<contributor>
<ip>68.148.164.134</ip>
</contributor>
<text xml:space="preserve">{{see|Thesaurus}}
==English==
{{wikipedia}}
===Etymology===
16th century, from [[New Latin]] '''[[thesaurus#Latin|thesaurus]]''' &quot;treasure&quot; &lt; [[Classical Latin]] '''[[thesaurus#Latin|thesaurus]]''' &lt; {{AGr.}} '''{{polytonic|[[θησαυρός]]}}''' (thēsauros) &quot;storehouse&quot; or &quot;treasure&quot;; its current English usage/meaning was established soon after the publication of Peter Roget's ''Thesaurus of English Words and Phrases'' in 1852

===Pronunciation===
*:Rhymes: [[Rhymes:English:-ɔːrəs|-ɔːrəs]]

===Noun===
'''thesaurus''' (''plural:'' '''[[thesauri]]''' ''or'' '''[[thesauruses]]''')

# A [[publication]], usually in the form of a [[book]], that provides [[synonym]]s (and sometimes [[antonym]]s) for the [[word]]s of a given [[language]].
#:''Wiktionary is a '''thesaurus''' and dictionary''.
# {{archaic}} A [[dictionary]] or [[encyclopedia]].

====Translations====
{{top}}
*Czech: [[tezaurus#Czech|tezaurus]] {{m}}
*Dutch: thesaurus {{m}}, [[synoniemenwoordenboek]] {{n}}
*Esperanto: [[tezaŭro]]
*French: [[Dictionnaire des synonymes]]
*German: [[Thesaurus]] {{m}}
*Hungarian: [[szinonímaszótár]] [[:hu:szinonímaszótár|°]]
{{mid}}
*Japanese: [[シソーラス]] (shisōrasu), [[類語辞典]] (ruigo jiten)
*Polish: [[tezaurus]] {{m}}
*Portuguese: [[dicionário]] de [[sinônimos]] {{m}}, [[tesauro]] {{m}}, thesaurus {{m}}
*Spanish: [[tesauro]] {{m}}
*Swedish: [[synonymordbok]]
{{bottom}}

===See also===
* [[WikiSaurus]]

===External links===
''Roget's Thesaurus can be found at:''
* [http://www.bartleby.com/thesauri]
----

==Latin==

===Noun===
'''thesaurus''', '''[[thesauri]]''' {{m}}

Second declension

# [[treasure]]

[[Category:Greek derivations]]

[[de:thesaurus]]
[[el:thesaurus]]
[[fr:thesaurus]]
[[ko:thesaurus]]
[[it:thesaurus]]
[[nl:thesaurus]]
[[pt:thesaurus]]
[[ru:thesaurus]]
[[simple:thesaurus]]
[[fi:thesaurus]]
[[vi:thesaurus]]
[[tr:thesaurus]]</text>
</revision>
</page>
"


Yeah, it's a dictionary.

Anyhow, I understand that XMLReader can look for the title of the page, which here is thesaurus. What I want it to do is take the title of the page and first check whether it starts with a number. If it does, skip the whole page. So, first question: can I just use

<?php
$reader = new XMLReader();
$reader->open('dictionary.xml');

while ($reader->read()) {

    /* By the way, what does this part mean? Does the case run when nodeType is equal to 1, which is XMLReader::ELEMENT?
    Does that mean that whenever the node is a start element (which is 1), this code runs? */

    switch ($reader->nodeType) {
        case (XMLReader::ELEMENT):

            // if ($reader->...... This is where I'm stuck. I don't know the command to get the <title> of the <page>.

            break;
    }
}

How do I check whether the <title> starts with a number, skip the entire page and go to the next one if it does, and carry on with the page if it doesn't?
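
One possible shape for this, offered as a sketch rather than the definitive way to do it: nodeType 1 is XMLReader::ELEMENT, meaning the reader is positioned on a start tag, so the loop can look at the element name and call readString() to get its text. The function name readPages() and the decision to return an array of title => text are my own assumptions. It takes the XML as a string so it can be tried on a small sample; for the real file, $reader->open('dictionary.xml') would take the place of $reader->XML($xml).

```php
<?php
// Sketch: return [title => page text] for every page whose title
// does not start with a digit. readPages() is a hypothetical helper.
function readPages(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml); // for a file on disk, use $reader->open($path) instead

    $pages = [];
    $title = null; // title of the current page, or null while skipping it

    while ($reader->read()) {
        // Only start tags are interesting here (nodeType 1).
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }

        if ($reader->name === 'title') {
            $text = $reader->readString();
            // A title starting with 0-9 marks the whole page as skipped.
            $title = preg_match('/^[0-9]/', $text) === 1 ? null : $text;
        } elseif ($reader->name === 'text' && $title !== null) {
            $pages[$title] = $reader->readString();
        }
    }

    $reader->close();
    return $pages;
}
```

Here a page is "skipped" simply by ignoring its <text> until the next <title> appears, which keeps the reader streaming forward without loading the whole file.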

So that's one of my problems.

My other one is this:

A lot of the sections, like etymology and translations, start with a heading such as
"==BLAHBLAHBLAH=="
There's no start tag or end tag. I don't want the translations or most of the other sections. I just want the definitions, which start with "#" and again have no end tag. The definitions are between ===Noun=== and ====Translations====.

Is there some way of extracting the information I want?

Also note that there are two ===Noun=== sections, one at the top under ==English== and one under ==Latin==.
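
Since the headings are plain wiki markup rather than XML, one option is to walk the text line by line and remember which heading was seen last. The sketch below assumes that two equals signs mark a language, that three or more mark a section inside it, and that definitions are lines starting with "#" but not "#:" (those look like usage examples). The function name extractDefinitions() and its $language parameter are hypothetical.

```php
<?php
// Sketch: collect the "#" definition lines found under ===Noun===
// within one ==Language== block of the wiki text.
function extractDefinitions(string $wikitext, string $language = 'English'): array
{
    $definitions = [];
    $currentLanguage = null; // last ==Heading== seen
    $currentSection = null;  // last ===Heading=== (or deeper) seen

    foreach (preg_split('/\R/', $wikitext) as $line) {
        $line = trim($line);

        // Headings: the same run of "=" on both sides of the name.
        if (preg_match('/^(={2,})\s*(.+?)\s*\1$/', $line, $m)) {
            if (strlen($m[1]) === 2) {
                $currentLanguage = $m[2];
                $currentSection = null;
            } else {
                $currentSection = $m[2];
            }
            continue;
        }

        // Definitions: "#" not followed by ":", "*" or another "#".
        if ($currentLanguage === $language
            && $currentSection === 'Noun'
            && preg_match('/^#(?![:*#])\s*(.+)$/', $line, $m)
        ) {
            $definitions[] = $m[1];
        }
    }

    return $definitions;
}
```

Passing 'Latin' as the second argument would pick up the second ===Noun=== section instead, which is one way to keep the two apart.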

I've looked around everywhere and I cannot find a decent tutorial on the XMLReader class...

Thanks for any help.