parsing xml/rss with curly quotes and apostrophe... trouble?


Wart_Hog
11-03-2005, 12:43 PM
Hi,

I've been reading up on xml/rss parsing.
I have no problem with the processes involved, but I have had problems dealing with the actual content of the rss feeds. A lot of the time a feed will contain curly quotes. When these quotes are parsed, strange symbols appear in their place: 'tm', a solid black square, etc. I've tried a few ways of filtering the quotes (using str_replace() on #8221) but that doesn't work at all.

Please help...

-Mike

ShrineDesigns
11-03-2005, 03:24 PM
hmm...

your problem is most likely a result of a poorly made rss feed generator that someone made without having much knowledge of xml

as stated in the xml specification (http://www.w3.org/TR/REC-xml/#syntax), & and < must always be converted to entities, and apostrophes, double quotes (these mainly matter in attributes more so than text nodes), and > should be converted too in order for the xml parser to read them correctly, except where these characters are inside a <![CDATA[]]> section, as the parser treats everything in there as plain text
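
A minimal sketch of the escaping described above, for anyone generating a feed in PHP. htmlspecialchars() and the ENT_QUOTES / ENT_XML1 flags are standard PHP (ENT_XML1 needs PHP 5.4 or later); the $title value is a made-up example, not taken from any real feed.

```php
<?php
// Hypothetical item title containing characters that are special in XML.
$title = 'Tom & Jerry\'s "best" <episodes>';

// ENT_QUOTES converts single and double quotes in addition to &, < and >.
// ENT_XML1 makes the apostrophe come out as &apos; (the XML named entity).
$escaped = htmlspecialchars($title, ENT_QUOTES | ENT_XML1, 'UTF-8');

echo "<title>" . $escaped . "</title>\n";
```

The alternative mentioned above is to wrap the text in <![CDATA[ ... ]]> and leave the characters alone, which only breaks if the text itself contains "]]>".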

Wart_Hog
11-04-2005, 09:56 AM
ShrineDesigns,

Thanks for the reply.
Basically, you're saying there is no workaround for a generator that does not escape 'evil entities'. I really wish there was...

-Mike

ShrineDesigns
11-04-2005, 01:21 PM
if they didn't convert special characters over to entities then you would have to convert them over by hand or go through the trouble of making a script to do this

post an example of the rss feed that you are having trouble with

Wart_Hog
11-04-2005, 02:11 PM
Yes, yes!!!
That's what I want to do, I'm having trouble figuring out how...
The feed(s) I've parsed always have some kind of illegal chars due to user input or bad generation. I cannot manually edit the feeds nor do I know how to tell php where/how to look for the illegal chars.

Do you know how it's done?
I have little knowledge of text types...
so, this is all new to me.

Please help if you can,
-Mike

ShrineDesigns
11-05-2005, 01:47 AM
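try

// sample input: a backtick (chr(96)) and an acute accent (chr(180)) around the text
$xml_string = chr(96) . "quote" . chr(180);
// swap each character for its numeric entity; ord(chr(180)) is used instead of
// ord("´") so the result does not depend on how this script file is encoded
$xml_string = str_replace(array("'", chr(96), chr(180)), array("&#" .ord("'"). ";", "&#" .ord("`"). ";", "&#" .ord(chr(180)). ";"), $xml_string);

for some reason the forum is converting the entities over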
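
For the curly quotes from the original question, here is a rough sketch along the same lines. It assumes the feed text is Windows-1252, where the curly quotes are the single bytes 145 to 148; that is a guess about the feed, not something confirmed in this thread. The function name curly_quotes_to_entities is made up for illustration.

```php
<?php
// Hypothetical helper: replace Windows-1252 curly quotes with numeric entities.
// Assumes $text is Windows-1252 encoded (an assumption; check the feed's encoding).
function curly_quotes_to_entities($text)
{
    $map = array(
        chr(145) => '&#8216;', // left single quote
        chr(146) => '&#8217;', // right single quote / apostrophe
        chr(147) => '&#8220;', // left double quote
        chr(148) => '&#8221;', // right double quote
    );
    // strtr() with an array replaces every key with its value in one pass.
    return strtr($text, $map);
}

$sample = chr(147) . "quote" . chr(148);
echo curly_quotes_to_entities($sample), "\n"; // &#8220;quote&#8221;
```

If the feed is really UTF-8, do not use this as is: bytes 145 to 148 can be part of multi-byte characters there, so replacing them would corrupt the text. In that case the odd symbols are more likely a charset mismatch between the feed and the page that displays it.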