HTML entities in XML


ultraniblet
09-10-2006, 04:20 PM
Hello all, this is my first post so very nice to meet you!

I have an XML file which contains HTML-style entities like &trade;. I am adding a doctype to explain what the entities mean to the parser:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
Although the file is technically XML, not XHTML, everything is working fine and dandy in Mozilla and friends, but IE is throwing its usual hissy fit. So I have been considering giving IE a separate doctype, an XSL stylesheet, through sniffing:
<!DOCTYPE xsl:stylesheet [
<!ENTITY % xhtml-lat1 SYSTEM
"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
<!ENTITY % xhtml-special SYSTEM
"http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
<!ENTITY % xhtml-symbol SYSTEM
"http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
%xhtml-lat1;
%xhtml-special;
%xhtml-symbol;
]>
This works, but I am a little concerned that the poor users of my site will have to download those entity definitions with every piece of XML; that is, they don't seem to be cached.

So I guess my question is: is there an easier way to coax IE into accepting html entities in a piece of xml? Or can I cache those entity definitions on the user's machine somehow?

Thanks in advance!

Charles
09-10-2006, 04:57 PM
Actually, and this is quite embarrassing, but MSIE does much better with XML and XHTML than FF or Opera.

And yes, the XHTML entities aren't built into XML; they have to be added by a DOCTYPE, but you need the correct DOCTYPE. You can't just use the one from XHTML. Your best bet is to write a complete DTD for your XML. It's easier than you think and you will come much closer to achieving nerdvana. It's very much worth doing.

But I seem to recall, and do try this out because I may be wrong, that there is a way to use a DOCTYPE without a DTD. Let's say that the root element of your XML is foo-bar:
<!DOCTYPE foo-bar [
<!ENTITY nbsp "&#160;">
<!ENTITY copy "&#169;">
]>
Just define the ones that you want to use.
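For anyone who wants to try the internal-subset idea without typing each declaration by hand, here is a rough sketch in Python (not something from the thread). It builds a DOCTYPE like the one above from a list of entity names and checks that a parser resolves them. The helper name doctype_with_entities is made up for illustration; the code points come from Python's standard html.entities table, and the parser used is the standard library's ElementTree rather than any browser.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint


def doctype_with_entities(root_name, entity_names):
    """Build a DOCTYPE whose internal subset declares the given HTML entities.

    Hypothetical helper: each entity is declared as a numeric character
    reference, mirroring the hand-written nbsp/copy example.
    """
    lines = ["<!DOCTYPE %s [" % root_name]
    for name in entity_names:
        lines.append('<!ENTITY %s "&#%d;">' % (name, name2codepoint[name]))
    lines.append("]>")
    return "\n".join(lines)


# Declare only the entities the document actually uses.
doctype = doctype_with_entities("foo-bar", ["nbsp", "copy", "trade"])
document = doctype + "\n<foo-bar>hello&trade;</foo-bar>"

# The parser reads the internal subset and expands &trade; itself.
root = ET.fromstring(document)
print(doctype)
print(repr(root.text))
```

Because the declarations travel inside the document, nothing extra is fetched, but every document carries its own copy, so it pays to list only the entities that are really used.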

ultraniblet
09-10-2006, 06:08 PM
Hi Charles,

Thanks a lot for the ideas. The reason I was using the XHTML doctype is that the entities in my XML will be identical to XHTML, i.e. the nodes contain standard markup. As there are rather a lot, I didn't want to specify them all, and as the main W3C doctypes come prebuilt into Firefox, the user wouldn't have to download them each time. I guess my big concern is reloading 40 KB of entity definitions for ~1 KB of XML data. So I would definitely like to write my own DTD if it would be cached. Is that what will happen?

Oh, and I get the "Use of default namespace declaration attribute in DTD not supported" error if I try using any of the standard DTDs for parsing XML in MSIE. Does it let you roll your own? For example, this XML file is a no-go:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>hello&trade;</html>

Charles
09-11-2006, 04:15 AM
You can't use the wrong namespace or the wrong DTD.

Stop breaking the rules.

ultraniblet
09-11-2006, 06:25 AM
Haha yeah I guess I should, except the problem these days is there are so many different sets of rules..

ultraniblet
09-11-2006, 01:44 PM
I am now thinking the easiest solution to this problem is to convert all HTML entities into Unicode numeric character references with some PHP scripting; that way no entities need to be defined at all.
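As a rough illustration of that conversion step, here is a minimal sketch. It is written in Python rather than the PHP mentioned above, and the function name named_to_numeric is invented for the example. It assumes the entity names follow the HTML 4 set in Python's html.entities.name2codepoint, leaves the five entities that XML predefines untouched, and passes through any name it does not recognise.

```python
import re
from html.entities import name2codepoint

# The five entities XML itself predefines; these must be left as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references such as &trade; but not numeric ones like &#8482;.
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace HTML named entities with numeric character references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML's own entities and unknown names untouched.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _ENTITY_RE.sub(replace, text)


print(named_to_numeric("<html>hello&trade; &amp; &copy;</html>"))
```

Numeric references need no DOCTYPE at all, so the output should parse in any XML parser without downloading or declaring entity definitions.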