Doctype Character Issue
capneb
09-12-2005, 09:52 AM
I've noticed, since I've started using Safari v2.0 (default encoding set to Unicode 8 - but it should take an ISO page and read it correctly if the doctype is declared properly), that certain pages are now displaying a sort of question mark character with a diamond-shaped background where another character ought to be.
This seems only to occur on pages whose doctypes are either undeclared or are of the doctype <!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN.>, where the last character of the word Transitional is an "l" which is apparently not a traditional ASCII "l" but one which is shaped like a tall, vertical "z" and has a dash through it. When compared to the "l" in public, it is distinctly different.
One such page is c i n g u l a r d o t c o m (no affiliation here). Those using Safari 2, view their home page (which is in 4.01 Strict), and then go to shop for phones and view the source. (It's happened on other sites as well.)
What's this all about? Is this some doctype hack I'm not aware of, or a bug with Safari 2 (Firefox does not seem to have this problem)?
Jeff Mott
09-12-2005, 10:36 AM
> default encoding set to Unicode 8 - but it should take an ISO page and read it correctly if the doctype is declared properly
Explicitly setting the character encoding in your browser will often override all other methods of determining the character encoding. If I tell FF to read the page as UTF-8, the same effect happens.
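The effect described here can be reproduced outside a browser. What follows is only a minimal Python sketch, assuming a page whose bytes are ISO-8859-1 and a reader that forces UTF-8 on them; the sample text is made up for illustration.

```python
# Sample text is hypothetical; any non-ASCII Latin-1 character behaves the same way.
page_text = "café"
raw_bytes = page_text.encode("iso-8859-1")  # bytes as an ISO-8859-1 page would be served

# Forcing UTF-8 on those bytes: the lone 0xE9 byte is not valid UTF-8,
# so the decoder substitutes U+FFFD (usually drawn as a question mark in a diamond).
forced_utf8 = raw_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the page was actually written in recovers the text.
correct = raw_bytes.decode("iso-8859-1")

print(repr(forced_utf8))
print(repr(correct))
```

The bytes never change between the two calls. Only the encoding used to interpret them does, which is why the same page can look fine in one browser configuration and broken in another.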
capneb
09-12-2005, 09:58 PM
Yup. That's what it is. L/ l/ l-slash whatever. Kwak’wala Unicode (http://www.languagegeek.com/keyboardmaps/kwakwalakbd.html). It is a distinct letter from L in some language(s). It is formed by typing L then slash. Causes issues in some browsers set to use Unicode as their encoding type when the doctype is "...Transitional//EN" or similar.
Solutions? Maybe add a space after the Transitional?
Any 'special' characters should be changed to Unicode entities.
It is not advisable to alter the DTD as it may then be read incorrectly by the browser.
The site you mentioned with the incorrectly displayed characters is the fault of the designer; they don't know the difference between HTML and XHTML, nor do they have sufficient knowledge of DTDs, Unicode and rendering modes.
capneb
09-13-2005, 08:23 AM
I don't know. I think it may be a browser bug. FF doesn't seem to do it, nor does IE. I'm no expert on this, but I surmise that browsers ought to read doctypes in plain ASCII characters and not apply Unicode, or whatever the user has specified as his/her preferred encoding, until the browser has interpreted the intended doctype from the page. It's only one line after all.
I disagree with "The site you mentioned with the incorrectly displayed characters is the fault of the designer..." The problem occurs anytime the doctype contains "Transitional//*" (splat being the language). GoLive, Dreamweaver, FrontPage, etc all insert the doctype that is either default or chosen. Some doctypes contain l/ in the quotes as required by web standards.
We will resist M$IE with everything we have. It is not futile.
If indeed the error only occurs with "Transitional//EN", which I doubt, then it is a very serious error in Safari.
The DTD has no effect on the characters being displayed. That is determined by charset, encoding and font-family, either set in the document or browser options.
Jeff Mott
09-13-2005, 09:43 AM
capneb, I checked the page in Safari 2 and found the problem to be exactly as I had described to you earlier. You said you have the encoding set to UTF-8. Do not do this. When you set this you are telling the browser to read the page as if it were UTF-8 even if it was not actually encoded that way. Change the text encoding value to "Default" instead.
capneb
09-13-2005, 10:00 AM
Safari doesn't have default Encoding as an option. I think Safari is supposed to read the declaration, and then either use the declared doctype, or, if none is declared, use the user preferred one. Since the ell-slash is being misread by Safari, there are bound to be some pages where strange characters appear.
I chose Unicode, as it's what I thought would be the most "forward looking" encoding. I'm not too annoyed by this issue, since it's easy enough to switch in the event I pick up a screwy page, but it may be yet another issue developers have to struggle with when coding for multiple browsers.
Incidentally, if I choose "Western (ISO Latin 1)" encoding, some transitional//EN pages display with conjoined AE characters and such. It's a bug with the browser (I've reported it). Apple, are you listening?
capneb
09-13-2005, 10:03 AM
BTW, I've started using <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
No ell-slash there!
Cheers.
Jeff Mott
09-13-2005, 10:49 AM
> Safari doesn't have default Encoding as an option
It does. I've attached a screen shot to this post.
> I think Safari is supposed to read the declaration, and then either use the declared doctype, or, if none is declared, use the user preferred one. Since the ell-slash is being misread by Safari, there are bound to be some pages where strange characters appear
Browsers do not determine the character encoding from the doctype at all. The doctype declares the version of HTML being used, not the character encoding of the document.
> I chose Unicode, as it's what I thought would be the most "forward looking" encoding
You cannot tell your browser to use only one character encoding for every page and expect them all to work correctly. This is not a bug in Safari, this is a misconception on your part.
> BTW, I've started using
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
You don't just start using a doctype. If you plan to use an XHTML doctype then the content of your pages must conform accordingly.
In addition, you should have a very good reason why you need to use module-based XHTML. And if you don't know what module-based XHTML is then this certainly isn't the doctype for you.
capneb
09-13-2005, 11:10 AM
Well all righty then. Excuse me.
I have now learned, from the COURTEOUS CONSULTATION of Jeff, that when I select View>Text Encoding it was set to default the entire time. But in Preferences, it is set to Unicode. Allow me to include a screenshot, as well... if I can figure this out, since I am obviously NOT VERY SMART.
Maybe you could also explain to me what's going on here, given that View>Text Encoding is set to Default.
Jeff Mott
09-13-2005, 12:03 PM
I was courteous the first two times, but instead you continued to insist that it is a bug in the browser or a problem in the doctype; neither of which is the real issue. The default encoding is used when the page does not declare what encoding was used, either in the document itself or the HTTP headers. So the browser effectively has to guess. The ISO Latin 1 encoding is the default choice because currently it is the most common. And it is even the best practical choice because sites encoded with some Unicode-based encoding are unlikely to omit a character encoding declaration, since pages rarely render properly otherwise.
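The order of precedence described in this thread (an explicit user override, then the HTTP headers, then a declaration in the document, then the user's default) can be sketched roughly as below. This is a simplified illustration and not how Safari or any other browser is actually implemented; `pick_encoding` and its parameter names are hypothetical, and real browsers also consider byte-order marks and other heuristics.

```python
import re

def pick_encoding(body, content_type_header=None, user_default="iso-8859-1", forced=None):
    """Hypothetical, simplified sketch of choosing how to decode a page."""
    # 1. An explicit choice in the browser menu wins over everything else.
    if forced:
        return forced.lower()

    # 2. A charset parameter in the HTTP Content-Type header.
    if content_type_header:
        m = re.search(r"charset=\"?([\w.:-]+)", content_type_header, re.IGNORECASE)
        if m:
            return m.group(1).lower()

    # 3. A <meta> declaration near the start of the document.
    head = body[:1024].decode("ascii", errors="replace")
    m = re.search(r"<meta[^>]*charset=[\"']?([\w.:-]+)", head, re.IGNORECASE)
    if m:
        return m.group(1).lower()

    # 4. Nothing declared: fall back to the user's default. The doctype is never consulted.
    return user_default.lower()

# A page with a doctype but no encoding declaration (hypothetical markup).
undeclared = b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><body>caf\xe9</body></html>'
# A page that declares its encoding in a <meta> element.
declared = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></head></html>'

print(pick_encoding(undeclared, user_default="utf-8"))
print(pick_encoding(declared, user_default="utf-8"))
print(pick_encoding(declared, content_type_header="text/html; charset=windows-1252"))
print(pick_encoding(declared, forced="UTF-8"))
```

In this sketch the undeclared page ends up with whatever the user set as the default, which matches the behaviour discussed above when the default is set to UTF-8.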
capneb
09-13-2005, 12:49 PM
I did continue to assert my thought that it may be a browser bug. I still think it might be. Maybe I'm missing something. The way I understand it is if a browser's default character encoding type is set to Unicode, but a page declares itself to be ISO-8859-1, Safari normally says, "Ok, this is an HTML page with ISO-8859-1 encoding, so that's how I'll work with it." But I do believe that there is a bug in how or when Safari chooses to read the doctype.
Why does an l-slash seem to cause it to ignore the DTD and act as if it wasn't declared, thereby reverting to the user chosen "default" encoding, which may be contrary to the intended one?
The DTD has nothing to do with the encoding. The DTD determines how elements are laid out.
You are correct in how Safari chooses the encoding, but the font-family has an effect on the displayed characters. Some fonts will not display all characters correctly. Again, if the font-family is not set in the document, the browser default will be used. If the font and encoding used are not suitable for the characters used in the document, you will see those question marks.
capneb
09-13-2005, 01:16 PM
Thank you
Jeff Mott
09-13-2005, 03:47 PM
> The way I understand it is if a browser's default character encoding type is set to Unicode, but a page declares itself to be ISO-8859-1, Safari normally says, "Ok, this is an HTML page with ISO-8859-1 encoding, so that's how I'll work with it."
This is correct. However, that cingular page you referred to did not declare itself to be ISO-8859-1; it didn't declare itself to be anything. Thus Safari resorted to the default encoding, which you have set to UTF-8. Unfortunately the cingular page was not actually encoded with UTF-8, thus you got odd looking characters. The fault is with the page (not the browser) for not declaring what its character encoding is. (I did say in my first response that FF does the same thing if set to interpret the page as UTF-8.)
Given all that, ISO-8859-1 is the safer choice for a default encoding, simply because authors of ISO-8859-1 documents are the ones more likely to omit character encoding declarations. Documents encoded with some form of Unicode will almost certainly include a declaration.
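For page authors, the fix is to declare the encoding and make sure the bytes really are in that encoding. The following is a minimal Python sketch of that idea; `build_page` is a hypothetical helper and the markup is just a bare HTML 4.01 Transitional skeleton, not taken from any site mentioned in this thread.

```python
def build_page(body_text, encoding="iso-8859-1"):
    """Return page bytes whose <meta> declaration matches their actual encoding.

    Hypothetical helper for illustration only.
    """
    html = (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
        '"http://www.w3.org/TR/html4/loose.dtd">\n'
        "<html>\n<head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={encoding}">\n'
        "<title>Example</title>\n"
        "</head>\n"
        f"<body><p>{body_text}</p></body>\n"
        "</html>\n"
    )
    # Encode with the same encoding that the page declares.
    return html.encode(encoding)

page = build_page("café")
print(page.decode("iso-8859-1"))
```

The same declaration can also be sent as a `charset` parameter on the HTTP `Content-Type` header. Either way, the browser no longer has to fall back to the user's default.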
capneb
09-13-2005, 04:00 PM
Now it all makes sense. The developer/WYSIWYG editor didn't do his/her/its part. One should never rely on the default when creating a page - make sure everything is spelled out explicitly. Thanks for clearing things up.