[RESOLVED] US-ASCII / UTF-8
What is the biggest difference between the two, and if either, which is better to use for web pages (.php, .html, .css, .js, etc.)? Any help with this would be great! I've been trying to gain a concrete understanding by searching, but haven't found any good comparison. Any information I've gathered has been from years ago, and I was looking for an up-to-date opinion! Thanks again.
PS- I've also heard talk about ISO but again have no idea.
US-ASCII only has 128 characters (95 of them are printable). 0-31 are control codes (plus 127, the DEL control character), and 32-126 contain 0-9, A-Z, a-z and the punctuation characters that you would see on a US keyboard. The 128-255 range you sometimes see is the set of IBM extended characters (code page 437). That's a vendor extension to ASCII, not part of ASCII itself. It includes some accented characters, some box-drawing glyphs and a couple of mathematical symbols.
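The 0-127 boundary is easy to check for yourself. Here's a quick Python sketch (Python is just for illustration; the same holds in any language):

```python
# Every US-ASCII character encodes to a single byte in the range 0-127.
data = "Hello, World!".encode("ascii")
print(all(b < 128 for b in data))   # True: ASCII never uses the high bit

# Characters outside that range are simply not ASCII:
try:
    "café".encode("ascii")          # 'é' is U+00E9 (code point 233)
except UnicodeEncodeError:
    print("not ASCII")
```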
The ISO 8859 character sets (there are 15 published parts, I believe) are similar to ASCII, VISCII, and other older systems based upon "code pages", except that they have more characters: each part keeps ASCII in positions 0-127 and fills 160-255 with extra characters for a particular region (Western European Latin, Cyrillic, Greek, and so on). You basically get a different small subset of Unicode characters depending upon which part you use.
UTF-8 is a variable-width encoding of Unicode code points built from 8-bit units. Think of all of the characters in the world (Japanese, Russian, Latin, Arabic, etc.): UTF-8 encodes each one as a sequence of one to four bytes. The basic ASCII set from 0-127 maps to U+0000 - U+007F; these are the single-byte characters. From there, you can have two-byte, three-byte or four-byte characters. Various mathematical symbols, punctuation characters and combining characters are included as well. There is also UTF-16 (two or four bytes per character, though usually two) and UTF-32 (always four bytes per character, which wastes space). Those two have an associated "endianness" (little endian vs. big endian), which complicates things even further.
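You can see the one- to four-byte sequences, and the endianness problem with UTF-16, directly; another small Python sketch:

```python
# UTF-8 byte lengths grow with the code point:
for ch in ["A", "é", "日", "😀"]:
    print(ch, hex(ord(ch)), len(ch.encode("utf-8")), "byte(s)")
# "A" is 1 byte, "é" is 2, "日" is 3, "😀" is 4

# UTF-16 depends on byte order: "A" (U+0041) comes out differently
# depending on whether the low or high byte is stored first.
print("A".encode("utf-16-le").hex())   # 4100
print("A".encode("utf-16-be").hex())   # 0041
```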
UTF-8 is the preferred encoding when creating Web pages, since it allows any character to be used without endianness issues and without wasting space (ASCII text stays one byte per character), not to mention UTF-8 support is the best compared with UTF-16[LE/BE] and UTF-32[LE/BE]. As for the ISO character sets, these are specialised and ought to be discarded in favour of Unicode; their only real appeal was saving a byte here and there on accented Latin text, and a page declared as UTF-8 doesn't grow just because the encoding *can* represent Japanese or Russian. US-ASCII shouldn't be used as a declared encoding either: XML processors are supposed to fall back to it in certain specific cases, but otherwise it should be avoided like the plague.
Edit: To answer your question, use UTF-8. If you save your pages as UTF-8, make sure there is no BOM (byte order mark) at the start of the file. A BOM can cause some XML parsers to fail, and in some cases browsers might not like it in HTML, CSS, etc. (for example, IE 6 will go into quirks mode if anything appears before the DOCTYPE declaration, whether it is something perfectly legitimate like an XML prolog or illegitimate like random characters).
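For the curious, the UTF-8 BOM is the three bytes EF BB BF at the very start of the file. A Python sketch of how to spot (and strip) one before handing bytes to a parser — the `strip_bom` helper is just an illustrative name, not a standard function:

```python
import codecs

# Python's "utf-8-sig" codec writes a BOM; plain "utf-8" does not.
with_bom = "<!DOCTYPE html>".encode("utf-8-sig")
print(with_bom[:3].hex())   # efbbbf

def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

print(strip_bom(with_bom).decode("utf-8"))   # <!DOCTYPE html>
```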
Last edited by dmboyd; 04-07-2009 at 04:37 PM.
Thanks! This cleared everything up!