UTF-8 - how do I implement this?
Rodders
01-19-2006, 07:14 AM
Hi
I am working on a website http://www.total-image-nation.co.uk/new/ and have noticed that wherever I want a '£' to appear I get a '?' instead. There are other characters that do the same.
I have been informed that I need to use UTF-8 entities in order to get this to work correctly. (see thread (http://www.webdeveloper.com/forum/showpost.php?p=477268&postcount=12))
This webpage seems to show me the correct entities for each character I need.....UTF-8 entities (http://webdesign.maratz.com/lab/utf_table/)
My query is how do I implement this?
Much of the text on the site is stored in a MySQL database which is part of the bespoke CMS that I created. The website administrator can modify this text as and when desired. It isn't going to be sensible to have them type in the entities where they want them, so I need some automated solution.
Should I modify the text before it is saved in the database table? Should I modify the text as and when it is picked out of the table? Is there a cleaner way to get these working?
I am using PHP to generate the HTML for each page.
Many Thanks
Charles
01-19-2006, 09:17 AM
Just use HTML 4.01 Strict and the entity "&pound;".
See http://www.w3.org/TR/html4/sgml/entities.html .
Rodders
01-19-2006, 02:52 PM
Thanks.
I would but I have made the site XHTML in order to get the desired layout and stylesheet behaviour. I assume it is possible to achieve within XHTML.
Also, the person who will maintain the webpages is going to have to remember to use the correct entity code......now that doesn't sound very appealing! I can't understand why a '£' character in the page source doesn't display correctly. I mean.....why do the alphanumeric characters work but not that?
Thanks for the pointer though, I'll get reading and see if I can find anything which will work for my situation.
Charles
01-19-2006, 03:01 PM
1) As far as the layout of your page is concerned, XHTML has absolutely no advantage whatsoever. There are a lot of myths out there concerning XHTML; mostly people have confused it with HTML 4.01 Strict. HTML 4.01 Strict will greatly help your layout.
2) The person maintaining the webpage is just going to have to grow up and learn one more entity.
3) Latin-based systems are based on the old ASCII. With 7 bits you can only address 128 characters but that takes care of everything on a common keyboard. With 8 bits you get 256 but there are different systems that use the extra numbers differently. UTF-8 is the same as ASCII for the first seven bits but when the eighth bit is set it goes into a different mode where the first byte establishes just how many bytes the character is going to take. Thus it can address any number of characters and include everything. Support for UTF-8 is limited by the editor, the browser and the fonts installed.
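The "first byte establishes how many bytes" rule in point 3 can be sketched as follows. This is only an illustration in Python (the site itself uses PHP); the function name `utf8_sequence_length` is made up for the example, and the check is simplified, so it does not reject every invalid lead byte.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes a UTF-8 sequence occupies, judged from its first byte."""
    if lead_byte < 0x80:            # 0xxxxxxx: plain ASCII, one byte
        return 1
    if 0xC0 <= lead_byte < 0xE0:    # 110xxxxx: start of a two-byte sequence
        return 2
    if 0xE0 <= lead_byte < 0xF0:    # 1110xxxx: start of a three-byte sequence
        return 3
    if 0xF0 <= lead_byte < 0xF8:    # 11110xxx: start of a four-byte sequence
        return 4
    # 10xxxxxx is a continuation byte and cannot start a character
    raise ValueError("not a valid UTF-8 lead byte")


for ch in ["A", "£", "€"]:
    encoded = ch.encode("utf-8")
    # Print the character, its bytes in hex, and the length read from the first byte
    print(ch, [hex(b) for b in encoded], utf8_sequence_length(encoded[0]))
```

The letter A stays a single ASCII byte, while the pound sign and the euro sign start with a byte above 127 that announces a longer sequence.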
Rodders
01-19-2006, 03:41 PM
1) OK. I didn't design the layout entirely by myself; I asked for some guidance on how to achieve my desired layout and someone in the CSS forum pointed me towards a site which had some example layouts. I altered those to suit my needs, but for some reason it wouldn't look right unless I specified XHTML. So, if it's true that HTML 4.01 Strict will greatly help my layout then I'd be happy to hear how.
2) The person maintaining the webpage will cope. It just seems silly to me that alphanumerics work how you'd want them to, but other characters don't. It's not very intuitive if you have to type &615372 to get a pound (UK money) sign. I was hoping there would be a solution, or that I could create a workaround using PHP and a search/replace function.
3) Perhaps I shouldn't use UTF-8. As I said previously, I modified someone else's layout and simply copied the headers.
Thanks for your patience.
Charles
01-19-2006, 08:38 PM
It's not very intuitive if you have to type &615372 to get a pound (UK money) sign.
Perhaps, but nothing makes more sense than typing "&pound;".
Rodders
01-20-2006, 08:28 AM
Other than typing "£" !!!!
I'm happy to tell the website editor person to learn these codes and use them. I just can't believe that it isn't possible to type "£" and get "£". There are millions of websites out there that sell stuff in British Sterling and I would bet that the content editors for those sites don't all use "&pound;".
I have been known to be wrong in the past........
In the meantime, I will develop a PHP solution for this.
Charles
01-20-2006, 10:28 AM
As I explained above, outside of the range of ASCII, there are several different systems for representing characters. ASCII runs from 0 to 127. £ is mapped to 163 in iso 8859-1 and Windows Latin-1. On some editors running in Windows you can get that with an ALT+0163 if you don't have a key for it. &pound; seems a great deal easier to remember than ALT+0163.
That glyph is also mapped to 163 in Unicode, which is what utf-8 uses. But remember, in order to allow for characters on the other side of 255 utf-8 uses bytes in the range of 128 to 255 to mean "this character takes more than one byte". So utf-8 and iso 8859-1 use different byte codes to represent that number 163, though they both map the number 163 to the same glyph, the pound sign. Again, not all editors and not all browsers support utf-8 so you might as well stick to iso-8859-1 when you can.
So, use iso-8859-1. If you have a pound key, then use it and it should work. Else just use the &pound; which will always work.
Note however and note well, it is just a happy accident that the pound sign is number 163 in both iso 8859-1 and Windows Latin 1. For the characters between 128 and 159 it is not the case.
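The numbers above can be checked directly. A minimal sketch in Python, assuming its codec names `latin-1` for iso 8859-1 and `cp1252` for Windows Latin 1; the euro sign is added here only as an example of a character that the two single-byte sets treat differently.

```python
pound, euro = "£", "€"

print(ord(pound))                 # the character number of the pound sign
print(pound.encode("latin-1"))    # iso 8859-1: one byte
print(pound.encode("cp1252"))     # Windows Latin 1: the same single byte
print(pound.encode("utf-8"))      # utf-8: two bytes for the same character

# The euro sign has a byte in Windows Latin 1 but none at all in iso 8859-1.
print(euro.encode("cp1252"))
try:
    euro.encode("latin-1")
except UnicodeEncodeError:
    print("no euro sign in iso 8859-1")
```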
toicontien
01-20-2006, 11:57 AM
This is a problem that the company I work for has been dealing with for quite some time. Being in newspaper publishing, our main concerns were using curly quotes, and the aforementioned pound sign. We added a meta tag that tells the browser to use UTF-8 and we haven't had any problems with UTF-8 characters.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
So feel free to use UTF-8 characters. HTML entities and iso-8859-1 are still the safest route, but the browsers in use today, including many text editors, don't seem to have a problem with UTF-8 characters, or any special character inserted from the keyboard in its non-entity form.
Charles
01-20-2006, 12:02 PM
As I wrote above, the pound sign is an 8859-1 character. If your editor is using 8859-1 or Windows Latin 1, specifying the encoding as utf-8, or not specifying anything in XHTML, is in error.
Jeff Mott
01-20-2006, 05:16 PM
We added a meta tag that tells the browser to use UTF-8 ...
To help clarify what Charles just said: the character encoding declared with the META element is supposed to tell the browser what your document's encoding is, not what you want it to be. If you're in Notepad and go Save As, you'll notice an Encoding field. The META element should reflect what you choose here. If it doesn't, then you're giving the browser the wrong information.
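What happens when the declared encoding and the saved encoding disagree can be reproduced in a few lines. This is a sketch in Python, not the browser's actual code path: decoding with the wrong codec stands in for the browser trusting a wrong META declaration, and `page_text` is just a sample string.

```python
page_text = "Price: £5"

utf8_bytes = page_text.encode("utf-8")       # file saved as UTF-8
latin1_bytes = page_text.encode("latin-1")   # file saved as iso 8859-1

# Saved as UTF-8 but declared as iso 8859-1: the two bytes show up as two characters.
print(utf8_bytes.decode("latin-1"))

# Saved as iso 8859-1 but declared as UTF-8: the lone byte 163 is invalid,
# so it is swapped for the replacement character, often drawn as "?".
print(latin1_bytes.decode("utf-8", errors="replace"))
```

The second case matches the symptom in the first post: a '?' wherever a '£' was typed.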
Rodders
01-21-2006, 08:30 AM
I'm sure all this is simple for you guys, but I am baffled at it all. I think Charles is saying that I could swap from UTF-8 to 8859-1 and then just use the £ glyph as I want to.
In any case, I have created a PHP workaround which searches for the 'problem' glyphs in the raw text and replaces them with suitable entities. It wasn't what I was hoping for, but I believe it will work perfectly well. We'll see.....
Thanks all for your help
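The search-and-replace workaround described above was not posted, so the following is only a guess at its general shape, sketched in Python rather than PHP. The function name `to_entities` is hypothetical, and it assumes the text has already been read from the database with the correct encoding. PHP's built-in `htmlentities()` function does a similar job.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                        # ASCII passes through untouched
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &pound;
        else:
            out.append(f"&#{cp};")                # numeric reference as a fallback
    return "".join(out)


print(to_entities("Only £5 for “today”"))
```

Doing the replacement when the page is generated, rather than before saving, keeps the stored text readable for the person editing it.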