Understanding character encoding
I am confused about character encoding. I have the following questions; please answer them in detail and provide links that explain the basics. I did a lot of research, but nothing I found explains it properly.
1) When I use the default character encoding, it is UTF-8. In this encoding, some bits are assigned to each character. When I type on the keyboard, whatever character I type is automatically represented by the bits assigned to it and shown on the screen.
When I want to change the input language in Windows 7, I can change it under the Region and Language option. But the encoding will still be UTF-8.
When I select a different input language in Windows 7, it works fine in Notepad and WordPad (the characters of the selected language are displayed). But when I use a different text editor like Notepad++ or TextPad, it just types ??? (the characters of the selected language are not displayed). Why is that?
2) If I want to use a different encoding, that encoding will assign different bits to each character than UTF-8 does. How can I apply this new encoding to the input device (keyboard), given that the keyboard is already typing according to UTF-8? If my understanding is correct, this problem does not affect end users who are reading a page in an encoding other than UTF-8, because the meta tag in the HTML header tells the client browser which encoding to use, and the browser applies it automatically. But how can a person who wants to type in this different encoding do that?
3) Secondly, if I want to type Hindi characters using a different character encoding, how can I do that if I don't have a Hindi keyboard? Do I need to type in the hex code for each character?
This chapter in Dive into Python 3 explained the concept of Unicode quite well, I thought: http://getpython3.com/diveintopython3/strings.html
(If you have no interest in Python, just stop reading at the bit where he starts to include Python code examples ;-))
Re: your question about Notepad, Notepad++ and Textpad:
There's some discussion of it here: http://superuser.com/questions/21135...ext-in-notepad
I don't really use Notepad++ myself, but I've just installed it to have a quick look. I can't seem to input Unicode characters using the keyboard in Windows 8, but if I copy and paste into Notepad++ under default settings I get question marks or other random punctuation. If you select Encoding > Encode in UTF-8, that seems to sort it out.
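To illustrate where the question marks probably come from, here is a rough Python sketch. It assumes the editor is storing the text in a legacy single-byte code page (I've picked cp1252 as an example; I haven't confirmed that this is what Notepad++ actually defaults to). Such a code page has no slot for Devanagari letters, so each one gets swapped for "?", whereas UTF-8 can represent them all.

```python
# Hindi sample word (namaste), written with escapes so the source file's
# own encoding doesn't matter.
text = "\u0928\u092e\u0938\u094d\u0924\u0947"

# UTF-8 can represent every Unicode character; each of these takes 3 bytes.
utf8_bytes = text.encode("utf-8")

# cp1252 is only an assumed example of a legacy code page. It has no
# Devanagari characters, so errors="replace" substitutes "?" for each one.
legacy_bytes = text.encode("cp1252", errors="replace")

print(len(text), "characters")
print(len(utf8_bytes), "bytes in UTF-8")
print(legacy_bytes)

# Round trip: UTF-8 gives the original text back, the legacy bytes do not.
print(utf8_bytes.decode("utf-8") == text)
print(legacy_bytes.decode("cp1252") == text)
```

Once the characters have been replaced with "?" the original text is gone, which is why switching the document to UTF-8 before typing or pasting is the safer order.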
Question (2) - I don't really know.
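One part of it can at least be shown: the same character really does come out as different bytes under different encodings. As far as I understand it, the keyboard itself isn't typing "in UTF-8"; the program receives characters and decides how to turn them into bytes when it saves. A small Python sketch of that idea (the choice of character and of encodings is just mine, for illustration):

```python
# One character, several encodings: the character stays the same,
# only the bytes used to store it change.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

encodings = ("utf-8", "utf-16-le", "cp1252", "iso8859-1")

# Map each encoding name to the hex form of the bytes it produces.
encoded = {enc: ch.encode(enc).hex() for enc in encodings}

for enc, hex_bytes in encoded.items():
    print(f"{enc:10} {hex_bytes}")
```

So choosing an encoding is normally something you do in the editor (or in the code that writes the file), not something you configure on the keyboard.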
(3) You may have to learn the hexadecimal codes, or use an application like Character Map, or copy and paste them from elsewhere. None of those solutions sounds brilliant.
There's a guide here which may be of some help: http://en.wikipedia.org/wiki/Unicode_input
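If you do end up working from the hexadecimal codes in the charts, a few lines of Python can turn them into actual characters that you can then copy and paste. This is only a sketch: `from_hex` is a helper name I made up, and the code points are the ones for a sample Hindi word taken from the Devanagari block.

```python
import unicodedata


def from_hex(*codes):
    """Build a string from hexadecimal Unicode code points, e.g. "0928"."""
    return "".join(chr(int(code, 16)) for code in codes)


# Code points for a sample Hindi word (namaste), as listed in the
# Devanagari chart.
word = from_hex("0928", "092E", "0938", "094D", "0924", "0947")

# Show each character's code point and its official Unicode name.
for ch in word:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Write the result to a UTF-8 file so it can be opened in an editor.
# The file name is arbitrary.
with open("hindi_sample.txt", "w", encoding="utf-8") as f:
    f.write(word)
```

It's still not exactly convenient, but it avoids relying on the Alt + number method working.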
Actually, I can't seem to do Unicode at all through the usual keyboard entry method of Alt + Hexadecimal Number, but I'm using Windows 8.
It's much easier on Linux. Sigh.
P.S. Charts of Unicode characters are available here: http://www.unicode.org/charts/
Perhaps these will be of some use...