Unicode output problem


mameha1977
01-15-2007, 03:02 AM
I am having trouble displaying Unicode data stored in my MySQL database.

It displays OK onscreen in the browser, and the data also looks OK within phpMyAdmin. I cut and pasted the data (Korean and Chinese text) in from a Notepad file saved with 'Save As' UTF-8.

Everything is fine until I try to read the website using other tools.

For example, I have installed an open-source search spider. When it indexes my pages it handles English OK, but all the Korean etc. is stored as garbage (i.e. it looks like garbage in phpMyAdmin, and also when output to my search results pages, which are in UTF-8).

Another tool that reads the site as garbage is Sitescore (http://sitescore.silktide.com/).

This leads me to think that although things look OK in my browser, something is not quite right. I am worried that when we release the site soon it will not be indexed properly by search engines.

Some variable data from MySQL:

Variable                   Session value        Global value
character set client       utf8                 latin1
character set connection   utf8                 latin1
character set database     latin1
character set filesystem   binary
character set results      utf8                 latin1
character set server       latin1
character set system       utf8
character sets dir         /usr/local/mysql/share/mysql/charsets/
collation connection       utf8_unicode_ci      latin1_swedish_ci
collation database         latin1_swedish_ci
collation server           latin1_swedish_ci

Can anyone solve this mystery?

chazzy
01-15-2007, 07:48 AM
The first thing that jumps out at me is that you're telling it to use latin1. Unless I'm mistaken, Korean and Chinese are not Latin-based. You're going to need to figure out what value is supposed to be there.
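
To illustrate why a latin1 setting can matter here, this is a minimal Python sketch of what happens when UTF-8 bytes pass through a layer that assumes latin1. It is only an illustration under assumptions: the sample string is made up, and nothing in this thread confirms that this is the exact path the data takes on this site.

```python
# Hypothetical sample text; any Korean or Chinese string behaves the same way.
original = "한국어"

# The bytes as a UTF-8 page or client would send them.
utf8_bytes = original.encode("utf-8")

# A layer that assumes latin1 reads every byte as one separate character.
garbled = utf8_bytes.decode("latin-1")

# Undoing the wrong step recovers the text, provided no bytes were lost.
recovered = garbled.encode("latin-1").decode("utf-8")

print(len(original), len(garbled))  # 3 9
print(recovered == original)        # True
```

Each of the three Korean characters becomes three accented Latin characters, which is the kind of garbage described above. If the stored bytes survive intact, the text is usually still recoverable.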

mameha1977
01-15-2007, 07:29 PM
Yeah, I'm not sure why that's there, but it seems to be the default with MySQL. My local Windows PC and the live Unix server both have that setup.

I really can't understand why the Korean/Chinese looks OK in my browser, yet automated tools read the site as garbled text. I'm using utf-8 as the encoding in the meta tag in the head, and my browsers (IE6 and Firefox 2 on English WinXP) read it OK automatically.
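
One thing that might be worth checking, though nothing in this thread confirms it is the cause: a tool may take the encoding from the HTTP Content-Type header rather than from the meta tag, so a page can declare utf-8 in its markup while the server header says something else or nothing at all. The Python sketch below compares the two declarations. The helper names `header_charset` and `meta_charset` and the sample values are hypothetical, and the regex is a rough scan rather than a real HTML parser.

```python
import re
from email.message import Message


def header_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html_bytes):
    """Return the first charset declared near the top of the markup, or None."""
    match = re.search(rb'charset\s*=\s*["\']?([\w-]+)', html_bytes[:1024], re.I)
    return match.group(1).decode("ascii").lower() if match else None


# Hypothetical values standing in for a real response from the site.
sample_header = "text/html; charset=ISO-8859-1"
sample_page = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

print(header_charset(sample_header))  # iso-8859-1
print(meta_charset(sample_page))      # utf-8
```

If the two values disagree on the real site, a browser and a spider could reasonably end up decoding the same bytes differently.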

chazzy
01-16-2007, 07:25 AM
Well, let me ask you this.

Does the tool you're using explicitly say that it supports Korean/Chinese text? If not, there's a pretty good chance that it doesn't.