I receive an HTML file as an email attachment, which I extract, process and insert into a MySQL database.
I have a MySQL table in which I have created a column to store the customer name.
I have set the column's character set and collation to:
ALTER TABLE SalesData CHANGE customer_name customer_name VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL;
I have also set the connection charset using
I have now received a customer named Iánson, which breaks the insert:
(1366) Incorrect string value: '\xE1nson' for column customer_name
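One possible reading of that error: in ISO-8859-1 and Windows-1252 the single byte 0xE1 is "á", whereas in UTF-8 the same letter is the two bytes 0xC3 0xA1. A lone 0xE1 followed by "n" is not valid UTF-8, so a utf8 column rejects it. The sketch below assumes PHP 7+ with the mbstring extension loaded, and assumes ISO-8859-1 as the attachment's charset (not confirmed by the question). It prints both byte sequences for comparison.

```php
<?php
// Sketch: compare the bytes of "Iánson" in UTF-8 and in ISO-8859-1.
// Assumption: the attachment is ISO-8859-1; requires the mbstring extension.

$utf8Name   = "I\u{00E1}nson";                                   // "Iánson" as UTF-8
$latin1Name = mb_convert_encoding($utf8Name, 'ISO-8859-1', 'UTF-8'); // same text, ISO-8859-1 bytes

echo bin2hex($utf8Name), PHP_EOL;   // 49c3a16e736f6e
echo bin2hex($latin1Name), PHP_EOL; // 49e16e736f6e

// The bytes in the MySQL error ('\xE1nson') match the ISO-8859-1 form,
// which is not valid UTF-8 on its own.
var_dump(mb_check_encoding($utf8Name, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($latin1Name, 'UTF-8')); // bool(false)
```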
I then changed this column's character set and collation to:
ALTER TABLE SalesData CHANGE customer_name customer_name VARCHAR(100) CHARACTER SET utf32 COLLATE utf32_general_ci NULL DEFAULT NULL;
I have also set the
This works: the data is inserted into the database with the name intact. BUT now, when I output the data into an HTML page (using mysqli_set_charset('UTF-32');), I am getting:
I changed the HTML charset to:
<meta http-equiv="Content-Type" content="text/html; charset=utf-32" />
but it still shows
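Two facts may be relevant to the output step: MySQL does not accept utf32 as a client (connection) character set, and the HTML Standard does not allow browsers to support UTF-32, so a charset=utf-32 meta tag is not expected to be honoured. A UTF-8 page, in turn, needs UTF-8 bytes; if the value reaching PHP still contains a raw 0xE1, it is not valid UTF-8. The sketch below uses a hypothetical helper, render_customer_name(), to show how htmlspecialchars() treats valid and invalid UTF-8 input (PHP 7+ assumed).

```php
<?php
// Hypothetical helper: escape a customer name for a page served as UTF-8.
// ENT_SUBSTITUTE makes htmlspecialchars() replace invalid UTF-8 sequences
// with U+FFFD instead of returning an empty string.
function render_customer_name(string $name): string
{
    return htmlspecialchars($name, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$valid   = "I\u{00E1}nson"; // "Iánson" as UTF-8 bytes (49 c3 a1 ...)
$invalid = "I\xE1nson";     // same name with a raw ISO-8859-1 0xE1 byte

echo '<meta charset="utf-8" />', PHP_EOL;
echo render_customer_name($valid), PHP_EOL;   // Iánson
echo render_customer_name($invalid), PHP_EOL; // "I", then U+FFFD, then "nson"

// Without ENT_SUBSTITUTE (or ENT_IGNORE), invalid input yields an empty string.
var_dump(htmlspecialchars($invalid, ENT_QUOTES, 'UTF-8')); // string(0) ""
```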
As a workaround, I have switched back to UTF-8 and implemented the following code, which replaces the \xE1 byte with a plain 'a'. But now I have to react to every new Incorrect string value error by adding the offending byte to the array, which is probably manageable but not very efficient.
// Create an array of non-UTF-8 bytes and their ASCII replacements
$regex = array(
    '/\\xE1/' => 'a',
    '/\\xE9/' => 'e',
);
foreach ($regex as $expression => $replacement) {
    $customer_name = preg_replace($expression, $replacement, $customer_name);
}
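A lookup table like this could, in principle, be replaced by one conversion step. A minimal sketch, assuming the attachment is ISO-8859-1 (Windows-1252 is a common alternative for email HTML) and that mbstring is available; to_utf8() is a hypothetical helper, not an existing PHP function.

```php
<?php
// Sketch: convert the whole string to UTF-8 instead of patching single bytes.
// Assumption: the source charset is ISO-8859-1; pass the real one if known.
function to_utf8(string $text, string $sourceCharset = 'ISO-8859-1'): string
{
    // Leave strings that are already valid UTF-8 untouched.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $sourceCharset);
}

$customer_name = to_utf8("I\xE1nson");
echo bin2hex($customer_name), PHP_EOL; // 49c3a16e736f6e ("Iánson" in UTF-8)
```

The mb_check_encoding() guard is a heuristic: some ISO-8859-1 byte sequences also happen to be valid UTF-8, so when the source charset is known for certain, converting unconditionally is the safer choice. The converted value keeps "á" intact rather than turning it into "a".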
While I can probably live with the above, I would really like to understand where this initial code is falling down.
- When I view the HTML file from the original email attachment, the character shows as expected in Chrome.
- When I process it and insert it into MySQL as utf32, the character shows as expected in phpMyAdmin.
It's only when I extract it using PHP and print it into the HTML page that I get an unknown-encoding error, and I don't know where it is breaking down.
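Since Chrome displays the attachment correctly, the file may declare its own charset, either in a <meta> tag or in the email's MIME headers. The sketch below assumes the declaration is in a <meta> tag and uses a label mbstring recognises. It reads that declaration and converts the document to UTF-8 before any fields are extracted. detect_html_charset() is a hypothetical helper, and the sample HTML is made up for illustration.

```php
<?php
// Hypothetical helper: read the charset an HTML document declares, e.g.
// <meta charset="..."> or
// <meta http-equiv="Content-Type" content="text/html; charset=...">.
// This is a simple regex heuristic, not a full HTML parser.
function detect_html_charset(string $html, string $default = 'UTF-8'): string
{
    if (preg_match('/<meta[^>]+charset=["\']?([\w-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return $default;
}

// Made-up sample: an ISO-8859-1 document containing the raw byte 0xE1.
$html = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />'
      . "<p>I\xE1nson</p>";

$charset = detect_html_charset($html);                   // "ISO-8859-1"
$utf8    = mb_convert_encoding($html, 'UTF-8', $charset); // whole document as UTF-8

echo $charset, PHP_EOL;
var_dump(strpos($utf8, "I\u{00E1}nson") !== false);      // bool(true)
```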