Eliminating accents in strings sent by POST
delr2691
11-13-2005, 06:29 PM
Hey...
I have a form whose data, when submitted, is processed by a PHP file that tries to eliminate the accented vowels (á, é, í, ó, ú, etc.) by converting them to HTML entities,
but for some reason, PHP doesn't 'detect' the accented vowels. For example:
<?php
// Suppose that $_POST['string'] = 'Día'
// Note the accent on the i
$string = $_POST['string'];
// I've tried htmlentities() and htmlspecialchars(), but neither works
$string = str_replace('í', '&iacute;', $string);
// It returns the same string, even though it is supposed to return 'D&iacute;a'
?>
Does anyone know why this happens?
Are strings sent by POST treated differently?
Thanks in advance...
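A likely cause, sketched below under the assumption that the form page and the PHP script are saved in different encodings: str_replace() compares raw bytes. An í posted from a UTF-8 page arrives as the two bytes 0xC3 0xAD, while an í typed into an ISO-8859-1 script is the single byte 0xED, so the two look identical on screen but never match. The byte escapes stand in for real form input:

```php
<?php
// Two byte-level spellings of the same visible word "Día".
$latin1 = "D\xEDa";      // ISO-8859-1: í is the single byte 0xED
$utf8   = "D\xC3\xADa";  // UTF-8: í is the two bytes 0xC3 0xAD

// The search string as it would be stored in an ISO-8859-1 script.
$needle = "\xED";

// str_replace() works on bytes, so it only matches the ISO-8859-1 version.
$fromLatin1 = str_replace($needle, '&iacute;', $latin1);
$fromUtf8   = str_replace($needle, '&iacute;', $utf8);

echo $fromLatin1, "\n";        // D&iacute;a
echo bin2hex($fromUtf8), "\n"; // 44c3ad61 (unchanged)
```

Both inputs display as "Día" in a browser using the matching encoding, which is why the mismatch is invisible until you look at the bytes.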
bokeh
11-13-2005, 06:56 PM
<?php
print str_replace('í', '&iacute;', 'Día');
?>
That works for me. Don't forget that you need to view the page source to see this; the browser renders &iacute; as í.
delr2691
11-13-2005, 07:06 PM
Yes... I know...
It works fine when you do this:
str_replace('í', '&iacute;', 'Día');
But not when you do this:
$_POST['var'] = 'Día';
str_replace('í', '&iacute;', $_POST['var']);
NogDog
11-13-2005, 09:06 PM
Depending on what you are doing, you might just want to use htmlentities() (http://www.php.net/htmlentities).
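A minimal sketch of that suggestion. The third argument of htmlentities() names the encoding of the input, and the entity only comes out right when it matches how the bytes were actually encoded. The sample strings are assumptions standing in for posted data:

```php
<?php
// htmlentities() produces &iacute; only when told the real encoding
// of its input via the third ($encoding) argument.
$utf8   = "D\xC3\xADa"; // "Día" encoded as UTF-8
$latin1 = "D\xEDa";     // "Día" encoded as ISO-8859-1

$a = htmlentities($utf8, ENT_QUOTES, 'UTF-8');
$b = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

echo $a, "\n"; // D&iacute;a
echo $b, "\n"; // D&iacute;a
```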
purefan
11-13-2005, 10:59 PM
Already discussed here:
http://www.webdeveloper.com/forum/showthread.php?t=82796
bokeh
11-14-2005, 02:53 AM
If it does not work the same with the $_POST array, you need to echo that variable and find out what it contains. By the way, str_replace('í', '&iacute;', $_POST['var']); will not find ì or Í. Sorry if this is stating the obvious. Cheers!
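Echoing the variable can be made more precise by dumping its bytes, because the browser will render different byte sequences as the same glyph. A small sketch; dump_bytes() is a hypothetical helper, and in the real script you would pass it $_POST['var']:

```php
<?php
// Hypothetical helper: show a string's byte length and each byte in hex,
// so encoding differences are visible instead of hidden by the browser.
function dump_bytes(string $s): string
{
    // str_split() splits into single bytes; bin2hex() turns each into hex.
    $hex = array_map('bin2hex', str_split($s));
    return sprintf('%d byte(s): %s', strlen($s), implode(' ', $hex));
}

echo dump_bytes("\xED"), "\n";     // 1 byte(s): ed
echo dump_bytes("\xC3\xAD"), "\n"; // 2 byte(s): c3 ad
```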
delr2691
11-14-2005, 03:44 PM
I don't get you, bokeh...
Also, if the variable 'var' sent by POST has the value 'Día' and I use htmlentities() on it, it replaces the 'í' with something like 'Ã-' (&Atilde;&shy; in the source).
It does not replace it with &iacute;, which is what I expected...
Thank you all, guys...
Thanks!
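The Ã result is consistent with htmlentities() reading UTF-8 bytes as if they were ISO-8859-1, which was its default charset in PHP 5.0. A sketch that reproduces it, assuming the posted í is UTF-8:

```php
<?php
// "í" as it arrives from a UTF-8 form: the two bytes 0xC3 0xAD.
$posted = "\xC3\xAD";

// Treating those bytes as ISO-8859-1 turns each byte into its own entity:
// 0xC3 is Ã (&Atilde;) and 0xAD is a soft hyphen (&shy;).
$wrong = htmlentities($posted, ENT_QUOTES, 'ISO-8859-1');

// Declaring the actual encoding gives the expected single entity.
$right = htmlentities($posted, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n"; // &Atilde;&shy;
echo $right, "\n"; // &iacute;
```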
bokeh
11-14-2005, 03:57 PM
This ì doesn't have a Spanish accent, and this Í is uppercase; to PHP, neither of them is the same as this í.
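Because PHP compares bytes, each variant (í, Í, ì, ...) needs its own entry. A sketch using strtr() with an array, assuming UTF-8 input; the map is illustrative and covers only a few vowels:

```php
<?php
// Keys are UTF-8 byte escapes (assumes UTF-8 input); each accented
// variant is a different byte sequence and needs its own entry.
$map = [
    "\xC3\xA1" => '&aacute;', // á
    "\xC3\xA9" => '&eacute;', // é
    "\xC3\xAD" => '&iacute;', // í
    "\xC3\xB3" => '&oacute;', // ó
    "\xC3\xBA" => '&uacute;', // ú
    "\xC3\x8D" => '&Iacute;', // Í
    "\xC3\xAC" => '&igrave;', // ì
];

// strtr() with an array replaces all keys in a single pass.
$out = strtr("D\xC3\xADa, \xC3\x8Dndice, s\xC3\xAC", $map);
echo $out, "\n"; // D&iacute;a, &Iacute;ndice, s&igrave;
```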
delr2691
11-14-2005, 04:07 PM
Sure... but I sent letters with Spanish accents, for example áéíóú...
And if I write, for example:
// $_POST['var'] = 'í'; (vowel i with a Spanish accent)
if ($_POST['var'] == 'í') { // if the variable holds an i with a Spanish accent
    // it should be true, but it is not... I don't know why...
    echo "YES";
} else {
    echo "NO";
}
// it outputs "NO"
Thanks...
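One way to make such a comparison independent of the byte encoding, assuming you know which encoding each side uses (here: posted data in UTF-8, the script literal in ISO-8859-1), is to normalize both to HTML entities first. same_text() is a hypothetical helper:

```php
<?php
// Hypothetical helper: compare two strings after converting each to HTML
// entities using the encoding it is actually stored in.
function same_text(string $a, string $aCharset, string $b, string $bCharset): bool
{
    return htmlentities($a, ENT_QUOTES, $aCharset) === htmlentities($b, ENT_QUOTES, $bCharset);
}

$posted  = "\xC3\xAD"; // í as posted by a UTF-8 page
$literal = "\xED";     // í as written in an ISO-8859-1 script

var_dump($posted == $literal);                                 // bool(false)
var_dump(same_text($posted, 'UTF-8', $literal, 'ISO-8859-1')); // bool(true)
```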
bokeh
11-14-2005, 04:56 PM
Run the following:
<form action="" method="post">
<input type="text" name="var" value="í" readonly>
<input type="submit" value="post">
</form>
<?php
print_r($_POST);
echo "<br>\n";
var_dump(str_replace('í', '&iacute;', $_POST['var']));
?>
Paste the output into the forum.
delr2691
11-14-2005, 05:08 PM
It outputs this:
Array ( [var] => í )
string(2) "í"
And I know that it's displayed OK...
but in the source code I get this: Ã-
and I need it to be &iacute; because I'm comparing it with the source code of some pages,
like a simple search engine...
I could do it with JavaScript, but I'm worried about those who have JS disabled...
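For the search-engine use case, an alternative to converting the keyword into entities is to decode the entities in each page before searching. A minimal sketch, assuming the keyword arrives as UTF-8 and the pages use named entities such as &iacute;; page_contains() is a hypothetical helper, not part of any library:

```php
<?php
// Hypothetical helper for a simple site search: strip the markup and
// decode entities such as &iacute; so a UTF-8 keyword like "días" can match.
function page_contains(string $html, string $keyword): bool
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    return strpos($text, $keyword) !== false;
}

$html = '<body><p>Buenos d&iacute;as</p></body>';
var_dump(page_contains($html, "d\xC3\xADas")); // bool(true)
var_dump(page_contains($html, 'noche'));       // bool(false)
```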
delr2691
11-14-2005, 05:09 PM
Also, in Mozilla Firefox, I received a '?' symbol instead of the í.
SpectreReturns
11-15-2005, 12:08 AM
That means that your browser can't identify the entity. Oh, and by the way (I can't read those crazy foreign languages), in case your problem wasn't solved: you were trying to replace all the i's with their accented versions, so you would be doing the reverse of what you want, and that would be the root of your problem.
bokeh
11-15-2005, 04:18 AM
"I can't read those crazy foreign languages"
It's never too late to learn... especially such an important one. :)
Delr2691, if you have Firefox, download the LiveHttpHeaders extension and check what it is sending to the server. My opinion, though, is that the browser is not the problem. Firefox has no trouble with these characters, and it posts the more obscure characters directly as entities. I personally would take a long, hard look at the server.
ShrineDesigns
11-15-2005, 03:09 PM
"I can't read those crazy foreign languages"
lmao, I don't know much Spanish, but I can still read it.
Use htmlentities() instead of htmlspecialchars(); it converts the í to an entity.
bokeh
11-15-2005, 03:18 PM
OK, but that is no good in this case because the í is coming from a form. The character has been entered through a UA (user agent) and is already corrupt when it arrives at the script, which leads me to believe it is a server error. He is using Firefox, which has no trouble sending those characters, but as I said, they arrive corrupt. This: print_r($_POST); prints í and this: var_dump(str_replace('í', '&iacute;', $_POST['var'])); prints string(2) "í". It makes no sense... 2 characters in the string... it should be 8.
delr2691
11-15-2005, 03:50 PM
Oh yeah... it converts the string to entities, but not to the most common one, which is &iacute;
htmlentities("í");
// it returns Ã- (&Atilde;&shy; in the source) <-- I don't know what that means...
// it returns something similar, starting with Ã, for all accented vowels... I don't know how the browser can recognize them...
// Firefox does not...
But in any case... I validated it using JavaScript... but still, it's kind of a pity not to be able to solve the problem...
I think the POST array is treated in a different way...
But anyway... thank you all, guys, for helping... :D
bokeh
11-15-2005, 03:56 PM
"I think that POST array is treated in a different way"
Not on my build... What are you running? Server, Apache version, PHP version, etc.
delr2691
11-15-2005, 04:00 PM
I'm kind of lazy... so I copied and pasted the description... xD
Apache/2.0.52 (Win32) PHP/5.0.4 mod_ssl/2.0.52 OpenSSL/0.9.7e mod_perl/1.99_20-dev Perl/v5.8.6 Server at localhost Port 80
bokeh
11-15-2005, 04:20 PM
Well, that's pretty much what I'm running, except I have Apache 2.0.54, and I don't have any of these troubles. I use these characters all the time on one of my sites (http://costablancatranslations.com).
delr2691
11-15-2005, 04:33 PM
Hmmm... yeah...
But what I'm doing is a 'search engine' for my site...
This PHP script opens a specified directory and reads the contents of the <body> tags of all the documents in it...
If it finds the keyword in a file, the file is stored in an array...
but since it reads the source code, the PHP script will never find an í, only '&iacute;'.
That's why I needed to convert every accented character into its entity, but neither htmlspecialchars() nor htmlentities() worked...
bokeh
11-15-2005, 04:40 PM
Try my form again but this time slightly different as shown:
<form action="" method="post">
<input type="text" name="var" value="í" readonly>
<input type="submit" value="post">
</form>
<?php
var_dump($_POST);
echo "<br>\n";
var_dump(str_replace('í', '&iacute;', $_POST['var']));
?>
I want to see how many characters are in the POST array. It should look like this:
array(1) { ["var"]=> string(1) "í" }
string(8) "&iacute;"
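For reference, these are the byte counts that var_dump() and strlen() report for the different forms of í discussed here (a quick local check, not output from the poster's server):

```php
<?php
// strlen() counts bytes, not visible characters.
$lengths = [
    'latin1' => strlen("\xED"),     // í in ISO-8859-1
    'utf8'   => strlen("\xC3\xAD"), // í in UTF-8
    'entity' => strlen('&iacute;'), // named HTML entity
];
print_r($lengths);
```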
delr2691
11-16-2005, 02:51 PM
It returns this in Firefox:
array(1) { ["var"]=> string(3) "�" }
<br>
string(3) "�"
and this in IE:
array(1) {
["var"]=>
string(2) "í"
}
<br>
string(2) "í"
What does it mean?
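Reading those lengths as bytes: the IE result (2 bytes) is consistent with í encoded as UTF-8, and the Firefox result (3 bytes, shown as �) is consistent with 0xEF 0xBF 0xBD, the UTF-8 form of U+FFFD, the replacement character that decoders usually substitute for bytes that are invalid in the expected encoding. A small sketch that labels these sequences; identify() is a hypothetical helper:

```php
<?php
// Hypothetical helper: label the byte sequences that came up in this thread.
function identify(string $s): string
{
    switch (bin2hex($s)) {
        case 'ed':
            return 'i-acute in ISO-8859-1';
        case 'c3ad':
            return 'i-acute in UTF-8';
        case 'efbfbd':
            return 'U+FFFD replacement character in UTF-8';
        default:
            return 'unknown';
    }
}

echo identify("\xC3\xAD"), "\n";     // i-acute in UTF-8
echo identify("\xEF\xBF\xBD"), "\n"; // U+FFFD replacement character in UTF-8
```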
bokeh
11-16-2005, 03:15 PM
"What does it mean?"
Well, since neither Firefox nor IE causes this, it can only be corruption being added by the server. I would check that httpd.conf contains the following line and that it is not commented out:
AddCharset ISO-8859-1 .iso8859-1 .latin1
ShrineDesigns
11-16-2005, 03:54 PM
I tested this on my local server (Win2k SP4, PHP 5.0.5, Apache 1.33):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Untitled Document</title>
</head>
<body>
<pre><?php
if($_POST)
{
echo htmlspecialchars(htmlentities($_POST['t'], NULL, 'iso-8859-1'));
}
?></pre>
<form action="" method="post">
<input name="t" type="text" value="sí">
<input name="s" type="submit" value="submit">
</form>
</body>
</html>
which outputs s&iacute;
Also, make sure you set the charset argument to match the page's encoding, especially if you use UTF-8.
bokeh
11-16-2005, 04:00 PM
"I tested this on my local server ... which outputs s&iacute;"
OK! Same for me! So what do you think might be corrupting this variable for him?
ShrineDesigns
11-16-2005, 04:07 PM
It is probably the page encoding: htmlentities() and htmlspecialchars() default to iso-8859-1.
If I set the page encoding to UTF-8 and use htmlentities() without the charset argument, it outputs s&Atilde;&shy;
delr2691
11-16-2005, 04:14 PM
Nooooo...
It outputs this for me:
s�
And the source code:
s&iuml;&iquest;&frac12;
delr2691
11-16-2005, 04:15 PM
And in IE:
sÃ
Source code:
s&Atilde;&shy;
ShrineDesigns
11-16-2005, 04:18 PM
What page encoding are you using?
bokeh
11-16-2005, 04:21 PM
Diego, have you got a URL you can post? I want to see what headers are being served with the page.
bokeh
11-16-2005, 04:24 PM
"What page encoding are you using?"
I think it is possible he doesn't know. If the server is sending a charset in the headers, that will override anything contained in the page.
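Since a charset in the HTTP header takes precedence over a <meta http-equiv> tag in the document (a byte-order mark aside), the header value is the one worth checking. A small sketch for pulling the charset out of a Content-Type value; charset_from_content_type() is a hypothetical helper and the sample values are made up:

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type
// header value. Returns null when no charset is given.
function charset_from_content_type(string $value): ?string
{
    if (preg_match('/;\s*charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $value, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

var_dump(charset_from_content_type('text/html; charset=UTF-8'));        // string(5) "utf-8"
var_dump(charset_from_content_type('text/html; charset="iso-8859-1"')); // string(10) "iso-8859-1"
var_dump(charset_from_content_type('text/html'));                       // NULL
```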
ShrineDesigns
11-16-2005, 04:29 PM
It is easy: open the page in a browser and check
IE:
View > Encoding > (the marked charset)
Mozilla:
View > Page Info > Encoding
delr2691
11-16-2005, 04:31 PM
iso-8859-1
ShrineDesigns
11-16-2005, 04:36 PM
Hmm...
This is odd.
The only thing I could suggest is reconfiguring your server or using mb_convert_encoding($str, 'iso-8859-1', 'auto')
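mb_convert_encoding() (mbstring extension) or iconv() is what you would normally use for this. Purely for illustration, the sketch below spells out what a UTF-8 to ISO-8859-1 conversion does for the characters in this thread, assuming the input contains nothing above U+00FF; utf8_to_latin1() is a hypothetical name:

```php
<?php
// Hypothetical helper: convert UTF-8 text to ISO-8859-1, assuming every
// character is at most U+00FF. Any other byte becomes '?'.
function utf8_to_latin1(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            // ASCII bytes are identical in both encodings.
            $out .= $s[$i];
        } elseif (($b === 0xC2 || $b === 0xC3) && $i + 1 < $len
            && (ord($s[$i + 1]) & 0xC0) === 0x80) {
            // 110000xx 10yyyyyy encodes U+0080..U+00FF as code point xxyyyyyy.
            $out .= chr((($b & 0x03) << 6) | (ord($s[$i + 1]) & 0x3F));
            $i++; // skip the continuation byte just consumed
        } else {
            // Outside Latin-1 or malformed: substitute a placeholder.
            $out .= '?';
        }
    }
    return $out;
}

echo bin2hex(utf8_to_latin1("D\xC3\xADa")), "\n"; // 44ed61
```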
delr2691
11-16-2005, 04:39 PM
Should I send a header with PHP defining the correct encoding? Which one would it be?
delr2691
11-16-2005, 04:41 PM
Hey...
I wrote these lines at the top of my document and it works now...
<?php
ob_start();
header("Content-type: text/html; charset=iso-8859-1");
?>
Or at least it now returns:
s&iacute;
as expected...
Thank you all... :)
bokeh
11-16-2005, 04:41 PM
I might be barking up the wrong tree here, but if the document contains one encoding and the headers contain another, the headers win. As far as I understand, the encoding you are seeing there is the one in the document.
It would be really helpful if Diego were to post a link so we can actually see what we are dealing with.
ShrineDesigns
11-16-2005, 04:43 PM
That may help; it depends on what you want to use. I would go with either ISO-8859-1 or UTF-8.
If I remember correctly, it is header("Content-Type: text/html; charset=iso-8859-1"); (Content-Encoding is for compression such as gzip, not for the character set).
EDIT: I know that the <meta> tag only applies when no charset is sent in the header; if the header (from the server, PHP, or your own header() call) carries a charset, it overrides the <meta>.
delr2691
11-16-2005, 04:46 PM
https://diegoweb.bounceme.net/diego/buscar/borrar.php
Now it works... with the header I sent defining the encoding...
But why was another encoding (UTF-8) being sent in the header?
In my document I defined it as iso-8859-1...
ShrineDesigns
11-16-2005, 04:50 PM
It comes up as iso-8859-1 and outputs s&iacute;
delr2691
11-16-2005, 04:52 PM
Yep... that's because I already wrote this:
<?php
ob_start();
header("Content-type: text/html; charset=iso-8859-1");
?>
But before that, it came up with UTF-8 encoding, which I did not define anywhere...
bokeh
11-16-2005, 04:57 PM
Your server must be set to send UTF-8 as the default content type in the http headers.
delr2691
11-16-2005, 05:00 PM
Oh... and how can I change it?
delr2691
11-16-2005, 05:01 PM
Never mind... I found it...
I've got these two lines:
#AddDefaultCharset ISO-8859-1
AddDefaultCharset UTF-8
I must uncomment the first and comment out the second...
Right?
ShrineDesigns
11-16-2005, 05:02 PM
Hmm...
It must be the server that is sending it out.
See if default_charset is set (not commented out with ";" in front of it) in your php.ini file.
EDIT:
Found this in the PHP manual:
default_charset string
As of 4.0b4, PHP always outputs a character encoding by default in the Content-type: header. To disable sending of the charset, simply set it to be empty.
EDIT: I personally would comment them both out and let the page or header set the encoding instead of the server.
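The same setting can also be changed from inside a script, since default_charset is changeable at runtime; it should then affect the header PHP sends, provided nothing has been output yet. A minimal sketch, assuming ISO-8859-1 is the charset you want, as in this thread:

```php
<?php
// default_charset controls the charset PHP adds to its Content-Type header.
// Change it at the very top of the script, before any output is sent.
ini_set('default_charset', 'ISO-8859-1');

echo ini_get('default_charset'), "\n"; // ISO-8859-1
```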
bokeh
11-16-2005, 05:19 PM
The other place the charset is added is Apache itself, through AddDefaultCharset in httpd.conf. If so, you may be able to override it in .htaccess.
delr2691
11-16-2005, 08:15 PM
Ohh... thanks...
It was so simple! xD
And we've spent days on this! hehe
Well... thank you very much for your help...