Spanish accents & MySQL select query
I have struggled for months with this one. I originally placed all of the Spanish into the database with the accent characters in HTML-entity form.
This would correctly display the characters on screen, but the word "fotografía" could not be matched in a select query (with the possible exception of actually typing the HTML-entity).
¡Me encanta la fotografía!
I wanted everything to be in UTF-8, so I then converted the entire database, replacing all of the HTML entities with their UTF-8 equivalents. Now I can search for, and find, the Spanish accented words, but ONLY if the accents are entered into the query.
For example, if the word is "Jesús", the un-accented form "Jesus" will not match, but "Jesús" will.
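To illustrate the entity replacement described above, here is a minimal Python sketch. It only shows the string conversion; `decode_entities` is a hypothetical helper name, and a real conversion would still have to be run over every text column in the database.

```python
import html

def decode_entities(text: str) -> str:
    """Replace HTML entities such as &iacute; with the real Unicode characters."""
    # html.unescape is part of the Python standard library.
    return html.unescape(text)

# The sample sentence, as it might have been stored in entity form.
print(decode_entities("&iexcl;Me encanta la fotograf&iacute;a!"))
# prints: ¡Me encanta la fotografía!
```

Once the text is stored as real characters, matching becomes a question of which collation the database uses to compare them.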
I have heard there is a way to ignore accents, similar to ignoring case.
How can I properly set this up?
I _VERY MUCH_ appreciate any helpful hints on this.
Case Insensitive, Accent Insensitive
Most databases today have a "collation" option, which defines how text is compared and sorted. A collation is language-oriented and is either case sensitive or insensitive, and either accent sensitive or insensitive.
Defining your whole database, or only a table, with the desired collation will allow you to query data without having to deal with accents or case. Example:
My database is MS SQL 2005 and the collation of my database is defined as Latin1_General_CI_AI.
This definition is made at the time of database creation, or you can later change the properties of the database, or of only a table, to use a specific collation.
The Latin1_General_CI_AI collation gives you Latin characters and covers most of the Latin and European languages (excluding Japanese, Russian and some others that use different scripts).
CI means Case Insensitive and AI means Accent Insensitive.
When creating a database or changing the collation you are given multiple options of Latin or other character sets.
To change the collation of a table, change the collation in the table's properties. (In T-SQL the collation is set per column, with ALTER TABLE ... ALTER COLUMN ... COLLATE.)
In the case of MS SQL, there is no way to specifically define a UTF-8 character set, since the database is multi-character capable (so Microsoft says). However, if you store data from a page defined as UTF-8, it will be saved into the table and will come back as UTF-8 when retrieved. This is tricky, because data can be UTF-8 but be retrieved in a page with a different charset, in which case it will be converted.
FYI: classic ASP on IIS does not handle UTF-8 well. Sometimes it displays data as charset=iso-8859-1 or similar, so funny characters are shown. MS says UTF-8 is not supported. You can use it, but half of the time it will switch the data to funny characters.
ASPX is another story: it is fully functional with UTF-8, as are PHP and other languages.
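The "funny characters" are what you get when UTF-8 bytes are read back as iso-8859-1. Here is a small Python sketch of that effect; it is only a demonstration of the charset mismatch, not of ASP or IIS, and `misread_as_latin1` is a made-up helper name.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were iso-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")

word = "fotografía"
garbled = misread_as_latin1(word)

# The accented letter takes two bytes in UTF-8, so it shows up as two
# characters when each byte is treated as one iso-8859-1 character.
print(len(word), len(garbled))

# The damage is reversible as long as nothing else altered the bytes.
restored = garbled.encode("iso-8859-1").decode("utf-8")
print(restored == word)
```

If the page charset and the stored data agree, no such conversion happens and the accents display correctly.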
If you use PostgreSQL or MySQL, both have these collation types, with case and accent sensitive/insensitive variants and full support for UTF-8. In MySQL, for example, utf8_general_ci is both case and accent insensitive.
In short, once you define an AI and CI collation, you only need to query your database as usual, and accents and case will not be taken into consideration. This means the user can type something with or without accents. Example:
The user types "ídea" and the results include the records for: Idea, idea, ídea
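If you want to see the idea in action without a MySQL or MS SQL server, here is a rough Python sketch that uses SQLite as a stand-in. SQLite has no built-in accent-insensitive collation, so the sketch registers its own: the collation name `CI_AI`, the `words` table and the helper functions are all invented for the illustration, and the folding rule (strip combining marks, then fold case) is only an approximation of what a real CI/AI collation does.

```python
import sqlite3
import unicodedata

def fold(text: str) -> str:
    """Strip accents and case so 'ídea', 'Idea' and 'idea' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.casefold()

def ci_ai(a: str, b: str) -> int:
    """Collation callback: negative, zero or positive, like a comparator."""
    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)

conn = sqlite3.connect(":memory:")
conn.create_collation("CI_AI", ci_ai)  # hypothetical collation name
conn.execute("CREATE TABLE words (word TEXT COLLATE CI_AI)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("Idea",), ("idea",), ("ídea",), ("Jesús",)],
)

def search(term: str) -> list:
    """Return every stored word that equals term under the CI_AI collation."""
    rows = conn.execute(
        "SELECT word FROM words WHERE word = ? ORDER BY rowid", (term,)
    )
    return [row[0] for row in rows]

print(search("ídea"))   # ['Idea', 'idea', 'ídea']
print(search("jesus"))  # ['Jesús']
```

With a real database you would not write the comparison yourself; choosing a CI/AI collation for the database, table or column gives the same behaviour to every ordinary SELECT.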