www.webdeveloper.com
Results 1 to 12 of 12

Thread: Spanish accents & MySQL select query

  1. #1
    Join Date
    Jun 2007
    Posts
    41

    Question Spanish accents & MySQL select query

    Friends,

    I have struggled for months with this one. I originally placed all of the Spanish into the database with the accent characters in HTML-entity form.

    For example:
    Code:
    ¡Me encanta la fotografía!
    This would correctly display the characters on screen, but the word "fotografía" could not be matched in a select query (with the possible exception of actually typing the HTML-entity).

    I wanted everything to be in UTF8 format, so I then tried converting the entire database to this, replacing all of the HTML-entities with their UTF8 equivalents. Now I can search, and find, the Spanish accented words, but ONLY if the accents are entered into the query.

    For example, if the word is "Jesús", the un-accented form "Jesus" will not match, but "Jesús" will.

    I have heard there is a way to ignore accents, similar to ignoring case.

    How can I properly set this up?

    I _VERY MUCH_ appreciate any helpful hints on this.

    Blessings,

    PolyGreat

  2. #2
    Join Date
    Aug 2005
    Location
    The Garden State
    Posts
    5,634
    how are you searching? full text? or just IS or LIKE ? if you're not doing FT then try that first.
    Acceptable Use | SQL Forum FAQ | celery is tasteless | twitter

    celery is tasteless - currently needing some UI time

  3. #3
    Join Date
    Jun 2007
    Posts
    41
    I've been using queries, via my perl script, of this form:

    Code:
    SELECT * from Database.Table WHERE TextColumn LIKE "%keyword%" LIMIT 0, 10;
    Part of my issue here is that the script needs to handle multiple languages. I have multiple tables in the database, one for each language: English, Spanish, Chinese, Thai, etc. Unfortunately, I don't think I can use a FULLTEXT search on the Asian languages. Also, in order to handle multiple languages equally, the database is setup to do everything in UTF8.

    Blessings,

    PolyGreat.

  4. #4
    Join Date
    Aug 2005
    Location
    The Garden State
    Posts
    5,634
    Quote Originally Posted by PolyGreat View Post
    I don't think I can use a FULLTEXT search on the Asian languages
    Of course you can. Try it.

    Of course, it's unlikely that a simplified chinese text can ever match against english. not even the google can do that.
    Last edited by chazzy; 10-28-2008 at 07:27 PM.
    Acceptable Use | SQL Forum FAQ | celery is tasteless | twitter

    celery is tasteless - currently needing some UI time

  5. #5
    Join Date
    Jun 2007
    Posts
    41
    So, is this accent-insensitive search only going to be available when using fulltext? Is there a way of making it work with an ordinary LIKE statement, or perhaps with RLIKE?

    Spanish is the only database I'm having issues with right now, and I'm hesitant to make major changes to the search queries when all of the other languages are operating smoothly.

    As I understand, using a FULLTEXT search is not all that similar to doing a LIKE search in several aspects: 1) Ordering of results, 2) Wordlengths, and 3) System requirements.

    Blessings,

    PolyGreat

  6. #6
    Join Date
    Jun 2007
    Posts
    41
    I'm still looking for an answer to this. I've asked a few of my expert friends, who said they'd get back to me, but never did...so I presume they did not find the answer either.

    There are a few other reasons why I am not doing a fulltext search, and thus cannot consider it as a "best option" in my case. Is there no way to do an accent-independent search in MySQL without fulltext?

    Is this really that difficult?

    Blessings,

    PolyGreat

  7. #7
    Join Date
    Aug 2005
    Location
    The Garden State
    Posts
    5,634
    as far as I am concerned, I've already provided you a viable solution. it is your choice whether to try it not.
    Acceptable Use | SQL Forum FAQ | celery is tasteless | twitter

    celery is tasteless - currently needing some UI time

  8. #8
    Join Date
    May 2009
    Posts
    1

    Case Insensitive, Accent Insensitive

    Poly:

    Most of the databases today have the "Collation" option in which you define your database collation in certain "language oriented", Case Sensitive/Insensitive and Accent Sensitive/Insensitive.

    Defining your database (all), or only a table with a desired collation will allow you to query data without having to deal with accents or case. Example:

    My database is MS SQL 2005 and the collation of my database is defined as:

    Latin1_General_CI_AI

    This definition is made at the time of database creation, or later your can change properties of database or only a table with an specific collation.

    Latin1_General_CI_AI collation gives you the chance to have latin caracters and include most of the latin and european languages (excluding japanese, russian and some others with very funky characters).
    CI means Case Insensitive and AI means Accents Insensitive

    When creating a database or changing the collation you are given multiple options of Latin or other character sets.
    To change the collation of a table, set Extended Properties of the table and change the collation.

    In case of MS SQL, ther is no way to define specifically UTF-8 set since database is multi-character capable (that says Microsoft). However, if you store the data from a page defined as utf-8, data will be saved into table as UTF-8 and when retrieved will be utf-8 as well. This is a tricky thing since data can be utf-8 but retrieved in a page with a different charset, SQL will convert.

    FYI: ASP language on IIS does not handle utf-8 well, sometimes display data as charset=iso-8859-1 or so, so funny chars are shown. MS says utf-8 is not supported. You can use it but half of the time will sitch data to funny chars.
    ASPX is other stoy, is fully functional with UTF-8 as well as php and other languages.

    If you use Postgre or MYSQL database, both have this collation types witha case and accent sensitive/insensitive and full definition for utf-8.

    In short, defining collation with AI and CI collation, you only need to query your database and accents and case will not be taken in consideration, this means user can type something with or without accent. Example:

    User type: ídea and results come with records for: Idea, idea, ídea

  9. #9
    Join Date
    Jun 2007
    Posts
    41
    Are you sure that MySQL supports these *_AI character sets?

    To be frankly honest, I'm taking a hard look at non-MySQL solutions right now, because MySQL is behind the times on the basics like this. Another feature I need that it lacks is that of word proximity searching.

    I'm just tired of dealing with this. I have only one possible solution, it appears, with MySQL: create a column for searching in which ALL of the accents have been replaced with non-accented characters, and then return the search results from a separate column where the original text has been preserved. But that means double the data space, and more mess to deal with.

    Thank you for taking the time to share the possibility. I just haven't seen it be a reality for MySQL.

    Blessings,

    Polygreat

  10. #10
    Join Date
    Jul 2009
    Posts
    2
    try this

    $search = str_replace('%', '[ÑñÇçÀÁÂÃÄÅÈÉÊËÌÍ&#20 6;ÏÒÓÔÕÖÙÚÛÜàáâãäå&# 232;éêëìíîïòóôõöùúûü]', $search);

  11. #11
    Join Date
    Jul 2009
    Posts
    2
    or this
    $search = str_replace('%', '[aàáâãäå]', $search);
    $search = str_replace('%', '[eèéêë]', $search);
    $search = str_replace('%', '[iìíîï]', $search);
    $search = str_replace('%', '[oòóôõö]', $search);
    $search = str_replace('%', '[uùúûü]', $search);
    $search = str_replace('%', '[AÀÁÂÃÄÅ]', $search);
    $search = str_replace('%', '[EÈÉÊË]', $search);
    $search = str_replace('%', '[IÌÍÎÏ]', $search);
    $search = str_replace('%', '[OÒÓÔÕÖ]', $search);
    $search = str_replace('%', '[UÙÚÛÜ]', $search);

  12. #12
    Join Date
    Sep 2009
    Posts
    1

    Smile Solution

    SELECT *
    FROM `table`
    WHERE `column` LIKE _utf8 '%word with accent%' COLLATE utf8_unicode_ci;

    That will work.

Thread Information

Users Browsing this Thread

There are currently 2 users browsing this thread. (0 members and 2 guests)

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
HTML5 Development Center



Recent Articles