www.webdeveloper.com

Thread: He�p: my text �eeps ��ing thi�

  1. #1
    Join Date: Jul 2010
    Location: /ramdisk/
    Posts: 865

    Question He�p: my text �eeps ��ing thi�

    The problem is that people are copying and pasting from text editors that replace normal characters with special ones that are not ASCII. That is fine while you’re viewing the text in the editor, where it looks nicer, but I need to display the text in an HTML document.

    �oN±��§À��, know what I mean?

    Here is what I currently use, and it seems to fix the majority of the characters. From previous experience working with UTF-8, I know that my browser does not have a glyph for every character, which is why I wrote U+FFFD so you can see.


    Edit: Here is how I am handling the text from start to finish:

    1) A user copy/pastes from Word into a text field on a web page with:
    HTML Code:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    2) I put it into a MySQL table. The field is either varchar(N) or mediumtext, and the collation is latin1_swedish_ci.

    3) I take the text out using:

    PHP Code:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    </head>
    <body>
    <?php
    System::mysql_con();
    $string = mysql_fetch_array(mysql_query("select field from `table` limit 1"));
    echo System::MS_COMPAT($string['field'], true, true);
    ?>
    </body>
    </html>
    PHP Code:
    class System { #this is just a snippet
      /**
       * When people copy/paste from MS Word, the encoding causes
       * display problems... Use this to convert it to something better ^_^
       *
       * @param string $string  The String to Convert
       * @param bool   $fromSQL applies stripslashes
       * @param bool   $nl2br   applies nl2br
       */
      public final static function MS_COMPAT($string, $fromSQL = false, $nl2br = false) {
        $string = $fromSQL ? stripslashes($string) : $string;
        $string = htmlentities($string, ENT_COMPAT, "UTF-8");
        # PHP concatenates strings with ".", not "+"
        if ($nl2br) $string = self::myNL2BR("\n" . $string);
        return $string;
      }
      # (rest of the class omitted)
    }


    Why am I doing it wrong?

    Edit: I added my table's collation: latin1_swedish_ci (maybe this is why?)
    Last edited by eval(BadCode); 02-15-2011 at 08:25 PM.

  2. #2
    Join Date: Aug 2004
    Location: Ankh-Morpork
    Posts: 22,327

    Assuming you want to work with UTF-8 throughout, make sure that:

    a. The database is storing it as UTF-8, e.g.:
    Code:
    CREATE TABLE `test`.`example_table` (
    `sample_field` VARCHAR(255) NOT NULL ) 
    ENGINE = MYISAM CHARACTER SET utf8 COLLATE utf8_unicode_ci
    b. When you connect to mysql in your script, before doing any other queries, do the following to ensure you are "talking" to MySQL in UTF-8:
    PHP Code:
    mysql_query("SET NAMES 'utf8' COLLATE 'utf8_unicode_ci'"); 
    c. If/when applying htmlentities() to your output, be sure to use the 3rd optional parameter to specify UTF-8:
    PHP Code:
    echo htmlentities($text, ENT_QUOTES, 'UTF-8');
    d. In addition to the content-type meta tag, also output an HTTP header in case your web server sends something else and the browser uses that instead of the meta tag:
    PHP Code:
    <?php
    header("Content-Type: text/html; charset=utf-8");
    That should take care of all UTF-8 issues on the PHP side. You can also add an accept-charset attribute to your <form> tags to "ask" the browser to submit the form data as UTF-8, though of course you cannot depend on it.
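
    Steps c and d can be sketched together as follows. This is only an illustration: $text is a stand-in value (hard-coded here as UTF-8 bytes rather than read from the database), and the result shown assumes PHP's default HTML 4.01 entity table.

```php
<?php
// Stand-in for text read from the database: UTF-8 bytes for
// curly quotes, an en dash, a curly apostrophe and an e-acute.
$text = "\xE2\x80\x9CHello\xE2\x80\x9D \xE2\x80\x93 it\xE2\x80\x99s caf\xC3\xA9";

// Step d: declare the charset in the HTTP header as well as the meta tag.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Step c: pass the charset as the third argument so multi-byte
// characters are converted as whole characters, not byte by byte.
$escaped = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
// prints: &ldquo;Hello&rdquo; &ndash; it&rsquo;s caf&eacute;
```

    Because every non-ASCII character comes out as a named entity, the output is plain ASCII and displays the same whatever charset the browser ends up using.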

    Lastly, I use this function to filter inputs before saving them in the DB if I'm worried about people cutting and pasting text from M$ Word documents with their proprietary character set for punctuation: http://www.charles-reace.com/blog/20...-ms-word-text/.
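
    For readers who cannot reach that link, the general idea of such a filter can be sketched like this. It is not the function from the blog post: the name word_punctuation_to_ascii and the choice of replacements are assumptions made for illustration, and the input is assumed to be UTF-8 already.

```php
<?php
/**
 * Hypothetical helper: replace the "smart" punctuation that Word
 * inserts with plain ASCII equivalents. Expects UTF-8 input.
 */
function word_punctuation_to_ascii(string $text): string
{
    $map = [
        "\xE2\x80\x98" => "'",   // U+2018 left single quote
        "\xE2\x80\x99" => "'",   // U+2019 right single quote / apostrophe
        "\xE2\x80\x9C" => '"',   // U+201C left double quote
        "\xE2\x80\x9D" => '"',   // U+201D right double quote
        "\xE2\x80\x93" => '-',   // U+2013 en dash
        "\xE2\x80\x94" => '--',  // U+2014 em dash
        "\xE2\x80\xA6" => '...', // U+2026 horizontal ellipsis
    ];

    // strtr() with an array replaces each key with its value in one pass.
    return strtr($text, $map);
}

echo word_punctuation_to_ascii("\xE2\x80\x9CHi\xE2\x80\x9D\xE2\x80\xA6"), "\n";
// prints: "Hi"...
```

    Running this on the input before saving means the database only ever sees ASCII punctuation, so the table's character set matters much less for these particular characters.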
    "Well done....Consciousness to sarcasm in five seconds!" ~ Terry Pratchett, Night Watch

    How to Ask Questions the Smart Way (not affiliated with this site, but well worth reading)

    My Blog
    cwrBlog: simple, no-database PHP blogging framework

  3. #3
    Join Date: Jul 2010
    Location: /ramdisk/
    Posts: 865

    I already checked the headers sent by the server using wireshark. It's not that.

    I did not intend to work with utf8 at all.

    I will do my best to implement your solution. I trust your experience.

    Just a thought however:
    I find it perverse that I'm changing my table collation because of a text editor that has nothing to do with my application ... I feel bullied by M$... had this application been for myself I wouldn't give this issue the time of day.


    Edit:
    Is there an ASCII only version of your filter_text function?
    Last edited by eval(BadCode); 02-15-2011 at 09:33 PM.

  4. #4
    Join Date: Aug 2004
    Location: Ankh-Morpork
    Posts: 22,327

    Quote Originally Posted by eval(BadCode)
        I already checked the headers sent by the server using wireshark. It's not that.

        I did not intend to work with utf8 at all.

    I assumed you did since you had it in your content-type meta tag. Also, it makes your page more internationally accessible, since ASCII only supports the base "western" latin character set.

    Quote:
        I will do my best to implement your solution. I trust your experience

        Just a thought however:
        I find it perverse that I'm changing my table collation because of a text editor that has nothing to do with my application ... I feel bullied by M$... had this application been for myself I wouldn't give this issue the time of day.

    UTF-8 has nothing to do with M$, which uses its own character set (I don't recall off the top of my head what its designation is). If you don't want to use UTF-8 and instead limit yourself to only western latin characters, you can replace UTF-8 throughout with ISO-8859-1 for PHP functions, "latin1" for the MySQL character set (normally the default), and latin1_general_ci for the MySQL collation type.

    Quote:
        Edit:
        Is there an ASCII only version of your filter_text function?

    Nothing I have lying around.

    EDIT: If you choose to go that route for your content-type and filter your output with htmlentities($text, ENT_QUOTES, "ISO-8859-1"), you should be OK even if you use that function (I'm 90% sure).
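
    For the ASCII-only variant asked about above, a minimal sketch might look like the following. It is not the filter_text function linked earlier. The name cp1252_punctuation_to_ascii is hypothetical, and the sketch assumes the pasted text arrives as single-byte Windows-1252, where Word's curly punctuation sits in the byte range 0x80 to 0x9F.

```php
<?php
/**
 * Hypothetical helper: map Windows-1252 punctuation bytes to ASCII.
 * Assumes single-byte (Windows-1252 / latin1-style) input, NOT UTF-8.
 */
function cp1252_punctuation_to_ascii(string $text): string
{
    $map = [
        "\x85" => '...', // horizontal ellipsis
        "\x91" => "'",   // left single quote
        "\x92" => "'",   // right single quote / apostrophe
        "\x93" => '"',   // left double quote
        "\x94" => '"',   // right double quote
        "\x96" => '-',   // en dash
        "\x97" => '--',  // em dash
    ];

    // Each key is a single byte, so this is a plain byte-for-string swap.
    return strtr($text, $map);
}

echo cp1252_punctuation_to_ascii("\x93Hi\x94 \x96 it\x92s\x85"), "\n";
// prints: "Hi" - it's...
```

    This should only be applied to single-byte text. In UTF-8 the same byte values occur as parts of multi-byte characters, so running it there would corrupt them.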
    Last edited by NogDog; 02-15-2011 at 09:58 PM.
