www.webdeveloper.com

Thread: fopen can't read Arabic characters

  1. #1
    Join Date
    Jul 2013
    Posts
    18

    fopen can't read Arabic characters

    This function reads .doc files (e.g. document.doc), but it turns Arabic characters into English characters.

    I want it to read the Arabic characters correctly, or at least to strip them out.

    PHP Code:
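    function word($filename) {
        // Open the .doc file; do nothing if it cannot be opened.
        if (($fh = fopen($filename, 'r')) !== false) {
            // Read the start of the file, which holds the header bytes used below.
            $headers = fread($fh, 0xA00);

            // The four header bytes at 0x21C..0x21F encode the text length.
            // Assumption: the literals 1 and 8 were lost from the post as displayed;
            // they are restored here from the widely circulated version of this snippet.
            $n1 = (ord($headers[0x21C]) - 1);
            $n2 = ((ord($headers[0x21D]) - 8) * 256);
            $n3 = ((ord($headers[0x21E]) * 256) * 256);
            $n4 = (((ord($headers[0x21F]) * 256) * 256) * 256);
            $textLength = ($n1 + $n2 + $n3 + $n4);

            if ($extracted_plaintext = @fread($fh, $textLength)) {
                // Text was read; fall through and print it.
            } else {
                return docx2text($filename); // Save this contents to file
            }

            // Turn carriage returns into newlines before printing.
            $text = str_replace(chr(13), "\n", $extracted_plaintext);

            echo $text;
        }
    }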

    word('filename.doc'); 

  2. #2
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    19,683
    Any difference if you read it as binary?
    PHP Code:
    fopen($filename, 'rb');
    Just a stab in the dark, not sure if it should/would make any difference in this case.
    "Please give us a simple answer, so that we don't have to think, because if we think, we might find answers that don't fit the way we want the world to be."
    ~ Terry Pratchett in Nation

    eBookworm.us

  3. #3
    Join Date
    Jul 2013
    Posts
    18
    No, it didn't make any difference.

    An Arabic word is converted into an English string with the same number of characters.
    For example, the word 'علي' in the file is read as '9Dj'.

    Any suggestions?
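
A possible reading of this symptom, offered as an assumption rather than something confirmed in the thread: if the .doc stores this text as UTF-16LE, each Arabic letter occupies two bytes, a printable low byte followed by the non-printing byte 0x06. The sketch below (PHP 7+ with the mbstring extension) encodes the same word and prints what remains once the 0x06 bytes are dropped. The first two letters match the reported '9Dj'; the third comes out as an uppercase 'J', so the match is close but not exact.

```php
<?php
// Assumption: the .doc stores this run of text as UTF-16LE (two bytes per character).
$word = "\u{0639}\u{0644}\u{064A}"; // the word 'علي', written as code points

// Encode the UTF-8 string as UTF-16LE and show the raw bytes as hex.
$utf16 = mb_convert_encoding($word, 'UTF-16LE', 'UTF-8');
echo bin2hex($utf16), "\n"; // 390644064a06

// A browser does not display the 0x06 bytes, so only the low bytes are visible.
$visible = str_replace("\x06", '', $utf16);
echo $visible, "\n"; // 9DJ
```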

  4. #4
    Join Date
    Jul 2013
    Posts
    18
    I noticed that each Arabic character is converted into one specific English character. Maybe this helps in diagnosing the problem.

  5. #5
    Join Date
    Mar 2007
    Location
    localhost
    Posts
    2,589
    Perhaps Ali needs to be represented as an escaped character or code.
    If your post falls off the page, bump it. ...
    Please remember to wrap any code you have in forum tags:-

    [CODE]...[/CODE] [HTML]...[/HTML] [PHP]...[/PHP]

    If you can't think outside the box, you will be trapped forever with no escape...

  6. #6
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    19,683
    What does the docx2text() function do? If by any chance it implements PHPDOCX, perhaps you need to make use of the setEncodeUTF8() method?

  7. #7
    Join Date
    Jul 2013
    Posts
    18
    docx2text() only hands the file over to be read as another format (docx rather than doc), and the problem isn't there. A .docx file is read correctly; the error occurs only with a .doc file.

    I'm not using PHPDOCX, and there is no chance of using it in this project.

    Is there any function similar to setEncodeUTF8()?

  8. #8
    Join Date
    Jul 2013
    Posts
    18
    I found that each Arabic character is converted to a corresponding ASCII character.

    How can I prevent this, or how can I convert it back to the original Arabic character?
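
If the bytes returned by fread() really are UTF-16LE (an assumption; the thread does not confirm it), one way to get the Arabic text back is to convert them to UTF-8 with mb_convert_encoding(). The helper name utf16le_to_utf8() is hypothetical, and the sample bytes stand in for real fread() output.

```php
<?php
// Hypothetical helper, not from the thread: treat the extracted bytes as
// UTF-16LE and convert them to UTF-8 (requires the mbstring extension).
function utf16le_to_utf8(string $bytes): string
{
    // UTF-16LE uses two bytes per code unit; drop a trailing odd byte, if any.
    if (strlen($bytes) % 2 !== 0) {
        $bytes = substr($bytes, 0, -1);
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}

// Sample bytes standing in for what fread() might return for 'علي'.
$raw = "\x39\x06\x44\x06\x4A\x06";
echo utf16le_to_utf8($raw), "\n"; // علي
```

In word(), this would mean passing $extracted_plaintext through the helper before the str_replace() call, since replacing single bytes inside UTF-16 data can split a character. The page also has to be served as UTF-8 for the browser to render the result.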

  9. #9
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    19,683
    Where is this conversion happening? It's not clear to me exactly when this transition is happening. Maybe htmlentities() could handle it, or maybe the mb_string functions, but as of yet I'm not really sure where the problem is.
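
To see where the change happens, it can help to look at the raw bytes straight after fread(), before any output or string handling. This is a minimal sketch with a hypothetical helper, hex_preview(); the sample strings are made up for illustration.

```php
<?php
// Hypothetical debugging helper: show the first bytes of a string as hex pairs.
function hex_preview(string $bytes, int $limit = 16): string
{
    $hex = bin2hex(substr($bytes, 0, $limit));
    return implode(' ', str_split($hex, 2));
}

// Two-byte characters show a second byte after each printable one...
echo hex_preview("9\x06D\x06J\x06"), "\n"; // 39 06 44 06 4a 06
// ...while plain single-byte text does not.
echo hex_preview("9DJ"), "\n"; // 39 44 4a
```

If the dump of $extracted_plaintext shows a 06 after every printable byte, the file contents are intact and only the interpretation of the bytes is off.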

  10. #10
    Join Date
    Mar 2007
    Location
    localhost
    Posts
    2,589
