jacen6678
08-26-2003, 02:45 PM
I am trying to parse a text file but it is full of weird characters like 22. Is 'box' a control character? Does anyone know how I can get rid of it besides going through the whole file and deleting each individually?
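If you do want to strip such characters in bulk rather than deleting them by hand, here is a minimal Python sketch. It assumes the file is UTF-8 (change `encoding` otherwise). It also assumes that "weird" means Unicode control characters (category Cc) other than tab, LF and CR, plus U+FFFD, the replacement character Python substitutes for undecodable bytes. The names `strip_control_chars` and `clean_file` are hypothetical. This is not a reimplementation of the Mac "Zap Gremlins" option mentioned later in the thread, whose exact rules may differ. Also keep in mind the caution in the replies: these may be real characters shown in the wrong encoding, and deleting them loses data.

```python
import unicodedata

# Tab, newline and carriage return are control characters too, but real text needs them.
KEEP = {"\t", "\n", "\r"}

def strip_control_chars(text: str) -> str:
    """Drop Unicode control characters (category Cc) other than those in KEEP,
    plus U+FFFD, the replacement character left behind by a failed decode."""
    return "".join(
        ch for ch in text
        if ch in KEEP
        or (ch != "\ufffd" and unicodedata.category(ch) != "Cc")
    )

def clean_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    # Assumption: the input is in `encoding`. errors="replace" turns
    # undecodable bytes into U+FFFD instead of raising, and those are
    # then stripped. newline="" leaves line endings exactly as they were.
    with open(src, encoding=encoding, errors="replace", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(strip_control_chars(text))

if __name__ == "__main__":
    sample = "abc\x16def\tghi\n\ufffdjkl\x00"
    print(repr(strip_control_chars(sample)))  # 'abcdef\tghi\njkl'
```

Only the Cc category is removed on purpose. Broader categories such as Cf (format characters) include characters like the zero-width joiner that some scripts legitimately need.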
Charles
08-26-2003, 03:20 PM
That box typically means that the current font has no glyph for that character. Change your font to Lucida Sans Unicode, and if that doesn't help, you need to change the character encoding. This is especially likely if every other character is displayed as a box.
jacen6678
08-26-2003, 03:26 PM
Only a few characters are displaying strangely... less than 0.05% of 2MB of text. Changing to Lucida did not do anything. And that still does not answer the question of how I can remove them...
Charles
08-26-2003, 03:45 PM
The point is you shouldn't. Those are real characters; you just need to find out which ones they are and find a way to see them. If every other character had been a box, that would have indicated that the original document was encoded in UTF-16. My guess is that the original encoding is UTF-8 or ISO-8859-1 and that your editor thinks it is windows-1252. Try playing around with the encoding setting of your editor, and keep the font set to Lucida Sans Unicode until you figure it out.
jacen6678
08-26-2003, 03:52 PM
I imported the file to a Mac and used an option called Zap Gremlins to get rid of those characters. However, I am still getting a segmentation fault when I try to parse the file. Do you know what that is?
webdeveloper.com
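To follow Charles's advice (find out which characters they are, and which encoding the file really uses), here is a minimal Python sketch. It reads the raw bytes and checks the NUL-byte ratio: mostly-ASCII text stored as UTF-16 has a NUL in roughly every other byte, which is what produces the "every other character is a box" symptom. It then tries the encodings guessed in the thread and lists the most common non-ASCII characters under each, with their Unicode code points and names. The function names and the candidate list are assumptions for illustration. Python's codec name for windows-1252 is `cp1252`.

```python
import sys
import unicodedata
from collections import Counter

# Candidate encodings, taken from the guesses in the thread. ISO-8859-1
# assigns a character to every byte, so it never fails to decode; a clean
# decode is real evidence only for UTF-8 (and, weakly, windows-1252).
CANDIDATES = ["utf-8", "cp1252", "iso-8859-1"]

def nul_ratio(data: bytes) -> float:
    """Fraction of NUL bytes. Mostly-ASCII text stored as UTF-16 has a NUL
    in roughly every other byte."""
    return data.count(0) / len(data) if data else 0.0

def try_decodings(data: bytes) -> dict:
    """Map each candidate encoding to the decoded text, or None if the
    bytes are not valid in that encoding."""
    results = {}
    for enc in CANDIDATES:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results

def unusual_chars(text: str) -> Counter:
    """Count characters other than printable ASCII, tab, LF and CR."""
    return Counter(ch for ch in text if not (" " <= ch <= "~" or ch in "\t\n\r"))

def describe(ch: str) -> str:
    # Control characters have no Unicode name, so fall back to the
    # general category (e.g. "Cc") instead of letting name() raise.
    name = unicodedata.name(ch, "<unnamed " + unicodedata.category(ch) + ">")
    return f"U+{ord(ch):04X} {name}"

def report(path: str) -> None:
    with open(path, "rb") as f:  # raw bytes: no encoding guessed yet
        data = f.read()
    print(f"NUL bytes: {nul_ratio(data):.1%} (near 50% suggests UTF-16)")
    for enc, text in try_decodings(data).items():
        if text is None:
            print(f"{enc}: invalid")
            continue
        print(f"{enc}: decodes; most common unusual characters:")
        for ch, n in unusual_chars(text).most_common(10):
            print(f"    {n:>6} x {describe(ch)}")

if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])
```

Usage: `python inspect_encoding.py input.txt` (both file names are hypothetical). When comparing the listings, note that bytes 0x80 to 0x9F decode to invisible C1 control characters (shown as `<unnamed Cc>`) under ISO-8859-1, while windows-1252 maps most of them to printable punctuation such as curly quotes. UTF-8 text misread as windows-1252 typically shows pairs like "Ã©" where "é" was intended.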
Copyright WebMediaBrands Inc., All Rights Reserved.