www.webdeveloper.com

Thread: xml - about 5.000 files - search and delete spaces in text

  1. #1
    Join Date
    Sep 2011
    Posts
    6

    xml - about 5.000 files - search and delete spaces in text

    Hello,

    I work at the National Library in Slovenia on the IMPACT project, digitizing and OCRing books from the 19th century. The project aims to significantly improve access to historical texts and to remove the barriers that stand in the way of the mass digitisation of European cultural heritage.

    We also work with XML files; there are about 5,000 of them.

    We are fixing some mistakes in them (find and replace) with Text crawler.

    We have already corrected some mistakes (background color, font color, etc.), but we can't find the regular expression for finding and deleting superfluous white space at the end of lines in some parts of the text.

    Here is an example:

    <Unicode>This is an example
    of our text to show you
    what we need to do.</Unicode></TextEquiv></TextRegion>

    I have marked white spaces with the - we want to find them and delete them with the correct Regular expression.

    Thank you for your answer in advance.

  2. #2
    Join Date
    Aug 2006
    Posts
    1,898
    You're trying to replace the sequence Space+Newline, with just a Newline. Have you tried that with the text crawler tool? Newline is \n. I've never used the tool, but their manual seems to imply you can do that.

    Dave

  3. #3
    Join Date
    Sep 2011
    Posts
    6
    Would you be so kind as to write out the whole regular expression with \n included?
    Thank you.

  4. #4
    Join Date
    Sep 2011
    Posts
    6
    What is the character for white space? Is it \s?

  5. #5
    Join Date
    Aug 2006
    Posts
    1,898
    I'm no expert on regular expressions, but I think a space is just a space, i.e. the expression would be " \n".

    Dave

  6. #6
    Join Date
    Sep 2011
    Posts
    6
    Does anybody else know what the whole regular expression for this case would be?
    Thank you.

  7. #7
    Join Date
    Sep 2011
    Posts
    6
    I have found it; it is simple: [ ]\n
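
For readers who would rather script this than use a find-and-replace tool, here is a minimal Python sketch of the same idea. It is not what the posters used, and it rests on assumptions that are not in the thread: the files are UTF-8, they may use either \n or \r\n line endings, and `strip_trailing_spaces` and `clean_file` are hypothetical helper names. The pattern `[ \t]+(?=\r?\n)` is a slightly broader variant of `[ ]\n`. Note that `[ ]\n` removes only one space per pass when a line ends in several, and that `\s` also matches newlines, so a pattern like `\s+\n` would swallow blank lines as well.

```python
import re
from pathlib import Path

# Spaces or tabs that sit directly before a line break (LF or CRLF).
# [ \t] is used instead of \s because \s also matches newlines and
# would therefore remove blank lines too.
TRAILING_WS = re.compile(r"[ \t]+(?=\r?\n)")


def strip_trailing_spaces(text: str) -> str:
    """Remove spaces and tabs that immediately precede a line break."""
    return TRAILING_WS.sub("", text)


def clean_file(path: Path) -> None:
    """Rewrite one file in place (assumes UTF-8; an assumption, not from the thread)."""
    # newline="" keeps the original line endings untouched on read and write.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(strip_trailing_spaces(text))


# The example from the first post, with trailing spaces added for illustration.
sample = (
    "<Unicode>This is an example \n"
    "of our text to show you  \n"
    "what we need to do.</Unicode></TextEquiv></TextRegion>\n"
)
cleaned = strip_trailing_spaces(sample)
print(repr(cleaned))
```

To process a whole folder, one could call `clean_file(p)` for each `p` in `Path(folder).glob("*.xml")`. Since the sketch overwrites files in place, it is safer to try it on a copy of the data first.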
