help with Regular Expression


obaid1982
07-10-2005, 11:45 PM
I am trying to extract a string from an XML file using regex. Here is a sample of the XML file:

<SENTENCE>
<TEXT>today is monday</TEXT>
<BIT>4</BIT>
<...> blah blah blah </...>
</SENTENCE>

Now I want to grab everything between <SENTENCE> and </SENTENCE>, including the 'text' and 'bit' tags. My approach is to use negation:

String abs = "<SENTENCE>(^(</SENTENCE)*)</SENTENCE>";


but it doesn't quite work. I know how to negate single characters, but how do I negate an entire string? Thanks in advance.

Exuro
07-11-2005, 02:59 PM
I'm pretty sure you can just do this with a "lazy star", which matches as few characters as it can:
<SENTENCE>.*?</SENTENCE>
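You'll probably need to use the DOTALL (http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html#DOTALL) flag in your pattern, so that the dot also matches the line breaks between the tags. (MULTILINE only changes how ^ and $ behave.)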
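
Here is a rough sketch of how that might look in Java. Note that DOTALL (not MULTILINE) is the flag that lets the dot match line terminators. The class name SentenceExtractor, the extract helper, and the second <SENTENCE> element in the sample input are made up for illustration. The second pattern is one possible way to "negate an entire string", using a negative lookahead; for this input both patterns should capture the same text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class SentenceExtractor {
    // Lazy star: .*? stops at the first </SENTENCE> it can reach.
    // DOTALL lets . match the line breaks between the tags.
    static final Pattern LAZY =
        Pattern.compile("<SENTENCE>(.*?)</SENTENCE>", Pattern.DOTALL);

    // "Negated string": consume one character at a time, but only while
    // </SENTENCE> does not start at the current position.
    static final Pattern LOOKAHEAD =
        Pattern.compile("<SENTENCE>((?:(?!</SENTENCE>).)*)</SENTENCE>", Pattern.DOTALL);

    // Returns the text captured between each <SENTENCE> ... </SENTENCE> pair.
    static List<String> extract(Pattern pattern, String xml) {
        List<String> bodies = new ArrayList<>();
        Matcher m = pattern.matcher(xml);
        while (m.find()) {
            bodies.add(m.group(1));
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Sample input; the second element is invented for the demo.
        String xml = "<SENTENCE>\n<TEXT>today is monday</TEXT>\n<BIT>4</BIT>\n</SENTENCE>\n"
                   + "<SENTENCE>\n<TEXT>tomorrow is tuesday</TEXT>\n<BIT>5</BIT>\n</SENTENCE>\n";
        for (String body : extract(LAZY, xml)) {
            System.out.println(body.trim());
            System.out.println("----");
        }
    }
}
```

The lazy version is shorter and is usually enough. The lookahead version is only worth the extra noise if you specifically want the "anything except this string" behaviour spelled out in the pattern itself. Neither handles nested <SENTENCE> elements; for that a real XML parser would be the safer route.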