Implementing SRX Segmentation Rules in JavaScript
Hello,
I want to implement the SRX segmentation rules in JavaScript to extract sentences from text.
To do this correctly I have to follow the SRX rules,
e.g. http://www.lisa.org/fileadmin/standa...0.html#refTR29
There are two types of rules, each defined by regular expressions:
1. If matched, the sentence should break, e.g. ". "
2. If matched, the sentence should not break, e.g. after an abbreviation such as "U.K." or "Mr."
Each rule in turn has two parts:
1. the pattern before the break position
2. the pattern after the break position
For example, if the rule is
<rule break="no">
<beforebreak>\s*[0-9]+\.</beforebreak>
<afterbreak>\s</afterbreak>
</rule>
This says that if the pattern "\s*[0-9]+\.\s" is found, the segment should not break there.
How do I implement this in JavaScript? Maybe the split function is not enough?
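One possible approach, as a rough sketch rather than a complete SRX implementation: walk through every position in the text, test each rule's before pattern against the text ending at that position and its after pattern against the text starting there, and let the first matching rule decide whether to break. The first rule below is the one from the post; the abbreviation rule and the break rule are assumptions added for illustration, and the names `rules`, `breakHere`, `compile`, and `segment` are hypothetical. As far as I understand, SRX patterns use ICU-style regular expressions, so some real-world rules may need translating before `RegExp` accepts them.

```javascript
// Rules are checked in order; the first one that matches at a position wins.
const rules = [
  { breakHere: false, before: "\\s*[0-9]+\\.", after: "\\s" },       // rule from the post
  { breakHere: false, before: "\\b(?:Mr|Mrs|Dr)\\.", after: "\\s" }, // assumed abbreviation rule
  { breakHere: true, before: "[.?!]+", after: "\\s" },               // assumed break rule
];

// Anchor each pattern: "before" must match text ending at the position,
// "after" must match text starting at the position.
function compile(ruleList) {
  return ruleList.map((r) => ({
    breakHere: r.breakHere,
    before: new RegExp("(?:" + r.before + ")$"),
    after: new RegExp("^(?:" + r.after + ")"),
  }));
}

function segment(text, ruleList) {
  const compiled = compile(ruleList);
  const segments = [];
  let start = 0;
  for (let pos = 1; pos < text.length; pos++) {
    const left = text.slice(0, pos);
    const right = text.slice(pos);
    // Find the first rule whose two halves both match around this position.
    const rule = compiled.find((r) => r.before.test(left) && r.after.test(right));
    if (rule && rule.breakHere) {
      segments.push(text.slice(start, pos));
      start = pos;
    }
  }
  // Whatever remains after the last break is the final segment.
  if (start < text.length) segments.push(text.slice(start));
  return segments;
}

const sample = "See item 1. It was sent by Mr. Smith. He agreed. Done!";
console.log(segment(sample, rules).map((s) => s.trim()));
// [ 'See item 1. It was sent by Mr. Smith.', 'He agreed.', 'Done!' ]
```

The no-break rules are listed before the break rule so that they take precedence, which is why "1." and "Mr." do not end a segment here. Slicing the text at every position is quadratic in the text length, so this is only meant to show the matching logic; `split` alone cannot express the "before" and "after" conditions of a no-break rule.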