remove words from string


Webskater
08-20-2003, 03:34 PM
Before passing phrases into a Knowledge Base search, I would like to loop through each phrase and remove a set list of words, such as "and", "if", etc. Does anyone know a clever way of doing this?
Cheers.

AdamGundry
08-20-2003, 04:00 PM
Something like this (using a RegExp):

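var str = 'The sentence goes here.';
// Words to remove, separated by | (meaning "or" in a regular expression)
var removeWords = 'the|and|but|if';

// 'g' replaces every match, 'i' ignores case. Note that this also
// matches inside longer words, e.g. the "if" in "different".
var re = new RegExp(removeWords, 'gi');
str = str.replace(re, '');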

Adam
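
A minimal sketch of the same idea with the word list kept in an array, so adding or removing a word does not mean editing one long string. `removeListedWords` is a hypothetical name, and, like the snippet above, the pattern has no word boundaries, so it also strips matching letters inside longer words (for example the "the" in "there").

```javascript
// Hypothetical helper: removes every occurrence of each listed word.
// Like the original snippet, it does not check word boundaries, and the
// words must not contain regular-expression metacharacters such as . or ?
function removeListedWords(str, words) {
  // Join the list into one alternation, e.g. "the|and|but|if"
  const re = new RegExp(words.join('|'), 'gi');
  return str.replace(re, '');
}

const result = removeListedWords('The cat and the hat', ['the', 'and', 'but', 'if']);
console.log(result);
```

The removed words leave their surrounding spaces behind, so the result still contains the gaps where "The", "and" and "the" used to be.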

Webskater
08-21-2003, 04:58 AM
Thanks for your reply. When I tried to add more words to the list of words to be removed, I got stuck on words like can't and don't: it does not like the apostrophe.
How can I get around this, please?

Also, I would like to be able to remove a full stop from the end of any word.

Thanks again for your help.

AdamGundry
08-21-2003, 05:16 AM
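To include words with apostrophes in a single-quoted string, you need to escape each apostrophe with a backslash so that it does not end the string. You also need to escape the full stop, because on its own . matches any character in a regular expression; inside a string that takes a double backslash, so that the RegExp receives \., like this:

removeWords = 'can\'t|don\'t|\\.';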

See the RegExp documentation: http://devedge.netscape.com/library/manuals/2000/javascript/1.3/reference/regexp.html

Adam
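
A sketch, assuming the word list is built in code rather than typed as one string: a hypothetical `escapeRegExp` helper adds the backslashes automatically, so characters such as the full stop are matched literally, and writing the words in double quotes means the apostrophe needs no escaping at all.

```javascript
// Hypothetical helper: backslash-escape every character that has a special
// meaning in a regular expression, so it is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Double-quoted strings can contain an apostrophe without any escaping.
const removeWords = ["can't", "don't", "."];

// Escape each entry, then join them into one alternation: can't|don't|\.
const re = new RegExp(removeWords.map(escapeRegExp).join('|'), 'gi');

const cleaned = "I can't stop. Don't go.".replace(re, '');
console.log(cleaned);
```

Escaping in one place means new entries can be added to the array exactly as they are written, without remembering which characters are special.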

Charles
08-21-2003, 05:49 AM
Or you can use the regular-expression literal syntax, in which the apostrophe needs no escaping:

str = str.replace(/the|and|but|if|can't|don't/gi, '');
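
On the full-stop part of the question: `\.` in the alternation removes every full stop, including one inside something like "3.5". If only a full stop at the end of a word should go, a sketch using the literal syntax could keep the preceding character and require whitespace or the end of the string after the stop. The function name is hypothetical, and "end of a word" is assumed to mean a letter, digit or underscore followed by a full stop.

```javascript
// Hypothetical helper: drop a full stop only when it ends a word, i.e. it
// follows a letter, digit or underscore (\w) and is followed by whitespace
// or the end of the string. Full stops inside "3.5" are left alone.
function stripTrailingFullStops(str) {
  // $1 puts back the captured last character of the word
  return str.replace(/(\w)\.(?=\s|$)/g, '$1');
}

console.log(stripTrailingFullStops('Version 3.5 is here. Really.'));
```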

Webskater
08-21-2003, 07:03 AM
Thanks for your replies.

If I try to eliminate words thus:

|do|dont| // for someone typing don't without the apostrophe

The 'do' gets stripped off the front of 'dont', leaving 'nt'. Is there a way of forcing this to examine each word as a separate entity? That is:
the characters between one space and the next form a word
the characters from the beginning of the string to the first space form a word
the characters from the last space to the end of the string form a word

Thanks again
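
The rule described above (a word is whatever sits between spaces, or between a space and either end of the string) can also be applied without running one regular expression over the whole phrase: split the phrase into words, drop those on the list, and join the rest. A minimal sketch with a hypothetical function name, comparing words case-insensitively and treating any run of whitespace as a separator:

```javascript
// Hypothetical helper: split the phrase on whitespace, remove words that
// appear in the stop list (ignoring case), and rejoin with single spaces.
function removeStopWords(phrase, stopWords) {
  // A Set gives a quick membership test for each word
  const stop = new Set(stopWords.map(w => w.toLowerCase()));
  return phrase
    .trim()
    .split(/\s+/)
    .filter(word => !stop.has(word.toLowerCase()))
    .join(' ');
}

console.log(removeStopWords('Do dont do it', ['do']));
```

Because whole words are compared exactly, "do" can never eat the front of "dont". Punctuation stays attached, though: "if." is not matched by "if" unless the full stop is stripped first.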

pyro
08-21-2003, 07:13 AM
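Use \b to designate a word boundary. It matches only where a word character meets a non-word character (or the start or end of the string), so "do" no longer matches the start of "Dont":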

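<script type="text/javascript">
var str = "Dont try this at home.";
// \b marks a word boundary, so "do" matches only the whole word "do",
// not the first two letters of "Dont"
str = str.replace(/\b(the|and|but|if|can't|do|don't)\b/gi, '');
alert(str);
</script>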
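
Removing words this way leaves their surrounding spaces behind, so "The cat and the hat" would become " cat   hat". A sketch that builds the \b pattern from an array and then tidies the spacing; the helper name is hypothetical, and it assumes the list holds ordinary words, since without extra flags \b only treats the ASCII letters, digits and underscore as word characters.

```javascript
// Hypothetical helper: remove whole words only, then collapse the gaps.
function removeWholeWords(str, words) {
  // (?:...) groups the alternatives so that \b applies to each of them
  const re = new RegExp('\\b(?:' + words.join('|') + ')\\b', 'gi');
  return str
    .replace(re, '')
    .replace(/\s{2,}/g, ' ') // squeeze runs of whitespace to one space
    .trim();                 // drop spaces left at either end
}

console.log(removeWholeWords('The cat and the hat', ['the', 'and']));
console.log(removeWholeWords('Dont do this', ['do']));
```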

Charles
08-21-2003, 07:23 AM
Try flipping the do and the don't. That is to say, filter out the don't first.
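
The order matters because, without \b, the regex engine tries the alternatives from left to right at each position and takes the first one that lets the match succeed, so "do" wins over "dont". A sketch (hypothetical helper name) that sorts the list longest-first so that longer words are always tried before their prefixes:

```javascript
// Without \b, alternatives are tried in order, so "do" matches the start
// of "dont" and leaves "nt" behind.
const doFirst = 'dont'.replace(/do|dont/gi, '');

// Hypothetical helper: try longer words first so "dont" beats "do".
function removeLongestFirst(str, words) {
  // Copy before sorting so the caller's array is not reordered
  const sorted = words.slice().sort((a, b) => b.length - a.length);
  return str.replace(new RegExp(sorted.join('|'), 'gi'), '');
}

console.log(doFirst);
console.log(removeLongestFirst('dont', ['do', 'dont']));
```

This fixes the prefix problem but, unlike the \b version, still removes letters inside other words, e.g. the "do" in "document".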

Webskater
08-21-2003, 07:27 AM
Thanks for all your replies - it now works perfectly. This RegExp stuff is a couple of brain cells too far for me.
Thanks again.