simple regular expression


TriTech
10-02-2003, 02:00 PM
Hello all,

I need to create a web page that uses a prompt to take in a string of text, which may or may not have punctuation in it (the punctuation is not an important element), and then uses match to get each individual word from the string. I have tried a few different regular expressions, but all I can ever get is the first word of the string as the first element of the array. Does anyone know how to properly create this regular expression?

Thanks,
Jeff

pyro
10-02-2003, 02:25 PM
Try something like this:

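<script type="text/javascript">
// prompt() returns null if the user cancels, so fall back to an empty string.
var str = prompt("Enter some text", "") || "";
// ary is an array of the words; match() returns null when nothing matches.
var ary = str.match(/\S+/g) || [];
for (var i = 0; i < ary.length; i++) {
  alert(ary[i]);
}
</script>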
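
A likely reason for only getting the first word is a missing g flag on the pattern. That is an assumption, since the original poster's pattern was not shown. The sketch below uses a made-up sample string and console.log instead of alert so it can run outside a browser.

```javascript
// Hypothetical sample input; the original poster's text and pattern were not posted.
var text = "Hello there, world";

// Without the g flag, match() stops at the first match and returns it
// (together with any capture groups) as element 0.
var firstOnly = text.match(/\S+/);

// With the g flag, match() returns an array of every match.
var allWords = text.match(/\S+/g);

console.log(firstOnly[0]); // "Hello"
console.log(allWords);     // ["Hello", "there,", "world"]
```

Note that the comma stays attached to "there," because \S matches any non-whitespace character.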

Charles
10-02-2003, 02:56 PM
<script type="text/javascript">
<!--
alert(prompt('Enter some text.', '').split(/\s+/))
// -->
</script>

Jeff Mott
10-02-2003, 04:51 PM
I believe there was also some debate over this some time ago. Pyro's example will keep unwanted punctuation such as commas or periods with the word, and Charles' will strip away wanted punctuation such as apostrophes. Perhaps we should completely re-evaluate what should constitute a word boundary. For example, underscores, apostrophes or short hyphens should not constitute a boundary, though they should probably also not be the first character. Or, in the particular case of numbers (e.g., 1,024.5), commas and periods, and possibly a leading plus or minus, should not constitute boundaries. And perhaps we should also accommodate scientific notation.

Some patterns to build from...

word = /\w+(?:-\w+)*(?:'[a-z]*)?/i
number = /\d{1,3}(?:,\d{3})*(?:\.\d*)?/
scientific_notation = /\d(?:\.\d*)?e[+-]\d+/i
number_prefix = /[$+-]/ // possibly also foreign currency symbols
number_postfix = /[%¢]/ // possibly also degree symbol, or caret (^)
                        // for the ASCII convention of indicating
                        // exponentiation

Any other thoughts, corrections or additions are welcomed.
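
One possible way to combine the patterns above into a single tokenizer is sketched below. The alternation order (scientific notation, then number, then word), the optional prefix and postfix around numbers, and the tokenize name are all assumptions made for illustration, not something settled in this thread.

```javascript
// Building blocks taken from the patterns above (flags are applied once, below).
var sci = /\d(?:\.\d*)?e[+-]\d+/.source;
var num = /\d{1,3}(?:,\d{3})*(?:\.\d*)?/.source;
var word = /\w+(?:-\w+)*(?:'[a-z]*)?/.source;

// Assumed combination: optional prefix, then a scientific or plain number,
// then an optional postfix; otherwise fall back to the word pattern.
var token = new RegExp("[$+-]?(?:" + sci + "|" + num + ")[%¢]?|" + word, "gi");

// Hypothetical helper: returns every token, or an empty array if none match.
function tokenize(text) {
  return text.match(token) || [];
}

console.log(tokenize("Don't pay $1,024.50, it's 3.0e+8 well-known."));
// ["Don't", "pay", "$1,024.50", "it's", "3.0e+8", "well-known"]
```

One limitation of this sketch: because the number pattern allows at most three leading digits, an unformatted number such as 2003 comes back as two tokens ("200" and "3").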

Charles
10-03-2003, 06:25 AM
Originally posted by Jeff Mott
I believe there was also some debate over this some time ago. Pyro's example will keep unwanted punctuation such as commas or periods with the word, and Charles' will strip away wanted punctuation such as apostrophes.

You've misread my post, something that I do quite often. Our two examples are pretty darn close to each other. Pyro is doing a global match for all consecutive non-whitespace characters, while I'm splitting the string on consecutive whitespace characters. Since we're trying to split the string up into its parts, I thought it might make more sense to use String.split().
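
To illustrate how close the two approaches are, here is a small sketch on a made-up sample string. The one visible difference is at the edges: when the input has leading or trailing whitespace, split() produces empty strings there, while match() does not. Trimming first (an addition for illustration, not part of either post) makes the results agree.

```javascript
// Hypothetical sample with leading and trailing spaces.
var text = "  Hello there, world  ";

// Global match on runs of non-whitespace characters.
var matched = text.match(/\S+/g) || [];

// Split on runs of whitespace; the edges yield empty strings.
var pieces = text.split(/\s+/);

// Trimming first removes the empty edge elements.
var trimmedPieces = text.trim().split(/\s+/);

console.log(matched);       // ["Hello", "there,", "world"]
console.log(pieces);        // ["", "Hello", "there,", "world", ""]
console.log(trimmedPieces); // ["Hello", "there,", "world"]
```

Both approaches leave the comma attached to "there," and neither strips apostrophes.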

Ice3T
10-26-2003, 11:57 AM
YES... this IS, exactly what I needed!

thank you