parsing


LLuaP
05-25-2005, 04:20 PM
It seems that the simplest-looking problems are the ones that get you. I'm still a newbie, so this is probably peanuts for the monks here.

Background
First, I am using ActivePerl 5.8 on a WinXP machine. I have been coding in Perl for about a week now, so I'm still very green.

The Problemo
I'm having a problem parsing a long text with a repeating pattern. I want to extract some substrings from a LONG string similar to this one:

$string = 'xxxxxgoodooo[foo 1bar]999,999SOMEWORDS[/foo bar]xxxgoodooo[foo 2bar]oo323,434ooo[/foo bar]xxxgoodooo[foo 3bar]ooooo[/foo bar]';

where:
"xxx" can be any number of alphanumeric characters.
"ooo" can be any number of alphanumeric characters.
"bar" can be any number of alphanumeric characters.

-----------
I want to extract and print only the occurrences of the pattern (goodooo[foo ...]...[/foo ...]) whose content has both numbers and alphabetic characters. In essence, I want to extract something like:

goodooo[foo bar]999,999 AND SOME ALPHABETICAL[/foo bar]

but NOT:

goodooo[foo bar]ALPHABETICAL ONLY[/foo bar]
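A minimal sketch of the goal described above, assuming the text between `[foo ...]` and `[/foo ...]` never contains a `[` (the sample content includes commas, so it is matched with `[^\[]*` rather than `\w+`), and assuming "numbers and alphabetic characters" means at least one digit and at least one ASCII letter. Each block is captured first and then filtered in plain Perl:

```perl
use strict;
use warnings;

# Sample input copied from the post above.
my $string = 'xxxxxgoodooo[foo 1bar]999,999SOMEWORDS[/foo bar]xxxgoodooo[foo 2bar]oo323,434ooo[/foo bar]xxxgoodooo[foo 3bar]ooooo[/foo bar]';

# Capture each whole block ($1) and its content ($2).
# good\w* allows "good" with nothing after it ("ooo" may be empty).
# [^\[]* takes everything up to the next "[", so commas are allowed.
my @wanted;
while ($string =~ m{(good\w*\[foo\s+\w+\]([^\[]*)\[/foo\s+\w+\])}ig) {
    my ($block, $content) = ($1, $2);

    # Keep the block only if its content has a digit AND a letter.
    push @wanted, $block if $content =~ /\d/ && $content =~ /[A-Za-z]/;
}

print "$_\n" for @wanted;
```

Splitting the match and the filter into two steps keeps the regex short, and the digit/letter rule can be changed without touching the pattern.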



I am able to extract the strings with ALPHABETICAL-only content using \w with this code:


while ($string =~ m/(good\w+\[foo\s+\w+]\w+\[\/foo\s+\w+])/ig)
{
    print $1, "\n";
    print "printed the ones without numbers in the middle!\n";
}
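Running the loop above exactly as posted, but collecting the matches instead of printing them, shows what it actually finds in the sample string. This only checks the one sample from the post; the comment explains why the other two blocks are skipped:

```perl
use strict;
use warnings;

# Sample input copied from the post above.
my $string = 'xxxxxgoodooo[foo 1bar]999,999SOMEWORDS[/foo bar]xxxgoodooo[foo 2bar]oo323,434ooo[/foo bar]xxxgoodooo[foo 3bar]ooooo[/foo bar]';

# The first regex from the post, unchanged; matches are collected in a list.
my @matched;
while ($string =~ m/(good\w+\[foo\s+\w+]\w+\[\/foo\s+\w+])/ig) {
    push @matched, $1;
}

# Only the third block matches. \w+ accepts digits ("1bar" is fine),
# but it stops at the comma in "999,999SOMEWORDS" and "oo323,434ooo",
# so the following \[ can never line up in those two blocks.
print "$_\n" for @matched;
```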



But when I try to extract the ones with NUMBERS & ALPHABETICAL characters using a combination of \d and \w, I get nothing. Here is my code for that:


while ($string =~ m/(good\w+\[foo\s+\w+]\d+\w+\[\/foo\s+\w+])/ig)
{
    print $1, "\n";
    print "printed the ones WITH numbers in the middle\n";
}
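The `\d+\w+` in the middle of the regex above has two separate problems with the sample data, which the check below isolates by testing only the content part. The comma-free strings are hypothetical variants made up for illustration, not data from the post:

```perl
use strict;
use warnings;

# Contents of the first two blocks, copied from the sample string.
my @contents = ('999,999SOMEWORDS', 'oo323,434ooo');

# The middle of the second regex, anchored to cover the whole content:
# digits first, then word characters only. Neither content passes.
my @whole = grep { /^\d+\w+$/ } @contents;

# Hypothetical comma-free variants (illustration only): delete the commas.
my @no_comma = map { my $c = $_; $c =~ tr/,//d; $c } @contents;

# Without commas, the digit-first rule still rejects 'oo323434ooo',
# because \d+ must match right after the "]".
my @digit_first = grep { /^\d+\w+$/ } @no_comma;

# Allowing letters before the first digit accepts both variants.
my @digit_any = grep { /^\w*\d\w*$/ } @no_comma;

print "whole: @whole\ndigit_first: @digit_first\ndigit_any: @digit_any\n";
```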


Summary

So my problem is that when I try to match a mixture of alphabetical and numeric characters, nothing gets captured into the $1 variable. I thought \w matched any alphanumeric character.
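A quick way to check what `\w` covers is to test single characters taken from the sample. For plain ASCII input, `\w` is the letters, the digits and the underscore, so the comma is the one character here that it rejects:

```perl
use strict;
use warnings;

# Characters from the sample string, plus "_" for completeness.
my @chars = ('S', 'o', '9', '_', ',');

# For ASCII input, \w matches [A-Za-z0-9_]: one "word" character.
# Punctuation such as the comma is not a word character.
my @word     = grep {  /^\w$/ } @chars;
my @not_word = grep { !/^\w$/ } @chars;

print "word: @word\nnot word: @not_word\n";
```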

What am I doing wrong? When I try to extract the numbers by themselves I have no problem, but for some reason, when I mix the numbers with alphabetical characters, there's no match in that string. Could it be a bug in ActivePerl, or in my reasoning? =)
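One possible minimal change to the second loop, sketched under the assumption that the content never contains a `[`: keep the posted structure, let the middle part accept commas with `[\w,]+`, and use two lookaheads to require at least one digit and one letter. The character class and lookaheads are an editorial suggestion, not the only fix:

```perl
use strict;
use warnings;

# Sample input copied from the post above.
my $string = 'xxxxxgoodooo[foo 1bar]999,999SOMEWORDS[/foo bar]xxxgoodooo[foo 2bar]oo323,434ooo[/foo bar]xxxgoodooo[foo 3bar]ooooo[/foo bar]';

# Same shape as the second loop in the post. The middle part is now:
#   (?=[^\[]*\d)       some digit appears before the next "["
#   (?=[^\[]*[A-Za-z]) some letter appears before the next "["
#   [\w,]+             the content itself, commas allowed
my @found;
while ($string =~ m/(good\w+\[foo\s+\w+\](?=[^\[]*\d)(?=[^\[]*[A-Za-z])[\w,]+\[\/foo\s+\w+\])/ig) {
    push @found, $1;
}

print "$_\n" for @found;
```

The lookaheads only inspect the content and consume nothing, so `[\w,]+` still has to match the whole content before `[/foo`. If other punctuation can appear there, it would need to be added to the class.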

I would appreciate any help. Thanks for reading!
LLuaP