Perl--Backreferences


StuPeas
08-18-2006, 02:04 PM
This one should be simple (I hope).


$letters = "aaaabbbb";

$letters =~m/\w*(.)\1\w*/;

print "the letter $1 was found consecutively";


In the above, why isn't $1 set to "a" instead of "b"?

The way I see it, the first match (\w*) would match the first "a".
The second match ((.)) would match the second "a".
The third match (\1) will only now match if it is the same as (.), which it is, i.e. "a".
And the last match (\w*) would be taken care of by the remainder.

So why does Perl skip the "a"s and instead match on the "b"s?

TIA

NogDog
08-18-2006, 08:40 PM
I think the problem is that \w* will match as many characters as it can ("*" being 0-n occurrences), so the first \w* is matching "aaaabb", then the dot is matching the 3rd "b", and the \1 back-reference is matching the last "b". (The last \w* doesn't have to match anything, since 0 matches is valid for "*".)

Or something like that. :)
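
To make the greedy behaviour visible, here is a minimal sketch that wraps each part of the original pattern in its own capture group and prints what each part matched. The extra groups (which turn \1 into \2) and the helper name split_match are illustrative additions, not part of the original code. The second call swaps the leading \w* for the non-greedy \w*?, which is one way to get the "a" the original poster expected.

```perl
use strict;
use warnings;

# Hypothetical helper: return what each captured part of the pattern matched.
# In list context a successful match returns its capture groups;
# a failed match returns an empty list.
sub split_match {
    my ($text, $pattern) = @_;
    my @parts = $text =~ $pattern;
    return @parts;
}

my $letters = "aaaabbbb";

# Greedy: the leading \w* grabs everything, then gives back just enough
# characters for (.)\2 to succeed.
my @greedy = split_match($letters, qr/(\w*)(.)\2(\w*)/);
print "greedy: prefix='$greedy[0]' repeated='$greedy[1]' rest='$greedy[2]'\n";

# Non-greedy: \w*? starts with nothing and grows only if it has to.
my @lazy = split_match($letters, qr/(\w*?)(.)\2(\w*)/);
print "lazy:   prefix='$lazy[0]' repeated='$lazy[1]' rest='$lazy[2]'\n";
```

With "aaaabbbb" the greedy version reports the prefix "aaaabb" and the repeated letter "b", while the non-greedy version reports an empty prefix and the repeated letter "a". Dropping the leading \w* altogether, as in /(.)\1/, should also find "a", since the engine tries the leftmost starting position first.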

StuPeas
08-19-2006, 06:30 AM
So if "aaaabbbb" is replaced with "aaaabbbb agsferwtdddd", why doesn't \1 match the "d"s?

dragle
08-21-2006, 10:57 AM
Quote: So if "aaaabbbb" is replaced with "aaaabbbb agsferwtdddd", why doesn't \1 match the "d"s?
Because \w will only match "word" characters; i.e., a-z, A-Z, 0-9 and underscore. The space between your two character strings can't be matched by the first \w*, so that \w* stops at the end of "aaaabbbb". The dot could match the space, but \1 would then need a second space right after it, so the regexp backs up to see if any shorter match for the first \w* lets the rest of your expression succeed. So the "b" is still printed.

If you remove the space between your two character segments and try it, i.e., "aaaabbbbagsferwtdddd", then the "d" will be printed (assuming your original example code).
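
The three inputs discussed in this thread can be checked side by side with a short sketch like the one below. The helper name repeated_letter is made up for illustration; the pattern itself is the one from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the original pattern and return what (.)
# captured, or undef if the pattern does not match at all.
sub repeated_letter {
    my ($text) = @_;
    my ($char) = $text =~ m/\w*(.)\1\w*/;
    return $char;
}

for my $text ("aaaabbbb", "aaaabbbb agsferwtdddd", "aaaabbbbagsferwtdddd") {
    my $found = repeated_letter($text);
    print "'$text' -> ", (defined $found ? $found : "no match"), "\n";
}
```

The first two strings should report "b", because the leading \w* cannot get past the end of "aaaabbbb" (or past the space), and the third should report "d", because without the space \w* can run all the way to the final "d"s before backing up.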

If you haven't seen them already, these pages may be helpful:

http://search.cpan.org/dist/perl/pod/perlretut.pod
http://search.cpan.org/dist/perl/pod/perlre.pod

Good luck!

StuPeas
08-21-2006, 01:45 PM
Thanks for the explanation and the links, Dan. Much appreciated.

Stu