need a new regexp function similar to "m" combined with "push"


Ultimater
10-07-2005, 08:47 PM
I need to create a new function that accepts two arguments, a string and a regular-expression string, applies the regular expression to the string, and returns an array containing every match.

The syntax would look like:

my @myResults=gMatch($myString,$myRegExpString);



In other words, instead of having to do:

sub buildMarkingsArray{
push(@markings, $_[0]);return "";
}
our @markings;my $marking;
my $body='
<body>
...
<!-- Start marking -->
&lt;data1&gt;
<!-- End marking -->
...
<!-- Start marking -->
&lt;data2&gt;
<!-- End marking -->
...
</body>
';

$marking=$body;
$marking =~ s/<\!--\s*Start\s+marking\s*-->(.*?)<\!--\s*End\s+marking\s*-->/buildMarkingsArray($1)/iegs;
# @markings now contains ('&lt;data1&gt;','&lt;data2&gt;')


I would be able to do:

my $body='
<body>
...
<!-- Start marking -->
&lt;data1&gt;
<!-- End marking -->
...
<!-- Start marking -->
&lt;data2&gt;
<!-- End marking -->
...
</body>
';

my @markings=gMatch($body,"<\\\!--\\s*Start\\s+marking\\s*-->(.*?)<\\\!--\\s*End\\s+marking\\s*-->");
# @markings now contains ('&lt;data1&gt;','&lt;data2&gt;')


If this is too hard, I hope you've got a better idea how to turn $body into an array ('&lt;data1&gt;','&lt;data2&gt;').

Thanks in advance.
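
For reference, a minimal sketch of the two-argument gMatch described above. It relies on the fact that a match with the /g modifier in list context returns every capture at once, so no push-inside-a-substitution is needed. The body of gMatch here, and the extra \s* around the capture group (added only to trim the surrounding newlines), are assumptions for illustration, not code from this thread:

```perl
use strict;
use warnings;

# Hypothetical gMatch: takes a string and a pattern given as a string,
# and returns every captured group found in the string.
sub gMatch {
    my ($string, $pattern) = @_;
    # In list context, m//g returns all captures from all matches.
    # /s lets "." span newlines; /i ignores case.
    my @matches = ($string =~ /$pattern/gis);
    return @matches;
}

my $body = "<body>\n...\n<!-- Start marking -->\n&lt;data1&gt;\n<!-- End marking -->\n"
         . "...\n<!-- Start marking -->\n&lt;data2&gt;\n<!-- End marking -->\n...\n</body>\n";

# A single-quoted pattern avoids the doubled backslashes needed in double quotes.
my @markings = gMatch($body,
    '<!--\s*Start\s+marking\s*-->\s*(.*?)\s*<!--\s*End\s+marking\s*-->');

print join(',', @markings), "\n";
```

Passing the pattern as a single-quoted string keeps the /gis modifiers in one place inside the subroutine. A qr// object would also work, but it carries its own modifiers, so /i and /s would then have to be set on the qr// itself.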

Charles
10-07-2005, 10:02 PM
What is it that you are trying to accomplish?

Nedals
10-07-2005, 10:52 PM
I would be able to do:
<..bunch of code..>
Seems to me that the result is more complex than what you started with. :)

Ultimater
10-08-2005, 11:09 PM
Seems to me that the result is more complex than what you started with.
I was only trying to simplify things for the thread. Actually it's the other way around, because the value of $body is obtained from the contents of another file.

The purpose of the Perl file is to make multiple themes for my website and be able to manage and update all of the themes at once by updating the contents of a single text file that needs to be phrased. After pharsing the contents of the text file into an array of all the important information, I will be able to have Perl print a different page depending on the theme the user chose which is sent as a parameter to the program.

Charles
10-09-2005, 08:28 AM
I'm still not exactly clear about what you're doing but it looks to me like XSLT would be a better way to go. But Perl is the beloved language and XSLT tools don't yet exist for it.

It's always a bad idea to try to manipulate SGML or XML with regular expressions. That's why we have parsers. If you're trying to build HTML then I would suggest that you use the CGI module's element-creating routines. If you really do need to parse HTML then I would use TreeBuilder or one of the Parser modules.

Nedals
10-09-2005, 01:17 PM
I'm not really sure what you're doing, but instead of trying to pass the regex, why not just pass a start and end 'token' and build the regex within the subroutine?

<!-- Marking -->
&lt;data1&gt;
<!-- /Marking -->

my @myResults=gMatch($myString,'Marking');

Question:
What's the meaning of 'phrase'? Do you mean 'parse'?

Ultimater
10-09-2005, 02:16 PM
That looks like a better gMatch subroutine than I previously had in mind, Nedals! Can you please help me with the defining code?

What's the meaning of 'phrase'? Do you mean 'parse'?
Yeah sorry, I always mix up 'phrase' with 'parse' because I pronounce them alike. I just checked with m-w.com and typed in 'parse' only to find it pronounced like parse (http://m-w.com/cgi-bin/audio.pl?parse001.wav=parse) while I was pronouncing it as phrase (http://m-w.com/cgi-bin/audio.pl?phrase01.wav=phrase), hehe. I did something similar in the past with the word 'syntax': I used to always pronounce it as 'sign ex', totally omitting the 't' sound, which made it very hard for me to spell the word. Ahh, that m-w.com website, what would I do without you!? A big thanks for catching that!

Nedals
10-09-2005, 03:46 PM
use strict;
use Data::Dumper;

my $string = <<STR;
<body>
...
<!-- Marking -->
&lt;data1&gt;
<!-- /Marking -->
...
<!-- Marking -->
&lt;data2&gt;
<!-- /Marking -->
...
</body>
STR

my @results = gMatch($string,'Marking');
print Dumper(\@results);

exit;

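sub gMatch {   # no "()" here: an empty prototype would declare that gMatch takes no arguments
    my ($string,$token) = @_;
    my @result = ();

    ## Code as written keeps any 'returns' within tokens.
    # $string =~ s/(\n|\r)//g; ## add this line, and remove 's' option below, to get rid of 'returns'
    ## \Q...\E treats $token as literal text, so regex metacharacters in it are escaped.
    @result = ($string =~ /<!-- \Q$token\E -->(.*?)<!-- \/\Q$token\E -->/igs);
    return @result;
}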
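
A middle ground between keeping and removing the 'returns' is to trim only the whitespace around each match and leave any newlines inside it alone. A minimal sketch, assuming a hypothetical helper named gMatchTrimmed (the name and the trimming step are illustrative, not part of the code above):

```perl
use strict;
use warnings;

# Hypothetical variant of gMatch: keeps newlines inside a match
# but trims leading and trailing whitespace from each result.
sub gMatchTrimmed {
    my ($string, $token) = @_;
    my @result = ($string =~ /<!-- \Q$token\E -->(.*?)<!-- \/\Q$token\E -->/igs);
    s/^\s+|\s+$//g for @result;   # trim each element in place
    return @result;
}

my @trimmed = gMatchTrimmed(
    "<!-- Marking -->\nline1\nline2\n<!-- /Marking -->", 'Marking');

# The inner newline between line1 and line2 survives; the outer ones do not.
print join('|', @trimmed), "\n";
```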

Ultimater
10-09-2005, 05:02 PM
Thanks for the reply, Nedals. The defining code is awesome, and your option to remove new lines is a great idea and worth implementing. I revised it a bit and added an optional third argument that removes new lines. Definitely a subroutine worth adding to my CPCM module (common Perl coding module). Charles, your suggestion was practical but not quite what I was looking for. I'm sure if I had explained my issue better in my first post, you would have come up with the answer I was looking for. Even if I had used the module you were talking about, I would still have been stumped as to how to use it. Thanks again, Nedals.

sub gMatch{
    my ($string,$token,$arg3) = @_;

    # Optional third argument: when true, strip all newlines before matching.
    $string =~ s/(\n|\r)//g if $arg3;

    # The 's' option is harmless once the newlines are gone, so one pattern covers both cases.
    my @result = ($string =~ /<!-- $token -->(.*?)<!-- \/$token -->/igs);
    return @result;
}


edit:
This is much more flexible, since it allows different start and end tokens:

sub gMatch{
    my ($string,$token1,$token2,$omitnewlines) = @_;

    # Optional fourth argument: when true, strip all newlines before matching.
    $string =~ s/(\n|\r)//g if $omitnewlines;

    # $token1 and $token2 are interpolated as regex fragments, not literal text.
    my @result = ($string =~ /$token1(.*?)$token2/igs);
    return @result;
}
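
One thing to watch with the four-argument version: both tokens are interpolated into the pattern as regex fragments, so a literal token containing characters such as [ or ( needs escaping first. A short usage sketch, assuming made-up sample data in $page and using Perl's built-in quotemeta for the escaping:

```perl
use strict;
use warnings;

# Same shape as the final gMatch above: both tokens are treated as regex fragments.
sub gMatch {
    my ($string, $token1, $token2, $omitnewlines) = @_;
    $string =~ s/[\r\n]//g if $omitnewlines;
    my @result = ($string =~ /$token1(.*?)$token2/igs);
    return @result;
}

# Made-up sample text for illustration only.
my $page = "<b>bold</b> text [note] more <b>again</b>";

# Tokens without metacharacters can be passed as-is.
my @bold = gMatch($page, '<b>', '</b>');

# Literal tokens with metacharacters are escaped with quotemeta.
my @bracketed = gMatch($page, quotemeta('['), quotemeta(']'));

# The fourth argument removes newlines before matching.
my @joined = gMatch("<x>a\nb</x>", '<x>', '</x>', 1);

print "bold: @bold\n";
print "bracketed: @bracketed\n";
print "joined: @joined\n";
```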