*Tom*
12-17-2008, 09:49 AM
I'm trying to write a simple text search that ignores any non-alphanumeric characters. For example, I don't want a user to enter something like "find this" and not get a match because the string being searched contains something like "find-this", so I remove any unwanted characters from both strings before looking for a match. The problem is that I'd like to display the string afterwards with the matched part highlighted, and I want to display the original string, not the version with the characters removed.
Here's a simplified version of the code:
$string = 'the term "find-this" is contained within this string';
$search_term = 'find this';
$string =~ s/[^a-z0-9]//gi; # 'thetermfindthisiscontainedwithinthisstring'
$search_term =~ s/[^a-z0-9]//gi; # 'findthis'
if($string =~ /$search_term/i)
{
$string =~ s/($search_term)/\<b>$1<\/b>/gi; # what gets printed: 'theterm<b>findthis</b>iscontainedwithinthisstring'
print $string; # what I want printed: 'the term "<b>find-this</b>" is contained within this string'
}
else
{
print 'No match was found';
}
The only way I've managed to get what I want is to replace the unwanted characters with a single placeholder character (a period in this example) instead of removing them, so the search can still find a match. I keep a list of all the replaced characters in another variable so I can put them back in place of the periods after the search.
$removedchars = $string;
$removedchars =~ s/[a-z0-9]//gi; # '  "-"     ' (only the non-alphanumeric characters remain; /i so uppercase letters are stripped too)
$string =~ s/[^a-z0-9]/./gi; # 'the.term..find.this..is.contained.within.this.string'
$search_term =~ s/[^a-z0-9]/./gi; # 'find.this'
if($string =~ /$search_term/i)
{
$string =~ s/($search_term)/\<b>$1<\/b>/gi;
@string_chars = split(//, $string);
@removed_chars = split(//, $removedchars);
for(my $i=0,$j=0;$i<@string_chars;$i++)
{
$string_chars[$i] = $removed_chars[$j++] if($string_chars[$i] eq '.')
}
$string = join('', @string_chars);
print $string;
}
else
{
print 'No match was found';
}
This works, but it seems way over the top for something relatively simple. I'll be looking for matches within a few hundred lines of text with each search the user performs, so I'd rather have a more efficient method. Can anyone suggest a better way of doing this?
Thanks for any help.
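One direction that might avoid the strip-and-restore step, sketched here under assumptions: build the pattern from the search term so that any run of non-alphanumeric characters is allowed between its letters and digits, then match that pattern against the original string. The helper name highlight_loose is hypothetical, and this is only a minimal sketch, not a tested solution for real data.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps every loose match of $search_term in <b> tags.
# Returns the highlighted string, or undef when nothing matched.
sub highlight_loose {
    my ($string, $search_term) = @_;

    # Keep only the alphanumeric characters of the search term.
    my @chars = grep { /[a-z0-9]/i } split //, $search_term;
    return undef unless @chars;

    # Allow any run of non-alphanumerics between consecutive characters,
    # so 'findthis' can match 'find-this' or 'find this'.
    my $pattern = join '[^a-z0-9]*', map { quotemeta } @chars;

    # Substitute on the original string; s///g returns the match count.
    my $count = ($string =~ s/($pattern)/<b>$1<\/b>/gi);
    return $count ? $string : undef;
}

my $result = highlight_loose(
    'the term "find-this" is contained within this string',
    'find this',
);
print defined($result) ? $result : 'No match was found', "\n";
```

Because the substitution runs on the untouched string, nothing has to be put back afterwards. Like the stripping approach, it also matches across word boundaries (for example "fin dthis"), which may or may not be wanted.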