www.webdeveloper.com

Thread: Problem with search function

  1. #1
    Join Date
    Sep 2006
    Location
    England
    Posts
    29

    Problem with search function

    I'm trying to write a simple text search that ignores any non-alphanumeric characters. For example, I don't want a user to enter something like "find this" and not get a match because the string being searched contains something like "find-this", so I remove any unwanted characters from both strings before looking for a match. The problem is that I'd like to display the string afterwards with the matched part highlighted, but I want to show the original string, not the version with the characters removed.

    Here's a simplified version of the code:

    Code:
    $string = 'the term "find-this" is contained within this string';
    $search_term = 'find this';
    
    $string =~ s/[^a-z0-9]//gi;      # 'thetermfindthisiscontainedwithinthisstring'
    $search_term =~ s/[^a-z0-9]//gi; # 'findthis'
    
    if($string =~ /$search_term/i)
    {
    	$string =~ s/($search_term)/\<b>$1<\/b>/gi; # what gets printed:   'theterm<b>findthis</b>iscontainedwithinthisstring'
    	print $string;                              # what I want printed: 'the term "<b>find-this</b>" is contained within this string'
    }
    else
    {
    	print 'No match was found';
    }
    The only way I've managed to get what I wanted is, instead of removing the unwanted characters I replace them with a single character (in this example I've used a period) so the search can still find a match. I hold a list of all the removed characters in another variable so I can put them all back in place of the periods after the search.

    Code:
    $removedchars = $string;
    $removedchars =~ s/[a-z0-9]//gi;			# '  "-"     ' (/i so uppercase letters are stripped too)
    $string =~ s/[^a-z0-9]/./gi;			# 'the.term..find.this..is.contained.within.this.string'
    $search_term =~ s/[^a-z0-9]/./gi;		# 'find.this'
    
    if($string =~ /$search_term/i)
    {
    	$string =~ s/($search_term)/\<b>$1<\/b>/gi;
    	@string_chars = split(//, $string);
    	@removed_chars = split(//, $removedchars);
    
    	for(my ($i,$j)=(0,0);$i<@string_chars;$i++)	# declare both counters with my
    	{
    		$string_chars[$i] = $removed_chars[$j++] if($string_chars[$i] eq '.');
    	}
    	$string = join('', @string_chars);
    	print $string;
    }
    else
    {
    	print 'No match was found';
    }
    This works, but it seems way over the top for something relatively simple. I'll be looking for matches within a few hundred lines of text with each search the user performs, so I'd rather have a more efficient method. Can anyone suggest a better way of doing this?
    Thanks for any help.

  2. #2
    Join Date
    Nov 2002
    Location
    Milan, MI
    Posts
    152
    How 'bout:
    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my $test_string = 'This string has "find-this" within it.';
    my $fragment = 'find this';
    
    (my $frag_comp = $fragment) =~ s%[^a-zA-Z0-9]+%\[\^a\-zA\-Z0\-9\]\+%g;
    if ($test_string =~ /$frag_comp/) {
       (my $print_string = $test_string) =~ s%($frag_comp)%<b>$1</b>%g;
       print $print_string, "\n";
    }
    else {
       print 'Term not found.', "\n";
    }
    I haven't benchmarked it, so I can't say whether it's any more or less efficient than what you have. The point is, instead of a dot, replace the characters in the search term that you don't care about with the pattern itself. Then just compare the text to the resulting pattern (no need to modify the text itself, so no need to track the original contents). I'm assuming here that you want to match one or more non-alphanumeric characters in a row; if that's incorrect, just take the plus signs out of the regex.
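
    A slightly more defensive sketch of the same idea, in case it helps: split the search term on runs of non-alphanumeric characters, escape each piece with quotemeta, and join the pieces back together with the [^a-zA-Z0-9]+ pattern. The highlight_loose name is made up for illustration, and the /i flag is an assumption based on the case-insensitive match in the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every loose match of $term in <b> tags.
sub highlight_loose {
    my ($text, $term) = @_;

    # Split the term into its alphanumeric pieces, dropping everything else.
    my @words = grep { length } split /[^a-zA-Z0-9]+/, $term;
    return $text unless @words;    # nothing searchable in the term

    # Rejoin the pieces with "one or more non-alphanumeric characters".
    my $pattern = join '[^a-zA-Z0-9]+', map { quotemeta } @words;

    # Work on a copy so the caller's text is left untouched.
    (my $marked = $text) =~ s/($pattern)/<b>$1<\/b>/gi;
    return $marked;
}

# prints: This string has "<b>find-this</b>" within it.
print highlight_loose('This string has "find-this" within it.', 'find this'), "\n";
```

    If nothing matches, the helper hands back the text unchanged, so the caller can compare the result with the input to decide whether to print a "not found" message.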

    HTH,
    Dan

  3. #3
    Join Date
    Dec 2002
    Location
    Pleasanton, CA
    Posts
    2,132
    Here's another way to do it
    Code:
    my $string = 'the term find-this is contained within this string';
    my $search_term = 'find this';
    
    # In the search term, convert each non-word char (anything but letters, digits and '_') to '.' (match any single char)
    $search_term =~ s/\W/./g;
    
    if($string =~ /$search_term/i) {
    	$string =~ s/($search_term)/<b>$1<\/b>/gi;
    	print "$string\n";		# prints 'the term <b>find-this</b> is contained within this string';
    }
    else {
    	print "No match was found\n";
    }
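
    If the search should behave like the original strip-everything version, where a term typed as 'findthis' still matches 'find-this', one option is to allow optional non-alphanumeric characters between every character of the term. This is only a sketch under that assumption, not something either reply above does, and highlight_ignoring_punct is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: match $term against $text while ignoring any
# non-alphanumeric characters on either side, then bold the original span.
sub highlight_ignoring_punct {
    my ($text, $term) = @_;

    # Keep only the letters and digits of the term.
    (my $bare = $term) =~ s/[^a-zA-Z0-9]//g;
    return $text unless length $bare;    # nothing searchable in the term

    # Allow any run of non-alphanumerics between consecutive characters.
    my $pattern = join '[^a-zA-Z0-9]*', split //, $bare;

    # Work on a copy so the caller's text is left untouched.
    (my $marked = $text) =~ s/($pattern)/<b>$1<\/b>/gi;
    return $marked;
}

# prints: the term "<b>find-this</b>" is here
print highlight_ignoring_punct('the term "find-this" is here', 'findthis'), "\n";
```

    The pattern grows with the length of the term, so for very long search terms it may be worth timing this against the simpler versions.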

  4. #4
    Join Date
    Sep 2006
    Location
    England
    Posts
    29
    Hi, thanks for your replies. The solution looks so obvious now. You're right about me wanting to match more than one non-alpha character; that only occurred to me after I'd posted, but I couldn't work out how to edit my post (if that's even possible).

    An unrelated question:

    print "No match was found\n";
    print 'No match was found', "\n";

    Is one way any better than the other?

  5. #5
    Join Date
    Nov 2002
    Location
    Milan, MI
    Posts
    152
    Quote:
    print "No match was found\n";
    print 'No match was found', "\n";

    Is one way any better than the other?

    I did that out of habit (not always a good one). Single quotes can be faster in many cases because they don't have to be interpolated (whereas double quotes do). But in this case the simpler:
    Code:
    print "No match was found.\n";
    is the faster statement according to benchmarks. But note that in this example:
    Code:
    my $nf = 'No match was found.';
    print $nf, "\n";
    print "$nf\n";
    the first print statement is the faster one.
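
    For anyone who wants to check this on their own machine, here is a rough sketch using the core Benchmark module. Sending the output to the null device (via File::Spec->devnull) is my own assumption to keep terminal speed out of the numbers, and results will vary with the Perl version and system.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Spec;

my $nf = 'No match was found.';

# Send output to the null device so terminal speed does not skew the timings.
open my $sink, '>', File::Spec->devnull() or die "Cannot open null device: $!";

# The two print styles being compared.
my %forms = (
    list   => sub { print {$sink} $nf, "\n" },
    interp => sub { print {$sink} "$nf\n" },
);

# Run each form for at least one CPU second and print a comparison table.
cmpthese(-1, \%forms);
```

    The difference between the two forms is likely to be small either way, so readability is a reasonable tie-breaker.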

    Cheers!
    Dan
