www.webdeveloper.com

Thread: Problem in finding all occurrences of a substring in a string

  1. #1
    Join Date
    Mar 2008
    Posts
    49

    Hi gurus,

    I have a file containing one substring per line (phrases like "I am", "I am not", etc.). I read this file line by line, and I want to find all occurrences of each substring in another string (which is in fact a line of another file, also read line by line). The results are strange: for some strings it reports a match, for others it does not, and I don't understand why. Here is the code; please help:

    Code:
    for ($x=0; $x<$length1; $x=$x+1){  # for each sentence in the file
    
        print "GOING TO READ SENTENCE NUMBER $x\n";
        ######################################################
        $tot_BC=0;
        for ($k=0;$k<$L_BC;$k=$k+1) {  # for each substring read from the substring file
    
            $cnt=0;
    
            $B_C[$k]=lc($B_C[$k]);
            $file_content[$x]=lc($file_content[$x]);
    
            $cnt=()=$file_content[$x]=~/($B_C[$k])/gi;  # count every match of this substring
    
            if ($cnt>0) {
                $words_found=$words_found."____".$B_C[$k];
            }
            $tot_BC=$tot_BC+$cnt;
        } ## end of for k
    
        #####################################################
    
        push(@BC_array,$tot_BC);  # push the total number of matches for this sentence
    }# end
    I hope you understand the problem.

  2. #2
    Join Date
    Dec 2002
    Location
    Pleasanton, CA
    Posts
    2,132
    I don't know exactly what you are trying to do, but here's some code that may help.
    Code:
    my @file_content = (  # from sentence file
    	'this is a test',
    	'it is a test only',
    	'here to demonstrate this code',
    	'with these test substrings',
    );
    my @substrings = (  # from substring file
    	'is a',
    	'this',
    	'test',
    );
    
    for (@file_content) {
    	my $line = lc $_;		# Line from file in lowercase
    	for (@substrings) {		# Test to see if the line contains any of the substrings
    		if ($line=~/$_/) {
    			# substring found in this string
    			print "$_ substring found in [$line]\n";
    		}
    	}
    	#push ??????;
    }

  3. #3
    Join Date
    Mar 2008
    Posts
    49

    Hi,

    Thanks for the reply.

    But isn't this the same as the code I already posted, which is not working? I have switched to the default variable $_, but it still does not work. In fact, I want to count the total occurrences of each substring in the main string. That is why I have $cnt, which I add to $tot_BC to get the total number of substring matches in the main string (a line of a text document). I then push $tot_BC onto an array, so that at the end I have an array with one element for each line read.

    I have spent the whole day on this problem.

    I am still stuck. It is just not working: even when the main string contains occurrences of the substrings, the array ends up with nothing but zeros.

  4. #4
    Join Date
    Dec 2002
    Location
    Pleasanton, CA
    Posts
    2,132
    The code I provided is working code in that it will detect the substring in the sentence.
    I did not, however, understand what your $words_found, @BC_array, and $tot_BC are supposed to represent/contain.

    My guess...
    $words_found appears to contain a string of the multiple iterations of substrings
    @BC_array contains the number of times any substring matches in the sentence using $tot_BC

    What are you trying to accomplish? What do you do with the results?

  5. #5
    Join Date
    Mar 2008
    Posts
    49
    I wrote it all out in detail. $words_found is just a string that holds all the substrings found in the main string (the line read), separated by ____. It was only for debugging.

    I just want to find all occurrences of each substring in the main string and keep adding them up, so that at the end I know the total number of substrings found in one sentence, i.e. one line read from the file.

    I then put this number, $tot_BC, into an array element. For example, if the file I am scanning contains 10 lines, I calculate the number of substrings in each of the 10 lines and push each total onto the array.

    Array element 1 will then contain the total number of substrings found in line 1 of the document being read. I hope you understand.

  6. #6
    Join Date
    Dec 2002
    Location
    Pleasanton, CA
    Posts
    2,132
    Like this?
    Code:
    my @file_content = (
    	'this is a test',
    	'it is a test',
    	'here to demonstrate this code',
    	'with these test substrings',
    );
    my @substrings = (
    	'is a',
    	'this',
    	'test',
    );
    
    my @counts = ();			# Array to keep count of substrings in sentences
    my $substr_count = 0;
    for (@file_content) {
    	my $line = lc $_;		# Line from file in lowercase
    	$substr_count = 0;		# Clear count for next sentence
    
    	for (@substrings) {		# Test to see if the line contains any of the substrings
    		if ($line=~/$_/) {
    			$substr_count++;
    		}
    	}
    	push @counts, $substr_count;
    }
    
    print join(',',@counts);
    
    # Result: 3,2,1,1
    # Sentence 1 has 3 substrings  (this, is a, test)
    # Sentence 2 has 2 substrings  (is a, test)
    # Sentence 3 has 1 substring   (this)
    # Sentence 4 has 1 substring   (test)

  7. #7
    Join Date
    Mar 2008
    Posts
    49
    Yes

    Until now it was not working for me; there is some problem. I will tell you in the morning whether it works or not. I have a big program and this is just part of it.

    Thanks anyway

    Goodnight Dear

  8. #8
    Join Date
    Mar 2008
    Posts
    49
    First of all, the program you wrote checks whether a substring exists in the main string. But I want the TOTAL number of occurrences of a substring in the main string, i.e. a phrase such as "this is test" can occur twice in a main string. Second, I have no idea why my program is not working at all.
    Here is the code. The x loop reads lines from the document and the k loop reads substrings from another file. At the end I push $tot_BC onto the array, but the array is empty even though the strings are present in the document.

    Code:
    for ($x=0; $x<$length1; $x=$x+1){  # for each sentence in the file
    
    print "GOING TO READ SENTENCE NUMBER $x\n";
    ######################################################
    $tot_BC=0;
    $cnt=0;
    for ($k=0;$k<$L_BC;$k=$k+1) {
    
    
    #$temp = quotemeta($BC[$k]);         #quotemeta() is a standard perl function and it escapes all non-alphanumeric characters in your variable.
    $phrase=lc($B_C[$k]);
    $file_content[$x]=lc($file_content[$x]);
    print "searching phrase $phrase\n";
    if ($file_content[$x]==~/($phrase)/){
    print "Yes FOUND WE HAVE FOUND\n";
    $cnt=$cnt+1;
    }
    
    if ($cnt>0) {
    $words_found=$words_found."____".$_;
    }
    $tot_BC=$tot_BC+$cnt;
    
    } ## end of for k

  9. #9
    Join Date
    Dec 2002
    Location
    Pleasanton, CA
    Posts
    2,132
    I cannot tell you why your program is not working. I don't know what 'not working' means.
    But....
    Code:
    for ($x=0; $x<$length1; $x=$x+1) ---> for ($x=0; $x<$length1; $x++)
    but better to use the method I gave you 
    
    if ($file_content[$x]==~/($phrase)/){ ---> if ($file_content[$x]=~/$phrase/){
    
    $cnt=$cnt+1; ---> $cnt++;
    
    $words_found=$words_found."____".$_;  ---> $words_found .= "____$phrase";
    
    $tot_BC=$tot_BC+$cnt;  ---> $tot_BC += $cnt;
    Based on the code you supplied, you do not 'use strict'. You should; it will save you a ton of headaches in the future.

    To get TOTAL occurrences, change this
    Code:
      for (@substrings) {		# Test to see if the line contains any of the substrings
        if ($line=~/$_/) {
          $substr_count++;
        }
      }
    -to-
      for (@substrings) {		# Test to see if the line contains any of the substrings
        my @temp = ($line=~/$_/g);			# Put every, 'g', matched substring into an array
        $substr_count += scalar @temp;		# add the array length to count
      }
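
    As a side note, the same count can be taken without a temporary array, and the substring can be escaped so that characters such as "." or "?" are treated literally. The following is only a sketch under assumptions: count_occurrences is a made-up helper name and the sample sentence is invented, neither comes from the code posted in this thread.

```perl
use strict;
use warnings;

# Count non-overlapping occurrences of a literal substring in a line.
# count_occurrences is a hypothetical helper name, not from the thread.
sub count_occurrences {
    my ($line, $substring) = @_;
    return 0 if $substring eq '';    # an empty pattern would match everywhere
    # \Q...\E escapes regex metacharacters; "() =" forces list context,
    # and assigning that list to a scalar yields the number of matches.
    my $count = () = $line =~ /\Q$substring\E/g;
    return $count;
}

my @substrings = ('is a', 'this', 'test');
my $line       = lc 'This is a test, and this is a test too';   # sample data

my $total = 0;
$total += count_occurrences($line, $_) for @substrings;
print "$total\n";    # prints 6 (each substring occurs twice)
```

    Matches are counted left to right without overlap, so 'aa' is counted twice in 'aaaa', not three times.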

  10. #10
    Join Date
    Mar 2008
    Posts
    49
    Thanks for your reply.

    The problem has been identified. Each substring read from the file still contained a newline character, so it has to be removed with the chomp function. Thanks again.
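
    For anyone who hits the same thing, here is a small sketch of the effect. It assumes the phrases are read one per line; the phrase data and the sentence are invented samples, and an in-memory filehandle stands in for the real substring file.

```perl
use strict;
use warnings;

# Stand-in for the substring file: one phrase per line (sample data, assumed).
my $phrase_data = "I am\nI am not\ntest\n";
open my $fh, '<', \$phrase_data or die "Cannot open in-memory file: $!";

my (@raw, @clean);
while (my $phrase = <$fh>) {
    push @raw, lc $phrase;      # still ends in "\n"
    chomp $phrase;              # strip the trailing newline
    push @clean, lc $phrase;
}
close $fh;

my $sentence = lc 'I am sure this is a test';

# A phrase with a trailing newline only matches where the sentence itself
# has a newline at that spot, so the raw phrases find nothing here.
my $raw_hits   = grep { $sentence =~ /\Q$_\E/ } @raw;
my $clean_hits = grep { $sentence =~ /\Q$_\E/ } @clean;

print "without chomp: $raw_hits, with chomp: $clean_hits\n";
# prints: without chomp: 0, with chomp: 2
```

    If the sentence lines are not chomped either, a phrase that happens to sit at the very end of a line would still match, which might explain why some phrases were found and others were not.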
