Problem finding all occurrences of a substring in a string
HI GURUS
I have a file with one substring per line (phrases such as "I am", "I am not", etc.). I read this file line by line, and I want to find all occurrences of each substring in another string, which is itself a line of a second file that is also read line by line. The results are inconsistent: for some strings it reports matches, for others it does not, and I don't understand why. Here is the code; please help:
Code:
for ($x=0; $x<$length1; $x=$x+1){ # for each sentence in the file
    print "GOING TO READ SENTENCE NUMBER $x\n";
    ######################################################
    $tot_BC=0;
    for ($k=0;$k<$L_BC;$k=$k+1) { # reading SUBSTRING FILES
        $cnt=0;
        $B_C[$k]=lc($B_C[$k]);
        $file_content[$x]=lc($file_content[$x]);
        $cnt=()=$file_content[$x]=~/($B_C[$k])/gi;
        if ($cnt>0) {
            $words_found=$words_found."____".$B_C[$k];
        }
        $tot_BC=$tot_BC+$cnt;
    } ## end of for k
    #####################################################
    push(@BC_array,$tot_BC); # pushing total number of Bcommons in a sentence in array
}# end
I don't know exactly what you are trying to do, but here's some code that may help.
Code:
my @file_content = ( # from sentence file
    'this is a test',
    'it is a test only',
    'here to demonstrate this code',
    'with these test substrings',
);
my @substrings = ( # from substring file
    'is a',
    'this',
    'test',
);
for (@file_content) {
    my $line = lc $_; # Line from file in lowercase
    for (@substrings) { # Test to see if the line contains any of the substrings
        if ($line=~/$_/) {
            # substring found in this string
            print "$_ substring found in [$line]\n";
        }
    }
    #push ??????;
}
But isn't this the same code I already provided, which is not working? I changed mine to use the default variable $_, but it still does not work. In fact, I count the total occurrences of each substring in the main string; that is why I have $cnt, which I add to $tot_BC to get the total number of substring occurrences in the main string (a line of a text document). I then push $tot_BC onto an array, so that at the end I have an array with one element for each line read.
I wasted the whole day trying to solve this problem.
Still stuck. It is just not working: even though the main string contains occurrences of the substrings, the array ends up holding nothing but zeros.
The code I provided is working code in that it will detect the substring in the sentence.
I did not, however, understand what your $words_found, @BC_array, and $tot_BC are supposed to represent/contain.
My guess...
$words_found appears to contain a string of the multiple iterations of substrings
@BC_array contains the number of times any substring matches in the sentence using $tot_BC
What are you trying to accomplish? What do you do with the results?
I described it all in detail. $words_found is just a string that collects every substring found in the main string (the line read), separated by ____; it is there only for debugging.
I just want to find all occurrences of each substring in the main string and keep adding them up, so that at the end I know the total number of substring occurrences in one sentence, i.e. one line read from the file.
I then put this number, $tot_BC, into an array element. For example, if the file I am scanning contains 10 lines, I calculate the number of substring occurrences in each of the 10 lines and put each total in the array.
Array element number 1 will contain the total number of substrings found in line 1 of the document being read. I hope you understand.
Code:
my @file_content = (
    'this is a test',
    'it is a test',
    'here to demonstrate this code',
    'with these test substrings',
);
my @substrings = (
    'is a',
    'this',
    'test',
);
my @counts = (); # Array to keep count of substrings in sentences
my $substr_count = 0;
for (@file_content) {
    my $line = lc $_; # Line from file in lowercase
    $substr_count = 0; # Clear count for next sentence
    for (@substrings) { # Test to see if the line contains any of the substrings
        if ($line=~/$_/) {
            $substr_count++;
        }
    }
    push @counts, $substr_count;
}
print join(',',@counts);
# Prints: 3,2,1,1
Sentence 1 has 3 substrings (this, is a, test)
Sentence 2 has 2 substrings (is a, test)
Sentence 3 has 1 substring (this)
Sentence 4 has 1 substring (test)
First of all, the program you wrote only checks whether a substring exists in the main string. I want the TOTAL number of occurrences of a substring in the main string, since a phrase like "this is test" can occur twice in one main string. Second, I have no idea why my program is not working at all.
Here is the code. The x loop reads lines from the document and the k loop reads substrings from another file. At the end I push $tot_BC onto the array, but the array holds only zeros even though the strings are present in the document.
Code:
for ($x=0; $x<$length1; $x=$x+1){ # for each sentence in the file
    print "GOING TO READ SENTENCE NUMBER $x\n";
    ######################################################
    $tot_BC=0;
    $cnt=0;
    for ($k=0;$k<$L_BC;$k=$k+1) {
        #$temp = quotemeta($BC[$k]); #quotemeta() is a standard perl function and it escapes all non-alphanumeric characters in your variable.
        $phrase=lc($B_C[$k]);
        $file_content[$x]=lc($file_content[$x]);
        print "searching phrase $phrase\n";
        if ($file_content[$x]==~/($phrase)/){
            print "Yes FOUND WE HAVE FOUND\n";
            $cnt=$cnt+1;
        }
        if ($cnt>0) {
            $words_found=$words_found."____".$_;
        }
        $tot_BC=$tot_BC+$cnt;
    } ## end of for k
I cannot tell you why your program is not working. I don't know what 'not working' means.
But....
Code:
for ($x=0; $x<$length1; $x=$x+1) ---> for ($x=0; $x<$length1; $x++)
but better to use the method I gave you
if ($file_content[$x]==~/($phrase)/){ ---> if ($file_content[$x]=~/$phrase/){
$cnt=$cnt+1; ---> $cnt++;
$words_found=$words_found."____".$_; ---> $words_found .= "____$phrase";
$tot_BC=$tot_BC+$cnt; ---> $tot_BC += $cnt;
Based on the code you supplied, you do not 'use strict'. You should; it will save you a ton of headaches in the future.
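A rough sketch of why two of these points matter; the variable names ($typo_result, $correct_result, $err) are made up for the demo. As far as I can tell, Perl does not treat '==~' as a binding operator: it reads it as '==' followed by a unary '~', so the pattern is matched against $_ and the line is then compared numerically. The second half shows 'use strict' refusing to compile a misspelled variable name (here $phase instead of $phrase).

```perl
use strict;
use warnings;

my $line   = 'this is a test';
my $phrase = 'test';

my ($typo_result, $correct_result);
{
    # Demo only: silence the "isn't numeric" warning that '==~' triggers.
    no warnings;
    # The pattern after '==~' is matched against $_, not against $line.
    local $_ = $line;
    # Parsed as: $line == ~( $_ =~ /$phrase/ ), a numeric comparison.
    $typo_result    = ($line ==~ /$phrase/) ? 'match' : 'no match';
    # The real binding operator matches the pattern against $line.
    $correct_result = ($line =~ /$phrase/)  ? 'match' : 'no match';
}
print "with ==~ : $typo_result\n";
print "with =~  : $correct_result\n";

# 'use strict' catches a misspelled variable at compile time.
# The snippet is compiled from a string so this script itself still runs.
my $ok = eval q{
    use strict;
    my $phrase = 'test';
    my $words_found = "____$phase";   # typo: $phase is never declared
    1;
};
my $err = $ok ? '' : $@;
print "strict says: $err" if $err;
```

With 'use warnings' left on, the '==~' line would also warn that the string isn't numeric, which is usually the first hint that something is off.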
To get TOTAL occurrences, change this
Code:
for (@substrings) { # Test to see if the line contains any of the substrings
    if ($line=~/$_/) {
        $substr_count++;
    }
}
-to-
for (@substrings) { # Test to see if the line contains any of the substrings
    my @temp = ($line=~/$_/g); # Put every, 'g', matched substring into an array
    $substr_count += scalar @temp; # add the array length to count
}
The problem has been identified: each substring read from the file still had a trailing newline character, so it has to be removed with the chomp function. Thanks again.
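For anyone landing here later, a small sketch of that fix. The in-memory filehandle and the sample phrases are stand-ins for the real substring file, which was never posted. It may also explain the "works for some strings, not for others" symptom: an unchomped phrase can still match when it happens to sit at the very end of an unchomped sentence.

```perl
use strict;
use warnings;

# Stand-in for the substring file (an assumption for this demo):
# an in-memory filehandle with one phrase per line.
my $phrase_data = "is a\nthis\ntest\n";
open my $fh, '<', \$phrase_data or die "Cannot open in-memory file: $!";

my (@raw, @clean);
while (my $phrase = <$fh>) {
    push @raw, $phrase;     # as read: still ends in "\n"
    chomp $phrase;          # strip the trailing newline
    push @clean, $phrase;
}
close $fh;

my $line           = 'this is a test';      # sentence without a newline
my $unchomped_line = "this is a test\n";    # same sentence as read from a file

# grep in scalar context returns how many phrases matched.
my $raw_hits     = grep { $line           =~ /\Q$_\E/ } @raw;
my $clean_hits   = grep { $line           =~ /\Q$_\E/ } @clean;
my $newline_hits = grep { $unchomped_line =~ /\Q$_\E/ } @raw;

print "unchomped phrases, chomped line:   $raw_hits\n";      # 0
print "chomped phrases, chomped line:     $clean_hits\n";    # 3
print "unchomped phrases, unchomped line: $newline_hits\n";  # 1 (only "test\n")
```

Chomping both the phrases and the sentences right after reading them avoids both cases.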