Problem in finding all occurrences of substring in a string
kimskams80
12-01-2008, 05:50 AM
HI GURUS
I have a file containing different substrings (phrases like "I am", "I am not", etc.), one per line. I am reading this file line by line, and I want to find all occurrences of each substring in another string (which is in fact a line of another file, also being read line by line). But I get some kind of weird response: for some strings it reports a match, for others it does not. I don't understand it. Here is the code, please help:
for ($x=0; $x<$length1; $x=$x+1){ # for each sentence in the file
    print "GOING TO READ SENTENCE NUMBER $x\n";
    ######################################################
    $tot_BC=0;
    for ($k=0;$k<$L_BC;$k=$k+1) { # reading SUBSTRING FILES
        $cnt=0;
        $B_C[$k]=lc($B_C[$k]);
        $file_content[$x]=lc($file_content[$x]);
        $cnt=()=$file_content[$x]=~/($B_C[$k])/gi;
        if ($cnt>0) {
            $words_found=$words_found."____".$B_C[$k];
        }
        $tot_BC=$tot_BC+$cnt;
    } ## end of for k
    #####################################################
    push(@BC_array,$tot_BC); # pushing total number of Bcommons in a sentence in array
} # end
I hope you understand the problem...
Nedals
12-01-2008, 05:03 PM
I don't know exactly what you are trying to do, but here's some code that may help.
my @file_content = ( # from sentence file
    'this is a test',
    'it is a test only',
    'here to demonstrate this code',
    'with these test substrings',
);
my @substrings = ( # from substring file
    'is a',
    'this',
    'test',
);
for (@file_content) {
    my $line = lc $_; # Line from file in lowercase
    for (@substrings) { # Test to see if the line contains any of the substrings
        if ($line=~/$_/) {
            # substring found in this string
            print "$_ substring found in [$line]\n";
        }
    }
    #push ??????;
}
kimskams80
12-01-2008, 07:12 PM
Thanks for the reply.
But don't you think it's the same code I already provided, which is not working? I have changed it to use the default variable $_, but it still does not work. In fact, I count the total occurrences of each substring in the main string. That is why I have $cnt, and then I add $cnt to $tot_BC to get the TOTAL number of substrings in the main string (a line of a text document). Then I put $tot_BC in an array, so at the end I have an array with one element for each line read.
I wasted the whole day just trying to solve this problem :(
Still stuck... It's just not working. Even though the MAIN STRING contains occurrences of the substrings, at the end the array is empty, i.e. all elements are zero.
Nedals
12-01-2008, 08:13 PM
The code I provided is working code in that it will detect the substring in the sentence.
I did not, however, understand what your $words_found, @BC_array, and $tot_BC are supposed to represent/contain.
My guess...
$words_found appears to contain a string of the multiple iterations of substrings
@BC_array contains the number of times any substring matches in the sentence using $tot_BC
What are you trying to accomplish? What do you do with the results?
kimskams80
12-01-2008, 08:20 PM
I wrote it all in detail. $words_found is just a string that contains all the substrings found in the main string (the line read), separated by ____. It was just for debugging.
I just want to find all occurrences of each substring in the main string and keep adding them up, so that at the end I know the total number of substrings found in one sentence, i.e. one line read from the file.
Then I put this number, $tot_BC, in an array element. For example, if the file I am reading contains 10 lines, I calculate the number of substrings in each of the 10 lines and put each total in the array.
Array element number 1 will contain the total number of substrings found in line 1 of the document being read. I hope you understand.
Nedals
12-01-2008, 08:33 PM
Like this?
my @file_content = (
    'this is a test',
    'it is a test',
    'here to demonstrate this code',
    'with these test substrings',
);
my @substrings = (
    'is a',
    'this',
    'test',
);
my @counts = (); # Array to keep count of substrings in sentences
my $substr_count = 0;
for (@file_content) {
    my $line = lc $_; # Line from file in lowercase
    $substr_count = 0; # Clear count for next sentence
    for (@substrings) { # Test to see if the line contains any of the substrings
        if ($line=~/$_/) {
            $substr_count++;
        }
    }
    push @counts, $substr_count;
}
print join(',',@counts);
# Result: 3,2,1,1
Sentence 1 has 3 substrings (this, is a, test)
Sentence 2 has 2 substrings (is a, test)
Sentence 3 has 1 substring (this)
Sentence 4 has 1 substring (test)
kimskams80
12-01-2008, 08:43 PM
Yes
Until now it was not working for me because of some problem. I will tell you in the morning whether it works or not. I have a big program and this is just one part of it.
Thanks anyway
Goodnight Dear
kimskams80
12-02-2008, 05:58 AM
First of all, the program you wrote only checks whether a substring exists in the main string. But I want the TOTAL number of occurrences of a substring in the main string, i.e. a phrase like "this is test" can occur twice in a main string. Second, I don't know at all why my program is not working.
Here is the code. The x loop reads lines from the document and the k loop reads substrings from another file. At the end I push $tot_BC into the array, but the array is empty even though the strings are present in the document.
:confused:
for ($x=0; $x<$length1; $x=$x+1){ # for each sentence in the file
    print "GOING TO READ SENTENCE NUMBER $x\n";
    ######################################################
    $tot_BC=0;
    $cnt=0;
    for ($k=0;$k<$L_BC;$k=$k+1) {
        #$temp = quotemeta($BC[$k]); #quotemeta() is a standard perl function and it escapes all non-alphanumeric characters in your variable.
        $phrase=lc($B_C[$k]);
        $file_content[$x]=lc($file_content[$x]);
        print "searching phrase $phrase\n";
        if ($file_content[$x]==~/($phrase)/){
            print "Yes FOUND WE HAVE FOUND\n";
            $cnt=$cnt+1;
        }
        if ($cnt>0) {
            $words_found=$words_found."____".$_;
        }
        $tot_BC=$tot_BC+$cnt;
    } ## end of for k
Nedals
12-02-2008, 02:03 PM
I cannot tell you why your program is not working. I don't know what 'not working' means.
But....
for ($x=0; $x<$length1; $x=$x+1) ---> for ($x=0; $x<$length1; $x++)
but better to use the method I gave you
if ($file_content[$x]==~/($phrase)/) ---> if ($file_content[$x]=~/$phrase/){
$cnt=$cnt+1; ---> $cnt++;
$words_found=$words_found."____".$_; ---> $words_found .= "____$phrase";
$tot_BC=$tot_BC+$cnt; ---> $tot_BC += $cnt;
Based on the code you supplied, you do not 'use strict'. You should; it will save you a ton of headaches in the future.
To get TOTAL occurrences, change this
for (@substrings) { # Test to see if the line contains any of the substrings
    if ($line=~/$_/) {
        $substr_count++;
    }
}
-to-
for (@substrings) { # Test to see if the line contains any of the substrings
    my @temp = ($line=~/$_/g); # Put every, 'g', matched substring into an array
    $substr_count += scalar @temp; # add the array length to count
}
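As a self-contained illustration of the change above, here is a minimal sketch that counts every occurrence of every substring per sentence. The sample sentences and the names $sentence, $phrase and $total are made up for this example. It also wraps each phrase in \Q...\E so that punctuation in a phrase is matched literally, which the code above does not do and which is an assumption about what is wanted.

use strict;
use warnings;

# Hypothetical sample data, not taken from the poster's files
my @file_content = (
    'This is a test, this is only a test',
    'it is a test',
);
my @substrings = ('is a', 'this', 'test');

my @counts; # one total per sentence
for my $sentence (@file_content) {
    my $line  = lc $sentence; # compare in lowercase
    my $total = 0;
    for my $phrase (@substrings) {
        # The empty list forces list context, so the assignment yields
        # the number of non-overlapping matches of $phrase in $line
        my $n = () = $line =~ /\Q$phrase\E/g;
        $total += $n;
    }
    push @counts, $total;
}
print join(',', @counts), "\n"; # prints 5,2

The first sentence contains 'is a' once and 'this' and 'test' twice each, which gives 5. The second contains 'is a' and 'test' once each, which gives 2.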
kimskams80
12-03-2008, 04:49 AM
Thanks for your reply.
The problem has been identified: each substring read from the file contained a trailing newline character, so that character has to be removed with the chomp function. Thanks again.
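A minimal sketch of that fix, assuming the phrases are read one per line from a filehandle. The in-memory filehandle, the sample phrases and the variable names are stand-ins invented for this example, since the thread never shows how the files are actually read.

use strict;
use warnings;

# Hypothetical stand-in for the phrase file: an in-memory filehandle
my $phrase_data = "I am\nI am not\n";
open my $fh, '<', \$phrase_data or die "Cannot open in-memory file: $!";

my @phrases;
while (my $phrase = <$fh>) {
    chomp $phrase;          # remove the trailing newline
    next if $phrase eq '';  # skip blank lines
    push @phrases, lc $phrase;
}
close $fh;

my $sentence = lc 'I am here, and I am not there';
my $total = 0;
for my $phrase (@phrases) {
    # Count all non-overlapping occurrences of the literal phrase
    my $count = () = $sentence =~ /\Q$phrase\E/g;
    $total += $count;
    print "'$phrase' found $count time(s)\n";
}
print "total: $total\n"; # prints total: 3

Without the chomp, each phrase would still end in "\n" and would only match where the sentence itself has a newline at that point, which is consistent with the all-zero counts described earlier in the thread.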