[RESOLVED] Loops gone wild!!!!


TheDragonReborn
10-12-2008, 07:59 AM
Hello to all,

I am having an issue with a program and was hoping someone could offer some advice. Please keep in mind that I am a novice programmer and may not be as fluent as I should be in the lingo of PHP. Everything else works fine (MySQL connection, file upload, data extraction). My only issue is with the loops I am using to print my data and put that data into a database. For some reason, the PHP page repeats the empty output lines dozens of times and prints the lines that contain data more than once. What I expect is for each entry to be displayed once and entered into the database once.

That being said, here is my code.

<?php

/////////////////////////////////////////////////////////////////////////////
////////// CONNECT TO DATABASE AND OPEN FILE //////////////////////////////
/////////////////////////////////////////////////////////////////////////////

// Eliminate Error Notice
error_reporting(E_ALL & ~E_NOTICE) ;

// This will get an input file from an html page and deposit it
// into a mySQL database
// Last modified - 10/12/08

// Test if file was uploaded
if(! $_FILES['dataFile']['tmp_name'])
{
echo "<html><body><h1>No file uploaded</h1></body></html>" ;
exit ;
}

// Test connection to mySQL
// Don't forget to change the password to '*******' !!!!!
$link = mysql_connect( 'localhost', '******', '*******' )
or die( 'Could not connect to mySQL:'.mysql_error() ) ;

echo "Successful connection !! \n\n" ;

// Test connection to database 'gkoenig'
mysql_select_db( 'gkoenig')
or die('Could not connect to database!') ;

echo "Successful selection of database !! \n\n\n" ;

// This is where we input the file into the database
// using mySQL commands

$fh = fopen( $_FILES['dataFile']['tmp_name'], 'r') ;

//////////////////////////////////////////////////////////////////////////
//////////// GET SEQUENCE(S) FROM FLAT FILE ////////////////////////////
//////////////////////////////////////////////////////////////////////////

// Parse through Swiss-Prot file and extract the seq_id (Accession),
// seq_type (PRT,DNA, RNA), and seq_data

// Enumerate different sequences added
$count = 0 ;

$seq_id = Array() ;
$seq_type = Array() ;

$sequence = Array() ; // Data split into array (Buffer for sequence data)
$seq_data = Array() ; // Data stored in array starting


//$token = Array() ;
//$switch = True ;

while($text = fgets($fh))
{

// Grab the seq_id from lines that start with 'AC'
if(strstr("$text", "AC") AND (strpos("$text", "AC")==0))
{
// $token = strtok($text, " \n\t") ;
// $seq_id = $token[1] ;
// $switch = False ;


$line = str_split($text) ;
$seq_id[$count] = $line[5].$line[6].$line[7].$line[8].$line[9].$line[10] ;

// Don't stop at anymore "AC" strings that may be in file
//$switch = False ;

// Empty sequence array so that it can accept new data
while ($sequence)
{
array_pop($sequence) ;
}

}

// Stop at sequences and collect
if(strstr("$text", "SQ") AND (strpos("$text", "SQ")==0))
{
// Collect sequence and add to array one at a time
while( ($c = fgetc($fh)) != "/")
{
array_push($sequence,$c) ;
}

}

// Determine the sequence type
foreach($sequence as $char)
{
if($char != 'A' || $char != 'C' || $char != 'G' || $char != 'T')
{
$seq_type[$count] = "PRT" ;
break ;

}
else
{
$seq_type[$count] = "DNA" ;
}
}


// Turn array into long string
$seq_data[$count] = implode($sequence) ;

// Turn switch back on after sequence is collected so that
// if multiple sequences are present, it can begin anew
// The next "AC" after sequence would be from new sequence
// Hopefully no "AC" in first line
//$switch = True ;

// In case of more than one sequence
$count += 1 ;

}

// Get total number of sequences for "for-loop"
$number_of_seq = $count+1 ; // Started $count at 0



?>

<html>
<head>
<title>Sequences in Database</title>
</head>

<body>
<h1>Sequences in Database</h1>

<?php

// THE DREADED LOOP


/// TAKE THIS OUT AFTER TEST !!!!!

for ($i = 0 ; $i < $number_of_seq ; $i++)
{
echo "<br/>" ;
echo " \n\nHere is the sequence id: $seq_id[$i] \n\n<br/>" ;
echo " Here is the sequence type:$seq_type[$i] \n\n<br/>" ;
echo " Here is the sequence:$seq_data[$i] " ;
}



//////////////////////////////////////////////////////////////////
/////////// INSERTION INTO DATABASE ///////////////////////////
//////////////////////////////////////////////////////////////////

// THE DREADED LOOP, AGAIN


// Add all sequences to database (DON'T FORGET THAT ARRAY STARTS AT 0)
for($x = 0 ; $x < $number_of_seq ; $x++)
{
// Put data into Database
$query = "INSERT into sequences VALUES ('".$seq_id[$x]."', '".$seq_type[$x]."', '".$seq_data[$x]."') " ;

$result = mysql_query($query)
or die('Data insertion failed:'.mysql_error() ) ;
}



// Show data that was input into 'sequences' Database

$query2 = 'SELECT * from sequences' ;

$result2 = mysql_query($query2)
or die('Query failed:'.mysql_error() ) ;

echo "<table border=1> \n" ;

while( $row = mysql_fetch_assoc($result2) )
{
echo "\t<tr>\t" ;

foreach ($row as $col)
{
echo "\t\t<td>$col</td>\n" ;

}

echo "\t</tr>\n" ;
}

echo "</table>\n" ;


?>

</body>

</html>

Any suggestions would be great. Also, I am using an HTML page to submit the file to the PHP action file. I do not think this could be the source, but I figured I would mention it.

Thank you

scragar
10-12-2008, 08:18 AM
try:

////// THE DREADED LOOP

for ($i = 0 ; $i < $number_of_seq ; $i++)
{
echo "<p>
\nHere is the sequence id: {$seq_id[$i]}
\n<br/>
Here is the sequence type: {$seq_type[$i]}
\n<br/>
Here is the sequence: {$seq_data[$i]} </p>" ;

$query = "INSERT into sequences VALUES ('{$seq_id[$i]}', '{$seq_type[$i]}', '{$seq_data[$i]}')";
$result = mysql_query($query) or die('Data insertion failed:'.mysql_error() ) ;

echo "<p>{$query}</p>";
}
in place of your current loop, it should give you better data and help you spot your problem.



PS: If you're testing the start of the string several times for different starting characters, you may find it better to use substr($text, 0, 2) (http://php.net/substr), which returns the first 2 characters of the string.
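
A minimal sketch of that prefix test, assuming each line of the flat file begins with a two-letter code such as AC or SQ. The helper name line_code() and the sample lines are made up for illustration:

```php
<?php
// Hypothetical helper: return the two-character line code at the start of a line.
function line_code($text)
{
    return substr($text, 0, 2);
}

// Made-up sample lines in the style of the flat file.
$lines = array(
    "ID   EXAMPLE_ENTRY",
    "AC   Q04917;",
    "SQ   SEQUENCE",
);

foreach ($lines as $text) {
    switch (line_code($text)) {
        case "AC":
            echo "accession line: $text\n";
            break;
        case "SQ":
            echo "sequence header: $text\n";
            break;
        default:
            echo "other line: $text\n";
    }
}
```

Compared with strstr() plus strpos() == 0, this reads the prefix once and sidesteps the loose comparison, where a false returned by strpos() would also compare equal to 0.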

TheDragonReborn
10-12-2008, 08:35 AM
Still not resolving crazy loops. Here is the output page:

Successful connection !! Successful selection of database !!
Sequences in Database

Here is the sequence id:
Here is the sequence type:
Here is the sequence:

Here is the sequence id: Q04917
Here is the sequence type:
Here is the sequence:

Here is the sequence id:
Here is the sequence type:
Here is the sequence:

Here is the sequence id:
Here is the sequence type:
Here is the sequence:

... EVEN MORE REPEATS !!! ELIMINATED THEM TO FIT OUTPUT

Here is the sequence id:
Here is the sequence type:
Here is the sequence:

Here is the sequence id:
Here is the sequence type:
Here is the sequence:

Here is the sequence id:
Here is the sequence type:
Here is the sequence:

Here is the sequence id:
Here is the sequence type: PRT
Here is the sequence: MGDREQLLQR ARLAEQAERY DDMASAMKAV TELNEPLSNE DRNLLSVAYK NVVGARRSSW RVISSIEQKT MADGNEKKLE KVKAYREKIE KELETVCNDV LSLLDKFLIK NCNDFQYESK VFYLKMKGDY YRYLAEVASG EKKNSVVEAS EAAYKEAFEI SKEQMQPTHP IRLGLALNFS VFYYEIQNAP EQACLLAKQA FDDAIAELDT LNEDSYKDST LIMQLLRDNL TLWTSDQQDE EAGEGN

Here is the sequence id:
Here is the sequence type: PRT
Here is the sequence: MGDREQLLQR ARLAEQAERY DDMASAMKAV TELNEPLSNE DRNLLSVAYK NVVGARRSSW RVISSIEQKT MADGNEKKLE KVKAYREKIE KELETVCNDV LSLLDKFLIK NCNDFQYESK VFYLKMKGDY YRYLAEVASG EKKNSVVEAS EAAYKEAFEI SKEQMQPTHP IRLGLALNFS VFYYEIQNAP EQACLLAKQA FDDAIAELDT LNEDSYKDST LIMQLLRDNL TLWTSDQQDE EAGEGN

Here is the sequence id:
Here is the sequence type:
Here is the sequence:


What I want is just

Here is the sequence id: Q04917
Here is the sequence type: PRT
Here is the sequence: MGDREQLLQR ARLAEQAERY DDMASAMKAV TELNEPLSNE DRNLLSVAYK NVVGARRSSW RVISSIEQKT MADGNEKKLE KVKAYREKIE KELETVCNDV LSLLDKFLIK NCNDFQYESK VFYLKMKGDY YRYLAEVASG EKKNSVVEAS EAAYKEAFEI SKEQMQPTHP IRLGLALNFS VFYYEIQNAP EQACLLAKQA FDDAIAELDT LNEDSYKDST LIMQLLRDNL TLWTSDQQDE EAGEGN

Any suggestions???

scragar
10-12-2008, 08:47 AM
OK, then the loop you flagged as being bad isn't the nasty one. The problem is your assignment of the variables, probably because the index you use isn't exactly well thought out: $count is incremented on every line read, not once per sequence.

TRY:

while($text = fgets($fh))
{
$tmp = substr($text, 0, 2);
if($tmp == "AC")
{
$seq_id[] = substr($text, 5, 5);

// Easy way to empty an array :p
$sequence = Array();

}
// Stop at sequences and collect
if($tmp == "SQ")
{
// Collect sequence and add to array one at a time
while(($c = fgetc($fh)) != "/")
{
array_push($sequence,$c) ;
}

}

// Determine the sequence type
foreach($sequence as $char)
{
if($char != 'A' || $char != 'C' || $char != 'G' || $char != 'T')
{
$seq_type[] = "PRT" ;
break ;

}
else
{
$seq_type[] = "DNA" ;
}
}


// Turn array into long string
$seq_data[] = implode($sequence);

}

// Get total number of sequences for "for-loop"
$number_of_seq = count($seq_id);
...
should work better, let me know.
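
To illustrate the indexing point: in the original loop $count goes up once per line read, so every line of the file gets its own slot, and most of those slots are empty. Below is a rough sketch of building one record per entry instead. It assumes a Swiss-Prot-style layout where a line starting with "//" ends the entry. The function name parse_entries(), the sample input, and the accession P12345 are made up. It also takes 6 characters for the accession, matching positions 5 through 10 in the original code, whereas substr($text, 5, 5) returns only 5.

```php
<?php
// Sketch: build one record per entry instead of one array slot per line.
// Assumes "AC" carries the accession, "SQ" starts the sequence block,
// and a line starting with "//" ends the entry.
function parse_entries(array $lines)
{
    $entries = array();
    $current = null;      // record being built, or null between entries
    $in_sequence = false; // true while reading lines after "SQ"

    foreach ($lines as $text) {
        $code = substr($text, 0, 2);

        if ($code == "//") {
            // End of entry: store the finished record exactly once.
            if ($current !== null) {
                $entries[] = $current;
            }
            $current = null;
            $in_sequence = false;
        } elseif ($in_sequence) {
            // Sequence lines: drop spaces and line breaks, then append.
            $current['seq_data'] .= preg_replace('/\s+/', '', $text);
        } elseif ($code == "AC" && $current === null) {
            // First AC line of an entry; 6 characters is an assumption
            // taken from the original code's positions 5 through 10.
            $current = array('seq_id' => substr($text, 5, 6), 'seq_data' => '');
        } elseif ($code == "SQ" && $current !== null) {
            $in_sequence = true;
        }
    }
    return $entries;
}

// Made-up two-entry input for illustration only.
$lines = array(
    "ID   FIRST_EXAMPLE\n",
    "AC   Q04917;\n",
    "SQ   SEQUENCE\n",
    "     MGDREQLLQR ARLAEQAERY\n",
    "//\n",
    "ID   SECOND_EXAMPLE\n",
    "AC   P12345;\n",
    "SQ   SEQUENCE\n",
    "     ACGTACGTAC GT\n",
    "//\n",
);

foreach (parse_entries($lines) as $entry) {
    echo "{$entry['seq_id']}: {$entry['seq_data']}\n";
}
```

Keeping the id and the sequence in the same record means the arrays cannot drift out of step with each other, which is the risk when $seq_id[], $seq_type[] and $seq_data[] are appended at different points in the loop.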

TheDragonReborn
10-12-2008, 09:03 AM
Thanks for the tips (I know my code isn't very professional). Good news is that if I use a file with one entry, it works. However, when I load a file with multiple sequences, it isn't printing the sequence part; only the seq_id and seq_type.

Here is the code

while($text = fgets($fh))
{

$tmp = substr($text, 0, 2) ;

// Grab the seq_id from lines that start with 'AC'
if($tmp == "AC")
{
$seq_id[] = substr($text, 5, 5);

// Easy way to empty an array :p
$sequence = Array();
}

// Stop at sequences and collect
if($tmp == "SQ")
{
// Collect sequence and add to array one at a time
while( ($c = fgetc($fh)) != "/")
{
array_push($sequence,$c) ;
}

}

// Determine the sequence type
foreach($sequence as $char)
{
if($char != 'A' || $char != 'C' || $char != 'G' || $char != 'T')
{
$seq_type[] = "PRT" ;
break ;

}
else
{
$seq_type[] = "DNA" ;
}
}


// Turn array into long string
$seq_data[] = implode($sequence) ;

// Turn switch back on after sequence is collected so that
// if multiple sequences are present, it can begin anew
// The next "AC" after sequence would be from new sequence
// Hopefully no "AC" in first line
//$switch = True ;

// In case of more than one sequence
//$count += 1 ;

}


I'm not sure but the line

// Turn array into long string
$seq_data[] = implode($sequence) ;

seems to be the culprit.


output looks like this

Successful connection !! Successful selection of database !!
Sequences in Database

Here is the sequence id: P0070
Here is the sequence type: PRT
Here is the sequence:

Here is the sequence id: Q8N1E
Here is the sequence type: PRT
Here is the sequence:

Here is the sequence id: Q86SG
Here is the sequence type: PRT
Here is the sequence:

Here is the sequence id: P6162
Here is the sequence type: PRT
Here is the sequence:

Here is the sequence id: Q6UWQ
Here is the sequence type: PRT
Here is the sequence:

Here is the sequence id: Q7Z4W
Here is the sequence type: PRT
Here is the sequence:

Here is the sequence id: Q96KX
Here is the sequence type: PRT
Here is the sequence:

Here is the sequence id: Q96QH
Here is the sequence type: PRT
Here is the sequence:

Here is the sequence id: O7595
Here is the sequence type: PRT
Here is the sequence:

Here is the sequence id: Q8IXA
Here is the sequence type: PRT
Here is the sequence:


???

scragar
10-12-2008, 09:11 AM
while($text = fgets($fh))
{

$tmp = substr($text, 0, 2) ;

// Grab the seq_id from lines that start with 'AC'
if($tmp == "AC")
{
$seq_id[] = substr($text, 5, 5);


// here we set the sequence for the last loop
if(count($seq_id) == 1)
$seq_data[] = implode($sequence) ;


$sequence = Array();
}

// Stop at sequences and collect
if($tmp == "SQ")
{
// Collect sequence and add to array one at a time
while( ($c = fgetc($fh)) != "/")
{
array_push($sequence,$c) ;
}

}

foreach($sequence as $char)
{
if($char != 'A' || $char != 'C' || $char != 'G' || $char != 'T')
{
$seq_type[] = "PRT" ;
break ;

}
else
{
$seq_type[] = "DNA" ;
}
}


// Turn array into long string
}
// and here we set the sequence for the very last loop
$seq_data[] = implode($sequence) ;


Let me know if that solves the problem (I haven't been able to test it or anything, but it looks right).
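
One thing that may be worth checking in the snippet above: count($seq_id) == 1 is true only on the first AC line, when $sequence is still empty. That would leave an empty string in the first slot of $seq_data, and later entries would never be stored at that point. Here is a sketch of what the check seems meant to do, assuming the goal is to store the previous entry's sequence whenever a new AC line arrives. The input lines are made up, and sequence collection is simplified to whole lines rather than fgetc():

```php
<?php
// Sketch: store the previous entry's sequence when a new "AC" line starts,
// and store the final entry's sequence after the loop.
$lines = array(
    "AC   Q04917;",
    "SQ   SEQUENCE",
    "     MGDREQLLQR",
    "//",
    "AC   P12345;",
    "SQ   SEQUENCE",
    "     ACGTACGT",
    "//",
);

$seq_id = array();
$seq_data = array();
$sequence = array();
$collecting = false;

foreach ($lines as $text) {
    $tmp = substr($text, 0, 2);

    if ($tmp == "AC") {
        // Not the first entry: the buffer holds the previous entry's data.
        if (count($seq_id) > 0) {
            $seq_data[] = implode('', $sequence);
        }
        $seq_id[] = substr($text, 5, 6);
        $sequence = array();
        $collecting = false;
    } elseif ($tmp == "SQ") {
        $collecting = true;
    } elseif ($tmp == "//") {
        $collecting = false;
    } elseif ($collecting) {
        $sequence[] = trim($text);
    }
}
// The last entry has no following "AC" line, so store it here.
if (count($seq_id) > 0) {
    $seq_data[] = implode('', $sequence);
}

print_r($seq_id);
print_r($seq_data);
```

With the check placed after the append to $seq_id, as in the post above, the equivalent condition would be count($seq_id) > 1.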

TheDragonReborn
10-12-2008, 09:22 AM
No luck, printing only:

Successful connection !! Successful selection of database !!
Sequences in Database

Here is the sequence id: Q0491
Here is the sequence type: PRT
Here is the sequence:



I appreciate the time you're spending to help.

scragar
10-12-2008, 09:25 AM
Doh!

implode($glue, $pieces);

Replace where I've got
$seq_data[] = implode($sequence) ;
with
$seq_data[] = implode('', $sequence);
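
For reference, a quick check of the two call forms. PHP also accepts implode() with only the array argument and then joins with an empty string, so both lines below should give the same result. The sample array is made up:

```php
<?php
$sequence = array('M', 'G', 'D', 'R');

$a = implode($sequence);      // array only: joined with an empty string
$b = implode('', $sequence);  // explicit empty-string glue

var_dump($a === $b);
echo $b, "\n";
```

If the two forms agree on your setup as well, an empty sequence in the output more likely comes from where and when $seq_data[] is appended than from the glue argument.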

TheDragonReborn
10-12-2008, 09:42 AM
Still no luck. Not picking up sequence. Strange because this wasn't an issue before.

TheDragonReborn
10-12-2008, 09:50 AM
SORRY. I take it back, it's working fine. Ignore my last post. :D

Thanks again for the help. Greatly appreciated.

scragar
10-12-2008, 09:51 AM
OK

EDIT: oh, right :P
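
A closing note on the sequence-type test used throughout the thread: $char != 'A' || $char != 'C' || $char != 'G' || $char != 'T' is true for every character, because no character can equal all four letters at once, so the type always comes out as PRT. A minimal sketch of one alternative, assuming a sequence counts as DNA only when every non-whitespace character is A, C, G, or T. The function name guess_seq_type() is hypothetical, and RNA is not handled:

```php
<?php
// Hypothetical helper: classify a sequence string as "DNA" or "PRT".
function guess_seq_type($seq_data)
{
    // Remove spaces and line breaks that the flat file puts between blocks.
    $letters = preg_replace('/\s+/', '', $seq_data);

    // strspn() returns the length of the leading run made up only of the
    // listed characters; if that run covers the whole string, it is all ACGT.
    if ($letters !== '' && strspn($letters, 'ACGT') === strlen($letters)) {
        return "DNA";
    }
    return "PRT";
}

echo guess_seq_type("ACGTACGTAC GT"), "\n";
echo guess_seq_type("MGDREQLLQR ARLAEQAERY"), "\n";
```

This is only a heuristic: a protein made up entirely of the residues A, C, G and T would be labelled DNA. Calling it once per entry, after the whole sequence has been collected, also avoids appending a type on every line read.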