Newbie PHP question: need help parsing and removing specific text.


OM2
04-11-2008, 09:47 AM
I'm writing some code that pulls in the source code of my HTML page and prints the exact source code out on a new page.

I need to add an extra feature, and I can't get my head round how to do it.

What I want is this: any text in the source code that looks like the following:

<!-- My admin code start -->
// code
// code
// code
<!-- My admin code end -->

Should be omitted from the printout I make.
The bit in the middle can be of any length.

ALSO: there will be several instances of the bits I want to leave out.

The fact that the bit in the middle has no fixed length, and that there might be several instances of it, is what throws me: I don't know where to start.
I'm sure it's probably only 2 or 3 lines of code! Or is it a little more complex than that...?

Any help would be appreciated.

Thanks.


OM

blue-eye-labs
04-11-2008, 10:10 AM
You could use a regex: preg_replace(), matching the start and end patterns, I suppose...
Or you could split the string up using a " " delimiter and write a loop to find the end of the admin code...
I would recommend using regex.
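For the non-regex route, here is a minimal sketch that finds each start marker with strpos() and skips ahead to the matching end marker, rather than splitting on spaces. The helper name strip_between() is hypothetical, and it assumes the markers appear exactly as written (no extra whitespace tolerance, unlike the \s* a regex can use):

```php
<?php
// Hypothetical helper: removes every block that starts with $start and ends
// with $end (markers included), scanning with strpos() instead of a regex.
function strip_between(string $text, string $start, string $end): string
{
    $out = '';
    $pos = 0;
    while (($s = strpos($text, $start, $pos)) !== false) {
        $e = strpos($text, $end, $s + strlen($start));
        if ($e === false) {
            break; // unterminated block: keep the rest of the text as-is
        }
        $out .= substr($text, $pos, $s - $pos); // keep the text before the block
        $pos = $e + strlen($end);               // resume just past the end marker
    }
    return $out . substr($text, $pos);
}

$html = "a<!-- astart -->x<!-- aend -->b<!-- astart -->y\nz<!-- aend -->c";
echo strip_between($html, '<!-- astart -->', '<!-- aend -->'); // abc
```

Because it works on the whole string, blocks spanning several lines are handled the same as single-line ones.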

TJ111
04-11-2008, 10:12 AM
If you get the file contents via fopen(), fread(), etc., you can use string functions to replace the bits you want.

Using your example, here is something that should (untested) replace everything between those two comments, including the comments themselves.

$file = "some/file.html";

$fd = fopen($file, "r");
$contents = fread($fd, filesize($file));
fclose($fd);

$contents = preg_replace("/<!--\s*My admin code start\s*-->.*?<!--\s*My admin code end\s*-->/g", "", $contents);



If you want more info on regular expressions, read the tutorial in my signature.

OM2
04-11-2008, 10:38 AM
thanks for the reply. that's brilliant.
i think there must be a small bug though: it's not working for me. :(

here's my code (with ur code added):


<?php
$url = "http://" . $_SERVER[ 'HTTP_HOST' ] . "/" . urldecode( $_GET[ "u" ] );
$lines = file( $url );
?><?php

foreach( $lines as $line_num => $line ) {

    $line = htmlspecialchars( $line );
    $line = str_replace( "&lt;", '<span>&lt;', $line );
    $line = str_replace( "&gt;", '&gt;</span>', $line );
    $line = str_replace( "&lt;!--", '<em>&lt;!--', $line );
    $line = str_replace( "--&gt;", '--&gt;</em>', $line );

    $line = preg_replace("/<!--\s*Admin start\s*-->.*?<!--\s*Admin end\s*-->/g", "", $line);

    echo $line . "<br/>\n";

}
?>


its giving me this error:

"Warning: preg_replace() [function.preg-replace]: Unknown modifier 'g' in /home/trade/public_html/duplicatingdirs/rishi/test1/mainFiles/source.php on line 14"

i'm looking up preg_replace now: but let me know if u know what the problem is...
thanks.

TJ111
04-11-2008, 10:44 AM
Oh, sorry, remove the "g" at the end of the pattern string. Also, it looks like your HTML characters have already been converted into entities by htmlspecialchars(), so you may need to change the "<" and ">" to $lt; and &gt;.
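To illustrate the fix: PHP's PCRE functions have no "g" modifier because preg_replace() already replaces every match. Its optional fourth argument ($limit, default -1 meaning no limit) caps the number of replacements, and the fifth receives the count. A small sketch with made-up sample text:

```php
<?php
// Two marked blocks, as in the original question ("several instances").
$html = "<!-- Admin start -->one<!-- Admin end -->keep<!-- Admin start -->two<!-- Admin end -->";
$pattern = '/<!--\s*Admin start\s*-->.*?<!--\s*Admin end\s*-->/';

// Default $limit is -1: every match is replaced, no "g" needed.
// The 5th argument receives how many replacements were made.
$all = preg_replace($pattern, '', $html, -1, $count);

// $limit = 1 replaces only the first match.
$first = preg_replace($pattern, '', $html, 1);

echo $all, "\n";   // keep
echo $count, "\n"; // 2
echo $first, "\n"; // keep<!-- Admin start -->two<!-- Admin end -->
```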

OM2
04-11-2008, 10:45 AM
hmmm... trying to learn regular expressions in 5 minutes isn't possible.
i can't seem to find the answer. :(

OM2
04-11-2008, 10:55 AM
still no joy i'm afraid. :(

this is what i've tried:

<?php
$url = "http://" . $_SERVER[ 'HTTP_HOST' ] . "/" . urldecode( $_GET[ "u" ] );
$lines = file( $url );
?><?php

foreach( $lines as $line_num => $line ) {

    $line = htmlspecialchars( $line );
    $line = str_replace( "&lt;", '<span>&lt;', $line );
    $line = str_replace( "&gt;", '&gt;</span>', $line );
    $line = str_replace( "&lt;!--", '<em>&lt;!--', $line );
    $line = str_replace( "--&gt;", '--&gt;</em>', $line );

    $line = preg_replace("/<!--\s*Admin start\s*-->.*?<!--\s*Admin end\s*-->/", "", $line);

    echo $line . "<br/>\n";

}

?>

(the above just removes the g)

and this:

<?php
$url = "http://" . $_SERVER[ 'HTTP_HOST' ] . "/" . urldecode( $_GET[ "u" ] );
$lines = file( $url );
?><?php

foreach( $lines as $line_num => $line ) {

    $line = htmlspecialchars( $line );
    $line = str_replace( "&lt;", '<span>&lt;', $line );
    $line = str_replace( "&gt;", '&gt;</span>', $line );
    $line = str_replace( "&lt;!--", '<em>&lt;!--', $line );
    $line = str_replace( "--&gt;", '--&gt;</em>', $line );

    $line = preg_replace("/$lt;!--\s*Admin start\s*--&gt;.*?$lt;!--\s*Admin end\s*--&gt;/", "", $line);

    echo $line . "<br/>\n";

}

?>

removing the g got rid of the error.
but it's not catching the text i hoped it would.
as a trial, i'm using the following text:

<!-- Admin start -->
Catch me!
<!-- Admin end -->

any suggestions?
thanks.

OM2
04-11-2008, 12:36 PM
this is really bugging me. :(
i've spent a few hours trying to figure out how to do it!!!

u said: $lt;
i assume u meant &lt;?

i've tried this... and still have no luck.

what i have now is:

$line = preg_replace("/&lt;!--\s*Admin start\s*--&gt;.*?&lt;!--\s*Admin end\s*--&gt;/", "", $line);

did a bit of reading on ur website. i get everything apart from the use of the '?'.

i don't think what i want to do is especially extraordinary...?

if it helps... i could put the text i need to match all on one line.
not sure if that makes a difference?

ALSO: how processor intensive is doing such a search?
i would imagine it takes a lot of cpu?
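On the '?': in ".*?" it makes the quantifier lazy, so each match stops at the first end marker it reaches; a plain ".*" is greedy and runs on to the last end marker in the text, swallowing everything in between. A minimal sketch with two made-up blocks on one line:

```php
<?php
// Two admin blocks with ordinary content in between.
$html = "<!-- astart -->drop 1<!-- aend -->keep<!-- astart -->drop 2<!-- aend -->";

// Greedy: ".*" grabs as much as possible, so it runs to the LAST end marker.
$greedy = preg_replace('/<!--\s*astart\s*-->.*<!--\s*aend\s*-->/', '', $html);

// Lazy: ".*?" grabs as little as possible, so each match stops at the NEXT end marker.
$lazy = preg_replace('/<!--\s*astart\s*-->.*?<!--\s*aend\s*-->/', '', $html);

var_dump($greedy); // string(0) ""
var_dump($lazy);   // string(4) "keep"
```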

TJ111
04-11-2008, 01:17 PM
Can you show me the source of what is output by this (just the area around the block you want to remove)?

OM2
04-11-2008, 01:43 PM
well... i have 2 files.
index.php:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Regular Expression Problem</title>
</head>

<body>

<!-- astart -->
code i want to leave out<br /><br />
<!-- aend -->


<a href="source.php?u=<?= $_SERVER[ "PHP_SELF" ] ?>" target="_blank">View Source</a>
<br />
<br />
<img src="images/content.jpg" width="200" height="150" /><br />
<br />
<img src="images/fafi_-_girls_rock.jpg" width="391" height="217" /><br />
<br />
<img src="images/virtual.jpg" width="150" height="100" />
</body>
</html>


the source.php file doing the regular expression:

<?php
$url = "http://" . $_SERVER[ 'HTTP_HOST' ] . "/" . urldecode( $_GET[ "u" ] );
$lines = file( $url );
?><?php

foreach( $lines as $line_num => $line ) {

    $line = htmlspecialchars( $line );
    $line = str_replace( "&lt;", '<span>&lt;', $line );
    $line = str_replace( "&gt;", '&gt;</span>', $line );
    $line = str_replace( "&lt;!--", '<em>&lt;!--', $line );
    $line = str_replace( "--&gt;", '--&gt;</em>', $line );

    $line = preg_replace("/&lt;!--\s*astart\s*--&gt;.*&lt;!--\s*bstart\s*--&gt;/", "", $line);

    echo $line . "<br/>\n";

}

?>

finally... the output i am getting:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Regular Expression Problem</title>
</head>

<body>

<!-- astart -->
code i want to leave out<br /><br />
<!-- aend -->


<a href="source.php?u=//regularexpression/index.php" target="_blank">View Source</a>
<br />
<br />
<img src="images/content.jpg" width="200" height="150" /><br />
<br />
<img src="images/fafi_-_girls_rock.jpg" width="391" height="217" /><br />
<br />
<img src="images/virtual.jpg" width="150" height="100" />
</body>
</html>

this is doing my head in. :(

let me know if u can suggest anything.
thanks.

TJ111
04-11-2008, 02:17 PM
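Ahh, that's because you're using the file() function, which returns the file as an array of lines, not a single string, so each preg_replace() call only ever sees one line. Also, I forgot the "s" flag, without which "." doesn't match newlines. I also threw in the "i" flag to make it case-insensitive.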

<?php
$url = "http://" . $_SERVER[ 'HTTP_HOST' ] . "/" . urldecode( $_GET[ "u" ] );

$fd = fopen($url, "r");
$contents = fread($fd, filesize($url));
fclose($fd);

$contents = preg_replace("/<!--\s*astart\s*-->.*?<!--\s*aend\s*-->/is", "", $contents);

$contents = htmlspecialchars($contents);
$find = array(
    "&lt;",
    "&gt;",
    "&lt;!--",
    "--&gt;"
);

$replace = array(
    '<span>&lt;',
    '&gt;</span>',
    '<em>&lt;!--',
    '--&gt;</em>'
);

$contents = str_replace($find, $replace, $contents);

echo nl2br($contents);

?>
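To see both problems in isolation, here is a sketch using a made-up page shaped like index.php, written to a temporary file so that file() behaves as it does in source.php. It compares a per-line replace, a whole-string replace without "s", and one with "s":

```php
<?php
// Made-up page shaped like index.php: the two markers are on different lines.
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, "<p>top</p>\n<!-- astart -->\ncode i want to leave out\n<!-- aend -->\n<p>bottom</p>\n");

$lines = file($tmp);      // array of lines, each keeping its trailing "\n"
$whole = implode($lines); // the same text as one string
unlink($tmp);

$block = '<!--\s*astart\s*-->.*?<!--\s*aend\s*-->';

// 1. Per line, as with file() + foreach: no single line holds both markers.
$perLine = '';
foreach ($lines as $line) {
    $perLine .= preg_replace('/' . $block . '/is', '', $line);
}

// 2. Whole string without "s": "." cannot cross the "\n" after the start marker.
$noS = preg_replace('/' . $block . '/i', '', $whole);

// 3. Whole string with "s": "." matches "\n" too, so the whole block is removed.
$withS = preg_replace('/' . $block . '/is', '', $whole);

var_dump($perLine === $whole, $noS === $whole); // bool(true), bool(true): nothing removed
echo $withS; // <p>top</p>, an empty line, <p>bottom</p>
```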

OM2
04-11-2008, 02:44 PM
WOW!
erm... i think i would have been here for a few more days without u!
i ran the code... and came up with the following errors:

Warning: filesize() [function.filesize]: stat failed for http://www.trademerchants.com//regularexpression/index.php in /public_html/regularexpression/source.php on line 5

Warning: fread() [function.fread]: Length parameter must be greater than 0 in /public_html/regularexpression/source.php on line 5

looking through ur code now and trying to understand.
let me know if u can fix the problem - i think we're nearly there!
THANKS!

TJ111
04-11-2008, 02:46 PM
Are there supposed to be two forward slashes in the url?

OM2
04-11-2008, 02:59 PM
erm... no... i did spot that before... i didn't bother getting rid of the extra slash since it didn't seem to affect anything.
i got rid of the extra slash: still same error reported.
let me know what u think. thanks.

OM2
04-11-2008, 03:01 PM
just to confirm, i now have:

$url = "http://" . $_SERVER[ 'HTTP_HOST' ] . urldecode( $_GET[ "u" ] );

TJ111
04-11-2008, 03:11 PM
If it's located on the same server, just use local file paths instead of http:// paths. filesize() can't stat() an http:// URL, and your allow_url_fopen might not be enabled in your php.ini either.
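A minimal sketch of the local-path approach, assuming the file sits on the same server: file_get_contents() reads the whole file into one string, so no filesize() call is needed. The helper name read_local_page() is hypothetical, and a local read returns the PHP source of the file rather than the HTML it produces:

```php
<?php
// Hypothetical helper: read a file on the same server into one string.
// file_get_contents() needs no length argument, so there is no filesize()
// call to fail the way it did on the http:// URL.
function read_local_page(string $path): string
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Could not read $path");
    }
    return $contents;
}

// http:// reads additionally depend on this php.ini setting ("1" when enabled).
echo 'allow_url_fopen: ', ini_get('allow_url_fopen') ? 'on' : 'off', "\n";

// In source.php this would be something like __DIR__ . '/index.php' (assumed
// layout); here the script simply reads itself so the demo runs anywhere.
$page = read_local_page(__FILE__);
echo strlen($page), " bytes read\n";
```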

OM2
04-11-2008, 03:43 PM
ok... that worked.
i'm now just putting $url='index.php'

i had a look on php.net for filesize.
there are a few good code examples.

one of them was:

function urlfilesize($url,$thereturn) {
    if (substr($url,0,4)=='http') {
        $x = array_change_key_case(get_headers($url, 1),CASE_LOWER);
        $x = $x['content-length'];
    }
    else { $x = @filesize($url); }
    if (!$thereturn) { return $x ; }
    elseif($thereturn == 'mb') { return round($x / (1024*1024),2) ; }
    elseif($thereturn == 'kb') { return round($x / (1024),2) ; }
}

i tried to adjust this and use it in ur code.
it worked one way, but reported an error about the wrong number of arguments.
then i tried commenting it out and redefining the function to take only one variable: it didn't like that.
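The "wrong number of arguments" error comes from $thereturn having no default value, so the function cannot be called with just the URL. A sketch of the same php.net idea with the unit made optional; the guard for get_headers() failing and the handling of an array-valued Content-Length after redirects are additions, not part of the original snippet:

```php
<?php
// Variant of the php.net urlfilesize() snippet with an optional unit argument,
// so urlfilesize($path) works as well as urlfilesize($path, 'kb').
function urlfilesize(string $url, ?string $unit = null)
{
    if (substr($url, 0, 4) === 'http') {
        $headers = get_headers($url, true);      // needs allow_url_fopen
        if ($headers === false) {
            return false;
        }
        $headers = array_change_key_case($headers, CASE_LOWER);
        $bytes = $headers['content-length'] ?? false;
        if (is_array($bytes)) {                  // one value per response after redirects
            $bytes = end($bytes);
        }
    } else {
        $bytes = @filesize($url);                // false if the file can't be stat()ed
    }

    if ($bytes === false || $unit === null) {
        return $bytes;                           // raw byte count (or false)
    }
    if ($unit === 'mb') {
        return round($bytes / (1024 * 1024), 2);
    }
    if ($unit === 'kb') {
        return round($bytes / 1024, 2);
    }
    return null;                                 // unknown unit, as in the original
}

echo urlfilesize(__FILE__), " bytes\n";
echo urlfilesize(__FILE__, 'kb'), " KB\n";
```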

BUT: i just realised... i DON'T want to print out th contents of the php file.
i want to print out the html that the php file makes. :( :( :( :( :(
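Requesting the page over http://, as the earlier file($url) version did, returns the HTML the web server generates. For a local file, one alternative is to run it and capture its output with output buffering. A minimal sketch with a hypothetical helper name; note that inside the included file $_SERVER['PHP_SELF'] refers to source.php, not index.php:

```php
<?php
// Hypothetical helper: execute a local PHP file and return the HTML it echoes,
// rather than the PHP source you get from reading the file directly.
function render_php_file(string $path): string
{
    ob_start();            // start capturing everything that gets output
    include $path;         // run the file; its HTML goes into the buffer
    return ob_get_clean(); // stop capturing and return what was captured
}
```

source.php could then run the preg_replace() over render_php_file(__DIR__ . '/index.php') before escaping the result with htmlspecialchars(), assuming both files sit in the same directory.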

TJ111
04-11-2008, 03:45 PM
I'm confused now lol. Show me what you have now.

OM2
04-11-2008, 04:05 PM
thanks for ALL the replies. u've been a star! :)
i got my solution (well... i think!).
i used the old code i had + ur guidance on what i was doing wrong + ur preg_replace function... and here's what we have:

<?php
$url = "http://" . $_SERVER[ 'HTTP_HOST' ] . urldecode( $_GET[ "u" ] );
$lines = file( $url );
?>
<?php
$wholePage = "";

foreach( $lines as $line_num => $line )
{
    $line = htmlspecialchars( $line );
    $line = str_replace( "&lt;", '<span>&lt;', $line );
    $line = str_replace( "&gt;", '&gt;</span>', $line );
    $line = str_replace( "&lt;!--", '<em>&lt;!--', $line );
    $line = str_replace( "--&gt;", '--&gt;</em>', $line );

    $wholePage = $wholePage . $line;
}

$wholePage = preg_replace("/&lt;!--\s*astart\s*--&gt;.*?&lt;!--\s*aend\s*--&gt;/is", "", $wholePage);

echo $wholePage;
?>

(i left out the call to nl2br() on purpose.)

does that look ok...? :)

TJ111
04-11-2008, 04:22 PM
Yeah, that looks good. You could use implode() to turn the array into a string, though; it might shorten the code a bit. That's optional.


<?php
$url = "http://" . $_SERVER[ 'HTTP_HOST' ] . urldecode( $_GET[ "u" ] );
$lines = file( $url );

foreach( $lines as $line )
{
    $line = htmlspecialchars( $line );
    $line = str_replace( "&lt;", '<span>&lt;', $line );
    $line = str_replace( "&gt;", '&gt;</span>', $line );
    $line = str_replace( "&lt;!--", '<em>&lt;!--', $line );
    $line = str_replace( "--&gt;", '--&gt;</em>', $line );
}

$wholePage = preg_replace("/&lt;!--\s*astart\s*--&gt;.*?&lt;!--\s*aend\s*--&gt;/is", "", implode($lines));

echo $wholePage;
?>

OM2
04-11-2008, 04:27 PM
surely that won't work?
the reason it wasn't working in the first place was that i was applying preg_replace to $line, which was only one line stored in the array (when it needed to be applied to the whole string).

in ur code... where does $wholePage come from?
i can't see it coming from anywhere?

let me know what u think.
thanks.

OM2
04-11-2008, 04:29 PM
oops, i take that back.
i was looking for it on a separate line.
i think ur code is better.
i'll use that. :)

OM2
04-11-2008, 04:31 PM
erm actually... that doesn't work after all. :)
i need to print out the source code.
what's being printed out is the actual html??
so the output is the website itself.

TJ111
04-11-2008, 04:39 PM
<?php
$url = "noname5.php";
$lines = file( $url );

foreach( $lines as &$line )
{
    $line = htmlspecialchars( $line );
    $line = str_replace( "&lt;", '<span>&lt;', $line );
    $line = str_replace( "&gt;", '&gt;</span>', $line );
    $line = str_replace( "&lt;!--", '<em>&lt;!--', $line );
    $line = str_replace( "--&gt;", '--&gt;</em>', $line );
}
$wholePage = preg_replace("/&lt;!--\s*astart\s*--&gt;.*?&lt;!--\s*aend\s*--&gt;/is", "", implode($lines));
echo nl2br($wholePage);
?>

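Works now :). I just changed "$lines as $line" to "$lines as &$line", so $line is a reference to the element in the $lines array rather than a copy. Changes to $line now end up in $lines, so implode($lines) returns the escaped text instead of the original HTML.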

Edit: make sure to change $url back.
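The difference between the two loops, in a small sketch with made-up lines. The unset($line) after the by-reference loop is a common precaution: $line stays a reference to the last element, and a later foreach that reuses $line would overwrite it:

```php
<?php
$lines = ["<b>", "text", "</b>"];

// By value: $line is a copy, so the array is left untouched.
foreach ($lines as $line) {
    $line = htmlspecialchars($line);
}
var_dump($lines[0]); // string(3) "<b>"

// By reference: each assignment to $line writes straight back into $lines.
foreach ($lines as &$line) {
    $line = htmlspecialchars($line);
}
unset($line); // break the reference so later code can't clobber the last element

var_dump($lines[0]); // string(9) "&lt;b&gt;"
```

array_map('htmlspecialchars', $lines) is an alternative that returns a new escaped array without using references at all.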

OM2
04-15-2008, 07:29 PM
for some reason, not all the automatic notifications of new replies get sent, so i didn't know u'd replied.

thanks for all the help.