I have a problem with some screen-scraper scripts I wrote: they worked great in PHP4, but stopped working the minute I upgraded to PHP5.
I can change all of my filenames to have the .PHP4 extension, and this solves the problem, but since it affects a number of sites, internal links and hundreds of files, it is not my first-choice solution.
Here is the scraper. It takes items from the Zazzle results page by category, strips out the formatting, and adds my affiliate ID so that I can present these items on my page.
PHP Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Test of scrape</title>
$rf_id="238219236805025733";
// Regular expression to append "?rf=" and the $rf_id to the existing link
$page=preg_replace("/(.*?)(href\s*=+\s*[\"\'])(.*?)([\"\'])(.*?)/is","$1$2$3?rf=$rf_id$4$5",$page);
My ideal solution would be an .htaccess file that I could put in any directory under PHP5 to make it default to php4.
I have tried this, to no avail (.htaccess):
Define "does not work". Are there any error messages in the PHP error log?
Most upgrade problems I've seen are due to a change in the PHP configuration, not actual version issues (e.g. dependence on register_globals). In this case, allow_url_fopen being turned off would be a likely candidate.
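If it helps, here is a rough sketch of how you could check those settings from a test page. The function name report_php_settings is made up for illustration; ini_get(), error_reporting() and ini_set() are standard PHP. It assumes you are allowed to turn on display_errors on that server.

```php
<?php
// Show every notice and warning while debugging (turn this off again afterwards).
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: collects the current value of each named ini directive.
function report_php_settings(array $names) {
    $report = array();
    foreach ($names as $name) {
        $value = ini_get($name);
        // ini_get() returns false when the directive does not exist in this PHP build
        $report[$name] = ($value === false) ? '(not available)' : $value;
    }
    return $report;
}

// file_get_contents() on an http:// URL needs allow_url_fopen to be enabled.
$report = report_php_settings(array('allow_url_fopen', 'register_globals', 'error_log'));
foreach ($report as $name => $value) {
    echo $name, ' = ', var_export($value, true), "\n";
}
echo 'PHP version: ', PHP_VERSION, "\n";
```

If allow_url_fopen shows up empty or "0", the file_get_contents() call on a remote URL is the first thing I would look at.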
"Please give us a simple answer, so that we don't have to think, because if we think, we might find answers that don't fit the way we want the world to be."
~ Terry Pratchett in Nation
The problem is in the regular expression for the href replacements. Adding echo $page; before and after that preg_replace line will show you this. This works for me in PHP 5:
PHP Code:
<?php
$page = file_get_contents("http://www.zazzle.com/cool+smiley+gifts");
//comment out the <span> tags completely
$page = preg_replace('/<span/', "<!-- <span", $page);
$page = preg_replace('/<\/span>/', "</span> -->", $page);
$page = preg_replace('/<a /', "<a rel=\"nofollow\" ", $page);
$rf_id="238219236805025733";
// Regular expression to append "?rf=" and the $rf_id to the existing link
$page = preg_replace("/(href\s*=+\s*\"[^\"]*)/is", "$1?rf=$rf_id", $page);
$test = explode('<div style="position:relative" class="clearfix">',$page);
for($t=1;$t<=count($test)-2;$t++){
print "<div class=\"gridCellInfo\" id=\"page_products_$t\">"; // append _$t to the id; multiple elements with the same id is invalid HTML
print $test[$t];
}
?>
The regex did not seem to like the question marks after the asterisks, and they seemed unnecessary anyhow.
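My guess (not verified on your server) is that the lazy (.*?) groups make the regex engine backtrack so much on a large page that it gives up. Assuming PHP 5.2 or later, preg_replace() returns NULL when that happens and preg_last_error() reports why, so you can test for it. A minimal sketch; the helper name add_affiliate_id and the sample link are made up for illustration, and I simplified =+ to = in the pattern:

```php
<?php
// Hypothetical helper: appends ?rf=<id> inside every double-quoted href.
function add_affiliate_id($html, $rf_id) {
    // [^"]* can never run past the closing quote, so very little backtracking is needed
    $result = preg_replace('/(href\s*=\s*"[^"]*)/i', '$1?rf=' . $rf_id, $html);
    if ($result === null) {
        // preg_replace() returns NULL on error; preg_last_error() gives the error code
        trigger_error('preg_replace failed, code ' . preg_last_error(), E_USER_WARNING);
        return $html; // fall back to the unmodified page
    }
    return $result;
}

// Made-up sample link, just to show the effect
$sample = '<a href="http://www.zazzle.com/some_item">Item</a>';
echo add_affiliate_id($sample, '238219236805025733'), "\n";
// prints: <a href="http://www.zazzle.com/some_item?rf=238219236805025733">Item</a>
```

Note that this always appends ?rf=, the same as the code above. If some of the scraped links already carry a query string, they would need &rf= instead.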
Last edited by astupidname; 02-26-2011 at 09:18 PM.