Regular expression for parsing <frame> tags


shawn3217
02-15-2006, 10:05 AM
I'm having a little trouble with a regular expression for parsing the "src" attribute from a frame tag. This is what I have so far:

$pattern = '/<[\s]*frame[\s]*src=[\'"]([^\'"]+)[\'"]([^>]+)?>/s';

This works fine if "src" follows the word frame, but not for something like this:

<FRAME NAME="TOP" MARGINHEIGHT=0 MARGINWIDTH=0 SCROLLING=NO NORESIZE SRC="top.htm">

I have a number of pages that I need to parse. The frame attributes do not appear in any predictable order. Can anyone help? I'm not so great with regular expressions.

On a side note, does anyone know how to account for a \n or \r that appears inside a tag? For example:

<a
href="somepage.html">some page</a>

My link parsing pattern doesn't seem to work when there is a line break:

$pattern = '#HREF=[\'"]([^\'"]+)[\'"]([^>]+)?>#i';

pyro
02-15-2006, 10:21 AM
Try this:

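<?php
$str = '<FRAME NAME="TOP" MARGINHEIGHT=0 MARGINWIDTH=0 SCROLLING=NO NORESIZE SRC="top.htm">';
// Group 1 lazily skips whatever sits between "frame" and "src"; group 2 is the src value.
preg_match('/<frame(.*?)src="(.*?)"/i', $str, $matches);
echo $matches[2]; // the value of the src attribute
?>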
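
On the line-break side question: in PCRE the dot does not match a newline unless the `s` modifier is set, so the pattern above would presumably stop at a line break inside the tag. A minimal sketch of the difference follows; the test string is a made-up example, not one of the pages in question.

```php
<?php
// Hypothetical tag with a line break before the src attribute.
$str = "<frame name=\"top\"\nsrc=\"top.htm\">";

// Without the s modifier, (.*?) cannot cross the "\n", so nothing matches.
$withoutS = preg_match('/<frame(.*?)src="(.*?)"/i', $str, $m1);

// With the s modifier, the dot also matches newlines.
$withS = preg_match('/<frame(.*?)src="(.*?)"/is', $str, $m2);

var_dump($withoutS, $withS, $m2[2]);
```

Note that this variant still only accepts double-quoted src values.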

NogDog
02-15-2006, 11:39 AM
Here's my version (match will be in $matches[1]):

$pattern = '/<\s*frame\b[^>]*\bsrc=[\'"]([^\'"]+)[\'"][^>]*>/isU';
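
For anyone trying this out, here is a rough usage sketch with preg_match_all. The first FRAME tag is the one from the question; the FRAMESET wrapper and the second, multi-line frame tag are invented for illustration. As far as I can tell, the line breaks are handled by `[^>]*`, since a negated character class matches newlines; the pattern contains no dot, so the `s` modifier has no effect here.

```php
<?php
$pattern = '/<\s*frame\b[^>]*\bsrc=[\'"]([^\'"]+)[\'"][^>]*>/isU';

// Sample markup: only the first FRAME tag comes from the question,
// the rest is assumed for the sake of the example.
$html = <<<HTML
<FRAMESET ROWS="80,*">
<FRAME NAME="TOP" MARGINHEIGHT=0 MARGINWIDTH=0 SCROLLING=NO NORESIZE SRC="top.htm">
<frame
  name="main"
  src='main.htm'>
</FRAMESET>
HTML;

// $matches[1] collects the captured src value of every frame tag found.
preg_match_all($pattern, $html, $matches);
$srcs = $matches[1];

print_r($srcs); // top.htm and main.htm; <FRAMESET> is skipped because of \b
```

The `\b` after `frame` is what keeps `<FRAMESET ...>` from matching, and the attribute order no longer matters because `[^>]*` skips everything up to `src=`.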

ShrineDesigns
02-15-2006, 02:06 PM
/<frame(?!set)[^<>]*src=[\'\"]?([^<>\'\"]*)[\'\"]?>/i
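
One thing to watch with this pattern: the closing `>` has to come straight after the src value, so it appears to match only when src is the last attribute in the tag. A quick check, using made-up tags rather than anything from the pages being parsed:

```php
<?php
$pattern = '/<frame(?!set)[^<>]*src=[\'\"]?([^<>\'\"]*)[\'\"]?>/i';

// Hypothetical test tags.
$tags = [
    '<FRAME NAME="TOP" NORESIZE SRC="top.htm">', // src is the last attribute
    '<frame src="menu.htm" name="menu">',        // src is followed by another attribute
    '<frameset src="ignored.htm">',              // rejected by the (?!set) lookahead
];

$results = [];
foreach ($tags as $tag) {
    // Store the captured src value, or null when the pattern does not match.
    $results[] = preg_match($pattern, $tag, $m) ? $m[1] : null;
}

var_dump($results);
```

Only the first tag yields a value here; the other two come back as null.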

shawn3217
02-15-2006, 02:08 PM
Thank you both. NogDog's version worked best for me because it also handles line breaks. Thanks again.