shawn3217
02-15-2006, 10:05 AM
I'm having a little trouble with a regular expression for parsing the "src" attribute from a frame tag. This is what I have so far:
$pattern = '/<[\s]*frame[\s]*src=[\'"]([^\'"]+)[\'"]([^>]+)?>/s';
This works fine if "src" immediately follows the word "frame", but not for something like this:
<FRAME NAME="TOP" MARGINHEIGHT=0 MARGINWIDTH=0 SCROLLING=NO NORESIZE SRC="top.htm">
I have a number of pages that I need to parse. The frame attributes do not appear in any predictable order. Can anyone help? I'm not so great with regular expressions.
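For reference, one possible direction is sketched below. It lets the pattern skip over any attributes that sit between "frame" and "src". The variable names $html and $sources are made up for illustration, and this only handles quoted values, so something like SRC=top.htm without quotes would still be missed.

```php
<?php
// Sketch only: allow any run of non-">" characters between "frame" and "src".
//   \b         keeps "frame" from matching "frameset"
//   [^>]*?\s   skips earlier attributes, then requires whitespace before "src"
//   ([\'"])    remembers which quote opened the value; \1 requires the same one to close it
//   i          makes the match case-insensitive (FRAME, SRC)
$pattern = '/<\s*frame\b[^>]*?\ssrc\s*=\s*([\'"])(.*?)\1/is';

// Hypothetical input, taken from the example in this post.
$html = '<FRAME NAME="TOP" MARGINHEIGHT=0 MARGINWIDTH=0 SCROLLING=NO NORESIZE SRC="top.htm">';

preg_match_all($pattern, $html, $matches);

// Group 2 holds the attribute values (group 1 is just the quote character).
$sources = $matches[2];
print_r($sources);
```

Since the attribute order is unpredictable, the pattern does not try to name the other attributes at all; it only insists that "src" appears somewhere before the closing ">".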
On a side note, does anyone know how to account for a \n or \r that appears inside a tag, between the tag name and its attributes? For example:
<a
href="somepage.html">some page</a>
My link parsing pattern doesn't seem to work when there is a line break:
$pattern = '#HREF=[\'"]([^\'"]+)[\'"]([^>]+)?>#i';
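A rough sketch of a version that tolerates line breaks inside the tag is below, assuming the whole page is in one string. In PCRE, \s already matches \n and \r, and a negated class like [^>] matches them too, so no special flag is needed for that part. If the page is read one line at a time (fgets() or file()), the tag is split across two strings and no pattern can match it, so reading it in one piece with file_get_contents() may be the real fix. $html and $links are illustrative names only.

```php
<?php
// Sketch only: match href values even when whitespace (including \r or \n)
// separates "<a" from "href".
//   [^>]*?\s   skips any earlier attributes, then requires whitespace before "href"
//   ([\'"])    remembers the opening quote; \1 requires the same one to close the value
$pattern = '#<a\b[^>]*?\shref\s*=\s*([\'"])(.*?)\1#is';

// Hypothetical input: the first link has a line break inside the tag.
$html = "<a\r\nhref=\"somepage.html\">some page</a> <A HREF='other.html'>other</A>";

preg_match_all($pattern, $html, $matches);

// Group 2 holds the link targets (group 1 is just the quote character).
$links = $matches[2];
print_r($links);
```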