extract line/text from html


hudo
10-29-2003, 11:37 AM
Hello,


I'd like to download all offers for a particular make (e.g. Adria motorhomes)
from mobile.de.
Each page displays 20 offers. The next 20 offers can be downloaded by
appending the parameter &top=21:

'http://www.mobile.de/SID.hWVIki2L-c0NtGY1DBC5w-t-vaNexlCsAsK%F3P~BmSB10LsearchPublicJ1067382559A1LsearchPublicIMotorhomeS-t-vpLtt~BmPA1B20A0/cgi-bin/searchPublic.pl?_form=search&sr_make=400&top=21',
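Appending the parameter by hand goes wrong if the URL already carries a top value. A minimal sketch of a helper that sets it cleanly; the name with_top and the short example URLs are made up for illustration and are not part of mobile.de:

```perl
use strict;
use warnings;

# Hypothetical helper: return $url with its "top" parameter set to $top.
# An existing top=... is removed first so the parameter never appears twice.
sub with_top {
    my ($url, $top) = @_;
    $url =~ s/([?&])top=\d*&?/$1/g;    # drop any existing top parameter
    $url =~ s/[?&]$//;                 # drop a dangling separator
    my $sep = $url =~ /\?/ ? '&' : '?';
    return "$url${sep}top=$top";
}

# Illustration with a placeholder URL (not a real mobile.de address).
print with_top('http://example.invalid/searchPublic.pl?sr_make=400', 21), "\n";
print with_top('http://example.invalid/searchPublic.pl?sr_make=400&top=241', 41), "\n";
```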

Each page contains a line like:

<td height=14 valign="top" class=small><nobr><A onmouseover="window.status = 'letzte Seite'; return true;" onmouseout="window.status='';" HREF="http://www.mobile.de/SID1ahNXjaxupyK6DR2bJj0Iw-t-vaNexlCsAsK%F3P%F3R~BmSB10LsearchPublicJ1067436348A1LsearchPublicIMotorhomeY-t-vctpLtt~BmPA1A1B20C242X-t-vMk_xsO~BSRA6C400A0A0/cgi-bin/searchPublic.pl?bereich=womo&top=241&" CLASS="small"><FONT COLOR="#333333" CLASS="small">letzte Seite</FONT></A></nobr></td>

The relevant parts of this line are 'letzte Seite' (German for 'last page') and top=241,
which shows how to download the last page (by appending &top=241).

OK, my problem is: how can I extract the parameter top=241?
I would like to fetch the first HTML page, parse it (but how?) to extract this parameter,
and then download all relevant HTML pages in succession by appending top=21, top=41,
..., top=241.
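One way this might be done is a regular expression with a capture group, anchored on the link text 'letzte Seite'. This is only a sketch: the sub names last_page_offset and page_offsets are invented here, the session part of the sample URL is abbreviated to a placeholder, and the pattern assumes the link keeps the shape shown above (a dedicated HTML parser would be more robust if the markup changes):

```perl
use strict;
use warnings;

# Return the value of "top" from the link whose text is 'letzte Seite'
# (last page), or undef if the page has no such link.
sub last_page_offset {
    my ($html) = @_;
    # [^>]* and [^"]* keep the match inside one tag / one attribute value.
    if ( $html =~ m{<a\b[^>]*\bhref="[^"]*[?&;]top=(\d+)[^"]*"[^>]*>\s*(?:<font\b[^>]*>)?\s*letzte\s+Seite}i ) {
        return $1;
    }
    return;
}

# Offsets of all pages after the first: 21, 41, ... up to the last offset.
sub page_offsets {
    my ($last, $per_page) = @_;
    my @offsets;
    for ( my $top = $per_page + 1; $top <= $last; $top += $per_page ) {
        push @offsets, $top;
    }
    return @offsets;
}

# Sample line modelled on the one quoted above; "SID..." is a placeholder
# for the long session part of the real URL.
my $sample = q{<td height=14 valign="top" class=small><nobr><A onmouseover="window.status = 'letzte Seite'; return true;" onmouseout="window.status='';" HREF="http://www.mobile.de/SID.../cgi-bin/searchPublic.pl?bereich=womo&top=241&" CLASS="small"><FONT COLOR="#333333" CLASS="small">letzte Seite</FONT></A></nobr></td>};

my $last = last_page_offset($sample);
print "last offset: $last\n";
print join( ', ', page_offsets( $last, 20 ) ), "\n";
```

In the real script, $sample would be replaced by the result of get($url) for the first page, and each offset would be appended to the base URL as &top=... before calling getstore.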

Another question: is there a way to fetch all relevant pages without
knowing the parameter top=241?
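A possible approach, sketched under assumptions: keep requesting top=21, top=41, ... and stop as soon as a page no longer links to the next offset. Whether mobile.de pages really link to their successor that way is an assumption, as are the sub name fetch_all_pages, the placeholder URL, and the safety cap of 100 pages. The fetcher is passed in as a code reference so the loop can be tried without network access:

```perl
use strict;
use warnings;

# Fetch result pages until a page has no link to the next offset.
# $fetch is a code ref that takes a URL and returns the HTML (or undef).
sub fetch_all_pages {
    my ($base_url, $fetch, $per_page, $max_pages) = @_;
    $per_page  ||= 20;
    $max_pages ||= 100;    # safety cap against an endless loop
    my @pages;
    my $top = 1;
    for ( 1 .. $max_pages ) {
        my $url  = $top == 1 ? $base_url : "$base_url&top=$top";
        my $html = $fetch->($url);
        last unless defined $html;
        push @pages, $html;
        my $next = $top + $per_page;
        # Assumption: every page except the last links to "top=<next offset>".
        last unless $html =~ /\btop=$next\b/;
        $top = $next;
    }
    return @pages;
}

# Stand-in for the real site: three fake pages keyed by URL.
my $base = 'http://example.invalid/searchPublic.pl?sr_make=400';
my %fake = (
    $base          => '<a href="?sr_make=400&top=21">next</a>',
    "$base&top=21" => '<a href="?sr_make=400&top=1">prev</a> <a href="?sr_make=400&top=41">next</a>',
    "$base&top=41" => '<a href="?sr_make=400&top=21">prev</a>',
);
my $fake_fetch = sub { return $fake{ $_[0] } };

my @got = fetch_all_pages( $base, $fake_fetch );
print scalar(@got), " pages fetched\n";
```

With LWP::Simple loaded, the real fetcher could be passed as \&LWP::Simple::get in place of $fake_fetch.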



My draft code:

---------------------------------------------------------------------------------
#!/usr/bin/perl -w

#!/usr/bin/perl -MLWP::Simple -e "getprint 'http://$ARGV[0]'" > $file

use strict;
use warnings;
use LWP::Simple;

my $i=1;

### http://www.mobile.de/SID.hWVIki2L-c0NtGY1DBC5w-t-vaNexlCsAsK%F3P~BmSB10LsearchPublicJ1067382559A1LsearchPublicIMotorhomeS-t-vpLtt~BmPA1B20A0/cgi-bin/searchPublic.pl?_form=search&sr_make=400&sr_model=&doSearch.x=38&doSearch.y=16&sr_priceFrom=-2&sr_priceTo=-2&sr_mileageFrom=-2&sr_mileageTo=-2&sr_registrationDateFrom=-2&sr_registrationDateTo=-2&sr_category=-2&sr_powerRange=-2&sr_color=-2&sr_engineType=-2&sr_country=-2&sr_zip=&sr_zipRadiusTo=-2&sr_sortOrder=0&sr_daysOldTo=-2

#'http://www.mobile.de/SID.hWVIki2L-c0NtGY1DBC5w-t-vaNexlCsAsK%F3P~BmSB10LsearchPublicJ1067382559A1LsearchPublicIMotorhomeS-t-vpLtt~BmPA1B20A0/cgi-bin/searchPublic.pl?_form=search&sr_model=&doSearch.x=38&doSearch.y=16&sr_priceFrom=-2&sr_priceTo=-2&sr_mileageFrom=-2&sr_mileageTo=-2&sr_registrationDateFrom=-2&sr_registrationDateTo=-2&sr_category=-2&sr_powerRange=-2&sr_color=-2&sr_engineType=-2&sr_country=-2&sr_zip=&sr_zipRadiusTo=-2&sr_sortOrder=0&sr_daysOldTo=-2&sr_make=400&top=241',

foreach my $url (
'http://www.mobile.de/SID.hWVIki2L-c0NtGY1DBC5w-t-vaNexlCsAsK%F3P~BmSB10LsearchPublicJ1067382559A1LsearchPublicIMotorhomeS-t-vpLtt~BmPA1B20A0/cgi-bin/searchPublic.pl?_form=search&sr_model=&doSearch.x=38&doSearch.y=16&sr_priceFrom=-2&sr_priceTo=-2&sr_mileageFrom=-2&sr_mileageTo=-2&sr_registrationDateFrom=-2&sr_registrationDateTo=-2&sr_category=-2&sr_powerRange=-2&sr_color=-2&sr_engineType=-2&sr_country=-2&sr_zip=&sr_zipRadiusTo=-2&sr_sortOrder=0&sr_daysOldTo=-2&sr_make=400',
'http://www.mobile.de/SID.hWVIki2L-c0NtGY1DBC5w-t-vaNexlCsAsK%F3P~BmSB10LsearchPublicJ1067382559A1LsearchPublicIMotorhomeS-t-vpLtt~BmPA1B20A0/cgi-bin/searchPublic.pl?_form=search&sr_model=&doSearch.x=38&doSearch.y=16&sr_priceFrom=-2&sr_priceTo=-2&sr_mileageFrom=-2&sr_mileageTo=-2&sr_registrationDateFrom=-2&sr_registrationDateTo=-2&sr_category=-2&sr_powerRange=-2&sr_color=-2&sr_engineType=-2&sr_country=-2&sr_zip=&sr_zipRadiusTo=-2&sr_sortOrder=0&sr_daysOldTo=-2&sr_make=700',
'http://www.mobile.de/SID.hWVIki2L-c0NtGY1DBC5w-t-vaNexlCsAsK%F3P~BmSB10LsearchPublicJ1067382559A1LsearchPublicIMotorhomeS-t-vpLtt~BmPA1B20A0/cgi-bin/searchPublic.pl?_form=search&sr_model=&doSearch.x=38&doSearch.y=16&sr_priceFrom=-2&sr_priceTo=-2&sr_mileageFrom=-2&sr_mileageTo=-2&sr_registrationDateFrom=-2&sr_registrationDateTo=-2&sr_category=-2&sr_powerRange=-2&sr_color=-2&sr_engineType=-2&sr_country=-2&sr_zip=&sr_zipRadiusTo=-2&sr_sortOrder=0&sr_daysOldTo=-2&sr_make=1000',
'http://www.mobile.de/SID.hWVIki2L-c0NtGY1DBC5w-t-vaNexlCsAsK%F3P~BmSB10LsearchPublicJ1067382559A1LsearchPublicIMotorhomeS-t-vpLtt~BmPA1B20A0/cgi-bin/searchPublic.pl?_form=search&sr_make=400&top=241'
) {



### here follows my attempt to extract/print the desired parameter #######
#my $html = get ("$url")
#or die "Couldn't get it\n";
#$html =~ m{letzte Seite};
#print "$html\n";
########################################


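# The output filename prefix comes from the first command-line argument.
my $file = $ARGV[0] || die "Please specify a filename!\n";
my $status = getstore( $url, "$file-$i.html" );
# getstore returns the HTTP status code; report failed downloads.
warn "Download of page $i failed with status $status\n" unless is_success($status);

$i++;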
}

#my $status = get "http://$url" > $file.html;

--------------------------------------------------------------------

Thanks in advance.