Fetch web pages into text files


gmelamed
06-07-2003, 10:56 AM
I am looking for an easier way to automatically fetch and save web pages. I have more than 100 HTTP addresses that I want to fetch and save as text files. (I will later parse the files to capture the information I am looking for.)
I was thinking that a DOS program that takes the HTTP address as an argument, combined with the DOS redirection symbol (">" sends the output to a file), could be a good solution.
I am looking for either a DOS program that will do this trick or a different approach to the task.

Thanks,
Gilad

Charles
06-07-2003, 11:09 AM
I use Perl for that sort of thing all the time (for Windows, see http://www.activestate.com/Products/ActivePerl/). Here is an example of a script that I tossed together to grab some files off the internet; it downloads every PDF linked from a page.

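#!c:\perl\bin\perl.exe

use strict;
use warnings;
use HTML::LinkExtor;
use LWP::Simple;

# Page to scan for links, and the local folder for the downloads.
my $source = 'http://www.mde.state.md.us/Programs/WaterPrograms/SedimentandStormwater/stormwater_design/index.asp';
my $destination = 'c:\texts\md378\\';

# Called once per tag that carries links; mirrors every linked PDF.
sub callBack {
    my ($tag, %attr) = @_;
    if ($attr{href} && $attr{href} =~ m|/([^/]+\.pdf)$|) {
        my $file = $1;    # file name captured from the URL
        print "\n$attr{href}";
        print "\tError" if is_error(mirror($attr{href}, "$destination$file"));
    }
}

# Passing $source as the base URL makes relative links absolute.
my $p = HTML::LinkExtor->new(\&callBack, $source);

# Stop early if the index page itself cannot be fetched.
my $html = get($source);
die "Could not fetch $source\n" unless defined $html;

$p->parse($html);
$p->eof;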
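
The script above mirrors PDFs linked from one page. For the original question (saving a list of 100+ addresses as text files), LWP::Simple can probably be used more directly. What follows is only a rough sketch under some assumptions: the addresses sit in a plain text file with one URL per line, the output folder already exists, and the helper names url_to_filename and fetch_all, as well as the way file names are derived, are made up for illustration.

```perl
use strict;
use warnings;
use LWP::Simple;

# Hypothetical helper: turn a URL into a file name that is safe on Windows.
sub url_to_filename {
    my ($url) = @_;
    (my $name = $url) =~ s{^[a-z]+://}{}i;   # drop the scheme (http://)
    $name =~ s{[^A-Za-z0-9.-]+}{_}g;         # replace runs of unsafe characters
    $name =~ s{^_+|_+$}{}g;                  # trim leading/trailing underscores
    return "$name.txt";
}

# Hypothetical helper: fetch every URL listed in $list_file into $out_dir.
sub fetch_all {
    my ($list_file, $out_dir) = @_;
    open my $in, '<', $list_file or die "Cannot open $list_file: $!";
    while (my $url = <$in>) {
        $url =~ s/^\s+|\s+$//g;              # strip whitespace and the newline
        next if $url eq '';                  # skip blank lines
        my $target = "$out_dir/" . url_to_filename($url);
        my $status = getstore($url, $target);    # returns the HTTP status code
        print "$url -> $target",
              (is_success($status) ? "" : "\tError $status"), "\n";
    }
    close $in;
}

# Assumed usage: perl fetch_all.pl urls.txt c:/texts/pages
if (@ARGV == 2) {
    fetch_all(@ARGV);
} else {
    print STDERR "usage: $0 urls.txt output_dir\n";
}
```

Note that getstore saves whatever the server sends, so the files will contain raw HTML rather than rendered text. If a single page with ">" redirection is all that is needed, something like perl -MLWP::Simple -e "getprint shift" http://www.example.com/ > page.txt should be close to the DOS-style approach described in the question (www.example.com is a placeholder).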