html page query
buggy
09-17-2004, 12:23 PM
I have an array which contains 7000 URLs. I am using getstore() in a loop to fetch each page and store it on my C:\ drive, and it is taking a long time (hours). I realise I have to go over the network each time, but is there any way I can speed it up?
buggy
09-18-2004, 08:33 AM
Just to add: I don't need to save the pages at all. For each page I am looking to extract two pieces of information, so if I could just read the content that would be fine; saving it doesn't matter. Would get() instead of getstore() make it any quicker?
silent11
09-20-2004, 09:24 AM
Originally posted by buggy
Just to add: I don't need to save the pages at all, ... Would get() instead of getstore() make it any quicker?
I think most of your time is spent simply downloading the files, not writing them to disk. However, if you don't need to save the files locally, don't.
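A minimal sketch of the get() approach, assuming LWP::Simple (the module that provides both get() and getstore()). get() returns the page content as a string, or undef on failure, so nothing is written to disk. The two regex patterns in the hypothetical helper extract_info() are placeholders (a page title and a "Category:" label); they would need to be replaced with whatever actually marks the two pieces of information on these job pages.

```perl
use strict;
use warnings;

# Hypothetical helper: the two patterns below are placeholders for
# whatever actually marks the two pieces of information on each page.
sub extract_info {
    my ($html) = @_;
    my ($title)    = $html =~ m{<title>\s*(.*?)\s*</title>}si;
    my ($category) = $html =~ m{Category:\s*([^<]*?)\s*<}i;
    return ($title, $category);
}

# Fetch each page into memory with get() instead of getstore(),
# then keep only the extracted fields. Failed fetches (undef) are skipped.
sub process_urls {
    my @urls = @_;
    require LWP::Simple;    # loaded on first use
    my %results;
    for my $url (@urls) {
        my $html = LWP::Simple::get($url);
        next unless defined $html;
        $results{$url} = [ extract_info($html) ];
    }
    return \%results;
}
```

As silent11 says, this mainly saves the disk writes; each request still costs a full network round trip, so the download time will usually dominate.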
You could use threads or fork() to fetch many pages at the same time and increase the speed of your program.
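A rough sketch of the fork() approach using only core Perl: split the URL list into a fixed number of chunks and fork one child per chunk, with the parent waiting for all of them. Because a forked child does not share memory with the parent, each child has to save its own results (for example to its own file). The worker count and the results_N.txt file names in the usage comment are hypothetical. Note that on Windows, Perl's fork() is emulated with threads, so behaviour there can differ from Unix.

```perl
use strict;
use warnings;

# Split a list into at most $n chunks, dealing items round-robin.
# Returns a list of array refs, with no empty chunks.
sub split_into_chunks {
    my ($n, @items) = @_;
    my @chunks;
    push @{ $chunks[ $_ % $n ] }, $items[$_] for 0 .. $#items;
    return @chunks;
}

# Fork one child per chunk. $work is a code ref called in the child as
# $work->($chunk_index, \@chunk_urls). Returns the number of children run.
sub run_in_children {
    my ($n_workers, $work, @urls) = @_;
    my @chunks = split_into_chunks($n_workers, @urls);
    my @pids;
    for my $i (0 .. $#chunks) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {              # child: process its chunk, then leave
            $work->($i, $chunks[$i]);
            exit 0;
        }
        push @pids, $pid;             # parent: remember the child
    }
    waitpid($_, 0) for @pids;         # wait for every child to finish
    return scalar @pids;
}

# Hypothetical usage: 8 workers, each writing to its own results file.
# run_in_children(8, sub {
#     my ($i, $chunk) = @_;
#     require LWP::Simple;
#     open my $out, '>', "results_$i.txt" or die "open: $!";
#     for my $url (@$chunk) {
#         my $html = LWP::Simple::get($url);
#         print {$out} "$url\t", length($html // ''), "\n";
#     }
#     close $out;
# }, @urls);
```

A modest worker count is usually enough; opening hundreds of simultaneous connections to the same two sites is more likely to get throttled than to finish faster.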
buggy
09-20-2004, 11:13 AM
Thank you for your reply.
Firstly, to correct my post: there are 27,000 URLs in the array.
16,000 are from one site and 11,000 from a second site. The first site has URLs of the form:
http://jobs.nicemove.ie/viewjob.asp?job=173771&t=920200440804PM
and the second site has URLs of the form:
http://www.jobs.ie/detail.php?NXS=07c593965baba8ac88c6c9a6062f6224&_t=s&_c=4&_r=260&csp=1&ID=85921
Only the numbers change. The URLs are all in the array curr_file. How would I go about using threads or fork() to speed it up? I don't really need to save the pages; I am just reading each one to extract two pieces of information. I can take the job and category IDs to test it.
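Since only the numbers change, one option is to key the work on the numeric ID in each URL: the job= parameter on jobs.nicemove.ie and the ID= parameter on www.jobs.ie. A small sketch, assuming those two parameters are the job IDs; job_id_from_url() and the %done hash in the comment are hypothetical names for illustration.

```perl
use strict;
use warnings;

# Return the numeric job ID from either URL form, or undef if neither matches.
sub job_id_from_url {
    my ($url) = @_;
    return $1 if $url =~ /[?&]job=(\d+)/;   # jobs.nicemove.ie/viewjob.asp?job=...
    return $1 if $url =~ /[?&]ID=(\d+)/;    # www.jobs.ie/detail.php?...&ID=...
    return;
}

# Hypothetical usage: skip IDs already handled, e.g. after restarting a long run.
# my %done;   # filled from a previous run's results
# my @todo = grep { my $id = job_id_from_url($_); !defined $id || !$done{$id} } @urls;
```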