I want to create a website that will scrape a number of different websites for details and store the information in a database. The scraping needs to run daily, and we are talking about roughly 20,000 pages. For this kind of task I want to use cloud services, and I am looking at Amazon EC2.
The code that does the scraping is written in PHP, and I have a database table listing the 20,000 pages to fetch. What is the best way to run a process like this? Services like this are new to me.
Usually I would set up a cron job to call the page; however, I'm sure there is a better, more efficient way of doing this.
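To make the cron approach concrete, here is a minimal sketch of what I mean. It is not my real code. The field names (`id`, `url`, `last_scraped`), the function names `selectDueBatch()` and `runBatch()`, the script path and the five-minute schedule are all placeholders I made up for illustration. The idea is that each cron run takes a small slice of the table rather than all 20,000 rows at once: at one run every 5 minutes there are 288 runs a day, so a batch of about 70 pages per run would cover the whole table daily.

```php
<?php
// Sketch only: every name, path and number here is a placeholder.
// Hypothetical crontab entry, one run every 5 minutes:
//   */5 * * * * php /path/to/scrape_batch.php

/**
 * Return up to $limit pages that have not been scraped in the last
 * $maxAge seconds, oldest first (never-scraped pages come first).
 * In the real setup this would be a query against the pages table.
 */
function selectDueBatch(array $pages, int $now, int $limit, int $maxAge = 86400): array
{
    $due = array_filter($pages, function (array $p) use ($now, $maxAge): bool {
        return $p['last_scraped'] === null || $now - $p['last_scraped'] >= $maxAge;
    });
    usort($due, function (array $a, array $b): int {
        return ($a['last_scraped'] ?? 0) <=> ($b['last_scraped'] ?? 0);
    });
    return array_slice($due, 0, $limit);
}

/**
 * Fetch every page in the batch with the supplied $fetch callable and
 * report, per page, whether the fetch returned anything usable.
 * $fetch is expected to return the page body, or false on failure.
 */
function runBatch(array $batch, callable $fetch, int $now): array
{
    $results = [];
    foreach ($batch as $page) {
        $html = $fetch($page['url']);
        $results[] = [
            'id'         => $page['id'],
            'ok'         => $html !== false,
            'scraped_at' => $now,
        ];
    }
    return $results;
}
```

In the cron script I would load the rows from the database, call `selectDueBatch($rows, time(), 70)`, pass the result to `runBatch()` with something like `file_get_contents` or a cURL wrapper as `$fetch`, and write `scraped_at` back to the table. What I don't know is whether this is a sensible pattern on EC2 or whether there is a better one.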