Faster strpos()?
I'm trying to write a script that grabs integers from pages on a different domain and filters them. It looks like this, and should be easy to follow:
1: $value = file_get_contents("http://sample.url.com");
2: $begin_pos = strpos($value,"<b>Current Value:</b>");
3: $raw_val = substr($value,$begin_pos,30);
4: $clean_val = filter_var($raw_val, FILTER_SANITIZE_NUMBER_INT);
1: Reads the source of the target page into a string.
2: Searches for a string within the source that will indicate where the integer is.
3: Goes to that spot in the source and returns the 30 characters starting there (the label itself plus what follows it), which should contain the integer.
4: Filters the 30 characters for the integer alone.
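A small caveat on step 3: substr() starts at $begin_pos, which is the "<" of the label, so 21 of the 30 characters are "<b>Current Value:</b>" itself. Also, if strpos() ever returns false, the code quietly reads from the start of the page, and the sanitize filter will glue together any other digits that fall inside the window. Here is a minimal sketch of the same four steps that skips past the label and reports a missing label. extract_value() is a made-up helper name, and the fetch is left out so it runs offline:

```php
<?php
// Sketch of steps 2-4 with two fixes: read *after* the marker, and handle a missing marker.
// extract_value() is a hypothetical helper name, not part of the original script.
function extract_value(string $html, string $marker, int $window = 30): ?int
{
    $pos = strpos($html, $marker);
    if ($pos === false) {
        return null; // marker missing: don't fall back to reading from offset 0
    }
    // Begin right after the marker instead of at its first character.
    $raw = substr($html, $pos + strlen($marker), $window);
    // FILTER_SANITIZE_NUMBER_INT keeps only digits, '+' and '-', so " 1,413</p>" -> "1413".
    $digits = filter_var($raw, FILTER_SANITIZE_NUMBER_INT);
    return ($digits === '' || $digits === false) ? null : (int) $digits;
}

// In the real script $html would come from file_get_contents($url).
$html = "<p><b>Current Value:</b> 1,413</p>";
var_dump(extract_value($html, "<b>Current Value:</b>")); // int(1413)
```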
The script works as it should, but it's too slow. On top of that, the full script outputs around 150 integers the same way, and it takes 6-8 minutes to finish.
Is there a faster way to do this, even if it involves another language? Thanks for reading.
Can you post a sample of what you are scanning and mark in red all the parts you are trying to capture?
For instance, http://www.sample.url.com:
<b>Current Value:</b> 1,413
The actual pages I'm trying to pull values from each have 15 KB+ of source. That may be why the script takes so long, since strpos() has a lot of source to sift through, but I'm not sure.
Last edited by okendoze; 05-07-2009 at 03:56 PM.
Why not just use a regex?
I'll have to look that up. Would it be quicker?
Maybe. It's hard to say until you try.
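For reference, a regex version could look roughly like this. It is only a sketch that assumes the markup is exactly "<b>Current Value:</b>" followed by optional whitespace and a number with comma separators; extract_with_regex() is a made-up name, and negative numbers are not handled:

```php
<?php
// Sketch of the regex approach: capture the digits (and commas) right after the label.
// Assumes markup like "<b>Current Value:</b> 1,413"; extract_with_regex() is hypothetical.
function extract_with_regex(string $html): ?int
{
    // '~' as delimiter so the '/' in "</b>" needs no escaping.
    if (!preg_match('~<b>Current Value:</b>\s*([\d,]+)~', $html, $m)) {
        return null; // label (or number) not found
    }
    return (int) str_replace(',', '', $m[1]); // "1,413" -> 1413
}

var_dump(extract_with_regex("<b>Current Value:</b> 1,413")); // int(1413)
```

Unlike the 30-character window, the capture group stops at the first character that isn't a digit or comma, so neighbouring numbers don't get merged in.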
Probably the slowest part of all of that is the file_get_contents() of the remote file. Anything we talk about here to optimize parsing/selecting text from it will probably be dealing in milliseconds at most, whereas the HTTP retrieval may be measured in whole seconds.
"Please give us a simple answer, so that we don't have to think, because if we think, we might find answers that don't fit the way we want the world to be."
~ Terry Pratchett in Nation
How to Ask Questions the Smart Way
(not affiliated with this site, but well worth reading)
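NogDog's point is easy to check by timing the download and the string search separately. As a rough guide, 6-8 minutes for ~150 values works out to about 2.4-3.2 seconds per value, which is network-sized rather than strpos()-sized. A minimal sketch using microtime(true) follows; time_fetch_and_parse() is a made-up name, and the demo reads a ~15 KB local temp file so it runs offline. Swap in one of the real URLs to see the actual network cost:

```php
<?php
// Sketch: time the fetch and the strpos() search separately to see where the minutes go.
// time_fetch_and_parse() is a hypothetical helper name.
function time_fetch_and_parse(string $source, string $marker): array
{
    $t0 = microtime(true);
    $html = file_get_contents($source);                          // URL or local path
    $t1 = microtime(true);
    $pos = ($html === false) ? false : strpos($html, $marker);   // in-memory search
    $t2 = microtime(true);
    return [
        'fetch_seconds' => $t1 - $t0,
        'parse_seconds' => $t2 - $t1,
        'found'         => $pos !== false,
    ];
}

// Offline demo: a ~15 KB "page" written to a temp file.
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, str_repeat('x', 15000) . '<b>Current Value:</b> 1,413');
$result = time_fetch_and_parse($tmp, '<b>Current Value:</b>');
unlink($tmp);
print_r($result);
// Against a real page: time_fetch_and_parse("http://sample.url.com", "<b>Current Value:</b>");
```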
For info, strpos() is MUCH faster than using the regex engine, but listen to NogDog; he's right about this.
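If you want numbers for your own setup, a rough micro-benchmark is easy to write. This sketch averages each call over 1,000 runs on a ~15 KB string held in memory (the bench() helper and the filler markup are made up); the absolute figures depend on the machine and PHP version, so it shows how to measure rather than giving a verdict:

```php
<?php
// Rough micro-benchmark: strpos() vs preg_match() on a ~15 KB in-memory string.
// bench() is a hypothetical helper; results vary by machine and PHP version.
function bench(callable $fn, int $runs): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $runs; // average seconds per call
}

// 1,000 copies of a 15-byte cell = 15,000 bytes of filler before the label.
$html   = str_repeat('<td>filler</td>', 1000) . '<b>Current Value:</b> 1,413';
$marker = '<b>Current Value:</b>';

$perStrpos = bench(fn() => strpos($html, $marker), 1000);
$perRegex  = bench(fn() => preg_match('~<b>Current Value:</b>\s*([\d,]+)~', $html), 1000);

printf("strpos: %.2e s/call, preg_match: %.2e s/call\n", $perStrpos, $perRegex);
```

Whichever one wins, both per-call figures can be compared against the per-page fetch time to see how little the parsing choice matters next to the HTTP request.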
Originally Posted by okendoze
I see. I was thinking I could write a script to pull the values from the remote pages on its own every hour or so and have it write the values to a text file on my server. Then I could use this script to pull the values from the text file, which would probably be much faster. But I don't know how I would have a script execute on its own every hour.
Last edited by okendoze; 05-07-2009 at 06:22 PM.
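Cron is the usual way to run something hourly, but on a shared host without it, one workaround is to refresh the text file lazily: whichever request finds the cached file older than an hour pays for the slow fetch, and everyone else reads the file. A minimal sketch, assuming a JSON cache file and a caller-supplied fetch function (get_cached_values() and the file layout are made up for illustration):

```php
<?php
// Sketch: cron-free caching. Serve values from a local file; refresh it only when it is
// older than $maxAge seconds. get_cached_values() and the JSON layout are hypothetical.
function get_cached_values(string $cacheFile, int $maxAge, callable $fetchAll): array
{
    clearstatcache(true, $cacheFile); // make sure filemtime() isn't a stale cached value
    $fresh = is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge;
    if ($fresh) {
        $data = json_decode(file_get_contents($cacheFile), true);
        if (is_array($data)) {
            return $data; // fast path: no remote requests at all
        }
    }
    $values = $fetchAll(); // slow path: e.g. the ~150 remote fetches
    // Write to a temp file, then rename, so readers never see a half-written cache.
    $tmp = $cacheFile . '.tmp';
    file_put_contents($tmp, json_encode($values));
    rename($tmp, $cacheFile);
    return $values;
}

// Demo with a fake fetcher that counts how often it is called.
$cache = sys_get_temp_dir() . '/values_cache.json';
@unlink($cache);
$calls = 0;
$fetch = function () use (&$calls) { $calls++; return ['page1' => 1413]; };

$first  = get_cached_values($cache, 3600, $fetch); // no cache yet: fetches
$second = get_cached_values($cache, 3600, $fetch); // cache is fresh: no fetch
echo "fetches: $calls\n"; // fetches: 1
unlink($cache);
```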
You might also consider using AJAX to send asynchronous requests if you don't care about doing them in a specific order. Should cut the time dramatically.
Last edited by Mindzai; 05-07-2009 at 06:32 PM.
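Since the fetching happens in PHP on the server, a server-side counterpart to the asynchronous idea is PHP's curl_multi functions, which run several HTTP transfers at once instead of one file_get_contents() after another. That is a different technique from the AJAX suggestion, named here plainly. A minimal sketch, assuming the curl extension is available (fetch_all() and the URLs in the usage comment are made up); it may be worth limiting how many requests go to the same server at once:

```php
<?php
// Sketch: fetch many pages concurrently with the curl_multi API (needs ext-curl).
// fetch_all() is a hypothetical helper; keys of $urls are preserved in the result.
function fetch_all(array $urls, int $timeout = 20): array
{
    if ($urls === []) {
        return [];
    }
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body instead of echoing it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // give up on one page after $timeout s
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }
    // Drive all transfers together until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for socket activity (max 1 s) instead of spinning
        }
    } while ($running && $status === CURLM_OK);

    $pages = [];
    foreach ($handles as $key => $ch) {
        $pages[$key] = curl_multi_getcontent($ch); // may be empty or partial if the transfer failed
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $pages;
}

// Usage with hypothetical URLs; each body can then go through the strpos()/substr() steps:
// $pages = fetch_all(['a' => 'http://sample.url.com/a', 'b' => 'http://sample.url.com/b']);
```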
Cron looks like it would do the trick, but I'm on a shared hosting contract that doesn't support it. I've never really studied AJAX, so I'm not sure how it could be used for this purpose. Maybe I'll set this project aside until I'm better prepared. Thanks for your help, bokeh, NogDog, & Mindzai.