I am a newbie and want to know how to extract (scrape) dynamic content from web pages. I am able to get static content using the Web Harvest API. Thanks in advance for your help.
I am doing a trial assignment on web scraping, and I am new to this. Using the Web Harvest (Java) API, I am able to extract static content, but some of the data is enclosed inside JavaScript functions and HTML elements. I need some guidance. Thanks in advance for your help.
If JavaScript is filling in the content, you'll need to read through it to find out where the data actually comes from (e.g. a JavaScript variable, an AJAX call, etc.).
@criterion: The content is present within the HTML elements.
For example:
<div id='tttt'> Hai </div>
<div id='zzzz'> Hello </div>
<li> <a ... > yyyyy </a> </li>
<li> <a ... onclick=return viewdetails('12345','yyyyyy','3435534','','rtefdg')> View </a> </li>
I am able to extract the contents "Hai" and "Hello", but I am unable to extract "yyyyyy" and "3435534" because they sit inside an HTML tag, as arguments in the onclick attribute. I am currently using the Web Harvest API to extract content from the website. This API returns its result after filtering out the HTML elements, so I am unable to extract HTML attribute values.
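One possible way around this, independent of Web Harvest, is to run a regular expression over the raw page source and pull the quoted arguments out of each viewdetails(...) call. The following is only a minimal sketch: the class name OnclickArgs and the method extract are hypothetical, and it assumes every argument is a single-quoted string containing no ")" or escaped quotes, as in the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Web Harvest.
public class OnclickArgs {
    // Matches a call such as viewdetails('a','b') and captures the raw argument list.
    private static final Pattern CALL = Pattern.compile("viewdetails\\s*\\(([^)]*)\\)");
    // Matches one single-quoted argument, including an empty one ('').
    private static final Pattern ARG = Pattern.compile("'([^']*)'");

    /** Returns the quoted arguments of every viewdetails(...) call found in the raw HTML. */
    public static List<List<String>> extract(String html) {
        List<List<String>> calls = new ArrayList<>();
        Matcher call = CALL.matcher(html);
        while (call.find()) {
            List<String> args = new ArrayList<>();
            Matcher arg = ARG.matcher(call.group(1));
            while (arg.find()) {
                args.add(arg.group(1));
            }
            calls.add(args);
        }
        return calls;
    }

    public static void main(String[] args) {
        // Sample line copied from the post; the "..." is kept as written there.
        String html = "<li> <a ... onclick=return viewdetails('12345','yyyyyy','3435534','','rtefdg')> View </a> </li>";
        for (List<String> call : extract(html)) {
            // Second and third arguments are the values the post is after.
            System.out.println(call.get(1) + " " + call.get(2)); // prints: yyyyyy 3435534
        }
    }
}
```

This works on the unfiltered HTML, so it only helps if the page source can be obtained before the tags are stripped. If the real arguments can contain parentheses or escaped quotes, the two patterns would need to be made stricter.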
What does the viewdetails function look like? Is it using AJAX?
I think there is no AJAX call. The viewdetails function displays the parameter values (data) in a small new window on the same page when you click the View link. The parameter values are already in the page when the site initially loads. I want to scrape those parameter values from the site.
Instead of guessing what might happen, can you just post the function? Otherwise I can only guess at an answer or blindly suggest all sorts of things that may not help at all.