www.webdeveloper.com

Thread: Scrape Dynamic Content

  1. #1
    Join Date
    Jul 2012
    Posts
    5

    Scrape Dynamic Content

    Hi all,

    I am a newbie and want to know how to extract (scrape) dynamic content from web pages. I can already get static content using the Web-Harvest API. Thanks in advance for your help.

    Regards,
    Nakul Sargur

  2. #2
    Join Date
    Dec 2005
    Location
    FL
    Posts
    7,263

    Why?

  3. #3
    Join Date
    Jul 2012
    Posts
    5
    I am doing a trial assignment on web scraping, and I am new to the field. Using the Web-Harvest (Java) API I am able to extract static content, but some of the data is enclosed in JavaScript function calls and HTML elements. I need some guidance. Thanks in advance for your help.

  4. #4
    Join Date
    Jan 2009
    Posts
    3,346
    If JavaScript is filling in the content, you'll need to read through it to find out where the data is actually coming from (e.g. a JavaScript variable, an AJAX call, etc.).

  5. #5
    Join Date
    Jul 2012
    Posts
    5
    @criterion: the contents are present within HTML elements.
    For example:
    <div id='tttt'> Hai </div>
    <div id='zzzz'> Hello </div>
    <li> <a ... > yyyyy </a> </li>
    <li> <a ... onclick=return viewdetails('12345','yyyyyy','3435534','','rtefdg')> View </a> </li>

    I am able to extract the contents "Hai" and "Hello", but I am unable to extract "yyyyyy" and "3435534" because they sit inside an HTML tag, as arguments in the onclick attribute. I am currently using the Web-Harvest API to extract content from the website. It returns its result after filtering out the HTML elements, so I cannot extract HTML attribute values.

  6. #6
    Join Date
    Jan 2009
    Posts
    3,346
    What does the viewdetails function look like? Is it using AJAX?

  7. #7
    Join Date
    Jul 2012
    Posts
    5
    I think there is no AJAX call. When you click the View link, the viewdetails function displays its parameter values (the data) in a small new window on the same page. The parameter values are already in the function call when the page first loads. I want to scrape those parameter values from the site.

  8. #8
    Join Date
    Jan 2009
    Posts
    3,346
    Instead of guessing what might happen, can you just post the function? Otherwise I can only guess at an answer or blindly suggest all sorts of things that may not help at all.

  9. #9
    Join Date
    Jul 2012
    Posts
    5
    Code:
    <li id='tt2#10383004'><a onclick="new ContactDetailDiv('10383004','Alpine Housing','Pc7mCSeZzz4=','Builder','Syed Baseeruddin','3327','N','55.0 Lac(s)','Sale','/property-builder-details/developer-Alpine-Housing-in-Bangalore&operating-in=Bangalore&id=Pc7mCSeZzz4=?stdStatus=Y','/property-for-sale-rent/real-estate-builder-in-Bangalore-developer-Alpine-Housing&id=Pc7mCSeZzz4=');viewContactsDivDWR('10383004','9972595345','','91-80-32975001','search','property','false','N');_gaq.push(['_trackPageview', '/view-contact-details.html']);" href='javascript:void(0);' >View Contact Details</a></li>

  10. #10
    Join Date
    Jan 2009
    Posts
    3,346
    That is a lot more detail. I just need one more bit: the part that defines what "ContactDetailDiv()" does/is. It looks like it is an object.
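    In the meantime, the argument values are already in the page source as part of the onclick attribute, so an HTML parser that exposes attributes can reach them without running any JavaScript. Below is a minimal sketch in Python using only the standard library. It is not Web-Harvest; the class name OnclickCollector and the shortened SAMPLE_HTML are made up for illustration, loosely based on the markup posted in #9.

```python
from html.parser import HTMLParser


class OnclickCollector(HTMLParser):
    """Collect the raw onclick attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.onclicks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes already removed.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "onclick" and value:
                self.onclicks.append(value)


# Shortened stand-in for the markup in post #9 (an assumption, not the real page).
SAMPLE_HTML = (
    "<li id='tt2#10383004'><a onclick=\"viewContactsDivDWR('10383004',"
    "'9972595345','','91-80-32975001','search','property','false','N');\" "
    "href='javascript:void(0);'>View Contact Details</a></li>"
)

collector = OnclickCollector()
collector.feed(SAMPLE_HTML)
print(collector.onclicks[0])
```

    This only hands back the attribute text. Splitting it into individual argument values is a separate step.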

  11. #11
    Join Date
    Oct 2013
    Posts
    1
    Quote Originally Posted by Nakul Sargur
    @criterion: the contents are present within HTML elements.
    For example:
    <div id='tttt'> Hai </div>
    <div id='zzzz'> Hello </div>
    <li> <a ... > yyyyy </a> </li>
    <li> <a ... onclick=return viewdetails('12345','yyyyyy','3435534','','rtefdg')> View </a> </li>

    I am able to extract the contents "Hai" and "Hello", but I am unable to extract "yyyyyy" and "3435534" because they sit inside an HTML tag, as arguments in the onclick attribute. I am currently using the Web-Harvest API to extract content from the website. It returns its result after filtering out the HTML elements, so I cannot extract HTML attribute values.
    If you can see the actual values in the page source and the number of arguments is always the same (as it seems to me), then why not just try a regex?
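    A minimal sketch of that regex idea in Python, assuming every argument is a single-quoted literal with no escaped quotes inside it. The helper name extract_call_args and both patterns are illustrative, not part of any scraping library.

```python
import re

# One call such as viewdetails('a','b'): group 1 is the function name,
# group 2 the raw argument list. Parentheses are allowed inside quoted
# arguments (e.g. '55.0 Lac(s)') but not outside them.
CALL_RE = re.compile(r"(\w+)\(((?:'[^']*'|[^'()])*)\)")
# One single-quoted argument; the capture may be empty for ''.
ARG_RE = re.compile(r"'([^']*)'")


def extract_call_args(onclick, function_name):
    """Return the quoted arguments of the first call to function_name, or []."""
    for match in CALL_RE.finditer(onclick):
        if match.group(1) == function_name:
            return ARG_RE.findall(match.group(2))
    return []


onclick = "return viewdetails('12345','yyyyyy','3435534','','rtefdg')"
args = extract_call_args(onclick, "viewdetails")
print(args)  # ['12345', 'yyyyyy', '3435534', '', 'rtefdg']
```

    With the arguments in a list, "yyyyyy" and "3435534" are simply args[1] and args[2]. If the site ever emits escaped quotes or unquoted arguments, the patterns would need to be extended.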
