www.webdeveloper.com
Thread: Scrape Dynamic Content

  1. #1
    Join Date
    Jul 2012
    Posts
    5

    Scrape Dynamic Content

    Hi all,

    I am new to this. I want to know how to extract (scrape) dynamic content from web pages. I can already get the static content using the Web-Harvest API. Thanks in advance for your help.

    Regards,
    Nakul Sargur

  2. #2
    Join Date
    Dec 2005
    Location
    FL
    Posts
    7,356

    Why?

  3. #3
    Join Date
    Jul 2012
    Posts
    5
    I am doing a trial assignment on web scraping, and I am a recent graduate. Using the Web-Harvest (Java) API, I can extract static content, but some of the data is enclosed inside JavaScript function calls and HTML elements. I need some guidance. Thanks in advance for your help.

  4. #4
    Join Date
    Jan 2009
    Posts
    3,346
    If JavaScript is filling in the content, you'll need to read through the script to find out where the data actually comes from (e.g. a JavaScript variable, an AJAX call, etc.).
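As a rough illustration of the "JavaScript variable" case, here is a minimal Python sketch. It assumes the page embeds its data as a JSON-compatible array assigned to a variable in an inline script. The page source, the variable name `listings`, and the helper `extract_js_array` are all hypothetical and not taken from the site discussed in this thread.

```python
import json
import re

# Hypothetical page source: the data lives in an inline script, not in the markup.
PAGE = """
<html><body>
<div id="list"></div>
<script>
var listings = [{"id": "12345", "name": "yyyyyy"}];
render(listings);
</script>
</body></html>
"""


def extract_js_array(source, var_name):
    """Return the JSON array assigned to `var_name`, or None if it is not found.

    Assumes the array literal is valid JSON and ends at the first "];".
    """
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\])\s*;"
    match = re.search(pattern, source, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


listings = extract_js_array(PAGE, "listings")
print(listings)
```

If the data arrives through an AJAX call instead, the same idea applies to the response of that call rather than to the page source.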

  5. #5
    Join Date
    Jul 2012
    Posts
    5
    @criterion: the content is present within the HTML elements.
    For example:
    <div id='tttt'> Hai </div>
    <div id='zzzz'> Hello </div>
    <li> <a ... > yyyyy </a> </li>
    <li> <a ... onclick="return viewdetails('12345','yyyyyy','3435534','','rtefdg')"> View </a> </li>

    I am able to extract the text "Hai" and "Hello", but I am unable to extract the values "yyyyyy" and "3435534", because they sit inside an HTML tag, as arguments in the onclick attribute. I am currently using the Web-Harvest API to extract content from the website. The API returns its result after filtering out the HTML elements, so I cannot extract the attribute values.
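One way around a text-only extraction is to read the attributes directly with an HTML parser. The sketch below uses Python's standard `html.parser` rather than Web-Harvest, so it is a swapped-in technique and not a description of that API. The class name `OnclickCollector` is made up for illustration, and the elided attributes from the example above are simply left out.

```python
from html.parser import HTMLParser


class OnclickCollector(HTMLParser):
    """Collect the raw onclick attribute value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.onclick_values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current start tag.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "onclick" and value:
                self.onclick_values.append(value)


# Sample markup based on the example in this post (attribute quoted).
SAMPLE = """
<div id='tttt'> Hai </div>
<div id='zzzz'> Hello </div>
<li> <a onclick="return viewdetails('12345','yyyyyy','3435534','','rtefdg')"> View </a> </li>
"""

collector = OnclickCollector()
collector.feed(SAMPLE)
print(collector.onclick_values)
```

The collected strings still contain the whole JavaScript call, so the individual arguments have to be split out in a second step.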

  6. #6
    Join Date
    Jan 2009
    Posts
    3,346
    What does the viewdetails function look like? Is it using AJAX?

  7. #7
    Join Date
    Jul 2012
    Posts
    5
    I don't think there is an AJAX call. When you click the View link, the viewdetails function displays its parameter values (the data) in a small new window on the same page. The parameter values are already in place when the page first loads. I want to scrape those parameter values from the site.

  8. #8
    Join Date
    Jan 2009
    Posts
    3,346
    Instead of guessing what might happen, can you just post the function? Otherwise I can only guess at an answer or blindly suggest all sorts of things that may not help at all.

  9. #9
    Join Date
    Jul 2012
    Posts
    5
    Code:
    <li id='tt2#10383004'><a onclick="new ContactDetailDiv('10383004','Alpine Housing','Pc7mCSeZzz4=','Builder','Syed Baseeruddin','3327','N','55.0 Lac(s)','Sale','/property-builder-details/developer-Alpine-Housing-in-Bangalore&operating-in=Bangalore&id=Pc7mCSeZzz4=?stdStatus=Y','/property-for-sale-rent/real-estate-builder-in-Bangalore-developer-Alpine-Housing&id=Pc7mCSeZzz4=');viewContactsDivDWR('10383004','9972595345','','91-80-32975001','search','property','false','N');_gaq.push(['_trackPageview', '/view-contact-details.html']);" href='javascript:void(0);' >View Contact Details</a></li>

  10. #10
    Join Date
    Jan 2009
    Posts
    3,346
    That is a lot more detail... just need one more bit: the part that defines what "ContactDetailDiv()" does/is. It looks like it is an object.
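Even without the definition of `ContactDetailDiv`, the arguments can be read straight out of the attribute text. The following Python sketch is one possible approach, assuming every argument of interest is a single-quoted string with no escaped quotes. The `ONCLICK` value is a shortened version of the attribute posted above (several arguments, including the contact numbers, are omitted), and `parse_calls` is a hypothetical helper name.

```python
import re

# Shortened version of the onclick attribute from the post above.
ONCLICK = (
    "new ContactDetailDiv('10383004','Alpine Housing','Pc7mCSeZzz4=',"
    "'Builder','55.0 Lac(s)','Sale');"
    "viewContactsDivDWR('10383004','','search','property','false','N');"
    "_gaq.push(['_trackPageview', '/view-contact-details.html']);"
)

# A call is: optional "new", an identifier, then a parenthesised list that
# consists only of single-quoted strings separated by commas.
CALL_RE = re.compile(r"(?:new\s+)?([A-Za-z_$][\w$]*)\s*\(\s*((?:'[^']*'\s*,?\s*)*)\)")
ARG_RE = re.compile(r"'([^']*)'")


def parse_calls(onclick):
    """Map each matched function name to its list of string arguments."""
    calls = {}
    for name, arg_text in CALL_RE.findall(onclick):
        calls[name] = ARG_RE.findall(arg_text)
    return calls


calls = parse_calls(ONCLICK)
for name, args in calls.items():
    print(name, args)
```

Because the pattern only accepts quoted strings inside the parentheses, a value such as '55.0 Lac(s)' does not confuse it, while the `_gaq.push([...])` call, whose argument is an array, is skipped.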

  11. #11
    Join Date
    Oct 2013
    Posts
    1
    Quote Originally Posted by Nakul Sargur
    @criterion: the content is present within the HTML elements.
    For example:
    <div id='tttt'> Hai </div>
    <div id='zzzz'> Hello </div>
    <li> <a ... > yyyyy </a> </li>
    <li> <a ... onclick="return viewdetails('12345','yyyyyy','3435534','','rtefdg')"> View </a> </li>

    I am able to extract the text "Hai" and "Hello", but I am unable to extract the values "yyyyyy" and "3435534", because they sit inside an HTML tag, as arguments in the onclick attribute. I am currently using the Web-Harvest API to extract content from the website. The API returns its result after filtering out the HTML elements, so I cannot extract the attribute values.
    If you can see the actual values in the page source, and the number of arguments is always the same (as it seems to me), then why not just try a regex?
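A minimal Python sketch of that regex idea, assuming `viewdetails` is always called with exactly five single-quoted arguments, as in the quoted example. The constant and variable names are illustrative only, and the sample line leaves out the attributes that were elided in the original post.

```python
import re

# One line of page source, based on the example quoted above.
PAGE_SOURCE = """
<li> <a onclick="return viewdetails('12345','yyyyyy','3435534','','rtefdg')"> View </a> </li>
"""

# One capture group per argument; the fixed count of five is an assumption.
ARG = r"\s*'([^']*)'\s*"
VIEWDETAILS_RE = re.compile(r"viewdetails\(" + ",".join([ARG] * 5) + r"\)")

# findall returns one 5-tuple per viewdetails(...) call found in the source.
rows = VIEWDETAILS_RE.findall(PAGE_SOURCE)
for row in rows:
    print(row)
```

This works on the raw page source, so it does not matter that the values sit inside an attribute. It will silently miss calls with a different number of arguments or with escaped quotes inside a value.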

  12. #12
    Join Date
    Jul 2014
    Location
    Ahmedabad
    Posts
    3
    Hi,

    This thread is old, but I am posting my answer for new readers.

    Anyone who wants to extract data from the web can use one of the web data extraction tools available on the internet, both free and paid.

    These tools extract data that is in HTML form (I am not sure about dynamic content). Here is one example.

    If you run an online store and want to compare your product prices with those of another online store, you can use this kind of tool. You just run the tool and add the URLs you want to extract from, and it returns the business data in a structured form.

    These tools are therefore very useful for a business intelligence solution.
    If readers of this thread have any questions, feel free to ask.
