
Thread: Get data from cells on a webpage

  1. #1
    Join Date
    Jul 2012

    Get data from cells on a webpage

    Hi, I need some advice.

    My son is doing a project and he needs to get the data from cells in a complicated webpage. This is the link:

    http://www.jisilu.cn/data/sfnew/#tlink_3 (Don't be put off by the Chinese, this has something to do with the Shanghai Stock Exchange I believe.)

    The very last column on the right is all he needs. The column holds the maturity dates of bonds, like: 2017-01-03

    This page is dynamically updated.

    We have no idea how to address the individual cells or how to get their content.

    Any tips, pointers or advice please?

    I use Linux all the time, perhaps a Bash script could do this? Maybe even a tip as to where I could better ask this question.


  2. #2
    Join Date
    Feb 2012
    Pensacola, FL
    This sounds fun, any language restrictions? Right click the cell, inspect element. Look for anything unique on the tag you can use to loop through only cells that match that info.
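A minimal Python sketch of that idea, using only the standard library. It assumes the rendered table HTML has already been saved from the browser (for example, copied out of the inspector), and that the cells of interest carry the `data-name="next_recalc_dt"` attribute quoted later in this thread. The sample markup and the names `CellExtractor` and `extract_cells` are made up for illustration.

```python
from html.parser import HTMLParser

# Simplified stand-in for the saved table markup (not the real page).
SAMPLE_HTML = """
<table>
  <tr>
    <td data-name="next_recalc_dt"><span style="font-style:italic">2016-12-01</span></td>
  </tr>
  <tr>
    <td data-name="other">ignored</td>
    <td data-name="next_recalc_dt"><span>2017-01-03</span></td>
  </tr>
</table>
"""


class CellExtractor(HTMLParser):
    """Collects the text of every <td> whose data-name matches."""

    def __init__(self, data_name):
        super().__init__()
        self.data_name = data_name
        self.values = []
        self._inside = False   # True while we are inside a matching <td>
        self._buffer = []      # text fragments seen inside that <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td" and dict(attrs).get("data-name") == self.data_name:
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._inside:
            self.values.append("".join(self._buffer).strip())
            self._inside = False


def extract_cells(html, data_name):
    """Return the text of all <td data-name=...> cells, in document order."""
    parser = CellExtractor(data_name)
    parser.feed(html)
    parser.close()
    return parser.values


if __name__ == "__main__":
    print(extract_cells(SAMPLE_HTML, "next_recalc_dt"))
```

Because the page fills its table with JavaScript, fetching the raw URL with a plain HTTP client would probably not contain these cells; this sketch only applies to markup taken from the page after it has rendered.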

  3. #3
    Join Date
    Jul 2012
    What he wants each time is in a <td> like the one below. (I wasn't sure how to stop it being read as HTML here, so I've wrapped it in asterisks.) No language restrictions!

    *<td title="定折说明:12/01(无下折,净值<1元无定折)" style="width:60px;white-space: nowrap" data-name="next_recalc_dt"><span style="font-style:italic">2016-12-01</span></td>*
    What he wants is some kind of script that will do this automatically.

    What language would that need to be in? He mentioned R Language, but I haven't even heard of that, which is saying nothing, as I am not a computer person!

    The thing to match on would be the data-name, I suppose; the cell holds the date when the bond matures. Columns 1 and 2 would also be needed, as those hold the numeric and alphanumeric names of the bonds. The rest he doesn't need.

    Very grateful for any tips!

  4. #4
    You can create or run a Web query to retrieve text or data from a Web page. ... some formatting, scripts, image files (HTML only), or lists of data in a single cell.

  5. #5
    Join Date
    Jul 2012
    Could you give me a pointer on how to do that, where to look for tips, what to search for, what to read up on, which script language to use?? Then we can bend it to fit our needs. As I said, I have extremely little experience with html.


  6. #6
    Join Date
    Feb 2012
    Pensacola, FL
    Depending on what you need to do with the data, you could use any language, even user scripts (JavaScript) that run in your browser. I would use PHP or JavaScript, but that's just because I am most familiar with those.

  7. #7
    Join Date
    Mar 2012
    Firstly, you cannot do this in HTML. HTML is a simple page display language. It supports no variables, loops or data processing (other than the ability to accept form data and pass it to an external process for manipulation).

    The process you are asking to perform is called "page scraping" and is not very reliable, as it depends upon the structure of the web page remaining largely unchanged (which you have no control over). However, as this is merely a training exercise, that is not a major issue in this case.

    The most common languages used to process a web page to extract the info you require are JavaScript (browser based) or PHP (server based). Which you choose generally depends upon your experience, but in this case it should probably be determined by the language being studied. I.e., if a person is studying JavaScript, a PHP solution (although it may be superior) would not be appropriate!

    Oh, and by the way, we do not do students' homework for them. What we need is for the student to make an attempt at the task (in whatever language is appropriate) and we may then be able to assist with specific issues that arise.

  8. #8
    Join Date
    Aug 2004
    A bit of poking around suggests to me that the data in question gets populated by a separate AJAX request, so you might have to dig through the JavaScript on the page to figure out what that call is. You could then make that call yourself, perhaps receiving a JSON object that will be easier to work with than the HTML.
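If the call does return JSON, pulling out the wanted columns might look roughly like the Python sketch below. Everything about the payload here is an assumption: the `rows`/`cell` layout, the field names `fund_id` and `fund_nm`, and the sample values are invented for illustration. Only `next_recalc_dt` comes from the cell markup quoted earlier in the thread. The real response should be checked in the browser's network inspector first.

```python
import json

# Hypothetical response body; the real structure and field names may differ.
SAMPLE_JSON = """
{"rows": [
  {"cell": {"fund_id": "150001", "fund_nm": "Example A", "next_recalc_dt": "2016-12-01"}},
  {"cell": {"fund_id": "150002", "fund_nm": "Example B", "next_recalc_dt": "2017-01-03"}}
]}
"""


def extract_rows(payload_text, fields):
    """Return one tuple per row containing the requested fields, in order.

    A field that is missing from a row comes back as None rather than
    raising, so a change in the payload shows up in the output.
    """
    payload = json.loads(payload_text)
    result = []
    for row in payload.get("rows", []):
        cell = row.get("cell", {})
        result.append(tuple(cell.get(name) for name in fields))
    return result


if __name__ == "__main__":
    wanted = ["fund_id", "fund_nm", "next_recalc_dt"]
    for record in extract_rows(SAMPLE_JSON, wanted):
        print(record)
```

Once the real request URL is known, the response text could be fetched with `urllib.request.urlopen` from the standard library and passed to the same function.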

  9. #9
    Join Date
    Jul 2012

    I said to him, the new data must be entered by someone somewhere, by hand, presumably. Get to them and ask them to send it directly. That's my poor man's version of "the data in question gets populated by a separate AJAX request".

    Well, thanks to all for your input, see if he can get it done!
