Get data from cells on a webpage
Hi, I need some advice.
My son is doing a project and he needs to get the data from cells in a complicated webpage. This is the link:
http://www.jisilu.cn/data/sfnew/#tlink_3 (Don't be put off by the Chinese; I believe this has something to do with the Shanghai Stock Exchange.)
The very last column on the right is all he needs. It holds the maturity dates of the bonds, e.g. 2017-01-03.
This page is dynamically updated.
We have no idea how to address the individual cells or how to get their content.
Any tips, pointers or advice please?
I use Linux all the time; perhaps a Bash script could do this? Or maybe someone could suggest a better place to ask this question.
This sounds fun; any language restrictions? Right-click the cell and choose "Inspect Element". Look for anything unique on the tag that you can use to loop through only the cells that match it.
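A minimal sketch of that idea in Python, using only the standard library's `html.parser`. It assumes the HTML has already been obtained (for example, copied from the inspector or saved from the browser after the table has loaded) and uses the `data-name="next_recalc_dt"` attribute from the cell quoted in this thread as the unique marker. The sample HTML is made up for illustration.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> whose data-name attribute equals data_name."""

    def __init__(self, data_name):
        super().__init__()
        self.data_name = data_name
        self.inside = False  # True while reading a matching <td>
        self.parts = []      # text fragments of the current matching cell
        self.values = []     # finished cell texts, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; look for the unique marker.
        if tag == "td" and dict(attrs).get("data-name") == self.data_name:
            self.inside = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == "td" and self.inside:
            self.values.append("".join(self.parts).strip())
            self.inside = False

    def handle_data(self, data):
        # Text inside nested tags such as <span> is collected too.
        if self.inside:
            self.parts.append(data)


def cells_with_data_name(html, data_name):
    parser = CellCollector(data_name)
    parser.feed(html)
    parser.close()
    return parser.values


# Made-up sample: one ordinary cell and one cell carrying the marker.
sample = (
    '<tr><td>ignored</td>'
    '<td title="12/01" data-name="next_recalc_dt">'
    '<span style="font-style:italic">2016-12-01</span></td></tr>'
)
print(cells_with_data_name(sample, "next_recalc_dt"))  # ['2016-12-01']
```

Because the page is dynamically updated, a plain download of its HTML may not contain the table rows at all, so check that the saved copy actually has the cells before relying on this.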
No language restrictions! What he wants each time is inside a <td> element like the one below.
What he wants is some kind of script that will do this automatically.
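<td title="定折说明：12/01(无下折，净值<1元无定折)" style="width:60px;white-space: nowrap" data-name="next_recalc_dt"><span style="font-style:italic">2016-12-01</span></td>
(The title attribute translates roughly as "Scheduled conversion note: 12/01 (no downward conversion; no scheduled conversion when the NAV is below 1 yuan)".)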
What language would that need to be in? He mentioned R, but I hadn't even heard of it, which says little, as I am not a computer person!
The unique attribute would be data-name, I suppose; that cell holds the date when the bond matures. Columns 1 and 2 are also needed, as they hold the numeric and alphanumeric names of the bonds. He doesn't need the rest.
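A sketch of pulling columns 1 and 2 plus the date cell from each table row, again with Python's standard library. It assumes each bond is one `<tr>` whose first two `<td>` cells are the two names, and that the date cell carries `data-name="next_recalc_dt"`. The CSV header names ("code", "name", "date") and the sample row are invented for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Split table HTML into rows; each row is a list of (data_name, text) cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.row = None   # cells of the <tr> currently being read
        self.cell = None  # [data_name, text fragments] of the current <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag == "td" and self.row is not None:
            self.cell = [dict(attrs).get("data-name"), []]

    def handle_endtag(self, tag):
        if tag == "td" and self.cell is not None:
            name, parts = self.cell
            self.row.append((name, "".join(parts).strip()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            if self.row:  # header rows made only of <th> end up empty
                self.rows.append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell[1].append(data)


def extract(html, date_field="next_recalc_dt"):
    """Return (column 1, column 2, date) for every row that has a date cell."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    result = []
    for row in parser.rows:
        dates = [text for name, text in row if name == date_field]
        if len(row) >= 2 and dates:
            result.append((row[0][1], row[1][1], dates[0]))
    return result


def to_csv(records):
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["code", "name", "date"])  # hypothetical header names
    writer.writerows(records)
    return out.getvalue()


# Invented sample table, only to show the expected structure.
sample = """<table>
<tr><th>Code</th><th>Name</th><th>Value</th><th>Date</th></tr>
<tr><td>000001</td><td>Example A</td><td>1.02</td>
<td data-name="next_recalc_dt"><span>2016-12-01</span></td></tr>
</table>"""
print(to_csv(extract(sample)))
```

Rows without a date cell are skipped, so the output contains only bond rows. Writing CSV keeps the result easy to open in a spreadsheet or in R.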
Very grateful for any tips!
You can create or run a Web query to retrieve text or data from a Web page. ... some formatting, scripts, image files (HTML only), or lists of data in a single cell.
Could you give me a pointer on how to do that: where to look for tips, what to search for, what to read up on, and which scripting language to use? Then we can bend it to fit our needs. As I said, I have extremely little experience with HTML.
Firstly, you cannot do this in HTML. HTML is a simple page display language. It supports no variables, loops or data processing (other than the ability to accept form data and pass it to an external process for manipulation).
The process you are asking to perform is called "page scraping" and is not very reliable, as it depends upon the structure of the web page remaining largely unchanged (which you have no control over). However, as this is merely a training exercise, that is not a major issue in this case.
Oh, and by the way, we do not do students' homework for them. What we need is for the student to make an attempt at the task (in whatever language is appropriate) and we may then be able to assist with specific issues that arise.
I said to him: the new data must be entered by someone, somewhere, presumably by hand, so find them and ask them to send it directly. That's my poor man's version of "the data in question gets populated by a separate AJAX request".
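If the table really is filled in by a separate AJAX request, it can be easier to fetch that request's response directly than to scrape the rendered page. The real request URL has to be found by hand in the browser's developer tools (Network tab) while the page loads; it is not known here. This sketch also assumes the response is JSON shaped like `{"rows": [{"cell": {...}}]}` and that the keys are named as in the hypothetical `FIELDS`; both must be checked against the actual response.

```python
import json
import urllib.request

# Hypothetical JSON key names; compare them with the real response first.
FIELDS = ("code", "name", "next_recalc_dt")


def fetch_json(url):
    """Download url and decode its body as JSON (the URL must be found by hand)."""
    # Some servers reject requests that lack a browser-like User-Agent header.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def pick_fields(payload, fields=FIELDS):
    """Assumes a payload shaped like {"rows": [{"cell": {...}}, ...]}."""
    records = []
    for row in payload.get("rows", []):
        cell = row.get("cell", row)  # use the row itself if there is no "cell" key
        records.append(tuple(cell.get(name) for name in fields))
    return records


# Invented sample payload, only to show the assumed shape.
sample = {"rows": [{"id": "000001", "cell": {
    "code": "000001", "name": "Example A", "next_recalc_dt": "2016-12-01"}}]}
for record in pick_fields(sample):
    print(*record, sep="\t")
```

With the real URL, `pick_fields(fetch_json(url))` would return one tuple per bond.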
Well, thanks to all for your input; let's see if he can get it done!