www.webdeveloper.com
Page 1 of 4 123 ... LastLast
Results 1 to 15 of 55

Thread: Save an HTML table as CSV

Hybrid View

  1. #1
    Join Date
    Mar 2014
    Location
    Corpus Christi, TX
    Posts
    88

    Save an HTML table as CSV

    All,

    There is a table that I need to import into a MySQL Database. That task seems daunting. Short of that, I'd like a CSV file of the table that I can then upload to my server and import later.

    The table I need is behind a user / pass page. I can log in, but do not have access to the scripts, only the HTML.

    I have tried doing this with a VBS that had a great number of SendKeys. That has failed and I need another solution.

    Can anyone point me in the right direction?

  2. #2
    Join Date
    Mar 2005
    Location
    Behind you...
    Posts
    1,031
    If I understand you correctly you need a CSV of a table, but you can only get the table printed out in HTML (as you do not have access to the scripts that read this table)?

    I just want to make sure I understand what you are asking before I run off in any direction with my answer.

  3. #3
    Join Date
    Mar 2014
    Location
    Corpus Christi, TX
    Posts
    88
    Yes, that is exactly correct.

  4. #4
    Join Date
    Mar 2005
    Location
    Behind you...
    Posts
    1,031
    I'll be honest in saying I've never done anything like this before. However I'll also be honest in saying I believe what you want is entirely possible. I'll have to sit down and think about the specifics of the script but javascript can be used to loop through a table and place all of the data into a JSON variable. Once you've got the data here then it can be passed to a PHP script that can actually save a .csv file (I do have some code lying around for the PHP to .csv part).

    So essentially you'd view the source of the page, copy the table and then paste it into say... a <textarea> on another page. This page would process the table, push it to PHP and it would get saved as a file.

  5. #5
    Join Date
    Mar 2014
    Location
    Corpus Christi, TX
    Posts
    88
    Wow... You make it sound so simple!

    In my first post, I noted that my final goal is to take the information in this table and put it into a MySQL Database. I don't need the CSV file per se, it was just a medium for importing in MySQL. If we can hash out way to pull the data in Javascript and import into PHP, I won't need the CSV at all.

  6. #6
    Join Date
    Mar 2005
    Location
    Behind you...
    Posts
    1,031
    I'm going to start this post by noting that this script is experimental. It's not quite as satisfactory as I'd like (a little bulky. I feel like it could be optimized), however it appears to get the job done. Also, to be fair, I have not seen the HTML tables you wish to export (and then re-import into your database) so this script may need tweaking to fit your specific case.

    All of that being said, here's an HTML Table to JSON converter I just made:
    HTML Code:
    <!doctype html>
    <html>
    <head>
    	<title>Table to JSON Converter</title>
    </head>
    <body>
    
    	<div id="output"></div>
    	<div>
    		Table HTML:<br />
    		<textarea id="tableHTML" style="width: 640px; height: 480px;"></textarea><br />
    		<input type="button" value="Convert to JSON" onclick="_TableToJSON(document.getElementById('tableHTML').value)" />
    	</div>
    
    	<script>
    	
    		var $jsonData = [];
    		function _TableToJSON($a) {
    			// We'll copy the table and make it an element on the page
    			var $colNames = [];
    			var $tmpTable = document.createElement("div");
    			$tmpTable.id = "tmpTable";
    			$tmpTable.innerHTML = $a;
    			document.body.appendChild($tmpTable);
    			
    			// Then we hide the table and store it in a js var
    			var $tableElement = document.getElementById("tmpTable");
    			$tableElement.style.display = "none";
    			$tableElement = $tableElement.childNodes[0];
    			var $tableNodes = [];
    			
    			// First let's get only the node elements (thead and tbody probably)
    			for(var $i = 0; $i < $tableElement.childNodes.length; $i++) {
    				// Let's check to see if it's an element node
    				if($tableElement.childNodes[$i].nodeType == 1) $tableNodes.push($tableElement.childNodes[$i]); // Save this node!
    			}
    			
    			// Now let's loop through our nodes and pull row data
    			// Yea...  it's kind of bulky
    			for(var $j = 0; $j < $tableNodes.length; $j++) {
    				var $tmpRow = $tableNodes[$j].childNodes;
    				var $rowCount = 0;
    				for(var $k = 0; $k < $tmpRow.length; $k++) {
    					if($tmpRow[$k].nodeType == 1) {
    						var $tmpCols = $tmpRow[$k].childNodes;
    						var $colCount = 0;
    						for(var $l = 0; $l < $tmpCols.length; $l++) {
    							if($tmpCols[$l].nodeType == 1) {
    								if($k == 0) {
    									$colNames.push($tmpCols[$l].innerHTML);
    								} else {
    									if($jsonData[$rowCount-1] == null || $jsonData[$rowCount-1] == undefined) $jsonData[$rowCount-1] = {};
    									$jsonData[$rowCount-1][$colNames[$colCount]] = $tmpCols[$l].innerHTML;
    								}
    								$colCount++;
    							}
    						}
    						$rowCount++;
    					}
    				}
    			}
    		}
    	</script>
    
    </body>
    </html>
    And here's a sample table I used to test it with:
    HTML Code:
    <table>
      <tr>
        <td>Col1</td>
        <td>Col2</td>
        <td>Col3</td>
        <td>Col4</td>
        <td>Col5</td>
      </tr>
      <tr>
        <td>Val1</td>
        <td>Val2</td>
        <td>Val3</td>
        <td>Val4</td>
        <td>Val5</td>
      </tr>
      <tr>
        <td>Val6</td>
        <td>Val7</td>
        <td>Val8</td>
        <td>Val9</td>
        <td>Val10</td>
      </tr>
    </table>
    From here I'll look at building a simple PHP script that reads the JSON object and writes it to a database.

  7. #7
    Join Date
    Mar 2009
    Posts
    521
    You might try something like this. I haven't tested it, but if it works properly it will return an object containing all of the table data. You can use JSON.stringify to put it into JSON format, and PHP can handle that.

    Code:
    function doMap(table){  //send the reference of the table as the argument
        var obj, i, j, len, jLen, td, hd, val, rows, rowLen;
        
        obj = {"headers": [], "data":[]};
        hd = table.getElementsByTagName('th');  //get table header values, if any
        len = hd.length;
        for(i = 0; i < len; i++){
    	obj.headers.push(hd[i].innerHTML);
        }
        rows = table.getElementsByTagName('tr');  //get all the rows in the table
        rowLen = rows.length;
        for(i = 0; i < rowLen; i++){  //loop throw the rows
    	obj.data[i] = [];
    	td = rows[i].getElementsByTagName('td');  //get all the cells in the row
    	jLen = td.length;
    	for(j = 0; j < jLen; j++){
    	    obj.data[i].push(td[j].innerHTML); //posh cell data into the array
    	}
        }
        return obj;
    }

  8. #8
    Join Date
    Mar 2014
    Location
    Corpus Christi, TX
    Posts
    88
    I'm taking a long weekend from work today. I will test these when I get back. However, in glancing at them, how will either of the two scripts work if I do not have write access to the server on which these pages are located? Perhaps my question wasn't worded correctly. I can log in to the page on the server, but I do not have access to the raw data on the server. All I can see is the table from which I need to pull data. The script would have to log in to the site (or I can set a VBS to do this), and then pull the data on the client-side.

  9. #9
    Join Date
    Mar 2005
    Location
    Behind you...
    Posts
    1,031
    In both cases you would essentially log in and view the table data on a webpage. From here you would just view the page's source and copy the <table> in which this data is in. You can paste it into the <textarea> in my example and use either my or Tcobb's function to convert that HTML <table> into a structured object. There would be one final step in which this object is passed to a simple PHP script that simply loops through and adds each row to a database (your database).

  10. #10
    Join Date
    Mar 2009
    Posts
    521
    You actually don't need javascript at all. You can use PHP directly to get the page as a string, and you can parse the table data from there.

    $page = file_get_contents("http://somewhere.com");

  11. #11
    Join Date
    Mar 2014
    Location
    Corpus Christi, TX
    Posts
    88
    Tcobb - Since this page requires user/pass, how do I automate login? My first thought would be to have user/pass saved locally on the computer, then launch the php grabbing the contents of the page I would need. Would you concur, or am I making this too complicated on myself?

  12. #12
    Join Date
    Mar 2009
    Posts
    521
    Since this page requires user/pass, how do I automate login? My first thought would be to have user/pass saved locally on the computer, then launch the php grabbing the contents of the page I would need. Would you concur, or am I making this too complicated on myself?
    If you google "automatic login php" you will find a number of scripts for accomplishing this. You might also take a look at the PHP DOMDocument class.

  13. #13
    Join Date
    Mar 2014
    Location
    Corpus Christi, TX
    Posts
    88
    I'm having quite a bit of trouble automating this. I'm almost unsure of the possibility of achieving this.

    I am trying to automate an hourly report on numerous devices scattered throughout the country. Each of these devices is hooked up to a laptop that will be running a small web server (MOWES) and will have internet access. The IP for these devices will remain the same. These devices allow me to log in and view statistics, but I cannot seem to connect to them using the "file_get_contents" command, as I get a 500 error which I assume is because it is looking for the file in the root local host instead of looking on the device. I am putting the "HTTP" in front of the IP address, but alas, it still looks in localhost.

    My thought process for creating this hourly report is to log in to the device, pull the table containing the data, export all results into a MySQL database, and have the report run every hour for any stations with stats outside set limits.

    I would appreciate some one-on-one help. I have received much help on this forum, but I simply can't seem to get this ball rolling. I have used a combination of VBS, Javascript, PHP, and screaming. Perhaps we can both learn from this process.

  14. #14
    Join Date
    Mar 2005
    Location
    Behind you...
    Posts
    1,031
    Just to clarify some things here, you were using CURL to first 'log in' to a certain page and then attempting to use file_get_contents() on said page to obtain the data you need. The clarification I need would be that CURL is indeed working on the IP/address you entered, indicating that the script is able to properly connect to the correct location.

    Also, not to jump around topics too much (though this is all related to the same end result/goal) you also were setting a variable named $store which captured the result of the CURL. I haven't really used CURL all that much but the return value is essentially whatever the server send back after executing a script and so I wonder is it possible to get your stat page as the result of the CURL, rather than adding an additional step of using file_get_contents()? I don't know how the server-side of the login system works so I don't know what is actually returned after logging in.

    And on a similar note, it was also mentioned by NogDog that file_get_contents() will not send cookies (which are likely required here) so perhaps you just need to send a second CURL request. file_get_contents() is a simple one-liner that can grab the contents of a page but CURL can still get the page contents as well, just with a few more lines.

  15. #15
    Join Date
    Mar 2014
    Location
    Corpus Christi, TX
    Posts
    88
    Here is my latest attempt at using CURL to get the login page... Not the page I will eventually need, just the login page. When I run this script, I am getting a 500 server error. It appears to be looking on localhost, not the device to which I am connected. Thoughts on where the error(s) might be?

    PHP Code:
    <?php 
    error_reporting
    (E_ALL); 
    ini_set'display_errors','1');
    $url='http://192.168.69.154';
     
    $ch curl_init();
        
    $user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0'
        
    curl_setopt ($chCURLOPT_URL$url);
        
    curl_setopt ($chCURLOPT_USERAGENT$user_agent);
        
    curl_setopt ($chCURLOPT_HEADER0);
        
    curl_setopt ($chCURLOPT_FOLLOWLOCATION1);
        
    curl_setopt ($chCURLOPT_RETURNTRANSFER1);
        
    curl_setopt ($ch,CURLOPT_CONNECTTIMEOUT,120);
        
    curl_setopt ($ch,CURLOPT_TIMEOUT,120);
        
    curl_setopt ($ch,CURLOPT_MAXREDIRS,10);
        
    curl_setopt ($ch,CURLOPT_COOKIEFILE,"cookie.txt");
        
    curl_setopt ($ch,CURLOPT_COOKIEJAR,"cookie.txt");
        echo 
    curl_exec ($ch);
    function 
    get_web_page$url );

    curl_close($ch); 
    echo 
    $ch
        
        
    ?>

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
HTML5 Development Center



Recent Articles