Click to See Complete Forum and Search --> : Question about Database choices....


Mr Initial Man
04-19-2010, 01:29 PM
If you had, say, a 10KB XML file, would the difference between a server processing that into a page, and a server processing a query to a database containing the exact same info into a page be noticeable?

svidgen
04-19-2010, 02:50 PM
Probably not for single page-loads ... But, it's important to ask: What's in the file? And how are you transforming it? And what exactly is the negligible difference in load-time?

A negligible difference for a single page load can add up to a noticeable difference under heavy load. As as example, let's assume that it takes the server 0.1 sec to assemble the page from the XML file. And, let's say it takes 0.08 sec from the database. That's not a difference you'd ever notice if you were just tested performance by hitting the page.

Now, if we assume each page load maxes the CPU (for simplicity) and therefore only a single request is handled at a time, you're serving a theoretical max of 10 pages/sec with the XML data feed versus 12.5 page/sec with the DBMS data feed. If you get slashdotted, that could be the difference between a functioning website and nothing.

And of course, if you continually think of things in terms of whether a difference is noticeable, you can easily end up in a situation where the cumulative effects have lead to a very inefficient application. You might find yourself scratching your head, wondering why relatively simple page loads are taking several seconds. And you may find yourself chalking up your competitors' quick load times to better hardware. So, you'll start dumping money into more servers, unaware that a few differently made architectural decisions could have your site running beautifully on a pentium-II and 64MB of RAM.

... does that help?

Charles
04-19-2010, 05:02 PM
And let us not fail to consider the nature and complexity of the query. And if you are on a Windows platform you have other interesting options. You can save the file as CSV, or several other formats, and then use SQL. SQL on a CSV file is way weird but useful sometimes.

criterion9
04-19-2010, 09:36 PM
And if you are on a Windows platform you have other interesting options.
You have other option across OSes. Windows isn't the only system that can use csv files for example.

Mr Initial Man
04-19-2010, 10:32 PM
What's in the file?

Prices, descriptions, sizes, and so on.


And how are you transforming it?

PHP DOMDocument.

NogDog
04-20-2010, 04:47 AM
If your app wants to do things with the data that databases are specifically designed and optimized to do, e.g. search, compare, and sort; then it is highly likely that the DBMS will be able to do that significantly more efficiently and quickly than the corresponding PHP code you would need to do so via the DOM (for some unknown value of "significantly"), assuming a reasonably sane database structure with appropriate indexing.

For instance, if you only want to select items that are "blue", if the color column is indexed, then the select query would only need to read table rows that have the color column set to "blue", whereas your XML-based PHP code would have to first read the entire XML data into memory and parse it via the DOM class before you even start trying to find all items that are blue, which operation also will likely not be as efficient as the DBMS could do it.

svidgen
04-20-2010, 08:58 AM
Prices, descriptions, sizes, and so on.

PHP DOMDocument.

Can we see the file, the resulting page(s), and possibly some code?

... I mean, chances are we'll tell you to just benchmark it both ways and let us know what you find. And chances are we'll recommend using a DBMS regardless of what you find. (it will scale better with your data) But, I'd be interested to see what you're doing more specifically.

Mr Initial Man
04-20-2010, 03:00 PM
The webpage in question is at http://mrinitialman.zzl.org/SheepMaster/

The XML file is http://mrinitialman.zzl.org/SheepMaster/products.xml

This is the PHP handling the XML for the products page.

<?php

define('n',"\r\n");
define('t3',' ');
define('t4',t3.' ');
define('t5',t4.' ');

function Att($e, $a){return $e->getAttribute($a);}
function Desc($e){
foreach($e -> getElementsByTagName('Image') as $i){
echo t4 . '<img src="./Graphics_Products/' . Att($i, 'file') . '.jpg" alt="' . Att($i, 'text') . '" class="' . Att($i, 'align') . '">' . n;
}
foreach($e -> getElementsByTagName('P') as $p){
echo t4 . '<p>' . $p -> nodeValue . '</p>' . n;
}
}

$xml = new DOMDocument();
$xml -> loadXML(file_get_contents('./products.xml'));
$xml -> validate();
?>

<?php

$prod = $xml -> getElementById($_GET['p_id']);
$ttl = str_replace('_', ' ', Att($prod, 'name'));
$ttl2 = '('.$ttl.')';
$prod_type = $prod -> parentNode -> nodeName;

if($prod_type == 'Product_Groups'){ /* For Individual Products */
$items = $prod -> getElementsByTagName('Item');
foreach($items as $item){
$iid = $item -> getAttribute('id');
$info = $xml -> getElementById($iid);
$prc = $info -> getElementsByTagName('Price');
$dims = $info -> getElementsByTagName('Dim');

echo
t3 . '<div id="' . $iid . '">' . n .
t4 . '<h3>' . str_replace('_', ' ', $iid) . '</h3>' . n
;
Desc($info);
if($prc -> length > 1){
echo
t4 . '<table class="prices">' . n .
t5 . '<caption>Prices</caption>' . n
;
foreach($prc as $iprc){
echo t5 . '<tr><th>' . Att($iprc, 'name') . '</th><td>CDN&#x24;' . Att($iprc, 'amount') . '</td></tr>' . n;
}
echo t4 . '</table>' . n;
} else {
echo t4 . '<p class="price"><strong>Price:</strong> CDN&#x24;' . Att($prc -> item(0), 'amount') . '</p>' . n;
}
if($dims -> length){
echo
t4 . '<table class="dimensions">' . n .
t5 . '<caption>Dimensions</caption>' . n
;
foreach($dims as $dim){
echo t5 . '<tr><th>' . Att($dim, 'name') . '</th><td class="standard">' . Att($dim, 'standard') . '</td><td class="metric">' . Att($dim, 'metric') . '</td></tr>' . n;
}
echo t4 . '</table>' . n;
}
echo t3 . '</div>' . n;
}
} else { /* For Sets */
echo t3 . '<div>' . n;
Desc($prod);
echo
t3 . '<h3>Components</h3>' . n .
t4 . '<ul>' . n
;
foreach($prod -> getElementsByTagName('Comp') as $comp){
$part_type = Att($xml -> getElementById(Att($comp, 'cid')) -> parentNode, 'name');
$part_name =
($part_type == 'Plywood_Panel')?
(Att($xml -> getElementById(Att($comp, 'cid')), 'name') . ' ' . $part_type):
$part_type
;
$groups = explode(' ',Att($xml -> getElementById(Att($comp, 'cid')) -> parentNode, 'groups'));
$link = '<a href="./products.php?p_id=' . $groups[0] . '#' . Att($xml -> getElementById(Att($comp, 'cid')) -> parentNode, 'name') . '">' . str_replace('_', ' ', $part_name) . ((Att($comp, 'num') > 1)?'s':'') . '</a>';
echo t5 . '<li>' . Att($comp, 'num') . ' ' . $link . '</li>' . n;
}
echo t4 . '</ul>' . n;
if(Att($prod, 'price')){
echo t4 . '<p><strong>Price:</strong> CDN&#x24;' . number_format(Att($prod, 'price'), 2) . '</p>' . n;
}
echo t3 . '</div>' . n;
}

?>

svidgen
04-21-2010, 09:03 AM
Well, without delving too far into your code, it looks more convoluted and complex than a DBMS driven alternative would be. However, if you're dealing with a very limited set of products (or records in general), the overhead of parsing and searching the XML file could be less than that of establishing a database connection. (but, could potentially be overcome by using a persistent connection.)

You're in luck though! The DBMS version of the site should be pretty quick'n easy to implement. So, benchmarking the two solutions should also be fairly easy. (provided that you have command line access to a testing environment)

Let us know what you discover.

Charles
04-21-2010, 01:50 PM
the overhead of parsing and searching the XML file could be less than that of establishing a database connectionThen a SQL call to a CSV file might just be the answer.

svidgen
04-21-2010, 02:06 PM
Then a SQL call to a CSV file might just be the answer.

Haha ... maybe.

Mr Initial Man
04-21-2010, 03:29 PM
Well, without delving too far into your code, it looks more convoluted and complex than a DBMS driven alternative would be. However, if you're dealing with a very limited set of products (or records in general), the overhead of parsing and searching the XML file could be less than that of establishing a database connection. (but, could potentially be overcome by using a persistent connection.)

You're in luck though! The DBMS version of the site should be pretty quick'n easy to implement. So, benchmarking the two solutions should also be fairly easy. (provided that you have command line access to a testing environment)

Let us know what you discover.

How would I benchmark?

Then a SQL call to a CSV file might just be the answer.

How would I pull that off?

svidgen
04-21-2010, 03:36 PM
How would I benchmark?
Presuming you're running on a linux box (or similar) with apache, you'd use ab (http://www.manpagez.com/man/8/ab/).

Mr Initial Man
04-21-2010, 04:40 PM
Presuming you're running on a linux box (or similar) with apache, you'd use ab (http://www.manpagez.com/man/8/ab/).

I'm using Windows XP Home.

svidgen
04-21-2010, 04:41 PM
And IIS?

Mr Initial Man
04-21-2010, 05:24 PM
And IIS?
Actually, Apache.

Charles
04-21-2010, 05:39 PM
How would I pull that off?You'll need to put these pieces together:

http://php.net/manual/en/book.uodbc.php
http://www.connectionstrings.com/textfile

svidgen
04-21-2010, 06:50 PM
Actually, Apache.

Check the bin folder for the ab binary. I think it's "part of" apache.

Charles
04-22-2010, 06:20 AM
And I should have added: if you decide to use Excel to generate the CSV file and follow my suggestion above, you don't need to save as CSV. Reading an Excel file with SQL is just as much of a snap.

NogDog
04-22-2010, 12:31 PM
While reading a CSV/Excel file could be handy, this is not going to be as powerful as a full-blown DBMS (e.g. no column indexing, no native ability to do incremental back-ups, etc.) nor is it as portable (MySQL, PostgreSQL, SQLite, etc. can pretty much run on any platform). Therefore, while it could be a "quick fix" if you already have an existing spreadsheet with which you want to interact, it would not be a solution I would normally want to design toward. For that matter, porting a CSV file into a DBMS is typically quite simple (at least once you know what the steps are ;) ).

Charles
04-22-2010, 02:40 PM
This is a pretty small data set, though. No need to bring out the cannon to swat a fly. If the job is too small for a real DB then there are still plenty of other options.

svidgen
04-22-2010, 02:50 PM
Sure. And, if this is the case, you'll be best off reading concise data structure (CSV or JSON) directly from the application. Though, it's tough to say whether this is the case if you don't do some benchmarking.

Just about anything can be great in theory ...

NogDog
04-22-2010, 03:33 PM
This is a pretty small data set, though. No need to bring out the cannon to swat a fly. If the job is too small for a real DB then there are still plenty of other options.

Admittedly, I may have been led astray by the initial post stating the data was in a 10KB XML file, and thinking that would translate to several hundreds of records; and I don't mean to say that a true DBMS is the necessarily the only or best solution, but I think there tends to be a lot of misconceptions about the "big cannon" necessarily being slow or difficult to implement. As Svidgen points out, I don't know which would be faster without testing, and a lot would depend on the implementation details and the actual production environment. Plus, is it a given that the data set will remain small? And lastly, would this be a good excuse to simply use this as a learning experience in implementing a DBMS-driven back-end? Maybe you could use PHP's PDO (http://www.php.net/manual/en/book.pdo.php) interface, and start with a CSV/Excel file and an appropriate PDO connection DSN, and then later convert the data into a MySQL or PostgreSQL DB and simply change that DSN in order to see how the "big cannon" works. :)

Mr Initial Man
04-24-2010, 10:00 AM
No, it's a pretty small dataset.

However, I COULD tell a difference simply by watching it load. Kinda sad, isn't it.

svidgen
04-24-2010, 10:11 AM
So, you didn't find an ab.exe with the apache binaries?

Mr Initial Man
04-24-2010, 09:16 PM
Never looked. I could time the difference in performance with my watch.

JunkMale
04-25-2010, 04:25 AM
And let us not fail to consider the nature and complexity of the query. And if you are on a Windows platform you have other interesting options. You can save the file as CSV, or several other formats, and then use SQL. SQL on a CSV file is way weird but useful sometimes.
You can also do those things on linux...

I have seen base64 encoded strings placed in to records before now, solves allot of issues where content needs to be preserved.

As for load times, going by what I see in the phpMyAdmin console I run queries through, the SQL option is faster than reading a file from a folder.

As stated before... it all depends on the complexity of the query and also the actual content of the file and how it is processed or treated.

criterion9
04-26-2010, 10:11 AM
Another thing to consider about the overhead of SQL queries is whether the connection is optimized. An example: on one of my windows test/development boxes it takes 2-4 seconds to parse one of my pages with several queries on it. On my linux production box (with several optimization techniques employed) the same page loads imperceptibly fast (<1-4ms).