Click to See Complete Forum and Search --> : Storing files in MySQL vs. real files


harlock
01-23-2005, 05:49 PM
This is more of a MySQL question than a PHP question, but the two go hand in hand in this case.

I am writing a web based app with PHP and MySQL that accepts resumes online. The software asks some basic questions to help categorize the submission, and then stashes the user information along with their resume (uploaded via a form) where it can be retrieved by an HR dept. The resumes can be .doc or .pdf files.

The question is - Should I store the files as real files or as binary data in the MySQL database?

The problems with storing the files in MySQL are that the database would become huge in a very short time. There are about 3 or 4 resumes submitted per day. That's probably about a thousand per year. They want to keep most of the resumes around for 2 years. Each file may be 100k to several megabytes in size, due to .doc bloat. Although the machine's only task will be to hold that databse, would it slow down to a crawl searching with that much data? What if the files were stored in a seperate database on the same MySQL server? Would it make a difference since it doesn't need to open the giant DB with the binary data until someone actually needs to retrieve a file?

The advantage of storing them in the database is that everything is handled through MySQL, from storage to retrieval.

On the other side, the advantage of storing them as files is that they can be accessed directly and can be searched by other programs such as desktop search appliances.

If I stored them as real files, I would probably have PHP automatically create a directory structure based on date and job applied for. It would create directories based on month and split up the files by month. So it would look like:

master resume directory/category/2005-01/2005-01-04_joe-blow.doc

The full submission date would be prepended onto the filename, along with the submitters first and last name, acquired from the form input. The directory and file location would be stored with the user's entry in the MySQL database. When viewing that entry in the database, you would click on a generated link that points to the local filesystem as mounted on the client PC in HR.

If anyone has any thoughts or musings or technical advice, feel free to chime in.

ShrineDesigns
01-23-2005, 06:44 PM
storing the file in a database is not very practical, because it would quickly overwelm the sql server, it is better to store the path to a file or files in a database rather than the actual file

Stephen Philbin
01-23-2005, 07:07 PM
Surely if the files and related names were kept in seperate Innodb tables then the server could handle them without any problems at all? That way the server could just shoot through the names table without having to bother with any of the files except the one related to the name.

DaiWelsh
01-24-2005, 11:49 AM
I would suggest that you don't put data in the database that the database cannot take advantage of. Databases are very good at doing optimised searches against text and numeric fields particularly if you idnex sensibly. In most cases databases cannot search well against binary data .

This is particularly true of images (how would you search for green circles in an image with SQL?), but unless the docs have significant blocks of ascii or unicode text that is not hidden by the proprietary format, it is also true there.

The database could handle the data but would not make much use of it and file systems are designed for handling file data so better to put them there...?

HTH,

Dai