Synchronize Disparate Data


auxone
08-15-2008, 10:09 AM
Hey Guys, I need some expert advice.

I am trying to mirror the files in a series of directories on a web server to a separate database. The catch is that the files can change on either side (in the database or in the directories), and every 15 minutes or so there has to be a VERY low-CPU, low-bandwidth comparison of the data.

My original idea is to make a table where each row holds descriptive attributes of a file, like its size, modified date, maybe an MD5 hash, etc. Then, when I need to make the comparison, I check the actual file against those attributes, determine which copy is the latest, and update accordingly.
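A minimal Python sketch of the attribute-table idea, assuming the stored row always reflects the database copy's current size, modification time, and MD5. `FileRecord`, `describe`, and `needs_sync` are hypothetical names. It checks the cheap `stat()` attributes first and only reads and hashes the file when those differ, similar in spirit to rsync's default "quick check" on size and modification time.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """One row of the (hypothetical) attribute table for a single file."""
    path: str
    size: int
    mtime: float
    md5: str


def md5_of_file(path, chunk_size=65536):
    """MD5 of a file's contents, read in chunks so large files don't fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def describe(path):
    """Build a table row for the file currently on disk."""
    st = os.stat(path)
    return FileRecord(path, st.st_size, st.st_mtime, md5_of_file(path))


def needs_sync(path, stored):
    """Return None if the file matches the stored row, otherwise which side is newer.

    Assumes `stored` describes the database copy.
    """
    st = os.stat(path)
    # Cheap check: size and mtime come from stat() without reading the file.
    if st.st_size == stored.size and st.st_mtime == stored.mtime:
        return None
    # Only hash when the cheap attributes disagree; identical content means no sync.
    if md5_of_file(path) == stored.md5:
        return None
    # Newest modification time wins (assumes the clocks involved roughly agree).
    return "file" if st.st_mtime > stored.mtime else "database"
```

Hashing is the expensive step because it reads the whole file, so skipping it when size and mtime match keeps most checks down to one `stat()` call per file.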

Do you guys think this will work?

The problem is that it really needs to sync an entire set of directories with very little time and bandwidth. Is there such a thing as MD5 hashing an entire directory, so that I can compare directories instead of files and only compare individual files when a directory's hash differs from what's in the database? See what I mean? I am kinda lost when it comes to optimizing the process and making it extremely lightweight.
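One common way to hash a whole directory is a Merkle-tree style digest: each directory's hash covers its entries' names, its files' metadata, and its subdirectories' own hashes, so a change anywhere below a directory changes every hash on the path up to the root. The sketch below is an assumption, not an established tool: `tree_hashes` and `changed_dirs` are hypothetical helpers, and to stay cheap it hashes each file's (name, size, mtime) rather than its contents, so a content change that keeps both size and mtime would go unnoticed.

```python
import hashlib
import os


def tree_hashes(root):
    """Return {relative_dir_path: hex digest} for root ("") and every subdirectory."""
    hashes = {}

    def visit(path, rel):
        h = hashlib.md5()
        with os.scandir(path) as it:
            # Sort so the digest doesn't depend on directory listing order.
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            name = os.fsencode(entry.name)
            if entry.is_dir(follow_symlinks=False):
                # A subdirectory contributes its own digest, computed recursively.
                digest = visit(entry.path, os.path.join(rel, entry.name))
                h.update(b"D\0" + name + b"\0" + digest.encode("ascii") + b"\n")
            elif entry.is_file(follow_symlinks=False):
                # A file contributes only cheap stat() metadata, not its contents.
                st = entry.stat(follow_symlinks=False)
                meta = f"{st.st_size}\0{st.st_mtime_ns}".encode("ascii")
                h.update(b"F\0" + name + b"\0" + meta + b"\n")
        digest = h.hexdigest()
        hashes[rel] = digest
        return digest

    visit(root, "")
    return hashes


def changed_dirs(current, stored):
    """Directories that were added, removed, or whose hash differs from the snapshot."""
    keys = set(current) | set(stored)
    return sorted(k for k in keys if current.get(k) != stored.get(k))
```

If the database keeps one hash per directory, an unchanged tree can be confirmed by comparing just the root hash; otherwise only the directories returned by `changed_dirs` need a file-by-file comparison.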

THANK YOU FOR ANY FEEDBACK. I love you guys!

PS> In reality, the database is mirroring a user's files, and when this goes live it may be used by 10,000+ people, so that is 10,000 syncs every 15 minutes! Is that a lot?!
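For a sense of scale, assuming the 10,000 syncs are spread evenly over each 15-minute window (`sync_load` and its `workers` parameter are hypothetical):

```python
def sync_load(users, interval_s, workers=1):
    """Average syncs per second, and the time each sync may take per worker."""
    per_second = users / interval_s
    budget_s = interval_s * workers / users
    return per_second, budget_s


rate, budget = sync_load(10_000, 15 * 60)
print(f"{rate:.1f} syncs/s, {budget * 1000:.0f} ms per sync")
# prints "11.1 syncs/s, 90 ms per sync"
```

So a single sequential worker would have about 90 ms for each user's sync. If every client syncs on the same clock boundary instead of staggering, the peak load is much higher than this average.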

chazzy
08-15-2008, 10:28 AM
Is it an option to "cluster" the web servers together?

auxone
08-15-2008, 12:49 PM
Unfortunately, no.

Thanks for the response.