I wonder if someone might have prior experience with the following issue.
Any advice would be greatly appreciated.
I am constructing a very large web app that will house a lot of data.
The data is pulled in from partner services via RPC.
To streamline the data, it is automatically put into its relevant database table, enabling more accurate searches.
My concern is for the future as the data increases.
My current thought process is to have a server acting as a controller, routing requests through to a different server that holds the requested data. Something like a one-to-many relationship.
As data is added to the database, it's put (in a relevant way) into a server on a rotational basis.
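The controller/routing idea above can be sketched roughly like this. This is only an illustration of the concept; the server names and the hashing scheme are my assumptions, not part of the poster's design.

```python
# Hypothetical sketch of a controller that routes a request to the
# server holding the requested data. Server names are made up.
SHARD_SERVERS = ["db-server-1", "db-server-2", "db-server-3"]

def shard_for(product_id: int) -> str:
    """Pick the server that holds this product's data.

    A stable hash (here, simple modulo) has an advantage over pure
    rotational insertion: the controller can recompute the target
    server at read time instead of keeping a lookup table.
    """
    return SHARD_SERVERS[product_id % len(SHARD_SERVERS)]

print(shard_for(7))  # → "db-server-2"
```

One caveat with plain modulo routing: adding a server later remaps most keys, which is why schemes like consistent hashing exist for this problem.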
This is what I am leaning towards.
Data is brought in and processed over time into server 1.
Then it is replicated on server 2 for usage.
This doesn't make much sense to me, but that could be the caffeine.
I would really appreciate any ideas, debates or approaches that others have used.
To be honest, I'm not a big fan of the idea of replication. Anything that's not done "in real time" seems like it has the potential to cause problems (objects not being found when they should be, for example). You should think about clustering your database servers. This typically involves providing a shared file system for multiple servers to use.
Clustering is a High Availability solution, not a Scalability solution.
Replication works fine for very large applications, including those distributed over geographically distinct areas.
None of us know enough about your application to give a thorough answer. What do you mean by "a lot" of data? A server to act as an index for other servers (as you describe) can be implemented, but is probably not necessary, or even a good idea.
First of all, thanks to all that replied.
Let me see if I can be more descriptive.
We are working with a large organization that sells millions of products retail.
They range over 300 different categories.
What I have done so far is to create tables for each category.
Products can easily be browsed by category, and I have written
a detailed SQL query that enables a custom search to be performed on the categories that relate to the search query.
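A search over per-category tables like the one described might be assembled along these lines. Everything here is an assumption for illustration: the table naming convention (`products_<category>`), the column names, and the placeholder style are all hypothetical.

```python
# Hypothetical sketch: build one UNION ALL query over the
# per-category product tables that relate to a search term.
def build_search_query(categories, term_param="%(term)s"):
    """Return a single SQL string searching each category table.

    `term_param` is a driver-style placeholder so the search term
    is bound as a parameter rather than interpolated (avoids SQL
    injection).
    """
    parts = [
        f"SELECT id, name, price, '{cat}' AS category "
        f"FROM products_{cat} WHERE name LIKE {term_param}"
        for cat in categories
    ]
    return "\nUNION ALL\n".join(parts)

sql = build_search_query(["books", "toys"])
```

Note that with 300 categories this produces a 300-branch UNION; a single products table with an indexed category column is the more common design, and sidesteps that query-building step entirely.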
My concern is that if we are starting off with millions of products, where will we be in a year? 1 billion?
In that case, can the database handle that amount of data and more? And can it serve user requests if the site is going to be as active as we suspect it will be?
We are running on a dedicated Ubuntu server with an Intel Quad Core Xeon 2.13GHz.
Any ideas, opinions, or pointers in the right direction are welcome and appreciated.
A million unique items? Wow. A well-thought-out schema and appropriate indexing are going to be key. That said, the amount of data isn't the issue, it's the amount of transactions. How many write operations (inserts, updates, deletes) do you expect per hour or minute? That's the question you need to answer first. And how much data growth and change (in MB or GB) will you have over a defined period of time?
For a highly volatile OLTP system, that server isn't going to be able to handle it. Plus, I assume you will have other things besides the POS running on it (nightly jobs, lookups, etc.), besides hitting it from the back end with your data imports.
Is this an upgrade to an existing system or a brand new app?
If it's an upgrade, measuring the current transaction volume and growth trends will help you determine your hardware needs. If it's a new application, you'll need to make some educated guesses. Scaling out can be done at a later date if necessary, but you'll need to have an idea well before it becomes necessary.
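For the "educated guesses" case, a back-of-envelope estimate is often enough to see whether scaling out will be needed. Every number below is an assumed example, not a measurement from the poster's system.

```python
# Back-of-envelope data growth estimate. All inputs are assumptions
# chosen for illustration; plug in measured values where available.
start_rows = 5_000_000   # assumed starting catalogue size
row_bytes = 500          # assumed average row size incl. index overhead
monthly_growth = 0.10    # assumed 10% compound growth per month

rows = float(start_rows)
for _ in range(12):      # project twelve months out
    rows *= 1 + monthly_growth

size_gb = rows * row_bytes / 1024**3
print(f"~{rows:,.0f} rows, ~{size_gb:.1f} GB after a year")
```

Under these assumptions the catalogue roughly triples in a year, which is useful context when deciding whether a single server's storage and I/O budget will hold.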
What database platform are you going to use? If MSSQL or Oracle, replication is pretty easy and dependable.
If you're planning something that big you're going to need some robust management tools and a highly scalable platform. I agree that the server isn't going to be able to handle it.
BTW, Microsoft has a Web Edition of SQL Server 2008. It's marketed as a scalable, low-cost option (e.g., can be 32 cores for an 8-core server, at about $3K per processor). It has some capabilities that other providers either don't offer or charge more for: spatial data, native XML, integration with Visual Studio, and lots of reporting features.