What does update statistics do in SQL Server
Can someone please give me a concise explanation of UPDATE STATISTICS? I have Googled it but not found anything helpful.
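For reference, the command itself is plain T-SQL. A minimal sketch, using a hypothetical table `dbo.Orders` with a hypothetical index `IX_Orders_CustomerID`:

```sql
-- Refresh the optimizer's statistics for every statistics object on one table
UPDATE STATISTICS dbo.Orders;

-- Or target a single statistics object (here, the one tied to an index)
UPDATE STATISTICS dbo.Orders IX_Orders_CustomerID;

-- Read every row instead of sampling: maximum accuracy, maximum cost
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```

In short, it rebuilds the summaries of your data's distribution (histograms, densities) that the query optimizer uses to choose execution plans, which is the dilemma the answer below walks through.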
Consider this scenario. You buy some very expensive hardware, and it's running full blast 24/7 (tons of users/queries/connections).
MySQL will receive a query and then decide how best to execute it; that decision is called an execution plan.
ENTER THE DILEMMA:
Your computing resources are limited and you want to make the most of them: serve the most queries, satisfy the most clients, and so on. MySQL makes a best guess about how to execute a query using LIMITED information, so the execution plan for the query is not necessarily the best it could be, but it's good enough to get the job done. Now, if the engine had all of the relevant statistics about your tables and data, it could build a much better execution plan; maybe.
Gathering the statistics takes time, and that time could be spent serving requests instead of digging through all of the data in the databases. Now let's say we run some DML and change the data in the databases. You have an option: update the statistics on the fly (which makes each individual DML query take longer to finish), or just complete the request/query, be done with it, and move on to the next one. I'm guessing here, but the latter is the more desirable default and is how MySQL behaves (possibly not for everything? This is really "under the hood" stuff).
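SQL Server actually exposes this exact trade-off as database options. A sketch, assuming a hypothetical database named `MyDb`: the synchronous option makes a query wait while stale statistics are refreshed before compilation, and the async variant compiles with the old statistics and refreshes them in the background.

```sql
-- Refresh stale statistics before compiling the query that needs them (default)
ALTER DATABASE MyDb SET AUTO_UPDATE_STATISTICS ON;

-- Don't make queries wait: compile with the stale statistics
-- and refresh them on a background thread
ALTER DATABASE MyDb SET AUTO_UPDATE_STATISTICS_ASYNC ON;
```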
Let's say you have a very small amount of data, so gathering the statistics on it takes almost no time at all. But if you have A TON of data, it will take substantially longer. Updating statistics might pay off in the long run, or it might just waste a bunch of computing time if the engine was already picking good execution plans for all of your queries anyway.
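That cost/accuracy trade-off is also why UPDATE STATISTICS supports sampling instead of scanning everything. A sketch, again using the hypothetical `dbo.Orders` table:

```sql
-- Cheap: estimate the data distribution from a 10 percent sample of rows
UPDATE STATISTICS dbo.Orders WITH SAMPLE 10 PERCENT;

-- Check when each statistics object was last updated and how many rows it sampled
SELECT s.name, p.last_updated, p.rows, p.rows_sampled
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS p
WHERE s.object_id = OBJECT_ID('dbo.Orders');
```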
Personally, unless super-high performance is a requirement, I just ignore it; that's what I do, and all of my MySQL queries still execute correctly and within an acceptable amount of time (milliseconds).
I'm just going to ramble a little bit more and say this.
I said all of this about MySQL, but it is essentially the same dilemma for SQL Server. SQL Server, however, has a feature (for super pro administrators) that allows you to specify an execution plan. I'm not saying I've done this; the documentation warns novices against trying it (it's time-consuming, etc.). The idea is that the user may know something about the data that SQL Server doesn't, or may be able to predict what the data will look like in the future (10,000,000 more users, 50,000,000 new categories): use this index instead of this other index, join these three tables first, then join these other two, and then join the results of those two with a loop join / hash cluster-ma-jiggy ("I know it will be best, because we can prove the result is going to be an empty set before we try to execute this extremely computationally intensive join"). SQL Server, on the other hand, cannot predict the future or understand the intentions behind your DDL; it may have a skewed representation of how things truly are and make false assumptions about the best method to transform the data, because the statistics it uses to generate an execution plan are incomplete or out of date.
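The "specify an execution plan" machinery being alluded to is SQL Server's query hints and table hints. A hedged sketch of the syntax, with made-up table and index names; this forces the optimizer's hand rather than letting it choose from statistics:

```sql
-- Table hint: force a specific index instead of letting the optimizer pick one
SELECT o.OrderID, c.Name
FROM dbo.Orders AS o WITH (INDEX (IX_Orders_CustomerID))
JOIN dbo.Customers AS c
  ON c.CustomerID = o.CustomerID
OPTION (HASH JOIN);   -- query hint: force the join algorithm
```

As the documentation warns, hints like these lock in assumptions that the optimizer would otherwise revisit as the data changes, so they are a last resort.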
Now do you understand?
Last edited by eval(BadCode); 08-14-2012 at 10:18 AM.