Click to See Complete Forum and Search --> : Grouping by multiple fields on a table with 12+ million records...


xvszero
09-22-2009, 01:33 PM
Hi, I'm trying to run some code. Basically we identify an account as a unique account by a combination of the class and account fields. Line3 is a unique identifier used per each account for each memo. So essentially, if I want to get a unique memo identifier I need 3 fields, and it might look something like this...


class | account | line3
100 | 100000 | 0 <-- first memo identifier for the 100-100000 account
100 | 100000 | 1 <-- second memo identifier for the 100-100000 account
200 | 100000 | 0 <-- first memo identifier for the 200-100000 account


In other words, it's the combination of the 3 that creates the unique identifier for any given memo.

So I'm pretty close in pulling up the memos correctly. If I just group by one field...


select class, account, group_concat(desc40 order by aline) as groupconcat from blfnotes group by class


It only takes about a minute to finish, and I get the type of data I need. But that isn't quite the correct data because it is ignoring the other two fields needed to create the unique identifier for a memo in the group by. What I really need is this...


select class, account, group_concat(desc40 order by aline) as groupconcat from blfnotes group by class, account, line3


(memos are broken up by line and stored in the desc40 field, with aline as the line # reference)

I am pretty sure that would return what I need, but when I try to run that it just goes on and on and on and never finishes. I let it go for about a half hour earlier and it didn't finish. This is for reports run sort of on the fly, I need something that works in a few minutes tops.

All 3 of those fields are indexed. Not sure what else I can actually do to make this thing run? Or maybe there is an alternative solution I am not thinking of?

svidgen
09-28-2009, 03:44 PM
You might try adding an index across all three fields:
alter table blfnotes add index class_account_line (class, account, line3);
That ought to allow your DBMS to detect groupings more quickly--though I couldn't tell you how much of an improvement it should be.

If you're using MySQL: http://dev.mysql.com/doc/refman/5.0/en/loose-index-scan.html

If you're not using MySQL, the same principles will likely apply.

xvszero
09-29-2009, 11:39 AM
I didn't even know indexes could be done on multiple columns like that.

Before:
10 rows (7 min 33.58 sec)

After:
10 rows (0.15 sec)

I love you.

It is still taking a long amount of time without putting a limit on, but at least we are getting there.

svidgen
09-29-2009, 12:19 PM
hehe ... very nice.