comp.databases.oracle.server: Group by of 400 million rows to deal with dupes - is it the best way?
Problem: what is the best way to deal with dupes (~1-2%) occurring in a 400 million row table (4-8 million rows are dupes)? Is a GROUP BY with the duped column listed last the best way?
We have a table with 402 million rows. It is 8 columns wide and 14 GB in size. Structure (note: cons_type is the column with the dupes):
bill_month    date
account       number(10)
business_id   varchar2(3)
busine_spec   varchar2(5)
cons_type     varchar2(2)
cons_amount   number(6,2)
There will be several rows for each combination of bill_month and account; however, each one should have a unique cons_type. So, for example, a correct set of rows looks like this (note the unique t1, t2, t3):
12/01/00  3433  ed  we  t1   43.22
12/01/00  3433  ed  we  t2   53.22
12/01/00  3433  ed  we  t3  563.22
However, around 1-2% of the rows are dupes on cons_type, and we need to group them to get rid of the duplicated cons_type.
A BAD set of rows looks like this (note that there are two t2 rows):
12/01/00  3433  ed  we  t1   43.22
12/01/00  3433  ed  we  t2   53.22
12/01/00  3433  ed  we  t2  412.00
should become:
12/01/00  3433  ed  we  t1   43.22
12/01/00  3433  ed  we  t2  465.22
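Before rewriting anything, the offending combinations can be located with a HAVING clause. A sketch only; the table name "bills" is an assumption, since the post never names the table:

```sql
-- Find combinations that occur more than once on cons_type.
-- "bills" is an assumed table name.
SELECT bill_month, account, business_id, busine_spec, cons_type,
       COUNT(*) AS dupe_count
FROM   bills
GROUP BY bill_month, account, business_id, busine_spec, cons_type
HAVING COUNT(*) > 1;
```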
It seems we need to
group by bill_month, account, business_id, busine_spec, cons_type
and sum(cons_amount) to get rid of the dupes. But with 400 million rows, it is a big job.
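The grouping described above would look roughly like this (again assuming the table is named "bills"):

```sql
-- Collapse duplicate cons_type rows by summing cons_amount.
-- "bills" is an assumed table name.
SELECT bill_month, account, business_id, busine_spec, cons_type,
       SUM(cons_amount) AS cons_amount
FROM   bills
GROUP BY bill_month, account, business_id, busine_spec, cons_type;
```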
Does anybody have a better idea, one that can be done in a reasonable amount of time?
We can do a create table on the result for performance, and an order by would be very nice since the current physical order of the rows is not optimal. (Even with indexes, finding 10,000 rows spread over 400 million is a big job: lots of jumping from disk to disk even when Oracle knows exactly where each row is. Having them ordered by month and account would be best.)
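One sketch of that create-table approach, assuming the source table is named "bills", that the new table is called "bills_dedup", and that the Oracle version in use accepts ORDER BY in a CTAS subquery; the NOLOGGING and PARALLEL options are possible speed-ups, not something the post itself specifies:

```sql
-- Build a deduped, pre-ordered copy of the table in one pass.
-- Table names and the NOLOGGING/PARALLEL options are assumptions.
CREATE TABLE bills_dedup
  NOLOGGING
  PARALLEL
AS
SELECT   bill_month, account, business_id, busine_spec, cons_type,
         SUM(cons_amount) AS cons_amount
FROM     bills
GROUP BY bill_month, account, business_id, busine_spec, cons_type
ORDER BY bill_month, account;
```

After verifying row counts against the expected 1-2% reduction, the old table could be dropped and the new one renamed into place.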
Any ideas?

Received on Sun May 06 2001 - 08:42:08 CDT