Re: Group by of 400 million rows to deal with dupes is it best way?

From: mel <jmel_at_mailnews.com>
Date: 7 May 2001 12:29:09 -0500
Message-ID: <3af6dae3$0$12276$45beb828@newscene.com>

one question: the rows which
breach uniqueness: do both rows get into the table or just the 2nd one?

because we need to sum the amount in both rows, they are dupes on cons type because we should have gotten just one cons type with full amount rather than 2 of them with amount split.

we do have partioning but partioning is not according to what our group by needs.

it is prationed for other uses.

the rows that
In article <989228112.15292.0.nnrp-10.9e984b29_at_news.demon.co.uk>, "Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote:
>
>You may want to think strategically on this one
>rather than just finding a solution to first problem.
>
>14 GB is a nuisance - can you partition the table
>by bill-month to get a more manageable unit size
>without adding too much overhead. De-duplicate
>one month at a time might be a lot easier to handle.
>This could also improve your query performance if
>your queries tended to be focused on bill_month.
>
>After you have eliminated duplicates, do you want
>to maintain uniqueness over the given columns ?
>If so you probably need a unique/primary constraint,
>and this could affect the way you solve the current
>problem.
>
>One detail to consider is that a GROUP BY that
>reduces the data volume by only a small amount
>is usually not very effective - and tends to get
>worse if you make it run parallel.
>
>
>One idea:
>Create a unique or pk constraint on the target
>columns - make the constraint deferrable,
>so that it is built with a non-unique index
>(8.1 feature), and create it disabled, non-
>validated.
>
>Try to enable the constraint, reporting
>exception into another table. This will
>copy the rowids of all the rows which
>breach uniqueness.
>
>Create a table which does the GROUP BY
>joining the rowid list to the base table - so
>the sort/group targets only 4-8M rows. (Aiming
>for a hash join might be the best path).
>
>Delete from the base table where rowid in
>(select stored rowid).
>
>Insert into base table results from created
>sub-total table.
>
>Enabled the unique / primary key constraint.
>
>
>
Received on Mon May 07 2001 - 12:29:09 CDT