Re: consolidation of multiple rows

From: <canoe414_at_telecom-digest.zzn.com>
Date: Tue, 11 Mar 2008 21:48:34 -0700 (PDT)
Message-ID: <72effd7b-1fe7-42d7-bcd8-498f0c755d7c@u10g2000prn.googlegroups.com>


On Mar 5, 1:51 pm, canoe..._at_telecom-digest.zzn.com wrote:
> On Mar 5, 10:49 am, --CELKO-- <jcelko..._at_earthlink.net> wrote:
>
>
>
> > 1) If two of the rows in your raw data staging table have different
> > values for the same attribute, say a phone number, which one do you
> > pick or does it matter? If not, then use this skeleton:
>
> > SELECT company_name, MAX(phone_nbr), MAX(..), ..
> > FROM RawData
> > GROUP BY company_name;
>
> > else find the conflicts:
>
> > SELECT company_name AS conflicting_data
> > FROM RawData
> > GROUP BY company_name
> > HAVING COALESCE (NULLIF(COUNT (DISTINCT phone_nbr), 1), NULLIF(COUNT
> > (DISTINCT ..), 1), .. ) IS NOT NULL ;
>
> > 2) If two of the rows in your raw data staging table have the same
> > value for the same attribute, do you need to know or do you just use
> > it?
>
> > 3) If all of the rows in your raw data have NULLs for an attribute, do
> > you use the NULL or want to now about it?
>
> > There are a lot of games we can play with a HAVING clause on the raw
> > data. You might want to get a copy of THINKING IN SETS and look at
> > Chapter 10, "Scrubbing Data with Non-1NF Tables" for some more
> > ideas.
>
> --- from Ed
>
> > A table like raw_Companies is called a staging table
> > and Companies is the production table.
>
> Thank you Ed for supplying the standardized naming for the these
> tables, makes
> googling (v.?) more accurate...
>
> --- from CELKO>...Chapter 10...
>
> Thank you , all these suggestions are helpful. There'll be scrubbing
> o'plenty happening shortly.
> As for the data, I'm merging ~20 xls totalling >200Krows into this
> table, using Perl to scrub the data per field perl line b4 loading
> into the staging table.
>
> I'll give these suggestions a try shortly.
>
> Thanks to all who responded,
>
> Still-Learning Steve

OK, this topic seems to gone off the rails in the last week....

Shiffling thru some books(!) and some more googling armed with better keywords
has made me realize the solution to my problem is a

       custom aggregate function

which turned up this

      http://www.oracle.com/technology/oramag/oracle/06-jul/o46sql.html

which sounds perfect for what I want. Since I'll eventually be migrating my app to Oracle this would seem to be the solution.

So thanks to all the responded,

CASE CLOSED Still-learning Steve Received on Tue Mar 11 2008 - 23:48:34 CDT

Original text of this message