Re: consolidation of multiple rows

From: <canoe414_at_telecom-digest.zzn.com>
Date: Wed, 5 Mar 2008 12:51:15 -0800 (PST)
Message-ID: <3724875c-582f-4d60-91dc-61cfb25324cc@d4g2000prg.googlegroups.com>


On Mar 5, 10:49 am, --CELKO-- <jcelko..._at_earthlink.net> wrote:
> 1) If two of the rows in your raw data staging table have different
> values for the same attribute, say a phone number, which one do you
> pick or does it matter?  If not, then use this skeleton:
>
> SELECT company_name, MAX(phone_nbr), MAX(..), ..
>   FROM RawData
>  GROUP BY company_name;
>
> else find the conflicts:
>
> SELECT company_name AS conflicting_data
>   FROM RawData
>  GROUP BY company_name
> HAVING COALESCE (NULLIF(COUNT (DISTINCT phone_nbr), 1), NULLIF(COUNT
> (DISTINCT ..), 1), .. ) IS NOT NULL ;
>
> 2) If two of the rows in your raw data staging table have the same
> value for the same attribute, do you need to know or do you just use
> it?
>
> 3) If all of the rows in your raw data have NULLs for an attribute, do
> you use the NULL or want to now about it?
>
> There are a lot of games we can play with a HAVING clause on the raw
> data.  You might want to get a copy of THINKING IN SETS and look at
> Chapter 10, "Scrubbing Data with Non-1NF Tables" for some more
> ideas.

  • from Ed
    > A table like raw_Companies is called a staging table
    > and Companies is the production table.

Thank you Ed for supplying the standardized naming for the these tables, makes
googling (v.?) more accurate...

  • from CELKO
    >...Chapter 10...
    Thank you , all these suggestions are helpful. There'll be scrubbing o'plenty happening shortly. As for the data, I'm merging ~20 xls totalling >200Krows into this table, using Perl to scrub the data per field perl line b4 loading into the staging table.

I'll give these suggestions a try shortly.

Thanks to all who responded,

Still-Learning Steve Received on Wed Mar 05 2008 - 14:51:15 CST

Original text of this message