Re: consolidation of multiple rows
Date: Wed, 5 Mar 2008 12:51:15 -0800 (PST)
On Mar 5, 10:49 am, --CELKO-- <jcelko..._at_earthlink.net> wrote:
> 1) If two of the rows in your raw data staging table have different
> values for the same attribute, say a phone number, which one do you
> pick or does it matter? If not, then use this skeleton:
> SELECT company_name, MAX(phone_nbr), MAX(..), ..
> FROM RawData
> GROUP BY company_name;
> else find the conflicts:
> SELECT company_name AS conflicting_data
> FROM RawData
> GROUP BY company_name
> HAVING COALESCE (NULLIF(COUNT (DISTINCT phone_nbr), 1), NULLIF(COUNT
> (DISTINCT ..), 1), .. ) IS NOT NULL ;
> 2) If two of the rows in your raw data staging table have the same
> value for the same attribute, do you need to know or do you just use
> 3) If all of the rows in your raw data have NULLs for an attribute, do
> you use the NULL or want to now about it?
> There are a lot of games we can play with a HAVING clause on the raw
> data. You might want to get a copy of THINKING IN SETS and look at
> Chapter 10, "Scrubbing Data with Non-1NF Tables" for some more
- from Ed
> A table like raw_Companies is called a staging table
> and Companies is the production table.
Thank you Ed for supplying the standardized naming for the these
googling (v.?) more accurate...
- from CELKO
Thank you , all these suggestions are helpful. There'll be scrubbing o'plenty happening shortly. As for the data, I'm merging ~20 xls totalling >200Krows into this table, using Perl to scrub the data per field perl line b4 loading into the staging table.
I'll give these suggestions a try shortly.
Thanks to all who responded,
Still-Learning Steve Received on Wed Mar 05 2008 - 14:51:15 CST