Re: Star Schema from 'Flattened' CSV

From: Jerry Stuckle <jstucklex_at_attglobal.net>
Date: Thu, 2 Nov 2017 21:10:16 -0400
Message-ID: <otgfll$t1d$1_at_jstuckle.eternal-september.org>


On 11/2/2017 4:48 PM, Aaron Peters wrote:
>
> CSV Description:
>
> The CSV file contains multiple columns related to the performance of a given product. It has a product ID, a product description, unit sale price, total units sold, total units returned, as well as names/quotes from several vendors giving the price per unit. In the future I expect vendors will be added and will want my Schema to handle this eventuality gracefully.
>
> I. Consider the trivial case where the columns of the CSV reference only 1 vendor price:
>
> CSV COLUMNS:
>
> prod_id, prod_desc, unit_price, tot_sold, tot_ret, vend_name, vend_price
>
> STAR SCHEMA:
>
> fact_performance
> - unit_price
> - tot_sold
> - tot_ret
> - vend_price
>
> dim_product
> - prod_id
> - prod_desc
>
> dim_vendor
> - vend_name
>
>
> II. Consider the non-trivial case where the columns of the CSV reference multiple vendor quotes:
>
> CSV COLUMNS:
>
> prod_id, ..., vend1_name, vend1_price, vend2_name, vend2_price
>
> STAR SCHEMA:
>
> fact_performance
> - total_sales
> - total_units_returned
> - total_units_sold
> - vend1_price
> - vend2_price
>
> dim_product
> - prod_id
> - prod_desc
>
> How do I account for multiple vendors in a dimension table?
>
> If I can offer any clarification please lest me know.
>

(I don't know why my first try didn't go to the list...)

First thing you need to do is read up on normalization (I hate to admit it, but Wikipedia has a pretty good start here). For instance, your fact_performance table does not meet 1st normal form because it has multiple columns for vendor price.

You need at least three tables, possibly more. You want a product table with only information unique to that product. The same with a vendor table. Then a third table with vendor-product specific information such as price.

You may also want to consider price changes. Do you want to track the number of products sold by price? If so, you need a fourth table.

Proper normalization will help you with your design. I generally aim for third normal on most of my tables.

-- 
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex_at_attglobal.net
==================
Received on Fri Nov 03 2017 - 02:10:16 CET

Original text of this message