Re: Data Model - Please Comment on my Approach

From: Karsten Farrell <kfarrell_at_belgariad.com>
Date: Fri, 23 May 2003 21:09:49 GMT
Message-ID: <MPG.19382df8f5e7ef0e989797_at_news.la.sbcglobal.net>


Comments embedded...

lindadok_at_xs4all.nl said...
> Helo,
>
> I have just replaced someone and inherited a number of database that are not
> very well structured. I would like your comments/suggestion on how to
> improve the schema.
>
> My team has a list of postcodes in an access database. The codes start from
> 1011AA. A table stores the following Social Demographic information is
> stored about each Six Character (1011AA) postcode:
>
> 1.The total number of households
> 2.Total Number of 1 person households
> 3. Total Number of 2 person households
> 3.Total Number of 3 person households
> 4.Total Number of 4 person households
> 5.Total Number of 5 person households
> 6. Total Number of 6 or more person households
> 7.Total Average Head of Household < 25
> 8.Total Average Head of Household 25 -29
> 9.Total Average Head of Household 30-34
> 10.Total Average Head of Household 35-39
> 11. Incone < 13, 500
> 12. Income 13500- 20500
> 13. House Type Apartment
> 14. House Type Terrace House
> 15. House Type Semi-Detached
> 16. House Type Farm
> ...
> etc, etc
> There are several tables with Social Economic Daata, Purchasing power, etc,
> etc, that have the same format. As you can see above, the field names
> contain reserved symbols,, numerals, spaces which is naughty, etc, etc, but
> that a different story altogether.
>
> The Social Demographic table has over 50 fields, containing total number
> (count) of people. No inserts are made to this table, however, the data is
> updated when new data is received. Quite often, we receive data from
> clients. This data contains postcodes, so we join these tables together, to
> give customers social
>
> From a personal perspective, 55 fields is too many for 1 table. I think it
> would be better to break the tables, and then join them using a view. Each
> year, the data is updated.
>
> This is how I'm thinking of going about it.
>
> 1. Create a table that has all the postcodes:
>
>
> CREATE TABLE t_PostCode(
> PostCodeId int IDENTITY PRIMARY KEY
> PostCode char(6) not null)
>

Good table.

> 2. Create a table that contains a description of the different types of
> homes
>
> CREATE TABLE t_HouseType(
> HouseTypeId int IDENTITY PRIMARY KEY
> Description varchar(20) not null)
>
> A row in this table would look like this:
> 1 Apartment
> 2 Terrace
>
> ...

Good table.

> 3. Create a table that stores the total number of houses in a particular
> postcode
> CREATE TABLE t_PostCodeHouseType(
> PostCodeHouseType int IDENTITY PRIMARY KEY,
> PostCodeId int FOREIGN KEY REFERENCES t_PostCode(PostCodeID),
> HouseTypeId int FOREIGN KEY REFERENCES t_HouseType(HouseTypeID),
> TotalNumber int )
>

Oops. If you store "aggregate" information such as the sum of amounts or count of items, you'll have a maintenance nightmare. Better to do the "counting" in your select statement when you retrieve the data - eg,

  select PostCode, Description, count(*)   from datatable dt, t_PostCode pc, t_HouseType ht   where pc.PostCodeID = dt.PostCodeID
  and ht.HouseTypeID = dt.HouseTypeID   group by dt.PostCodeID, dt.HouseTypeID

You need to create your datatable with rows that contain PostCodeID (FK), HouseTypeID (FK), and any other required data.

> 4. Create a table that stores the age of the head of the household
>
> CREATE TABLE t_HouseHoldHeadAge(
> HouseHoldHeadAgeId int IDENTITY PRIMARY KEY,
> Description varchar(20)
> )
>

Good table.

> 5. Create a table that stores the details in point 4 for each postcode
>
> CREATE TABLE t_PostCodeHouseHoldHeadAge(
> PostCodeHouseHoldHeadAgeId int,
> PostCodeId int FOREIGN KEY REFERENCES t_PostCode(PostCodeID),
> HouseHoldHeadAgeId int FOREIGN KEY REFERENCES
> t_HouseHoldHeadAge(HouseHoldHeadAgeId ) )
>

Oops. Every time someone gets old enough to move into the next age bracket, you'll have to update their record (the HouseHoldHeadAgeID column).

Perhaps create a table with the year of birth, instead of their current age bracket (HouseHoldHeadAgeID), and calculate their age when you select the data.

> I would really appreciate any feedback on my approach. I am by no means an
> expert at data modelling, but I want to do this correctly.
>
> Thanks in advance.
>
> Linda
>

Overall, you're on the right track. Just try to keep "calculated" fields out of your database. Store the data that allows you to calculate the value when you retrieve a row and you'll save yourself a lot of unneeded maintenance time.

Also, if PostCode is sacred and will NEVER change (rare occurance if it is), you can eliminate the t_PostCode table and store the PostCode, rather than the PostCodeID, directly in the tables where it appears as a foreign key. Sometimes it's preferable to not leave your database in a too theoretical state; sometimes that makes retrieval difficult and it's worthwhile to denormalize the physical database.

-- 
/Karsten
DBA > retired > DBA
Received on Fri May 23 2003 - 23:09:49 CEST

Original text of this message