Re-design of data storage table

From: Paul <paulwragg2323_at_hotmail.com>
Date: Fri, 20 Feb 2009 06:32:34 -0800 (PST)
Message-ID: <3228e19f-98d0-43ff-87dc-a788feeca12e_at_v38g2000yqb.googlegroups.com>



Hi,

I have been tasked with redesigning the way that our application data is stored in the database. I have come up with the way I think it should be done but I wanted to see how others would handle the same problem. We have to do this on SQL Server (and Oracle). I cannot use our actual database to give examples I am afraid, but I can use something similar.

Each record in our data table has certain attributes. I am going to use animals as an example. We have 2 tables that are important.

  1. The attributes table. This stores the attributes that can be associated with certain types of animal. For example, No. of legs, Tail etc. Each attribute is stored as a record.
  2. The data table. This stores the actual data.

I have posted example tables below in SQL Server:

CREATE DATABASE Redesign_example
GO

CREATE TABLE ANIMAL_TYPES
(ID	INT		NOT NULL,
TYPE	VARCHAR(50)	NOT NULL,
CONSTRAINT PK_ANIMAL_TYPES PRIMARY KEY (ID))
GO

CREATE TABLE ANIMAL_ATTRIBUTES
(ID		INT		NOT NULL,
TYPEID		INT		NOT NULL,
NAME		VARCHAR(50)	NOT NULL,
FIELDNAME	VARCHAR(50)	NOT NULL,
CONSTRAINT PK_ANIMAL_ATTRIBUTES PRIMARY KEY (ID))
GO

ALTER TABLE ANIMAL_ATTRIBUTES
ADD CONSTRAINT FK1_ANIMAL_ATTRIBUTES FOREIGN KEY (TYPEID) REFERENCES ANIMAL_TYPES (ID)
GO

CREATE TABLE ANIMAL_DATA
(ID	INT	NOT NULL,
TYPEID	INT	NOT NULL,
NUM1	INT	NULL,
NUM2	INT	NULL,
CONSTRAINT PK_ANIMAL_DATA PRIMARY KEY (ID))
GO

ALTER TABLE ANIMAL_DATA
ADD CONSTRAINT FK1_ANIMAL_DATA FOREIGN KEY (TYPEID) REFERENCES ANIMAL_TYPES (ID)
GO

INSERT INTO ANIMAL_TYPES (ID, TYPE)
VALUES (1, 'Dog')

INSERT INTO ANIMAL_TYPES (ID, TYPE)
VALUES (2, 'Bird')

INSERT INTO ANIMAL_ATTRIBUTES (ID, TYPEID, NAME, FIELDNAME)
VALUES (1, 1, 'Age', 'NUM1')

INSERT INTO ANIMAL_ATTRIBUTES (ID, TYPEID, NAME, FIELDNAME)
VALUES (2, 1, 'Tail?', 'NUM2')

INSERT INTO ANIMAL_ATTRIBUTES (ID, TYPEID, NAME, FIELDNAME)
VALUES (3, 2, 'Can Fly?', 'NUM1')

INSERT INTO ANIMAL_ATTRIBUTES (ID, TYPEID, NAME, FIELDNAME)
VALUES (4, 2, 'Wingspan (cm)', 'NUM2')

INSERT INTO ANIMAL_DATA (ID, TYPEID, NUM1, NUM2)
VALUES (1, 1, 5, 1)

INSERT INTO ANIMAL_DATA (ID, TYPEID, NUM1, NUM2)
VALUES (2, 1, 13, 0)

INSERT INTO ANIMAL_DATA (ID, TYPEID, NUM1, NUM2)
VALUES (3, 2, 0, 50)

INSERT INTO ANIMAL_DATA (ID, TYPEID, NUM1, NUM2)
VALUES (4, 2, 1, 300)

Hopefully the above is fairly simple to understand.
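If it helps to see the sample rows in context, a quick join of the data and type tables (an illustrative check only, not part of the application) lists each record with its type name:

```sql
-- Sanity check (illustrative only): show each data row with its animal type
SELECT D.ID, T.TYPE, D.NUM1, D.NUM2
FROM ANIMAL_DATA D
JOIN ANIMAL_TYPES T ON T.ID = D.TYPEID
ORDER BY D.ID
```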

Basically, the user views the data by performing the following:

1)	Selects which animal they would like to view data for.
2)	Selects which columns they would like to see.
3)	We then build up the SQL to query the data based on the selections. To see which columns we need to query in the data table, we look at the ANIMAL_ATTRIBUTES.FIELDNAME value corresponding to the selected attributes.
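For example, step 3 starts with a lookup along these lines (a sketch of the idea, not our actual application code) to resolve a chosen attribute to its physical column:

```sql
-- Sketch: find which physical column holds 'Wingspan (cm)' for birds (TYPEID 2)
SELECT FIELDNAME
FROM ANIMAL_ATTRIBUTES
WHERE TYPEID = 2
AND NAME = 'Wingspan (cm)'
-- This returns 'NUM2', and the application then builds:
-- SELECT NUM2 FROM ANIMAL_DATA WHERE TYPEID = 2
```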

So if the user selects to view the Wingspan data for birds, the query would be:

SELECT NUM2
FROM ANIMAL_DATA
WHERE TYPEID = 2

For dogs' tail data it would be:

SELECT NUM2
FROM ANIMAL_DATA
WHERE TYPEID = 1

I hope from this very simplistic example you can see how the mappings work between the attributes table and the data table.

As each new attribute is added, it takes up a column in the data table. Clearly this is bad design, but it is how we inherited the database, and finally we are looking at a way to change it. As it stands there are limits on the number of numerical, textual and date attributes that can be added to the system – we provide a fixed number of columns for storing the data, and once these limits are reached no more attributes can be added.

We wish to change the way this data is stored so that the limits can be removed completely. As I say, I have a solution in mind but I do not want to influence how others would approach this. So really the challenge is how to completely remove the limits on the number of attributes that can be stored within the system, whilst ensuring that performance does not suffer. We are looking at storing millions of data records.

I would greatly appreciate any suggestions/input into how this problem can best be solved. Thank you for reading this.

Paul

Received on Fri Feb 20 2009 - 08:32:34 CST
