Re: Design question

From: Kenneth Downs <MyUseNetHandle_at_linuxmail.org>
Date: Sun, 04 May 2003 12:55:11 -0400
Message-ID: <d2h39b.l65.ln_at_mercury.downsfam.net>


Quoting unnamed sources, san claimed:

> Hi,
> I have an application that deals with an entity called "Document". A
> "Document" can have several attributes. These attributes can be either
> of {int, float, date, string} type. However, I do not know beforehand
> the number of attributes. The user may add more attributes to this
> "Document" entity. For example, right now it may have attributes such
> as <Author>, <Creation_Date>. Later, a user may add <Num_Pages>. So,
> modeling the "Document" entity as a relation with attributes as column
> names is difficult. I had this approach in mind and wanted to know if
> there is a better way.
>
> I was thinking of creating a relation DocAttr(AttributeName,
> AttributeValue). I can create 4 such relations for each of {int,
> float, string and date}. DocAttrInt, DocAttrFloat, DocAttrString and
> DocAttrDate. Then I can store <Author, "Some Name"> in DocAttrString.
>
> This way more attributes can be added on later or existing ones can be
> removed.
>
> Is there a better way? Keep in mind that I am required to use an
> RDBMS.
>
> Thanks,
> San

It seems you are asking not so much "How do I design system X", but rather, "How do I protect against future design changes?"

This is an excellent question, especially in your generalized formulation, because all systems with any life in them are changing. Therefore, it is necessary in any development strategy to know how you will handle changes to the schema.

In my experience, however, moving attributes out to separate tables does not pay off that well. It buys you the ability to add data elements without changing the schema, but at a huge price. First and foremost, the simplest SQL statements (Select * from Table1) will no longer work. Next, you must still change code in the host language to adapt to and handle the data, after all, if it's there, it must be for a reason, and some code is likely to handle it in some way.

The logical end-point of this idea is a highly abstracted collection of perhaps 5 or 6 tables that list only objects, relationships, and properties. But again, you must then reinvent the equivalent of SQL to handle Inserts, updates, deletes and selects. Imagine your "Document" is now a 1:M and you have extended tables listing properties for both header and detail. How do you craft a simple query that filters on properties in both? It can be done, sure, but simple it is not.

So, if we accept that we don't want to break SQL, your UI dev tool, your reporting tool, and everything else, we see that all you really need is a way to make schema changes with a minimum of fuss, which is really what everyone needs. This for me has always been the Data Dictionary, a collection of simple tables that describe the schema itself. Some mechanism takes changes to the dictionary and turns them into physical changes on a real live DB server. Then, dependening on how well integrated the system is, other things such as UI, reporting tools, import/export engines and other stuff can all be driven off the dictionary.

Perhaps if you give us some idea of what products you are looking at, project size, scope, etc., we can make a stab at some recommendations.

-- 
Kenneth Downs
Received on Sun May 04 2003 - 18:55:11 CEST

Original text of this message