Representation for Heterogeneous Attribute Set

From: <robertbrown1971_at_yahoo.com>
Date: 10 Feb 2005 22:06:41 -0800
Message-ID: <1108102001.898621.256240@l41g2000cwc.googlegroups.com>

My company is working on a bond derivative portfolio analysis tool and we're facing a problem that I did not see adequately addressed any where in literature. I really did RTFM. I'm very experienced in relational modelling (10+ years) so this is not a case of not understanding the principles. Here is the problem stripped of irrelevant context. The problem below is simplified for the sake of the example so don't sweat the details.

THE PROBLEM

There are many types of bonds, each type has a different set of attributes, different attribute names, different attribute datatypes.

For example, bond A has two variables: a yearly interest rate and date of issue, B has five variables: an interest rate and 4 specific dates on which various portions of principal need to be paid, bond C has a set of 4 variables: interest rate in period 1, interest rate in period 2, the date on which the bond can be put back to the issuer, and two dates on which the bond can be called by the issue. And so on.

So, on the first attempt I could represent each bond type as its own table. For example,

create table bond_type_a (rate INTEGER, issue_date DATE) create table bond_type_b (rate INTEGER, principle_date1 DATE, principle_date2 DATE, principle_date3 DATE, principle_date4 DATE) create table bond_type_c (rate1 INTEGER, rate2 INTEGER, put_date DATE, call_date DATE)

This is the nice relational approach but it does not work because:

2. There are many thousands of bond types thus we would have to have many thousands of tables which is bad.

3. The client needs to be able construct the bond types on the fly through the UI and add it to the system. Obviously, it would be bad if each new type of bond created in the UI resulted in a new table.

4. When a user loads the bond portfolio it needs to be very fast. In the table per type approach if a user has a 100 different types if bond in the portfolio you would have to do 100 joins. This is a heavily multi user environment so it's a non-starter. It's impossibly slow.

THE SOLUTIONS So now that we ditched the table per bond type approach we can consider the followiing solutions (unpleasant from the relational point of view):

Name-Value pairs.

create table bonds (bond_id INTEGER, bond_type INTEGER, attribute_id INTEGER, value VARCHAR(255))

Comment: The client does not like this approach because they want to run various kinds of reports and thus they doe not want the values to be stored as VARCHAR. They want the DB to enforce the datatype.

2. Typed Name-Value pairs.

create table bonds (bond_id INTEGER, bond_type INTEGER, attribute_id INTEGER, int_val INTEGER, string_val VARCHAR(255), date_val DATE_

Comment: The client does not like this because the table is sparse. Every row has two empty fields.

3. Link table with table per data type.

create table bonds (bond_id INTEGER)

create table bond_int_data (bond_id INTEGER REFERENCES bonds(bond_id), value INTEGER)
create table bond_string_data (bond_id INTEGER REFERENCES bonds(bond_id), value VARCHAR(255))
create table bond_date_data (bond_id INTEGER REFERENCES bonds(bond_id), value DATE)

Comment: This meets most of the requirements but it just looks ugly.

4. Dynamic Mapping

create table (bond_id INTEGER, int_val1 INTEGER, int_val2 INTEGER, date_val1 DATE, date_val2 DATE, string_val1 VARCHAR(255), string_val2 VARCHAR(255)) Then you have to add some dynamic mapping in your code which will provide bond specific mapping (say, stored in an XML file). For example,

For bond_A: yearly_rate maps to int_val1, issue_date maps to date_val1 For bond_C: rate1 maps to int_val1, rate2 maps to int_val2, put_date maps to date_val1, call_date maps to date_val2)

Comment: This is very good for performance because when I load a portfolio of different bond types I can pull them all in in one SELECT statement. However this approach has a problem that the table is sparse. The number of fields of each type has to be as high as to accmodate the most complex bond while simple bonds will only be using two or three.

THE QUESTIONS: Are the four approaches I described above exhaustive? Are there any other that I overlooked? Received on Fri Feb 11 2005 - 00:06:41 CST