Re: A searchable datastructure for representing attributes?

From: --CELKO-- <71062.1056_at_compuserve.com>
Date: 27 Feb 2002 15:28:30 -0800
Message-ID: <c0d87ec0.0202271528.250fb92a_at_posting.google.com>


The short answer is "No" and the long answer is that you are committing what I call "The OO Programmer's Mistake" -- I have had to clean it up a few times now. Your first approach is a META-DATABASE model trying to wedge itself into a DATABASE. It just don't work!! And I get $1500 to $2000 per day to tell your boss this when it falls apart -- before we start replacing the entire schema. In my experience, the collapse starts about 1-2 years after deployment -- when the database starts to get big.

The (object_id, attribute, value) model soon becomes the (object_id, attribute, value, unit_of_measure) model, which further mutates as more and more meta-data columns are added. Every typo becomes a new attribute. American cars have 'color' and are measured in English units, but British cars have 'colour' and are measured in metric units, and then there is a 'clr' floating around as an orphan, etc.
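That orphan-attribute drift is easy to reproduce. Here is a minimal sketch (SQLite and the sample rows are my assumptions, for illustration only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Things
    (object_id INTEGER NOT NULL,
     attribute VARCHAR(255) NOT NULL,
     value     VARCHAR(255) NOT NULL)""")

# Three data-entry clerks, three spellings; the schema cannot object:
conn.executemany("INSERT INTO Things VALUES (?, ?, ?)",
                 [(1, 'color',  'red'),
                  (2, 'colour', 'blue'),
                  (3, 'clr',    'green')])

# A report on the "real" attribute silently misses two of the three cars:
n = conn.execute("SELECT COUNT(*) FROM Things"
                 " WHERE attribute = 'color'").fetchone()[0]
print(n)  # 1
```

No constraint is even possible here, because to the schema 'color', 'colour', and 'clr' are just three more rows of data.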

Or the typos become unverifiable values for an attribute. Look at this check clause:

CREATE TABLE Cars
(...
 weight DECIMAL(8,2) NOT NULL -- unit is pounds
     CHECK (weight BETWEEN 1000.00 AND 5000.00),
 ...);

I have a specific attribute upon which to hang a constraint. Now try that with your first model:

CREATE TABLE Things
(thing_name VARCHAR(255) NOT NULL,
 attribute VARCHAR(255) NOT NULL,
 value VARCHAR(255) NOT NULL,
 CHECK (NOT (thing_name = 'car' AND attribute = 'weight')
        OR CAST(value AS DECIMAL(8,2))
           BETWEEN 1000.00 AND 5000.00),
 << insert hundreds of CHECK() constraints here >>,
 PRIMARY KEY (thing_name, attribute));

and don't forget to add error handling for a failed CAST() expression, when someone keys in (weight = 'Thursday').

Since value can be any possible datatype, your model has to force it into the most general datatype -- usually the longest VARCHAR(n) the SQL engine allows -- then convert it to numeric or date datatypes for calculations. Sure hope everyone always used the same conversion; it would really screw you up if one guy used CAST(value AS INTEGER), another CAST(value AS DECIMAL(8,1)), or CAST(value AS FLOAT), etc.
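A sketch of that CAST problem (SQLite is my choice of engine for the demo; engines such as SQL Server raise a conversion error on a bad CAST instead, which is exactly the error handling mentioned above). SQLite does not even give you the error -- it quietly coerces the garbage:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Things
    (thing_name VARCHAR(255) NOT NULL,
     attribute  VARCHAR(255) NOT NULL,
     value      VARCHAR(255) NOT NULL,
     PRIMARY KEY (thing_name, attribute))""")

# Nothing in the schema stops a non-numeric "weight":
conn.execute("INSERT INTO Things VALUES ('car', 'weight', 'Thursday')")

# The CAST does not fail; it turns 'Thursday' into 0:
bad_weight = conn.execute(
    "SELECT CAST(value AS DECIMAL(8,2)) FROM Things"
    " WHERE thing_name = 'car' AND attribute = 'weight'").fetchone()[0]
print(bad_weight)  # 0 -- the typo became a bogus weight, not an error
```

Either way you lose: one engine silently invents a weight of zero, the other blows up your query at run time.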

The problem is that OO people think that a table is an object instance or a class. There was a post on the SQL Server newsgroup last year from a guy who wanted to create and drop a table for each shopping cart on a website. We talked him out of it. A domain is a class.

What you are trying to do is drive nails with a screwdriver. Wrong tool.

>> ... The problem is that in our case is 2-fold: 1. the attribute keys change frequently and on the fly (ie when the application is running) so "alter table drop/add column" is not practical when the table is large. <<

Attributes do not have keys; keys are a special case of an attribute. That statement makes no sense. But if you mean that you want to allow users to re-design your database on the fly, then you failed to provide a good data model and need to start over.

>> 2) There are many, many attributes (~500) and only 3 or 4 at most are set for any given object. <<

Object? Relational databases store entities and relationships, not object!! See what I mean about your mindset being OO? But let's look at this for a minute. You are telling me that each entity in your database has 3-4 attributes?? At least one of them has to be the primary key or you don't have a table at all. While it is possible, it does not sound very likely.

>> The application is structured in such a way that when an object [sic] does not have an attribute set it gets a default value associated with that attribute key [sic]. So from the perspective of the application all objects [sic] have all fields [sic] specified but the database only stores separate rows for those object that have value different from default. If you structure the schema with approach_2 your usage of space increases by a factor of about 100 consumed for storage of NULLs or default values. <<

So anyone really can insert anything into this nightmare!! What are you trying to model? The whole of creation?

>> Imagine you have a database of residents in a particular town and you have attributes like phone number, address etc. But you also have attributes like place of birth which by nature of the data is the same for 99% of the residents of the town (eg the town itself) and is different for the 1% who moved in from elsewhere. <<

Fine, you have defined the attributes that make a resident and have an idea for a default value for one of the attributes. We need to worry about a key, but we can skip that for now.

>> Now imagine you have 500 attibutes with this quality: 99% of objects have the same value and 1% has a different value. <<

Attributes of what? Attributes do not float around by themselves. Are we back in meta-data land? When you mix meta-data and data together, it is all jumbled.
But making the assumption that you meant to say that in most of the tables, most of the attributes have an expected value that can be used as a default value, I'd say this is great. I only have to program DEFAULT clauses once, in the database, and not in my application code.
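A sketch of what those DEFAULT clauses buy you, using the resident example from earlier (SQLite, the column names, and the town name 'Springfield' are my assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The expected value is declared once, in the schema:
conn.execute("""
    CREATE TABLE Residents
    (resident_id   INTEGER NOT NULL PRIMARY KEY,
     resident_name VARCHAR(255) NOT NULL,
     birth_place   VARCHAR(255) NOT NULL DEFAULT 'Springfield')""")

# The 99% case omits the column and the engine fills in the default:
conn.execute("INSERT INTO Residents (resident_id, resident_name)"
             " VALUES (1, 'Smith')")
# The 1% who moved in state it explicitly:
conn.execute("INSERT INTO Residents VALUES (2, 'Jones', 'Shelbyville')")

rows = conn.execute("SELECT resident_name, birth_place FROM Residents"
                    " ORDER BY resident_id").fetchall()
print(rows)  # [('Smith', 'Springfield'), ('Jones', 'Shelbyville')]
```

The default lives in one place, every application sees the same value, and the column still has a real type that constraints can hang on.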

>> If you use the trivial [sic] solution (as I described in approach_2) your database is going to be 100 bigger than it needs to be. It becomes a case of denormalization in a way. <<

A normalized database is not trivial. What you are doing is denormalized -- the rows in the "Things" table are not single facts (see Dr. Codd's definition of a database).

>> Imagine also that attributes are constantly added and deleted as the application is running. <<

What a nightmare!! Not only would you never know if your data is valid, you would never even know what it was from minute to minute.

>> When you think about this example you will see that approach_1 makes a lot more sense except for the fact that searching it requires lots of self-joins (or correlated subqueries). <<

It makes no sense and does not work.

>> ... Is there any datastructure that has the flexability and "normalization" of approach_1 and the ease of search of approach_2. <<

You want a persistent object store of some sort. I don't know if anyone has such a thing that can be accessed easily, to be honest. Do not do this in SQL.

Received on Thu Feb 28 2002 - 00:28:30 CET
