Re: Best way to design table to store attributes?

From: <patrick61z_at_yahoo.com>
Date: Fri, 23 Jan 2009 12:39:41 -0800 (PST)
Message-ID: <6d625658-7af0-4d20-95f6-5c2eadb75163_at_v38g2000yqb.googlegroups.com>



On Jan 22, 7:18 pm, carl.henth..._at_gmail.com wrote:
> On Jan 22, 12:53 pm, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
>
>
>
> > carl.henth..._at_gmail.com wrote:
> > > I have an entity that along with all of the usual attributes, has a
> > > set of 140 bit flag attributes that I need to track all yes/no values
> > > of. I started going down the path of creating a table 150 columns
> > > wide, but after typing a few of them thought there may be a better
> > > way. :-)
> > > I am thinking of creating a table like this:
> > > Create table details (
> > > DetailsID int identity(1000,1),
> > > Name nvarchar(40),
> > > Descr nvarchar(60),
> > > Category nvarchar(40) )
> > > Thus I can take what would have been my column headers and use them in
> > > Name field, throw what would have been the bit flag value and throw
> > > that into the description field. I would tie them together with a
> > > category value to combine them into logical groups.
> > > I have around 1.2 million entities to store this info for, so this new
> > > table will get long fast.
>
> > > Another passing thought was to store these sets of values as XML for
> > > each entity. I have to admit I dont know what the advantage is for
> > > that other than the row count savings. how easy would it be to search
> > > through a 140 field xml blob for 1 million entites?
>
> > > Any advice is appreciated on how to tackle this.
>
> > Before folks start pointing and laughing and snickering about you
> > re-inventing EAV yet again, I thought I would point out that the
> > simplest logical representation for a boolean in a relational system is
> > simply the existence of a tuple in a relation.
>
> > Thus, a design that has 150 tables each with a single column
> > theoretically suffices.
>
> > On the other hand, such a design would clearly violate the POOD. Some
> > might question the principle of orthogonal design, but I think one
> > ignores it at one's own risk when creating 150 tables with identical
> > predicates.
>
> > At a more basic level, though, are you sure you have correctly modelled
> > your problem? 150 independent booleans creates a state machine with
> > somewhere around 10^45 states. That's a big state machine. Without any
> > sort of transition constraints, that creates a fully connected state
> > machine with 10^45 states and somewhere around 10^90 allowable transitions.
>
> > Those are big numbers, and it seems unlikely you really need such an
> > unwieldy state machine for each row of your table.- Hide quoted text -
>
> > - Show quoted text -
>
> Thank you for bypassing the hazing of the rookie, I appreciate it
> greatly! :-)
> For the purposes of this table, my entity is a person, and the 150 bit
> fields are a combination of columns describing all of the travel
> interests that they may have in a "yes I am interested", or "No, I am
> not interested" format. While it is possible to have a person
> interested in all 150 travel destinations, it is very unlikely, thus
> giving me a subset to work with. Since the set of "yes" values
> determines which marketing pieces get sent out, I do need to keep
> track of those, obviously, but I do not see the reason for keeping
> track of the "no's" since marketing would not be sent to them anyways.
> In my proposed table the absense of a row is the same as a "No" value.
>
> Being moslty self taught, I was not familier with the term EAV. But
> after looking it up, I can see that my issue is very simualr to a
> patient-symptom set of data for a doctor. Sounds like this is the way
> to go after all.
>
> I still have to wonder if there is a better way, however, to store/
> read that info. Perhaps a quasi-binary where each bit represents a
> field in a footprint? or even the XML blob.
> thank you again for the response!

I would use repeating groups, or simply use a blob. I ran into the same situation and had 150 or so "fields" with a range of a few values (I went ahead and simple used a character position and it worked fine).

You don't have to do everything in your database layer, and even if you did, many sql implementations give you tools to programmatically dissect the field to extract further values that could then be used in sql queries themselves.

If all you have knowledge of is RDBMS tools then yes, you are going to not do well with problem domains that aren't as well served by RDBMS tools. But then again you don't see SQL specialists designing video codecs either, so theres no real dilemma despite some of the noise I see posted in this group protesting otherwise. Received on Fri Jan 23 2009 - 14:39:41 CST

Original text of this message