Re: A searchable datastructure for represeting attributes?
Date: Wed, 27 Feb 2002 14:13:00 -0500
Message-ID: <3C7D2FBB.1567139F_at_thankyou.very_much>
Obnoxio The Clown wrote:
> On Wed, 27 Feb 2002 11:16:15 -0500, no_spam_for_me_at_thankyou.very_much
I would love to. In fact we did use approach_2 for a a long time and
unfortunately it proved impractical. Yes approach_2 has all the benifits I
described plus as you correctly pointed out it's easier to report on. The
problem is that in our case is 2-fold:
Even though the object model of my particular situation is too long and
boring to go into I will give the following example: Imagine you have a
database of resdidents in a particular town and you have attributes like
phone number, address etc. But you also have attribtutes like place of birth
which by nature of the data is the same for 99% of the residents of the town
(eg the town itself)and is different for the 1% who moved in from elsewhere.
Now imagine you have 500 attibutes with this quality 99% of objects have the
same value and 1% has a different value. If you use the trivial solution (as
I described in approach_2) your database is going to be 100 bigger than it
needs to be. It becomes a case of denormalization in a way. Imagine also
that attributes are constantly added and deleted as the application is
running. When you think about this example you will see that approach_1
makes a lot more sense except for the fact that searching it requires lots
of self-joins (or correlated subqueries). If it was not for searching
approach_1 below would be a far superior solution. Now my question is: Is
there any datastructure that has the fliexibility and "normalization" of
> wrote:
Thanks
dre
>
> >
> >THE QUESTION: Is there any good datastructure for representing
> >attributes which has the advantages of approach_1 and the search and
> >sort convenience of approach_2 below? Seems like a problem many people
> >would
> >have run into.
> >
> >
> >1)
> >
> >objectid key value
> >78 phone 555-1212
> >78 addr Main Street
> >79 phone 222-2222
> >79 addr Green Street
> >
> >(keys could be normalized of course but I skip that for this example)
> >
> >
> >2)
> >
> >objectid phone addr
> >
> >78 555-1212 Main Street
> >79 222-2222 Green Street
> >
> >The first approach has several advantages
> >
> >a) It is flexible. That is you do not need to change the schema to add
> >new attribute types.
>
> I think this could best be described as a false economy. Also, it
> doesn't play well with third-party report writers, etc.
>
> >b) It is normalized in a sense that if an attribute does not have a
> >value you can represent it by its absense in the table istead of
> >duplicating NULL in every feild that does
> > not have a value. In my case this is actually a critical thing
> >because there are a lot of attributes whose values are not set.
>
> <shudder>
>
> >Overall the first approach serves my case much better than second
> >approach. However it has important disadvantages in that the search by a
> >
> >combination of attributes as
> >well is sort by attributes is very cumbersome and slow. For example if I
> >
> >wanted to find all objects where phone number is 555-1212 AND address
> >equals "Main Street" I would need to
> >do a self join like
> >
> >select t1.objectid from mytable t1, mytable t2 where
> >t1.objectid=t2.objectid and t1.phone='555-1212' and t2.addr='Main
> >Street'. There are many problems here
> >
> >1) very slow when table is large
> >2) you need to do a self join for every attribute in the AND-ed
> >expression.
> >3) If you want to sort by an attribute other than the search key you
> >need to self join yet again.
> >
> >
> >By contrast with approach 2 all these problems go away:
> >
> >select id from mytable where phone='555-1212' and addr='Main Street'
> >order by some_field
Received on Wed Feb 27 2002 - 20:13:00 CET