Re: A searchable datastructure for represeting attributes?

From: crappy <crappygolucky_at_hotmail.com>
Date: 27 Feb 2002 15:41:06 -0800
Message-ID: <ce31c410.0202271541.2606a61d@posting.google.com>

(ng list trimmed to oracle only)

i'm wondering, if you really wanted to use approach 1 for physical storage, you might be able to create a view that pivots the table to have the searchable 'feel' of approach 2, and i think with oracle's optimizer, performance that's at least better than approach 1 in cases where many attributes are involved in the search criteria. the only problem is you will have to update the view definition every time an attribute is added, but that's better than adding a column to the table every time. i.e.,

create or replace view v as
select oid, max(decode(key, 'phone', value, null)) as phone, max(decode(key, 'addr', value, null)) as addr, max(decode(key, 'color', value, null)) as color from tab
group by oid

(theoretically you might even use a query to generate that view definition based on all possible values for your 'key'.)

just need to add one of those 'max(decode(..' lines for each possible attribute.

i think in most cases, in terms of performance, this will work better than the self-join-per-attribute, as you mentioned. i could be wrong tho, i haven't tried it.

just a thought.

no_spam_for_me_at_thankyou.very_much wrote in message news:<3C7D2FBB.1567139F_at_thankyou.very_much>...
> Obnoxio The Clown wrote:
>
> > On Wed, 27 Feb 2002 11:16:15 -0500, no_spam_for_me_at_thankyou.very_much
> > wrote:
>
> > Use approach 2.
>
> I would love to. In fact we did use approach_2 for a a long time and
> unfortunately it proved impractical. Yes approach_2 has all the benifits I
> described plus as you correctly pointed out it's easier to report on. The
> problem is that in our case is 2-fold:
>
> 1) the attribute keys change frequently and on the fly (ie when the
> application is running) so "alter table drop/add column" is not practical
> when the table is large.
>
> 2) There are many many attrubutes (~ 500) and only 3 or 4 at most are set
> for any given object. The application is structured in such a way that when
> an object does not have an attribute set it gets a default value associated
> with that attribute key. So from the perspective of the application all
> objects have all fields specified but the database only stores separate rows
> for those object that have value different from default. If you structure
> the schema with approach_2 your usage of space increases by a factor of
> about 100 consumed for storage of NULLs or defaulft values.
>
> Even though the object model of my particular situation is too long and
> boring to go into I will give the following example: Imagine you have a
> database of resdidents in a particular town and you have attributes like
> phone number, address etc. But you also have attribtutes like place of birth
> which by nature of the data is the same for 99% of the residents of the town
> (eg the town itself)and is different for the 1% who moved in from elsewhere.
> Now imagine you have 500 attibutes with this quality 99% of objects have the
> same value and 1% has a different value. If you use the trivial solution (as
> I described in approach_2) your database is going to be 100 bigger than it
> needs to be. It becomes a case of denormalization in a way. Imagine also
> that attributes are constantly added and deleted as the application is
> running. When you think about this example you will see that approach_1
> makes a lot more sense except for the fact that searching it requires lots
> of self-joins (or correlated subqueries). If it was not for searching
> approach_1 below would be a far superior solution. Now my question is: Is
> there any datastructure that has the fliexibility and "normalization" of
> approach_1 and the ease of search of approach_2. I do not mean ease in a
> sense that the query has to be trivial but in a sense that it would be fast.
> Combining SQL with application logic is also fair game as long as it
> achieves the results.
>
> Thanks
>
> dre
>
>
>
>
>
>
>
>
>
>
>
> >
> > >
> > >THE QUESTION: Is there any good datastructure for representing
> > >attributes which has the advantages of approach_1 and the search and
> > >sort convenience of approach_2 below? Seems like a problem many people
> > >would
> > >have run into.
> > >
> > >
> > >1)
> > >
> > >objectid key value
> > >78 phone 555-1212
> > >78 addr Main Street
> > >79 phone 222-2222
> > >79 addr Green Street
> > >
> > >(keys could be normalized of course but I skip that for this example)
> > >
> > >
> > >2)
> > >
> > >objectid phone addr
> > >
> > >78 555-1212 Main Street
> > >79 222-2222 Green Street
> > >
> > >The first approach has several advantages
> > >
> > >a) It is flexible. That is you do not need to change the schema to add
> > >new attribute types.
> >
> > I think this could best be described as a false economy. Also, it
> > doesn't play well with third-party report writers, etc.
> >
> > >b) It is normalized in a sense that if an attribute does not have a
> > >value you can represent it by its absense in the table istead of
> > >duplicating NULL in every feild that does
> > > not have a value. In my case this is actually a critical thing
> > >because there are a lot of attributes whose values are not set.
> >
> > <shudder>
> >
> > >Overall the first approach serves my case much better than second
> > >approach. However it has important disadvantages in that the search by a
> > >
> > >combination of attributes as
> > >well is sort by attributes is very cumbersome and slow. For example if I
> > >
> > >wanted to find all objects where phone number is 555-1212 AND address
> > >equals "Main Street" I would need to
> > >do a self join like
> > >
> > >select t1.objectid from mytable t1, mytable t2 where
> > >t1.objectid=t2.objectid and t1.phone='555-1212' and t2.addr='Main
> > >Street'. There are many problems here
> > >
> > >1) very slow when table is large
> > >2) you need to do a self join for every attribute in the AND-ed
> > >expression.
> > >3) If you want to sort by an attribute other than the search key you
> > >need to self join yet again.
> > >
> > >
> > >By contrast with approach 2 all these problems go away:
> > >
> > >select id from mytable where phone='555-1212' and addr='Main Street'
> > >order by some_field
Received on Wed Feb 27 2002 - 17:41:06 CST