Re: EAV (Re: Object-relational impedence)

From: topmind <topmind_at_technologist.com>
Date: Wed, 2 Apr 2008 22:06:57 -0700 (PDT)
Message-ID: <190c91cd-08d7-489e-921d-8b58c46b7fd7_at_i12g2000prf.googlegroups.com>

Brian Selzer wrote:
> "topmind" <topmind_at_technologist.com> wrote in message
> news:a5fa390d-025e-4708-93e0-588d573faa2d_at_u10g2000prn.googlegroups.com...
> >
> >
> > Brian Selzer wrote:
> >> "topmind" <topmind_at_technologist.com> wrote in message
> >> news:a742d466-e6a9-4d13-a1ba-953567fcb3dc_at_u10g2000prn.googlegroups.com...
> >> [big snip]
> >>
> >> > It's a gut feeling for everyone here because it's never been tried in
> >> > practice. We cannot really know with near certainty all the possible
> >> > usage scenarios (query patterns) and which are more frequent. It's an
> >> > educated WAG at best.
> >> >
> >>
> >> Perhaps some testing is in order.
> >>
> >> > There's another suggestion that nobody's mentioned here that I know
> >> > of: Add each new column to the Widgets table as needed, creating a
> >> > "sparse" table. If we are going to allow add/change to schemas, then
> >> > why not just add the new frippen column to the Widgets table and be
> >> > done with it? Sparse tables are not really space hogs on most modern
> >> > RDBMS like they were in the past. The overhead for unused (null)
> >> > columns is small.
> >> >
> >>
> >> It's not about space. Separate tables can be indexed separately,
> >> permitting
> >> the selection of better query plans. A lot of nulls, such as you would
> >> find
> >> in a sparse table, skews the statistics for columns, making some indexes
> >> that would otherwise appear optimal less attractive, leading to poor
> >> performance.
> >
> > I doubt queries against sparse columns would be common. Generally
> > there would first be a query involving the "widget type" and/or
> > attribute type, which would greatly narrow what is being searched.
> > And, depending on how the optimizer and index system is built, you
> > don't need to store Nulls *in* the indexes. Think about it. There are
> > a few operations where indexing nulls makes sense, but they are not
> > common enough to bother in my opinion. There are other ways to get the
> > same info. But, a given vendor may have seen it different. In short,
> > it depends on the DB engine.
> >

> You're entitled to your opinion, even if it's wrong. There may not even be
> a need for a "widget type" attribute.

Perhaps "type" is not quite the right description. It would identify a specific widget implementation, say the "button from Foo Inc.". That can be used to look up the dictionary(s) of available attributes and features.

>


> > EAV's don't need to store nulls. You just don't include that entry. If

> > a given index engine is not null-friendly, then perhaps an EAV is

> > better than a sparse table as far as performance.
> >

> No. EAVs don't store nulls. But their structure makes creating usable
> indexes extremely difficult if not impossible. That makes me question
> whether performance would be better with an EAV.

I don't see why this would be the case. Being a "skinny" table, you can easily index every column. It is hard to out-index that.

> >> The idea of a sparse table is ugly because it involves nulls.
> >> Nulls--especially those that indicate 'there shouldn't be a value
> >> here'--should be avoided whenever possible. If you're interested, there
> >> have been many discussions on cdt regarding nulls. If you can wade
> >> through
> >> the flames, you might even find some useful information.
> >>
> >> Data has an inherent structure.
> >
> > Sometimes its "inherent structure" does NOT fit static tables well.
> > The real world is not always friendly to a given abstraction. And,
> > although I am against "nulls" for strings, the concept of some kind of
> > "empty" cell (zero length) is not necessarily a bad thing. (In some
> > RDBMS, zero length and null are the same thing, in others they are
> > not.)
> >

>
> I don't think you understand "inherent structure."

I suspect you have a personal pet concept that you gave that label. But you're free to prove me wrong. "Structure" is usually a humanbuilt abstract model to approximate or describe some limited aspect of the real world, and there are many possible ways to model the same given thing.

> >> Stuffing everything into one table, whether
> >> it be an EAV table or a sparse table imposes an alien structure on the
> >> data,
> >> introducing redundancy and complexity and usually reducing performance.
> >> A
> >> solution that has one table per widget type also imposes a structure on
> >> the
> >> data, one that inevitably introduces redundancy. The 'table-happy'
> >> design
> >> that I suggested earlier does not introduce redundancy and does not
> >> require
> >> nulls.
> >
> > Your solution does NOT reduce redundancy because the foreign key has
> > to be repeated over and over again. And, empty cells don't
> > necessarily reduce performance, per above. And, filling up table-space
> > is ugly and not debugging-friendly. Fat table-spaces slow me down. I
> > cannot speak for everybody's psychology and hand-eye-mouse-brain
> > coordination, but it slows down MINE.
> >

>
> By most, foreign key references do not constitute redundancy.

I disagree. For one, it fills up the schemas with duplication regardless of what is actually in the tables. Human eyes have to sift over all that.

> In direct
> image systems, foreign key references are problematic because updates must
> often physically cascade. But not all systems work that way. Some systems
> internally use pointers instead of duplicating what is referenced so that
> physical cascades are unnecessary. And don't kid yourself: empty cells
> necessarily reduce performance, they just may not reduce it noticably.

I would bet that joins slow things down more than empty cells. But we can agree to treat it as a wash for this debate.

>


> > I will respect your choice because the tradeoffs are all sticky either

> > way. But, I do not agree with it, nor do I agree with your performance

> > criticisms and redundancy claims. They appear incorrect.
> >

-T- Received on Thu Apr 03 2008 - 07:06:57 CEST

This message: [ Message body ]
Next message: paul c: "Re: CODASYL-like databases"
In reply to Brian Selzer: "Re: EAV (Re: Object-relational impedence)"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message