Re: EAV (Re: Object-relational impedence)

From: topmind <topmind_at_technologist.com>
Date: Sun, 30 Mar 2008 02:08:40 -0700 (PDT)
Message-ID: <a742d466-e6a9-4d13-a1ba-953567fcb3dc_at_u10g2000prn.googlegroups.com>


Brian Selzer wrote:
> "topmind" <topmind_at_technologist.com> wrote in message
> news:988c8429-32c9-4aa6-a913-c241da9eba32_at_h11g2000prf.googlegroups.com...
> >
> >
> > Brian Selzer wrote:
> >> "topmind" <topmind_at_technologist.com> wrote in message
> >> news:f743faa4-962b-4048-9d3d-203951b57e17_at_e6g2000prf.googlegroups.com...
> >> >
> >> >
> >> > Brian Selzer wrote:
> >> >> "topmind" <topmind_at_technologist.com> wrote in message
> >> >> news:1f649255-eba1-4e82-82b1-99228e761fa5_at_c19g2000prf.googlegroups.com...
> >> >> >
> >> >> >
> >> >> > Brian Selzer wrote:
> >> >> >> "topmind" <topmind_at_technologist.com> wrote in message
> >> >> >> news:79cae24d-bca7-47e5-94f6-7bb5bfa508f1_at_s8g2000prg.googlegroups.com...
> >> >> >> >
> >> >> >> >
> >> >> >> > Brian Selzer wrote:
> >> >> >> >> "topmind" <topmind_at_technologist.com> wrote in message
> >> >> >> >> news:a8855c5d-1e3f-46ef-a99a-5c09b94bfba5_at_u10g2000prn.googlegroups.com...
> >> >> >> >> > (addendum)
> >> >> >> >> >
> >> >> >> >> > For an example of the need for flexible columns/attributes,
> >> >> >> >> > consider
> >> >> >> >> > a
> >> >> >> >> > relational-based GUI system where widget state and info are
> >> >> >> >> > tracked
> >> >> >> >> > in
> >> >> >> >> > some kind of RDBMS. We want it to be able to add new widgets
> >> >> >> >> > from
> >> >> >> >> > different vendors without each widget needing its own table(s)
> >> >> >> >> > (unless
> >> >> >> >> > its a really special need).
> >> >> >> >> >
> >> >> >> >> > There seems to be a need for either a EAV table (row-based
> >> >> >> >> > dynamism)
> >> >> >> >> > or Dynamic Relational (column-based dynamism) of some kind so
> >> >> >> >> > that
> >> >> >> >> > attributes that are not known up-front can be tracked.
> >> >> >> >> >
> >> >> >> >> > If you suggest otherwise, please present your design.
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> Well it's really simple. You have a single table with
> >> >> >> >> attributes
> >> >> >> >> that
> >> >> >> >> are
> >> >> >> >> common to all possible widgets, and when a set of attributes
> >> >> >> >> that
> >> >> >> >> are
> >> >> >> >> not
> >> >> >> >> known up front needs to be tracked, a new table or set of tables
> >> >> >> >> is
> >> >> >> >> created
> >> >> >> >> to house the set of attributes that references the table with
> >> >> >> >> the
> >> >> >> >> common
> >> >> >> >> attributes. It's really very simple and doesn't require a
> >> >> >> >> special
> >> >> >> >> dynamism
> >> >> >> >> (Is that the correct use of that word?) mechanism.
> >> >> >> >
> >> >> >> > Having different vendor widgets being able to create their own
> >> >> >> > little
> >> >> >> > table is not very practical. There are versioning issues, for
> >> >> >> > one.
> >> >> >> > What if a widget creates a table that does not work with the
> >> >> >> > current
> >> >> >> > version or brand of our GUI RDBMS?
> >> >> >> >
> >> >> >>
> >> >> >> Since when do widgets create their own tables?
> >> >> >
> >> >> > I thought that was *your* suggestion. May I suggest you describe a
> >> >> > sample scenario to reduce unstated assumptions between us.
> >> >> >
> >> >>
> >> >> Widgets don't necessarily need to know how to serialize themselves.
> >> >> They
> >> >> only need to know how to provide a snapshot of themselves for
> >> >> serialization.
> >> >> Some other part of the system may be responsible for committing that
> >> >> snapshot to the database.
> >> >>
> >> >> >>
> >> >> >> > And, I find gazillion tables hard to navigate when looking for
> >> >> >> > tables
> >> >> >> > or debugging. I never liked table-happy designs, but individuals
> >> >> >> > may
> >> >> >> > have personal preferences that differ.
> >> >> >> >
> >> >> >>
> >> >> >> For me, it's a lot easier to wade through a gazillion tables than
> >> >> >> to
> >> >> >> try
> >> >> >> to
> >> >> >> decipher the content in an EAV table.
> >> >> >
> >> >> > That's you, not me. In this case I prefer EAV.
> >> >> >
> >> >> >> The 'table-happy' design permits the
> >> >> >> definition of relationships between attributes.
> >> >> >> That is a whole lot more
> >> >> >> difficult when everything is stuffed in an EAV table. For example,
> >> >> >> suppose
> >> >> >> that every widget that has attribute Y also has attribute Z. Try
> >> >> >> enforcing
> >> >> >> that rule with everything stuffed in an EAV table. With the
> >> >> >> 'table-happy'
> >> >> >> design, all you need is to specify that attributes Y and Z are in
> >> >> >> the
> >> >> >> same
> >> >> >> table.
> >> >> >
> >> >> > I don't see why that would be a common scenario for GUI's. I cannot
> >> >> > even think of a single realistic case right now. I am not saying it
> >> >> > will never happen, but its not common enough to tilt the decision
> >> >> > unless its close.
> >> >> >
> >> >> > Plus, lots of similar tables often results in having to do UNION
> >> >> > queries, which are ugly and slow. With EAV, one could get/dump the
> >> >> > GUI
> >> >> > attributes in a single query. With your myriad table suggestion, one
> >> >> > would have to first take an inventory of all custom-widget tables.
> >> >> >
> >> >>
> >> >> I don't see why you would need to do union queries. I think you're
> >> >> stuck
> >> >> on
> >> >> the idea that each widget type has its own table. That's not at all
> >> >> what
> >> >> I
> >> >> am suggesting. There is a table for attributes that are common to all
> >> >> possible widgets, and a table for each set of attributes that are
> >> >> common
> >> >> not
> >> >> to all possible widgets, but rather to a subset of all possible
> >> >> widgets.
> >> >> So
> >> >> for example, suppose that widget A has attributes T, U, V, and W, and
> >> >> widget
> >> >> B has attributes T, U, V, X, and Y, and widget C has attributes T, U,
> >> >> V,
> >> >> Y
> >> >> and Z, you wouldn't necessarily have a table for each type of widget.
> >> >> Assuming that T, U, and V are common to all widgets, you would have
> >> >> one
> >> >> table, the common table, with attributes T, U, and V. Let's also
> >> >> assume
> >> >> that values for T uniquely identify a widget, meaning that T would be
> >> >> the
> >> >> primary key of the common table. Then you would have a table with T
> >> >> and
> >> >> W,
> >> >> a table with T and X, one with T and Y, and one with T and Z, each
> >> >> referencing the common table. Widget A would have a row in the common
> >> >> table
> >> >> and one in the table with T and W; widget B would have a row in the
> >> >> common
> >> >> table, one in the table with T and X and one in the table with T and
> >> >> Y;
> >> >> widget C would have a row in the common table, one in the table with T
> >> >> and Y
> >> >> and one in the table with T and Z. When a new widget, D, that has
> >> >> attributes T, U, V, X and P needs to be recorded, a new table with T
> >> >> and
> >> >> P
> >> >> would need to be created, and then widget D would have a row in the
> >> >> common
> >> >> table, one in the table with T and X and one in the new table with T
> >> >> and
> >> >> P.
> >> >
> >> > How is this better than an EAV table? At least with EAV's, you don't
> >> > have to create new tables for columns that don't already exist.
> >> >
> >>
> >> You're kidding yourself if you think there is any less work with an EAV
> >> table. So what if you have to create a new table for columns that don't
> >> already exist. Unless you don't care about integrity, you will have to
> >> alter the constraints on an EAV table whenever you add an attribute that
> >> doesn't already exist. The columns in each new table are typed. The
> >> values
> >> in the EAV table cannot be, otherwise you wouldn't be able to store names
> >> and addresses and numbers and images in the same table. This means that
> >> for
> >> each attribute, you have to define a constraint that specifies the values
> >> allowable for that attribute. There are other reasons that this is
> >> better.
> >> I listed one earlier: if whenever A, B, then put A and B in the same
> >> table.
> >> Another is that indexes can be applied to individual columns or groups of
> >> colums. Constraints are simpler, and therefore easier to read,
> >> understand
> >> and maintain, not to mention that being on separate objects, they can
> >> operate independently, improving both performance and scalability.
> >
> > A bunch of joins is not exactly fast. I don't believe you.
> >

>

> Where did you come up with a bunch of joins? Even so, a merge join even
> over multiple tables has linear complexity when the inputs are already
> sorted.
>

> >> If you
> >> have hundreds of different attributes, then you have hundreds of sets of
> >> values, and therefore hundreds of additional compares just to determine
> >> which constraints apply. I could go on, but it should be obvious that
> >> you
> >> don't really gain anything by using an EAV table, you just end up
> >> reinventing the wheel: instead of using the type checking that is built
> >> into
> >> the system, you have to roll your own.
> >
> > I am not much of a fan of types. A data dictionary can be used for
> > validation if needed. Anyhow, this sounds like a perpetual argument
> > that keeps repeating the same arguments over and over. Each solution
> > has its own drawbacks and compromises.
> >
>

> Some more than others.
>

> > And *if* new tables are to be created, I think dedicated tables per
> > widget, other than the shared ones, is preferable to one table per new
> > column.
> >
>

> To each his own. Can you offer any justification for what you think, or is
> it just your gut feeling.

It's a gut feeling for everyone here because it's never been tried in practice. We cannot really know with near certainty all the possible usage scenarios (query patterns) and which are more frequent. It's an educated WAG at best.

There's another suggestion that nobody's mentioned here that I know of: Add each new column to the Widgets table as needed, creating a "sparse" table. If we are going to allow add/change to schemas, then why not just add the new frippen column to the Widgets table and be done with it? Sparse tables are not really space hogs on most modern RDBMS like they were in the past. The overhead for unused (null) columns is small.

>
> > -T-

-T- Received on Sun Mar 30 2008 - 12:08:40 CEST

Original text of this message