Re: EAV - again

From: James K. Lowden <jklowden_at_speakeasy.net>
Date: Sun, 8 Feb 2015 18:59:02 -0500
Message-Id: <20150208185902.32703a7d.jklowden_at_speakeasy.net>


On Sun, 8 Feb 2015 12:44:20 +0000
Eric <eric_at_deptj.eu> wrote:

> > I think what he means is really unanalyzed or poorly organized
> > data. I think that's a better term because it puts the emphaisis
> > where it belongs: not on the "nature" of the data, but on the
> > undone work of preparing them for analysis.
>
> Perhaps the term should be "non-data", and I am thinking of a picture,
> or a poem, or a newspaper article which, these days, are stored in a
> computer system. Of course there will be meta-data for them which,
> from out point of view, is data and should be properly structured.
> But how do we say, in our system, that a picture is linked to a poem
> because it is claimed to be the inspiration for the poem. This pretty
> much has to be a generic link because you can't predict what link
> descriptions people will want, so a schema handling this is going to
> look like EAV, isn't it?

Not at all. You're describing three tables: poems, pictures, and inspirations. Poems and Pictures are N:M, and inspirations hold keys to both.

But you know I'm going to shoot down every proposal, right? Because the answer is always per above, unravelling the EAV rows to define a table structure. :-)

> even in an organisation trying do do it properly, and with
> someone in charge of logical design who was both competent and
> reasonable, I have seen a case of 2 - maybe it was just out of habit,
> but still...

I'll give you a better example from those who we have every reason to believe know better. Microsoft SQL Server supports "extended properties" on database objects, cf.
"sp_addextendedproperty" (https://msdn.microsoft.com/en-us/library/ms180047.aspx). Note the _at_value parameter is "sql_variant". These properties are nothing more nor less than a list of name-value pairs associated with the object. I say "associated" advisedly; they aren't part of the schema defintion insofar as they do *not* appear as columns in the catalog.

Many times over the years I wished for a data dictionary that was an intrinsic part of the DBMS. I wanted to define provenance, ownership, administrative responsibility, application use, textual (or mathematical) description, etc. I wanted to be able to verify completeness and enforce dictionary requirements as part of database administration, to be able to say, e.g., how many columns lacked a description. I thought it would be useful for developers to be able to scan all the descriptions for, say, "trade date" without regard to the column's name.

SQL Server's extended properties seem at first blush to be the obvious place to house a data dictionary. Except that one quickly notices they are pointlessly orthogonal to the schema. Instead of a powerful way to extend the system catalog to add meaning and enforce policy, we get a list of nearly invisible name-value pairs. A repeating group right there in the catalog. Brilliant. Thanks.

I like to imagine the feature-design group at Microsoft discussing my obvious proposal and deciding on your reason #1: no, we can't have ordinary users mucking with the system catalog. Too powerful, could have unforeseen consequences. Just let them attach a little EAV to each thing. It satisfies the request to be able to extend the catalog, but not in a way that could interfere with its proper use. And, you know, it's like totally flexible.

--jkl Received on Mon Feb 09 2015 - 00:59:02 CET

Original text of this message