Re: Attribute name prefixes, domains, joins, ISO 11179

From: <archon_at_quantumfire.com>
Date: 27 Nov 2005 17:18:54 -0800
Message-ID: <1133140734.530316.58710_at_g44g2000cwa.googlegroups.com>


mAsterdam wrote:

> Better, yes, but still bad. Which calendar do you use?
> With dates the ownership is more problematic than with codes.
> Who decided that you would be the authority?

Can't I decide that I am the authority, and use whichever calender I feel like? After all, the attribute is specific to the system (an audit of when the row was added), and as the creator of the hypothetical system, I would be the only sensible authority. Taking my explanation of what I meant by "domain" originally, being the underlying machine type plus extra database constraints, the probability is that very few domains that I would use in my system would be wholly defined by external authorities: the constraints are, for the majority of attributes, going to be specific to my system, and so the domains are too. Or perhaps that's misguided, and the user defined type name should reflect only the mathematical/predefined domain constraints, and only the attribute name itself should reflect system specific constraints?

> Let's assume that those issues are settled:
> Which aspects of the domain do you/we want
> to see reflected in the domain name?
> First think about that and then see if you
> can fit those into the scheme of things promoted
> by the naming standard.

If we are declaring new user defined types, I think I would want the domain name to include at least a representation term as per 11179, and probably a class term. Possibly not a qualification term, since qualification terms by definition do not change the fundamental meaning. But to be honest, I don't really care about domain names for the purpose of this discussion, I'm far more interested in column names.

> > What about if they
> > are the same domain and the attributes have the same name?
>
> ==table.column== ==type==
> Loan.Intrestrate Percentage
> Deposit.Intrestrate Percentage

I assume that no meaningful join of Loan and Deposit can be made on InterestRate, but if that's the case, then are these attribute names misleading? I believe the "natural join" operation, if implemented, would attempt to join these two tables on the given attribute since they are of the same name and type, but there's certainly nothing natural about that join.

> I'ld really like it more if you'ld
> come up with your own example.

Fair enough. How about a table of widgets which can have 2 different paints, one the top half and one on the bottom half:

Paints(
 PaintCode PK,
 PaintIsWaterproof,
 PaintColourName,
 /* etc */)

Widgets(
 WidgetNbr PK,
 TopHalfPaintCode FK references Paints
 BottomHalfPaintCode FK references Paints)

"Code" is a representative term. In my data dictionary I would describe it as a short (less than, say, 8 characters) character string which does not have a meaningful order when sorted but which acts as a unique identifier.
"PaintIsWaterproof" is an interesting one, it does not have an explicit representative term. I wonder if it is obvious that this is a boolean value? The "IsWaterproof" part as a whole is the 11179 property term. Harking back for a second, perhaps "Requesting" and "Approving" would also be better described as property terms, the wiki might have it wrong there. In any case, the same arguments about "dropping the term is allowed (but not always required!) in order to find relationshps" still applies.
"PaintColourName" is an easy one, its a simple Class/Property/Representation breakdown, where "name" is the sort of character data one would normally expect for a name.

In widgets, WidgetNbr is Class + Representation. Nbr suggets alphanumeric data used for unique identification (explained more below) The interesting ones are tophalf... and bottomhalf... These are property components of the respective attribute names. We can drop them when looking for relationships. I have not included the word "widget" in the attribute names because that would suggest paint codes are part of the Widget class of things, where they are actually part of the Paint class of things. Hence the argument for FKs not getting the class term. And thinking about that a bit more, I guess it follows from that reasoning that all other attributes *should* get the class term as a prefix, regardless of how redundant it may seem (or how much extra typing it requires ;) But this gets back to my original question, and I can't say I'm convinced either way on the matter.

To be explicit about this issue of redundancy: in, say, sql server, the fully qualified name of, say, the WidgetNbr attribute would be something along the lines of...

<server name>.<database name>.Widgets.WidgetNbr

So it seems we could just as well call it

<server name>.<database name>.Widgets.Nbr

and not lose any information. But I'm not so sure about that. The fully qualified name tells us that Nbr is an attribute of the WIdgets table, yes. But that does not necessarily mean that it is "owned" by the Widgets class of thing! We only know that it is because Widgets is also part of the attribute name. This tells us that you cannot have a WidgetNbr without a Widget. Contrast this with, say, TopHalfPaintCode, where the FQN would be

<server name>.<database name>.Widgets.TopHalfPaintCode

Again, this tells us only that the column exists on the widgets table. To infer from this that you cannot have a PaintCode without a Widget would be incorrect. A PaintCode just happens to be an attribute that a widget can have, and it has two of them. You can certainly have PaintCodes even if you dont have Widgets, there they are in the Paints table. In fact, if we were to include "Widget" here, we would have "WidgetTopHalfPaintCode", which would contradictingly contain two different class terms ("Widget" and "Paint").

> Skipping '_Number' does not harm this interpretation:
> Requesting_Employee would still be interpreted as 'the
> Employee who requested the workorder', but - moving away
> from semantics for now - the Employee table could have
> multiple candidate keys so '_Number' might be necessary
> to disambiguate.
>
> This makes _Number a qualifier term in the sense that it is
> "necessary to uniquely identify a data element"
> (ISO/IEC 11179-5:1995 (E) p 8 under 'qualifier term').

I'm sure number is a representation term in this example. It gives us some idea of what kind of values will be in the column. I would interpret a "number" attribute to be an identifying attribute, so either a PK, CK or FK column, with alphanumeric data on which math operations are invalid, but which can be sorted in a meaningful order. Invoice number, contract number, etc. Now, this is only my gut feeling, to be certain of the intended meaning we could check the data dictionary. But even without a data dictionary, representative terms are enough to give a hint and most people will come up with reasonably similar notions given the same representative term. Or at least, not entirely dissimilar notions. Nobody is going to assume that a column with a representation term "number" is storing date values.

> > I have to say, ISO 11179 is not very clear about one point, which was a
> > source of confusion for me. To quote the wiki entry on the WorkOrders
> > example
> >
> > "For Requirements_Text, the full name (i.e., the name that goes in the
> > registry, or data dictionary) is WorkOrder_Requirements_Text; the Object
> > part is omitted because it is declared in the WorkOrders table".
> >
> > Given this rule, why can't we call "WorkOrder_Number" simply "Number"?
>
> Given this rule we can.

But does that make sense? Given that number is usually indicative of a key attribute, and if we are using such keys in a system, you'd end up with a whole bunch of attributes in different relvars all called just "number". How would you know which joins were meaningful? Received on Mon Nov 28 2005 - 02:18:54 CET

Original text of this message