Re: no names allowed, we serve types only

From: Kevin Kirkpatrick <kvnkrkptrck_at_gmail.com>
Date: Fri, 19 Feb 2010 08:36:58 -0800 (PST)
Message-ID: <1d5114f7-1b81-4860-9eec-dfb722975051_at_f42g2000yqn.googlegroups.com>


On Feb 18, 9:44 pm, David BL <davi..._at_iinet.net.au> wrote:
> On Feb 15, 3:01 pm, Keith H Duggar <dug..._at_alum.mit.edu> wrote:
>
>
>
> > Of course if you are dealing with crippled domain support then names
> > are essential. But envision a system with rich type support. I'm
> > asking if, in that world, we even need to bother with attribute names
> > at all?
>
> I think a rich type system is desirable.  I suggest there should be as
> many types as necessary according to exactly where it is considered
> useful to test for equality.  No more and no less.
>
> For example, it isn't particularly useful to compare a person's weight
> with a person's height, so it would be reasonable to have specialised
> types like MASS_IN_KG and LENGTH_IN_METRES.  These are like copies of
> NUMBER, but the idea here would be for them to really be distinct
> types (i.e. no implicit conversions).  I think they should be strict
> subtypes of NUMBER.   They would inherit arithmetic operators from
> NUMBER, such as the signature
>
>     NUMBER  <-- NUMBER + NUMBER
>
> which appears to defeat the purpose somewhat.  E.g.  it is possible to
> add a MASS_IN_KG to a LENGTH_IN_METRES.  However this is safe because
> it returns NUMBER which cannot be implicitly downcast to a more
> specialised type.
>
> Ideally the type system would add all the appropriate specialisations
> of the operators (which for example are covariant on return type).
> E.g.   a plus operator that allows two masses to be added to give a
> mass.  Strictly speaking this is a distinct operator to the plus
> operator inherited from NUMBER, so operator overloading would be
> necessary to make this practical.
>
> A curiosity is that the super-type NUMBER doesn't play the role of a
> scalar (where by scalar I mean a dimensionless quantity).  For
> example, if one tried to define a scalar multiplication like this
>
>     MASS_IN_KG  <-- NUMBER * MASS_IN_KG
>
> one would find that any specialisation of NUMBER can serve as the
> scalar.   Instead one seems to need to define a common sub-type of all
> the specialisations of NUMBER, to serve the role of a dimensionless
> number in the type system.
>
> BTW it seems suspicious to me to ever perform equijoins on attributes
> on a type that approximates a dense set of numbers.
>
> I also think specialisations of STRING are quite useful, and happily
> the problem is much simpler.  E.g. we could define strict subtypes of
> STRING such as CITYNAME and SUPPLIERNAME.  This assumes we aren't
> interested in cases where a supplier's name happens to match a city's
> name.  i.e. we aren't using the database to answer trivia questions!
>
> These rich types will help protect the user from silly mistakes when
> constructing queries.  The user can still bypass the type checking by
> using explicit coercions, but it is expected that these are almost
> never required in practical examples.
>
> I think another important use of rich data types is to allow the DBMS
> to determine what joins are likely between attributes,  and this would
> seem important for example in the TransRelational Model by Tarin.
>
> Ok, now back to the subject of the thread.   Despite using a rich set
> of types I believe it is still necessary to use attribute names which
> serve as role names.  For example consider the external predicate
>
>     Supplier S is located in city L and dispatches products to city D.
>
> Consider furthermore that in that application it's considered useful
> to join on L=D.  In that case the attribute names L and D need to have
> the same type (e.g. CITYNAME).
>
> Introducing (alias) types named SUPPLIERLOCATIONCITY and
> SUPPLIERDISPATCHCITY doesn't seem a good idea to me.  I like to think
> of these as roles not types.
>
> Keith's suggestion seems similar to the Universal Relation idea.  In
> that case it is required that all roles across all relations be
> globally unique.  The goal is for logical independence - a user can
> write a query without needing to specify the access path in the sense
> of the explicit joins amongst the underlying relations.  That's a big
> advantage!
>
> However W.Kent in "Consequences of Assuming a Universal Relation",
> claims (amongst other things) that attribute names quickly proliferate
> and obfuscate (his words).  E.g. DATE-PROJECT-ASSIGNED-DEPT.

The more I read, the more I find myself warming up to this idea.

Dave, could your SUPPLIERLOCATIONCITY vs SUPPLIERDISPATCHCITY objection (and others like it) be resolved by having two kinds of subtyping - one allowing implicit coercion and the other not - e.g.

    TYPE CITYNAME SUBTYPE STRING EXPLICIT COERCION
    TYPE SUPPLIERLOCATIONCITY SUBTYPE CITYNAME IMPLICIT COERCION
    TYPE SUPPLIERDISPATCHCITY SUBTYPE CITYNAME IMPLICIT COERCION

Might that be a way of allowing the various kinds of city to be joined, while still flagging direct comparisons of SUPPLIERLOCATIONCITY to, say, COUNTRYNAME?
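To make the two kinds of subtyping concrete, here is a rough Python sketch of the comparability rule I have in mind. It is only an illustration under my own assumptions: the table layout, the function names, and the COUNTRYNAME declaration are all made up for the example, and no real DBMS is being described. Two types count as comparable when both can reach a common type through implicit coercions alone.

```python
# Hypothetical model: each type names its supertype and says whether
# coercion up to that supertype is implicit or explicit.
SUPERTYPE = {
    "CITYNAME": ("STRING", "explicit"),
    "COUNTRYNAME": ("STRING", "explicit"),  # assumed declaration, for contrast
    "SUPPLIERLOCATIONCITY": ("CITYNAME", "implicit"),
    "SUPPLIERDISPATCHCITY": ("CITYNAME", "implicit"),
}


def implicit_ancestors(type_name):
    """Return the type plus every supertype reachable by implicit coercion only."""
    reachable = [type_name]
    while type_name in SUPERTYPE:
        parent, mode = SUPERTYPE[type_name]
        if mode != "implicit":
            break  # an explicit coercion stops the implicit chain
        reachable.append(parent)
        type_name = parent
    return reachable


def comparable(left, right):
    """True if both types implicitly coerce to some common type."""
    return bool(set(implicit_ancestors(left)) & set(implicit_ancestors(right)))


# Both city roles meet at CITYNAME, so a join on them is allowed.
print(comparable("SUPPLIERLOCATIONCITY", "SUPPLIERDISPATCHCITY"))
# A city role and a country name only meet at STRING, which needs an
# explicit coercion, so this comparison gets flagged.
print(comparable("SUPPLIERLOCATIONCITY", "COUNTRYNAME"))
```

Under this rule a query comparing a city to a country would have to coerce both sides to STRING explicitly, which is the "flagging" behaviour asked about above.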

Admittedly, SUPPLIERDISPATCHCITY seems quite verbose for a type name - but perhaps that's just based on us having put up with (attribute name, simple type) pairings for so long.

In fact, it's a bit surprising that Date didn't push further in this direction himself. In his "Database in Depth" (pg 24), one begins to notice a distinct lack of elegance as he uses rich typing:

    PARTS = {PNO PNO, PNAME PNAME, COLOR COLOR, WEIGHT WEIGHT, CITY CHAR}

(One wonders why he chose type CHAR for CITY, when a CITY value should naturally be comparable, without explicit coercion, only to other CITY values and not to arbitrary strings of characters.)

Anyway, very intriguing idea, Keith.

Received on Fri Feb 19 2010 - 17:36:58 CET
