Re: no names allowed, we serve types only

From: David BL <davidbl_at_iinet.net.au>
Date: Thu, 18 Feb 2010 19:44:45 -0800 (PST)
Message-ID: <8fe88800-e9f9-409c-bc30-06e136619c0d_at_s36g2000prh.googlegroups.com>


On Feb 15, 3:01 pm, Keith H Duggar <dug..._at_alum.mit.edu> wrote:

>

> Of course if you are dealing with crippled domain support then names
> are essential. But envision a system with rich type support. I'm asking
> if, in that world, we even need to bother with attribute names at all?

I think a rich type system is desirable. I suggest there should be as many types as necessary according to exactly where it is considered useful to test for equality. No more and no less.

For example, it isn't particularly useful to compare a person's weight with a person's height, so it would be reasonable to have specialised types like MASS_IN_KG and LENGTH_IN_METRES. These are like copies of NUMBER, but the idea here would be for them to really be distinct types (i.e. no implicit conversions). I think they should be strict subtypes of NUMBER. They would inherit arithmetic operators from NUMBER, such as the signature

    NUMBER <-- NUMBER + NUMBER

which appears to defeat the purpose somewhat. E.g. it is possible to add a MASS_IN_KG to a LENGTH_IN_METRES. However this is safe because it returns a NUMBER, which cannot be implicitly downcast to a more specialised type.

Ideally the type system would add all the appropriate specialisations of the operators (which for example are covariant on return type). E.g. a plus operator that allows two masses to be added to give a mass. Strictly speaking this is a distinct operator to the plus operator inherited from NUMBER, so operator overloading would be necessary to make this practical.
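To make the two plus operators concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the class names Number, MassInKg and LengthInMetres and the helper record_weight are made up for the example, and Python checks types at run time, so the overloading is simulated by dispatching on the operand types inside a single method.

```python
class Number:
    """Hypothetical stand-in for the NUMBER type."""

    def __init__(self, value):
        self.value = float(value)

    def __add__(self, other):
        if not isinstance(other, Number):
            return NotImplemented
        # Specialised operator: T <-- T + T when both operands share a type.
        if type(self) is type(other):
            return type(self)(self.value + other.value)
        # Inherited operator: NUMBER <-- NUMBER + NUMBER.
        return Number(self.value + other.value)


class MassInKg(Number):
    """Hypothetical strict subtype standing in for MASS_IN_KG."""


class LengthInMetres(Number):
    """Hypothetical strict subtype standing in for LENGTH_IN_METRES."""


def record_weight(mass):
    """Accept only a MassInKg; a plain Number is never downcast implicitly."""
    if not isinstance(mass, MassInKg):
        raise TypeError("expected MassInKg, got " + type(mass).__name__)
    return mass.value
```

Adding two masses gives a mass. Adding a mass to a length gives only a Number, which record_weight rejects unless it is explicitly rewrapped as MassInKg(...).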

A curiosity is that the super-type NUMBER doesn't play the role of a scalar (where by scalar I mean a dimensionless quantity). For example, if one tried to define a scalar multiplication like this

    MASS_IN_KG <-- NUMBER * MASS_IN_KG

one would find that any specialisation of NUMBER can serve as the scalar. Instead one seems to need to define a common sub-type of all the specialisations of NUMBER, to serve the role of a dimensionless number in the type system.
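A small Python sketch of the scalar problem, again with invented names (Dimensionless, scale_loose and scale_strict are assumptions for illustration only). Multiple inheritance is used here as one possible way to model "a common sub-type of all the specialisations"; a real type system might do it quite differently.

```python
class Number:
    """Hypothetical stand-in for the NUMBER type."""

    def __init__(self, value):
        self.value = float(value)


class MassInKg(Number):
    pass


class LengthInMetres(Number):
    pass


class Dimensionless(MassInKg, LengthInMetres):
    """Hypothetical common subtype of every specialisation of Number."""


def scale_loose(scalar, mass):
    # MASS_IN_KG <-- NUMBER * MASS_IN_KG
    # Any specialisation of Number is accepted as the "scalar".
    if not isinstance(scalar, Number) or not isinstance(mass, MassInKg):
        raise TypeError("expected (Number, MassInKg)")
    return MassInKg(scalar.value * mass.value)


def scale_strict(scalar, mass):
    # MASS_IN_KG <-- DIMENSIONLESS * MASS_IN_KG
    # Only the dimensionless subtype is accepted as the scalar.
    if not isinstance(scalar, Dimensionless) or not isinstance(mass, MassInKg):
        raise TypeError("expected (Dimensionless, MassInKg)")
    return MassInKg(scalar.value * mass.value)
```

With scale_loose a LengthInMetres slips through as the scalar, which is the curiosity described above. scale_strict rejects it and accepts only a Dimensionless value.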

BTW it seems suspicious to me to ever perform equijoins on attributes of a type that approximates a dense set of numbers.

I also think specialisations of STRING are quite useful, and happily the problem is much simpler. E.g. we could define strict subtypes of STRING such as CITYNAME and SUPPLIERNAME. This assumes we aren't interested in cases where a supplier's name happens to match a city's name, i.e. we aren't using the database to answer trivia questions!

These rich types will help protect the user from silly mistakes when constructing queries. The user can still bypass the type checking by using explicit coercions, but it is expected that these are almost never required in practical examples.
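As a sketch of how the STRING specialisations and the explicit coercions might look, here is a Python illustration. CityName, SupplierName and the helper equal are names I have made up; Python's built-in == on str subclasses would happily compare across them, so the check is done in a separate function.

```python
class CityName(str):
    """Hypothetical strict subtype standing in for CITYNAME."""


class SupplierName(str):
    """Hypothetical strict subtype standing in for SUPPLIERNAME."""


def equal(a, b):
    """Equality test that is only defined between values of the same type."""
    if type(a) is not type(b):
        raise TypeError(
            "cannot compare " + type(a).__name__ + " with " + type(b).__name__
        )
    return str(a) == str(b)
```

Comparing a SupplierName with a CityName is refused. The trivia question can still be asked by coercing explicitly, e.g. equal(CityName(str(supplier)), city).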

I think another important use of rich data types is to allow the DBMS to determine what joins are likely between attributes, and this would seem important for example in the TransRelational Model by Tarin.

Ok, now back to the subject of the thread. Despite using a rich set of types I believe it is still necessary to use attribute names which serve as role names. For example consider the external predicate

    Supplier S is located in city L and dispatches products to city D.

Consider furthermore that in that application it's considered useful to join on L=D. In that case the attributes named L and D need to have the same type (e.g. CITYNAME).
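The predicate above could be sketched in Python like this. The relation name Supplies, the helper functions and the sample rows are all invented for illustration. The point is that L and D share the type CityName, so the type alone cannot say which attribute is meant, while the role names can.

```python
from typing import NamedTuple, get_type_hints


class CityName(str):
    """Hypothetical strict subtype standing in for CITYNAME."""


class SupplierName(str):
    """Hypothetical strict subtype standing in for SUPPLIERNAME."""


class Supplies(NamedTuple):
    """Supplier S is located in city L and dispatches products to city D."""

    S: SupplierName
    L: CityName
    D: CityName


def attributes_of_type(relation, wanted):
    """Names of the attributes of `relation` declared with type `wanted`."""
    return [name for name, t in get_type_hints(relation).items() if t is wanted]


def dispatches_locally(rows):
    """Restriction L = D, legal because both roles have the same type."""
    return [row for row in rows if row.L == row.D]


# Made-up sample rows, for illustration only.
ROWS = [
    Supplies(SupplierName("Acme"), CityName("Perth"), CityName("Perth")),
    Supplies(SupplierName("Bolt"), CityName("Perth"), CityName("Sydney")),
]
```

Here attributes_of_type(Supplies, CityName) returns both L and D, so "the CITYNAME attribute" is ambiguous without the role names.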

Introducing (alias) types named SUPPLIERLOCATIONCITY and SUPPLIERDISPATCHCITY doesn't seem a good idea to me. I like to think of these as roles not types.

Keith's suggestion seems similar to the Universal Relation idea. In that case it is required that all roles across all relations be globally unique. The goal is for logical independence - a user can write a query without needing to specify the access path in the sense of the explicit joins amongst the underlying relations. That's a big advantage!

However W. Kent, in "Consequences of Assuming a Universal Relation", claims (amongst other things) that attribute names quickly proliferate and obfuscate (his words). E.g. DATE-PROJECT-ASSIGNED-DEPT.

Received on Fri Feb 19 2010 - 04:44:45 CET
