Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Paul Vernon <paul.vernon_at_ukk.ibmm.comm>
Date: Fri, 28 Feb 2003 18:39:44 -0000
Message-ID: <b3oagi$jna$1_at_sp15at20.hursley.ibm.com>


"Paul" <pbrazier_at_cosmos-uk.co.uk> wrote in message news:51d64140.0302280933.4ba19da0_at_posting.google.com...
> "Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message
news:<b3j115$kdq$1_at_sp15at20.hursley.ibm.com>...
> > > Maybe we need an implementation of the integer domain in DBMSs that
> > > will cover ALL integers i.e. given enough machine space any integer
> > > can be stored. So this would deal with the countably infinite in some
> > > logical sense, but we'd still have problems with the polar-cartesian
> > > mapping (which requires uncountable infinities).
> >
> > I'm with Jan. Our database logical model needs to be discrete and finite.
> > I think all our (non abstract) types need to be finite and our model just
> > needs to get over it and deal with it.
>
> Maybe a better word would be "unbounded" rather than "infinite".
>
> Although any relvar can only have a finite number of tuples at any
> given time, there is no theoretical upper limit; the number of tuples
> is unbounded.

Every implementation on a given piece of hardware at a given point in time has a limit on the number of tuples in a relvar.

I suggest that the model should recognise that fact, declare that there is a theoretical upper limit (in every given implementation) and make the value of that limit available in some variable or other in the DBMS.

FYI DB2 is limited to 4 * 10^9 rows in a table per partition, with a max of 999 partitions, so that's about 4 * 10^12 rows (although 'storage' might be the limiting factor first: every row has 10 bytes of overhead, so a table with a single CHAR(1) column would be at least 40TB with that many rows).
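To put rough numbers on that, a quick back-of-the-envelope check in Python, using only the figures quoted above:

# Back-of-the-envelope check of the DB2 figures quoted above.
rows_per_partition = 4 * 10**9   # quoted per-partition row limit
max_partitions = 999             # quoted maximum number of partitions
row_overhead_bytes = 10          # quoted per-row overhead
char1_bytes = 1                  # payload of a single CHAR(1) column

max_rows = rows_per_partition * max_partitions
min_table_bytes = max_rows * (row_overhead_bytes + char1_bytes)

print(f"max rows       ~ {max_rows:.1e}")                     # ~ 4.0e+12
print(f"min table size ~ {min_table_bytes / 10**12:.0f} TB")  # ~ 44 TB, i.e. 'at least 40TB'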

> So by analogy, if it's good enough for relvars to have an unbounded
> number of tuples,

I don't believe that it is good enough.

Let's say I want my DBMS to store in a relation as many prime numbers as it can possibly store. To do that (although I'll admit I'm not sure why I might want to) I would need to know what its upper limit on the number of tuples in a relvar was.

Alternatively, without knowing the upper bound on the cardinality of a relvar, how do I know what INTEGER datatype is sufficient to count the tuples in any relvar? If we have bounded domains, then by this logic we also need bounded relvars.

Aside: DB2 introduced its 64-bit BIGINT mostly to count the rows in tables with > 4.3 billion rows.
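To make the arithmetic explicit (again a quick Python check, using the 4 * 10^12 bound from above):

import math

# How wide must a counter be to hold COUNT(*) over a relvar whose
# cardinality is bounded at roughly 4 * 10^12 tuples?
max_cardinality = 4 * 10**12

print(math.ceil(math.log2(max_cardinality + 1)))  # 42 bits needed
print(max_cardinality <= 2**31 - 1)   # False: a signed 32-bit INTEGER overflows
print(max_cardinality <= 2**63 - 1)   # True: a 64-bit BIGINT is ample
print(2**32)                          # 4294967296, the ~4.3 billion figure above

So with relvar cardinality bounded at around 4 * 10^12, a 32-bit INTEGER cannot even count the tuples, which is exactly the BIGINT situation above.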

> why shouldn't it be OK for domains to be able to
> have an unbounded number of values? i.e. although we can only ever
> consider finite subsets of the domain, there are an infinite number of
> these subsets. The domain is never realised anyway so we are never
> going to have to deal with the infinite.

Maybe there is a good case for infinite, unrealisable domains in the database model. TTM talks about dummy types; others call them abstract or noninstantiable types. Or maybe Jan's suggestion of two logical models, a physical logical model and a database logical model, is better (or complementary).

> > For queries, I could comprehend delaying the specification of the size
> > of a type until the query compile time, at which point the DBMS decides
> > upon a maximum amount of storage that the query can consume, and from
> > that calc (plus say requested response time) fixes the size of any types
> > in the query as needed. In other words I could accept a type whose upper
> > bound was 'storage' just as long as each time that type was used, a
> > particular value for that bound was fixed. What I do not accept in our
> > database logical model is any concrete type whose size is 'infinite'.
> > Such a type is just going to RAISE_ERROR (of some kind) when any use of
> > it exceeds the storage capacity of the machine anyway, so there is no
> > logical difference between an 'infinite' type and a 'storage' type fixed
> > at query time.
>
> I realise that physical implementation considerations come into play;
> maybe you could hint to the DBMS that your attribute will almost
> certainly be under a given value. Then for low values the DBMS would
> use efficient but less flexible storage/computation methods. But if by
> chance the value did go over the limit it could still struggle on
> (maybe warning the user that performance is affected). So the errors
> are only generated in response to physical limitations of the machine,
> not of the logical model.

That kind of thing is an implementation issue and has nothing to do with the model. I would agree, however, that a good implementation would take into consideration the likely distribution of values in an attribute.

E.g. I'd love to be able to have a logical domain of, say, CHAR(800), but, knowing that 99% of my values will be less than 20 characters, map it to a physical type of, say, FIXED_VAR_CHAR(20,780).

Nonetheless my logical type would be CHAR(800). Or maybe my logical type would be CHAR('Storage'), where 'Storage' could be calculated any time I interfaced with the database and requested to use the value of 'Storage'. I guess this would mean that my database schema was logically changing each time, which might be a bit tricky, but then time does not seem to come into the logical model much at all, so maybe there is no big issue there.
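A purely illustrative sketch of that hypothetical mapping (FIXED_VAR_CHAR is my invented name, and this Python class is nothing more than a toy): values of up to 20 characters live entirely in a fixed-width slot, only the rare longer values touch an overflow area, and the logical type stays CHAR(800) throughout.

# Toy sketch of the hypothetical FIXED_VAR_CHAR(20, 780) storage mapping.
# The logical type is still CHAR(800); only the physical layout differs.
class FixedVarChar:
    def __init__(self, fixed_len=20, overflow_len=780):
        self.fixed_len = fixed_len
        self.max_len = fixed_len + overflow_len
        self.fixed_slots = []   # one fixed-width slot per row
        self.overflow = {}      # row id -> overflow tail (sparse; rarely used)

    def store(self, value):
        if len(value) > self.max_len:
            raise ValueError("value exceeds the logical CHAR(800) bound")
        row_id = len(self.fixed_slots)
        self.fixed_slots.append(value[:self.fixed_len])  # fixed-width part
        if len(value) > self.fixed_len:
            self.overflow[row_id] = value[self.fixed_len:]
        return row_id

    def fetch(self, row_id):
        return self.fixed_slots[row_id] + self.overflow.get(row_id, "")

col = FixedVarChar()
rid1 = col.store("Hursley")
rid2 = col.store("a value rather longer than twenty characters")
print(col.fetch(rid2))     # round-trips intact
print(len(col.overflow))   # 1: only the long value used the overflow area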

Regards
Paul Vernon
Business Intelligence, IBM Global Services