Re: One Ring to Bind Them

From: Eric Kaun <ekaun_at_prodigy.net>
Date: Mon, 19 Jul 2004 17:11:59 GMT
Message-ID: <zhTKc.2358$4L7.190_at_newssvr33.news.prodigy.com>


Anthony W. Youngman wrote:
> In message <40EDEBB2.8060707_at_prodigy.net>, Eric Kaun <ekaun_at_prodigy.net>
> writes

>> No formalization is needed; metadata is data. It's just data with a 
>> different domain, but there's no reason to think it obeys different 
>> laws or requires different structure.

> Yes - but metadata can be used by the database while data can't.

Not quite true - if a DBMS supports a user-extensible typing system, then it can "use" those types without understanding anything about them. This is where SQL and so many other DBMSs completely fall down:
1. by requiring the user to rely solely on the types already provided by the vendor
2. (the case with SQL today) making the type system so baroque as to be useless
3. (also the case with SQL and JDBC/ODBC/etc. today) making the embedding of data in programs so difficult as to force a compromise back to the lowest-common-denominator primitives again.

Or some combination of the three.
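
As a rough illustration of "using" a type without understanding it, here is a minimal Python sketch. Everything in it is hypothetical: the `Money` type, the `select` helper, and the choice to model a relation as a set of tuples are assumptions made for the example, not a description of any real DBMS. The only thing the toy "engine" asks of a value is equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """Hypothetical user-defined type; the 'engine' below never looks inside it."""
    amount_cents: int
    currency: str


def select(relation, **equals):
    """Toy restriction: keep tuples whose named attributes equal the given values.

    It relies only on ==, so it works for any attribute type that defines equality.
    """
    return {
        row for row in relation
        if all(dict(row)[name] == value for name, value in equals.items())
    }


# A relation modelled as a set of tuples; each tuple is a frozenset of
# (attribute, value) pairs, so attribute order carries no meaning.
prices = {
    frozenset({("product", "widget"), ("price", Money(250, "USD"))}),
    frozenset({("product", "gadget"), ("price", Money(250, "USD"))}),
    frozenset({("product", "gizmo"), ("price", Money(999, "USD"))}),
}

cheap = select(prices, price=Money(250, "USD"))
names = sorted(dict(row)["product"] for row in cheap)
print(names)
```

Swapping `Money` for any other hashable type with equality would leave `select` untouched, which is the sense of "orthogonal" I have in mind.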

>> I'm confused. How does placing it in an RDBMS make it no longer 
>> metadata? The system catalog (metadata - data about your data) can be 
>> represented relationally (or as XML if you're feeling masochistic).

>
> Because you've converted it to data! And the system catalog doesn't let
> you store ALL metadata AS metadata. It will only let you store metadata
> it recognises.

Of course. So are you saying that 1) lists should be commonly-understood "metadata", or 2) that Pick/MV let you extend the metadata recognized by the DBMS?

If you're saying #1, then I could argue as well for other types (and would say relation-valued attributes are far more powerful and useful than lists). If you're saying #2, then again the typing mechanism would help, though user-defined functions and views can aid somewhat.

What types of metadata does the DBMS need to recognize?

>> How does the RDBMS "understand" no meaning in it? And how do other 
>> DBMSs "understand" meaning? The constraints and relation definitions 
>> of the metadata are as much meaning as the RDBMS can have.

>
> In other words, an RDBMS is incomplete. :-)

Heh.

>>
>>> The ordering in a list is metadata. Convert that into a set to put 
>>> into  an rdbms and ORDER is now just a meaningless (as far as the db 
>>> engine is  concerned) bit of data.
>>
>> No, in that case order is gone, vanished. If you don't state it, the 
>> RDBMS doesn't know about it. On the other hand, it doesn't assume 
>> anything either. Order is easily represented, and again if you're 
>> masochistic, you can store a list-typed attribute.

>
> But if you DO state it, the RDBMS doesn't know anything about it,
> either! What do you mean by a "list-typed attribute"? Do you mean a
> column that contains ordering information?

No, I meant a single attribute that stores a list, much like in MV. The difference is that it's not "first order" to the database; user-defined types are orthogonal to relations.

So what does a Pick DB "know" that the RDBMS wouldn't? And how do you tell it?

>>> That's where MV and OO fundamentally differ. They try to *avoid* 
>>> converting metadata to data, so that the db engine can be intelligent 
>>> and take advantage of it to optimise things.

I've read this several times, and still don't know what you mean. How does OO avoid converting metadata to data? I'd say you're wrong; in Java you can use classes like Class, Constructor, Method, etc. to do "higher-order" operations, so the metadata is effectively converted to the same sorts of things you write your programs in (i.e. classes). The new JDK1.5 metadata will simply expand this; the metadata will still be accessible as "data".

Other languages do similar things (albeit in a much more elegant way than Java).
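
A small sketch of the same idea in Python rather than Java (the `Order` class is made up for illustration): the class, its constructor signature, and its methods are all ordinary values that a program can inspect and pass around, i.e. the metadata is available as data.

```python
import inspect


class Order:
    """Hypothetical class used only for illustration."""

    def __init__(self, customer, total):
        self.customer = customer
        self.total = total

    def describe(self):
        return f"{self.customer}: {self.total}"


# The class itself is an ordinary value that can be stored and queried.
cls = Order
params = list(inspect.signature(cls.__init__).parameters)
methods = sorted(
    name for name, member in vars(cls).items()
    if inspect.isfunction(member) and not name.startswith("_")
)

# "Higher-order" use: build an instance and look a method up by name.
order = cls("acme", 42)
result = getattr(order, "describe")()
print(cls.__name__, params, methods, result)
```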

> The whole point of a database is it STORES data, it does *not*
> UNDERSTAND data. By converting metadata into data, you are now forcing
> "intelligence" into the application.

No, you're forcing intelligence [sic] into the RDBMS. You're telling it what's allowed and what's not. What other meaning of "data definition" is there?

My ongoing gripe about declaration vs. procedure is based on descriptions of meaning. With procedural code, the meaning is implicit; if you're lucky, the code was written in a clear way, and you can see the meaning. With declarative, you don't guess (nor do you have to implement in an algorithmic sense). The language/engine/DBMS does the monkey work for you.
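
A minimal sketch of what I mean by declaring rather than coding, using Python's built-in `sqlite3` module as a stand-in for "the DBMS". The `line_item` table and its columns are hypothetical. The rules (quantity must be positive, one row per product per order) are stated once in the definition, and the engine does the checking; no application code tests them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE line_item (
        order_id INTEGER NOT NULL,
        product  TEXT    NOT NULL,
        quantity INTEGER NOT NULL CHECK (quantity > 0),
        PRIMARY KEY (order_id, product)
    )
""")
conn.execute("INSERT INTO line_item VALUES (1, 'widget', 3)")

# Neither row below is validated by hand; the declared constraints reject both.
rejected = []
for row in [(1, "gadget", 0), (1, "widget", 5)]:
    try:
        conn.execute("INSERT INTO line_item VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row)

count = conn.execute("SELECT COUNT(*) FROM line_item").fetchone()[0]
print(rejected, count)
```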

> A relational database thinks in terms of sets. In order to have a list,
> you need to create extra DATA, and the database itself can't take
> advantage of it, because it doesn't understand it.

Right, it understands relations and values; the types of those values are something different. But what exactly does it matter? You seem to be implying that lists are so useful as to be first-class citizens to the DBMS, and I say they're not; I'd prefer sets, for one thing (and no, from that standpoint, RDBMSs don't "do" sets either). Or even bags. Or perhaps relations themselves. Lists are in so many cases poor substitutes for a real data structure - witness the presence of "they-gotta-be-correlated" attributes in Pick files (e.g. a QUANTITY list-valued attribute and a PRODUCT list-valued attribute to store line item data for an order - better not lose the ordering, or an item in one of them, or you're hosed).
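
To show the failure mode, here is a small Python sketch. The record layout is only my approximation of the correlated-lists pattern, not actual Pick syntax, and the product names are invented. Removing an item from one list but not the other silently pairs the wrong values, whereas a set of (product, quantity) tuples has no positions to get out of step.

```python
# Correlated lists: position i in PRODUCT must match position i in QUANTITY.
order_mv = {
    "PRODUCT": ["widget", "gadget", "gizmo"],
    "QUANTITY": [3, 1, 7],
}

# Drop an item from one list and forget the other.
order_mv["PRODUCT"].remove("gadget")
paired = list(zip(order_mv["PRODUCT"], order_mv["QUANTITY"]))
# "gizmo" is now paired with gadget's quantity, and nothing complained.

# Relation-style: each line item is one tuple, so the pairing cannot drift.
line_items = {("widget", 3), ("gadget", 1), ("gizmo", 7)}
line_items = {(p, q) for (p, q) in line_items if p != "gadget"}

print(paired)
print(sorted(line_items))
```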

> DATA is what is stored IN a database. METADATA is data that is USED BY
> the database. There *is* a difference, and the difference is crucial.
> The more metadata you can leave as metadata, rather than convert to
> data, the more information the database has available to it to optimise.

That's ignoring what you mentioned earlier - the metadata that the DBMS can understand. Are you saying that the metadata needs to be left in so that later on, when the DBMS is extended in some way, it can now comprehend what previously meant nothing to it?

And again, the concept of metadata (at least in the discussion at hand) only has meaning in the context of datatypes. You seem to be saying that because lists are Very Important Things, the DBMS must "understand" them as metadata, in much the same way as it understands files and fields. I'm saying that's not needed, because you can define a List type which the RDBMS can manipulate like any other type you want to define, though if you want the benefit of relational manipulation (a good thing which would eliminate, for example, many, many lines of code), you must express the data relationally.

> How does an RDBMS optimise access to a list, if it doesn't have any
> understanding of what a list is?

So it's an optimization question? In short, it wouldn't - no more than it would optimize access to an Order type I've defined (including line items). Then again, if it were a relation-valued attribute, it could optimize that with the same machinery with which it optimizes the rest of the relations.

But again, the main point here is what's important and what's not. Whether lists deserve status as first-class DBMS citizens seems to be the point in question.

> That's the point of storing metadata *as* *metadata*. Because the
> database understands it.

It can only understand what it understands. What other types besides Lists need to "be" metadata?

- erk
Received on Mon Jul 19 2004 - 19:11:59 CEST
