Re: One Ring to Bind Them

From: Anthony W. Youngman <wol_at_thewolery.demon.co.uk>
Date: Fri, 9 Jul 2004 23:25:13 +0100
Message-ID: <KHRs8yBJtx7AFwTe_at_thewolery.demon.co.uk>


In message <zBAEc.3447$IQ4.1222_at_attbi_s02>, Marshall Spight <mspight_at_dnai.com> writes
>"Dawn M. Wolthuis" <dwolt_at_tincat-group.comREMOVE> wrote in message
>news:cbt44u$8vd$1_at_news.netins.net...
>> "Marshall Spight" <mspight_at_dnai.com> wrote in message
>> news:nnIDc.125667$Sw.113988_at_attbi_s51...
>>
>> I'm not sure whether this answers your question as it depends on what you
>> mean by "relationship" but here is another type of relationship -- each
>> file(function/entity) requires a unique identifier for each record
>> (instance/row-ish) so that you have this relationship for a file named
>> People, for example
>>
>> People(12345)={all attributes of this person including those stored directly
>> as part of the People function and those derived via links to other
>> functions}
>
>Gotcha.
>
>
>> Another type of relationship it understands is a link placed in a "virtual
>> field" for derived data. So, even if the street address for People(12345)
>> is not part of the "base relation" (is not stored "in" People), the function
>> to link a foreign key to another file is a relationship that is understood.
>>
>
>Let me see if I understand this. You have a "file" of People and it might
>have, directly in it, a field that is a list of addresses, so we have one:many
>for People:Addresses.
>
>In another scenario, you might have a file People, and it would have
>directly in it a virtual field, whose value is a key into another file.
>The fact that it's virtual is a metadata bit, and the file being referenced
>is also metadata. Again, one:many for People:Addresses.
>
>The difference between a virtual field and a non-virtual field is
>one of implementation; the interface is the same either way. (Yes? No?)

Not quite. Yes, the interface is the same, but your first example would have a PEOPLE file with an ADDRESS datafield.

The second example would have a PEOPLE file with an ADDRESS-KEY datafield and an ADDRESS virtual field. From the point of view of the person using the query language, they would neither know nor care that the two ADDRESS fields are fundamentally different "under the bonnet".
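
To make the difference concrete, here is a minimal sketch in Python. It is not how any real MV database is implemented; the class File, its methods, and the field names (ADDRESS_KEY, STREET) are all made up for illustration. It only shows that a stored field and a virtual field can sit behind the same query interface.

```python
# Hypothetical sketch only: File, define_virtual and list_fields are
# illustrative names, not the API of any real MV product.

class File:
    def __init__(self, name):
        self.name = name
        self.records = {}   # primary key -> dict of stored fields
        self.virtual = {}   # field name -> function(record) -> derived value

    def write(self, key, **fields):
        self.records[key] = dict(fields)

    def define_virtual(self, field, fn):
        self.virtual[field] = fn

    def get(self, key, field):
        record = self.records[key]
        if field in self.virtual:
            # Derived on the fly; the caller cannot tell the difference.
            return self.virtual[field](record)
        return record[field]

    def list_fields(self, *fields):
        # Rough analogue of "List People Name Address".
        return [tuple(self.get(key, f) for f in fields)
                for key in sorted(self.records)]


# First example: ADDRESS is stored directly in PEOPLE.
people_a = File("PEOPLE")
people_a.write("12345", NAME="Ann", ADDRESS="1 High Street")

# Second example: PEOPLE stores ADDRESS_KEY, and ADDRESS is a virtual
# field that follows the key into a separate ADDRESSES file.
addresses = File("ADDRESSES")
addresses.write("A1", STREET="1 High Street")

people_b = File("PEOPLE")
people_b.write("12345", NAME="Ann", ADDRESS_KEY="A1")
people_b.define_virtual(
    "ADDRESS", lambda rec: addresses.get(rec["ADDRESS_KEY"], "STREET"))

result_a = people_a.list_fields("NAME", "ADDRESS")
result_b = people_b.list_fields("NAME", "ADDRESS")
```

Both result_a and result_b come out the same, which is the point: the person writing the query sees one ADDRESS field either way.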
>
>> So, once that virtual field is defined, I can ask the database to
>>
>> List People Name Address
>
>Uh, "List" is a command, "People" is the file, and are Name and Address
>fields of the file people? (Whether virtual or not?)
>
>These files are functions because you are required to have a primary
>key, so the file is a function from <primary key domain> to
><field range>. Are you limited to having a single field that is marked
>unique?
>

Integrity-wise, the only uniqueness that the database itself enforces is the primary key. Yes, this could be improved on ...
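
A rough sketch of what that means, again in Python and again with made-up names (the EMAIL field and emails_are_unique are assumptions for illustration). The file is modelled as a mapping from primary key to record, so key uniqueness comes for free, and anything else has to be checked by whoever cares about it.

```python
# Hypothetical model: a file as a mapping from primary key to record.
people = {}

people["12345"] = {"NAME": "Ann", "EMAIL": "ann@example.com"}

# A different key with the same EMAIL is accepted without complaint.
people["67890"] = {"NAME": "Bob", "EMAIL": "ann@example.com"}

# Writing to an existing key replaces the record; it can never create
# a second record with the same key.
people["12345"] = {"NAME": "Ann B.", "EMAIL": "ann@example.com"}


def emails_are_unique(file):
    """Application-level check that the database itself does not do."""
    emails = [record["EMAIL"] for record in file.values()]
    return len(emails) == len(set(emails))


unique = emails_are_unique(people)
```

Here the file ends up with two records, and unique is False because nothing stopped the duplicate EMAIL going in.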
>
>> > Another example is managing data integrity in procedural application
>> > code. In RM this is considered a "stupid database trick" to quote from
>> > another thread. There are significant disadvantages to application-managed
>> > integrity rules, to the point where I do not consider it an approach
>> > worth discussing (and yes, I've used that approach in the real world.)
>> > However, it may be that this approach has lower overhead in situations
>> > where you have small development teams and single-application databases.
>>
>> I think I agree in principle that we do not want constraints in application
>> code, but would add that we don't want them stuck in the proprietary
>> database language, inaccessible to the application either.
>
>Yes, we've discussed this before, and I believe we agree that it's important
>that constraints be available to applications.
>
>
>> The odd thing is
>> that it really "seems like" the cause and effect are different -- you GET
>> smaller development teams when you use this approach and that is concerning
>> to me.
>
>I didn't quite follow this.
>

Putting constraints in the app, not the database, leads to smaller development teams.
>
>> Something is decidedly less expensive in terms of time for
>> maintaining and having the constraints in the same language as the rest of
>> the application just might be one of the keys to that.
>
>I'd buy that in a second. But I still want my constraints enforced (at least)
>centrally.
>

So would I :-) But I want my constraints *optional*.
>

>> > Please be specific. I am very interested in specific examples of specific
>> > operations or structures that you feel are hard to solve with RM or SQL
>> > and easy to solve with MV. I do believe there are some, but I want
>> > to know better what they are. As it stands I have a hard time evaluating
>> > the claims of the MV people, even the smart/nice ones such as you
>> > and Dawn. I'm not saying I believe, and I'm not saying I disbelieve.
>> > I just want to hear more specifics.
>>
>> But you see, I have a hard time evaluating the claims of people like me. I
>> don't have proof.
>
>I'm not asking for proof. I know you care a lot about proof, but I don't
>so much. Right now I'm more interested in hearing a lot of people's stories.
>So if you have use-cases for situations where you feel MV is better than
>the relational approach, I'm happy to hear them.
>

Well, you saw my example about the Australian breweries? Where one brewery stole a march on the rest and hammered the lot in the market place - apart from the one MV-based brewery that responded quickly enough to ride up with them?
>
>> I am very confident that I can find aspects of the
>> relational model that are not based on either mathematics or science (we've
>> had many such discussions in the past half year). I do not have any
>> scientific evidence that models other than relational have anything better
>> going for them. I have personal experience that is insufficient as proof
>> and a collection of anecdotes.
>
>Bring on the anecdotes!
>

The Witwatersrand study that said MV-based companies spent *half* the money that relational-based companies did on their databases.

The experience of MV practitioners involved in conversions from MV to relational - they *ALL* say that any company escaping with *just* a *doubling* in head count (plus the same in licence fees) has got off very lightly cost-wise.

The story I like, where consultants spent SIX MONTHS tuning a complex query so's it ran faster than the MV system it was replacing. When they crowed to management that the new system was 10% faster than the old one, they were brought down to earth with a big bang: the guy supporting the MV system pointed out that it was running on an ancient P90, while the new system was a twin Xeon-800 box, and surely that should be able to do better than just 10%? (Oh - and I'm prepared to bet dollars to cents that the MV query wasn't optimised AT ALL.)
>
>> I'm in search of better science on the
>> matter and a mathematical model that is as useful to the practitioner as the
>> RM.
>>
>> How do you think we could get evidence?
>
>Give me ten million dollars and 5 years and it should be no problem.
>Since I have neither, I'm willing to forego the whole proof thing.
>

Well, the first thing you'd have to do is find some way of showing that "data == tuple". It's all very well the relational model *asserting* that it is, but unless you've got some real-world conjecture that links the two, you're going to get nowhere.

Science has recently been surprised by the apparent existence of 5-quark particles (pentaquarks). I think investigating the relationship between "real data" and "relational tuples" (in other words, trying to formalise business analysis) might provide a few (to say the least) surprises ...
>
>> It seems to me that a class of
>> databases that advance the "older" approaches of Cache' and PICK could beat
>> today's SQL databases in a number of categories. How could I prove that
>> starting with PICK would be better than starting with SQL Server if we want
>> to provide highly scalable but relatively inexpensive and agile software
>> development environments in the future?
>
>I have serious doubts about the scalability claim, but then I have an
>extreme view of scalability which has been skewed by my workplace.
>However I can believe the agile part.
>

Anecdotally ... but there are apparently some pretty huge MV databases out there, and they haven't hit problems yet. At least, not ones attributable to the database - maybe the hardware isn't powerful enough, but relational would have hit the same problems a lot harder AND sooner.

Or the problems are redundancy, hardware scalability, what have you, but all of those are things external to the database.
>
>> It seems the best I can do is prove
>> that the relational model is not purely mathematics, but contains some
>> amount of religious claims.
>
>If I just stipulate that, will it help?
>
>Any time we are building a model, what we are doing is making design
>choices. It is good if these choices are consistent with good mathematics,
>but even if we completely succeed at that, it doesn't mean we are
>doing math and not design. It's always design.
>
>And there's not just one math, either. You come up with a formalism,
>and if it is useful, then we rejoice. It's certainly possible for a formalism
>to be completely sound and self-consistent and utterly useless.
>

Exactly. So you see why we object when people say "relational MUST be right because it's based on mathematics". It's formal, sound, self-consistent, and ... :-)
>
>Marshall
>

Cheers,
Wol

-- 
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999
Received on Sat Jul 10 2004 - 00:25:13 CEST
