Re: organizing tables?

From: Paul Vernon <paul.vernon_at_ukk.ibmm.comm>
Date: Wed, 23 Jul 2003 14:23:48 +0100
Message-ID: <bfm2e5$680$1_at_gazette.almaden.ibm.com>


Bob, I think we're starting to talk past each other a little. Not too surprising, I think, as the framework I'm working with here is not easy to construct clearly over a series of newsgroup posts. I doubt many others would have followed the journey any better, though.

Some selected points however, as this is still interesting.

> you are assuming it will take a long time to evaluate the query, which is not
> necessarily true.

Agreed, but it is my position that all queries and updates must be OKed by the 'safety system'.
OLTP queries and updates will presumably mostly all sail right through, but they still need to be checked.

> Second, you are ignoring what I said previously that your
> proposed safety system would have to decide quickly whether to abort if it
> is to take the decision out of my hands.

The rules about aborting don't have to be simple. Lots of policies could be defined. A time limit might cause a question to be sent to the user asking if a query is to be aborted or left running. Whether a user has a quota, or some maximum resource usage, is again a policy matter. This is all detail.

What is not a policy decision, however, is that the dbms must have the ability to support such rules.
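To make that concrete, something like the following toy sketch (Python, with policy names and thresholds I've just invented for illustration) is all I mean by a 'rule' here:

    from dataclasses import dataclass

    @dataclass
    class Policy:
        max_estimated_seconds: float    # abort outright above this estimate
        ask_user_above_seconds: float   # between this and the max, ask
        user_quota_seconds: float       # per-user resource budget

    def admit(estimated_seconds, used_so_far, policy):
        """Decide what to do with a submitted statement:
        'run', 'ask-user' or 'reject'."""
        if used_so_far + estimated_seconds > policy.user_quota_seconds:
            return 'reject'                    # quota exhausted
        if estimated_seconds <= policy.ask_user_above_seconds:
            return 'run'                       # cheap: sails straight through
        if estimated_seconds <= policy.max_estimated_seconds:
            return 'ask-user'                  # let the user decide
        return 'reject'                        # beyond any declared limit

    # A 2-second OLTP update under a 1-hour quota:
    print(admit(2.0, 0.0, Policy(600.0, 60.0, 3600.0)))    # -> run

The point is only that admit() is explicit and declared, not baked invisibly into the engine.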

> As a designer, I might decide to support
> only partially an old version of an application. If any verification utility
> exists, it should warn me about behavioral changes, but it must not prevent
> me from instituting them.

Again, that is a policy decision (or more correctly, it depends on the constraints on the possible schema changes allowed). If a given designer is not the 'super-user', then he may not be allowed to break some existing applications. What is not a policy decision is that the dbms must have the ability to add such constraints on schema changes.

> > Obviously if there is no constraint to the effect that all schema
> > changes cannot break an existing users query, then he has to accept that
> > potential for fail. However a good practice is to give users advance notice of
> > such changes...
>
> Giving users advance notice is irrelevant to a discussion of namespaces and
> logical database partitions. Whether advance notice is given will depend on
> the urgency of the change and a host of other business requirements.

More policy.
The dbms must have the ability to implement various policies. E.g. that all schema changes to a subset of the databases must be declared to the database at least 24 hours in advance of implementation, and that this rule can only be overridden by MasterUserB, and then only after the insertion of an authorisation record by one of these job roles: CEO, COO, CFO....
Again, such policies are just constraints. Albeit constraints that reference USERs, TIME and (changes to) the database Schema.
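For illustration only, here is how that 24-hour policy might look as an explicit boolean constraint (Python; the relations `declared_changes` and `authorisations`, and all the names, are hypothetical):

    from datetime import datetime, timedelta

    NOTICE = timedelta(hours=24)
    OVERRIDE_USER = 'MasterUserB'
    AUTHORISING_ROLES = {'CEO', 'COO', 'CFO'}

    def schema_change_allowed(change, declared_changes, authorisations,
                              user, now):
        """The 24-hour-notice rule as a boolean constraint.
        declared_changes: {change: datetime it was declared}
        authorisations: set of job roles that inserted an
        authorisation record for this change."""
        declared = declared_changes.get(change)
        if declared is not None and now - declared >= NOTICE:
            return True                             # normal path
        return (user == OVERRIDE_USER               # override path
                and bool(AUTHORISING_ROLES & authorisations))

    now = datetime(2003, 7, 23, 14, 0)
    declared = {'DROP TABLE t': now - timedelta(hours=25)}
    print(schema_change_allowed('DROP TABLE t', declared, set(),
                                'AnyDesigner', now))    # -> True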

> > > Rejecting updates that violate the predicate of the database is inherent
> > > to the model.
> >
> > Only for individual updates against the current value of the database and
> > only after (or during) the physical update.
>
> Huh? Constraints are logical not physical.

Agreed. My point was that implementations often only tell you that an update fails once they get round to physically doing the update.

Logically it's the difference between rejecting an update based only on the statement and the schema, and rejecting an update based on the statement, the schema and the current value of the database. I guess I should have stated it that way.

> > I want to be able to classify the following updates into "Guaranteed to Work",
> > "Guaranteed to Fail" or "Can't Say"
>
> To what purpose? The general cost of doing so is prohibitive; although, the
> dbms might optimize away many constraint evaluations as unecessary.

That's one answer.
The main purpose is simply to prove the correctness of (some) relational expressions and updates (i.e. relational programs).
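As a toy illustration of the three verdicts (Python; the schema encoding and the checker are mine, and vastly simpler than a real prover would be):

    def classify(update, schema, current_value=None):
        """Return 'Guaranteed to Work', 'Guaranteed to Fail' or "Can't Say".
        update: ('insert', {attr: value, ...})
        schema: {attr: type}; the first attribute is the key.
        current_value: set of existing key values, or None when we
        only have the statement and the schema."""
        _, row = update
        if set(row) != set(schema):
            return 'Guaranteed to Fail'         # wrong heading
        for attr, value in row.items():
            if not isinstance(value, schema[attr]):
                return 'Guaranteed to Fail'     # type violation
        if current_value is None:
            return "Can't Say"                  # key check needs the value
        key = next(iter(schema))
        if row[key] in current_value:
            return 'Guaranteed to Fail'         # duplicate key
        return 'Guaranteed to Work'

    schema = {'emp_id': int, 'name': str}
    stmt = ('insert', {'emp_id': 7, 'name': 'Bob'})
    print(classify(stmt, schema))               # -> Can't Say
    print(classify(stmt, schema, {3, 7}))       # -> Guaranteed to Fail

Passing current_value=None is exactly the 'statement and schema only' case from a few paragraphs back.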

> I disagree. Side-effects and hidden constraints simply constrain the values
> a variable can become.

So side-effects and hidden constraints are constraints. Sounds good to me. Let's indeed make them constraints then.

> That does break physical independence. If I develop and test a design on V5
> software and we move to a more powerful computing platform where V4 has been
> ported but V5 has not yet been released, I don't want my design to fail for
> no valid logical reason. We are talking about the very essence of physical
> independence.

Humm. OK, the views have been validated on V5, so we know they will be OK on V4, even if V4 itself cannot make that validation. We might therefore want to allow a 'trusted override' for database views in the dbms. Maybe we would want to go so far as to say a given database view must have a signed certificate that it is valid. This certificate could be signed by any machine powerful enough (or with clever enough algorithms) to prove the validity. We could then imagine a supercomputer service that performs such certification for lesser machines. In reality, maybe we do end up trusting a designer as a 'get out clause' for use 'in an emergency only', but I would want no-one to take such a trust lightly. Even 'get out clauses' are part of the model, however...
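For illustration, a sketch of the mechanics (Python, using an HMAC as a stand-in for a real signature scheme; the key handling here is invented and obviously too naive for practice):

    import hashlib
    import hmac

    CERTIFIER_KEY = b'secret shared with the certifier'   # illustrative only

    def certify_view(view_definition: str) -> bytes:
        """On the powerful machine: prove validity (elided), then
        sign the exact text of the view definition."""
        # ... the expensive proof of validity would run here ...
        return hmac.new(CERTIFIER_KEY, view_definition.encode(),
                        hashlib.sha256).digest()

    def accept_view(view_definition: str, certificate: bytes) -> bool:
        """On the lesser machine: accept the view iff the certificate
        matches, without redoing the proof."""
        expected = hmac.new(CERTIFIER_KEY, view_definition.encode(),
                            hashlib.sha256).digest()
        return hmac.compare_digest(expected, certificate)

    cert = certify_view('VIEW v AS r JOIN s')
    print(accept_view('VIEW v AS r JOIN s', cert))    # -> True
    print(accept_view('VIEW v AS r JOIN t', cert))    # -> False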

> I am not denying anything about time. If I need an answer, I need an answer.
> I might prefer to have it sooner than later, but if I need it, I need it
> regardless.

I'm sorry to disillusion you, but if the answer you require would take the dbms you are using millions of years to compute, then you are indeed denying something about time.

Computations take time; I'm just trying to deal with that. For some computations we don't even know whether they will ever halt; again, I'm trying to deal with that reality rather than sweep it under the carpet.

> If I need to secure a relvar, I need to secure a relvar. If I
> need to introduce a side-effect, I need to introduce a side-effect.

If you need to break the logical model, then either the logical model *is* broken, or you don't actually *need* to break it at all.

> Only a fully informed designer is in a position to decide
> whether a specific compromise is acceptable.

Only the logical model defines what 'compromises' are possible. If the logical model allows something, then by definition it is not really a compromise any more. Designers can only work *within* the logical model. That, quite simply, is the definition of the logical model. If a designer *has* to move outside the model, then the logical model is not complete - it is not a valid logical model.

If you believe that designers have to move outside the model, then you must extend your model.

> Until the statement violates a logical constraint, the dbms has no logical
> reason to reject it.

Indeed. My 'time limit' is a logical constraint. The 'safety system' is part of the logical model. All policies of that system would be explicit constraints on the database.

Thinking about it now, I see the need to explicitly represent all queries and update statements as relations in the database. The information would include the submitted statements, some USER id, the time of submission, estimated cost, resources used so far, etc. The 'safety system' could then be defined logically as constraints (and maybe 'autonomous agent actions'...) over such relations (call them 'run time' relations ??).
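Something like this, say (a Python sketch; the attributes and the budget constraint are just my guesses at what such a relation might hold):

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class Submission:              # one tuple of the 'run time' relation
        statement: str
        user_id: str
        submitted_at: datetime
        estimated_cost: float      # optimiser's estimate, in seconds
        resources_used: float      # seconds consumed so far

    def within_budget(submissions, budget_per_user):
        """A constraint over the relation: each user's total resource
        use must stay within that user's budget."""
        totals = {}
        for s in submissions:
            totals[s.user_id] = totals.get(s.user_id, 0.0) + s.resources_used
        return all(total <= budget_per_user.get(user, 0.0)
                   for user, total in totals.items())

    subs = [Submission('SELECT ...', 'alice', datetime.now(), 1.0, 0.5)]
    print(within_budget(subs, {'alice': 10.0}))    # -> True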

>
> > Yours is a position that is only valid
> > in the abstract world of maths where *time* can be ignored.
>
> I am not the one pretending we can ignore computability theory. I am the one
> saying that, because we cannot ignore time, we must accept the non-ideal
> situation where a user won't know whether an update will work until the user
> actually tries to execute it. Before the user tries to execute the specific
> update, the cost of predicting success or failure is prohibitive

I suspect (and hope) that the cost would not usually, or even often, be prohibitive. And you can trade off the one-off cost of proving success (or not) for an update against the multiple times it will actually run. The situation is exactly analogous to the way current DBMSes will cache or pre-compile the access plan for a query rather than recalculating it for every execution of the query.

In today's terms, what I'm proposing is not much more than that our query optimisers go one step further and prove the correctness of queries when they can.
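In sketch form (Python; the verifier is stubbed out, and the schema_version key is there because a cached proof must be invalidated when the schema changes):

    proof_cache = {}    # (statement, schema_version) -> verdict

    def verified(statement, schema_version, prove):
        """Return a cached correctness verdict, proving it only once.
        prove: the possibly expensive one-off proof procedure,
        analogous to compiling an access plan."""
        key = (statement, schema_version)
        if key not in proof_cache:
            proof_cache[key] = prove(statement)    # pay the cost once
        return proof_cache[key]                    # reuse on every execution

    # The same statement run many times against schema version 42
    # is proved exactly once:
    print(verified('UPDATE emp ...', 42, lambda s: 'Guaranteed to Work'))
    print(verified('UPDATE emp ...', 42, lambda s: 'Guaranteed to Work'))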

> and after
> the user tries to execute it, it becomes necessary to evaluate success or
> failure regardless.

Regards
Paul Vernon
Business Intelligence, IBM Global Services