Re: organizing tables?

From: Paul Vernon <paul.vernon_at_ukk.ibmm.comm>
Date: Tue, 22 Jul 2003 19:55:38 +0100
Message-ID: <bfk1g8$l30$1_at_gazette.almaden.ibm.com>


"Bob Badour" <bbadour_at_golden.net> wrote in message news:8YcTa.1150$K16.195515439_at_mantis.golden.net...
> > Switching our argument for a moment to the task of deciding wheter a given
> > update can be gurenteed to work, this paper would help make such a
decision
> > for say a INSERT INTO .. SELECT FROM update, where the target table has a
> > uniuqe constraint.
> >
> > Now as thier necessary and sufficent condition cannot always be tested
> > effiecntly, there will be some update statements will always work, but the
> > Saftey System will not let us spend the CPU to find that out (even if just
1
> > more cycle over the limit would have been enougth to make that decsion. -
> > that is the halting problem).
>
> Except that in the next release, due to an improved physical implementation,
> the dbms might reduce the CPU cost and suddenly the 'safety system' will
> change the logical outcome. The physical implementation must not affect the
> logical outcome.

Right, so in V4 of your dbms, a query always times out after 1 hour. In V5 your query now always returns in under the timeout limit. If that's a physical implementation affecting the logical outcome, then bring it on.

In V4 the query estimator part of the 'safety system' tells you your query will definitely breach the timeout limit and so doesn't even let it run, while in V5 the estimator gives you the green light for your query and runs it (and maybe or maybe not getting in under the limit) If that's a physical implementation affecting the logical outcome, then bring it on baby.

> Huh? A user must prepare to have any update fail and to react appropriately
> to the failure, because a logical change to the database might change an
> update guaranteed to succeed into an update with the potential to fail.

Logical changes (i.e. schema alterations) is for schema evolution to handle. Obviously this assumes that all 'repeatable' updates (and queries) are known the the database so that any proposed alteration of the schema can be verified. Obviously if there is no constraint to the effect that all schema changes cannot break an existing users query, then he has to accept that potential for fail. However a good practice is to give users advance notice of such changes...

> Rejecting updates that violate the predicate of the database is inherent to
> the model.

Only for individual updates against the current value of the database and only after (or during) the physical update.

I want to be able to classify the following updates into "Guaranteed to Work", "Guaranteed to Fail" or "Can't Say"

    A fixed (non-paramertised) update against *any* database value of a given database schema

    A parameterised update against *any* database value of a given database schema (for *any* parameter values)

and maybe

    A parameterised update against the current database value (for *any* parameter values)

plus maybe other conditions (e.g. take consideration of known database values that can only be changed by the current user - i.e. locks...)

> If your 'safety system' allows the user to proceed with 'unsafe' queries, I
> do not see what function it performs.

It allows you to go ahead with "Guaranteed to Work" and "Can't Say" queries/updates.

The dbms does not allow subset database views where hidden constraints exist that are not "Guaranteed to be Satisifed" for all possible updates for users of the subset database view.

If a user can see a constraint on a relvar, then it's OK (in general) for them to accept that an update might fail.
If a user can't see any constraint on a relvar, then it's not OK for them to accept that an update might fail IMO.

> Perhaps you are suggesting the dbms
> include a tool like 'lint' for C. I have no objection to such a dba tool,
> but it has nothing to do with namespaces or with the logical data model.

Remind me what lint does.

> > His set
> > of updates might switch in status from 'unproven' to 'proven', if he is
lucky,
> > but there would be no behaviour change (except, that is, for the speed of
> > update, if an update is guaranteed to work, the dbms can accept it and
return
> > control immediately to the user, making the physical update in the
background.
> > If the update might fail control can only be returned at the end of the
> > physical update).
>
> What would it do other than return control to the user?

It always returns, just not until the dbms can guarantee that it is performed (i.e. Logged). If an update is proven, you only have to log the update statement itself, not all the new tuple values. If DELETE FROM PARTS WHERE STATUS > 4 is proven to work (within some time limits), then why bother logging all the deleted tuples?

> > > One may come up with algorithms that transform some constraints such
that
> > > they no longer mention hidden relvars, but the general case is a hard
> > > problem. In fact, I suspect it is fairly easy to prove that one cannot
> > > describe some simple constraints without explicit reference to a
specific
> > > relvar.
> >
> > Quite possibly. Maybe the question is, if I ban such database views
> (subsets),
> > does this restrict users too much?
>
> I strongly suspect that identifying such views is also a hard problem.

Agreed. Does not mess my scheme up though. Just limits the number of database views that we can guarantee are valid.

> > Alternatively, do users value being able to
> > treat their database as absolutely a variable more than they value
flexibility
> > in the database views that can be constructed?
>
> Plenty of users would ask us to remove all those inconvenient constraints to
> variability, but that doesn't mean we should remove them.

They may do. Hard to say from this distance, but agreed; users don't always know what is best for them (even when they are willing to pay cash).

> > However, I suspect that the question really is: can we even have a self
> > consistent model unless every database view/subset IS a variable?
>
> Since every database view is a variable, I do not understand the importance
> of the question.

I have what seems to be a controversial definition of what a variable is:

  A variable is something that when set to a new value, that value is the one it gets set to.

  A variable is something that can be set to any value that is not constrained from not having.

  A variable is something that if set to a value that it is not constrained from not having, then that value is the one it gets set to.

A database that exhibits side-effects, or is a database view with hidden constraints is not a variable by my definition.

> > >Even if the security function prevents one from spelling out the
> > > constraint to a user, the dbms can still enforce it.
> >
> > I know, but I just don't think we should allow such database views. At
least
> > not until we're convinced that we absolutely have to.
>
> I am already convinced and will rely on the judgement of a designer to make
> the final decision. I think the onus is on you to demonstrate that your
> proposal solves a real problem and causes no harm.

Humm, yes what problem was I solving again? ;-) Lets see: Logical data indpendence, view updating, the idea of a database, users views of a database, security via data hidding, schema evolution, database program proving. You name it, this is (at least) a part of the solution.

[snip]
> > So if it can't decide (in some reasonable amount of time), then it should
> > reject the proposed database view.
>
> No, it must accept it. Rejecting it means a change in the implementation of
> the dbms will alter the logical outcome, and I reject any such proposal on
> principle: the principle of physical independence, to be exact.

I agree with physical independence (at least to the degree allowed by my first comments in this post).

The change in the outcome is

    before:     we could not decide if the view was safe - view not created
    after:        we now have the reourses/expertised to decide that the view
is safe - view is created

That no more breaks physical independence, that a faster processor changing the value of the statement.

    Query A runs in less than 1 hour

    before: false
    after: true

A different outcome, but not something that breaks physical independence.

For me 'logical practicality' dictates this once we understand the truths of computational complexity theory. Queries take time to run, proofs take time to run. Implementations can get better at both. Why say one is improvable but the other is not?

> > Implementations can then compete on the complexity of cases where they can
> > prove that no constraints are hidden in a database view. Simple.
>
> Physical implementations must not alter the logical outcomes. They can
> compete on performance. Your proposal sucks by any rational measure.

How does it harm a user that after an upgrade they can now define a database subset that they were not allowed to do before the upgrade?

Remember I'm not saying that 'normal' database updates start working that used to fail (short of those now coming in under resource limits whereas they before got canned), only database view definitions.

I see no problems. You seem to be denying that TIME exists (or at least that we're not allowed time limits because that breaks physical independence!!).

OTTH, if you are still not convinced, then why not treat a dbms upgrade like a schema change. I.e. revalidate all existing queries/updates to check for any now made invalid. I can't see why we should need that here. Can you give me an example where the positive change in ability to create a database view would impact an application?

> > After the upgrade, the dbms might now
> > allow the creation of database views that it did not allow before when it
> > couldn't
> > prove that no constraints were hidden.
>
> In other words, a change in the physical implementation changes what one can
> express a the logical level. Again, I reject that on principle, and I am
> surprised you would propose it in the first place.

I am just being 'logically practical'. Yours is a position that is only valid in the abstract world of maths where *time* can be ignored. Put a quantum computer on every desk and maybe I'll change my mind, but until then...

Regards
Paul Vernon
Business Intelligence, IBM Global Services Received on Tue Jul 22 2003 - 20:55:38 CEST

Original text of this message