Re: Indexes and Logical design

From: David Cressey <david.cressey_at_earthlink.net>
Date: Thu, 15 Sep 2005 15:00:12 GMT
Message-ID: <00gWe.12709$FW1.7772_at_newsread3.news.atl.earthlink.net>


"Roy Hann" <specially_at_processed.almost.meat> wrote in message news:YPWdnUoyif86rLTeRVnyvg_at_pipex.net...
> "David Cressey" <david.cressey_at_earthlink.net> wrote in message
> news:Qq9We.12635$FW1.9976_at_newsread3.news.atl.earthlink.net...
> >
> > "Roy Hann" <specially_at_processed.almost.meat> wrote in message
> > news:UqidndyZjK3RYbrenZ2dnUVZ8qqdnZ2d_at_pipex.net...
>
> > Due to what might be a historical accident, the model of the database
> > made visible to the programmers has been called the "logical model", in my
> > experience.
>
> Actually that is something I have been fretting about recently. In the
> real (i.e. SQL) world there are actually two physical models: the one the
> programmers are required to see (for whatever reason), and the ones they
> are required (or asked) to pretend they don't see.
>

Good point. For this reason, I've recently turned away from calling the SQL model "logical" at all, and instead saying that the tables are the reflection of the logical model in the implementation. That leaves open the question of what the indexes really are. If I have to say something that makes sense to me, I'm going to say that the indexes reflect the transaction model in the implementation. However, so do things like table partitions.
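
To make that concrete, here's a rough sketch in SQL (the table and index names are made up, not from any real schema). The table definition is the logical model reflected into the implementation; the index exists only for the sake of the transactions, and dropping it changes no query's meaning.

    -- The table is the logical model's reflection in the implementation...
    CREATE TABLE orders (
        order_id     INTEGER NOT NULL PRIMARY KEY,
        customer_id  INTEGER NOT NULL,
        order_date   DATE    NOT NULL
    );

    -- ...while the index (and likewise a partitioning scheme) reflects the
    -- transaction model: it makes certain access paths fast, nothing more.
    CREATE INDEX orders_by_customer ON orders (customer_id);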

At this point, we really have to get down to "what's wrong with SQL". It seems as though the relational data model is capable of greater logical data independence than is possible for programmers who see the RDM through the lens of SQL.
But I'll defer to others on that question.

> The one they are required to see might have extra tables that were
> introduced when vertical fragmentation is the only solution to a
> performance problem. Or it might have fewer tables than the logical model,
> because (rightly or wrongly) the design has been denormalized (again for
> performance reasons). There are innumerable other reasons why extra tables
> are introduced, and others merged. But the tables in this model have to be
> known and explicitly manipulated by the SQL the programmers write, or at
> least be capable of it.
>

> The other physical model (the one the programmers are supposed to ignore)
> is the one in which the secondary indices are exposed, and maybe the base

I'm assuming that by "secondary indexes" you mean the ones that don't emerge directly from the definition of primary keys. And this reminds me of a subtle difference between Rdb/VMS and Oracle RDBMS that has tripped up more than one Oracle RDBMS expert who has been asked to work on Rdb/VMS. When you declare a primary key in Oracle, Oracle makes an index for you. In Rdb/VMS it does not (unless that's been added in recent versions). The consequence is that you can end up with tables that have no indexes at all, something the designer never would have left that way intentionally. But all this is a product-specific digression.
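
To illustrate with a made-up table: in Oracle, the PRIMARY KEY declaration by itself gets you a supporting index; in Rdb/VMS, as I remember it, nothing appears until you ask for it.

    CREATE TABLE employee (
        emp_id    INTEGER NOT NULL PRIMARY KEY,
        emp_name  CHAR(60)
    );

    -- Oracle quietly builds a unique index behind the PRIMARY KEY constraint.
    -- In Rdb/VMS (at least back then) you had to create it yourself:
    CREATE UNIQUE INDEX employee_pk_idx ON employee (emp_id);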

> OK so far. Or at least I choose not to argue about it.

That's EXACTLY my position. I choose not to argue about it. I choose not to argue about it with the logical thinkers in this forum. Have it your way. I also choose not to argue about it at the places where I was called in to lend my expertise. Have it your way. And if the programmers at client sites and the logical thinkers in here desperately need an argument clinic, why, I'll simply introduce them to each other!

>
> > Besides, knowing the indexes helps the programmers
> > to write better queries. [snip]
>
> This does bother me. What kind of rubbish DBMS requires the programmers
> to be aware of an index in order to write "better queries"? Assuming we're
> talking about SQL here, the only way knowledge of an index can help them
> write better code is by involving the index explicitly in a FROM clause.

It's even worse than you imagine. I can't tell you the number of times that a programmer has approached me with a query that has a "performance problem" they wanted help with. It turns out that the query has a logical error in it, and the logical error causes it to deliver wrong results 10 to 50 times more slowly than it should. When you fix the logical error, the "performance problem" mystically disappears.

Why programmers want to speed up a query that's giving wrong results is a continuing mystery to me, and I don't expect to ever understand it. But, by golly, show them all the indexes, and teach them how to tell which indexes the optimizer invokes, and they find a lot more of their own bugs than they did before, in the course of making their programs perform well.
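
A typical specimen, with tables I've made up for the purpose: the join predicate is missing, so the query grinds through a partial Cartesian product and delivers wrong answers slowly.

    -- Wrong results, delivered slowly: no join between customers and orders.
    SELECT c.cust_name, o.order_date
    FROM   customers c, orders o
    WHERE  o.order_date > '2005-01-01';

    -- Put the join back and the "performance problem" vanishes along with
    -- the wrong answers.
    SELECT c.cust_name, o.order_date
    FROM   customers c, orders o
    WHERE  c.cust_id = o.cust_id
    AND    o.order_date > '2005-01-01';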

Explaining this behavior to logical thinkers is like Mr. Spock explaining Earth to his fellow Vulcans.

> what happens when the DBA decides the index is not efficient and drops it?
> What happens when the optimal execution strategy would be to ignore the
> index, but there it is, hard-coded into the SQL?
>

> But perhaps I am misunderstanding you. Perhaps when you say "knowing the
> indexes", what you mean is "knowing the alternate physical keys"?

See above.

> The
> alternate keys definitely belong in the first of my physical models above,
> not with the indexes in the second. By referring to one of these in the
> WHERE clause the programmer certainly could write better code. And if the
> alternate keys happen to coincide with candidate keys (as they easily could)
> then they already appear in the logical model. (Of course an alternate
> physical key need not be a candidate key--it might be deliberately
> ambiguous, to encourage physical clustering of related data.)
>

This all depends on how smart the optimizer is. Back in 1994, the Rdb optimizer was a LOT smarter than the Oracle optimizer. I got spoiled by the Rdb optimizer. That's a whole other discussion.
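
Roy's point about hard-coding is real enough in Oracle, by the way. A hint like the one below (the index name is invented) bakes the choice of index right into the statement; as I understand it, if the DBA later drops that index the hint is just silently ignored.

    SELECT /*+ INDEX(e emp_city_idx) */ e.emp_name
    FROM   employee e
    WHERE  e.emp_city = 'Boston';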

> I really am not happy about programmers calling my first physical model a
> "logical model", but what they call it doesn't change what it is. And if
> they call it the wrong thing, perhaps that's because in SQL there are two
> physical models but only one name. They just *can't* make clear which one
> they mean.

Maybe the best distinction (in the terminology) is the one Data Architect makes: between "Schema Objects" and "Database Objects". But I'm not sure.

This opens a whole other subject. If a database gets "denormalized" in the course of implementation, perhaps it's good to keep a copy of the original normalized model around, and in synch with whatever alterations get made to the implementation. That way, someone who wants a REAL logical view of the relations can get such a view without the obstructions imposed by SQL and by real-world considerations. Is it worth the cost? I don't know.
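
One way I can imagine doing that (sketched below with invented names) is to keep the normalized relations around as views over the denormalized base table, so that anyone who wants the real logical view can query it without caring how the data is actually stored.

    -- The implementation stores a denormalized order table:
    --   order_denorm (order_id, order_date, cust_id, cust_name, cust_city)
    -- Views re-present the original normalized relations:
    CREATE VIEW customer AS
        SELECT DISTINCT cust_id, cust_name, cust_city
        FROM   order_denorm;

    CREATE VIEW order_header AS
        SELECT order_id, cust_id, order_date
        FROM   order_denorm;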

Received on Thu Sep 15 2005 - 17:00:12 CEST