Re: Object Oriented Wrapper classes to SQL statements

From: Clifford Heath <cjh_nospam_at_osa.com.au>
Date: Fri, 14 Dec 2001 09:52:59 +1100
Message-ID: <3C19314B.A1B6DD89_at_osa.com.au>


David Cressey wrote:
> I think, although I could be wrong, that heuristics played a vital role in
> the optimizer's behavior. First off, there were
> two parts to the optimizer, the strategy generator, and the strategy
> evaluator. I believe the strategy generator used heuristics to come up with
> strategies that might be worth evaluating. And of course the strategy
> evaluator used clever ways of estimating what the probable cost of a
> strategy might be.

This is just an application of what AI folk call "heuristic search", and it's all related to game trees etc - like playing chess! There's a whole theory about how aggressive your heuristic can be while still coming up with a near-optimal solution; how far from optimal you're likely to be if you go too aggressive; and what the effects are of cutting the search short (when is a plan "good enough"?). Etc...
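To make the chess analogy concrete, here's a rough Python sketch of that kind of search over join orders -- toy cost model, invented cardinalities, nothing like a real optimizer's bookkeeping -- using branch-and-bound pruning: abandon any partial plan that already costs more than the best complete plan found so far.

```python
# Toy join-order search with a heuristic cost model and pruning.
# The tables, cardinalities and selectivity are all invented.
CARD = {"A": 1000, "B": 100, "C": 10}   # table cardinalities
SEL = 0.01                              # assumed selectivity of every join

def best_plan(tables):
    """Depth-first search over left-deep join orders, cutting off any
    partial plan whose cost already exceeds the best complete plan --
    the same prune-the-game-tree idea as in chess programs."""
    best_cost, best_order = float("inf"), None
    # each stack entry: (join order so far, cost so far, current row count)
    stack = [([t], 0.0, CARD[t]) for t in tables]
    while stack:
        order, cost, rows = stack.pop()
        if cost >= best_cost:
            continue                    # prune: can't beat the incumbent
        if len(order) == len(tables):
            best_cost, best_order = cost, order
            continue
        for t in tables:
            if t not in order:
                new_rows = rows * CARD[t] * SEL
                # cost model: total size of the intermediate results
                stack.append((order + [t], cost + new_rows, new_rows))
    return best_order, best_cost
```

With these numbers the search settles on joining the small tables first, and the pruning means the expensive orders starting from table A are never fully explored.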

> The value of using an index might depend on something
> called the "fan out factor", as well as the underlying cardinalities.

When I implemented B-Trees with key value compression, it occurred to me that the number of bytes compressed from the prefix of a key value tells you at which byte the key value differs from its predecessor (duh), and that can be used as an indication of the likely selectivity of a prefix match of a given length. In other words, you can easily keep count of how many key values differ in each of their first n bytes, for many values of n. This might be exactly what the DBMSs do, I don't know, but it seems that this statistics-gathering isn't always so well integrated.
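A sketch of what I mean, in Python (the keys and function names are mine, not any real DBMS's code): the shared-prefix byte count the compressor already computes is the only input needed to maintain distinct-prefix counts, and those give a crude selectivity estimate for prefix matches.

```python
def common_prefix_len(a, b):
    """Number of leading bytes two keys share -- exactly the count a
    prefix-compressing B-tree strips from each stored key."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def prefix_stats(sorted_keys, max_len):
    """distinct[n] = number of distinct n-byte prefixes among the keys.
    A key introduces a new n-byte prefix exactly when it shares fewer
    than n bytes with its predecessor, so the compressor's byte count
    is all we need."""
    distinct = [0] * (max_len + 1)
    prev = None
    for key in sorted_keys:
        shared = common_prefix_len(prev, key) if prev is not None else 0
        for n in range(shared + 1, max_len + 1):
            distinct[n] += 1
        prev = key
    if sorted_keys:
        distinct[0] = 1            # the empty prefix matches everything
    return distinct

def prefix_selectivity(distinct, n):
    """Rough fraction of keys matched by a prefix of length n, assuming
    each distinct prefix covers about the same number of keys."""
    return 1.0 / distinct[n] if distinct[n] else 1.0
```

So a `like 'abc%'` against an index whose keys show three distinct 3-byte prefixes would be estimated to match about a third of the rows -- crude, but far better than no estimate at all.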

SQL Server, given a "like 'abc%'" clause, seems to have no idea of how selective that should be. In particular, with a stored procedure having a "like @param", the optimiser should record several alternate query plans depending on whether @param seems likely to select one, some, many or all rows. Or at least record the likely cost of the single stored plan in those cases, so it can decide whether to re-optimise the plan for this invocation.
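For instance, once the parameter is bound, picking between stored plans could look like this sketch (Python, with invented statistics -- this is the behaviour I'm asking for, not what SQL Server actually does):

```python
# Hypothetical statistics: DISTINCT[n] = number of distinct n-byte key
# prefixes in the index (the kind of count a prefix-compressing B-tree
# could keep as a side effect). All numbers are invented.
DISTINCT = [1, 5, 300, 4000, 30000]
TOTAL_KEYS = 1_000_000

def choose_plan(pattern):
    """Choose a plan for "like @param" at invocation time: estimate the
    row count from the literal prefix before the first '%', then seek
    the index only if that prefix looks selective."""
    prefix = pattern.split("%", 1)[0]
    n = min(len(prefix), len(DISTINCT) - 1)
    if n == 0:
        return "full-scan"        # leading '%': the prefix tells us nothing
    est_rows = TOTAL_KEYS / DISTINCT[n]
    # seek only if we expect to touch well under 10% of the rows
    return "index-seek" if est_rows < 0.1 * TOTAL_KEYS else "full-scan"
```

The point is that "one, some, many or all rows" maps to different plans, and the decision is cheap once you have the prefix statistics.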

> I wonder if the people who are building new RDBMS systems would benefit
> from such a comparison.

Are there any such people? I think that DBMS technology (relational and object-based) has a long way to go before we generate truly smart DBs. So many, many ideas on how to make them smarter have been dangling in the breeze for ten or twenty years without being adopted...

Why do we have to declare indexes - can't the DBMS work out what indices are needed? Why can't horizontal or vertical clustering in inheritance hierarchies be automatically decided? In fact, why can't the purported clustering advantages of ODBMS (due mainly to total denormalisation) be automatically implemented by the DBMS, dynamically, based on observations of the transaction and query load? Etc, etc. I think we have a long way to go. I want to see an RDBMS that takes a conceptual schema and dynamically determines an ongoing optimal physical mapping.
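Even the "work out what indices are needed" part doesn't require much machinery to get started. A toy sketch (Python; the class, names and threshold are all invented for illustration) of an advisor that watches the query load and recommends an index once a column shows up often enough in predicates:

```python
from collections import Counter

class IndexAdvisor:
    """Toy workload-driven index selection: count how often each
    (table, column) pair appears in query predicates, and recommend an
    index once the count crosses a threshold. A real DBMS would also
    weigh update cost, column selectivity, composite keys, etc."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.usage = Counter()
        self.recommended = set()

    def observe(self, table, predicate_columns):
        """Called for each executed query with the columns it filters on."""
        for col in predicate_columns:
            key = (table, col)
            self.usage[key] += 1
            if self.usage[key] >= self.threshold:
                self.recommended.add(key)

    def recommendations(self):
        return sorted(f"CREATE INDEX ON {t}({c})"
                      for t, c in self.recommended)
```

That's obviously simplistic, but it's the shape of the thing: the physical design falls out of observing the load, not out of the schema declaration.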

--
Clifford Heath, ManageSoft Corporation
Received on Thu Dec 13 2001 - 23:52:59 CET
