Re: Index compression vs. table compression

From: Richard Foote <richard.foote_at_bigpond.nospam.com>
Date: Sun, 23 Jan 2005 01:53:43 GMT
Message-ID: <HsDId.129984$K7.107588@news-server.bigpond.net.au>

Hi Howard,

Comments embedded.

"Howard J. Rogers" <hjr_at_dizwell.com> wrote in message news:cstu76$rra$1_at_news-02.connect.com.au...
> Thomas Kyte wrote:
> [snip]
>
>>
>> I think most people think of a single key lookup as "a lookup by key that
>> returns a single row -- a lookup".
>> Lookups are generally considered (in my experience) to return a single
>> row.
>
> Not in mine.

That Howard perhaps says more your experience than you might have intended....

Lookup tables by *definition* are (generally) small tables that contain dimension data that are used in relational databases to store attributes of fact tables. They are the tables that are referenced via Primary (or in some instances Unique) keys to determine the required attributes and are referenced (hopefully) via FK relationships from the parent tables.

So for example, if I want to know specific attribute details of an employee, I would *lookup* the employee details via the PK. I would not have a table that has multiple occurrences of the same key as it must therefore be a parent table and not a lookup table by definition.

This "definition" btw can be found in a number of the Oracle manuals (Performance Tuning, Data Warehouse, etc.).

If you use the term "lookup table" and not mean the above definition or what Tom, Jonathan and Daniel have been trying to teach you, then that's really "sloppy" writing I'm afraid.

>
>> You have a big range scan -- you have 32 rows that will be returned by
>> that, and
>> 32 rows that are "far apart" from eachother (eg: the way the data was
>> loaded
>> into a heap table would make it so that each block in this table --
>> howevever
>> many that was, would have empno 7934 on it.)

Are you seriously suggesting that this is the best you can do to get out of this one. Let's go back to your original quotes:

"how is the optimiser likely to read small, useful, lookup tables?.. er, via a FTS, probably, if they are genuinely small"

and

"A small table is always likely to be read via a FTS using CBO. Even for a single key lookup..."

Note the word *always*. Boy are these examples of "sloppy" writing....

As everyone else has been trying to teach you, if the cardinality is low enough, even for very small tables, an index access path is most efficient. In the case of a single key lookup, almost certainly so.

As Jonathan pointed out, this is a myth that you have propagated with you're text, whether you get it or not.

You have now come back with a table scan that reads over 7% of the data and say, hey look, I'm right, it's using the FTS. Note that if the table were to be 10 million rows in size, it might also be using the FTS because of the relatively high selectivity of the query.

Point is, small table lookups use an index for low cardinality queries, just like their bigger cousins, a point you certainly overlooked both times in what you said.

>>
>>
>> If you put it back to a lookup situation -- take this "key" and turn it
>> into
>> another "value" -- it reverts once again:
> >
>> ops$tkyte_at_ORA9IR2> select count(*) from emp where empno = 7934;
>> COUNT(*)
>> ----------
>> 32
>> ops$tkyte_at_ORA9IR2> update emp set empno = rownum;
>> 448 rows updated.
>> ops$tkyte_at_ORA9IR2> create index e_empno_idx on emp(empno);
>> Index created.
>> ops$tkyte_at_ORA9IR2> exec dbms_stats.gather_table_stats( user, 'EMP',
>> cascade=>true );
>> PL/SQL procedure successfully completed.
>> ops$tkyte_at_ORA9IR2> set autotrace traceonly explain
>> ops$tkyte_at_ORA9IR2> select * from emp where empno = 42;
>> Execution Plan
>> ----------------------------------------------------------
>> 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=37)
>> 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (Cost=2 Card=1
>> Bytes=37)
>> 2 1 INDEX (RANGE SCAN) OF 'E_EMPNO_IDX' (NON-UNIQUE) (Cost=1
>> Card=1)
>> ops$tkyte_at_ORA9IR2> set autotrace off
>
> Your example, it seems to me, simply demonstrates the smartness of the CBO
> in dealing with what it can determine to be searches for unique or
> nearly-unique data. I have no quarrel with that.
>
> But you are having to carefully arrange for that uniqueness to happen
> (with your update of empno to rownum), and I would suggest that is not
> likely to be generally typical.

This is a nonsense. Lookup tables have these things called *Primary Keys*. PKs are (or should be) a mandatory characteristic of a lookup table else child/fact tables will not be able to reference them via a FK. Lookup tables are primarily accessed via their PK and primarily return *one* row as a result.

In your "example", there is no PK. You Howard have manufactured a table that is by definition, not a lookup table, but a *fact* table because you have multiple occurrences of the empno. Perhaps it's a history table, perhaps it's a transaction table, but it ain't no standard lookup table. Therefore to join to this table in a sensible manner, you must join to it *with a lookup table* else you'll have a m-m relationship join (or it's a table that no self respecting relational data modeller would want to claim).

Therefore, if you want to know what department an employee works in, you join it with the dept lookup table and reference the *one* department. If you want to now know specific details of the employee (such) as a name, you join it with a lookup emp table and determine their name. Currently you have multiple occurrences of the emp attributes in your table which is generally not how you design tables in the real world, certainly not lookup tables anyway.

In short Howard, your quote is misleading and wrong in suggesting that small table lookups are *always* likely to be read via a FTS and your example is as far removed from a lookup table as can be.

>
>> it is only when you start range scanning largish amounts of data relative
>> to the
>> size of the table and the data is spread all over that the full scan
>> kicks in.
>>
>> So, it was in followup to this:
>>
>> </quote>
>>
>>>>is then a further problem: how is the optimiser likely to read small,
>>>>useful, lookup tables?.. er, via a FTS, probably, if they are genuinely
>>>>small.
>>>
>>>
>>> Not if they're being used for doing lookups, I hope.
>>
>>
>> Why?
>>
>> A small table is always likely to be read via a FTS using CBO. Even for a
>> single key lookup... </quote>
>>
>> that I said "er, well, actually no -- they are not"
>>
>>
>> a keyed lookup -- single value, single row -- full scan will in fact not
>> win
>> "probably" or "always likely"
>>
>> (opposite is true...)
>
> Then all we can say at the end of this is that your definition of a single
> key lookup is not mine.
>
> I cannot know how many rows will be returned by a query until the lookup
> has taken place.

Yes you can. In a dimension, lookup table that is standard to every relational database worth knowing, you will return 1 (or perhaps 0) rows. No more, because if you did, it would not be a lookup table because you'll be forced to perform a m-m relationship join.

> I do, however, know how many key values I am looking up.

Most people when they make a mistake pointed out to them say something akin to "Oopps, fair enough, thanks for the correction".

Most people that is ...

Cheers

Richard Received on Sat Jan 22 2005 - 19:53:43 CST