Path: text.usenetserver.com!out04a.usenetserver.com!news.usenetserver.com!in02.usenetserver.com!news.usenetserver.com!postnews.google.com!l12g2000cwl.googlegroups.com!not-for-mail
From: thomas.kyte@oracle.com
Newsgroups: comp.databases.oracle.server
Subject: Re: wildcard search and full tablescans
Date: 12 Dec 2006 06:01:40 -0800
Organization: http://groups.google.com
Lines: 89
Message-ID: <1165932099.969202.134690@l12g2000cwl.googlegroups.com>
References: <1165492532.019079.313140@l12g2000cwl.googlegroups.com>
   <1165574535.018195.240220@f1g2000cwa.googlegroups.com>
   <1165595951.445009@bubbleator.drizzle.com>
   <1165610601.833576.57370@f1g2000cwa.googlegroups.com>
   <1165891178.319140@bubbleator.drizzle.com>
NNTP-Posting-Host: 12.38.16.18
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
X-Trace: posting.google.com 1165932106 10559 127.0.0.1 (12 Dec 2006 14:01:46 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Tue, 12 Dec 2006 14:01:46 +0000 (UTC)
In-Reply-To: <1165891178.319140@bubbleator.drizzle.com>
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0,gzip(gfe),gzip(gfe)
Complaints-To: groups-abuse@google.com
Injection-Info: l12g2000cwl.googlegroups.com; posting-host=12.38.16.18;
   posting-account=sIsnLQwAAAARvYJo6PLbFuqdYUfWHtpS
Xref: usenetserver.com comp.databases.oracle.server:418993
X-Received-Date: Tue, 12 Dec 2006 09:01:46 EST (text.usenetserver.com)


DA Morgan wrote:
> Charles Hooper wrote:
>
> > Paraphrasing from "Expert Oracle Database Architecture" by Tom Kyte:
> > B*Tree indexes should be used only when retrieving a small portion of
> > the rows in a table - less than 20% of the rows.
>
> Oh I know Tom wrote that and I suspect he regrets having done so. Not
> because it isn't often correct ... but because often it is not correct.
>

Not at all - as was said "paraphrase" which could also be equated with
"out of context"

This was in a section where I was talking about indexes in general -
and I actually used "somewhere between 1 to 20%" -  but caveatted it
heavily.  You might use an index to retrieve EVERY row from a table
(first rows optimization for example), but in general, you are using
indexes to retrieve a relatively small percentage of the tables row.

I never said this was a hard and fast number - never said "never use it
for more than 20%".  I don't really "regret" writing it as I did

....
Here, only the index was used to answer the query-it would not matter
now what percentage of rows we were accessing, as we would use the
index only. We can see from the plan that the underlying table was
never accessed; we simply scanned the index structure itself.

It is important to understand the difference between the two concepts.
When we have to do a TABLE ACCESS BY INDEX ROWID, we must ensure we are
accessing only a small percentage of the total blocks in the table,
which typically equates to a small percentage of the rows, or that we
need the first rows to be retrieved as fast as possible (the end user
is waiting for them impatiently). If we access too high a percentage of
the rows (larger than somewhere between 1 and 20 percent of the rows),
then it will generally take longer to access them via a B*Tree than by
just full scanning the table.

With the second type of query, where the answer is found entirely in
the index, we have a different story. We read an index block and pick
up many "rows" to process, then we go on to the next index block,
and so on-we never go to the table. There is also a fast full scan we
can perform on indexes to make this even faster in certain cases. A
fast full scan is when the database reads the index blocks in no
particular order; it just starts reading them. It is no longer using
the index as an index, but even more like a table at that point. Rows
do not come out ordered by index entries from a fast full scan.

In general, a B*Tree index would be placed on columns that we use
frequently in the predicate of a query, and we would expect some small
fraction of the data from the table to be returned or the end user
demands immediate feedback. On a thin table (i.e., a table with few or
small columns), this fraction may be very small. A query that uses this
index should expect to retrieve 2 to 3 percent or less of the rows to
be accessed in the table. On a fat table (i.e., a table with many
columns or very wide columns), this fraction might go all the way up to
20 to 25 percent of the table. This advice doesn't always seem to
make sense to everyone immediately; it is not intuitive, but it is
accurate. An index is stored sorted by index key. The index will be
accessed in sorted order by key. The blocks that are pointed to are
stored randomly in a heap. Therefore, as we read through an index to
access the table, we will perform lots of scattered, random I/O. By
"scattered," I mean that the index will tell us to read block 1,
block 1,000, block 205, block 321, block 1, block 1,032, block 1, and
so on-it won't ask us to read block 1, then block 2, and then block
3 in a consecutive manner. We will tend to read and reread blocks in a
very haphazard fashion. This single block I/O can be very slow.
...............................


> I can, for example, fiddle with optimizer_index_cost_adj and
> optimizer_index_caching and make that number come out just about
> anywhere I want.
>
> Remember Connor's demo of how he could dial in just about any hit ratio
> he wanted? Somewhere I have one that does the same thing with B*Trees.
> I'll see if I can find it.
>
> Regards
> --
> Daniel A. Morgan
> University of Washington
> damorgan@x.washington.edu
> (replace x with u to respond)
> Puget Sound Oracle Users Group
> www.psoug.org