Re: separate data/inidex

From: Daniel Morgan <damorgan_at_exesolutions.com>
Date: Tue, 23 Apr 2002 15:27:51 GMT
Message-ID: <3CC57D67.DB1CD65@exesolutions.com>

"Howard J. Rogers" wrote:

> I confess to not having that particular book to hand (actually, I wouldn't
> touch it with an exceptionally long bargepole). I'm willing to give it a go,
> if I can lay my hands on the requisite original to see the basis of their
> table. Unless someone else wants to have a go in the meantime....
>
> HJR
>
> "Daniel A. Morgan" <damorgan_at_exesolutions.com> wrote in message
> news:3CC480A0.F03F5ACA_at_exesolutions.com...
> > Great ... we are 70% of the way there. Lets see if we can but the top on
> the box
> > and tie a ribbon around it.
> >
> > I am looking at at Kevin Loney and Marlene Theriault's book Oracle 8i DBA
> > Handbook, page 97, section title I/O Contention Among Datafiles. And the
> next
> > page where Table 4-2 contains estimated I/O weights. This is one of the
> books I
> > use to teach my classes at the U.
> >
> > Based on this material my conclusion, and the one I teach my students, is
> that
> > putting indexes into the same tablespace with data is a bad thing. In fact
> it
> > seems to me that a reasonable conclusion would be that putting anything
> else
> > with data is a bad thing. If Loney and Theriault are incorrect ... and we
> all
> > agree that the goal is I/O distribution. Could you please provide an
> corrected
> > table that would serve as a guideline as to where the I/O is? I don't mean
> a
> > table that will be used to put your feet to the fire ... but something
> that,
> > like Loney and Theriault's, can be used as a guideline.
> >
> > Thanks. I hate the thought that I am teaching the next generation of
> Oracle
> > developers and DBAs things that are not correct.
> >
> > Daniel Morgan
> >
> >
> >
> > Howard Rogers wrote:
> >
> > > I took, several posts ago actually. Connor took too.
> > >
> > > The replacement advice is: even out the I/O distribution across all
> devices.
> > > If you want it more wordy, the Oracle 9i doco you quoted earlier seems
> to
> > > have got it about right: avoid placing contending datafiles on the same
> > > device. That covers everything, really, on the assumption that users
> don't
> > > leap to the erroneous conclusion that indexes and their tables
> inevitably
> > > contend.
> > >
> > > Regards
> > > HJR
> > >
> > > "Daniel Morgan" <damorgan_at_exesolutions.com> wrote in message
> > > news:3CC4969D.73F557F0_at_exesolutions.com...
> > > > I should have commented on that fact ... that you were requesting more
> > > than 30% of the
> > > > data in the table. That has a huge potential impact.
> > > >
> > > > Obviously in a table with 100,000,000+ rows only someone crazier than
> I
> > > would try a stunt
> > > > like that. And if they do I hope they use the +ALL_ROWS hint.
> > > >
> > > > In my experience queries like that are generally scheduled for 2:00am
> on a
> > > Sunday. The
> > > > more general query that returns all rows, or a large percentage of
> rows,
> > > is generally one
> > > > from a look-up table.
> > > >
> > > > So at the risk of rubbing salt into a wound ... it appears that
> everyone
> > > counseling not
> > > > separating tables and indexes is avoiding providing a replacement rule
> of
> > > thumb. Why no
> > > > takers? ;-) What is the verbiage you think would be valid at technet
> and
> > > tahiti as advice
> > > > for developers and DBAs?
> > > >
> > > > Daniel Morgan
> > > >
> > > >
> > > >
> > > > Connor McDonald wrote:
> > > >
> > > > > Daniel Morgan wrote:
> > > > > >
> > > > > > With 10,000 rows I'd not argue with you. In fact below 40,000 rows
> I'd
> > > not argue
> > > > > > with you. At 10,000 rows the only index I would create would be
> one
> > > created by a
> > > > > > required constraint. But try that same experiment with the 1.4
> million
> > > parts in a
> > > > > > 747, understanding that every part may be in there multiple times
> (for
> > > example at
> > > > > > one time there was a choice 21 different pilot clipboards). Then
> add
> > > the sub-models
> > > > > > (different seating capacity, different galleys, etc.) then add
> 717,
> > > 737, 757, 767,
> > > > > > and 777 models.
> > > > > >
> > > > > > You are correct with the example you gave. But I can't believe
> that
> > > with
> > > > > > 100,000,000+ rows you would ever find this to be the case. Am I
> wrong?
> > > > > >
> > > > > > Daniel Morgan
> > > > > >
> > > > > > Connor McDonald wrote:
> > > > > >
> > > > > > > Daniel Morgan wrote:
> > > > > > > >
> > > > > > > > I agree. I thought he was saying that most of the time we do
> full
> > > table scans,
> > > > > > > > etc. And given that many of my application tables have been in
> the
> > > > > > > > multi-gigabyte range that would be a complete non-starter.
> > > > > > > >
> > > > > > > > When one is looking for a single part of a Boeing 747 ... one
> does
> > > not want to
> > > > > > > > scan every part for every plane ever made.
> > > > > > > >
> > > > > > > > Daniel Morgan
> > > > > > > >
> > > > > > > > Jonathan Lewis wrote:
> > > > > > > >
> > > > > > > > > Comments in-line
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Jonathan Lewis
> > > > > > > > > http://www.jlcomp.demon.co.uk
> > > > > > > > >
> > > > > > > > > Author of:
> > > > > > > > > Practical Oracle 8i: Building Efficient Databases
> > > > > > > > >
> > > > > > > > > Next Seminar - Australia - July/August
> > > > > > > > > http://www.jlcomp.demon.co.uk/seminar.html
> > > > > > > > >
> > > > > > > > > Host to The Co-Operative Oracle Users' FAQ
> > > > > > > > > http://www.jlcomp.demon.co.uk/faq/ind_faq.html
> > > > > > > > >
> > > > > > > > > Daniel A. Morgan wrote in message
> > > <3CC2CB08.64C747AF_at_exesolutions.com>...
> > > > > > > > > >Please help me here with two things.
> > > > > > > > > >
> > > > > > > > > >1. "Nowadays to satisfy queries, you're full scanning
> tables,
> > > full scanning
> > > > > > > > > indexes,
> > > > > > > > > >not using indexes at all,...."
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > It is very easy to misunderstand the intent of a comment
> > > > > > > > > on the newsgroup, but I think if you re-read Connor's
> post
> > > > > > > > > carefully you will appreciate that he is not saying "you
> > > must
> > > > > > > > > full scan tables"; he is simply listing a number of the
> > > mechanism
> > > > > > > > > that are commonly (and necessarily) activated in a
> modern
> > > system.
> > > > > > > > >
> > > > > > > > > >Since when? I work very hard to not do these things.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Don't work too hard at it. Sometimes the biggest
> > > > > > > > > performance problems are introduced by trying
> > > > > > > > > to push Oracle too hard into paths which use
> > > > > > > > > index range scans and index unique scans to
> > > > > > > > > access tables by rowid; resulting in excessive
> > > > > > > > > CPU usage and CBC latch contention.
> > > > > > >
> > > > > > > I'll give an example of what I was trying to convey in terms of
> > > "full
> > > > > > > scans good sometimes, full scans bad sometimes" using an example
> > > > > > > somewhat poached from JL's excellent seminar
> > > > > > >
> > > > > > > create table BOEING1 ( x number, y char(100));
> > > > > > > create table BOEING2 ( x number, y char(100));
> > > > > > > create index BOEING1 on BOEING1 ( x);
> > > > > > > create index BOEING2IX on BOEING2IX ( x);
> > > > > > >
> > > > > > > 10000 rows inserted into each table, 200 rows for each value of
> x
> > > from 0
> > > > > > > to 49. So after running an analyze...
> > > > > > >
> > > > > > > analyze table BOEING1 compute statistics;
> > > > > > > analyze table BOEING2 compute statistics;
> > > > > > >
> > > > > > > we can see that they are virtually identical..
> > > > > > >
> > > > > > > select table_name, num_rows, blocks, empty_blocks
> > > > > > > from user_tables;
> > > > > > >
> > > > > > > TABLE_NAME NUM_ROWS BLOCKS
> EMPTY_BLOCKS
> > > > > >
> > ------------------------------ ---------- ---------- ------------
> > > > > > > BOEING1 9999 153
> 6
> > > > > > > BOEING2 9999 153
> 6
> > > > > > >
> > > > > > > select table_name, column_name, NUM_DISTINCT,DENSITY
> > > > > > > from user_tab_columns
> > > > > > > where TABLE_NAME like 'TE%';
> > > > > > >
> > > > > > > TABLE_NAME COLUMN_NAME
> > > > > > > NUM_DISTINCT DENSITY
> > > > > > > ------------------------------ ------------------------------
> > > > > > > ------------ ----------
> > > > > > > BOEING1 X
> > > > > > > 50 .02
> > > > > > > BOEING2 X
> > > > > > > 50 .02
> > > > > > >
> > > > > > > select max(x), min(x), avg(x) from BOEING1;
> > > > > > >
> > > > > > > MAX(X) MIN(X) AVG(X)
> > > > > > > ---------- ---------- ----------
> > > > > > > 49 0 24.5024502
> > > > > > >
> > > > > > > select max(x), min(x), avg(x) from BOEING2;
> > > > > > >
> > > > > > > MAX(X) MIN(X) AVG(X)
> > > > > > > ---------- ---------- ----------
> > > > > > > 49 0 24.5024502
> > > > > > >
> > > > > > > But they exhibit spectacularly different performance
> > > characteristics...
> > > > > > >
> > > > > > > select /*+ FULL(t) */ count(y) from BOEING2 t
> > > > > > > where x < 20;
> > > > > > >
> > > > > > > took 156 consistent gets
> > > > > > >
> > > > > > > select /*+ INDEX(t) */ count(y) from BOEING2 t
> > > > > > > where x < 20;
> > > > > > >
> > > > > > > took 70 consistent gets,
> > > > > > >
> > > > > > > thus you get the "conventional" result of "we should be using an
> > > index"
> > > > > > > when examing this particular BOEING. But the same queries
> against
> > > > > > > BOEING1 give
> > > > > > >
> > > > > > > select /*+ FULL(t) */ count(y) from BOEING1 t
> > > > > > > where x < 20;
> > > > > > >
> > > > > > > took 156 consistent gets
> > > > > > >
> > > > > > > select /*+ INDEX(t) */ count(y) from BOEING1 t
> > > > > > > where x < 20;
> > > > > > >
> > > > > > > took 3120 consistent gets (20 times worse!)
> > > > > > >
> > > > > > > So - sometimes checking every Boeing ever made is the best
> option
> > > :-)
> > > > > > >
> > > > > > > Cheers
> > > > > > > Connor
> > > > > > >
> > > > > > > --
> > > > > > > ==============================
> > > > > > > Connor McDonald
> > > > > > >
> > > > > > > http://www.oracledba.co.uk
> > > > > > >
> > > > > > > "Some days you're the pigeon, some days you're the statue..."
> > > > >
> > > > > The example I displayed is somewhat table size independent - that
> is,
> > > > > the queries are requesting approx 40% of the table rows. The
> example
> > > > > shows that in some cases, the index is easily the best option, in
> other
> > > > > cases, the table scan is easily the best.
> > > > >
> > > > > If you had 100,000,000 rows it wouldn't matter - if the question
> > > > > requires probing 40% of the rows, then the index vs scan decision
> still
> > > > > needs to be evaulated
> > > > >
> > > > > Cheers
> > > > > Connor
> > > > > --
> > > > > ==============================
> > > > > Connor McDonald
> > > > >
> > > > > http://www.oracledba.co.uk
> > > > >
> > > > > "Some days you're the pigeon, some days you're the statue..."
> > > >
> >

I don't have it here at the office. But someone had the 8.0 version of their book and they use the following as an example in that one:

Disk Weight Contents

1                        Oracle Software
2            35        System, Control file 1
3            40        RBS
4            100      DATA
5            33        INDEXES
6            9          TEMP
7            3          TOOLS
8            40+      REDO LOGS
9            40+      APPLICATION & ARCHIVED REDO

It is not hard, from the above, for a reasonable person to say ... I'm not putting anything on the same physical disks holding my indexes. Is the above table reasonably accurate and you disagree with the conclusion? Or is the above table inaccurate in which case I invite you to provide values you think better reflect the real world.

Also, to what extent is this discussion predicated on having RAID and Veritas or other disk striping as opposed to raw devices?

Thanks.

Daniel Morgan Received on Tue Apr 23 2002 - 10:27:51 CDT