Re: separate data/inidex

From: RSH <RSH_Oracle_at_worldnet.att.net>
Date: Mon, 22 Apr 2002 18:18:52 GMT
Message-ID: <guYw8.40277$Rw2.3101459@bgtnsc05-news.ops.worldnet.att.net>

I don't think something as fundamental as I/O contention could change, unless Oracle's internal machinery now includes Query Row Precognition, where data about to be asked for is loaded into cache before the SQL hits the RDBMS.

Which would be kind of like 'thiotimoline', the mythical chemical Isaac Asimov invented that annoyed chemists because it wood dissolve just before the chemist added the solvent, only if he really WAS going to dissolve it.

In any case, no matter how many books have been authored by whom, and no aspersions meant, some of my best friends have written books, unless CBO decides an index (like a regular index on a field like "GENDER") [even before BMI's, can you imagine people indexing that???] would be dumb as heck to use, and dives direct in for a full scan; if statistics are kept fresh, indices and keys are intelligently constructed, I simply don't get how one can say (in Martha Stewartese) "Putting tables and the indexes they pertain to on the same physical device is 'a good thing'"

I'm probably just too stupid to understand this.

Also, the term "data warehouse" has been thrown about with an implication DW = full scans; there I do have to beg to differ. I don't know how other people build their data warehouses, but my goal has always been, to the best degree possible, to anticipate all the most common and rational ways users would want to view and use the information, and construct accordingly.

Say you have half a billion rows or more coming in from feeds originating from 80+ odd systems worldwide (hypothetically), and you don't happen to have more money than God. (Or, your corporation does have more money than God, but they won't give you much of it.)

So you do your flat table loads and then:

Extraction - pulling out atomic level rows based upon very picky WHEREs and storing that particular data 'as is'.
Summation - collapsing rows along particular criteria (say, rolling up all MJEs to a journal date/company/division/etc level) and adding them to your derivative data sets.
Accumulation - applying rows as transactions to tables that maintain year to date, forever-to-date, etc balance type data.

In our hypothetical systems, we did the apparently old fashioned load as is, as fast (i.e. DIRECT, etc) into huge input tables and then had a set of PL/SQL processes that ran to do the rest.

Not sorting out the physical storage, using several tiers of balancing decisions, would have been insanity. And we knew insanity, because our development - pre-production box had been set up with no thought given to these little details. And it was a pig.

Of course, I was hypothetically using raw partitions for anyplace Oracle would have tablespaces, diced up the RAID-1 farm into uniform 250 Mb chunks (except of course where file systems were living) as our hypothetical platform couldn't support VTOCS more finely diced, and the games began, including whiteboarding and a lot of thought and planning in the core team of three.

Tables and their indices (including the indices behind their PK's, UI's, the works) must not be on the same device.

2.Incoming feeder flat files live on spindles devoted to that purpose alone.

3. REDO lives by itself.

4. Raw disks shall never have any UNIX file systems on them.

5. Tables (and their associated indices) often used together should be spindle separated as much as possible (in addition to #1).

6. Large, flat tables (like our enormous feeder tables) should be striped across at least CPUCNT / 2 spindles, and not co-reside with tables or indices related to the derivative data, since that would be r/w head madness x 100.

7. The file system based disks including, especially, the enormous area reserved for incoming flat files, should be on a different controller than the feeder tables loaded by SQL*LOADER.

8. Balance again, making compromises based on what you can buy for the budget you managed to beg and grovel for.

I can't speak for anyone else, but this worked well for us. (hypothetically)

I think VERITAS and the other tools that abstract physical actual storage to the point even the system administrators can't tell you where things actually live, do not help matters. RAID-5, where bits of things live all over, isn't a dream either, since the best you have then is rank diversity and controller diversity.

In my simple, corncob-pipe smoking mentality, this seems like arguing that a bridge with two tollbooths works just as well as one with ten, and I just don't see how that makes sense.

RSH. "Daniel Morgan" <damorgan_at_exesolutions.com> wrote in message news:3CC4351E.244AADC3_at_exesolutions.com...
> This is repeated in many places but here are the first two links to it I
found.
> --------------------------------------------------------------------------

> Place Datafiles Appropriately
>
> Tablespace location is determined by the physical location of the
datafiles that
> constitute that tablespace. Use the hardware resources of your computer
appropriately.
>
> For example, if several disk drives are available to store the database,
it might be
> helpful to store table data in a tablespace on one disk drive, and index
data in a
> tablespace on another disk drive. This way, when users query table
information, both
> disk drives can work simultaneously, retrieving table and index data at
the same time.
>
> http://technet.oracle.com/doc/server.815/a67772/dfiles.htm#423
> ---------------------------------------------------------------------
> Place Datafiles Appropriately
>
> Tablespace location is determined by the physical location of the
datafiles that
> constitute that tablespace. Use the hardware resources of your computer
appropriately.
>
> For example, if several disk drives are available to store the database,
consider
> placing potentially contending datafiles on separate disks.This way, when
users query
> information, both disk drives can work simultaneously, retrieving data at
the same
> time.
>
>

http://otn.oracle.com/docs/products/oracle9i/doc_library/901_doc/server.901/ a90117/dfiles.htm#7384
>
> ---------------------------------------------------------------------
>
> You see what I mean about Oracle's own literature. You, and others here,
have clearly
> stated that the concept of simultaneously retrieving data is not valid and
yet every
> reference I can find at technet and tahiti states that this is the case.
If Couchman
> and others, myself included, repeat this the source is Oracle. And the
first
> reference, as you can see, clearly suggests separating tables and indexes
to improve
> IO.
>
> So is Tom Kyte another one of those 'experts' we should ignore? Just
kidding? Or can
> we get corrections to these things pubished in Oracle Magazine so that
they get wider
> distribution. Perhaps you could make your slideshow presentation available
for
> download at AskTom? And, if so, I would like permission to present it to
my students.
>
> Well here I go revising the curriculum again. ;-)
>
> Thanks.
>
> But can I get you to weigh in with a repacement rule or guideline?
>
> Thanks again.
>
> Daniel Morgan
>
>
>
> Thomas Kyte wrote:
>
> > In article <3CC30C98.4977F6F5_at_exesolutions.com>, "Daniel says...
> > >
> > >If the only reason you are building an index is for scanning it why not
an IOT?
> > >
> >
> > the index is a "skinnier" version of the table.
> >
> > an IOT is organized by the primary key. I needed a non-primary key set
of
> > columns...
> >
> > >With the respect to the rest of it ... fascinating.
> > >
> > >So who at Oracle is charged with correcting all of the information that
goes out
> > >under
> > >its name. Not just at Oracle Press, but at technet, tahiti, etc.
> > >
> >
> > it is hard to kill myths, they die slow painful deaths many times.
> >
> > if you point me to explicit doc refs, I'll "bug them"
> >
> > Oracle press -- cannot do a thing about that. anyone can publish for
Oracle
> > press.
> >
> > >Once again thanks for the clarification.
> > >
> > >But is there a general rule that developers can follow. Each
development project
> > >can
> > >not turn into an exploration of what "common knowledge" is wrong. Or
trying out
> > >a
> > >multitude of tablespace, extent combinations to find the one that is
optimal.
> > >Some
> > >rules must be applied or no rules are applied.
> > >
> > >Thanks,
> > >
> > >Daniel Morgan
> > >
> > >
> > >
> > >Thomas Kyte wrote:
> > >
> > >> In article <3CC2CB08.64C747AF_at_exesolutions.com>, "Daniel says...
> > >> >
> > >> >Please help me here with two things.
> > >> >
> > >> >1. "Nowadays to satisfy queries, you're full scanning tables, full
scanning
> > >> >indexes,
> > >> >not using indexes at all,...."
> > >> >
> > >> >Since when? I work very hard to not do these things.
> > >>
> > >>depends on the system you are building. Last benchmark I did ran best
without
> > >>indexes on the tables in most cases. DW vs OLTP.. Full scan away.
Every now
> > >> and again I would create an index for the sole purpose of using it as
a
> > >> "skinnier" table and fast full scanning it.
> > >>
> > >> >
> > >>>2. "Quality IO balancing is important...separating data and indexes
is not the
> > >> >same
> > >> >thing"
> > >> >
> > >>>I'll agree that I can't put an equals sign between them. But isn't
one way of
> > >> >guaranteeing that the goal will be met (I said one way, not the only
way) to
> > >> >force a
> > >> >physical separation?
> > >>
> > >>no, could be that indexes are more "cacheable" -- you could end up
with phyiscal
> > >>IO on the data and none on the index after a bit. Now you have an hot
disk and
> > >> a cold disk.
> > >>
> > >>Or conversely -- if the cacheability of the index is questionable, you
could end
> > >>up doing 3 or 4 physical IO's on the index to get to a single data
block (that
> > >> was already cached)
> > >>
> > >> Now, if I took a single tablespace with two datafiles -- one on
disk1, one on
> > >> disk2 and created the index and table in the same tablespace and used
lots of
> > >>extents for each -- I would achieve a nice even striping across the
two disks --
> > >>evenly distributing the data across the two devices (as we tend to
allocate an
> > >>extent from datafile1, then datafile2, then datafile1 and so on for
the objects
> > >> -- in the goal of spreading the IO around)
> > >>
> > >> Now -- regardless of the cacheability (or lack thereof) I get a nice
even
> > >> distribution of physical IO in most/many cases...
> > >>
> > >> Hey another reason why the "single extent theory" doesn't hold up ;)
> > >>
> > >> >
> > >> >Thanks,
> > >> >
> > >> >Daniel Morgan
> > >> >
> > >> >
> > >> >
> > >> >Connor McDonald wrote:
> > >> >
> > >> >> Daniel A. Morgan wrote:
> > >> >> >
> > >> >> > I can't believe it either. Please explain Howard.
> > >> >> >
> > >>>>> And please don't just throw out jargon like RAID and striping.
This
> > >>statement
> > >>>>> appears to fly in the face of more than a few books by well
respected
> > >>authors
> > >>>>> (not that you aren't one yourself now) as well as a lot of HTM and
PDF files
> > >> >>put
> > >> >> > out on technet.
> > >> >> >
> > >> >> > Daniel Morgan
> > >> >> >
> > >> >> > RSH wrote:
> > >> >> >
> > >> >> > > Mister Rogers,
> > >> >> > >
> > >> >> > > I can't believe you said this.
> > >> >> > >
> > >> >> > > Tablespaces containing tables, and tablespaces containing
indices
> > >>>>> > corresponding to them, would be like locating matter and
antimatter on the
> > >> >> > > same spindle.
> > >> >> > >
> > >>>>> > In RAID environments, I do not even like them sharing the same
controller;
> > >> >> > > failing that, certainly not sharing the same RAID ranks.
> > >> >> > >
> > >> >> > > How do you arrive at this astonishing (to me at least) notion?
> > >> >> > >
> > >> >> > > RSH.
> > >> >> > > "Howard J. Rogers" <dba_at_hjrdba.com> wrote in message
> > >> >> > > news:a9ov52$kjn$1_at_lust.ihug.co.nz...
> > >>>>> > > There is precisely zero physical reason for separating indexes
and
> > >>tables,
> > >>>>> > > at least in routine database configurations. No performance
gain arises
> > >>if
> > >>>> > > > you do. There are real *management* issues (tables *need*
backing up,
> > >> >> > > > indexes don't). And the rebuild command is fine to move the
indexes
> > >>>>> > > somewhere else if you decide to do it; but then the 'alter
table X move'
> > >>>> > > > command will do much the same by shunting the tables off
somewhere new,
> > >> >> > > too.
> > >> >> > > >
> > >> >> > > > Regards
> > >> >> > > > HJR
> > >> >> > > >
> > >> >> > > > "Troy Meyerink" <meyerink_at_usgs.gov> wrote in message
> > >> >> > > > news:Gus70n.CA3_at_igsrsparc2.er.usgs.gov...
> > >> >> > > > > You need to create an index tablespace and then move the
existing
> > >> >> > > indexes
> > >> >> > > > by
> > >> >> > > > > issuing an
> > >> >> > > > > 'alter index rebuild' statement.
> > >> >> > > > >
> > >> >> > > > > Troy Meyerink
> > >> >> > > > > Oracle DBA
> > >> >> > > > > Raytheon
> > >> >> > > > >
> > >> >> > > > >
> > >> >> > > > > "Titi" <thierry.constant2_at_wanadoo.fr> wrote in message
> > >> >> > > > > news:3cbf2607$0$15182$626a54ce_at_news.free.fr...
> > >> >> > > > > > Hi,
> > >> >> > > > > >
> > >> >> > > > > > For a user, I have data and index in one tablespace.
> > >> >> > > > > > I want to separate data and index ( good idea ??)
> > >> >> > > > > >
> > >> >> > > > > > How can do that ???
> > >> >> > > > > >
> > >> >> > > > > > Thanks in advance ...
> > >> >> > > > > > ( oracle 8.1.7)
> > >> >> > > > > >
> > >> >> > > > > >
> > >> >> > > > >
> > >> >> > > > >
> > >> >> > > >
> > >> >> > > >
> > >> >>
> > >> >> Separating data and indexes is a very (very) special case of the
> > >> >> argument for balancing IO. I would suspect that it came about
> > >> >> originally when everything when running rule based optimizer so
your app
> > >> >> was doing full scans or nested loops and that was it...Even then
the
> > >> >> advice was dubious in a multiuser database anyway...
> > >> >>
> > >> >> Nowadays to satisfy queries, you're full scanning tables, full
scanning
> > >> >> indexes, not using indexes at all, just using indexes, dumping
sort
> > >> >> segments out, dumping temporary hash workareas out and 'n'
permutations
> > >> >> of all them..
> > >> >>
> > >> >> Quality IO balancing is important...separating data and indexes is
not
> > >> >> the same thing
> > >> >> --
> > >> >> ==============================
> > >> >> Connor McDonald
> > >> >>
> > >> >> http://www.oracledba.co.uk
> > >> >>
> > >> >> "Some days you're the pigeon, some days you're the statue..."
> > >> >
> > >>
> > >> --
> > >> Thomas Kyte (tkyte@oracle.com) http://asktom.oracle.com/
> > >> Expert one on one Oracle, programming techniques and solutions for
Oracle.
> > >> http://www.amazon.com/exec/obidos/ASIN/1861004826/
> > >> Opinions are mine and do not necessarily reflect those of Oracle Corp
> > >
> >
> > --
> > Thomas Kyte (tkyte@oracle.com) http://asktom.oracle.com/
> > Expert one on one Oracle, programming techniques and solutions for
Oracle.
> > http://www.amazon.com/exec/obidos/ASIN/1861004826/
> > Opinions are mine and do not necessarily reflect those of Oracle Corp
>
Received on Mon Apr 22 2002 - 13:18:52 CDT