Re: Large tables, updates, selects and varchars

From: joel garry <joel-garry_at_home.com>
Date: Thu, 3 Apr 2008 10:49:51 -0700 (PDT)
Message-ID: <1b148cd5-4049-47c8-9626-81d148b92408@u36g2000prf.googlegroups.com>


On Apr 3, 5:10 am, chrism..._at_gmail.com wrote:
> On Apr 2, 9:02 pm, DA Morgan <damor..._at_psoug.org> wrote:
>
>
>
>
>
> > chrism..._at_gmail.com wrote:
> > > I am using Oracle 10g (10.2.0.3.0) and have a large table (1+ billion
> > > rows).  It currently has about 15 columns in it.  I have a requirement
> > > for a new varchar2(4000) column to go with the current data in that
> > > table.  I need to update this column after it has been added to the
> > > table.  I have been told that I may be better off putting the column
> > > in a separate table because by adding it to an existing table, Oracle
> > > has to jump around the hard drive to update it and find it, and
> > > therefore this will degrade performance.  Of course, by putting it in
> > > a separate table, a join will be required on selects.
>
> > > From a purely design point of view, it makes a lot more sense to add
> > > the column to the existing table, but I don't know the internals of
> > > Oracle.  When modifying existing tables on large Oracle databases, do
> > > you generally have to be wary of what columns you add?
>
> > > I can't really test this ahead of time--the update will take a very
> > > long time to perform, because there are many rows and a calculation is
> > > involved.
>
> > > Thanks.
>
> > 1+ billion rows with or without partitioning?
>
> > Is this the last change that will ever take place or part of what may
> > be ongoing modifications?
>
> > How is the new column going to be used? Will most queries need to
> > access the new data? Why VARCHAR2(4000) and not CLOB?
> > --
> > Daniel A. Morgan
> > Oracle Ace Director & Instructor
> > University of Washington
> > damor..._at_x.washington.edu (replace x with u to respond)
> > Puget Sound Oracle Users Groupwww.psoug.org
>
> We are partitioning with between 35 million and 60 million rows per
> partition.  This probably will not be the last change we are going to
> make.  There are likely to be future modifications.
>
> It's going to store file paths and most queries will need to access
> it.  We chose VARCHAR2 because the size of the data will never exceed
> 4000 chars, and we assumed that CLOBs were less efficient than
> VARCHAR2s.  Is this not the case?- Hide quoted text -
>
> - Show quoted text -

If you poke around in the docs (I think searching for varchar2(1999) will find it), you may find that things work slightly differently above and below that size.

When you are doing updating like this (with the data inline), you should be sure and have the proper amount of free space ready to avoid row chaining or row migration (be sure you understand those concepts, read the docs again, and google for instructions on how to check if you don't know).

My gut feel for this is you should evaluate the average and maximum lengths of the paths you will be inserting, and go for a varchar2 slightly larger than that maximum (unless is turns out the 255 Mathew said, or the 1024 Robert said). But you may want to test different possiblities, especially if the data lengths show a bifurcated distribution (ie, perhaps 98% are like ./mio2398f and the others are huge paths). Is the data currently in flat files? Are you on unix?

jg

--
@home.com is bogus.
Auditors do exactly what, again (part LXXXVII)?
http://www.signonsandiego.com/uniontrib/20080402/news_1b2prgn.html
Received on Thu Apr 03 2008 - 12:49:51 CDT

Original text of this message