Re: 10g - Need advise on large lookup table and optimizing io

From: DA Morgan <damorgan_at_psoug.org>
Date: Sat, 07 Jan 2006 13:46:45 -0800
Message-ID: <1136670401.140836@jetspin.drizzle.com>


peter wrote:
> Folks,
> I inherited an application that has 1 very large lookup table. This
> lookup table has
> a 90/10 read/write ratio. Most of the time it's read heavily, but
> during uploads
> records that don't currently exist in that table get inserted.
> This table looks like this.
>
> USER_ID| EMAIL | CLIENT_ID | DUPLICATE_ID
>
> ------------------------------------------------------------------------
> 1000000 'email1' 12312 NULL
> 1000001 'email2' 13434 1
>
> Anyway this table has 400 million records and is not partitioned. We
> have 2 main types of queries against this table.
>
> SELECT email from THISTABLE where DUPLICATE_ID IS NULL and user_id = ?
>
> The other type of query executed is a SELECT where this table is
> simply joined on
> USER_PROF_ID to pull the email address.
>
> SELECT A.EMAIL, B.COL1,B.COL2
> FROM DEMOGRAPHICS B, THISTABLE A
> WHERE B.USER_PROF_ID = A.USER_PROF_ID
>
> We now have indexes on the "THISTABLE" so that the actual data is never
> read. Only the indexes in place are used to satisfy all queries. The
> above SELECT query like many typical queries on our system pulls lots
> of data (500K records to 1 million records).
>
> We have one specific index on the columns
> USER_ID|EMAIL|DUPLICATE_ID|CLIENT_ID
> that we use to satisfy all the large select queries... these queries do
> an INDEX RANGE SCAN on the index columns I just described... and so
> never need to access the table rows.
>
> Because the table/indexes have nearly 400 million records in them, and
> this index is read by all our "larger" select queries, this index is
> contended for and the IO waits are high.
>
> I know that this data model is not great, but I inherited it and can't
> really change too much now. But I'm wondering what I can do to speed up
> the lookups during the SELECT queries against that very large index
> (the index now has 3 levels).
>
> I've been doing some testing with a single table hash cluster, but I'm
> not sure if the benefits of a hash cluster outperform the benefits (in
> terms of IO) of looking up the rows in the large index and not having
> to read the table data.
>
> If anyone out there has any input, I'd greatly appreciate it.
>
> --peter

A couple of things I'd like to know about the tables ... how many different user_ids and user_prof_ids are in those 400M records? Is the data skewed? And I'd be inclined to take a very harsh look at "DUPLICATE_ID IS NULL" and whether it is forcing a full table scan.
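
Something like this would tell you quickly (names taken straight from your post, and I'm assuming USER_PROF_ID is a column of THISTABLE as your join suggests -- adjust as needed):

SELECT COUNT(*)                     AS total_rows,
       COUNT(DISTINCT user_id)      AS distinct_user_ids,
       COUNT(DISTINCT user_prof_id) AS distinct_user_prof_ids,
       SUM(CASE WHEN duplicate_id IS NULL THEN 1 ELSE 0 END) AS null_duplicate_ids
FROM   thistable;

Compare the distinct counts against the 400M total and you will know how selective those columns really are.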

Starting with DUPLICATE_ID ... can you use a default value in place of NULL? If not, consider a function-based index and query for the value created by the function used to build the index.
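
A rough sketch of the function-based index idea (untested, and the -1 sentinel is just an assumption -- use any value that cannot occur in your data; EMAIL is included so the query stays index-only like your current setup):

CREATE INDEX thistable_dup_fbi
    ON thistable (NVL(duplicate_id, -1), user_id, email);

-- the predicate must match the indexed expression exactly
SELECT email
FROM   thistable
WHERE  NVL(duplicate_id, -1) = -1
AND    user_id = :user_id;

The optimizer can only use the index if the WHERE clause uses the same NVL expression the index was built on.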

With respect to the USER_ID = ... my first instinct would be to convert the table to either a sorted hash cluster (10g) or an Index-Organized Table.
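
For illustration, the IOT version might look something like this (a sketch only -- datatypes guessed, storage/overflow clauses omitted, and I'm assuming USER_ID alone is unique, which you would need to verify):

CREATE TABLE thistable_iot (
    user_id       NUMBER        NOT NULL,
    email         VARCHAR2(320),
    client_id     NUMBER,
    duplicate_id  NUMBER,
    CONSTRAINT thistable_iot_pk PRIMARY KEY (user_id)
)
ORGANIZATION INDEX;

With the rows stored in the primary key index itself, a lookup by USER_ID reads one structure instead of the table plus a very large covering index.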

With respect to USER_PROF_ID ... I'd need to know more about the data.

All of the above is just pure instinct and I wouldn't do any of it without tracing and testing extensively.
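
For the tracing, an extended SQL trace (event 10046 at level 8 so you capture the wait events) of a representative session, formatted with tkprof, is the quickest way to see where the IO waits really come from. For example:

ALTER SESSION SET timed_statistics = TRUE;
ALTER SESSION SET events '10046 trace name context forever, level 8';
-- run the problem SELECTs here, then
ALTER SESSION SET events '10046 trace name context off';

Then run tkprof on the trace file from user_dump_dest and look at the wait summary for each statement.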

HTH

-- 
Daniel A. Morgan
http://www.psoug.org
damorgan_at_x.washington.edu
(replace x with u to respond)
Received on Sat Jan 07 2006 - 15:46:45 CST
