Re: Designing database tables for performance?

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Tue, 13 Feb 2007 20:55:13 -0000
Message-ID: <oqednZ9qcNotuE_YnZ2dnUVZ8vudnZ2d_at_bt.com>


"Timasmith" <timasmith_at_hotmail.com> wrote in message news:1170997448.300092.235340_at_s48g2000cws.googlegroups.com...
> Hi,
>
> I am trying to get my head around who to best design a couple of
> tables for performance. The entities are fake so ignore that, I am
> just focusing on the table sizes and values.
>
> Suppose I had a table called items with 50,000 rows
>
> items (item_id number, parent_item_id number, item_name string)
>
> and I had a table called orders with 10,000,000 rows
>
> orders (order_id number, order_date timestamp, order_status number,
> customer_id number, item_id number)
>
>
> Now items are a hierarchy (hence the parent_item_id) and your average
> custom has 1000 rows in the orders table.
> The 1000 rows represents 100 different items on average.
>
> There are two primary use cases for querying. One is like this:
>
> select i.item_name, o.order_date, o.order_status
> from orders o, items i
> where o.customer_id = ?
> and o.order_date > sysdate-1
> and o.item_id = i.item_id
> and i.item_id in (?....?);
>
> Now ?...? represents a potentially very long list of items -
> essentially a subtree of the item hierarchy.
>
> The other query is searching for a few items across a longer timeframe
>
> from orders o, items i
> where o.customer_id = ?
> and o.order_date > sysdate-30
> and o.item_id = i.item_id
> and i.item_id in (?,?,?);
>
> It seems to me very difficult to optimize this, obviously we leave it
> to the database and cross our fingers that the optimizer will pick the
> right query in the right situation but I cant help thinking there must
> be *something* one can do to help out. The fact the items are a
> hierarchy doesnt help - difficult to avoid the 'in' statement without
> knowing how many levels there are in the hierarchy.
>
> Any ideas?
>

I may have missed something, but based on the number of orders per customer, and number of orders in the table, it seems as if the number of order for a customer since

       sysdate - 1
or sysdate - 30

(a) will be quite small
and
(b) will be quite scattered if you use a standard heap table.

Optimum strategy therefore seems to be to identify the orders for the customer in the date range using an indexed access path: and you could reduce the I/O costs of this step by storing the orders as an IOT (index organized table) possibly partitioned by date.

Once the orders have been identified, you can then filter out the orders which are not for the required items with the predicate against the item table.

An "IN SUBQUERY" with a hierarchical query against the items table (suitably indexed) should be sufficient to do the filtering in the case where you have a large item selection - followed by a join back to the items table to pick up the item name for the final order set. (Note: it really would be perfectly reasonable to have the items table appearing twice in the query).

To minimise the cost of the subquery, you could always create a one-to-one trigger on the items table to maintain in real-time clone table (item_id, parent_id) as an IOT - low-risk, no contention on maintenance, minimum time, minimum space, maximum efficient for the hierarchical subquery.

-- 
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com

Author: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
Received on Tue Feb 13 2007 - 21:55:13 CET

Original text of this message