Re: Interesting warehouse design challenge (long)

From: Keith Boulton <kboulton_at_ntlworld.com>
Date: Sat, 26 Jan 2002 10:43:20 -0000
Message-ID: <VGv48.26592$Ph2.4700925@news2-win.server.ntlworld.com>

I don't know too much about it, but could you not build a materialised view for each where clause and refresh it each time the main data is refreshed. I did something similar recently and it improved performance enormously, althouh it eventually turned out to be unnecessary ( improving run-time from .3 seconds to 0.02 doesn't seem worthwhile).

How big is the table? You may find that you can increase the size of the SGA to hold the entire table - in one case, I got a 10-fold performance improvement from this.

markoos <no_at_spam.com> wrote in message
news:6_p48.51619$d16.8708171_at_typhoon.nyc.rr.com...
> We have a 9i system functioning as a data warehouse on Win2k. We're stuck
> with a bad old design of a previous warehouse and need to optimize things.
>
> We have a table with 700,000 records and 10 columns that tracks our effort
> data (person works n hours on project 'abc' performing function 'def').
> Lets call it tbl_effort.
>
> We have ~10 manhour reports that run against this table from a CF web
> application. Each report can be run by any one of multiple user
> departments. The issue is that each department has a completely different
> set of criteria in which to query this table. Some departments filter by
> one column, others another column, others combinations of columns.
> Sometimes there are OR clauses as the rules change between date ranges.
> There is no direct rhyme or reason to these filters, they are just the way
> the business rules work. It's making it very difficult to index.
>
> Unfortunately we're applying these filters at runtime. The web
application
> pulls the 'where clause' from tbl_where_clause, and dynamically constructs
> the query (not my idea, trust me). Each department has one and only one
> where clause. Some effort records could be used by two departments, this
is
> many-to-many.
>
> The reports run between 10 seconds-2 minutes, often kicking off full table
> scans. I would like to improve this speed of this system by pre-filtering
> the data at night, creating any other tables, views, indexes, or
> materialized views as necessary to optimize the runtime speed.
>
> One of the requirements is that departments and where clauses can be added
> or edited mid-day, and the system needs to reflect the new filter
> immediately.
>
>
> Option 1:
>
> My thought is that we add a new table that joins to tbl_effort on
effort_ID,
> call it tbl_effort_join.
>
> CREATE TABLE tbl_effort_join (
> effort_ID INT NOT NULL,
> dept_ID INT NOT NULL
> );
> --and the appropriate indexes.
>
> At night we loop through a cursor of the departments, applying each where
> clause and inserting one record in the join table for each record returned
f
> rom the effort table. This could be implemented as a procedure that takes
in
> one department ID, dynamically constructs the query against tbl_effort,
and
> runs an INSERT INTO tbl_effort_join SELECT (dynamic query). I haven't
done
> this yet but I don't know why it wouldn't work.
>
> We then could tweak the end-user queries to use this table joined (or
create
> this as a view). All queries simply filter on dept=dept_ID instead of the
> long-winded where clause.
>
> If a where clause changes during the day, we call the procedure once for
> that department. That procedure would delete & re-insert rows, or more
> efficiently if possible.
>
>
>
> Option 2:
>
> My second design candidate is to have only one table. That table is
created
> with the same schema as tbl_effort, but with dept_ID added. Any of the
> original effort records that crossed depts would now have duplicate
records
> with different dept_IDs. Storage space is not an issue.
>
> We construct the table in the same way as option 1, with a procedure and a
> loop through the dept cursor. Also if where clause is added or updated,
we
> run the procedure which deletes & reinserts the records.
>
> I'm thinking that this is a stronger design, as having one table will be
> easier to index and create materialized views against. Also I'm thinking
> the queries may potentially run faster without having to join.
>
> I would consider index organizing this table if that sped up the queries.
>
>
>
> So the question is, which design is better? Should I opt for full
> normalization or go with one repetitive table? Is there a third design
> candidate I'm missing? Am I missing anything else?
>
>
> Thanks for any suggestions!
>
>
>
>
>
>
>
>
>
>
Received on Sat Jan 26 2002 - 04:43:20 CST