REPOST: Re: Interesting warehouse design challenge (long)
As I said, I'm no expert, but I had a similar requirement recently and the
two things that made the biggest difference were:
This was an interactive reporting tool with drill-down through categories, and the above changes took the per-query (= per-drill-down) time from 20-30 seconds - which was not acceptable - to about 0.3 seconds, which is reasonable.
markoos <no_at_spam.com> wrote in message
news:6_p48.51619$d16.8708171_at_typhoon.nyc.rr.com...
> We have a 9i system functioning as a data warehouse on Win2k. We're stuck
> with a bad old design of a previous warehouse and need to optimize things.
>
> We have a table with 700,000 records and 10 columns that tracks our effort
> data (person works n hours on project 'abc' performing function 'def').
> Lets call it tbl_effort.
>
> We have ~10 manhour reports that run against this table from a CF web
> application. Each report can be run by any one of multiple user
> departments. The issue is that each department has a completely different
> set of criteria in which to query this table. Some departments filter by
> one column, others another column, others combinations of columns.
> Sometimes there are OR clauses as the rules change between date ranges.
> There is no direct rhyme or reason to these filters, they are just the way
> the business rules work. It's making it very difficult to index.
>
> Unfortunately we're applying these filters at runtime. The web application
> pulls the 'where clause' from tbl_where_clause, and dynamically constructs
> the query (not my idea, trust me). Each department has one and only one
> where clause. Some effort records could be used by two departments, this is
> many-to-many.
>
> The reports run between 10 seconds-2 minutes, often kicking off full table
> scans. I would like to improve the speed of this system by pre-filtering
> the data at night, creating any other tables, views, indexes, or
> materialized views as necessary to optimize the runtime speed.
>
> One of the requirements is that departments and where clauses can be added
> or edited mid-day, and the system needs to reflect the new filter
> immediately.
>
>
> Option 1:
>
> My thought is that we add a new table that joins to tbl_effort on effort_ID,
> call it tbl_effort_join.
>
> CREATE TABLE tbl_effort_join (
> effort_ID INT NOT NULL,
> dept_ID INT NOT NULL
> );
> --and the appropriate indexes.
>
> At night we loop through a cursor of the departments, applying each where
> clause and inserting one record in the join table for each record returned
> from the effort table. This could be implemented as a procedure that takes in
> one department ID, dynamically constructs the query against tbl_effort, and
> runs an INSERT INTO tbl_effort_join SELECT (dynamic query). I haven't done
> this yet but I don't know why it wouldn't work.
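
A minimal sketch of the per-department refresh procedure the poster describes. All names beyond tbl_effort/tbl_effort_join are assumptions: in particular, tbl_where_clause is guessed to hold the raw predicate text in a column called where_text.

```sql
-- Hypothetical sketch: rebuild one department's rows in tbl_effort_join.
-- Assumes tbl_where_clause(dept_ID, where_text) stores the raw predicate.
CREATE OR REPLACE PROCEDURE refresh_dept_join (p_dept_id IN NUMBER) AS
  v_where  VARCHAR2(4000);
BEGIN
  SELECT where_text INTO v_where
    FROM tbl_where_clause
   WHERE dept_ID = p_dept_id;

  DELETE FROM tbl_effort_join WHERE dept_ID = p_dept_id;

  -- Dynamic SQL: the stored predicate is appended verbatim, so the
  -- where-clause table must be trusted (otherwise this is injectable).
  EXECUTE IMMEDIATE
    'INSERT INTO tbl_effort_join (effort_ID, dept_ID) ' ||
    'SELECT effort_ID, :dept FROM tbl_effort WHERE ' || v_where
    USING p_dept_id;

  COMMIT;
END refresh_dept_join;
/
```

Calling this once per department from the nightly cursor loop, and once on demand when a clause is edited, covers both the batch rebuild and the mid-day-change requirement.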
>
> We then could tweak the end-user queries to use this table joined (or create
> this as a view). All queries simply filter on dept=dept_ID instead of the
> long-winded where clause.
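
The "create this as a view" idea could look like the following (view name assumed), so report SQL only ever filters on dept_ID:

```sql
-- Hypothetical view: effort rows pre-tagged with the departments they match.
CREATE OR REPLACE VIEW v_effort_by_dept AS
SELECT e.*, j.dept_ID
  FROM tbl_effort e
  JOIN tbl_effort_join j ON j.effort_ID = e.effort_ID;
```

Reports would then run SELECT ... FROM v_effort_by_dept WHERE dept_ID = :dept, and the long-winded predicates live only in the nightly rebuild.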
>
> If a where clause changes during the day, we call the procedure once for
> that department. That procedure would delete & re-insert rows (or do
> something more efficient, if possible).
>
>
>
> Option 2:
>
> My second design candidate is to have only one table. That table is created
> with the same schema as tbl_effort, but with dept_ID added. Any of the
> original effort records that crossed depts would now have duplicate records
> with different dept_IDs. Storage space is not an issue.
>
> We construct the table in the same way as option 1, with a procedure and a
> loop through the dept cursor. Also, if a where clause is added or updated, we
> run the procedure which deletes & reinserts the records.
>
> I'm thinking that this is a stronger design, as having one table will be
> easier to index and create materialized views against. Also I'm thinking
> the queries may potentially run faster without having to join.
>
> I would consider index organizing this table if that sped up the queries.
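
For option 2, an index-organized table keyed on (dept_ID, effort_ID) would cluster each department's rows together, which suits the dept_ID = :dept access path. A sketch (table name assumed, column list abbreviated since the full tbl_effort schema isn't shown):

```sql
-- Hypothetical IOT version of the denormalized option-2 table.
CREATE TABLE tbl_effort_dept (
  dept_ID    INT NOT NULL,
  effort_ID  INT NOT NULL,
  hours      NUMBER,          -- ...remaining tbl_effort columns...
  CONSTRAINT pk_effort_dept PRIMARY KEY (dept_ID, effort_ID)
) ORGANIZATION INDEX;
```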
>
>
>
> So the question is, which design is better? Should I opt for full
> normalization or go with one repetitive table? Is there a third design
> candidate I'm missing? Am I missing anything else?
>
>
> Thanks for any suggestions!
Received on Sun Jan 27 2002 - 05:09:10 CST