Home » SQL & PL/SQL » SQL & PL/SQL » Table Design for summary data (Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi)
Table Design for summary data [message #379675] Wed, 07 January 2009 10:35
annagel
Messages: 220
Registered: April 2006
Senior Member
I have a design problem I was wondering if anyone had any ideas on. To begin with we have data which would look something like this.

CREATE TABLE company
(state VARCHAR2(2),
nbr_units NUMBER,
measure NUMBER,
flag1 VARCHAR2(1),
flag2 VARCHAR2(1),
flag3 VARCHAR2(1),
group1_id NUMBER,
group2_id NUMBER),
release_date DATE


So each company in our data-set exists in one state, and it has a number of units and a cumulative score, these scores are recalculated periodically to track performance over time, and all old revisions are retained. There are also a number of attributes the company can have. Some will be expressed as Yes/No values (flags) and others will have a larger set of values (groups). From this set of companies I am looking to do some state by state analysis, basically determining how companies meeting certain criteria in a state compare to the whole set of companies meeting those same criteria.

A statewide measure for a set of characteristics can be derived by summing the measure * nbr_units for each company in the state to meet the criteria and dividing by the sum of the number of units in the state:

SELECT   state, SUM (nbr_units * measure) / SUM (nbr_units) state_measure
    FROM company
   WHERE /*criteria*/
GROUP BY state


For comparison between states, a percentile is calculated by dividing the state's score for the given criteria by the maximum score across all states for that same criteria:

SELECT state, state_measure,
         state_measure
       / MAX (state_measure) OVER (PARTITION BY 1) measure_percentile
  FROM (SELECT   state,   SUM (nbr_units * measure)
                        / SUM (nbr_units) state_measure
            FROM company
           WHERE 1 = 1
        GROUP BY state)


Now we have tens of millions of companies so we need a good way to summarize this data ahead of time, we reproduce scores about once a month so would would also summarize new data about once a month. The rest of the time we would only be reporting on the data never modifying it.

We will be using the data in two ways

1) The user will be selecting the criteria they are interested in which could include any of the flags, or none, and any of the group values, or more than one of them or none. And then we will give them a display of the ranking for each state as of one date.

2) The user will be able to see the details for one state, they will be able to select the same criteria as with a global view, but they will see the state's results (percentile included) tracked across multiple dates.

For use #1 I have a working solution, I summarize my data with the following query:

SELECT   state, flag1, flag2, flag3, group1_id, group2_id, release_date,
         SUM (nbr_units) nbr_units_total,
         SUM (nbr_units * measure) measure_total
    FROM company
GROUP BY state, flag1, flag2, flag3, group1_id, group2_id, release_date;


In the resulting table I put bitmap indexes on the state, release_date, all the flags and all the groups. Since we are only doing inserts and updates once a month the insert overhead of the bitmap index is not an issue and it allows us to take advantage of the bitmap index join no matter what criteria the user decides to restrict on. Querying on this for all states can be done with the following query:

SELECT state, state_measure,
       state_measure
       / MAX (state_measure) OVER (PARTITION BY 1) state_percentile
  FROM (SELECT   state,   SUM (measure_total)
                        / SUM (nbr_units_total) state_measure
            FROM company_summary
           WHERE release_date = :DATE AND /*criteria*/
        GROUP BY state)


This will use the bitmap indexes and get me the data I want without needless access to extraneous rows.

My problem is #2, I can use a slightly modified version of the access method for all states, but doing so accesses a lot of data that I basically throw away. Ideally I would need a way of pre-computing the maximum score, storing it with each state, and thus being able to retrieve only the rows that relate directly to the state when looking at the historical details of a state. So after my extensive explanation, does anyone have an idea of how I could organize data to make this possible? I am not apposed to a second summary copy of the data so it does not need to follow any of the structure of my all state summary.

Thanks for any idea,
Andrew

Previous Topic: pl/sql select loop
Next Topic: comparison of to_char function
Goto Forum:
  


Current Time: Mon Nov 04 12:59:51 CST 2024