Re: Advice required for more efficient Oracle database definition/solution

From: Jim Kennedy <kennedy-family_at_attbi.com>
Date: Sat, 13 Apr 2002 14:55:51 GMT
Message-ID: <WFXt8.7158$HH5.679_at_rwcrnsc51.ops.asp.att.net>


This seems a perfect textbook case for a data warehouse schema.(star) I am thinking my look up table looks like:

create table questions(
questions_id integer not null, -- my primary key a sequence year integer not null, -- the year of the question maybe this should be a date incase it is not just annual
question_number integer not null, -- the number of the question answer varchar2(1) not null); -- the answer.

In this table you have all possible answers. For example, let's say for a 1997 questionaire question 35 was 'How many children do you have under the age of 21?' The 5 possible answers (bubbles that people filled in) are 0,1,2,3,4+ then the rows would look like:

4232      1997     35   0
4233      1997     35   1
4234      1997     35   2
4235      1997     35   3
4236      1997     35   4

Then you would have a fact or answer table looking something like this: create table answers(

    answers_id integer not null, -- the primary key a sequence     questions_id integer not null ) -- the key back to the questions table, which question and what answer and when(year or date)

You might have another table for the actual text of the question in case you want to know what it was.

create table question_text(
question_text_id integer not null, -- primary key a sequence question_id integer not null, -- the key back to the questions table same question might have been asked for multiple years. verbage varchar2(2000) not null) the actual text of the question - not sure how long it could be.

also you would probably want to know what the column headers for the questions were

create table question_answers(

question_answers_id integer not null,
question_text_id integer not null, -- links to the question
question_id integer not null, -- I would do this because the data is static
and this makes it easier if I want to join questions and question_answers to get the column headers. Not strickly academically correct, but I'm a pragmatist usually.
column integer not null,
answer_header varchar2(100) not null) -- the header eg 'N/A', etc.

You might use other labels. Then I would index and partition appropriately, (bitmap indexes, and probably partition on year) Should I give you the address to send the check to? :-) Jim

"Monty" <mmontreaux_at_hotmail.com> wrote in message news:6284dd3.0204130452.39ee3266_at_posting.google.com...
> Hello, just fielding opinion on a better solution than what is
> currently implemented for a large Oracle database. I've been brought
> in to a firm and asked to change the way a database is structured.
>
>
> Here's the spec, a single Oracle table
>
> VARCHAR2(20) NGD number (census form number)
> VARCHAR2(1) A digit (0..9) or character (Y, N, or X). 0 and X
> essentially mean Not Applicable and there are lots and lots of these
> entries in the table columns.
>
> The table was probably not initially created using something like the
> SQL below, but it has been shown this way as it clearly defines the
> existing table structure.
>
> CREATE TABLE FrmC1997(NGDid VARCHAR2(20), q001 VARCHAR2(1), q002
> VARCHAR2(1) ..... q464 VARCHAR2(1));
>
> This is not a typo. There are 464 question columns in this table. Each
> q??? value corresponds to a census question number (or where the
> government clerks have substituted a digit/character for ethnic race,
> profession etc).
>
> There are approximately 85 million records and the database and it has
> been provided to us to propose and demonstrate a better table
> structure for all this census information (for a government stats
> department) for large batch database queries.
>
> The database is currently being queried with very simply SQL queries
> from web-applications (SELECT something FROM tbl_Frm1997 WHERE q1='Y'
> AND q2='3' AND q3='X' AND q9='7' AND q387='N') and as we have no
> control over the many many web front ends that access this data, it
> has to stay like this. The SQL strings are often 4-5 hundred
> characters in length but are always of this simple SELECT something
> WHERE followed by lots of predicates. Thus we imagine the proposed
> solution would be to create a view called Frm1997 with this external
> schema definition, and access the underlying different format base
> tables.
>
> So, my query, a better table structure would be ? For any information
> that is missing, just imagine a standard census questionairre.
>
> The goal - faster access times for large processing jobs, and a more
> elegant solution.
>
> Our answer is currently the obvious answer (put each question number
> into it's own table and perform table joins in the view) but believe
> there maybe a more efficient and logical solution that we are
> overlooking. Any ideas greatly appreciated.
>
>
> Monty.
Received on Sat Apr 13 2002 - 16:55:51 CEST

Original text of this message