How to Populate a fact table from dimension tables.Kindly reply. [message #380301] |
Sat, 10 January 2009 07:51  |
sqlstar_student
Messages: 42 Registered: November 2007
|
Member |
|
|
Basically my aim is there are 5 dimensional tables that are created
Student->s_id primary key,upn(unique pupil no),name
Grade->g_id primary key,grade,exam_level,values
Subject->sb_id primary key,subjectid,subname
School->sc_id primary key,schoolno,school_name
year->y_id primary key,year(like 2008)
s_id,g_id,sb_id,sc_id,y_id are sequences
select * from student;
S_ID UPN FNAME COMMONNAME GENDER DOB
==============================
9062 1027 MELISSA ANNE f 13-OCT-81
...
9000 rows selected
select * from grade;
G_ID GRADE E_LEVEL VALUE
---------- ---------- ---------- ----------
73 A a 120
74 B a 100
75 C a 80
76 D a 60
77 E a 40
78 F a 20
79 U a 0
80 X a 0
...
18 rows selected
These are basically the dimensional views
Now according to the specification given, need to create a fact table as facts_table which contains all the dim tables primary keys as foreign keys in it.
The problem is when i say,I am going to consider a smaller example than the actual no of dimension tables 5 lets say there are 2 dim tables student,grade with s_id,g_id as p key.
create materialized view facts_table(s_id,g_id)
as
select s.s_id,g.g_id
from (select distinct s_id from student)s
, (select distinct g_id from grade)g
This results in massive duplication as there is no join between the two tables.But basically there are no common things between the two tables to join,how to solve it?
Consider it when i do it for 5 tables the amount of duplication being involved, thats why there is not enough tablespace.
I was hoping if there is no other way then create a fact table with just one column initially
create materialized view facts_table(s_id)
as
select s_id
from student;
then
alter materialized view facts_table add column g_id number;
Then populate this g_id column by fetching all the g_id values from the grade table using some sort of loop even though we should not use pl/sql i dont know if this works?
Any suggestions.
|
|
|
Re: How to Populate a fact table from dimension tables.Kindly reply. [message #380303 is a reply to message #380301] |
Sat, 10 January 2009 08:28   |
flyboy
Messages: 1903 Registered: November 2006
|
Senior Member |
|
|
Hi,
before using this kind of "fact table", you shall ask this question: what shall this table store and how shall be its content used? Populating all combinations means, that each student studied all subjects with every grade in every school in all years. Does it make any sense?
Maybe you should read something about Business Intelligence and OLAP concepts. As fact tables are typically used for aggregating information about dimension combinations in existing system (the school one in your case), so it shall have at least one measure (event count). Otherwise, you would not aggregate - you would have common OLTP evidence. As in normal systems, not all dimension combinations are present, the fact table is not very huge.
Also in your case, the "student" dimension seems to be too detailed to use. Are you really planning to make reports per individual students? Maybe their gender, age or some other personal info (nationality, ...) should suffice.
Also, instead of using "normal" tables (ROLAP) you could have a look at Oracle OLAP support (MOLAP).
|
|
|
|
Re: How to Populate a fact table from dimension tables.Kindly reply. [message #380307 is a reply to message #380303] |
Sat, 10 January 2009 08:57   |
flyboy
Messages: 1903 Registered: November 2006
|
Senior Member |
|
|
flyboy wrote on Sat, 10 January 2009 15:28 | What shall this table store and how shall be its content used? Populating all combinations means, that each student studied all subjects with every grade in every school in all years. Does it make any sense?
|
Does really every student pass all subjects with every grade in every school in all years? Even if this is a homework, you shall think what do the stored data represent.
sqlstar_student wrote on Sat, 10 January 2009 15:43 | I am not getting how to do that without involving lot of duplication.
|
When the dimensions are properly chosen (according to the report requirements), that "duplication" is not so huge. On the contrary, it shall be just a fraction of corresponding OLTP evidence storage.
|
|
|
Re: How to Populate a fact table from dimension tables.Kindly reply. [message #380311 is a reply to message #380301] |
Sat, 10 January 2009 09:38   |
sqlstar_student
Messages: 42 Registered: November 2007
|
Member |
|
|
Basically your quite right to say that without any logical common columns between the dimension tables it will produce results that every student studied every sibject and got every grade and its very rubbish,
I am confused at to whether the dimension tables can contain duplicated columns i.e column like upn(unique pupil no) i will also copy in another table so that when writing queries a join can be placed. i dont know whether thats right
These are the required queries from the star schema
Design a conformed star schema which will support the following queries:
a. For each year give the actual number of students entered for at A-level in the whole country / in each LEA / in each school.
b. For each A-level subject, and for each year, give the percentage of students who gained each grade.
c. For the most recent 3 years, show the 5 most popular A-level subjects in that year over the whole country (measure popularity as the number of entries for that subject as a percentage of the total number of exam entries).
I written the queries earlier based on dimesnion tables which were highly duplicated they were like
student
=======
upn
school
school
======
school(this column substr gives lea,school and the whole is country)
id(id of school)
student_group
=============
upn(unique pupil no)
gid(group id)
grade
year_col
========
year
sid(subject id)
gid(group id)
exam_level
id(school id)
grades_list
===========
exam_level
grade
value
subject
========
sid
subject
compulsory
These were the dimension table si created earlier and as you can see many columns are duplicated in other tables like upn and this structure effectively gets the data out of the schema as there are common column upon which we can link
But a collegue suggested that these dimension tables are wrong and they should not be this way and should not contain dupliated columns.
select distinct count(s.upn) as st_count
, y.year
, c.sn
from student_info s
, student_group sg
, year_col y
, subject sb
, grades_list g
, country c
where s.upn=sg.upn
and sb.sid=y.sid
and sg.gid=y.gid
and c.id=y.id
and c.id=s.school
and y.exam_lev=g.exam_level
and g.exam_level='a'
group by y.year,c.sn
order by y.year;
This is the code for the 1st query
I am confused now which structure is right.Are my earlier dimension tables which i am describing here or the new dimension tables which i explained above are right.
If what i am describing now is right i mean the dimension tables and the columns are allright then i just need to create a fact table with foreign keys of all the dimension tables.
|
|
|
Re: How to Populate a fact table from dimension tables.Kindly reply. [message #380314 is a reply to message #380301] |
Sat, 10 January 2009 10:00   |
flyboy
Messages: 1903 Registered: November 2006
|
Senior Member |
|
|
Quote: | I written the queries earlier based on dimesnion tables which were highly duplicated they were like
|
Maybe you shall realize the difference between dimension and fact tables.
SCHOOL, GRADES_LIST and SUBJECT are dimension tables.
The rest is either fact tables (STUDENT, YEAR_COL), but containing duplicates instead of measured count or some mix between (STUDENT_GROUP).
Quote: | I am confused now which structure is right.Are my earlier dimension tables which i am describing here or the new dimension tables which i explained above are right.
|
Both do not contain measures, so I would not call them "right".
Quote: | If what i am describing now is right i mean the dimension tables and the columns are allright then i just need to create a fact table with foreign keys of all the dimension tables.
|
You may fill it from YEAR_COL, maybe with joining some other tables. And remove a STUDENT column from the fact table and replace it with its count (measure).
[Updated on: Sat, 10 January 2009 10:01] Report message to a moderator
|
|
|
Re: How to Populate a fact table from dimension tables.Kindly reply. [message #380315 is a reply to message #380314] |
Sat, 10 January 2009 10:23   |
sqlstar_student
Messages: 42 Registered: November 2007
|
Member |
|
|
I understand that if there is a query like
For each A-level subject, and for each year, give the percentage of students who gained each grade.
Then based on the requested queries we create the dimension tables as the dimension tables are tables which are being used as part of a condition in a query everything else is fact.
so basically the dimension tables would be by looking at the query
Grade Dimension table
===========
g_id primary key to link to fact table a sequence of auto gen no
grade
exam_level
value
as its mentioned each A-level subject and show data by each grade
students dim table
====================
s_id primary key to link to fact table a sequence of auto gen no
upn
for count(upn) the no of students.
year D table
============
y_id primary key to link to fact table a sequence of auto gen no
year
as its asked by each year
similarly the other queries show that
we need
Subject Dimension table
=======================
sb_id pk a sequence of auto generated no as st.seq.nextval
subjectid
subject name
and
School dim table
=================
sc_id a sequence of auto generated no as school_seq.nextval
schoolid
schoolname
...
Fact data
=========
All foreign keys
s_id
g_id
sb_id
sc_id
y_id
But how will the queries return meaningful data if there are no common columns between the tables.
|
|
|
Re: How to Populate a fact table from dimension tables.Kindly reply. [message #380316 is a reply to message #380314] |
Sat, 10 January 2009 11:01   |
sqlstar_student
Messages: 42 Registered: November 2007
|
Member |
|
|
Ahhh i think i understand a little bit , i get to see your point now
Say there are 3 dimension tables
Student
====
s_id pk auto gen sequence
Studentid
Studentname
School
=====
sc_id pk auto gen sequence
school
schoolname
grade
=====
g_id pk auto gen sequence
e_level
fact table
========
s_id fk reference s_id
sc_id fk reference sc_id
g_id fk reference g_id
so when i want to get the data for the query which could be like
get me the no of students who are studying a-level (advanced level) all over the country
then the query will be something like
select count(s.upn)
from student s
, grade g
, fact_table ft
where ft.s_id=s.s_id
and ft.g_id=g.g_id
and g.e_level='a';
If this is allright,
I kindly need to know how to populate the fact table with the primary keys of the dim tables which become the fkey in fact table without duplication.Because as i explained above in the first post whenever i try to populate the fact table then it produces duplicated keys.
[Updated on: Sat, 10 January 2009 11:07] Report message to a moderator
|
|
|
|
|
Re: How to Populate a fact table from dimension tables.Kindly reply. [message #380320 is a reply to message #380301] |
Sat, 10 January 2009 11:45   |
 |
BlackSwan
Messages: 26766 Registered: January 2009 Location: SoCal
|
Senior Member |
|
|
You need to learn about 3rd Normal Form.
You need to be able to identify which student took which course(s) at what school during which year & got what grade(s).
After you can do than then you can build the summarized FACT tables.
Dimensions are tables that contain attributes that describe an object.
A fact is a table that collects quantitative information, such as sales.
The goal behind the fact tables is to precompute values & store them in the fact table so the summarized values do not have to be always computed "on the fly" when generating reports.
Some classic FACT tables are having sales information summarized over time; days, weeks, months, quarters, years, etc.
I do not see any "quantitative information" in your proposed fact table.
[Updated on: Sat, 10 January 2009 12:06] Report message to a moderator
|
|
|
Re: How to Populate a fact table from dimension tables.Kindly reply. [message #380358 is a reply to message #380320] |
Sun, 11 January 2009 05:13   |
sqlstar_student
Messages: 42 Registered: November 2007
|
Member |
|
|
It was my fault that there was nothing in common in the fact table and dimension table, now i got that rectified and the duplication is reduced totally.
I used the primary keys in dimension tables as foreign keys in fact tables,dropped the auto generated sequences i used before as primary keys and used columns in the dimension tables which maily rep that table as its primary key and made them as foreign keys to fact table and mapped references the primary key in the dimension table.
Thank you for all the replies.
|
|
|
|