Home » SQL & PL/SQL » SQL & PL/SQL » Downsides of Data Denormalization
Downsides of Data Denormalization [message #210689] |
Thu, 21 December 2006 16:14 |
scottwmackey
Messages: 515 Registered: March 2005
|
Senior Member |
|
|
We have an architecture like the following.
CREATE TABLE optional_info_a(
optional_info_a_id NUMBER
,optional_info_a_attr1 VARCHAR2(100)
,optional_info_a_attr2 VARCHAR2(100)
,optional_info_a_attr3 VARCHAR2(100)
,optional_info_a_attr4 VARCHAR2(100)
);
ALTER TABLE optional_info_a ADD CONSTRAINT optional_info_a_pk PRIMARY KEY (optional_info_a_id);
CREATE TABLE optional_info_b(
optional_info_b_id NUMBER
,optional_info_b_attr1 VARCHAR2(100)
,optional_info_b_attr2 VARCHAR2(100)
);
ALTER TABLE optional_info_b ADD CONSTRAINT optional_info_b_pk PRIMARY KEY (optional_info_b_id);
CREATE TABLE optional_info_c(
optional_info_c_id NUMBER
,optional_info_c_attr1 VARCHAR2(100)
,optional_info_c_attr2 VARCHAR2(100)
,optional_info_c_attr3 VARCHAR2(100)
);
ALTER TABLE optional_info_c ADD CONSTRAINT optional_info_c_pk PRIMARY KEY (optional_info_c_id);
CREATE TABLE main_table(
main_table_id NUMBER
,main_type NUMBER
,main_attr1 VARCHAR2(100)
,main_attr2 VARCHAR2(100)
,optional_info_a_id NUMBER
,optional_info_b_id NUMBER
,optional_info_c_id NUMBER);
ALTER TABLE main_table ADD CONSTRAINT main_table_pk PRIMARY KEY (main_table_id);
ALTER TABLE main_table ADD CONSTRAINT main_table_a_fk FOREIGN KEY (optional_info_a_id) REFERENCES optional_info_a;
ALTER TABLE main_table ADD CONSTRAINT main_table_b_fk FOREIGN KEY (optional_info_b_id) REFERENCES optional_info_b;
ALTER TABLE main_table ADD CONSTRAINT main_table_c_fk FOREIGN KEY (optional_info_c_id) REFERENCES optional_info_c;
CREATE UNIQUE INDEX main_table_u1 ON main_table(optional_info_a_id);
CREATE UNIQUE INDEX main_table_u2 ON main_table(optional_info_b_id);
CREATE UNIQUE INDEX main_table_u3 ON main_table(optional_info_c_id);
So the typical query to get the main_table info is something like this.
SELECT *
FROM main_table mt
LEFT JOIN optional_info_a a ON a.optional_info_a_id = mt.optional_info_a_id
LEFT JOIN optional_info_b b ON b.optional_info_b_id = mt.optional_info_b_id
LEFT JOIN optional_info_c c ON c.optional_info_c_id = mt.optional_info_c_id
As can be inferred from the structure there is a one-to-one relation between main_table and optional_info_x. Let's further add that the main_type column dictates which of the optional_info_x columns will have a value in main_table. There will be no overlap. Only one of the optional_info_x columns will be populated per row. Moreover, some main_types have none of the optional_info_x populated. I understand most people would consider the above design more elegant and thus better, but what are the real downsides to just doing this:
CREATE TABLE main_table(
main_table_id NUMBER
,main_type NUMBER
,main_attr1 VARCHAR2(100)
,main_attr2 VARCHAR2(100)
,optional_info_a_attr1 VARCHAR2(100)
,optional_info_a_attr2 VARCHAR2(100)
,optional_info_a_attr3 VARCHAR2(100)
,optional_info_a_attr4 VARCHAR2(100)
,optional_info_b_attr1 VARCHAR2(100)
,optional_info_b_attr2 VARCHAR2(100)
,optional_info_c_attr1 VARCHAR2(100)
,optional_info_c_attr2 VARCHAR2(100)
,optional_info_c_attr3 VARCHAR2(100)
);
It is also relevant that main_table is populated at installation to, at most, a million records, and will not significantly change size duing the life of the application. A record will never change its main_type.
The upside with the single table architecture, as I see it, is that the application will be much simpler to maintain, and I would think the queries would be faster. I don't see a downside. Am I missing something?
Thanks,
Scott
|
|
|
Re: Downsides of Data Denormalization [message #210702 is a reply to message #210689] |
Thu, 21 December 2006 20:07 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
The reason you don't see a downside is because there is none. The second architecture is right.
Anyone that says the first architecture is better is mistaken. It is both bad data modelling and bad physical implementation.
If you frequently perform large scans of the table and return none of the optional attrs, then you will be performing a little more IO, but this performance loss is more than compensated by not joining when you do need the optional attrs.
Ross Leishman
|
|
|
Re: Downsides of Data Denormalization [message #210746 is a reply to message #210702] |
Fri, 22 December 2006 03:05 |
JRowbottom
Messages: 5933 Registered: June 2006 Location: Sunny North Yorkshire, ho...
|
Senior Member |
|
|
I'm not even sure you'll be performing more IO in the second model. In the first one, every time you need to get some optional attributes, you'll have to do a second Index scan, followed by a table read - If the optional table attributes are fairly common, then I reckon the second method may end up quicker as well as simpler.
The first model would be right if the relationships were 1-many, but for 1-1 relationships, I'd definitely take the second one.
|
|
|
Re: Downsides of Data Denormalization [message #210747 is a reply to message #210689] |
Fri, 22 December 2006 03:18 |
michael_bialik
Messages: 621 Registered: July 2006
|
Senior Member |
|
|
Why do you need to define all 3 groups of identical attributes:
CREATE TABLE main_table(
main_table_id NUMBER
,main_type NUMBER
,main_attr1 VARCHAR2(100)
,main_attr2 VARCHAR2(100)
,optional_info_attr1 VARCHAR2(100)
,optional_info_attr2 VARCHAR2(100)
,optional_info_attr3 VARCHAR2(100)
,optional_info_attr4 VARCHAR2(100)
);
If I understood correctly - such definition may be acceptable as well.
|
|
|
|
Re: Downsides of Data Denormalization [message #210868 is a reply to message #210850] |
Fri, 22 December 2006 15:12 |
andrew again
Messages: 2577 Registered: March 2000
|
Senior Member |
|
|
I agree that the second structure makes more sense in general, but there's insufficient information to understand the reason for the first structure.
One thing for sure - if you populate the main table and rarely update it, the row lengths aren't going to grow and you won't have row migration. You could well end up with row chaining, migration in the second structure if you subsequently add values to the "optional attributes". pctfree would likely be set differently in the 2 different structures.
A more common multi-table structure I sometimes see is to have the foreign keys in the other direction - the "advantage" being that you don't have to change the main table's structure to accociate other tables to it.
|
|
|
Re: Downsides of Data Denormalization [message #210938 is a reply to message #210868] |
Sat, 23 December 2006 22:33 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Fo completeness, Andrew (again) makes a good point. Chaining is a possibility if the optional attributes are added after the initial insert. As Andrew also pointed out, a customised PCTFREE parameter could all but fix this issue.
But even if you felt it was a big problem that couldn't be fixed easily: what is chaining anyway? It's when a row outgrows the free space in its block, and is migrated to another block. It doesn't change ROWID, its address is still the original block, but there is a "forwarding address" to the new block.
So what does this mean? To read that row, you need to read two blocks: the original and the new - that's an undesirable overhead. Two blocks.... hang on, that sounds familiar.... ah yes, thats how many blocks we have to read in the inappropriately named "normalised" model (I've done a quick revision of the 1st thru 5th normal forms, plus Boyce-Codd, and I can't find a reference for putting attributes functionally dependent on the same key in different tables).
In summary, the proposed solution to chaining some rows is to engineer fixed chaining on every row.
Ross Leishman
|
|
|
Re: Downsides of Data Denormalization [message #211579 is a reply to message #210938] |
Fri, 29 December 2006 11:36 |
scottwmackey
Messages: 515 Registered: March 2005
|
Senior Member |
|
|
Thanks again guys, but I don't think I need to worry about row chaining. If a column on a row is to ever have a value, it will have a value on insert. That value may change, but the size required to store it will not change significantly.
|
|
|
Goto Forum:
Current Time: Mon Dec 09 19:32:38 CST 2024
|