Downsides of Data Denormalization [message #210689] Thu, 21 December 2006 16:14
scottwmackey
Messages: 515
Registered: March 2005
Senior Member
We have an architecture like the following.
CREATE TABLE optional_info_a(
   optional_info_a_id    NUMBER
  ,optional_info_a_attr1 VARCHAR2(100)
  ,optional_info_a_attr2 VARCHAR2(100)
  ,optional_info_a_attr3 VARCHAR2(100)
  ,optional_info_a_attr4 VARCHAR2(100)
);
ALTER TABLE optional_info_a ADD CONSTRAINT optional_info_a_pk PRIMARY KEY (optional_info_a_id);


CREATE TABLE optional_info_b(
   optional_info_b_id    NUMBER
  ,optional_info_b_attr1 VARCHAR2(100)
  ,optional_info_b_attr2 VARCHAR2(100)
);
ALTER TABLE optional_info_b ADD CONSTRAINT optional_info_b_pk PRIMARY KEY (optional_info_b_id);


CREATE TABLE optional_info_c(
   optional_info_c_id    NUMBER
  ,optional_info_c_attr1 VARCHAR2(100)
  ,optional_info_c_attr2 VARCHAR2(100)
  ,optional_info_c_attr3 VARCHAR2(100)
);
ALTER TABLE optional_info_c ADD CONSTRAINT optional_info_c_pk PRIMARY KEY (optional_info_c_id);


CREATE TABLE main_table(
   main_table_id      NUMBER
  ,main_type          NUMBER
  ,main_attr1         VARCHAR2(100)
  ,main_attr2         VARCHAR2(100)
  ,optional_info_a_id NUMBER
  ,optional_info_b_id NUMBER
  ,optional_info_c_id NUMBER);
ALTER TABLE main_table ADD CONSTRAINT main_table_pk PRIMARY KEY (main_table_id);
ALTER TABLE main_table ADD CONSTRAINT main_table_a_fk FOREIGN KEY (optional_info_a_id) REFERENCES optional_info_a;
ALTER TABLE main_table ADD CONSTRAINT main_table_b_fk FOREIGN KEY (optional_info_b_id) REFERENCES optional_info_b;
ALTER TABLE main_table ADD CONSTRAINT main_table_c_fk FOREIGN KEY (optional_info_c_id) REFERENCES optional_info_c;
CREATE UNIQUE INDEX main_table_u1 ON main_table(optional_info_a_id);
CREATE UNIQUE INDEX main_table_u2 ON main_table(optional_info_b_id);
CREATE UNIQUE INDEX main_table_u3 ON main_table(optional_info_c_id);

So the typical query to get the main_table info is something like this.
SELECT *
FROM main_table mt
LEFT JOIN optional_info_a a ON a.optional_info_a_id = mt.optional_info_a_id
LEFT JOIN optional_info_b b ON b.optional_info_b_id = mt.optional_info_b_id
LEFT JOIN optional_info_c c ON c.optional_info_c_id = mt.optional_info_c_id

As can be inferred from the structure, there is a one-to-one relation between main_table and each optional_info_x table. Let's further add that the main_type column dictates which of the optional_info_x_id columns will have a value in main_table. There will be no overlap: only one of the optional_info_x_id columns will be populated per row, and some main_types have none of them populated. I understand most people would consider the above design more elegant and thus better, but what are the real downsides to just doing this:

CREATE TABLE main_table(
   main_table_id         NUMBER
  ,main_type             NUMBER
  ,main_attr1            VARCHAR2(100)
  ,main_attr2            VARCHAR2(100)
  ,optional_info_a_attr1 VARCHAR2(100)
  ,optional_info_a_attr2 VARCHAR2(100)
  ,optional_info_a_attr3 VARCHAR2(100)
  ,optional_info_a_attr4 VARCHAR2(100)
  ,optional_info_b_attr1 VARCHAR2(100)
  ,optional_info_b_attr2 VARCHAR2(100)
  ,optional_info_c_attr1 VARCHAR2(100)
  ,optional_info_c_attr2 VARCHAR2(100)
  ,optional_info_c_attr3 VARCHAR2(100)
);
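If it matters, I assume the rule that only one group is populated per row could be enforced directly on the single table with check constraints along these lines (just a sketch; the constraint names and the main_type codes 1/2/3 are made up for illustration):

ALTER TABLE main_table ADD CONSTRAINT main_table_a_chk CHECK (
   main_type = 1 OR (    optional_info_a_attr1 IS NULL
                     AND optional_info_a_attr2 IS NULL
                     AND optional_info_a_attr3 IS NULL
                     AND optional_info_a_attr4 IS NULL));
ALTER TABLE main_table ADD CONSTRAINT main_table_b_chk CHECK (
   main_type = 2 OR (    optional_info_b_attr1 IS NULL
                     AND optional_info_b_attr2 IS NULL));
ALTER TABLE main_table ADD CONSTRAINT main_table_c_chk CHECK (
   main_type = 3 OR (    optional_info_c_attr1 IS NULL
                     AND optional_info_c_attr2 IS NULL
                     AND optional_info_c_attr3 IS NULL));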

It is also relevant that main_table is populated at installation with, at most, a million records, and will not significantly change size during the life of the application. A record will never change its main_type.

The upside with the single table architecture, as I see it, is that the application will be much simpler to maintain, and I would think the queries would be faster. I don't see a downside. Am I missing something?

Thanks,
Scott
Re: Downsides of Data Denormalization [message #210702 is a reply to message #210689] Thu, 21 December 2006 20:07
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
The reason you don't see a downside is that there is none. The second architecture is right.

Anyone who says the first architecture is better is mistaken. It is both bad data modelling and a bad physical implementation.

If you frequently perform large scans of the table and return none of the optional attrs, then you will be performing a little more IO, but this performance loss is more than compensated for by not having to join when you do need the optional attrs.

Ross Leishman
Re: Downsides of Data Denormalization [message #210746 is a reply to message #210702] Fri, 22 December 2006 03:05
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
I'm not even sure you'll be performing more IO in the second model. In the first one, every time you need to get some optional attributes, you'll have to do a second index scan followed by a table read. If the optional attributes are fairly common, then I reckon the second method may end up quicker as well as simpler.

The first model would be right if the relationships were 1-many, but for 1-1 relationships, I'd definitely take the second one.
Re: Downsides of Data Denormalization [message #210747 is a reply to message #210689] Fri, 22 December 2006 03:18
michael_bialik
Messages: 621
Registered: July 2006
Senior Member
Why do you need to define all 3 groups of attributes? Since only one group is ever populated per row, you could reuse a single set of generic columns:

CREATE TABLE main_table(
   main_table_id        NUMBER
  ,main_type            NUMBER
  ,main_attr1           VARCHAR2(100)
  ,main_attr2           VARCHAR2(100)
  ,optional_info_attr1  VARCHAR2(100)
  ,optional_info_attr2  VARCHAR2(100)
  ,optional_info_attr3  VARCHAR2(100)
  ,optional_info_attr4  VARCHAR2(100)
);

If I understood correctly, such a definition may be acceptable as well.
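If you did that, one option might be a view per main_type to give the generic columns meaningful names back (the view name and the type code are only illustrative):

-- Assuming main_type = 1 identifies the "type A" rows.
CREATE OR REPLACE VIEW main_table_a_v AS
SELECT main_table_id
      ,main_attr1
      ,main_attr2
      ,optional_info_attr1 AS optional_info_a_attr1
      ,optional_info_attr2 AS optional_info_a_attr2
      ,optional_info_attr3 AS optional_info_a_attr3
      ,optional_info_attr4 AS optional_info_a_attr4
FROM   main_table
WHERE  main_type = 1;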
Re: Downsides of Data Denormalization [message #210850 is a reply to message #210746] Fri, 22 December 2006 11:44
scottwmackey
Messages: 515
Registered: March 2005
Senior Member
Thanks guys. It's good to know one is not completely losing it.
Re: Downsides of Data Denormalization [message #210868 is a reply to message #210850] Fri, 22 December 2006 15:12
andrew again
Messages: 2577
Registered: March 2000
Senior Member
I agree that the second structure makes more sense in general, but there's insufficient information to understand the reason for the first structure.

One thing is for sure: if you populate the main table and rarely update it, the row lengths aren't going to grow and you won't have row migration. You could well end up with row chaining or migration in the second structure if you subsequently add values to the "optional attributes". PCTFREE would likely be set differently in the two structures.
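For example, if the optional attributes really did get filled in after the initial insert, you could leave more free space per block in the single-table version (the 30 below is only a placeholder figure):

-- Reserve more free space in each block so existing rows can grow in place.
-- Note this only influences how blocks are used from now on; already-full
-- blocks are not reorganised.
ALTER TABLE main_table PCTFREE 30;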

A more common multi-table structure I sometimes see is to have the foreign keys in the other direction - the "advantage" being that you don't have to change the main table's structure to associate other tables with it.
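In other words, something like this, with each optional table keyed and constrained back to main_table (a sketch only, reusing the names from above and showing table A only):

CREATE TABLE optional_info_a(
   main_table_id         NUMBER
  ,optional_info_a_attr1 VARCHAR2(100)
  ,optional_info_a_attr2 VARCHAR2(100)
  ,optional_info_a_attr3 VARCHAR2(100)
  ,optional_info_a_attr4 VARCHAR2(100)
);
-- The primary key doubles as the foreign key, which keeps the relationship 1-1.
ALTER TABLE optional_info_a ADD CONSTRAINT optional_info_a_pk PRIMARY KEY (main_table_id);
ALTER TABLE optional_info_a ADD CONSTRAINT optional_info_a_fk FOREIGN KEY (main_table_id) REFERENCES main_table;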
Re: Downsides of Data Denormalization [message #210938 is a reply to message #210868] Sat, 23 December 2006 22:33
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
For completeness, Andrew (again) makes a good point. Chaining is a possibility if the optional attributes are added after the initial insert. As Andrew also pointed out, a customised PCTFREE setting could all but fix this issue.

But even if you felt it was a big problem that couldn't be fixed easily: what is chaining anyway? It's when a row outgrows the free space in its block and is migrated to another block. It doesn't change ROWID; its address is still the original block, but there is a "forwarding address" to the new block.

So what does this mean? To read that row, you need to read two blocks: the original and the new - that's an undesirable overhead. Two blocks... hang on, that sounds familiar... ah yes, that's how many blocks we have to read in the inappropriately named "normalised" model (I've done a quick revision of the 1st through 5th normal forms, plus Boyce-Codd, and I can't find a reference for putting attributes functionally dependent on the same key in different tables).

In summary, the proposed solution to the chance of chaining on some rows is to engineer guaranteed chaining on every row.
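Incidentally, if you ever want to see how much chaining/migration you actually have, the count shows up in the dictionary once the table has been analysed (a quick sketch; note that CHAIN_CNT is populated by ANALYZE, not by DBMS_STATS):

ANALYZE TABLE main_table COMPUTE STATISTICS;

-- chain_cnt counts both chained and migrated rows.
SELECT table_name, num_rows, chain_cnt
FROM   user_tables
WHERE  table_name = 'MAIN_TABLE';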

Ross Leishman
Re: Downsides of Data Denormalization [message #211579 is a reply to message #210938] Fri, 29 December 2006 11:36
scottwmackey
Messages: 515
Registered: March 2005
Senior Member
Thanks again guys, but I don't think I need to worry about row chaining. If a column on a row is ever going to have a value, it will have a value on insert. That value may change, but the size required to store it will not change significantly.