Normalize / Denormalize [message #219709] |
Thu, 15 February 2007 10:48 |
faiz_hyd
Messages: 294 Registered: February 2006
|
Senior Member |
|
|
Hi,
Trying to work on existing Data model , need some advice from you people on denormalization or to keep existing normalize structure.
currently have the following structure in
batch table pk batch_id
txn table pk txn_id fk batch_id
wkst_queue table fk txn_id
Because of some changes we need to make to support new queries,
new suggested struncture is the following structure:
batch pk batch_id
txn pk txn_id fk batch_id
txn_routing pk txn_routing_id fk txn_id
wkst_queue fk txn_routing_id
The transaction queuing query needs information from the batch table, so currently joining the txn and batch tables to the wkst_queue table to get at the batch information. No information from txn is needed other than the fk to the batch_id. With this change, I will have to join txn_routing to txn, and txn to batch to get the batch information. This query will probably be the most often executed query.
will it be better to denormalize wkst_queue by adding the batch_id to it. wkst_queue rows are short lived in nature, normally should be only a few minutes. The batch_id will never change.
I can understand this much info is not enough to give an exact answer, but still from your experiences what do you will suggest.
Regards
[Edited to improve formatting]
[Updated on: Fri, 16 February 2007 11:50] by Moderator Report message to a moderator
|
|
|
Re: Normalize / Denormalize [message #219914 is a reply to message #219709] |
Fri, 16 February 2007 12:08 |
smartin
Messages: 1803 Registered: March 2005 Location: Jacksonville, Florida
|
Senior Member |
|
|
You are right, I don't have an answer for you.
But my design approach is normalize first, and denormalize only when shown that it is needed. In that way I don't try to solve a problem that I don't have. The database was meant to join, and often times it can do a simple pk id join just fine.
Also, situations like these are candidates for btree and hash clusters. If the data will always be accessed together at the same time, sometimes you can store it together to begin with.
On the other hand, if your queue table is temporary in nature, and never is updated in any way, and your batch table is never updated in any way, then the impact of the extra batch id would be less. Less enough to matter is up to you. But those are some big ifs, because one of the whole points of normalization is to prevent you having to maintain data in more than one spot.
But I would also ask if the current data stucture is an appropriate representation of your system. Is it a one to many to many relationship? If so, why would changing it to become a one to many to many to many relationship be needed just to "support some additional queries"? Did the fundamental system characteristics change?
You could also ask whether the queue table is a part of your actual model in the first place, since data comes and goes constantly, or if it is just a scratch / working data set holder, in which case, should it be a real table at all, rather than just a query or a process or a temp table?
|
|
|