Re: Database question ?

From: --CELKO-- <jcelko212_at_earthlink.net>
Date: 14 Aug 2004 11:24:30 -0700
Message-ID: <18c7b3c2.0408141024.61c8b74c_at_posting.google.com>


>> Modularisation comes at a price, though. Black-box thinking blocks
taking advantage of any hidden specifics. In this case not being allowed to know how the data gets stored blocks taking advantage of any specific storage strategy. <<

In practice, that kind of performance tuning is done with secondary index creation. If you really want to get down to fine details, you can adjust page sizes, fill factors, and things like that on the PHYSICAL database side of the house.
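As a rough sketch of what I mean (SQL Server-flavored syntax; the table, columns, and fill factor here are made-up examples, and other products spell these options differently):

   CREATE INDEX idx_orders_cust_date       -- a secondary index for a common search
       ON Orders (customer_id, order_date)
       WITH FILLFACTOR = 80;               -- leave ~20% free space on each index page

None of that touches the logical schema; it is purely a physical-side knob.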

Far, far more performance and integrity problems are caused by bad LOGICAL design. I fix databases for a living, and my "high score" was making a procedure 2300 times faster by removing 300+ lines of bad code. I can typically hit 1-2 orders of magnitude improvement with fairly simple re-designs and re-writes.

Storage management is not the real problem -- bad programmers are. But it is cheaper to buy storage -- another rant.

>> The question: "what gets encapsulated in which module?" has a
pre-cooked answer for one part (the storage strategy will be somehow taken care of by/within the DBMS) when working with a relational model. To me it is not clear that that is always the right answer. <<

A lot of the products self-organize now -- cooked to order rather than pre-cooked, as it were :) Ingres has over a dozen indexing methods and can pick one on its own; DB2 re-arranges storage now; I do not know what Yukon is promising about this. Having a human do this kind of thing is expensive; nobody will tune a database 24/7, and they probably get some of it wrong simply because they cannot process more than 5 to 7 things at once. A computer can inspect the history, the statistics, and the actual data.

As a simple example, what is the most common newbie mistake? Making everything in the world VARCHAR(255) or VARCHAR(50) or another oversized "magic default" value. When is the last time you looked at the actual lengths of *all* the CHAR(n) and VARCHAR(n) columns in your database to compare the max length used versus the length allocated? Probably never.
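If you ever did want to check, it is a couple of trivial queries per column; something like this sketch (hypothetical table and column names; CHAR_LENGTH() is the Standard SQL spelling, some products call it LEN() or LENGTH()):

   SELECT MAX(CHAR_LENGTH(addr_line1)) AS max_len_used   -- longest value actually stored
     FROM Customers;

   SELECT character_maximum_length AS len_allocated      -- length you declared
     FROM INFORMATION_SCHEMA.COLUMNS
    WHERE table_name = 'Customers'
      AND column_name = 'addr_line1';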

When you have VARCHAR(255) for a column that ought to be, say, CHAR(35) -- the size of an address line on a 3.5 inch mailing label -- you will get oversized garbage in that column. You will also fool the optimizer into making allowances for the possibility of a longer-than-needed string.
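The fix is to declare the column at the size of the real-world thing it models and let the DDL keep the garbage out; a minimal sketch with made-up names:

   CREATE TABLE Addresses
   (addr_id    INTEGER NOT NULL PRIMARY KEY,
    addr_line1 CHAR(35) NOT NULL,               -- sized to the mailing label line
    CHECK (addr_line1 <> ''));                  -- reject all-blank "garbage" rows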

I have seen a clean-up like that, done with a mailing list package and some other data scrubbing procedures, reduce the size of a database by almost 20%. More rows per data page means better performance as well as better data integrity.
