Re: Database question ?

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Sun, 15 Aug 2004 14:07:04 +0200
Message-ID: <411f51e9$0$65124$e4fe514c_at_news.xs4all.nl>


--CELKO-- wrote:

(mAsterdam:)
>>>Modularisation comes at a price, though. Black-box thinking blocks
>>>taking advantage of any hidden specifics....

> In practice, that kind of performance tuning is done with secondary
> index creation. If you really want to get down to fine details, you
> can adjust page sizes, fill factors and things like that on the
> PHYSICAL database side of the house.
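
For concreteness, that kind of physical-side tuning amounts to
statements like the following (a minimal sketch, assuming SQL Server
syntax; the table, column and index names are mine):

    -- Secondary index to support a frequent access path, with an
    -- explicit fill factor that leaves free space on each index page
    -- for future inserts:
    CREATE INDEX ix_orders_custno
        ON Orders (customer_no)
        WITH (FILLFACTOR = 80);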

"Tuning" is not taking advantage. It is damagecontrol, the damage of the modularisation.

> Far, far more performance and integrity problems
> are caused by bad LOGICAL design.

Definitely. Just to make sure: I am not arguing contra modular design, just pro leaving the door open to different approaches. Some technologies are only viable with higher levels of integration.

> I fix databases for a living, and my "high score" was
> making a procedure 2300 times faster by removing 300+ lines of bad
> code. I can typically hit 1-2 orders of magnitude improvement with
> fairly simple re-designs and re-writes.

:-)

> Storage management is not the real problem -- bad programmers are. But
> it is cheaper to buy storage -- another rant.

Again - no argument here.

>>>The question: "what gets encapsulated in which module?" has a
>>>pre-cooked answer for one part (the storage strategy will be
>>>somehow taken care of by/within the DBMS) when working with
>>>a relational model. To me it is not clear that that is always
>>>the right answer.

>
> A lot of the products self-organize now -- cooked to order rather than
> pre-cooked, as it were :) Ingres has over a dozen indexing methods
> and can pick one on its own; DB2 re-arranges storage now; I do not
> know what Yukon is promising about this. Having a human do this kind
> of thing is expensive; nobody will tune a database 24/7 and they
> probably get some of it wrong simply because they cannot process more
> than 5 to 7 things at once. A computer can inspect the history, the
> statistics, and the actual data.
>
> As a simple example, what is the most common newbie mistake? Making
> everything in the world VARCHAR(255) or VARCHAR(50) or another
> oversized "magic default" value. When is the last time you looked at
> the actual lengths of *all* the CHAR(n) and VARCHAR(n) columns in your
> database to compare the max length used versus the length allocated?
> Probably never.
>
> When you have VARCHAR(255) for a column that ought to be, say
> CHAR(35)-- the size of an address line on a 3.5 inch mailing label --
> you will get oversized garbage in that column. You will fool the
> optimizer into making allowances for the possibility of a longer than
> needed string.
>
> I have seen a clean-up like that, done with a mailing list package and
> some other data scrubbing procedures, reduce the size of a database by
> almost 20%. More rows per datapage means better performance as well
> as better data integrity.
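
That length audit is worth spelling out. A minimal sketch, assuming
SQL Server and a hypothetical Customers table (the table and column
names are mine); declared lengths come from the catalog, actual usage
from the data itself:

    -- Declared lengths of all character columns in the table:
    SELECT COLUMN_NAME,
           CHARACTER_MAXIMUM_LENGTH AS declared_length
    FROM   INFORMATION_SCHEMA.COLUMNS
    WHERE  TABLE_NAME = 'Customers'
      AND  DATA_TYPE IN ('char', 'varchar');

    -- Length actually used, per column of interest:
    SELECT MAX(LEN(address_line)) AS max_used
    FROM   Customers;

A declared_length far above max_used is exactly the oversized "magic
default" you mean.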

More tuning. Important, yes.
It fits well with a demand-pull approach. My point is just: supply-push, in this case technology-push, works too. Not as well, not as accepted, but essential to innovation nevertheless.

Received on Sun Aug 15 2004 - 14:07:04 CEST
