Re: Help Data-Types

From: GoranG <no_at_spam.net>
Date: Tue, 20 Aug 2002 10:32:30 +0200
Message-ID: <abp3mu4asg9nvjc58cm89051qim37fe86m_at_4ax.com>


On 19 Aug 2002 16:10:00 -0700, mikharakiri_at_yahoo.com (Mikito Harakiri) wrote:

>GoranG <no_at_spam.net> wrote in message news:<pfn1mu8u6e6tph4gpcal282bsc6kcbpjfh_at_4ax.com>...

<cut>

>> >> What the heck would an IP address go under? Text? Integer (long, short?) ??
<cut>
>> >Four integer columns?
>>
>> My first idea as well. Seems that nothing is lost by normalizing and a
>> lot is gained...
>
>I'm not sure what you mean by normalization here. Higher normal forms
>(>=2) aren't concerned with domain splitting at all, while 1st normal
>form is about disallowing nested relations. AFAIK Database theory
>tells us very little if nothing at all about functions applied upon
>domains. None of the relational algebra operators allow any functions.
>On practical side, however, if we split ip address into four parts,
>then we want concatenating them (with dots, or not) in some view as a
>convenience for the end-user/application-developer.

I do mean 1st normal form - in a sense that this it might be considered nested relation (future introduction of fifth octet also suggests this...).
Strictly speaking:
To get to the second octet of IP address you need to specify relation's key + additional information => therefore not in 1NF regarding this property (now the only question is weather you can call 2nd octet a property or not...).

(I now that this is being picky, or maybe just plain wrong, but I would just like to get this right in my mind...)  

>> So, I am puzzled by amount of the answers that suggest contrary...
>>
>> Is it so because the IP address is almost all ways used as a whole (in
>> usual applications) and therefore should be thought of as whole in
>> semantic sense? Unless you are modeling a router ;)
>
>Again, since a simple view can aggregate a whole from the parts, there
>is no logical whole vs. part distinction at all. Clearly, the only
>distinction is a programming convenience of using the whole or parts.
>Probably, this is the only reason stopping us from the design like
>this
>
>table person (
> ...
> birthday_century integer,
> birthday_year_in_tens integer,
> birthday_year_in_ones integer,
> ...
>)

Ok. I think I am beggining to see the light. You are saying that this is only physical design implication - and that on the logical level you simply say birthday even if it will becom birthday_century, birthday_year_in_tens and birthday_year_in_ones (and that same applies to all domain considerations)?

And you will do so for every case where listed property (whole or parts) is not part of any relationship: so theoretically there might exist problem space which is best described with model in which birthday_year_in_tens is part of a relationship and therefore it makes sense to model it (or even this does not matter at logical level, only that the relationship exist?)

<cut>

>> Standard answer that I read to such questions is: It depends on the
>> semantic model which depends on the intention of usage.
>
>That "advice" contains zero information for me.

It could be my english. Also, If I get to it I'll lookup what I abviouslly misunderstood and misquoted and from where.  

<cut>
>
>> >Then, enforsing constraints and querieng is the
>> >simplest of all. Alternatively, it could be just a single integer:
>> >
>> >a3*256^3+a2*256^2+a1*256^1+a0
>>
>> My second idea as well. This seems more natural since it fits the
>> actual data type (it'll even support IPv6). Data representation in
>> xxx.xxx.xxx.xxx is considered formatting function (part of this
>> formatting is data extraction). However textual representation seem
>> cheaper (speed vs space costs), faster (in most cases, since it is
>> already formatted) and better supported (bitwise operations are less
>> supported compared to string functions - all IMHO)
>
>I can hardly see any speed vs. cost tradeoff here. First, the storage
>difference between varchar and integer is negligeable. With more
>options progressively avaliable in the database (column data
>compression ,for example) the difference ceases to zero. Second, one
>don't have to use bitwise operations: modulo and division are
>supported by virtully all database products. The CPU difference
>between biwise and ariphmetic operations is negligeable from optimizer
>perspective.

Good points. If I get some time I'll run some tests.

>> Final digressions - does the following statement hold water?:
>> For all proposed data types there exist semantics in which the
>> proposed data type is the better then all other.
>
>I don't think so. If you agree that different representations differ
>by complexity measure only, then it is well known that for finite data
>structures there is fundamentally no absolute complexity measure.

I don't get this... my mind just goes blank when I read this, i don't get how can something be well or not be well known based on the truthfullness of the proposition that I agree on something, i don't know how does infinity or finity of data structures relate to existance of absolute complexity measure... etc... big blank...

Thank you for your other explanations ;) I'll think more....

>On more practical note, there also is one more issue related to part
>vs. whole -- query optimization. It could be easily seen that if we
>split a column into many, then the optimizer might possibly have more
>complete statistics about the data distribution. Maybe, the data are
>skewed in such a wierd way that it can't be vizible from the whole --
>single column distribution. Therefore, splitting the data seems to be
>better performance approach. On the other hand, nothing prevents
>database engine from collecting statistics on pseudocolumns (and we
>already do so for functional indexes).

all the best...

( GoranG79 AT hotmail.com ) Received on Tue Aug 20 2002 - 10:32:30 CEST

Original text of this message