Re: A new proof of the superiority of set oriented approaches: numerical/time serie linear interpolation

From: Brian Selzer <brian_at_selzer-software.com>
Date: Mon, 30 Apr 2007 08:56:47 -0400
Message-ID: <kIlZh.4358$uJ6.2721_at_newssvr17.news.prodigy.net>


"Cimode" <cimode_at_hotmail.com> wrote in message news:1177922847.045131.119570_at_y80g2000hsf.googlegroups.com...

> On Apr 30, 9:19 am, "Brian Selzer" <b..._at_selzer-software.com> wrote:

>> "Cimode" <cim..._at_hotmail.com> wrote in message
>>
>> news:1177913751.812247.69630_at_n76g2000hsh.googlegroups.com...
>>
>>
>>
>> > On 29 avr, 23:01, "Brian Selzer" <b..._at_selzer-software.com> wrote:
>> >> "Cimode" <cim..._at_hotmail.com> wrote in message
>>
>> >>news:1177873628.360842.277700_at_u30g2000hsc.googlegroups.com...
>> >> [snip]
>>
>> >> > I can not say I disagree with what you are saying but I still think
>> >> > you are missing the point of the example.
>>
>> >> > As a practitioner who has used both procedural and set-based
>> >> > methods intensively, one may manage to get faster response times using
>> >> > cursors
>> >> > (that still needs to be established), but that does not mean that
>> >> > performance as a whole is improved.
>>
>> >> > Poor implementation of self-joins is a known limitation of direct
>> >> > image systems. That is the issue I was trying to underline here.
>> >> > Discussing tuning on a specific SQL DBMS implementation is not the
>> >> > point of the thread nor the point of the NG. The main point here
>> >> > is
>> >> > to see if linear interpolation could be a way of systematically
>> >> > handling missing numeric/datetime data...
>>
>> >> I think that self-joins would be problematic regardless of the
>> >> implementation. They are necessary only when dependencies exist
>> >> between
>> >> tuples within the same relation. It's like using a single relation
>> >> for
>> >> graphs instead of one relation for vertices and a separate relation
>> >> for
>> >> edges. It can be done, but should it?
>> > I do not understand: where do you see a self-join?
>>
>> >> On to your main point: Are you suggesting that the schema definition
>> >> for
>> >> a
>> >> temporal relation include some form of "active" default definition,
>> >> wherein
>> >> a scalar expression would be stored in lieu of a value? Or maybe not
>> >> stored, but evaluated whenever a missing value is accessed? Sounds
>> >> like
>> >> an
>> >> interesting idea. I think the semantics would require some form of
>> >> second-order logic, however.
>> > I just think that handling missing data through NULLs is the
>> > worst way of doing it. So I think interpolation is probably
>> > closer to what Codd had in mind when he formulated the requirement that a
>> > DBMS have a *systematic way of handling missing data* (or at least
>> > numeric/datetime data).
>>
>> NULLs can be eliminated by splitting a relation horizontally.
> From a logical perspective, I partly agree with that.
> But the chance that a system that does not separate the logical and physical
> layers (namely a direct image system) will implement such a method
> *systematically* is almost nonexistent.  That is why I brought up the idea
> of interpolation.
>

I'm not sure I understand you. Missing information is a problem independent of any implementation. Why would the implementation have any impact at all on null elimination? It's a design technique, and the system only does what it has been told: if the system has been told NOT NULL, then that is what it enforces.
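
To make the point concrete, here is a minimal sketch of a horizontal split, assuming SQLite through Python's standard sqlite3 module. The relation names reading_known and reading_missing, and the attributes t and v, are hypothetical and only for illustration. Tuples with a known value go in one relation, declared NOT NULL; tuples whose value is missing go in the other, which has no value attribute at all.

```python
import sqlite3


def build_split_schema():
    """Create two hypothetical relations that together replace one nullable one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE reading_known (
            t INTEGER PRIMARY KEY,
            v REAL NOT NULL          -- a value is always present here
        );
        CREATE TABLE reading_missing (
            t INTEGER PRIMARY KEY    -- the value is missing, so no v attribute
        );
    """)
    return conn


def null_is_rejected(conn):
    """Return True if the DBMS refuses a NULL in reading_known.v."""
    try:
        conn.execute(
            "INSERT INTO reading_known (t, v) VALUES (?, ?)", (3, None)
        )
    except sqlite3.IntegrityError:
        return True
    return False


conn = build_split_schema()
conn.execute("INSERT INTO reading_known (t, v) VALUES (1, 10.0)")
conn.execute("INSERT INTO reading_missing (t) VALUES (2)")
rejected = null_is_rejected(conn)
```

The system enforces only what the declaration says: the attempt to store a null in reading_known fails, regardless of how the engine lays the data out physically.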

>> The only
>> thing lost in the process is the sense of applicability, but that can be
>> overcome by creating a third relation with an applicability attribute and
>> either one or two foreign key constraints, depending on whether or not
>> the
>> attribute always applies. With nullable attributes there's no need for a
>> third relation, and sometimes not even a second. If an attribute always
>> applies (a functional dependency exists), but some of the values may be
>> absent, then a nullable attribute would be in order. If an attribute
>> sometimes applies, but when it does, a value is always present, then a
>> separate relation with a foreign key constraint would suffice. If an
>> attribute sometimes applies, but even when it does, a value may be
>> absent,
>> then a separate relation with a foreign key constraint along with a
>> nullable
>> attribute would be needed. In this way it can be determined from the
>> schema
>> definition whether an attribute applies always or sometimes, and when it
>> does apply, whether a value can be absent.
> You lost me on that.  RM is 2VL not 3VL.  I do not understand what you
> mean by nullable attributes.  Are you speaking from a logical
> perspective, or about a physical implementation?
>

I am speaking from a logical perspective. A nullable attribute is one that allows nulls. The RM includes the concept of null. This has been one of Date's pet peeves with Codd over the years. In December, 1986 Codd wrote an article in the SIGMOD Record, Vol. 15, No. 4 entitled "Missing Information (Applicable and Inapplicable) in Relational Databases." In it he talks about using A marks and I marks instead of null to make clear the main reason that a datum is missing.

I think that the same thing can be accomplished using a single null: if the attribute participates in a functional dependency, then it is clear that the attribute applies for every tuple in the relation. If it doesn't, then due to POFN, it should be in another relation anyway, and the presence of a tuple in the other relation should indicate that the attribute is applicable for a given referenced tuple. This means that a null should only ever indicate that an applicable value is missing, eliminating the need for I marks altogether. This should also simplify the systematic treatment of nulls.
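
A rough sketch of how that reading of the schema could work, again assuming SQLite through Python's sqlite3 module. The relations employee and employee_commission and the function commission_status are hypothetical names chosen for illustration. The birth_date attribute always applies but may be absent, so it is nullable in place; the rate attribute applies only when a referencing tuple exists, and a null there means "applicable but missing."

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        birth_date TEXT              -- always applies; the value may be absent
    );
    CREATE TABLE employee_commission (
        emp_id INTEGER PRIMARY KEY REFERENCES employee(emp_id),
        rate   REAL                  -- applies only if a row exists; may be absent
    );
    INSERT INTO employee VALUES (1, '1970-01-01'), (2, NULL), (3, NULL);
    INSERT INTO employee_commission VALUES (1, 0.05), (2, NULL);
""")


def commission_status(conn, emp_id):
    """Classify the rate attribute for one employee without any I mark."""
    row = conn.execute(
        "SELECT rate FROM employee_commission WHERE emp_id = ?", (emp_id,)
    ).fetchone()
    if row is None:
        return "inapplicable"             # no referencing tuple at all
    if row[0] is None:
        return "applicable, value missing"  # the single null plays the A-mark role
    return "applicable, value present"


statuses = {emp_id: commission_status(conn, emp_id) for emp_id in (1, 2, 3)}
```

Here the absence of a tuple in employee_commission carries the "inapplicable" meaning, so the null itself never has to.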

Received on Mon Apr 30 2007 - 14:56:47 CEST
