Re: Checking data consistency in the context of from-to-dimensions

From: Kevin Kirkpatrick <kvnkrkptrck_at_gmail.com>
Date: Wed, 1 Jul 2009 09:54:01 -0700 (PDT)
Message-ID: <8d5b14d0-7b9d-46eb-9c66-28e6e1a76728_at_x5g2000yqk.googlegroups.com>


On Jun 30, 3:36 pm, Hans Mayr <mayr1..._at_gmx.de> wrote:
> Hello,
>
> Thanks to everybody.
>
> For the recommendation to read Lorentzos, Date and Darwen's book:
> Another reader has sent me an email, strongly recommending to read
> Date's "Introduction to Database Systems" first. I will look at
> Snodgrass’ book, but from the title I assume that it does not cover
> exactly the multi-dimensional cases I described.
>
> Philipp: Sorry, but I think the problems I wrote about are more
> complex than the one you offer a solution for.
>
> To be honest: It was fun to think about my problem and come up with
> the solution I stated in my original message. However I do not have
> the time to dig into database theory though I'd love to do so. What I
> need now is a solution, best a hint if there is one "standard
> solution" for the type of problem I described. So maybe I posted to
> the wrong group.
>
> Thanks anyway.
>
> Best,
>
> Hans

Hi Hans - I may not have thought this through enough, but it seems the problem domain is such that the min/max values for the dimensions will usually be limitted to a set of discrete values (in your interestrates example, amount borders would be chosen from $1000 intervals under $20000 and months would be whole numbers under 36)

With that assumption, constraints can be handled as follows:

Let D1, D2, ... Dn represent discrete sets of allowable boundary values (in your interestrate example, D1={<d1=0>, <d1=1000>, ..., <d1=20000>} and D2 = {<d2=1>, <d2=2>, <d2=3>, ...<d2=36>} )

Let Dn_INT be the interval between discrete values in Dn (1000 for D1_INT in your example)

Let T = {key, min_d1, max_d1, min_d2, max_d2, ... min_dn, max dn}

Then you can check

SELECT D1.d1, D2.d2, ... Dn.dn , COUNT (T.key) AS matches FROM
D1 JOIN D2 JOIN ... DN
left outer join T
ON (

D1.d1 >= T.min_d1 AND D1.d1+D1_INT <= T.max_d1 AND
D2.d2 >= T.min_d2 AND D2.d2+D2_INT <= T.max_d2 AND
...
Dn.dn >= T.min_dn AND Dn.dn+Dn_INT <= T.max_dn
)
GROUP BY D1.d1, D2.d2, ... Dn.dn
HAVING COUNT (T.key) <> 1

This query will return all overlaps (matches > 1) and all gaps (matches=0).

Of course, this approach only works if "D1 JOIN D2 JOIN ... DN" isn't computationally infeasible due to large cardinalities of the underlying sets. Received on Wed Jul 01 2009 - 18:54:01 CEST

Original text of this message