# Re: Checking data consistency in the context of from-to-dimensions

Date: Wed, 1 Jul 2009 09:54:01 -0700 (PDT)

Message-ID: <8d5b14d0-7b9d-46eb-9c66-28e6e1a76728_at_x5g2000yqk.googlegroups.com>

On Jun 30, 3:36 pm, Hans Mayr <mayr1..._at_gmx.de> wrote:

*> Hello,
**>
**> Thanks to everybody.
**>
**> For the recommendation to read Lorentzos, Date and Darwen's book:
**> Another reader has sent me an email, strongly recommending to read
**> Date's "Introduction to Database Systems" first. I will look at
**> Snodgrass’ book, but from the title I assume that it does not cover
**> exactly the multi-dimensional cases I described.
**>
**> Philipp: Sorry, but I think the problems I wrote about are more
**> complex than the one you offer a solution for.
**>
**> To be honest: It was fun to think about my problem and come up with
**> the solution I stated in my original message. However I do not have
**> the time to dig into database theory though I'd love to do so. What I
**> need now is a solution, best a hint if there is one "standard
**> solution" for the type of problem I described. So maybe I posted to
**> the wrong group.
**>
**> Thanks anyway.
**>
**> Best,
**>
**> Hans
*

Hi Hans - I may not have thought this through enough, but it seems the problem domain is such that the min/max values for the dimensions will usually be limitted to a set of discrete values (in your interestrates example, amount borders would be chosen from $1000 intervals under $20000 and months would be whole numbers under 36)

With that assumption, constraints can be handled as follows:

Let D1, D2, ... Dn represent discrete sets of allowable boundary values (in your interestrate example, D1={<d1=0>, <d1=1000>, ..., <d1=20000>} and D2 = {<d2=1>, <d2=2>, <d2=3>, ...<d2=36>} )

Let Dn_INT be the interval between discrete values in Dn (1000 for D1_INT in your example)

Let T = {key, min_d1, max_d1, min_d2, max_d2, ... min_dn, max dn}

Then you can check

SELECT D1.d1, D2.d2, ... Dn.dn , COUNT (T.key) AS matches
**FROM
**

**D1 JOIN D2 JOIN ... DN
**

left outer join T

ON (

D1.d1 >= T.min_d1 AND D1.d1+D1_INT <= T.max_d1 AND D2.d2 >= T.min_d2 AND D2.d2+D2_INT <= T.max_d2 AND ... Dn.dn >= T.min_dn AND Dn.dn+Dn_INT <= T.max_dn)

GROUP BY D1.d1, D2.d2, ... Dn.dn

HAVING COUNT (T.key) <> 1

This query will return all overlaps (matches > 1) and all gaps (matches=0).

Of course, this approach only works if "D1 JOIN D2 JOIN ... DN" isn't computationally infeasible due to large cardinalities of the underlying sets. Received on Wed Jul 01 2009 - 18:54:01 CEST