Re: differences

From: Mikito Harakiri <mikharakiri_at_yahoo.com>
Date: 1 Feb 2002 09:23:05 -0800
Message-ID: <bdf69bdf.0202010923.504085a0_at_posting.google.com>


Clifford Heath <cjh_nospam_at_managesoft.com> wrote in message news:<3C59C7C7.6394E328_at_managesoft.com>...
> Mikito Harakiri wrote:
> > I doublechecked diff source file, it's undoubtedly an heuristical
> > comparison algorithm, so it can't be considered as a reference
> > implementation.
>
> Computation of (one of) the minimal sets of differences is an NP-complete
> problem. You won't be able to write it in SQL. I have seen some good
> papers on strategies for getting close without brute-force search though.
>
> I don't think you answered the question anyhow. The way I read your
> initial question I would have answered "use an outer join on one field
> and an inequality test on the other", but now with talk of diff I'm
> not sure that's what you want. Instead of examples, can't you provide
> a propositional description/definition of what you want?

OK, let's simplify the problem. Instead of string differences let's better query a minimal longest ascending subsequence of integers: http://www.tiac.net/users/cri/mlas.html. Indeed, a sequence <0,9,3,2,5,6,8,1,4,7> can be viewed as a permutation of <0,1,2,3,4,5,6,7,8,9>, and we want to find a closest match.

As a first step, let's transform a sequence into relation:

pos val
--- ---

  0     0
  1     9
  2     3
  3     2
  4     5
  5     6
  6     8
  7     1
  8     4
  9     7

The minimal longest ascending subsequence would be:

pos val
--- ---

  0     0
  1     2
  2     4
  3     7

If the technicality of adjucting the first column to ascending sequence of integers without gaps is bothering anybody, I can reduce the requirement and leave the original positions untouched:

pos val
--- ---

  0     0
  3     2
  8     4
  9     7

because only the "pos" order is significant.

According to http://www.tiac.net/users/cri/mlas.html, the problem complexity is O(N*logN). (I was not aware of the fact that NP complete problems are not expressible in SQL -- could you please provide a reference?).

SQL query, anybody? Received on Fri Feb 01 2002 - 18:23:05 CET

Original text of this message