Re: differences
Date: 1 Feb 2002 09:23:05 -0800
Message-ID: <bdf69bdf.0202010923.504085a0_at_posting.google.com>
Clifford Heath <cjh_nospam_at_managesoft.com> wrote in message news:<3C59C7C7.6394E328_at_managesoft.com>...
> Mikito Harakiri wrote:
> > I doublechecked diff source file, it's undoubtedly an heuristical
> > comparison algorithm, so it can't be considered as a reference
> > implementation.
>
> Computation of (one of) the minimal sets of differences is an NP-complete
> problem. You won't be able to write it in SQL. I have seen some good
> papers on strategies for getting close without brute-force search though.
>
> I don't think you answered the question anyhow. The way I read your
> initial question I would have answered "use an outer join on one field
> and an inequality test on the other", but now with talk of diff I'm
> not sure that's what you want. Instead of examples, can't you provide
> a propositional description/definition of what you want?
OK, let's simplify the problem. Instead of string differences let's better query a minimal longest ascending subsequence of integers: http://www.tiac.net/users/cri/mlas.html. Indeed, a sequence <0,9,3,2,5,6,8,1,4,7> can be viewed as a permutation of <0,1,2,3,4,5,6,7,8,9>, and we want to find a closest match.
As a first step, let's transform a sequence into relation:
pos val
--- ---
0 0 1 9 2 3 3 2 4 5 5 6 6 8 7 1 8 4 9 7
The minimal longest ascending subsequence would be:
pos val
--- ---
0 0 1 2 2 4 3 7
If the technicality of adjucting the first column to ascending sequence of integers without gaps is bothering anybody, I can reduce the requirement and leave the original positions untouched:
pos val
--- ---
0 0 3 2 8 4 9 7
because only the "pos" order is significant.
According to http://www.tiac.net/users/cri/mlas.html, the problem complexity is O(N*logN). (I was not aware of the fact that NP complete problems are not expressible in SQL -- could you please provide a reference?).
SQL query, anybody? Received on Fri Feb 01 2002 - 18:23:05 CET