Re: eliminate duplicate in JOIN

From: Michael Austin <maustin_at_firstdbasource.com>
Date: Sun, 05 Apr 2009 20:19:04 -0500
Message-ID: <dMcCl.13592$hc1.156_at_flpi150.ffdc.sbc.com>



Carl Forsman wrote:
> The following query will return Triple word from a table like the
> following:
>
> pretty, woman, a
> a, pretty woman
> china,great, wall
> wall, great, china
> great, wall, china
>
> =============================
> How can I elimate rows that contain the same words?
> The output should be:
>
> pretty, woman, a
> wall, great, china
> -OR-
> a, pretty woman
> wall, great, china
> -OR-
> a, pretty woman
> great, wall, china
>
> ... etc
>
> =============================
> The output should not have 2 row containsexact 3 words
>
> For example, if I have 3 words ( pretty + woman + a )
> pretty, woman, a
>
> Then the output will not contain (a + pretty + woman)
> a, pretty woman
>
> =============================
> SELECT DISTINCT A.word As word1, B.word AS word2, C.word AS word3
> FROM Data AS A
> JOIN Data AS B
> ON A.document = B.document
> AND A.position < B.position
> AND ABS(A.position - B.position) < 3
> JOIN Data AS C
> ON A.document = C.document
> AND A.position < C.position
> AND B.position < C.position
> AND ABS(A.position - C.position) < 3
> ORDER BY word1;
>

Just in case you did not see this post in comp.databases.mysql

The question to the OP is really "what problem are you trying to solve?"

   To what is the position relative? is there also a DOCID in there somewhere... having a word and a position, it must be related somehow to the doc. In this case, again, you have really not provided sufficient information to actually address the real problem.

Again, because I do not have all of the facts, I must make some educated guesses based on the OP's request.

Your WORDS table ""appears"" to be:

docid,word,position.

If this is the case you still need to fix your table definition such that your PK=(docid,word) This way you can NEVER have the same words twice eliminating your need to come up with some convoluted method to ensure uniqueness. In other "words" your WORDS table can never have duplicate "words" for the same id. Received on Sun Apr 05 2009 - 20:19:04 CDT

Original text of this message