Re: Identifying pairs as groups

From: Charlie Edwards <charlie3101_at_hotmail.com>
Date: 23 Mar 2004 04:10:05 -0800
Message-ID: <217ac5a8.0403230410.5ca3d15f@posting.google.com>

Daniel Morgan <damorgan_at_x.washington.edu> wrote in message news:<1079973490.55117_at_yasure>...
>
> Someone might be able to if you could clearly state the rules: Something
> you have not done. Examples are fine ... but rules stated in clear,
> descriptive, declarative sentences, are essential. I have no idea from
> what you've written what exactly defines a pair or a group. I could
> guess ... but that is not sufficient.

OK, point taken. I thought I was simplifying things, but obviously not (I bet you'd guess right, though :-)). I'll start at the beginning...

Some legacy data defines processes and the tasks that make up the procedures. So a process might be "Log Complaint" and the activities within it might be "Answer phone", "Enter customer details", "Log details", "Inform supervisor"

So the actual data of the child table looks like this:

PROCESS_CODE TASK_CODE TASK_COUNT
------------ --------- ----------

        4268       231          1
        4268       244          1
        4268       257          1
        4268       268          2
        4268       272          1
        4268       276          1
        4270       231          1
        4270       244          1
        4270       248          1
        4270       256          1
        4270       272          1
        4270       276          1
        4273       231          1
        4273       244          1
        4273       257          1
        4273       268          2
        4273       272          1
        4273       276          1
        4285       231          1
        4285       256          1
        4285       269          1
        4285       270          1
        4285       272          1

Now over a period of time, "new" proceeses have been created that have actually been duplicates of existing one. The defintion of a duplicate is if each process in a pair has identical tasks and task count. So in the above example process 4268 and 4273 are duplicates.

So in order to identify duplicates (which is what I've been asked to do), I wrote the following SQL to identify "pairs" of duplicates:

select x.process_code code1,

y.process_code code2
from
(select b.process_count,

       a.process_code,
       a.task_code,
       a.task_count
from   ag a,
       (select process_code,count(*) process_count
          from ag group by process_code) b

where a.process_code = b.process_code) x, (select b.process_count,

       a.process_code,
       a.task_code,
       a.task_count
from   ag a,
       (select process_code,count(*) process_count
          from ag group by process_code) b

where a.process_code = b.process_code) y where x.process_code <> y.process_code

and x.process_count = y.process_count
and x.task_code = y.task_code
and x.task_count = y.task_count
group by x.process_code,
         y.process_code

having count(*) = max(x.process_count)
/

This identifies each duplicate process and the process(es) that it is a duplicate of. So in the above example, I'd get

CODE1 CODE2
--------- ---------

     4268      4273
     4273      4268

However, I found that in many cases there were more than 2 processes that had identical tasks. So again, in the above example, if 4300 had the same tasks as 4268 and 4273, I'd get the following:

CODE1 CODE2
--------- ---------

     4268      4273
     4268      4300
     4273      4268
     4273      4300
     4300      4268
     4300      4273

So this brings me to my original problem. I've identified that 4268 is a duplicate of 4273 and that 4268 is a duplicate of 4300 (and also that 4273 is a duplicate of 4300, obviously). But what I want to do is say that 4268, 4273 and 4300 are all part of the same group of duplicates.

Of course maybe I'm barking up the wrong tree with my idea of identifying pairs, but any suggestions would be welcome.

Thanks

CE Received on Tue Mar 23 2004 - 06:10:05 CST