Re: Identifying pairs as groups

From: VC <boston103_at_hotmail.com>
Date: Tue, 23 Mar 2004 18:55:02 GMT
Message-ID: <2K%7c.75304$1p.1078765@attbi_s54>

Hello,

Please see blow:

Original Message ----- From: "Charlie Edwards" <charlie3101_at_hotmail.com> Newsgroups: comp.databases.oracle.server Sent: Tuesday, March 23, 2004 7:10 AM Subject: Re: Identifying pairs as groups

> Daniel Morgan <damorgan_at_x.washington.edu> wrote in message
news:<1079973490.55117_at_yasure>...
> >
> > Someone might be able to if you could clearly state the rules: Something
> > you have not done. Examples are fine ... but rules stated in clear,
> > descriptive, declarative sentences, are essential. I have no idea from
> > what you've written what exactly defines a pair or a group. I could
> > guess ... but that is not sufficient.
>
>
> OK, point taken. I thought I was simplifying things, but obviously
> not (I bet you'd guess right, though :-)). I'll start at the
> beginning...
>
> Some legacy data defines processes and the tasks that make up the
> procedures. So a process might be "Log Complaint" and the activities
> within it might be "Answer phone", "Enter customer details", "Log
> details", "Inform supervisor"
>
> So the actual data of the child table looks like this:
>
> PROCESS_CODE TASK_CODE TASK_COUNT
> ------------ --------- ----------
> 4268 231 1
> 4268 244 1
> 4268 257 1
> 4268 268 2
> 4268 272 1
> 4268 276 1
> 4270 231 1
> 4270 244 1
> 4270 248 1
> 4270 256 1
> 4270 272 1
> 4270 276 1
> 4273 231 1
> 4273 244 1
> 4273 257 1
> 4273 268 2
> 4273 272 1
> 4273 276 1
> 4285 231 1
> 4285 256 1
> 4285 269 1
> 4285 270 1
> 4285 272 1
>
> Now over a period of time, "new" proceeses have been created that have
> actually been duplicates of existing one. The defintion of a
> duplicate is if each process in a pair has identical tasks and task
> count. So in the above example process 4268 and 4273 are duplicates.
>
> So in order to identify duplicates (which is what I've been asked to
> do), I wrote the following SQL to identify "pairs" of duplicates:
>
> select x.process_code code1,
> y.process_code code2
> from
> (select b.process_count,
> a.process_code,
> a.task_code,
> a.task_count
> from ag a,
> (select process_code,count(*) process_count
> from ag group by process_code) b
> where a.process_code = b.process_code) x,
> (select b.process_count,
> a.process_code,
> a.task_code,
> a.task_count
> from ag a,
> (select process_code,count(*) process_count
> from ag group by process_code) b
> where a.process_code = b.process_code) y
> where x.process_code <> y.process_code
> and x.process_count = y.process_count
> and x.task_code = y.task_code
> and x.task_count = y.task_count
> group by x.process_code,
> y.process_code
> having count(*) = max(x.process_count)
> /
>
> This identifies each duplicate process and the process(es) that it is
> a duplicate of. So in the above example, I'd get
>
> CODE1 CODE2
> --------- ---------
> 4268 4273
> 4273 4268
>
> However, I found that in many cases there were more than 2 processes
> that had identical tasks. So again, in the above example, if 4300 had
> the same tasks as 4268 and 4273, I'd get the following:
>
> CODE1 CODE2
> --------- ---------
> 4268 4273
> 4268 4300
> 4273 4268
> 4273 4300
> 4300 4268
> 4300 4273
>
> So this brings me to my original problem. I've identified that 4268
> is a duplicate of 4273 and that 4268 is a duplicate of 4300 (and also
> that 4273 is a duplicate of 4300, obviously). But what I want to do
> is say that 4268, 4273 and 4300 are all part of the same group of
> duplicates.
>
> Of course maybe I'm barking up the wrong tree with my idea of
> identifying pairs, but any suggestions would be welcome.
>
> Thanks
>
> CE

In 9i, the most efficient and straight-forward way to do what you want would be through the user-defined aggregate (
http://download-west.oracle.com/docs/cd/B10501_01/appdev.920/a96595/dci11agg .htm#1004573 ). A nice example can be found at Mr. Kyte's website (http://asktom.oracle.com/pls/ask/f?p=4950:8:4350649838405935649::NO::F4950_ P8_DISPLAYID,F4950_P8_CRITERIA:15637744429336,)

Given:

create table t1(PROCESS_CODE int, TASK_CODE int, TASK_COUNT int)

insert into t1 values(4268 ,    231   ,     1);
insert into t1 values(4268 ,    244   ,     1);
insert into t1 values(4268 ,    257   ,     1);
insert into t1 values(4268 ,    268   ,     2);
insert into t1 values(4268 ,    272   ,     1);
insert into t1 values(4268 ,    276   ,     1);
insert into t1 values(4270 ,    231   ,     1);
insert into t1 values(4270 ,    244   ,     1);
insert into t1 values(4270 ,    248   ,     1);
insert into t1 values(4270 ,    256   ,     1);
insert into t1 values(4270 ,    272   ,     1);
insert into t1 values(4270 ,    276   ,     1);
insert into t1 values(4273 ,    231   ,     1);
insert into t1 values(4273 ,    244   ,     1);
insert into t1 values(4273 ,    257   ,     1);
insert into t1 values(4273 ,    268   ,     2);
insert into t1 values(4273 ,    272   ,     1);
insert into t1 values(4273 ,    276   ,     1);
insert into t1 values(4285 ,    231   ,     1);
insert into t1 values(4285 ,    256   ,     1);
insert into t1 values(4285 ,    269   ,     1);
insert into t1 values(4285 ,    270   ,     1);
insert into t1 values(4285 ,    272   ,     1);

Applying the aggregate function 'stragg' from the above site , one can do this:

select signature, stragg(process_code) codes from
(
select process_code, stragg(task_code||'.'||task_count) signature from t1
group by process_code
)
group by signature
having count(*) > 1

Result:

SIGNATURE                                       |   CODES

231.1,244.1,257.1,268.2,272.1,276.1 | 4268,4273

The same approach, although not so generic, can be used in Oracle 8i as well.

VC Received on Tue Mar 23 2004 - 12:55:02 CST