From oracle-l-bounce@freelists.org  Fri Jul 22 09:30:03 2005
Return-Path: <oracle-l-bounce@freelists.org>
Received: from air891.startdedicated.com (root@localhost)
 by orafaq.com (8.12.10/8.12.10) with ESMTP id j6MEU1iR009675
 for <oracle-l@orafaq.com>; Fri, 22 Jul 2005 09:30:01 -0500
X-ClientAddr: 206.53.239.180
Received: from turing.freelists.org (freelists-180.iquest.net [206.53.239.180])
 by air891.startdedicated.com (8.12.10/8.12.10) with ESMTP id j6METxIP009656
 for <oracle-l@orafaq.com>; Fri, 22 Jul 2005 09:29:59 -0500
Received: from localhost (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 7D7521D73E1;
 Fri, 22 Jul 2005 09:29:52 -0500 (EST)
Received: from turing.freelists.org ([127.0.0.1])
 by localhost (turing [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 11516-04; Fri, 22 Jul 2005 09:29:52 -0500 (EST)
Received: from turing (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id F2BE31D744C;
 Fri, 22 Jul 2005 09:29:51 -0500 (EST)
Content-class: urn:content-classes:message
Subject: RE: How To Find Reapeating Row Patterns
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by Ecartis
Date: Fri, 22 Jul 2005 10:27:59 -0400
Message-ID: <3C6C2B281FD3E74C9F7C9D5B1EDA45829BD4@wgexch01.wgenhq.net>
X-MimeOLE: Produced By Microsoft Exchange V6.5.6944.0
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: How To Find Reapeating Row Patterns
Thread-Index: AcWOVIKMV+cTAw5CTZyl8kM9KkM9OwAEWqiQABfFbYAFrom: "Anthony Molinaro" <amolinaro@wgen.net>
To: <Ethan.Post@ps.net>, "Dennis Williams" <oracledba.williams@gmail.com>
Cc: <oracle-l@freelists.org>
X-archive-position: 22817
X-ecartis-version: Ecartis v1.0.0
Sender: oracle-l-bounce@freelists.org
Errors-To: oracle-l-bounce@freelists.org
X-original-sender: amolinaro@wgen.net
Precedence: normal
Reply-To: amolinaro@wgen.net
X-list: oracle-l
From: oracle-l-bounce@freelists.org
X-Virus-Scanned: by amavisd-new-20030616-p9 (Debian) at avenirtech.net
X-mailscan-MailScanner-Information: Please contact the ISP for more information
X-mailscan-MailScanner: Found to be clean
X-MailScanner-From: oracle-l-bounce@freelists.org
X-Spam-Level: 
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on 
 air891.startdedicated.com
X-Spam-Status: No, hits=-4.7 required=5.0 tests=AWL,BAYES_00,NO_REAL_NAME 
 autolearn=no version=2.63

Ethan,
  I think the key, and it's been touched on already, is the
very definition of a pattern.  Consider the following result
set using the distinct values from your initial post (I trimmed
the output for readability):

SQL> select level lvl,
  2         '{ '||ltrim(sys_connect_by_path(id,','),',')||' }' sets
  3    from (
  4  select 'A' id from dual union all
  5  select 'B' id from dual union all
  6  select 'C' id from dual union all
  7  select 'D' id from dual union all
  8  select 'E' id from dual union all
  9  select 'F' id from dual
 10  )
 11  connect by prior id < id
 12  order by 1,2;

       LVL SETS
---------- ---------------
         1 { A }
         1 { B }
         1 { C }
         1 { D }
         1 { E }
         1 { F }
         2 { A,B }
         2 { A,C }
         ...
         2 { B,C }
         2 { B,D }
         ...
         2 { C,D }
         2 { C,E }
         2 { C,F }
         ...
         3 { A,B,C }
         ...
         3 { A,E,F }
         ...
         3 { B,E,F }
         3 { C,D,E }
         ...
         4 { A,B,C,D }
         ...
         4 { C,D,E,F }
         5 { A,B,C,D,E }
         ...
         5 { B,C,D,E,F }
         6 { A,B,C,D,E,F }


In the example above the definition of
a pattern is defined in the CONNECT BY
(which is why you see a,b and a,c, but
 not c,a and c,b).
Additionally, assuming you were in some way able to
define a pattern, you still need to decide
whether or not patterns are excluded due to set membership.
(I think you sorta mentioned this already, but it wasn't 
 100% clear to me).

For example,
is {a,b} not a pattern because it can be a subset
of {a,b,c} ?
if so, then it is implied {a,b,c} is not as pattern
because it can be found in {a,b,c,d} and so on...
so, then you end up counting only the occurrences 
of "complete" sets?

or do you simply want to count occurrences
or different sets, for example:

SQL> select level lvl,
  2         '{ '||ltrim(sys_connect_by_path(id,','),',')||' }' sets
  3    from (
  4  select 1 rn,'A' id from dual union all
  5  select 2,'B' id from dual union all
  6  select 3,'C' id from dual union all
  7  select 4,'D' id from dual union all
  8  select 5,'A' id from dual union all
  9  select 6,'B' id from dual
 10  )
 11  connect by prior id < id and prior rn = rn-1
 12  order by 1,2;

       LVL SETS
---------- ---------------
         1 { A }
         1 { A }
         1 { B }
         1 { B }
         1 { C }
         1 { D }
         2 { A,B }
         2 { A,B }
         2 { B,C }
         2 { C,D }
         3 { A,B,C }
         3 { B,C,D }
         4 { A,B,C,D }

13 rows selected.

Only set a,b occurs more than once..

Obviously these examples are simplified due to the fact
that there are no inverse sets. So, my whole point is, 
how are you defining a pattern? :)

 - a

-----Original Message-----
From: oracle-l-bounce@freelists.org
[mailto:oracle-l-bounce@freelists.org] On Behalf Of Post, Ethan
Sent: Thursday, July 21, 2005 10:45 PM
To: Dennis Williams
Cc: oracle-l@freelists.org
Subject: RE: How To Find Reapeating Row Patterns

Well I am pretty sure there must be plenty of programs out there that do
this type of thing. I was hoping one of the new analytical functions in
Oracle could handle this. If I was going to design a program like this I
would do something like...and I am literally thinking this up as I
type...

grab two values, look for # of occurrences with same two values using
lag...

if a lot of matches assume I am in some sort of repeating pattern so add
a value to the two and test three values...

keep repeating above until I stop getting matches

(throw this in also...even if I don't get a match on 3..4..5 go to the
next value and assume some sort of single value not included in sequence
and see if we can match on "ABC?E" etc...

-----Original Message-----
From: Dennis Williams [mailto:oracledba.williams@gmail.com] 
Sent: Thursday, July 21, 2005 7:29 PM
To: Post, Ethan
Cc: oracle-l@freelists.org
Subject: Re: How To Find Reapeating Row Patterns

Ethan,

I'm stumped. Do you know how to accomplish this in a regular
programming language like Pascal or Java? Is the sequence always the
same length, or is the length arbitrary? Unless someone knows of a
pattern recognition function in Oracle, my guess is that you will
probably write an external C function to do this and then call that
function from Oracle. It sounds pretty compute-intensive to me. It
sounds a little like what the Unix "diff" program does.

Dennis Williams

--
http://www.freelists.org/webpage/oracle-l
--
http://www.freelists.org/webpage/oracle-l

