Home » SQL & PL/SQL » SQL & PL/SQL » grouping , based on similarity in percentage (merged 2) 8i-9
grouping , based on similarity in percentage (merged 2) 8i-9 [message #419908] Wed, 26 August 2009 02:38 Go to next message
zoharo
Messages: 8
Registered: August 2009
Junior Member
hi

Im having the following table :

username | privliges
-----------------------
user1 privilge1
user1 privilge2
user1 privilge3
user1 privilge1
user1 privilge3
user1 privilge5
etc ...
i would like to group privliges based on similarity in percentage to other user .
for example in the table : user1 and user2 has only 2 privilges in common (privilge1,privilge3) which are 66% . so if i search for similarities in priviges based on more then 50% it will group it like this :

user1 : privilge1 ,privilge3
user2 : privilge1 ,privilge3
Hope i was able to express myself well Smile

Thanks ,

Zohar
Re: grouping , based on similarity in percentage [message #419912 is a reply to message #419908] Wed, 26 August 2009 02:41 Go to previous messageGo to next message
Michel Cadot
Messages: 63801
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Post a working Test case: create table and insert statements along with the result you want with these data.

Before please read OraFAQ Forum Guide, especially "How to format your post?" section.
Make sure that lines of code do not exceed 80 characters when you format.
Indent the code (See SQL Formatter), use code tags and align the columns in result.
Use the "Preview Message" button to verify.

Regards
Michel
Re: grouping , based on similarity in percentage [message #419915 is a reply to message #419912] Wed, 26 August 2009 03:16 Go to previous messageGo to next message
zoharo
Messages: 8
Registered: August 2009
Junior Member
Thanks for clarify .


create table users_permmisions  
( username  varchar2(20) ,
  privilge  varchar2(60)) ;



Insert into  users_permmisions values ('mike','grant select on table1');
Insert into  users_permmisions values ('mike','grant select on table2'); 
Insert into  users_permmisions values ('mike','grant select on table3');
Insert into  users_permmisions values ('rogger','grant select on table1');
Insert into  users_permmisions values ('rogger','grant select on table3'); 
Insert into  users_permmisions values ('rogger','grant select on table5'); 


i would like to group privliges based on similarity in percentage to other user .

for example in the table :

mike and rogers has only 2 privilges in common which are 66% . so if i search for similarities in priviges based on more then 50% it will group it like this :

the query in words should be somthing like :

return username and privilges if there more then 50% similar to other users .

mike
grant select on table1
grant select on table3

roger
grant select on table1
grant select on table3
Re: grouping , based on similarity in percentage [message #419925 is a reply to message #419915] Wed, 26 August 2009 03:49 Go to previous messageGo to next message
flyboy
Messages: 1831
Registered: November 2006
Senior Member
zoharo wrote on Wed, 26 August 2009 10:16
mike and rogers has only 2 privilges in common which are 66% .

What is the exact rule for this? What will be the percentage, if following rows will be added?
Insert into  users_permmisions values ('rogger','grant select on table7'); 
Insert into  users_permmisions values ('rogger','grant select on table9');

Again 66%? 40%? "Something" between?

Anyway, for getting same rows from two queries, use INTERSECT operator.
SQL> select privilge from users_permmisions where username = 'mike'
  2  intersect
  3  select privilge from users_permmisions where username = 'rogger';

PRIVILGE
------------------------------------------------------------
grant select on table1
grant select on table3

2 rows selected.

SQL> 
Re: grouping , based on similarity in percentage [message #419931 is a reply to message #419925] Wed, 26 August 2009 04:14 Go to previous messageGo to next message
zoharo
Messages: 8
Registered: August 2009
Junior Member
Dear flyboy

flyboy wrote
Quote:
What is the exact rule for this? What will be the percentage, if following rows will be added?
Insert into users_permmisions values ('rogger','grant select on table7');
Insert into users_permmisions values ('rogger','grant select on table9');


the 66% are just an example for the example table .
in real life this table has hundreds of users and thusends of privilges .
so basicly if i want to search for 50% of similarities in the user's privilges . i want for each user on the table to check against other users the that at least 50% of its privilges are the same as other users .

flyboy wrote
Quote:
Anyway, for getting same rows from two queries, use INTERSECT operator.
SQL> select privilge from users_permmisions where username = 'mike'
2 intersect
3 select privilge from users_permmisions where username = 'rogger';

PRIVILGE
------------------------------------------------------------
grant select on table1
grant select on table3

2 rows selected.

SQL>


this will do for specific two users not dynamicly for all users(suppose i have 2000 users to check) .

Thanks
Zohar
Re: grouping , based on similarity in percentage [message #419955 is a reply to message #419931] Wed, 26 August 2009 05:12 Go to previous messageGo to next message
flyboy
Messages: 1831
Registered: November 2006
Senior Member
zoharo wrote on Wed, 26 August 2009 11:14
the 66% are just an example for the example table .
in real life this table has hundreds of users and thusends of privilges .

In your example, both users have the same number of privileges (=3). My question was, what will be the base if the compared two users have different number of privileges.
For example, Mike has 3 privileges in total, Rogger has 5 privileges in total, 2 of them are the same. What will be the percentage in this specific case? Why?
zoharo wrote on Wed, 26 August 2009 11:14
this will do for specific two users not dynamicly for all users(suppose i have 2000 users to check) .

So, use it in the query over all combination of users. What is the problem with incorporating it?
Re: grouping , based on similarity in percentage [message #419961 is a reply to message #419955] Wed, 26 August 2009 05:26 Go to previous messageGo to next message
Michel Cadot
Messages: 63801
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
In addition to flyboy's answer, post a more complex test case involving for instance 10 users (call them 0 to 9) and 26 different privileges (say A to Z to simplify) with the expected result and explaining this result.

Regards
Michel
Re: grouping , based on similarity in percentage [message #419962 is a reply to message #419955] Wed, 26 August 2009 05:26 Go to previous messageGo to next message
zoharo
Messages: 8
Registered: August 2009
Junior Member

flyboy wrote
Quote:
In your example, both users have the same number of privileges (=3). My question was, what will be the base if the compared two users have different number of privileges.
For example, Mike has 3 privileges in total, Rogger has 5 privileges in total, 2 of them are the same. What will be the percentage in this specific case? Why?



ok , i got it .

suppose mike has 3 and rogger has 5 and they both share 2 priviges and i want that they will be similar at at least 50% then the base should be the one with the more privliges (rogger)
and in this case i will not have results (which mean they are too much different )
Re: grouping , based on similarity in percentage [message #419980 is a reply to message #419962] Wed, 26 August 2009 05:53 Go to previous messageGo to next message
Michel Cadot
Messages: 63801
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Quote:
suppose mike has 3 and rogger has 5 and they both share 2 priviges and i want that they will be similar at at least 50% then the base should be the one with the more privliges

And so the necessity to have a more complex example that covers more cases and help you to precise your requirements and us to understand them.

Regards
Michel
Re: grouping , based on similarity in percentage [message #419986 is a reply to message #419980] Wed, 26 August 2009 06:07 Go to previous messageGo to next message
zoharo
Messages: 8
Registered: August 2009
Junior Member
ok .

more complicated example :

username  privilges 

0      A
0      B
0      C
0      D
1      A
1      Z
1      F
1      E
1      G
2      A
2      B
2      C
2      D
3      A
4      U
4      L



now as we can see if i want to find similarity of 100%
the results will be :
0 A,b,C,D
2 A,B,C,D

now as we can see if i want to find similarity of at least 50 %

0 A,b,C,D
2 A,B,C,D

now as we can see if i want to find similarity of at least 20 %

0 A,B,C,D
1 A
2 A,B,C,D
3 A

Re: grouping , based on similarity in percentage [message #420050 is a reply to message #419986] Wed, 26 August 2009 13:49 Go to previous messageGo to next message
zoharo
Messages: 8
Registered: August 2009
Junior Member
any idea ? Smile

is it possible to create it with sql code alone ?
or there is a need to create application to do so .

[Updated on: Wed, 26 August 2009 13:50]

Report message to a moderator

Re: grouping , based on similarity in percentage [message #420136 is a reply to message #420050] Thu, 27 August 2009 04:00 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
The problem is that you have been unable or unwilling to detail how we calculate the %age match between the sets of data.

Rather than join the chorus of people asking for more details before trying to help, I'll take a different approach - here is a query that will, for two usernames, show you the number of privileges owned by the first user and the number of these priviliges that are shared by the second user.

From this information, you should be able to work out how to calculate what you want:
create table test_050 (username number, priv varchar2(10));

insert into test_050 values (0      ,'A');
insert into test_050 values (0      ,'B');
insert into test_050 values (0      ,'C');
insert into test_050 values (0      ,'D');
insert into test_050 values (1      ,'A');
insert into test_050 values (1      ,'Z');
insert into test_050 values (1      ,'F');
insert into test_050 values (1      ,'E');
insert into test_050 values (1      ,'G');
insert into test_050 values (2      ,'A');
insert into test_050 values (2      ,'B');
insert into test_050 values (2      ,'C');
insert into test_050 values (2      ,'D');
insert into test_050 values (3      ,'A');
insert into test_050 values (4      ,'U');
insert into test_050 values (4      ,'L');

commit;

select count(t1.priv), count(t2.priv)
from   test_050 t1
      ,test_050 t2
where  t1.username = 1
and    t2.username(+) = 2
and    t1.priv = t2.priv(+);
Re: grouping , based on similarity in percentage [message #420343 is a reply to message #420050] Fri, 28 August 2009 08:12 Go to previous messageGo to next message
flyboy
Messages: 1831
Registered: November 2006
Senior Member
zoharo wrote on Wed, 26 August 2009 20:49
any idea ? Smile

is it possible to create it with sql code alone ?
or there is a need to create application to do so .

Just out of interest, I tried to incorporate INTERSECT into SQL query, which returns all combinations of two users
SQL> select
  2    t1.username username1, t1.cnt privs1, t2.username username2, t2.cnt privs2
  3  from
  4    (select username, count(*) cnt from users_permmisions group by username) t1,
  5    (select username, count(*) cnt from users_permmisions group by username) t2
  6  where t1.username < t2.username
  7  /

USERNAME1                PRIVS1 USERNAME2                PRIVS2
-------------------- ---------- -------------------- ----------
mike                          3 rogger                        5

1 row selected.

SQL> 
, but with no luck. The only way would be using a loop over this query with INTERSECT inside it.

Another approach would be counting it from all combinations of two users and their privileges using analytics:
SQL> select
  2    username1, username2, privs_both, privs1, privs2,
  3    privs_both / greatest( privs1, privs2 ) perc
  4  from (
  5    select
  6      up1.username username1, up1.privilge privilge1,
  7      up2.username username2, up2.privilge privilge2,
  8      count(distinct up1.privilge) over(partition by up1.username) privs1,
  9      count(distinct up2.privilge) over(partition by up2.username) privs2,
 10      count(case when up1.privilge = up2.privilge then 1 end)
 11        over(partition by up1.username, up2.username) privs_both
 12    from users_permmisions up1, users_permmisions up2
 13    where up1.username < up2.username )
 14  group by username1, username2, privs_both, privs1, privs2
 15  /

USERNAME1            USERNAME2            PRIVS_BOTH     PRIVS1     PRIVS2       PERC
-------------------- -------------------- ---------- ---------- ---------- ----------
mike                 rogger                        2          3          5         .4

1 row selected.

SQL> 
I wonder which approach is better for many users and/or privileges.
Re: grouping , based on similarity in percentage [message #420442 is a reply to message #420136] Sat, 29 August 2009 13:10 Go to previous messageGo to next message
scottwmackey
Messages: 505
Registered: March 2005
Senior Member
(Ugh. I hate when I do that. Thinking complicated, miss the simple.)

Do I win?
SQL> SELECT v1.username
  2        ,v2.priv_cnt
  3        ,v1.matched_username
  4        ,v1.match_cnt
  5  FROM (SELECT a.username
  6              ,b.username matched_username
  7              ,COUNT(*) match_cnt
  8        FROM test_050 a
  9        JOIN test_050 b ON b.priv = a.priv
 10                    AND b.username != a.username
 11        GROUP BY a.username, b.username) v1
 12  JOIN (SELECT username
 13              ,COUNT(*) priv_cnt
 14        FROM test_050
 15        GROUP BY username) v2 ON v2.username = v1.username
 16  WHERE match_cnt / priv_cnt > .5
 17  ORDER BY v1.username, v1.matched_username;
 
  USERNAME   PRIV_CNT MATCHED_USERNAME  MATCH_CNT
---------- ---------- ---------------- ----------
         0          4                2          4
         2          4                0          4
         3          1                0          1
         3          1                1          1
         3          1                2          1

How about a third try
SQL> SELECT *
  2  FROM (SELECT a.username
  3              ,a.priv_cnt
  4              ,b.username matched_username
  5              ,COUNT(*) match_cnt
  6        FROM (SELECT t.*
  7                    ,COUNT(*) OVER(PARTITION BY t.username) priv_cnt
  8              FROM test_050 t) a
  9        JOIN test_050 b ON b.priv = a.priv
 10                       AND b.username != a.username
 11        GROUP BY a.username, a.priv_cnt, b.username) v1
 12  WHERE match_cnt / priv_cnt > .5
 13  ORDER BY v1.username, v1.matched_username;
 
  USERNAME   PRIV_CNT MATCHED_USERNAME  MATCH_CNT
---------- ---------- ---------------- ----------
         0          4                2          4
         2          4                0          4
         3          1                0          1
         3          1                1          1
         3          1                2          1

[Updated on: Sat, 29 August 2009 13:57]

Report message to a moderator

Re: grouping , based on similarity in percentage [message #420460 is a reply to message #419986] Sun, 30 August 2009 00:47 Go to previous messageGo to next message
Michel Cadot
Messages: 63801
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
A first step:
drop table t purge;
create table t (username integer, priv varchar2(10));
insert into t values (0,'A');
insert into t values (0,'B');
insert into t values (0,'C');
insert into t values (0,'D');
insert into t values (1,'A');
insert into t values (1,'Z');
insert into t values (1,'F');
insert into t values (1,'E');
insert into t values (1,'G');
insert into t values (2,'A');
insert into t values (2,'B');
insert into t values (2,'C');
insert into t values (2,'D');
insert into t values (3,'A');
insert into t values (4,'U');
insert into t values (4,'L');
commit;

SQL> select * from t order by 1, 2;
  USERNAME PRIV
---------- ----------
         0 A
         0 B
         0 C
         0 D
         1 A
         1 E
         1 F
         1 G
         1 Z
         2 A
         2 B
         2 C
         2 D
         3 A
         4 L
         4 U

16 rows selected.

SQL> def threshold=50
SQL> with
  2    counts as (
  3      select t1.username user1, t2.username user2,
  4             greatest(count(distinct t1.priv),count(distinct t2.priv)) max_nb,
  5             count(decode(t1.priv,t2.priv,1)) nb_common
  6      from t t1, t t2
  7      where t2.username > t1.username
  8      group by t1.username, t2.username
  9    )
 10  select user1, user2, nb_common, max_nb, 
 11         round(100*nb_common/max_nb) pct
 12  from counts
 13  where 100*nb_common/max_nb > &threshold
 14  order by 1, 2
 15  /
     USER1      USER2  NB_COMMON     MAX_NB        PCT
---------- ---------- ---------- ---------- ----------
         0          2          4          4        100

1 row selected.

SQL> def threshold=20
SQL> /
     USER1      USER2  NB_COMMON     MAX_NB        PCT
---------- ---------- ---------- ---------- ----------
         0          2          4          4        100
         0          3          1          4         25
         2          3          1          4         25

3 rows selected.

Regards
Michel

[Updated on: Sun, 30 August 2009 00:48]

Report message to a moderator

Re: grouping , based on similarity in percentage [message #420474 is a reply to message #420460] Sun, 30 August 2009 08:26 Go to previous messageGo to next message
Michel Cadot
Messages: 63801
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Second step:
SQL> def threshold=50
SQL> col common_privs format a20 heading "COMMON PRIVS AT &threshold.%"
SQL> with
  2    counts as (
  3      select t1.username user1, t2.username user2,
  4             greatest(count(distinct t1.priv),count(distinct t2.priv)) max_nb,
  5             count(decode(t1.priv,t2.priv,1)) nb_common,
  6             wm_concat(decode(t1.priv,t2.priv,t1.priv)) common_privs
  7      from t t1, t t2
  8      where t2.username > t1.username
  9      group by t1.username, t2.username
 10    )
 11  select user1, user2, common_privs
 12  from counts
 13  where 100*nb_common/max_nb > &threshold
 14  order by 1, 2
 15  /
     USER1      USER2 COMMON PRIVS AT 50%
---------- ---------- --------------------
         0          2 C,A,D,B

1 row selected.

SQL> def threshold=20
SQL> col common_privs format a20 heading "COMMON PRIVS AT &threshold.%"
SQL> /
     USER1      USER2 COMMON PRIVS AT 20%
---------- ---------- --------------------
         0          2 C,A,D,B
         0          3 A
         2          3 A

3 rows selected.

Regards
Michel

[Updated on: Sun, 30 August 2009 08:30]

Report message to a moderator

Re: grouping , based on similarity in percentage [message #420486 is a reply to message #420474] Sun, 30 August 2009 14:29 Go to previous messageGo to next message
zoharo
Messages: 8
Registered: August 2009
Junior Member
briliant .

Thank u all gurus for your efforts .
it seems that Michel's solution was the closest to my needs .
im trying it and will provide my full solution after it will be finished .
maybe it could help someone else .

and again thanks for everyone .

Zohar
Re: grouping , based on similarity in percentage [message #420694 is a reply to message #420474] Tue, 01 September 2009 09:57 Go to previous messageGo to next message
zoharo
Messages: 8
Registered: August 2009
Junior Member
Hi

SQL> with
  2    counts as (
  3      select t1.username user1, t2.username user2,
  4             greatest(count(distinct t1.priv),count(distinct t2.priv)) max_nb,
  5             count(decode(t1.priv,t2.priv,1)) nb_common,
  6             wm_concat(decode(t1.priv,t2.priv,t1.priv)) common_privs
  7      from t t1, t t2
  8      where t2.username > t1.username
  9      group by t1.username, t2.username
 10    )
 11  select user1, user2, common_privs
 12  from counts
 13  where 100*nb_common/max_nb > &threshold
 14  order by 1, 2
 15  /



It seems that im meeting an oracle limitation .
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error: character string buffer too small
ORA-06512: at "WMSYS.WM_CONCAT_IMPL", line 30

the results from the first find nb_common=1739 (1739 shared privliges) .

Thanks
Re: grouping , based on similarity in percentage [message #420698 is a reply to message #420694] Tue, 01 September 2009 11:13 Go to previous message
Michel Cadot
Messages: 63801
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
I bet WM_CONCAT internal buffer is limited to 32K (if not 4K).
You have to search for STRAGG (on AskTom) and recreate it with CLOB result ...or make your values shorter.

By the way, who will read line longer than 4K?

Regards
Michel

[Updated on: Tue, 01 September 2009 11:15]

Report message to a moderator

Previous Topic: PERFORMANCE TUNING
Next Topic: Do we need to close Ref Cursors
Goto Forum:
  


Current Time: Sat Sep 24 23:09:09 CDT 2016

Total time taken to generate the page: 0.04908 seconds