Home » SQL & PL/SQL » SQL & PL/SQL » grouping , based on similarity in percentage (merged 2) 8i-9
| grouping , based on similarity in percentage (merged 2) 8i-9 [message #419908] |
Wed, 26 August 2009 02:38  |
zoharo
Messages: 8 Registered: August 2009
|
Junior Member |
|
|
hi
Im having the following table :
username | privliges
-----------------------
user1 privilge1
user1 privilge2
user1 privilge3
user1 privilge1
user1 privilge3
user1 privilge5
etc ...
i would like to group privliges based on similarity in percentage to other user .
for example in the table : user1 and user2 has only 2 privilges in common (privilge1,privilge3) which are 66% . so if i search for similarities in priviges based on more then 50% it will group it like this :
user1 : privilge1 ,privilge3
user2 : privilge1 ,privilge3
Hope i was able to express myself well 
Thanks ,
Zohar
|
|
|
|
|
|
| Re: grouping , based on similarity in percentage [message #419915 is a reply to message #419912] |
Wed, 26 August 2009 03:16   |
zoharo
Messages: 8 Registered: August 2009
|
Junior Member |
|
|
Thanks for clarify .
create table users_permmisions
( username varchar2(20) ,
privilge varchar2(60)) ;
Insert into users_permmisions values ('mike','grant select on table1');
Insert into users_permmisions values ('mike','grant select on table2');
Insert into users_permmisions values ('mike','grant select on table3');
Insert into users_permmisions values ('rogger','grant select on table1');
Insert into users_permmisions values ('rogger','grant select on table3');
Insert into users_permmisions values ('rogger','grant select on table5');
i would like to group privliges based on similarity in percentage to other user .
for example in the table :
mike and rogers has only 2 privilges in common which are 66% . so if i search for similarities in priviges based on more then 50% it will group it like this :
the query in words should be somthing like :
return username and privilges if there more then 50% similar to other users .
mike
grant select on table1
grant select on table3
roger
grant select on table1
grant select on table3
|
|
|
|
| Re: grouping , based on similarity in percentage [message #419925 is a reply to message #419915] |
Wed, 26 August 2009 03:49   |
flyboy
Messages: 1903 Registered: November 2006
|
Senior Member |
|
|
| zoharo wrote on Wed, 26 August 2009 10:16 | mike and rogers has only 2 privilges in common which are 66% .
|
What is the exact rule for this? What will be the percentage, if following rows will be added? Insert into users_permmisions values ('rogger','grant select on table7');
Insert into users_permmisions values ('rogger','grant select on table9');
Again 66%? 40%? "Something" between?
Anyway, for getting same rows from two queries, use INTERSECT operator. SQL> select privilge from users_permmisions where username = 'mike'
2 intersect
3 select privilge from users_permmisions where username = 'rogger';
PRIVILGE
------------------------------------------------------------
grant select on table1
grant select on table3
2 rows selected.
SQL>
|
|
|
|
| Re: grouping , based on similarity in percentage [message #419931 is a reply to message #419925] |
Wed, 26 August 2009 04:14   |
zoharo
Messages: 8 Registered: August 2009
|
Junior Member |
|
|
Dear flyboy
flyboy wrote
| Quote: | What is the exact rule for this? What will be the percentage, if following rows will be added?
Insert into users_permmisions values ('rogger','grant select on table7');
Insert into users_permmisions values ('rogger','grant select on table9');
|
the 66% are just an example for the example table .
in real life this table has hundreds of users and thusends of privilges .
so basicly if i want to search for 50% of similarities in the user's privilges . i want for each user on the table to check against other users the that at least 50% of its privilges are the same as other users .
flyboy wrote
| Quote: | Anyway, for getting same rows from two queries, use INTERSECT operator.
SQL> select privilge from users_permmisions where username = 'mike'
2 intersect
3 select privilge from users_permmisions where username = 'rogger';
PRIVILGE
------------------------------------------------------------
grant select on table1
grant select on table3
2 rows selected.
SQL>
|
this will do for specific two users not dynamicly for all users(suppose i have 2000 users to check) .
Thanks
Zohar
|
|
|
|
| Re: grouping , based on similarity in percentage [message #419955 is a reply to message #419931] |
Wed, 26 August 2009 05:12   |
flyboy
Messages: 1903 Registered: November 2006
|
Senior Member |
|
|
| zoharo wrote on Wed, 26 August 2009 11:14 | the 66% are just an example for the example table .
in real life this table has hundreds of users and thusends of privilges .
|
In your example, both users have the same number of privileges (=3). My question was, what will be the base if the compared two users have different number of privileges.
For example, Mike has 3 privileges in total, Rogger has 5 privileges in total, 2 of them are the same. What will be the percentage in this specific case? Why?
| zoharo wrote on Wed, 26 August 2009 11:14 | this will do for specific two users not dynamicly for all users(suppose i have 2000 users to check) .
|
So, use it in the query over all combination of users. What is the problem with incorporating it?
|
|
|
|
|
|
| Re: grouping , based on similarity in percentage [message #419962 is a reply to message #419955] |
Wed, 26 August 2009 05:26   |
zoharo
Messages: 8 Registered: August 2009
|
Junior Member |
|
|
flyboy wrote
| Quote: | In your example, both users have the same number of privileges (=3). My question was, what will be the base if the compared two users have different number of privileges.
For example, Mike has 3 privileges in total, Rogger has 5 privileges in total, 2 of them are the same. What will be the percentage in this specific case? Why?
|
ok , i got it .
suppose mike has 3 and rogger has 5 and they both share 2 priviges and i want that they will be similar at at least 50% then the base should be the one with the more privliges (rogger)
and in this case i will not have results (which mean they are too much different )
|
|
|
|
|
|
| Re: grouping , based on similarity in percentage [message #419986 is a reply to message #419980] |
Wed, 26 August 2009 06:07   |
zoharo
Messages: 8 Registered: August 2009
|
Junior Member |
|
|
ok .
more complicated example :
username privilges
0 A
0 B
0 C
0 D
1 A
1 Z
1 F
1 E
1 G
2 A
2 B
2 C
2 D
3 A
4 U
4 L
now as we can see if i want to find similarity of 100%
the results will be :
0 A,b,C,D
2 A,B,C,D
now as we can see if i want to find similarity of at least 50 %
0 A,b,C,D
2 A,B,C,D
now as we can see if i want to find similarity of at least 20 %
0 A,B,C,D
1 A
2 A,B,C,D
3 A
|
|
|
|
|
|
| Re: grouping , based on similarity in percentage [message #420136 is a reply to message #420050] |
Thu, 27 August 2009 04:00   |
JRowbottom
Messages: 5933 Registered: June 2006 Location: Sunny North Yorkshire, ho...
|
Senior Member |
|
|
The problem is that you have been unable or unwilling to detail how we calculate the %age match between the sets of data.
Rather than join the chorus of people asking for more details before trying to help, I'll take a different approach - here is a query that will, for two usernames, show you the number of privileges owned by the first user and the number of these priviliges that are shared by the second user.
From this information, you should be able to work out how to calculate what you want:create table test_050 (username number, priv varchar2(10));
insert into test_050 values (0 ,'A');
insert into test_050 values (0 ,'B');
insert into test_050 values (0 ,'C');
insert into test_050 values (0 ,'D');
insert into test_050 values (1 ,'A');
insert into test_050 values (1 ,'Z');
insert into test_050 values (1 ,'F');
insert into test_050 values (1 ,'E');
insert into test_050 values (1 ,'G');
insert into test_050 values (2 ,'A');
insert into test_050 values (2 ,'B');
insert into test_050 values (2 ,'C');
insert into test_050 values (2 ,'D');
insert into test_050 values (3 ,'A');
insert into test_050 values (4 ,'U');
insert into test_050 values (4 ,'L');
commit;
select count(t1.priv), count(t2.priv)
from test_050 t1
,test_050 t2
where t1.username = 1
and t2.username(+) = 2
and t1.priv = t2.priv(+);
|
|
|
|
| Re: grouping , based on similarity in percentage [message #420343 is a reply to message #420050] |
Fri, 28 August 2009 08:12   |
flyboy
Messages: 1903 Registered: November 2006
|
Senior Member |
|
|
| zoharo wrote on Wed, 26 August 2009 20:49 | any idea ?
is it possible to create it with sql code alone ?
or there is a need to create application to do so .
|
Just out of interest, I tried to incorporate INTERSECT into SQL query, which returns all combinations of two users SQL> select
2 t1.username username1, t1.cnt privs1, t2.username username2, t2.cnt privs2
3 from
4 (select username, count(*) cnt from users_permmisions group by username) t1,
5 (select username, count(*) cnt from users_permmisions group by username) t2
6 where t1.username < t2.username
7 /
USERNAME1 PRIVS1 USERNAME2 PRIVS2
-------------------- ---------- -------------------- ----------
mike 3 rogger 5
1 row selected.
SQL> , but with no luck. The only way would be using a loop over this query with INTERSECT inside it.
Another approach would be counting it from all combinations of two users and their privileges using analytics:
SQL> select
2 username1, username2, privs_both, privs1, privs2,
3 privs_both / greatest( privs1, privs2 ) perc
4 from (
5 select
6 up1.username username1, up1.privilge privilge1,
7 up2.username username2, up2.privilge privilge2,
8 count(distinct up1.privilge) over(partition by up1.username) privs1,
9 count(distinct up2.privilge) over(partition by up2.username) privs2,
10 count(case when up1.privilge = up2.privilge then 1 end)
11 over(partition by up1.username, up2.username) privs_both
12 from users_permmisions up1, users_permmisions up2
13 where up1.username < up2.username )
14 group by username1, username2, privs_both, privs1, privs2
15 /
USERNAME1 USERNAME2 PRIVS_BOTH PRIVS1 PRIVS2 PERC
-------------------- -------------------- ---------- ---------- ---------- ----------
mike rogger 2 3 5 .4
1 row selected.
SQL> I wonder which approach is better for many users and/or privileges.
|
|
|
|
| Re: grouping , based on similarity in percentage [message #420442 is a reply to message #420136] |
Sat, 29 August 2009 13:10   |
scottwmackey
Messages: 515 Registered: March 2005
|
Senior Member |
|
|
(Ugh. I hate when I do that. Thinking complicated, miss the simple.)
Do I win?
SQL> SELECT v1.username
2 ,v2.priv_cnt
3 ,v1.matched_username
4 ,v1.match_cnt
5 FROM (SELECT a.username
6 ,b.username matched_username
7 ,COUNT(*) match_cnt
8 FROM test_050 a
9 JOIN test_050 b ON b.priv = a.priv
10 AND b.username != a.username
11 GROUP BY a.username, b.username) v1
12 JOIN (SELECT username
13 ,COUNT(*) priv_cnt
14 FROM test_050
15 GROUP BY username) v2 ON v2.username = v1.username
16 WHERE match_cnt / priv_cnt > .5
17 ORDER BY v1.username, v1.matched_username;
USERNAME PRIV_CNT MATCHED_USERNAME MATCH_CNT
---------- ---------- ---------------- ----------
0 4 2 4
2 4 0 4
3 1 0 1
3 1 1 1
3 1 2 1
How about a third try
SQL> SELECT *
2 FROM (SELECT a.username
3 ,a.priv_cnt
4 ,b.username matched_username
5 ,COUNT(*) match_cnt
6 FROM (SELECT t.*
7 ,COUNT(*) OVER(PARTITION BY t.username) priv_cnt
8 FROM test_050 t) a
9 JOIN test_050 b ON b.priv = a.priv
10 AND b.username != a.username
11 GROUP BY a.username, a.priv_cnt, b.username) v1
12 WHERE match_cnt / priv_cnt > .5
13 ORDER BY v1.username, v1.matched_username;
USERNAME PRIV_CNT MATCHED_USERNAME MATCH_CNT
---------- ---------- ---------------- ----------
0 4 2 4
2 4 0 4
3 1 0 1
3 1 1 1
3 1 2 1
[Updated on: Sat, 29 August 2009 13:57] Report message to a moderator
|
|
|
|
| Re: grouping , based on similarity in percentage [message #420460 is a reply to message #419986] |
Sun, 30 August 2009 00:47   |
 |
Michel Cadot
Messages: 68770 Registered: March 2007 Location: Saint-Maur, France, https...
|
Senior Member Account Moderator |
|
|
A first step:
drop table t purge;
create table t (username integer, priv varchar2(10));
insert into t values (0,'A');
insert into t values (0,'B');
insert into t values (0,'C');
insert into t values (0,'D');
insert into t values (1,'A');
insert into t values (1,'Z');
insert into t values (1,'F');
insert into t values (1,'E');
insert into t values (1,'G');
insert into t values (2,'A');
insert into t values (2,'B');
insert into t values (2,'C');
insert into t values (2,'D');
insert into t values (3,'A');
insert into t values (4,'U');
insert into t values (4,'L');
commit;
SQL> select * from t order by 1, 2;
USERNAME PRIV
---------- ----------
0 A
0 B
0 C
0 D
1 A
1 E
1 F
1 G
1 Z
2 A
2 B
2 C
2 D
3 A
4 L
4 U
16 rows selected.
SQL> def threshold=50
SQL> with
2 counts as (
3 select t1.username user1, t2.username user2,
4 greatest(count(distinct t1.priv),count(distinct t2.priv)) max_nb,
5 count(decode(t1.priv,t2.priv,1)) nb_common
6 from t t1, t t2
7 where t2.username > t1.username
8 group by t1.username, t2.username
9 )
10 select user1, user2, nb_common, max_nb,
11 round(100*nb_common/max_nb) pct
12 from counts
13 where 100*nb_common/max_nb > &threshold
14 order by 1, 2
15 /
USER1 USER2 NB_COMMON MAX_NB PCT
---------- ---------- ---------- ---------- ----------
0 2 4 4 100
1 row selected.
SQL> def threshold=20
SQL> /
USER1 USER2 NB_COMMON MAX_NB PCT
---------- ---------- ---------- ---------- ----------
0 2 4 4 100
0 3 1 4 25
2 3 1 4 25
3 rows selected.
Regards
Michel
[Updated on: Sun, 30 August 2009 00:48] Report message to a moderator
|
|
|
|
| Re: grouping , based on similarity in percentage [message #420474 is a reply to message #420460] |
Sun, 30 August 2009 08:26   |
 |
Michel Cadot
Messages: 68770 Registered: March 2007 Location: Saint-Maur, France, https...
|
Senior Member Account Moderator |
|
|
Second step:
SQL> def threshold=50
SQL> col common_privs format a20 heading "COMMON PRIVS AT &threshold.%"
SQL> with
2 counts as (
3 select t1.username user1, t2.username user2,
4 greatest(count(distinct t1.priv),count(distinct t2.priv)) max_nb,
5 count(decode(t1.priv,t2.priv,1)) nb_common,
6 wm_concat(decode(t1.priv,t2.priv,t1.priv)) common_privs
7 from t t1, t t2
8 where t2.username > t1.username
9 group by t1.username, t2.username
10 )
11 select user1, user2, common_privs
12 from counts
13 where 100*nb_common/max_nb > &threshold
14 order by 1, 2
15 /
USER1 USER2 COMMON PRIVS AT 50%
---------- ---------- --------------------
0 2 C,A,D,B
1 row selected.
SQL> def threshold=20
SQL> col common_privs format a20 heading "COMMON PRIVS AT &threshold.%"
SQL> /
USER1 USER2 COMMON PRIVS AT 20%
---------- ---------- --------------------
0 2 C,A,D,B
0 3 A
2 3 A
3 rows selected.
Regards
Michel
[Updated on: Sun, 30 August 2009 08:30] Report message to a moderator
|
|
|
|
| Re: grouping , based on similarity in percentage [message #420486 is a reply to message #420474] |
Sun, 30 August 2009 14:29   |
zoharo
Messages: 8 Registered: August 2009
|
Junior Member |
|
|
briliant .
Thank u all gurus for your efforts .
it seems that Michel's solution was the closest to my needs .
im trying it and will provide my full solution after it will be finished .
maybe it could help someone else .
and again thanks for everyone .
Zohar
|
|
|
|
| Re: grouping , based on similarity in percentage [message #420694 is a reply to message #420474] |
Tue, 01 September 2009 09:57   |
zoharo
Messages: 8 Registered: August 2009
|
Junior Member |
|
|
Hi
SQL> with
2 counts as (
3 select t1.username user1, t2.username user2,
4 greatest(count(distinct t1.priv),count(distinct t2.priv)) max_nb,
5 count(decode(t1.priv,t2.priv,1)) nb_common,
6 wm_concat(decode(t1.priv,t2.priv,t1.priv)) common_privs
7 from t t1, t t2
8 where t2.username > t1.username
9 group by t1.username, t2.username
10 )
11 select user1, user2, common_privs
12 from counts
13 where 100*nb_common/max_nb > &threshold
14 order by 1, 2
15 /
It seems that im meeting an oracle limitation .
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error: character string buffer too small
ORA-06512: at "WMSYS.WM_CONCAT_IMPL", line 30
the results from the first find nb_common=1739 (1739 shared privliges) .
Thanks
|
|
|
|
|
|
Goto Forum:
Current Time: Tue Nov 11 05:55:53 CST 2025
|