Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> non-deterministic results by adding an output SELECT column...

non-deterministic results by adding an output SELECT column...

From: Giovanni Azua <bravegag_at_hotmail.com>
Date: Thu, 11 Sep 2003 13:18:33 +0200
Message-ID: <bjplpq$lo60o$1@ID-114658.news.uni-berlin.de>


Hello all,

Having the query bellow which involves an analytic function and Oracle TEXT, I am getting very strange non-deterministic results by just adding or removing a final output column, this neutral change to the query implies resulting in completely different values in another column (double type) as dynamically computed score. The goal I wanted to achieve is (without using kind of temporary table) select the string value within a group for which the score is maximum, or select the best matching name for the specific entity (that is the row having highest score within the group):

i.e. something that if were possible in SQL would look like DECODE(score, max(score), best_match, NULL)

Original query with the properly computed GOOD overall score:

select *

from (select /*+ INDEX (t_ofac_demo T_OFAC_DEMO_I_ENT_ID) */ ent_id
, max(family_name) as family_name
, max(given_name) as given_name
, max(score) as score
  from (select t1.score

, t1.family_name
, t1.given_name
, t1.ent_id
, first_value(best_match_aux) over (partition by ent_id order by score
desc) as best_match from (select t_ofac_demo.* ,

round((GREATEST(10*LEAST((GREATEST(LEAST(round(score(1)/(30*(1+log(10, 6131/9)))*100), 100)

           , CASE WHEN
1=ofac_package.token_occurs(TRIM(UPPER(family_name)),'ALBERTO') THEN 100 ELSE 0 END))/GREATEST(1,1,family_name_wc), 100)))/(DECODE(trim(family_name || alt_name_last), NULL, 0, 10) + DECODE(trim(given_name || alt_name_first), NULL, 0, 3) + DECODE(TRIM(address), NULL, 0, 1) + DECODE(TRIM(city), NULL, 0, 2) + DECODE(TRIM(country), NULL, 0, 1))) as score, decode(sign((10*LEAST((GREATEST(LEAST(round(score(1)/(30*(1+log(10, 6131/9)))*100), 100), CASE WHEN
1=ofac_package.token_occurs(TRIM(UPPER(family_name)),'ALBERTO') THEN 100 ELSE 0 END))/GREATEST(1,1,family_name_wc), 100))), 1, given_name || ' ' || family_name
, -1, alt_name_first || ' ' || alt_name_last, given_name || ' ' || family_name) as best_match_aux from t_ofac_demo where ((contains(family_name, 'fuzzy({Alberto},20,5000,weight) or {Alberto}', 1) > 0)) ) t1 ) t1
group by ent_id
order by score desc)

produces the following results:

ENT_ID FAMILY_NAME GIVEN_NAME SCORE
214 ARBEEL 32
4281 ARBOLEDA Julio 19
7179 ALBERDI URANGA Itziar 16
4280 ARBOLEDA ARROYAVE Pedro ... 9
557 COMPAGNIE ALGERO-LIBYENNE DE ... 4 Elapsed: 00:00:00.07

Execution Plan


   0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=9 Bytes=2070)    1 0 VIEW (Cost=8 Card=9 Bytes=2070)

   2    1     SORT (ORDER BY) (Cost=8 Card=9 Bytes=1980)
   3    2       SORT (GROUP BY) (Cost=8 Card=9 Bytes=1980)
   4    3         VIEW (Cost=4 Card=9 Bytes=1980)
   5    4           TABLE ACCESS (BY INDEX ROWID) OF 'T_OFAC_DEMO' (Cost=4
Card=9 Bytes=855)
   6    5             DOMAIN INDEX OF 'OFAC_FAMILY_NAME_INDX' (Cost=1)

Statistics


        283  recursive calls
          0  db block gets
        554  consistent gets
          0  physical reads
          0  redo size
        554  bytes sent via SQL*Net to client
        274  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          5  rows processed

*************************************************************

And just by adding the column max(best_match) as best_match into the top SELECT statement, completely changes the output scores:


CORRUPTED, BUGGY or WRONG overall score:

select *

from (select /*+ INDEX (t_ofac_demo T_OFAC_DEMO_I_ENT_ID) */ ent_id
, max(family_name) as family_name
, max(given_name) as given_name
, max(score) as score
, max(best_match) as best_match
  from (select t1.score

, t1.family_name
, t1.given_name
, t1.ent_id
, first_value(best_match_aux) over (partition by ent_id order by score
desc) as best_match from (select t_ofac_demo.* ,

round((GREATEST(10*LEAST((GREATEST(LEAST(round(score(1)/(30*(1+log(10, 6131/9)))*100), 100)

           , CASE WHEN
1=ofac_package.token_occurs(TRIM(UPPER(family_name)),'ALBERTO') THEN 100 ELSE 0 END))/GREATEST(1,1,family_name_wc), 100)))/(DECODE(trim(family_name || alt_name_last), NULL, 0, 10) + DECODE(trim(given_name || alt_name_first), NULL, 0, 3) + DECODE(TRIM(address), NULL, 0, 1) + DECODE(TRIM(city), NULL, 0, 2) + DECODE(TRIM(country), NULL, 0, 1))) as score, decode(sign((10*LEAST((GREATEST(LEAST(round(score(1)/(30*(1+log(10, 6131/9)))*100), 100), CASE WHEN
1=ofac_package.token_occurs(TRIM(UPPER(family_name)),'ALBERTO') THEN 100 ELSE 0 END))/GREATEST(1,1,family_name_wc), 100))), 1, given_name || ' ' || family_name
, -1, alt_name_first || ' ' || alt_name_last, given_name || ' ' || family_name) as best_match_aux from t_ofac_demo where ((contains(family_name, 'fuzzy({Alberto},20,5000,weight) or {Alberto}', 1) > 0)) ) t1 ) t1
group by ent_id
order by score desc)

ENT_ID FAMILY_NAME GIVEN_NAME SCORE BEST_MATCH

214 ARBEEL     4      ...
4281 ARBOLEDA  Julio  0      ...
7179 ALBERDI URANGA  Itziar  0      ...
4280 ARBOLEDA ARROYAVE Pedro ... 0      ...
557 COMPAGNIE ALGERO-LIBYENNE DE  ... 0      ...

Elapsed: 00:00:01.01

Execution Plan


   0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=9 Bytes=6588)    1 0 VIEW (Cost=8 Card=9 Bytes=6588)

   2    1     SORT (ORDER BY) (Cost=8 Card=9 Bytes=6498)
   3    2       SORT (GROUP BY) (Cost=8 Card=9 Bytes=6498)
   4    3         VIEW (Cost=6 Card=9 Bytes=6498)
   5    4           WINDOW (SORT) (Cost=6 Card=9 Bytes=855)
   6    5             TABLE ACCESS (BY INDEX ROWID) OF 'T_OFAC_DEMO' (Cost=4
Card=9 Bytes=855)
   7    6               DOMAIN INDEX OF 'OFAC_FAMILY_NAME_INDX' (Cost=1)

Statistics


        278  recursive calls
          0  db block gets
        386  consistent gets
          0  physical reads
          0  redo size
        728  bytes sent via SQL*Net to client
        275  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
          5  rows processed

I know the info is quite limited without having the complete test-case but perhaps by having this specific symptom (unexplainable for me) you could figure out what the problem is.

I have also tried without success:

select *

from (select /*+ INDEX (t_ofac_demo T_OFAC_DEMO_I_ENT_ID) */ ent_id
, max(family_name) as family_name
, max(given_name) as given_name
, max(score) as score
, max(CASE WHEN rnum=1 THEN best_match_aux ELSE NULL END) as best_match
  from (select t1.score

, t1.family_name
, t1.given_name
, t1.ent_id
, t1.best_match_aux
, row_number() over (partition by ent_id order by score desc) as rnum
from (select t_ofac_demo.* ,

round((GREATEST(10*LEAST((GREATEST(LEAST(round(score(1)/(30*(1+log(10, 6131/9)))*100), 100)

           , CASE WHEN
1=ofac_package.token_occurs(TRIM(UPPER(family_name)),'ALBERTO') THEN 100 ELSE 0 END))/GREATEST(1,1,family_name_wc), 100)))/(DECODE(trim(family_name || alt_name_last), NULL, 0, 10) + DECODE(trim(given_name || alt_name_first), NULL, 0, 3) + DECODE(TRIM(address), NULL, 0, 1) + DECODE(TRIM(city), NULL, 0, 2) + DECODE(TRIM(country), NULL, 0, 1))) as score, decode(sign((10*LEAST((GREATEST(LEAST(round(score(1)/(30*(1+log(10, 6131/9)))*100), 100), CASE WHEN
1=ofac_package.token_occurs(TRIM(UPPER(family_name)),'ALBERTO') THEN 100 ELSE 0 END))/GREATEST(1,1,family_name_wc), 100))), 1, given_name || ' ' || family_name
, -1, alt_name_first || ' ' || alt_name_last, given_name || ' ' || family_name) as best_match_aux from t_ofac_demo where ((contains(family_name, 'fuzzy({Alberto},20,5000,weight) or {Alberto}', 1) > 0)) ) t1 ) t1
group by ent_id
order by score desc)

This last also change the score values to those strange values completely unexplainable to me. It is like if the adding of the column forced an implicit
"where" clause and filtered some records.

Thanks in advance,
Best Regard,
Giovanni Received on Thu Sep 11 2003 - 06:18:33 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US