Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> non-deterministic results by adding an output SELECT column...
Hello all,
Having the query bellow which involves an analytic function and Oracle TEXT, I am getting very strange non-deterministic results by just adding or removing a final output column, this neutral change to the query implies resulting in completely different values in another column (double type) as dynamically computed score. The goal I wanted to achieve is (without using kind of temporary table) select the string value within a group for which the score is maximum, or select the best matching name for the specific entity (that is the row having highest score within the group):
i.e. something that if were possible in SQL would look like DECODE(score, max(score), best_match, NULL)
Original query with the properly computed GOOD overall score:
select *
from (select /*+ INDEX (t_ofac_demo T_OFAC_DEMO_I_ENT_ID) */ ent_id , max(family_name) as family_name , max(given_name) as given_name , max(score) as score from (select t1.score
, t1.family_name
, t1.given_name
, t1.ent_id
, first_value(best_match_aux) over (partition by ent_id order by score
desc) as best_match from (select t_ofac_demo.* ,
, CASE WHEN
1=ofac_package.token_occurs(TRIM(UPPER(family_name)),'ALBERTO') THEN 100
ELSE 0 END))/GREATEST(1,1,family_name_wc), 100)))/(DECODE(trim(family_name
|| alt_name_last), NULL, 0, 10) + DECODE(trim(given_name || alt_name_first),
NULL, 0, 3) + DECODE(TRIM(address), NULL, 0, 1) + DECODE(TRIM(city), NULL,
0, 2) + DECODE(TRIM(country), NULL, 0, 1))) as score,
decode(sign((10*LEAST((GREATEST(LEAST(round(score(1)/(30*(1+log(10,
6131/9)))*100), 100), CASE WHEN
1=ofac_package.token_occurs(TRIM(UPPER(family_name)),'ALBERTO') THEN 100
ELSE 0 END))/GREATEST(1,1,family_name_wc), 100))), 1, given_name || ' ' ||
family_name
, -1, alt_name_first || ' ' || alt_name_last, given_name || ' ' ||
family_name) as best_match_aux from t_ofac_demo where
((contains(family_name, 'fuzzy({Alberto},20,5000,weight) or {Alberto}', 1) >
0)) ) t1 ) t1
group by ent_id
order by score desc)
produces the following results:
ENT_ID FAMILY_NAME GIVEN_NAME SCORE
214 ARBEEL 32
4281 ARBOLEDA Julio 19
7179 ALBERDI URANGA Itziar 16
4280 ARBOLEDA ARROYAVE Pedro ... 9
557 COMPAGNIE ALGERO-LIBYENNE DE ... 4
Elapsed: 00:00:00.07
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=9 Bytes=2070) 1 0 VIEW (Cost=8 Card=9 Bytes=2070)
2 1 SORT (ORDER BY) (Cost=8 Card=9 Bytes=1980) 3 2 SORT (GROUP BY) (Cost=8 Card=9 Bytes=1980) 4 3 VIEW (Cost=4 Card=9 Bytes=1980) 5 4 TABLE ACCESS (BY INDEX ROWID) OF 'T_OFAC_DEMO' (Cost=4 Card=9 Bytes=855) 6 5 DOMAIN INDEX OF 'OFAC_FAMILY_NAME_INDX' (Cost=1)
Statistics
283 recursive calls 0 db block gets 554 consistent gets 0 physical reads 0 redo size 554 bytes sent via SQL*Net to client 274 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 2 sorts (memory) 0 sorts (disk) 5 rows processed *************************************************************
And just by adding the column max(best_match) as best_match into the top SELECT statement, completely changes the output scores:
CORRUPTED, BUGGY or WRONG overall score:
select *
from (select /*+ INDEX (t_ofac_demo T_OFAC_DEMO_I_ENT_ID) */ ent_id , max(family_name) as family_name , max(given_name) as given_name , max(score) as score , max(best_match) as best_match from (select t1.score
, t1.family_name
, t1.given_name
, t1.ent_id
, first_value(best_match_aux) over (partition by ent_id order by score
desc) as best_match from (select t_ofac_demo.* ,
, CASE WHEN
1=ofac_package.token_occurs(TRIM(UPPER(family_name)),'ALBERTO') THEN 100
ELSE 0 END))/GREATEST(1,1,family_name_wc), 100)))/(DECODE(trim(family_name
|| alt_name_last), NULL, 0, 10) + DECODE(trim(given_name || alt_name_first),
NULL, 0, 3) + DECODE(TRIM(address), NULL, 0, 1) + DECODE(TRIM(city), NULL,
0, 2) + DECODE(TRIM(country), NULL, 0, 1))) as score,
decode(sign((10*LEAST((GREATEST(LEAST(round(score(1)/(30*(1+log(10,
6131/9)))*100), 100), CASE WHEN
1=ofac_package.token_occurs(TRIM(UPPER(family_name)),'ALBERTO') THEN 100
ELSE 0 END))/GREATEST(1,1,family_name_wc), 100))), 1, given_name || ' ' ||
family_name
, -1, alt_name_first || ' ' || alt_name_last, given_name || ' ' ||
family_name) as best_match_aux from t_ofac_demo where
((contains(family_name, 'fuzzy({Alberto},20,5000,weight) or {Alberto}', 1) >
0)) ) t1 ) t1
group by ent_id
order by score desc)
ENT_ID FAMILY_NAME GIVEN_NAME SCORE BEST_MATCH
214 ARBEEL 4 ... 4281 ARBOLEDA Julio 0 ... 7179 ALBERDI URANGA Itziar 0 ... 4280 ARBOLEDA ARROYAVE Pedro ... 0 ... 557 COMPAGNIE ALGERO-LIBYENNE DE ... 0 ...
Elapsed: 00:00:01.01
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=9 Bytes=6588) 1 0 VIEW (Cost=8 Card=9 Bytes=6588)
2 1 SORT (ORDER BY) (Cost=8 Card=9 Bytes=6498) 3 2 SORT (GROUP BY) (Cost=8 Card=9 Bytes=6498) 4 3 VIEW (Cost=6 Card=9 Bytes=6498) 5 4 WINDOW (SORT) (Cost=6 Card=9 Bytes=855) 6 5 TABLE ACCESS (BY INDEX ROWID) OF 'T_OFAC_DEMO' (Cost=4 Card=9 Bytes=855) 7 6 DOMAIN INDEX OF 'OFAC_FAMILY_NAME_INDX' (Cost=1)
Statistics
278 recursive calls 0 db block gets 386 consistent gets 0 physical reads 0 redo size 728 bytes sent via SQL*Net to client 275 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 3 sorts (memory) 0 sorts (disk) 5 rows processed
I know the info is quite limited without having the complete test-case but perhaps by having this specific symptom (unexplainable for me) you could figure out what the problem is.
I have also tried without success:
select *
from (select /*+ INDEX (t_ofac_demo T_OFAC_DEMO_I_ENT_ID) */ ent_id , max(family_name) as family_name , max(given_name) as given_name , max(score) as score , max(CASE WHEN rnum=1 THEN best_match_aux ELSE NULL END) as best_match from (select t1.score
, t1.family_name
, t1.given_name
, t1.ent_id
, t1.best_match_aux
, row_number() over (partition by ent_id order by score desc) as rnum
from (select t_ofac_demo.* ,
, CASE WHEN
1=ofac_package.token_occurs(TRIM(UPPER(family_name)),'ALBERTO') THEN 100
ELSE 0 END))/GREATEST(1,1,family_name_wc), 100)))/(DECODE(trim(family_name
|| alt_name_last), NULL, 0, 10) + DECODE(trim(given_name || alt_name_first),
NULL, 0, 3) + DECODE(TRIM(address), NULL, 0, 1) + DECODE(TRIM(city), NULL,
0, 2) + DECODE(TRIM(country), NULL, 0, 1))) as score,
decode(sign((10*LEAST((GREATEST(LEAST(round(score(1)/(30*(1+log(10,
6131/9)))*100), 100), CASE WHEN
1=ofac_package.token_occurs(TRIM(UPPER(family_name)),'ALBERTO') THEN 100
ELSE 0 END))/GREATEST(1,1,family_name_wc), 100))), 1, given_name || ' ' ||
family_name
, -1, alt_name_first || ' ' || alt_name_last, given_name || ' ' ||
family_name) as best_match_aux from t_ofac_demo where
((contains(family_name, 'fuzzy({Alberto},20,5000,weight) or {Alberto}', 1) >
0)) ) t1 ) t1
group by ent_id
order by score desc)
This last also change the score values to those strange values completely
unexplainable to me. It is like if the adding of the column forced an
implicit
"where" clause and filtered some records.
Thanks in advance,
Best Regard,
Giovanni
Received on Thu Sep 11 2003 - 06:18:33 CDT