Home » Server Options » Text & interMedia » CTX_DOC.Snippet not recognized special characters like ',""
CTX_DOC.Snippet not recognized special characters like ',"" [message #382431] Thu, 22 January 2009 05:05 Go to next message
miharoscabv
Messages: 5
Registered: January 2009
Junior Member
I discovered an issue on CTX_Doc.Snippet. I have a query like this:

SELECT
CTX_DOC.SNIPPET('FTS_UA_TOPICSEARCH', TO_CHAR(UAS.FILEID), 'view'),
UAS.FILEPATH,
SCORE(1)
FROM UA_TOPICSEARCH UAS
WHERE CONTAINS(UAS.FILEPATH , 'view', 1)>0;

And I want to search in a file which content is:
"but not be able to click the link because the user doesn’t have access to view the data. If the user want to create"
and if I search for word "view" the CTX_DOC.Snippet returns "because the user doesn" because it did not recongnize the character "'" from doesn't.
If I look for a word like "user" it doesn't display anything.
Have anyone encountered this issue? Can you give me a hint about how to handle this?

Thank You!

Re: CTX_DOC.Snippet not recognized special characters like ',"" [message #382486 is a reply to message #382431] Thu, 22 January 2009 10:50 Go to previous messageGo to next message
Barbara Boehmer
Messages: 7973
Registered: November 2002
Location: California, USA
Senior Member
I am unable to reproduce the error. It might be due to a difference in versions or parameters used in index creation of character set. It might help if you provide a copy and paste of a complete run of a small test, as I have done below, that reproduces your problem.


SCOTT@orcl_11g> SELECT banner FROM v$version
  2  /

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE	11.1.0.6.0	Production
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

SCOTT@orcl_11g> CREATE TABLE ua_topicsearch
  2    (fileid	  NUMBER PRIMARY KEY,
  3  	filepath  VARCHAR2 (30))
  4  /

Table created.

SCOTT@orcl_11g> INSERT INTO ua_topicsearch VALUES
  2    (1, 'c:\oracle11g\test.dat')
  3  /

1 row created.

SCOTT@orcl_11g> CREATE INDEX fts_ua_topicsearch ON ua_topicsearch (filepath)
  2  INDEXTYPE IS CTXSYS.CONTEXT
  3  PARAMETERS ('DATASTORE CTXSYS.FILE_DATASTORE')
  4  /

Index created.

SCOTT@orcl_11g> SELECT CTX_DOC.SNIPPET
  2  	      ('FTS_UA_TOPICSEARCH',
  3  	       TO_CHAR (UAS.FILEID), 'view'),
  4  	    UAS.FILEPATH,
  5  	    SCORE(1)
  6  FROM   UA_TOPICSEARCH UAS
  7  WHERE  CONTAINS (UAS.FILEPATH, 'view', 1) > 0
  8  /

CTX_DOC.SNIPPET('FTS_UA_TOPICSEARCH',TO_CHAR(UAS.FILEID),'VIEW')
--------------------------------------------------------------------------------
FILEPATH                         SCORE(1)
------------------------------ ----------
because the user doesn’t have access to <b>view</b> the data. If the user want t
o create
c:\oracle11g\test.dat                   3


SCOTT@orcl_11g> SELECT CTX_DOC.SNIPPET
  2  	      ('FTS_UA_TOPICSEARCH',
  3  	       TO_CHAR (UAS.FILEID), 'user'),
  4  	    UAS.FILEPATH,
  5  	    SCORE(1)
  6  FROM   UA_TOPICSEARCH UAS
  7  WHERE  CONTAINS (UAS.FILEPATH, 'user', 1) > 0
  8  /

CTX_DOC.SNIPPET('FTS_UA_TOPICSEARCH',TO_CHAR(UAS.FILEID),'USER')
--------------------------------------------------------------------------------
FILEPATH                         SCORE(1)
------------------------------ ----------
click the link because the <b>user</b> doesn’t have access to view the data. If
the <b>user</b> want to create
c:\oracle11g\test.dat                   6


SCOTT@orcl_11g> 

Re: CTX_DOC.Snippet not recognized special characters like ',"" [message #382587 is a reply to message #382486] Fri, 23 January 2009 02:11 Go to previous messageGo to next message
miharoscabv
Messages: 5
Registered: January 2009
Junior Member
Thank you for your quick answer!
Indeed I was not so clear because I did not specify that the file that I used are html and is much bigger that I wrote and the think that make the difference is that the file must be saved using ANSI Encoding.
So,if I have 2 files one saved in a UTF-8 format and another one in as a ANSI format and the same scripts that you wrote:

SQL> CREATE TABLE ua_topicsearch
2 (fileid NUMBER PRIMARY KEY,
3 filepath VARCHAR2 (30))
4 /

Table created

SQL> INSERT INTO ua_topicsearch VALUES (2, 'c:\oracle11g\testUTF.dat');

1 row inserted

SQL> INSERT INTO ua_topicsearch VALUES (3, 'c:\oracle11g\testANSI.dat');

1 row inserted

SQL> CREATE INDEX fts_ua_topicsearch ON ua_topicsearch (filepath)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS ('DATASTORE CTXSYS.FILE_DATASTORE')
4 /

Index created

SQL> SELECT CTX_DOC.SNIPPET
2 ('FTS_UA_TOPICSEARCH',
3 TO_CHAR (UAS.FILEID), 'view'),
4 UAS.FILEPATH,
5 SCORE(1)
6 FROM UA_TOPICSEARCH UAS
7 WHERE CONTAINS (UAS.FILEPATH, 'view', 1) > 0
8 /

CTX_DOC.SNIPPET('FTS_UA_TOPICS FILEPATH SCORE(1)
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- ----------
because the user doesn’t have access to <b>view</b> the data. If the user want t c:\oracle11g\testUTF.dat 3
because the user doesn c:\oracle11g\testANSI.dat

It can see that for ANSI Encoding the Snippet doesn't display the character.
I must use these type of encoding because I have the same issue with offset characters described in "Problems with CTX_DOC.SNIPPET on HTML documents" topic but I can not use CLOB as column because I must saved in my table also: pdf and mp3 files.

Thank you once again!


Re: CTX_DOC.Snippet not recognized special characters like ',"" [message #382732 is a reply to message #382587] Fri, 23 January 2009 15:28 Go to previous messageGo to next message
Barbara Boehmer
Messages: 7973
Registered: November 2002
Location: California, USA
Senior Member
I have saved the file as ANSI, but am still unable to reproduce the problem in Oracle 11g. What version of Oracle are you using? Also, there may be differences in default preferences, so can you post the result of:

SELECT CTX_REPORT.CREATE_INDEX_SCRIPT ('fts_ua_topicsearch') FROM DUAL;

Re: CTX_DOC.Snippet not recognized special characters like ',"" [message #382880 is a reply to message #382732] Mon, 26 January 2009 01:00 Go to previous messageGo to next message
miharoscabv
Messages: 5
Registered: January 2009
Junior Member
I am using Oracle 11:
SQL> SELECT banner FROM v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
PL/SQL Release 11.1.0.7.0 - Production
CORE 11.1.0.7.0 Production

TNS for 32-bit Windows: Version 11.1.0.7.0 - Production
NLSRTL Version 11.1.0.7.0 - Production

and the preferences for my index are:

SELECT CTX_REPORT.CREATE_INDEX_SCRIPT ('fts_ua_topicsearch') FROM DUAL;

begin
ctx_ddl.create_preference('"FTS_UA_TOPICSEARCH_DST"','FILE_DATASTORE');
end;
/

begin
ctx_ddl.create_preference('"FTS_UA_TOPICSEARCH_FIL"','AUTO_FILTER');
end;
/

begin
ctx_ddl.create_section_group('"FTS_UA_TOPICSEARCH_SGP"','HTML_SECTION_GROUP');
end;
/

begin
ctx_ddl.create_preference('"FTS_UA_TOPICSEARCH_LEX"','BASIC_LEXER');
ctx_ddl.set_attribute('"FTS_UA_TOPICSEARCH_LEX"','MIXED_CASE','YES');
end;
/

begin
ctx_ddl.create_preference('"FTS_UA_TOPICSEARCH_WDL"','BASIC_WORDLIST');
ctx_ddl.set_attribute('"FTS_UA_TOPICSEARCH_WDL"','STEMMER','ENGLISH');
ctx_ddl.set_attribute('"FTS_UA_TOPICSEARCH_WDL"','FUZZY_MATCH','GENERIC');
end;
/

begin
ctx_ddl.create_stoplist('"FTS_UA_TOPICSEARCH_SPL"','BASIC_STOPLIST');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','Mr');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','Mrs');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','Ms');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','a');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','all');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','almost');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','also');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','although');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','an');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','and');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','any');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','are');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','as');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','at');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','be');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','because');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','been');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','both');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','but');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','by');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','can');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','could');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','d');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','did');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','do');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','does');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','either');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','for');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','from');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','had');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','has');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','have');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','having');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','he');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','her');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','here');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','hers');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','him');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','his');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','how');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','however');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','i');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','if');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','in');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','into');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','is');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','it');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','its');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','just');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','ll');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','me');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','might');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','my');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','no');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','non');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','nor');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','not');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','of');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','on');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','one');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','only');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','onto');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','or');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','our');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','ours');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','s');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','shall');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','she');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','should');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','since');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','so');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','some');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','still');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','such');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','t');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','than');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','that');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','the');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','their');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','them');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','then');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','there');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','therefore');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','these');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','they');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','this');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','those');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','though');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','through');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','thus');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','to');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','too');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','until');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','ve');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','very');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','was');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','we');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','were');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','what');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','when');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','where');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','whether');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','which');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','while');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','who');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','whose');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','why');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','will');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','with');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','would');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','yet');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','you');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','your');
ctx_ddl.add_stopword('"FTS_UA_TOPICSEARCH_SPL"','yours');
end;
/

begin
ctx_ddl.create_preference('"FTS_UA_TOPICSEARCH_STO"','BASIC_STORAGE');
ctx_ddl.set_attribute('"FTS_UA_TOPICSEARCH_STO"','I_TABLE_CLAUSE','TABLESPACE watindex');
ctx_ddl.set_attribute('"FTS_UA_TOPICSEARCH_STO"','K_TABLE_CLAUSE','TABLESPACE watindex');
ctx_ddl.set_attribute('"FTS_UA_TOPICSEARCH_STO"','R_TABLE_CLAUSE','TABLESPACE watindex lob(data) store as (cache)');
ctx_ddl.set_attribute('"FTS_UA_TOPICSEARCH_STO"','N_TABLE_CLAUSE','TABLESPACE watindex');
ctx_ddl.set_attribute('"FTS_UA_TOPICSEARCH_STO"','I_INDEX_CLAUSE','TABLESPACE watindex compress 2');
ctx_ddl.set_attribute('"FTS_UA_TOPICSEARCH_STO"','P_TABLE_CLAUSE','TABLESPACE watindex');
end;
/


begin
ctx_output.start_log('FTS_UA_TOPICSEARCH_LOG');
end;
/

create index "EVEREST"."FTS_UA_TOPICSEARCH"
on "EVEREST"."UA_TOPICSEARCH"
("FILEPATH")
indextype is ctxsys.context
parameters('
datastore "FTS_UA_TOPICSEARCH_DST"
filter "FTS_UA_TOPICSEARCH_FIL"
section group "FTS_UA_TOPICSEARCH_SGP"
lexer "FTS_UA_TOPICSEARCH_LEX"
wordlist "FTS_UA_TOPICSEARCH_WDL"
stoplist "FTS_UA_TOPICSEARCH_SPL"
storage "FTS_UA_TOPICSEARCH_STO"
sync (every "SYSDATE + 1/1440" memory 12582912)
')
/

begin
ctx_output.end_log;
end;
/

I don't know which of these could affect the snippet result?
Re: CTX_DOC.Snippet not recognized special characters like ',"" [message #382921 is a reply to message #382880] Mon, 26 January 2009 06:12 Go to previous message
Barbara Boehmer
Messages: 7973
Registered: November 2002
Location: California, USA
Senior Member
Sorry, but I am still unable to reproduce the problem. I suggest that you repost your question on the Oracle Text forum on OTN:

http://forums.oracle.com/forums/forum.jspa?forumID=71

where it will likely get the attention of product manager Roger Ford, who has the resources to test on different character sets and subsets of versions and may have some further insight as to the problem.

update: This discussion was continued on the OTN Text forum in the following thread:

http://forums.oracle.com/forums/thread.jspa?threadID=851392&tstart=0

[Updated on: Wed, 28 January 2009 15:54]

Report message to a moderator

Previous Topic: Problem with Catsearch and special characters in UTF8
Next Topic: CTX_Doc.Snippet and XPS
Goto Forum:
  


Current Time: Mon Sep 01 09:59:45 CDT 2014

Total time taken to generate the page: 0.09058 seconds