Home » Server Options » Text & interMedia » Space in Lexer (Oracle, 10g, Win XP)
Space in Lexer [message #362158] Mon, 01 December 2008 03:57 Go to next message
getritesh
Messages: 235
Registered: July 2006
Location: INDIA
Senior Member

Hello,

Check out the following scenario in which i have created Table, inserted rows in table and created a Domain index on one column

SQL> Create Table POLI(poli_no Varchar2(20));

Table created.

SQL> Insert Into POLI Values('ABC_009');

1 row created.

SQL> Insert Into POLI Values('B00010054207-001');

1 row created.

SQL> Insert Into POLI Values('B00010054207-003');

1 row created.

SQL> Insert Into POLI Values('CA 0005379-02-001');

1 row created.

SQL> Insert Into POLI Values('CA 0005444-03-002');

1 row created.

SQL> Commit;

Commit complete.

SQL> Select *
  2  From poli;

POLI_NO
--------------------
ABC_009
B00010054207-001
B00010054207-003
CA 0005379-02-001
CA 0005444-03-002

SQL> CREATE INDEX I_TXT_POL_NO ON POLI
  2  (POLI_NO)
  3  INDEXTYPE IS CTXSYS.CONTEXT
  4  PARAMETERS ('SYNC (ON COMMIT)');

Index created.

SQL> Select poli_no
  2  From POLI
  3  Where Contains(POLI_NO,'ABC_009') > 0;

no rows selected

SQL> Select poli_no
  2  From POLI
  3  Where Contains(POLI_NO,'CA 0005444-03-002') > 0;

no rows selected

SQL> exec ctx_ddl.create_preference('my_lexer','BASIC_LEXER');

PL/SQL procedure successfully completed.

SQL> exec ctx_ddl.set_attribute('my_lexer','printjoins','-_~!@#$%^&*(){}[],=?\;|><.');

PL/SQL procedure successfully completed.

SQL> exec CTX_DDL.SET_ATTRIBUTE ('my_lexer', 'WHITESPACE', ' ');

PL/SQL procedure successfully completed.

SQL> exec ctx_ddl.set_attribute('my_lexer','punctuations','.!?');

PL/SQL procedure successfully completed.

SQL> EXEC CTX_DDL.CREATE_STOPLIST ('my_stoplist', 'BASIC_STOPLIST');

PL/SQL procedure successfully completed.

SQL> Drop Index I_TXT_POL_NO;

Index dropped.

SQL> CREATE INDEX I_TXT_POL_NO ON POLI
  2  (POLI_NO)
  3  INDEXTYPE IS CTXSYS.CONTEXT
  4  PARAMETERS ('LEXER my_lexer STOPLIST my_stoplist SYNC (ON COMMIT)');

Index created.

SQL> Select poli_no
  2  From POLI
  3  Where Contains(POLI_NO,'ABC_009') > 0;

POLI_NO
--------------------
ABC_009

SQL> Select poli_no
  2  From POLI
  3  Where Contains(POLI_NO,'CA 0005444-03-002') > 0
  4  /

no rows selected


In poli_no 'CA 0005444-03-002' space after CA is causing the problem, please give me solution for this.How to add space in my_lexer
Thanks,
Ritesh

[Updated on: Mon, 01 December 2008 05:46]

Report message to a moderator

Re: Space in Lexer [message #362272 is a reply to message #362158] Mon, 01 December 2008 11:54 Go to previous messageGo to next message
Barbara Boehmer
Messages: 7997
Registered: November 2002
Location: California, USA
Senior Member
The problem has nothing to do with the space after CA. The problem is the hyphens that have special meaning in Oracle Text. In Oracle Text, a hyphen means minus. So, it subtracts the score returned by finding the value after the hyphen in the document from the score returned by finding the value before the hyphen, and if a document contains both values such score is likely <= 0. In order to have Oracle treat the hyphen like a regular character, you need to escape it, either by putting a backslash before each hyphen or by enclosing the entire token with curley brackets. Please see the demonstration below.

-- test environment you provided:
SCOTT@orcl_11g> Create Table POLI(poli_no Varchar2(20))
  2  /

Table created.

SCOTT@orcl_11g> BEGIN
  2    Insert Into POLI Values('ABC_009');
  3    Insert Into POLI Values('B00010054207-001');
  4    Insert Into POLI Values('B00010054207-003');
  5    Insert Into POLI Values('CA 0005379-02-001');
  6    Insert Into POLI Values('CA 0005444-03-002');
  7  END;
  8  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> BEGIN
  2    ctx_ddl.create_preference('my_lexer','BASIC_LEXER');
  3    ctx_ddl.set_attribute('my_lexer','printjoins','-_~!@#$%^&*(){}[],=?\;|><.');
  4    CTX_DDL.SET_ATTRIBUTE ('my_lexer', 'WHITESPACE', ' ');
  5    ctx_ddl.set_attribute('my_lexer','punctuations','.!?');
  6    CTX_DDL.CREATE_STOPLIST ('my_stoplist', 'BASIC_STOPLIST');
  7  END;
  8  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> CREATE INDEX I_TXT_POL_NO ON POLI (POLI_NO)
  2  INDEXTYPE IS CTXSYS.CONTEXT
  3  PARAMETERS
  4    ('LEXER	   my_lexer
  5  	 STOPLIST  my_stoplist
  6  	 SYNC	   (ON COMMIT)')
  7  /

Index created.


-- reproduction of problem:
SCOTT@orcl_11g> Select poli_no
  2  From   POLI
  3  Where  Contains(POLI_NO,'CA 0005444-03-002') > 0
  4  /

no rows selected


-- corrected query using backslash to escape individual hyphens:
SCOTT@orcl_11g> Select poli_no
  2  From   POLI
  3  Where  Contains(POLI_NO,'CA 0005444\-03\-002') > 0
  4  /

POLI_NO
--------------------
CA 0005444-03-002


-- corrected query using curley brackets around entire token:
SCOTT@orcl_11g> Select poli_no
  2  From   POLI
  3  Where  Contains(POLI_NO,'CA {0005444-03-002}') > 0
  4  /

POLI_NO
--------------------
CA 0005444-03-002


Re: Space in Lexer [message #362351 is a reply to message #362272] Tue, 02 December 2008 00:30 Go to previous messageGo to next message
getritesh
Messages: 235
Registered: July 2006
Location: INDIA
Senior Member

Thanks a lot

but i am using OUT parameter (prm_poli_no) of a procedure in Contains clause as shown

Where Contains(poli_no,[B]prm_poli_no[/B]||'__') > 0;


Then will i use '\' or '{}' to escape a special character like '_~!@#$%^&*(){}[],=?\;|><.' and space in my query.

SQL> Select POLI_NO
From POLI
Where Contains(poli_no,[B]prm_poli_no[/B]||'__') > 0;


Thanks,
Ritesh
Re: Space in Lexer [message #363029 is a reply to message #362351] Thu, 04 December 2008 16:43 Go to previous messageGo to next message
Barbara Boehmer
Messages: 7997
Registered: November 2002
Location: California, USA
Senior Member
You can use whichever you prefer. You can use replace or translate to put a backslash in front of each special character or you can concatenate brackets around the entire phrase. If you concatenate brackets around the entire phrase, then it only searches for that exact phrase, eliminating any special meaning or conditions within. It really depends on what you expect your users to enter and what options you need to provide them with.
Re: Space in Lexer [message #397003 is a reply to message #362158] Wed, 08 April 2009 10:34 Go to previous messageGo to next message
mayur316
Messages: 4
Registered: April 2009
Junior Member
Hi Barbara

I am facing almost similar problem and applied solution suggested by you but it's still not giving expected result(as per previously existing functionality of the application) for following test scenario in which wildcard(%) is placed in between search parameter.

I had one alternate solution ready using skipjoins but that would require buy in from client and seniors since i am new to oracle i don't know if such step is necessary or not.

Please suggest solution to following problem in which contains query returns only two rows while original like clause is returning five.

Thanks.

SQL>  create table test (text varchar2(50));

Table created.

SQL> insert into test values ('ABC 10001-11 1777');

1 row created.

SQL> insert into test values ('ABC 10001-11 1777');

1 row created.

SQL> insert into test values ('ABC-20002-22-2888');

1 row created.

SQL> insert into test values ('UMB0001968-01-001');

1 row created.

SQL> insert into test values ('ABC@10001-11#1777');

1 row created.

SQL> insert into test values ('ABC 10001 1779');

1 row created.

SQL> insert into test values ('ABC-20002-22-17772');

1 row created.

SQL> commit;

Commit complete.

SQL> BEGIN
  2     ctx_ddl.drop_preference('lextest');
  3     ctx_ddl.create_preference('lextest','BASIC_LEXER');
  4     ctx_ddl.set_attribute('lextest','printjoins','-_~!@#$^&*(){}[],=?\;|><.');
  5     CTX_DDL.SET_ATTRIBUTE ('lextest', 'WHITESPACE', ' ');
  6     ctx_ddl.set_attribute('lextest','punctuations','.!?');
  7     CTX_DDL.CREATE_STOPLIST ('test_stoplist', 'BASIC_STOPLIST');
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> create index test_text_idx on test (text)
  2  INDEXTYPE IS CTXSYS.CONTEXT
  3  parameters ( 'LEXER lextest WORDLIST max_wordlist sync (on commit)  STOPLIST test_stoplist' );

Index created.

SQL> 
SQL> select text
  2  from test
  3  where contains(text,'%'||'ABC%177'||'%') > 0;


SQL> 
SQL> select text
  2  from test
  3  where contains(text,'%'||'ABC%177'||'%') > 0;

TEXT
--------------------------------------------------
ABC@10001-11#1777
ABC-20002-22-17772

SQL> 
SQL> 
SQL> select text
  2  from test
  3  where text like '%'||'ABC%177'||'%'
  4  /

TEXT
--------------------------------------------------
ABC 10001-11 1777
ABC 10001-11 1777
ABC@10001-11#1777
ABC 10001 1779
ABC-20002-22-17772
Re: Space in Lexer [message #397013 is a reply to message #397003] Wed, 08 April 2009 11:13 Go to previous messageGo to next message
Barbara Boehmer
Messages: 7997
Registered: November 2002
Location: California, USA
Senior Member
Queries using CONTAINS are not the same as queries using LIKE. In the example that you provided, you got what you searched for. Context indexes separate the text in the column into tokens (words or groups of characters separated by spaces) and you can search for things within tokens and combinations of things within tokens. So, when you search for '%ABC%177%' you are searching for any row that contains any one set of characters between spaces that includes 'ABC' somewhere within that token, then '177' somewhere after that, within the same token. So, the two rows that you got are the only rows that meet that criteria. If you want rows that have a token with 'ABC' in it and the same or different token with '177' in it, then you need to search for '%ABC% AND %177%', as demonstrated below. Also, notice in the example below, that you can list the token_text values in the dr$...$i domain index table to see what values have been tokenized, indexed, and are searchable.

SCOTT@orcl_11g> create table test (text varchar2(50))
  2  /

Table created.

SCOTT@orcl_11g> BEGIN
  2    insert into test values ('ABC 10001-11 1777');
  3    insert into test values ('ABC 10001-11 1777');
  4    insert into test values ('ABC-20002-22-2888');
  5    insert into test values ('UMB0001968-01-001');
  6    insert into test values ('ABC@10001-11#1777');
  7    insert into test values ('ABC 10001 1779');
  8    insert into test values ('ABC-20002-22-17772');
  9  END;
 10  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> BEGIN
  2  	ctx_ddl.create_preference('lextest','BASIC_LEXER');
  3  	ctx_ddl.set_attribute('lextest','printjoins','-_~!@#$^&*(){}[],=?\;|><.');
  4  	CTX_DDL.SET_ATTRIBUTE ('lextest', 'WHITESPACE', ' ');
  5  	ctx_ddl.set_attribute('lextest','punctuations','.!?');
  6  	CTX_DDL.CREATE_STOPLIST ('test_stoplist', 'BASIC_STOPLIST');
  7  END;
  8  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> create index test_text_idx on test (text)
  2  INDEXTYPE IS CTXSYS.CONTEXT
  3  parameters ( 'LEXER lextest STOPLIST test_stoplist')
  4  /

Index created.

SCOTT@orcl_11g> -- These are the tokens that can be searched for:
SCOTT@orcl_11g> SELECT token_text FROM dr$test_text_idx$i
  2  /

TOKEN_TEXT
----------------------------------------------------------------
10001
10001-11
1777
1779
ABC
ABC-20002-22-17772
ABC-20002-22-2888
ABC@10001-11#1777
UMB0001968-01-001

9 rows selected.

SCOTT@orcl_11g> -- If you want to find these results:
SCOTT@orcl_11g> select text from test
  2  where text like '%'||'ABC%177'||'%'
  3  /

TEXT
--------------------------------------------------
ABC 10001-11 1777
ABC 10001-11 1777
ABC@10001-11#1777
ABC 10001 1779
ABC-20002-22-17772

5 rows selected.

SCOTT@orcl_11g> -- then you need to use a search string like this one
SCOTT@orcl_11g> select text from test
  2  where contains (text, '%ABC% AND %177%') > 0
  3  /

TEXT
--------------------------------------------------
ABC 10001-11 1777
ABC 10001-11 1777
ABC@10001-11#1777
ABC 10001 1779
ABC-20002-22-17772

5 rows selected.

SCOTT@orcl_11g>




Re: Space in Lexer [message #397019 is a reply to message #397003] Wed, 08 April 2009 11:34 Go to previous message
mayur316
Messages: 4
Registered: April 2009
Junior Member
Thanks a lot barbara
Problem is solved and cleared basic concepts too. Smile
Previous Topic: create index hangs
Next Topic: How to prevent SQL Injection in OracleText SQL statement
Goto Forum:
  


Current Time: Fri Oct 31 02:57:00 CDT 2014

Total time taken to generate the page: 0.09872 seconds