Home » Server Options » Text & interMedia » Best approach for Multi Column text search (10.2)
Best approach for Multi Column text search [message #363322] Fri, 05 December 2008 16:50 Go to next message
rleishman
Messages: 3651
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I have a table with many address columns such as street number, street name, city, etc. I need to provide the ability for users to perform a text search as if the columns were all concatenated.

I understand from Barbara's magnificent examples in this forum that I can use a CONTEXT index with a MULTI_COLUMN_DATASTORE, but I read somewhere that CONTEXT indexes - unlike CTXCAT - are not self-maintaining when you update the base data. Is that right?

Alternatively, I could concatenate the columns myself into a new VARCHAR2 column during the ETL (this is a data warehouse) and CTXCAT index this one column.

At face value, the CTXCAT option would seem to be a bit wasteful as it duplicates the address data, but then I don't know what CONTEXT is doing behind the scenes.

QUESTION: For those who have worked with these things, how onerous is the overhead to maintain CONTEXT indexes? Is it worth the space overhead to have the self-maintaining CTXCAT index?

Ross Leishman
Re: Best approach for Multi Column text search [message #363337 is a reply to message #363322] Fri, 05 December 2008 22:26 Go to previous messageGo to next message
Barbara Boehmer
Messages: 7860
Registered: November 2002
Location: California, USA
Senior Member
What you should use is a multi_column_datastore with a context index. You can create it in such a manner that it will allow searches of all fields or individual columns or combinations with one index access. The only other alternative that you might consider would be a user_datastore with a procedure if you want to do further customization, such as creating groups of columns that can be searched. When you create a multi_column_datastore, it is actually doing what creating a user_datastore with a procedure does behind the scenes.

When you create or alter the context index, you can specify in the parameters how often you want synchronization and optimization similar to how you would specify the same things for a materialized view. You can specify on commit or specified intervals for synchronization and fast or full for optimization. You can also do these things manually and you can also rebuild or drop and recreate. Synchronization makes the inserted, updated, or deleted changes immediately available to searches, but does so by adding individual rows to the dr$...$... domain index tables, which causes gradually increasing fragmentation. Optimization eliminates the fragmentation and combines the data into fewer rows by token. Optimization can be done online so that it does not interfere with searches.

You do not want a ctxcat index. Ctxcat indexes are intended for use when you have only one unstructured (text) column and perhaps some associated structured data and it offers limited search capabilities. You cannot use all of the search methods using a catsearch query on a ctxcat index as you can using a contains query on a context index, although you can do a workaround by using a query template to allow the context grammar. There is also one very important thing, that I consider a bug, although it is documented that this can happen. If the optimizer chooses to use functional invocation on a catsearch query, which usually happens if the structured portion of the query is more limiting than the unstructured (text) portion, then all you get is an error message, saying that functional invocation is not supported on ctxcat indexes. No results, just an error. I would consider this a sufficient reason not to use any ctxcat index on an application until this is fixed in some future version. Yes, it is fast when it works and you don't have to specify synchronization and optimization when you create the index, but it may not work at all at any given time, and sometimes all the hints and materialized subqueries that you provide cannot influence the optimizer not to use functional invocation. This problem does not exist with context indexes, since they support functional invocation.

Please let me know if you need further clarification or examples and feel free to post any code that you are working on.

Re: Best approach for Multi Column text search [message #363338 is a reply to message #363337] Fri, 05 December 2008 22:36 Go to previous messageGo to next message
rleishman
Messages: 3651
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Great response Barbara. Thanks. Looks as though the way forward is clear.

Ross Leishman
Re: Best approach for Multi Column text search [message #438472 is a reply to message #363338] Sat, 09 January 2010 11:29 Go to previous messageGo to next message
ora22
Messages: 8
Registered: December 2009
Location: BOSTON
Junior Member
<quote>
You can create it in such a manner that it will allow searches of all fields or individual columns or combinations with one index access
</quote>

Hi Barbara

I am trying to figure out how to do the combinations bit with multi_column_datastore.

As an example - i have the following

---------------------------------
-- CREATE MULTI-COLUMN-DATASTORE
---------------------------------
connect ctxsys/<password>
grant execute on ctxsys.ctx_ddl to myschema;
-- only ctxsys can create MULTI_COLUMN preferences in Oracle 10g
begin
  ctx_ddl.drop_preference ( 'gcp_multi_column_datastore' );
end;
/
begin
  ctx_ddl.create_preference ( 'gcp_multi_column_datastore', 'multi_column_datastore' );
  ctx_ddl.set_attribute ( 'gcp_multi_column_datastore', 'columns', 'eng_name, cli_short_name, entity_duns_name, ult_par_duns_name' );
end;

connect myschema/<password>
-------------------
-- CREATE TABLE
------------------
drop table test_tab1;
create table test_tab1
(
  eng_name                 varchar2(100 byte),
  cli_short_name                    varchar2(200 byte),
  entity_duns_name              varchar2(100 byte),
  ult_par_duns_name             varchar2(100 byte),
  concat                char(1) /* placeholder for user datastore */    
);

insert into test_tab1 values ('pepsi','test1','test1','test1','');
insert into test_tab1 values ('test2','pepsi','test2','test2','');
insert into test_tab1 values ('test3','test3','exxon','test3','');
insert into test_tab1 values ('test4','test4','test4','exxon','');

commit;

-----------------------
-- CREATE SECTION GROUP
-----------------------
begin
  ctx_ddl.drop_section_group
    (
       group_name => 'test1_section_group'
    );
end;

begin
  ctx_ddl.create_section_group
    (
       group_name => 'test1_section_group',
       group_type => 'basic_section_group'
    );
  ctx_ddl.add_field_section
    (
      group_name   => 'test1_section_group',
      section_name => 'the_eng_name',
      tag          => 'eng_name',
      visible      => true
    );
  ctx_ddl.add_field_section
    (
      group_name   => 'test1_section_group',
      section_name => 'the_cli_short_name',
      tag          => 'cli_short_name',
      visible      => true
    );
  ctx_ddl.add_field_section
    (
      group_name   => 'test1_section_group',
      section_name => 'the_entity_duns_name',
      tag          => 'entity_duns_name',
      visible      => true
    );
  ctx_ddl.add_field_section
    (
      group_name   => 'test1_section_group',
      section_name => 'the_ult_par_duns_name',
      tag          => 'ult_par_duns_name',
      visible      => true
    );
end;

/

---------------
-- CREATE INDEX
---------------
create index test_tab1_concat on test_tab1 ( concat )
  indextype is ctxsys.context
  parameters ( '
    datastore      ctxsys.gcp_multi_column_datastore
    section group  test1_section_group           ' );


-------------------
-- SEARCH EXAMPLES
------------------
-- Look Anywhere - works great

select eng_name,cli_short_name,entity_duns_name,ult_par_duns_name 
from test_tab1 where contains ( concat,'exxon or pepsi',1 ) > 0

-- Look In eng_name Only - Works Great

select eng_name,cli_short_name,entity_duns_name,ult_par_duns_name 
from test_tab1 where contains ( concat,'(exxon or pepsi) within the_eng_name',1 ) > 0


My Question is - How do i do a search that looks in section the_eng_name & the_cli_short_name but not the others
in one index access

I tried

select eng_name,cli_short_name,entity_duns_name,ult_par_duns_name 
from test_tab1 where contains ( concat,'(exxon or pepsi) within the_eng_name,the_cli_short_name',1 ) > 0

or

select eng_name,cli_short_name,entity_duns_name,ult_par_duns_name 
from test_tab1 where contains ( concat,'(exxon or pepsi) within (the_eng_name or the_cli_short_name)',1 ) > 0


etc but none of those work

World appreciate your response.
Re: Best approach for Multi Column text search [message #438485 is a reply to message #438472] Sat, 09 January 2010 15:44 Go to previous messageGo to next message
Barbara Boehmer
Messages: 7860
Registered: November 2002
Location: California, USA
Senior Member
SCOTT@orcl_11g> COLUMN eng_name 	 FORMAT A8
SCOTT@orcl_11g> COLUMN cli_short_name	 FORMAT A14
SCOTT@orcl_11g> COLUMN entity_duns_name  FORMAT A16
SCOTT@orcl_11g> COLUMN ult_par_duns_name FORMAT A17
SCOTT@orcl_11g> SET    AUTOTRACE ON EXPLAIN
SCOTT@orcl_11g> SELECT eng_name, cli_short_name, entity_duns_name, ult_par_duns_name FROM test_tab1
  2  WHERE  CONTAINS
  3  	      (concat,
  4  	      '(exxon OR pepsi) WITHIN the_eng_name
  5  		OR
  6  	       (exxon OR pepsi) WITHIN the_cli_short_name',
  7  	      1) > 0
  8  /

ENG_NAME CLI_SHORT_NAME ENTITY_DUNS_NAME ULT_PAR_DUNS_NAME
-------- -------------- ---------------- -----------------
pepsi    test1          test1            test1
test2    pepsi          test2            test2


Execution Plan
----------------------------------------------------------
Plan hash value: 223341857

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |     1 |   273 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST_TAB1        |     1 |   273 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | TEST_TAB1_CONCAT |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("CONCAT",'(exxon OR pepsi) WITHIN the_eng_name
                  OR           (exxon OR pepsi) WITHIN the_cli_short_name',1)>0)

Note
-----
   - dynamic sampling used for this statement

SCOTT@orcl_11g> SET AUTOTRACE OFF
SCOTT@orcl_11g> 

Re: Best approach for Multi Column text search [message #438504 is a reply to message #438485] Sun, 10 January 2010 00:21 Go to previous message
ora22
Messages: 8
Registered: December 2009
Location: BOSTON
Junior Member
Thanks ! This is exactly what we needed ...
Previous Topic: CONTAINS function not working! (split from hijacked thread by bb)
Next Topic: Show Markup of Searced Multi-Column Text
Goto Forum:
  


Current Time: Sun Apr 20 16:53:44 CDT 2014

Total time taken to generate the page: 0.08605 seconds