Re: Oracle Text and searches across columns

From: Vladimir M. Zakharychev <vladimir.zakharychev_at_gmail.com>
Date: Mon, 4 Feb 2008 02:10:56 -0800 (PST)
Message-ID: <36f440bf-6a7e-40f4-8d60-c5aa86423f37@c4g2000hsg.googlegroups.com>


On Feb 3, 11:34 pm, Jake <jgarfi..._at_earthlink.net> wrote:
> Say if I am making a database of college classes across many colleges,
> where people can search and find classes.
>
> So, for example, each class has a:
>
> level - bachelors, graduate (for example)
> college name- State U, or SU as short name, along with other colleges
> semester - Fall 2008...
> Category - Biology, Math, Music etc.
> course number - BIO 101
> description - a full text description of the course.
> etc.
>
> as you can imagine, all of these things are in many tables in many
> columns. The "full description" that the user sees actually contains
> information from a join of many columns.
>
> What I would like is for the user to be able to search given a text
> phrase, and given a list of hits from all classes where the "full
> description" contains every one of the words they search for. The
> words do not have to be in any particular order. Example searches:
> "Biology 101" "BIO 101" "SU Math Fall 2008"
>
> I haven't seen an Oracle Text example that handles this type of case.
>
> What is the best way to do this? Can any one point me in the right
> direction?

If "full description" is a "computed" column (that is, you dynamically join relevant columns from different sources and present that to the user,) the most straightforward solution is to create a USER_DATASTORE with a PL/SQL procedure that will aggregate all this data into a CLOB and present that to Oracle Text for indexing. For example:

CREATE TABLE CLASSMATES
 ( ID NUMBER(10)

  ,FIRST_NAME VARCHAR2(100)
  ,LAST_NAME  VARCHAR2(100)
  ,MIDDLE_NAME VARCHAR2(100)
  ,....
  ,FULL_DESC  CHAR(1)         -- this is the placeholder column for
indexing
  ,CONSTRAINT PK$CLASSMATES#ID PRIMARY KEY (ID) )
/
-- some related tables are built here... ...
  • create the USER_DATASTORE preference. BEGIN
CTX_DDL.CREATE_PREFERENCE('MY_FULL_DESC_DATASTORE','USER_DATASTORE');
  • this is the procedure that will synthesize the document for indexing

CTX_DDL.SET_ATTRIBUTE('MY_FULL_DESC_DATASTORE','PROCEDURE','SYNTHESIZE_FULL_DESC');

  • 'CLOB' is default if not specified, included here for completeness

CTX_DDL.SET_ATTRIBUTE('MY_FULL_DESC_DATASTORE','OUTPUT_TYPE','CLOB'); END;
/

  • this is our document synthesizer
  • it MUST have two arguments: first is IN ROWID and
  • second is IN OUT NOCOPY <output_type>
  • names of the arguments are not important, only their types and order CREATE OR REPLACE PROCEDURE SYNTHESIZE_FULL_DESC(RID IN ROWID, C IN OUT NOCOPY CLOB) IS
    • DBMS_LOB.WRITEAPPEND is pretty expensive, so we will buffer
    • our output in this local variable. sBuf VARCHAR2(32767 BYTE) := NULL;
    • a buffering string appender procedure, also automatically
    • adds space character to every appended token to delimit them. PROCEDURE APPND( S IN VARCHAR2, bFlush IN BOOLEAN := FALSE ) IS BEGIN IF S IS NOT NULL AND LENGTHB(sBuf) + LENGTHB(S || ' ') > 32000 THEN DBMS_LOB.WRITEAPPEND(C, LENGTH(sBuf), sBuf); sBuf := S || ' '; ELSE sBuf := sBuf || S || ' '; END IF; IF bFlush THEN DBMS_LOB.WRITEAPPEND(C, LENGTH(sBuf), sBuf); sBuf := NULL; END IF; END APPND; BEGIN FOR X IN (SELECT * FROM CLASSMATES WHERE ROWID=RID) LOOP APPND(X.FIRST_NAME); APPND(X.LAST_NAME); APPND(X.MIDDLE_NAME);
      • can be repeated several times for different detail tables
      • or you can join all details here, or you can join them in the
      • outer cursor. FOR Y IN (SELECT * FROM SOME_DETAIL_TABLE WHERE CLASSMATE_ID=X.ID) LOOP APPND(Y.WHATEVER); .... END LOOP; END LOOP;
    • flush whatever is in the buffer to the CLOB APPND(NULL, TRUE);
END SYNTHESIZE_FULL_DESC;
/
  • now we create the index on our "full description" placeholder
  • and specify our custom data store the data will come from. CREATE INDEX CTX$CLASSMATES#FULL_DESC ON CLASSMATES.FULL_DESC INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('DATASTORE MY_FULL_DESC_DATASTORE SYNC(ON COMMIT)');
  • now we can query the base table and all details will be searched
  • as well thanks to our document synthesizer. SELECT ID, SCORE(0) AS SCORE FROM CLASSMATES WHERE CONTAINS(FULL_DESC, 'my query terms', 0) > 0;

One important note is due here: if any data in any detail table is changed, the base table should be updated as well, otherwise that changed detail data will not be re-indexed because the index only depends on the base table and will not "see" any changes of data it actually indexes. A simple no-work update is sufficient:

UPDATE CLASSMATES SET FULL_DESC=FULL_DESC WHERE ID=:CHANGED_DETAIL_ID; Despite the fact that this update doesn't change anything, Oracle Text will still add the ROWID of "updated" column to the list of rows pending indexing.

For more information: Oracle Text Reference for your Oracle release, Google (I believe I posted similar solution to a similar problem a few years back in this very group.)

Hth,

   Vladimir M. Zakharychev
   N-Networks, makers of Dynamic PSP(tm)    http://www.dynamicpsp.com Received on Mon Feb 04 2008 - 04:10:56 CST

Original text of this message