Oracle FAQ Your Portal to the Oracle Knowledge Grid

RE: Removing NUMGROUP from lexer in ConTerMedText index

From: Jesse, Rich <Rich.Jesse_at_qtiworld.com>
Date: Fri, 31 Oct 2003 07:44:34 -0800
Message-ID: <F001.005D533C.20031031074434@fatcity.com>


OK, not a lot of ConTerMedText people out here?

Here's what I've done to fix it. It seems to work:

begin
  ctx_ddl.create_preference('MYLEXER','BASIC_LEXER');
  ctx_ddl.set_attribute('MYLEXER','NUMGROUP',CHR(255));
end;

So, I just changed the default NUMGROUP from "," to CHR(255), an unprintable character. I think it's safe to assume that a user won't be allowed to enter that into a description.
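For anyone following along: a lexer preference only takes effect once an index is built (or rebuilt) with it. The statements below are a rough sketch of how that might look, not something taken from the post. The index name DESC_IDX, the table PARTS, and the columns PART_ID and DESCRIPTION are made up for illustration; only MYLEXER comes from the block above. The REPLACE LEXER rebuild syntax is as documented for Oracle Text, so check it against the docs for your release before relying on it.

```sql
-- Hypothetical names: DESC_IDX, PARTS, PART_ID, DESCRIPTION.
-- Only the MYLEXER preference comes from the post above.

-- Rebuild the existing index so it picks up the new lexer preference.
-- This re-tokenizes every row, so expect it to take a while on 250K rows.
alter index desc_idx rebuild
  parameters ('replace lexer MYLEXER');

-- Re-run the search that used to miss the "BLEAH,120,1/4W" row.
select part_id, description
  from parts
 where contains(description, '120 AND 1/4W') > 0;
```

If the rebuild syntax isn't available on a given release, dropping the index and recreating it with parameters ('lexer MYLEXER') should have the same effect.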

Thanks, Rich! :)

Rich

Rich Jesse                           System/Database Administrator
rjesse_at_qtiworld.com                  Quad/Tech Inc, Sussex, WI USA


> -----Original Message-----
> From: Jesse, Rich
> Sent: Wednesday, October 29, 2003 4:44 PM
> To: Multiple recipients of list ORACLE-L
> Subject: Removing NUMGROUP from lexer in ConTerMedText index
>
>
> Hey all,
>
> I set up a Context/Intermedia/Text/whateverTheHell index on 8.1.7.4 on
> HP/UX to index about 250000 description fields in order for our users
> to search on them. This was two years ago, and now someone has
> discovered at least one issue.
>
> One description contains something like:
>
> BLEAH,120,1/4W
>
> Using the default lexer, this stupidly parses into tokens of "BLEAH",
> "120,1" and "4W" instead of "BLEAH", "120", and "1/4W" (or even "1" and
> "4W"). I think this is because of the default NUMGROUP for US languages,
> which is a comma (","). So when a user looks for "120 AND 1/4W", this
> description is missed because "120" isn't a valid token with the default
> lexer.
>
> There can be numerous other issues with NUMGROUP when lexing a
> free-formatted description, so I really don't want a NUMGROUP. I tried
> setting it to null using:
>
> ctx_ddl.set_attribute('MYLEXER','NUMGROUP','');
>
> ..but this bombs with:
>
> ORA-20000: interMedia Text error:
> DRG-10705: invalid value NULL for attribute NUMGROUP
>
> Other than trying to find some char that will work with 250K rows, is
> there a way to turn this off? The thing that gets me is that "120,1"
> isn't even a proper number, but ConTerMedText thinks it is and
> tokenizes it.
>
> TIA,
> Rich

Received on Fri Oct 31 2003 - 09:44:34 CST
