Re: Intermedia index
Hello,
Thank you very much for your help. Based on your information, I wrote this:
begin
  -- Keyword-only lexer: index the text itself, skip theme indexing.
  ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_lexer', 'INDEX_TEXT', 'YES');
  ctx_ddl.set_attribute('my_lexer', 'INDEX_THEMES', 'NO');
end;
/
PROMPT index creation...;
create index ctx_attach_notice_pdf_lb on attach (pdf_notice_lb) indextype is
ctxsys.context parameters ('storage users lexer my_lexer filter
CTXSYS.INSO_FILTER nopopulate');
Indexing was fast.
My problem: I find nothing in the TOKEN_TEXT column of the table DR$ctx_attach_notice_pdf_lb$I.
When I don't use the lexer parameter, I find what looks like binary data in the TOKEN_TEXT column of DR$ctx_attach_notice_pdf_lb$I, and indexing took a long time.
Maybe I have a problem with how my PDF files are saved in the BLOB?
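
A minimal sketch for checking what the index actually contains, assuming the index name from the statement above and that the index lives in the current schema. One thing that may be relevant: the nopopulate keyword in the PARAMETERS string asks, as far as I know, for an empty index to be created, which would explain both the fast creation and the empty TOKEN_TEXT column. Treat that as an assumption to verify against the interMedia Text reference for 8.1.7.

```sql
-- Sample the tokens stored for the index (the $I table holds the token rows).
select token_text, token_count
  from DR$CTX_ATTACH_NOTICE_PDF_LB$I
 where rownum <= 20;

-- List per-document errors recorded during indexing or filtering.
select err_index_name, err_timestamp, err_textkey, err_text
  from ctx_user_index_errors
 order by err_timestamp desc;

-- Assumption: re-create the index without nopopulate so it is populated
-- at creation time. Everything else is copied from the statement above.
drop index ctx_attach_notice_pdf_lb;

create index ctx_attach_notice_pdf_lb on attach (pdf_notice_lb)
  indextype is ctxsys.context
  parameters ('storage users lexer my_lexer filter CTXSYS.INSO_FILTER');
```

If the error view lists rows for the PDF documents, the filter step is the more likely culprit than the way the BLOB was loaded.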
Chris Weiss <chris_at_www.hpdbe.com> wrote in message:
a5ljp9$ejn$1_at_msunews.cl.msu.edu...
> Please include the command you used for creating the index. Did you specify
> filtering in the parameter list? Look at the DR$ tables for the tokens from
> the index.
>
> When filtering, if you are only interested in keywords, then you should
> create a context preference and turn *OFF* theme indexing. Your indexing
> time will be substantially reduced.
>
> Also, when you establish synchronization policies, you will get much better
> performance if you can use a DBMS_JOB call no sooner than every 5 minutes to
> resync your indexes. This depends on your update/insert frequency. If the
> database is updated slowly, re-indexing once a day might be sufficient.
>
> The INSO filter is a resource pig. I recently replaced it with a C program
> for HTML files; the results were a 5x+ speed-up in filtering, and I overcame
> some bugs I found in the INSO filter. When indexing, the CPU will peg at
> 100%. This is typical behavior.
>
>
> Good Luck!
>
> --
>
> ~~~~~~~~~~~~~~~~
> Chris Weiss
> www.hpdbe.com
> High Performance Database Engineering
> ~~~~~~~~~~~~~~~~
>
>
> "Olivier VEIT" <oveit_at_infeurope.lu> wrote in message
> news:a5ljac$l55$1_at_wanadoo.fr...
> > Hello,
> >
> > I have a column PDF_NOTICE_LB BLOB in my table ATTACH containing PDF files
> > with text in FR, DE, IT and EN.
> > To index this column, I simply execute:
> > create index CTX_ATTACH_PDF_NOTICE_LB on ATTACH(PDF_NOTICE_LB) indextype is
> > ctxsys.context;
> >
> > This command runs but is very slow (45 min to index two PDFs of 170 KB
> > each). And after indexing, I can't find anything when I use:
> > select attach_id from attach where contains(PDF_NOTICE_LB,'test') >0;
> >
> > If I try to index a lot of PDF files, Oracle uses 99% of the processor but
> > seems to do nothing.
> > The INSO filter itself seems to be OK (I ran the ctxhx utility on one of my
> > PDF documents to test it).
> >
> > Does anyone have an idea that could help me... or better, a solution ;-)
> >
> > FYI: I am using Oracle 8i (8.1.7) on Linux
> >
> > Thank you very much
> >
> >
>
>
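
The DBMS_JOB suggestion quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions: the index name is the one from the reply at the top, the 5-minute interval is the lower bound mentioned in the quote, and the job owner is assumed to have EXECUTE on CTX_DDL granted directly, since privileges obtained through a role may not apply inside a job.

```sql
declare
  l_job binary_integer;  -- receives the job number assigned by DBMS_JOB
begin
  dbms_job.submit(
    job       => l_job,
    -- Assumption: sync the index named in the reply above.
    what      => 'ctx_ddl.sync_index(''CTX_ATTACH_NOTICE_PDF_LB'');',
    next_date => sysdate,
    -- Re-run every 5 minutes; use 'sysdate + 1' for a daily resync instead.
    interval  => 'sysdate + 5/1440');
  commit;  -- the job only becomes visible to the job queue after commit
end;
/
```

For a table that changes rarely, the daily interval is probably the better starting point, since each sync competes with the filter for CPU.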
Received on Fri Mar 01 2002 - 04:38:29 CST