Re: Intermedia index
Hmm. This looks like it might be a character set or version conflict with your PDF files.
Please confirm that your installation can handle the character set used in your PDF files, and that your Oracle installation supports the PDF version of the files in your BLOB. If your data is not English, I would recommend a UTF8 database character set.
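
As a quick sketch of how to check the database side of this: the two queries below read the database character set and the server version from the standard dictionary views. They say nothing about the PDF version itself, which has to be read from the file (its header starts with %PDF-1.x).

```sql
-- Database and national character sets of this instance
SELECT parameter, value
  FROM nls_database_parameters
 WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');

-- Server version banner (for example, to confirm the 8.1.7 release)
SELECT banner FROM v$version;
```
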
PDF files must be stored in BLOBs or BFILEs, so I would not worry about your storage for now. I know that interMedia is (or was) partially crippled on Linux, so your character set or PDF version may not be compatible with your installation. If you cannot find anything in the Linux documentation, I would search MetaLink. If everything looks good, I would try linking the PDFs as BFILEs and see if that works.
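
A minimal sketch of the BFILE test, under stated assumptions: the directory object PDF_DIR, its OS path, the table ATTACH_BFILE and the file name sample.pdf are all hypothetical placeholders, not names from this thread. Creating the directory needs the CREATE ANY DIRECTORY privilege, and the indexing user needs read access to the files.

```sql
-- Hypothetical directory object; replace the path with a real server-side directory
CREATE OR REPLACE DIRECTORY pdf_dir AS '/path/to/pdf/files';

-- Hypothetical test table holding pointers to the external PDF files
CREATE TABLE attach_bfile (
  attach_id     NUMBER PRIMARY KEY,
  pdf_notice_bf BFILE
);

-- 'sample.pdf' is a placeholder for one of the PDFs that fails in the BLOB column
INSERT INTO attach_bfile VALUES (1, BFILENAME('PDF_DIR', 'sample.pdf'));
COMMIT;

-- Same index type and filter as in the BLOB case
CREATE INDEX ctx_attach_bfile ON attach_bfile (pdf_notice_bf)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('filter ctxsys.inso_filter');

-- If this returns a row, the filter works and the BLOB loading is suspect
SELECT attach_id FROM attach_bfile WHERE CONTAINS(pdf_notice_bf, 'test') > 0;
```

If the BFILE index returns readable tokens while the BLOB index does not, the way the PDFs were loaded into the BLOB becomes the more likely culprit.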
I strongly suspect your problem is a configuration issue. If not, then you might have found a bug in the Linux distribution. Unless you have a reason to stick with 8.1.7, I would recommend an upgrade to 9i.
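
One way to narrow this down, sketched here as a suggestion rather than a definitive procedure: after the index build, look at the errors logged for the current user's Text indexes. Filter failures on individual documents are normally recorded there with the rowid of the offending row.

```sql
-- Errors recorded during indexing for the current user's Text indexes
SELECT err_index_name, err_timestamp, err_textkey, err_text
  FROM ctx_user_index_errors
 ORDER BY err_timestamp DESC;
```
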
Good Luck!
--
~~~~~~~~~~~~~~~~
Chris Weiss
www.hpdbe.com
High Performance Database Engineering
~~~~~~~~~~~~~~~~

Received on Fri Mar 01 2002 - 12:56:46 CST

"Olivier VEIT" <oveit_at_infeurope.lu> wrote in message news:a5nlg0$hgn$1_at_wanadoo.fr...
> Hello,
>
> Thank you very much for your help. Using your information, I wrote this:
>
> begin
> ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
> ctx_ddl.set_attribute('my_lexer', 'INDEX_TEXT', 'YES');
> ctx_ddl.set_attribute('my_lexer', 'INDEX_THEMES', 'NO');
> end;
> /
>
> PROMPT index creation...;
> create index ctx_attach_notice_pdf_lb on attach (pdf_notice_lb)
>   indextype is ctxsys.context
>   parameters ('storage users lexer my_lexer filter CTXSYS.INSO_FILTER nopopulate');
> Indexing was fast.
>
> My problem: I find nothing in column TOKEN_TEXT of table
> DR$ctx_attach_notice_pdf_lb$I.
>
> When I don't use the lexer parameter, indexing takes a long time and I find
> what looks like binary data in column TOKEN_TEXT of table
> DR$ctx_attach_notice_pdf_lb$I:
>
> TOKEN_TEXT
> ----------------------------------------------------------------
> ÃzH
> ÃzHQ
> ÃzHQS
> ÃzIÃ'
> ÃzJ
> ÃzJFS
> ÃzJFS
> ÃzMÃ?HH
> ÃzMÃ?HHS
> ÃzNDS
>
> Maybe I have a problem with how I save my PDF files in the BLOB?
>
>
> Chris Weiss <chris_at_www.hpdbe.com> wrote in message
> a5ljp9$ejn$1_at_msunews.cl.msu.edu...
> > Please include the command you used for creating the index. Did you
> > specify filtering in the parameter list? Look at the DR$ tables for the
> > tokens from the index.
> >
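
A sketch of what looking at the DR$ token table can look like, assuming the index name used later in this thread (ctx_attach_notice_pdf_lb); the $I table is named DR$ plus the index name plus $I, so adjust it to the actual index.

```sql
-- First few tokens stored in the index's $I table
SELECT token_text, token_type, token_count
  FROM dr$ctx_attach_notice_pdf_lb$i
 WHERE ROWNUM <= 20;

-- Total number of token rows; 0 means nothing was indexed
SELECT COUNT(*) FROM dr$ctx_attach_notice_pdf_lb$i;
```

Readable words in TOKEN_TEXT suggest the filter produced text; an empty table or unreadable tokens point at filtering or at the stored documents.
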
> > When filtering, if you are only interested in keywords, then you should
> > create a context preference and turn *OFF* theme indexing. Your indexing
> > time will be substantially reduced.
> >
> > Also, when you establish synchronization policies, you will get much
> > better performance if you use a DBMS_JOB call to resync your indexes no
> > more often than every 5 minutes. This depends on your update/insert
> > frequency. If the database is updated slowly, re-indexing once a day might
> > be sufficient.
> >
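
A minimal sketch of such a DBMS_JOB schedule, with assumptions stated: it uses the index name from this thread, assumes CTX_DDL.SYNC_INDEX is available in the installed release and that the job owner may execute CTX_DDL, and picks the 5-minute interval only as an example.

```sql
DECLARE
  l_job BINARY_INTEGER;  -- receives the job number assigned by DBMS_JOB
BEGIN
  DBMS_JOB.SUBMIT(
    job       => l_job,
    -- Assumed index name; the call processes pending DML for that index
    what      => 'ctx_ddl.sync_index(''CTX_ATTACH_NOTICE_PDF_LB'');',
    next_date => SYSDATE,
    -- Re-run every 5 minutes; use 'SYSDATE + 1' for a daily resync
    interval  => 'SYSDATE + 5/1440'
  );
  COMMIT;  -- the job becomes visible to the job queue only after commit
END;
/
```

The job only runs if the instance has job queue processes enabled (JOB_QUEUE_PROCESSES greater than 0).
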
> > The INSO filter is a resource pig. I recently replaced it with a C program
> > for HTML files; the result was a 5x+ speedup in filtering, and I overcame
> > some bugs I found in the INSO filter. When indexing, the CPU will peg at
> > 100%. This is typical behavior.
> >
> >
> > Good Luck!
> >
> > --
> >
> > ~~~~~~~~~~~~~~~~
> > Chris Weiss
> > www.hpdbe.com
> > High Performance Database Engineering
> > ~~~~~~~~~~~~~~~~
> >
> >
> > "Olivier VEIT" <oveit_at_infeurope.lu> wrote in message
> > news:a5ljac$l55$1_at_wanadoo.fr...
> > > Hello,
> > >
> > > I have a BLOB column PDF_NOTICE_LB in my table ATTACH containing PDF
> > > files with text in FR, DE, IT and EN.
> > > To index this column, I simply execute:
> > > create index CTX_ATTACH_PDF_NOTICE_LB on ATTACH(PDF_NOTICE_LB)
> > >   indextype is ctxsys.context;
> > >
> > > This command runs but is very slow (45 min to index two PDFs of
> > > 170 KB each). After indexing, I can't find anything when I use:
> > > select attach_id from attach where contains(PDF_NOTICE_LB,'test') >0;
> > >
> > > If I try to index a lot of PDF files, Oracle uses 99% of the processor
> > > but seems to do nothing.
> > > The INSO filter itself seems to be OK (I tested it with the ctxhx
> > > utility on one of my PDF documents).
> > >
> > > Does anyone have an idea which could help me... or better, a solution ;-)
> > >
> > > FYI: I'm using Oracle 8i (8.1.7) on Linux.
> > >
> > > Thank you very much
> > >
> > >
> >
> >
>
>