Re: Fuzzy string search?

From: Vladimir M. Zakharychev <vladimir.zakharychev_at_gmail.com>
Date: Tue, 8 Jan 2008 08:33:37 -0800 (PST)
Message-ID: <3154951e-7118-4488-bb2b-2fb37741be2b@e23g2000prf.googlegroups.com>


On Jan 8, 5:13 pm, Marten Lehmann <lehmannmap..._at_cnm.de> wrote:
> Hello,
>
> in our application it would be very helpful, if the search wouldn't
> return only exact matches, but also similar items, e.g.
>
> search for "models" returns also "model"
> search for "exmaple" returns also "example"
> search for "dl365" returns also "dl385"
>
> How can I do this with Oracle? Is there any extension I have to install?
> Or is there a function like the function to match regular expressions?
>
> Regards
> Marten

  1. Basic soundex function (works only with English):

SQL> select soundex('example'), soundex('exmaple') from dual;

SOUN SOUN
---- ----
E251 E251

However, soundex will, for example, distinguish 'model' and 'models', so it's not universal.

2. Oracle Text indexes (supports several European languages plus Japanese and OCR):

SQL> create table texts (id number(10) primary key, content varchar2(4000 byte));

Table created.

SQL> create index ix$ctx#texts on texts(content) indextype is ctxsys.context;

Index created.

SQL> insert into texts values(1, 'model');

1 row created.

SQL> insert into texts values(2, 'example');

1 row created.

SQL> insert into texts values(3, 'dl385');

1 row created.

SQL> commit;

Commit complete.

SQL> exec ctxsys.ctx_ddl.sync_index('IX$CTX#TEXTS');

PL/SQL procedure successfully completed.

SQL> select id from texts where contains(content,'fuzzy(models)') > 0;  

ID


 

1

SQL> select id from texts where contains(content,'fuzzy(exmaple)') > 0;  

ID


 

2

SQL> select id from texts where contains(content,'?dl365') > 0;  

ID


 

3

'?keyword' is a shortcut to 'fuzzy(keyword)' (fuzzy() can also accept 4 arguments for weighted matching.) See Oracle Text Reference for your Oracle release for more information on this powerful technology. Usually it's installed by default. In 10g and later, there's pretty little effort needed to maintain Text indexes with "sync on commit" option and scheduled online index rebuilds for volatile data.

Hth,

   Vladimir M. Zakharychev
   N-Networks, makers of Dynamic PSP(tm)    http://www.dynamicpsp.com Received on Tue Jan 08 2008 - 10:33:37 CST

Original text of this message