Home » SQL & PL/SQL » SQL & PL/SQL » Text based search (Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production)
Text based search [message #630682] Fri, 02 January 2015 18:48 Go to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

Hi,

I have to do some sort of search engine like searching, but I don't have text based index installed in our system.

I have 2 tables -
1. NAME_DATA_TBL - contains customer names and other information
2. BAD_WORDS - just 1 column containing abusive words

I have to find bad words in the customer names, it can be anywhere - in first_name, middle_initial, or last_name, or any permutation combination of these columns.



CREATE TABLE NAME_DATA_TBL
(
  NAME_ID             NUMBER(9) NOT NULL,
  LAST_BUSINESS_NAME  VARCHAR2(60 BYTE),
  FIRST_NAME          VARCHAR2(32 BYTE),
  MIDDLE_INITIAL      VARCHAR2(32 BYTE),
  NAME_TITLE          VARCHAR2(5 BYTE),
  NAME_SUFFIX         CHAR(12 BYTE),
  ADDITIONAL_TITLE    VARCHAR2(60 BYTE),
  NAME_FORMAT         CHAR(1 BYTE),
  CONV_RUN_NO         NUMBER(3)
);

SQL> SELECT /*+ parallel(a,12) */ COUNT (*) FROM NAME_DATA;

  COUNT(*)
----------
 290371372

CREATE TABLE BAD_WORDS
(
  X_VALUE  VARCHAR2(255 BYTE)
);

SQL> select count(*) from BAD_WORDS;

  COUNT(*)
----------
       547

SELECT A.rowid row_id, A.*
  FROM NAME_DATA_TBL A, BAD_WORDS B
 WHERE UPPER (
             TRIM (A.FIRST_NAME)
          || TRIM (A.MIDDLE_INITIAL)
          || TRIM (A.LAST_BUSINESS_NAME)) LIKE
          '%' || B.X_VALUE || '%'
       OR UPPER (
                TRIM (A.LAST_BUSINESS_NAME)
             || TRIM (A.MIDDLE_INITIAL)
             || TRIM (A.FIRST_NAME)) LIKE
             '%' || B.X_VALUE || '%'
       OR UPPER (TRIM (A.LAST_BUSINESS_NAME) || TRIM (A.FIRST_NAME)) LIKE
             '%' || B.X_VALUE || '%'
       OR UPPER (TRIM (A.FIRST_NAME)) LIKE '%' || B.X_VALUE || '%'
       OR UPPER (TRIM (A.LAST_BUSINESS_NAME)) LIKE '%' || B.X_VALUE || '%'
       OR UPPER (TRIM (A.MIDDLE_INITIAL)) LIKE '%' || B.X_VALUE || '%'
       OR UPPER (TRIM (A.FIRST_NAME) || TRIM (A.LAST_BUSINESS_NAME)) LIKE
             '%' || B.X_VALUE || '%';


I am trying above, but it is very slow, as for each row, internally it has to go for multiple searches because of like operator.

Any suggestion to make this query faster will be appreciated.

Thanks,
Manu
Re: Text based search [message #630684 is a reply to message #630682] Fri, 02 January 2015 19:27 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
Why look after the data is in the database?
It would be easier & faster NOT to allow the Bad Words into the DB in the first place.
The "design" is flawed & the BEST solution is to change the design.
Re: Text based search [message #630685 is a reply to message #630684] Fri, 02 January 2015 19:29 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

Okay, change the design like? You have the table structure, what should I change?
Or you want me to move it out of oracle? And spool it and do it in unix grep?

Manu
Re: Text based search [message #630686 is a reply to message #630685] Fri, 02 January 2015 19:33 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
>what should I change?

It would be easier & faster NOT to allow the Bad Words into the DB in the first place.
Re: Text based search [message #630687 is a reply to message #630686] Fri, 02 January 2015 19:39 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

That I will work on, but now they are already in database...
Tomorrow I will try processing the data with UNIX - grep, and put it back in oracle.

Thanks,
Manu
Re: Text based search [message #630688 is a reply to message #630687] Fri, 02 January 2015 19:44 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
Why does it matter if your SQL takes multiple days to complete?
Re: Text based search [message #630704 is a reply to message #630682] Sat, 03 January 2015 05:11 Go to previous messageGo to next message
John Watson
Messages: 8928
Registered: January 2010
Location: Global Village
Senior Member
You are probably getting a nested loop join, which could be driven by either table. What is the join order? I would hope that the outer table is BAD_WORDS.
Re: Text based search [message #630707 is a reply to message #630682] Sat, 03 January 2015 06:12 Go to previous messageGo to next message
Michel Cadot
Messages: 68634
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Taking out the flaws BlackSwan rightly pointed you to, you will get better performances using Text/Ultra search Oracle feature... if you have the license.
Do you want me to move this topic to the associated forum where, I think, Barbara who is our expert on this subject, will be able to help you to set it?

Re: Text based search [message #630716 is a reply to message #630685] Sat, 03 January 2015 10:01 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3269
Registered: January 2010
Location: Connecticut, USA
Senior Member
manubatham20 wrote on Fri, 02 January 2015 20:29
what should I change?


Just create a trigger:

CREATE OR REPLACE
  TRIGGER NO_SWEARING
  BEFORE INSERT
      OR UPDATE
  ON NAME_DATA_TBL
  FOR EACH ROW
  DECLARE
      v_cnt NUMBER;
  BEGIN
      SELECT  COUNT(*)
        FROM  BAD_WORDS
        WHERE ROWNUM = 1
          AND (
                  INSTR(
                        UPPER(
                              RTRIM(:NEW.FIRST_NAME) || TRIM(:NEW.MIDDLE_INITIAL) || LTRIM(:NEW.LAST_BUSINESS_NAME)
                             ),
                        X_VALUE
                       ) > 0
               OR
                  INSTR(
                        UPPER(
                              RTRIM(:NEW.LAST_BUSINESS_NAME) || TRIM(:NEW.MIDDLE_INITIAL) || LTRIM(:NEW.FIRST_NAME)
                             ),
                        X_VALUE
                       ) > 0
               OR
                  INSTR(
                        UPPER(
                              RTRIM(:NEW.LAST_BUSINESS_NAME) || LTRIM(:NEW.FIRST_NAME)
                             ),
                        X_VALUE
                       ) > 0
               OR
                  INSTR(
                        UPPER(:NEW.FIRST_NAME),
                        X_VALUE
                       ) > 0
               OR
                  INSTR(
                        UPPER(:NEW.LAST_BUSINESS_NAME),
                        X_VALUE
                       ) > 0
               OR
                  INSTR(
                        UPPER(:NEW.MIDDLE_INITIAL),
                        X_VALUE
                       ) > 0
               OR
                  INSTR(
                        UPPER(
                              RTRIM(:NEW.FIRST_NAME) || LTRIM(:NEW.LAST_BUSINESS_NAME)
                             ),
                        X_VALUE
                       ) > 0
              );
      RAISE_APPLICATION_ERROR(
                              -20500,
                              'Watch your language!'
                             );
END;
/


Now you can prevent users from entering "bad words". You still need to validate existing table. Easiest way is to create error log table and issue:

BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG(dml_table_name => 'NAME_DATA_TBL');
END;
/
UPDATE NAME_DATA_TBL
   SET FIRST_NAME = FIRST_NAME
   LOG ERRORS INTO ERR$_NAME_DATA_TBL('VALIDATE') REJECT LIMIT UNLIMITED
/


Now check ERR$_NAME_DATA_TBL for offending row ROWIDs.

SY.
Re: Text based search [message #630718 is a reply to message #630716] Sat, 03 January 2015 10:40 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

Hi John,

You are right, it's nested loops. As I have used all the things (TRIM/UPPER/LIKE) which suppress the use of indexes and hash joins.
Attached is plan (jpg) for your reference.

Michael,

Thanks, but as I already said, we don't have text searches installed, as we don't have license. So moving topic won't help.

SY,

We decided to put this check on front-end, that will be less costly.

Thanks you guys.
Manu
Re: Text based search [message #630720 is a reply to message #630718] Sat, 03 January 2015 10:48 Go to previous messageGo to next message
John Watson
Messages: 8928
Registered: January 2010
Location: Global Village
Senior Member
Quote:
Thanks, but as I already said, we don't have text searches installed, as we don't have license. So moving topic won't help.
Text is free with all editions.
Re: Text based search [message #630721 is a reply to message #630720] Sat, 03 January 2015 11:00 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

That's new for me. If this is the case, let me ask DBAs to get that installed. And I will let you know their responses.

Moreover, I will also try, spool and grep.

Thanks,
Manu
Re: Text based search [message #630723 is a reply to message #630721] Sat, 03 January 2015 12:47 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

John,

I was looking at one of your post regarding inverted tables.

Inverted tables: an alternative to relational structures

Can't it be applied here somehow?

Thanks,
Manu
Re: Text based search [message #630724 is a reply to message #630723] Sat, 03 January 2015 13:16 Go to previous messageGo to next message
John Watson
Messages: 8928
Registered: January 2010
Location: Global Village
Senior Member
What do you think?
Re: Text based search [message #630725 is a reply to message #630724] Sat, 03 January 2015 13:18 Go to previous message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

I think inverted tables, are just one use case substitution of text searches, but not a replacement.

I don't think inverted table will be useful to resolve wildcards.

Manu | DM
Previous Topic: Check Time In Query
Next Topic: String to Number conversion
Goto Forum:
  


Current Time: Tue Apr 16 05:52:30 CDT 2024