Re: Help: Join a "glossary" table and a "data" table referring to terms in the glossary

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: Tue, 5 Feb 2008 19:11:49 -0800 (PST)
Message-ID: <d8e2d79c-f354-470f-b751-3554af8e73fa@j20g2000hsi.googlegroups.com>


On Feb 5, 8:13 pm, Vamsidhar <tvamsid..._at_gmail.com> wrote:
> On Feb 5, 7:15 pm, Mark D Powell <Mark.Pow..._at_eds.com> wrote:
> > On Feb 5, 6:24 pm, Vamsidhar <tvamsid..._at_gmail.com> wrote:
>
> > > I have a "glossary" table that holds a unique identifier (key) for
> > > each "text" string used elsewhere in the app.
> > > The fields in the data tables that should hold the text are then
> > > populated with the "key" instead of the text - helps clients
> > > substitute their own "glossary" for the default one, I suppose. In
> > > some cases, there may be more than one key in a given field (separated
> > > by commas). And the objective here is to extract a list of all the
> > > keys used in a given data table.
>
> > > I've tried to describe the scenario with a simplified example below:
>
> > > Given the following data:
>
> > > -- ---------------------
> > > -- This is my "glossary"
> > > create table vt_del_lib (key varchar2(100), text varchar2(100));
>
> > > insert into vt_del_lib values ('010001', 'One');
> > > insert into vt_del_lib values ('010002', 'Two');
> > > insert into vt_del_lib values ('010003', 'Three');
> > > insert into vt_del_lib values ('010004', 'Four');
> > > insert into vt_del_lib values ('010005', 'Five');
> > > insert into vt_del_lib values ('010006', 'Six');
> > > insert into vt_del_lib values ('010007', 'Seven');
> > > insert into vt_del_lib values ('010008', 'Eight');
> > > insert into vt_del_lib values ('010009', 'Nine');
> > > insert into vt_del_lib values ('010010', 'Ten');
> > > commit;
>
> > > -- This is my "data" table
> > > create table vt_del_keys (val1 varchar2(1000));
>
> > > insert into vt_del_keys values('010001');
> > > insert into vt_del_keys values('010004,010005,010008,010010');
> > > insert into vt_del_keys values('010005,010008');
> > > commit;
> > > -- ---------------------
>
> > > Here's the query I've finally decided to use. I know it works, but not
> > > how :) I was wondering if someone could explain - how/why does it work
> > > - which version is better - is there a better way - etc.
>
> > > It gets pretty slow as the data table grows and I'm having trouble
> > > trying to improve its performance.
>
> > > -- ---------------------
> > > select distinct l.key
> > > from
> > >         vt_del_lib l
> > >         join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > -- OR
>
> > > select distinct l.key
> > > from
> > >         vt_del_keys d
> > >         join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > -- ---------------------
>
> > > Sorry for the cryptic subject line - I couldn't classify the problem
> > > any better :)
>
> > > TIA!
> > > Vamsi.
>
> > > P.S.: Some sample output...
>
> > > SQL> --Get used keys
> > > SQL> select l.key
> > >   2  from
> > >   3     vt_del_lib l
> > >   4     join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > KEY
> > > ---------------------------------------------------------------------------­­­-----
> > > 010001
> > > 010004
> > > 010005
> > > 010008
> > > 010010
> > > 010005
> > > 010008
>
> > > 7 rows selected
>
> > > Executed in 0.375 seconds
> > > SQL> select l.key
> > >   2  from
> > >   3     vt_del_keys d
> > >   4     join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > KEY
> > > ---------------------------------------------------------------------------­­­-----
> > > 010001
> > > 010004
> > > 010005
> > > 010005
> > > 010008
> > > 010008
> > > 010010
>
> > > 7 rows selected
>
> > > Executed in 0.375 seconds
> > > SQL> --Get distinct used kyes
> > > SQL> select distinct l.key
> > >   2  from
> > >   3     vt_del_lib l
> > >   4     join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > KEY
> > > ---------------------------------------------------------------------------­­­-----
> > > 010001
> > > 010004
> > > 010005
> > > 010008
> > > 010010
>
> > I do not have a system in front of me that I can test with but if you
> > run explain plan on these queries I expect that both tables are being
> > read via full table scans since you concatenate a comma before and
> > after you key columns rendering any index useless unless you have
> > built function based indexes on both set of keys.
>
> > Because the run time is exactly equal I suspect Oracle is using the
> > same plan for both orderings of the table name in the query.
>
> > Is the goal to find the key value or to get the descriptions for the
> > keys?
>
> > -- Mark D Powell --- Hide quoted text -
>
> > - Show quoted text -
>
> Thanks Mark D Powell :)
>
> The goal is to extract the keys values.
>
> Yes, the explain plan shows that both tables undergo full table scan.
> I didn't think of creating an index on the data table because I
> thought the LIKE clause doesn't allow for an index to be used. Can you
> suggest what the function-based index should be like on the data
> table? - or the glossary table, for that matter :)
>
> Thanks.
> Vamsi

Thanks for posting the DDL and DML to create and populate the tables. Try a bit of experimentation working with some of Mark's suggestions. In addition to the rows that you inserted into the tables, I excuted the following to add an additional 900,000 rows into one of the tables:
NSERT INTO
  VT_DEL_LIB
SELECT
  TO_CHAR(ROWNUM+10010,'000000'),
  'N'||TO_CHAR(ROWNUM+10010,'000000')
FROM
  DUAL
CONNECT BY
  LEVEL<=900000;

COMMIT; Now, let's gather stats on the tables and any indexes: EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_LIB',CASCADE=>TRUE); EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_KEYS',CASCADE=>TRUE); select distinct l.key
from

        vt_del_lib l
        join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',
%';

The DBMS Xplan:



| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |

| 1 | HASH UNIQUE | | 1 | 2 | 5 | 00:00:01.35 | 8217 | 968K| 968K| 740K (0)|
|   2 |   NESTED LOOPS      |             |      1 |      2 |      7 |
00:00:01.35 |    8217 |       |       |          |
|   3 |    TABLE ACCESS FULL| VT_DEL_KEYS |      1 |      3 |      3 |
00:00:00.10 |       3 |       |       |          |
|*  4 |    TABLE ACCESS FULL| VT_DEL_LIB  |      3 |      1 |      7 |
00:00:01.25 |    8214 |       |       |          |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


   4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')

The above completed in 1.35 seconds.

Now, an experimentation by creating a unique index on the VT_DEL_LIB table, modifying the KEY column so that it cannot be NULL, and gathering table and index stats:
CREATE UNIQUE INDEX IND_VT_DEL_LIB ON VT_DEL_LIB (KEY); ALTER TABLE VT_DEL_LIB MODIFY(KEY NOT NULL); EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_LIB',CASCADE=>TRUE); select distinct l.key
from

        vt_del_lib l
        join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',
%';

| Id | Operation | Name | Starts | E-Rows | A- Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
|   1 |  HASH UNIQUE           |                |      1 |
135K|      5 |00:00:01.40 |    6816 |      3 |   968K|   968K|  742K
(0)|
|   2 |   NESTED LOOPS         |                |      1 |
135K|      7 |00:00:01.40 |    6816 |      3 |       |
|          |
|   3 |    TABLE ACCESS FULL   | VT_DEL_KEYS    |      1 |      3
|      3 |00:00:00.01 |       3 |      0 |       |       |          |
|*  4 |    INDEX FAST FULL SCAN| IND_VT_DEL_LIB |      3 |  45001
|      7 |00:00:01.40 |    6813 |      3 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


   4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')

The query is now using the index that was just created, but the execution time increased by 0.05 seconds.

SELECT DISTINCT
  L.KEY
FROM
  VT_DEL_LIB L,
  VT_DEL_KEYS D
WHERE
  D.VAL1 LIKE '%'||L.KEY||'%';



| Id | Operation | Name | Starts | E-Rows | A- Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
|   1 |  HASH UNIQUE           |                |      1 |
135K|      5 |00:00:00.81 |    6816 |   968K|   968K|  713K (0)|
|   2 |   NESTED LOOPS         |                |      1 |
135K|      7 |00:00:00.81 |    6816 |       |       |          |
|   3 |    TABLE ACCESS FULL   | VT_DEL_KEYS    |      1 |      3
|      3 |00:00:00.01 |       3 |       |       |          |
|*  4 |    INDEX FAST FULL SCAN| IND_VT_DEL_LIB |      3 |  45001
|      7 |00:00:00.80 |    6813 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


   4 - filter("D"."VAL1" LIKE '%'||"L"."KEY"||'%')

The above completed in 0.81 seconds, which is a nice improvement, but still not great.

Now, let's experiment with regular expressions (an adaptation of a previous solution by Maxim Demenko for splitting a sentence into words):
SELECT DISTINCT
  REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY
FROM
  VT_DEL_KEYS
CONNECT BY
  REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL;

KEY



010005
010008
010010
010004
010001

If we can take the above results and drive back into into the VT_DEL_LIB table using the index that we previously created, we obtain an efficient plan:
SELECT DISTINCT
  L.KEY
FROM
  (SELECT DISTINCT
    REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY   FROM
    VT_DEL_KEYS
  CONNECT BY
    REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL) D,   VT_DEL_LIB L
WHERE
  L.KEY=D.KEY;


| Id  | Operation                        | Name           | Starts | E-
Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
|   1 |  HASH UNIQUE                     |                |      1
|      3 |      5 |00:00:00.01 |      15 |   968K|   968K|  717K (0)|
|   2 |   NESTED LOOPS                   |                |      1
|      3 |      5 |00:00:00.01 |      15 |       |       |          |
|   3 |    VIEW                          |                |      1
|      3 |      5 |00:00:00.01 |       3 |       |       |          |
|   4 |     HASH UNIQUE                  |                |      1
|      3 |      5 |00:00:00.01 |       3 |   936K|   936K|  738K (0)|
|   5 |      CONNECT BY WITHOUT FILTERING|                |      1
|        |     21 |00:00:00.01 |       3 |       |       |          |
|   6 |       TABLE ACCESS FULL          | VT_DEL_KEYS    |      1
|      3 |      3 |00:00:00.01 |       3 |       |       |          |
|*  7 |    INDEX UNIQUE SCAN             | IND_VT_DEL_LIB |      5
|      1 |      5 |00:00:00.01 |      12 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


   7 - access("L"."KEY"="D"."KEY")

The above completed in 0.01 seconds. As the data size increases, it may be necessary to add an /*+ ORDERED */ hint to make certain that the plan does not change significantly. Experiment to see what happens as the data size increases.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc. Received on Tue Feb 05 2008 - 21:11:49 CST

Original text of this message