Re: Help: Join a "glossary" table and a "data" table referring to terms in the glossary

From: Vamsidhar <tvamsidhar_at_gmail.com>
Date: Wed, 6 Feb 2008 20:21:05 -0800 (PST)
Message-ID: <074d8fb3-ba21-4087-b9df-d0955a2a6c4b@v4g2000hsf.googlegroups.com>


On Feb 5, 10:11 pm, Charles Hooper <hooperc2..._at_yahoo.com> wrote:
> On Feb 5, 8:13 pm, Vamsidhar <tvamsid..._at_gmail.com> wrote:
>
> > On Feb 5, 7:15 pm, Mark D Powell <Mark.Pow..._at_eds.com> wrote:
> > > On Feb 5, 6:24 pm, Vamsidhar <tvamsid..._at_gmail.com> wrote:
>
> > > > I have a "glossary" table that holds a unique identifier (key) for
> > > > each "text" string used elsewhere in the app.
> > > > The fields in the data tables that should hold the text are then
> > > > populated with the "key" instead of the text - helps clients
> > > > substitute their own "glossary" for the default one, I suppose. In
> > > > some cases, there may be more than one key in a given field (separated
> > > > by commas). And the objective here is to extract a list of all the
> > > > keys used in a given data table.
>
> > > > I've tried to describe the scenario with a simplified example below:
>
> > > > Given the following data:
>
> > > > -- ---------------------
> > > > -- This is my "glossary"
> > > > create table vt_del_lib (key varchar2(100), text varchar2(100));
>
> > > > insert into vt_del_lib values ('010001', 'One');
> > > > insert into vt_del_lib values ('010002', 'Two');
> > > > insert into vt_del_lib values ('010003', 'Three');
> > > > insert into vt_del_lib values ('010004', 'Four');
> > > > insert into vt_del_lib values ('010005', 'Five');
> > > > insert into vt_del_lib values ('010006', 'Six');
> > > > insert into vt_del_lib values ('010007', 'Seven');
> > > > insert into vt_del_lib values ('010008', 'Eight');
> > > > insert into vt_del_lib values ('010009', 'Nine');
> > > > insert into vt_del_lib values ('010010', 'Ten');
> > > > commit;
>
> > > > -- This is my "data" table
> > > > create table vt_del_keys (val1 varchar2(1000));
>
> > > > insert into vt_del_keys values('010001');
> > > > insert into vt_del_keys values('010004,010005,010008,010010');
> > > > insert into vt_del_keys values('010005,010008');
> > > > commit;
> > > > -- ---------------------
>
> > > > Here's the query I've finally decided to use. I know it works, but not
> > > > how :) I was wondering if someone could explain - how/why does it work
> > > > - which version is better - is there a better way - etc.
>
> > > > It gets pretty slow as the data table grows and I'm having trouble
> > > > trying to improve its performance.
>
> > > > -- ---------------------
> > > > select distinct l.key
> > > > from
> > > >         vt_del_lib l
> > > >         join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > -- OR
>
> > > > select distinct l.key
> > > > from
> > > >         vt_del_keys d
> > > >         join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > -- ---------------------
>
> > > > Sorry for the cryptic subject line - I couldn't classify the problem
> > > > any better :)
>
> > > > TIA!
> > > > Vamsi.
>
> > > > P.S.: Some sample output...
>
> > > > SQL> --Get used keys
> > > > SQL> select l.key
> > > >   2  from
> > > >   3     vt_del_lib l
> > > >   4     join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > KEY
> > > > --------------------------------------------------------------------------------
> > > > 010001
> > > > 010004
> > > > 010005
> > > > 010008
> > > > 010010
> > > > 010005
> > > > 010008
>
> > > > 7 rows selected
>
> > > > Executed in 0.375 seconds
> > > > SQL> select l.key
> > > >   2  from
> > > >   3     vt_del_keys d
> > > >   4     join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > KEY
> > > > --------------------------------------------------------------------------------
> > > > 010001
> > > > 010004
> > > > 010005
> > > > 010005
> > > > 010008
> > > > 010008
> > > > 010010
>
> > > > 7 rows selected
>
> > > > Executed in 0.375 seconds
> > > > SQL> --Get distinct used keys
> > > > SQL> select distinct l.key
> > > >   2  from
> > > >   3     vt_del_lib l
> > > >   4     join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > KEY
> > > > --------------------------------------------------------------------------------
> > > > 010001
> > > > 010004
> > > > 010005
> > > > 010008
> > > > 010010
>
> > > I do not have a system in front of me that I can test with, but if
> > > you run explain plan on these queries I expect that both tables are
> > > being read via full table scans, since you concatenate a comma
> > > before and after your key columns, rendering any index useless
> > > unless you have built function-based indexes on both sets of keys.
>
> > > Because the run times are exactly equal, I suspect Oracle is using
> > > the same plan for both orderings of the table names in the query.
>
> > > Is the goal to find the key value or to get the descriptions for the
> > > keys?
>
> > > -- Mark D Powell --
>
> > Thanks Mark D Powell :)
>
> > The goal is to extract the key values.
>
> > Yes, the explain plan shows that both tables undergo full table scan.
> > I didn't think of creating an index on the data table because I
> > thought the LIKE clause doesn't allow for an index to be used. Can you
> > suggest what the function-based index should be like on the data
> > table? - or the glossary table, for that matter :)
>
> > Thanks.
> > Vamsi
>
> Thanks for posting the DDL and DML to create and populate the tables.
> Try a bit of experimentation working with some of Mark's suggestions.
> In addition to the rows that you inserted into the tables, I executed
> the following to add an additional 900,000 rows to one of the
> tables:
> INSERT INTO
>   VT_DEL_LIB
> SELECT
>   TO_CHAR(ROWNUM+10010,'000000'),
>   'N'||TO_CHAR(ROWNUM+10010,'000000')
> FROM
>   DUAL
> CONNECT BY
>   LEVEL<=900000;
>
> COMMIT;
>
> Now, let's gather stats on the tables and any indexes:
> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_LIB',CASCADE=>TRUE);
> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_KEYS',CASCADE=>TRUE);
>
> select distinct l.key
> from
>         vt_del_lib l
>         join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> The DBMS Xplan:
> ------------------------------------------------------------------------------------------------------------------------
> | Id  | Operation           | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
> ------------------------------------------------------------------------------------------------------------------------
> |   1 |  HASH UNIQUE        |             |      1 |      2 |      5 |00:00:01.35 |    8217 |   968K|   968K|  740K (0)|
> |   2 |   NESTED LOOPS      |             |      1 |      2 |      7 |00:00:01.35 |    8217 |       |       |          |
> |   3 |    TABLE ACCESS FULL| VT_DEL_KEYS |      1 |      3 |      3 |00:00:00.10 |       3 |       |       |          |
> |*  4 |    TABLE ACCESS FULL| VT_DEL_LIB  |      3 |      1 |      7 |00:00:01.25 |    8214 |       |       |          |
> ------------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>    4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')
>
> The above completed in 1.35 seconds.
>
> Now, an experimentation by creating a unique index on the VT_DEL_LIB
> table, modifying the KEY column so that it cannot be NULL, and
> gathering table and index stats:
> CREATE UNIQUE INDEX IND_VT_DEL_LIB ON VT_DEL_LIB (KEY);
> ALTER TABLE VT_DEL_LIB MODIFY(KEY NOT NULL);
> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_LIB',CASCADE=>TRUE);
>
> select distinct l.key
> from
>         vt_del_lib l
>         join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> ---------------------------------------------------------------------------------------------------------------------------------
> | Id  | Operation              | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
> ---------------------------------------------------------------------------------------------------------------------------------
> |   1 |  HASH UNIQUE           |                |      1 |    135K|      5 |00:00:01.40 |    6816 |      3 |   968K|   968K|  742K (0)|
> |   2 |   NESTED LOOPS         |                |      1 |    135K|      7 |00:00:01.40 |    6816 |      3 |       |       |          |
> |   3 |    TABLE ACCESS FULL   | VT_DEL_KEYS    |      1 |      3 |      3 |00:00:00.01 |       3 |      0 |       |       |          |
> |*  4 |    INDEX FAST FULL SCAN| IND_VT_DEL_LIB |      3 |  45001 |      7 |00:00:01.40 |    6813 |      3 |       |       |          |
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>    4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')
>
> The query is now using the index that was just created, but the
> execution time increased by 0.05 seconds.
>
> SELECT DISTINCT
>   L.KEY
> FROM
>   VT_DEL_LIB L,
>   VT_DEL_KEYS D
> WHERE
>   D.VAL1 LIKE '%'||L.KEY||'%';
>
> ------------------------------------------------------------------------------------------------------------------------
> | Id  | Operation              | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
> ------------------------------------------------------------------------------------------------------------------------
> |   1 |  HASH UNIQUE           |                |      1 |    135K|      5 |00:00:00.81 |    6816 |   968K|   968K|  713K (0)|
> |   2 |   NESTED LOOPS         |                |      1 |    135K|      7 |00:00:00.81 |    6816 |       |       |          |
> |   3 |    TABLE ACCESS FULL   | VT_DEL_KEYS    |      1 |      3 |      3 |00:00:00.01 |       3 |       |       |          |
> |*  4 |    INDEX FAST FULL SCAN| IND_VT_DEL_LIB |      3 |  45001 |      7 |00:00:00.80 |    6813 |       |       |          |
> ------------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>    4 - filter("D"."VAL1" LIKE '%'||"L"."KEY"||'%')
>
> The above completed in 0.81 seconds, which is a nice improvement, but
> still not great.
>
> Now, let's experiment with regular expressions (an adaptation of a
> previous solution by Maxim Demenko for splitting a sentence into
> words):
> SELECT DISTINCT
>   REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY
> FROM
>   VT_DEL_KEYS
> CONNECT BY
>   REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL;
>
> KEY
> ------
> 010005
> 010008
> 010010
> 010004
> 010001
>
> If we can take the above results and drive back into the
> VT_DEL_LIB table using the index that we previously created, we obtain
> an efficient plan:
> SELECT DISTINCT
>   L.KEY
> FROM
>   (SELECT DISTINCT
>     REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY
>   FROM
>     VT_DEL_KEYS
>   CONNECT BY
>     REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL) D,
>   VT_DEL_LIB L
> WHERE
>   L.KEY=D.KEY;
>
> ----------------------------------------------------------------------------------------------------------------------------------
> | Id  | Operation                        | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
> ----------------------------------------------------------------------------------------------------------------------------------
> |   1 |  HASH UNIQUE                     |                |      1 |      3 |      5 |00:00:00.01 |      15 |   968K|   968K|  717K (0)|
> |   2 |   NESTED LOOPS                   |                |      1 |      3 |      5 |00:00:00.01 |      15 |       |       |          |
> |   3 |    VIEW                          |                |      1 |      3 |      5 |00:00:00.01 |       3 |       |       |          |
> |   4 |     HASH UNIQUE                  |                |      1 |      3 |      5 |00:00:00.01 |       3 |   936K|   936K|  738K (0)|
> |   5 |      CONNECT BY WITHOUT FILTERING|                |      1 |        |     21 |00:00:00.01 |       3 |       |       |          |
> |   6 |       TABLE ACCESS FULL          | VT_DEL_KEYS    |      1 |      3 |      3 |00:00:00.01 |       3 |       |       |          |
> |*  7 |    INDEX UNIQUE SCAN             | IND_VT_DEL_LIB |      5 |      1 |      5 |00:00:00.01 |      12 |       |       |          |
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>    7 - access("L"."KEY"="D"."KEY")
>
> The above completed in 0.01 seconds.  As the data size increases, it
> may be necessary to add an /*+ ORDERED */ hint to make certain that
> the plan does not change significantly.  Experiment to see what
> happens as the data size increases.
>
> Charles Hooper
> IT Manager/Oracle DBA
> K&M Machine-Fabricating, Inc.

Thank you *very* much for that amazing analysis, Charles Hooper!

I get the impression that regex support is not available in Oracle 9i - is it? We're still on 9i. But I'm going to adapt the basic idea and try it out using a "lookup" function I wrote a while back.

Something like this:

SELECT DISTINCT
  bpr_app.bpr_tools.getTokenN(VAL1, LEVEL, ',')
FROM
  VT_DEL_KEYS
CONNECT BY
  bpr_app.bpr_tools.getTokenN(VAL1, LEVEL, ',') IS NOT NULL;
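For context, the function is essentially an INSTR/SUBSTR tokenizer - something along these lines (a simplified sketch, not the exact production code; the name and parameters here are just illustrative):

```sql
-- Simplified sketch of an INSTR/SUBSTR tokenizer: returns the n-th
-- token of p_str, using p_sep as the delimiter, or NULL when there
-- is no such token.  Works on 9i, which lacks the REGEXP functions.
CREATE OR REPLACE FUNCTION getTokenN (
  p_str IN VARCHAR2,
  p_n   IN NUMBER,
  p_sep IN VARCHAR2 DEFAULT ','
) RETURN VARCHAR2 DETERMINISTIC IS
  -- Wrap the string in separators so every token is delimited on both sides
  l_str   VARCHAR2(4000) := p_sep || p_str || p_sep;
  l_start NUMBER;
  l_end   NUMBER;
BEGIN
  l_start := INSTR(l_str, p_sep, 1, p_n);      -- separator before token n
  l_end   := INSTR(l_str, p_sep, 1, p_n + 1);  -- separator after token n
  IF l_start = 0 OR l_end = 0 THEN
    RETURN NULL;  -- fewer than n tokens in the string
  END IF;
  RETURN SUBSTR(l_str,
                l_start + LENGTH(p_sep),
                l_end - l_start - LENGTH(p_sep));
END getTokenN;
/
```

For example, with VAL1 = '010004,010005,010008,010010', calls with p_n = 1..4 return the four keys and p_n = 5 returns NULL, which is what terminates the CONNECT BY above.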

Since the objective is to extract only the keys, I don't need to join with the LIB table any more - the *real* data goes through validations that ensure that all keys used are actually valid!! :-D Removing the DISTINCT gives me strange results, but I'll try to figure that out. (I always thought there should be support for functionality that is the reverse of GROUP BY - thanks so much for introducing it.)

I'll post here when I have a working design (if I can apply the outline above to my real queries) or if I run into problems.

Thanks!
Vamsi.