Re: Help: Join a "glossary" table and a "data" table referring to terms in the glossary

From: Vamsidhar <tvamsidhar_at_gmail.com>
Date: Wed, 6 Feb 2008 20:21:05 -0800 (PST)
Message-ID: <074d8fb3-ba21-4087-b9df-d0955a2a6c4b@v4g2000hsf.googlegroups.com>


On Feb 5, 10:11 pm, Charles Hooper <hooperc2..._at_yahoo.com> wrote:
> On Feb 5, 8:13 pm, Vamsidhar <tvamsid..._at_gmail.com> wrote:
>
> > On Feb 5, 7:15 pm, Mark D Powell <Mark.Pow..._at_eds.com> wrote:
> > > On Feb 5, 6:24 pm, Vamsidhar <tvamsid..._at_gmail.com> wrote:
>
> > > > I have a "glossary" table that holds a unique identifier (key) for
> > > > each "text" string used elsewhere in the app.
> > > > The fields in the data tables that should hold the text are then
> > > > populated with the "key" instead of the text - helps clients
> > > > substitute their own "glossary" for the default one, I suppose. In
> > > > some cases, there may be more than one key in a given field (separated
> > > > by commas). And the objective here is to extract a list of all the
> > > > keys used in a given data table.
>
> > > > I've tried to describe the scenario with a simplified example below:
>
> > > > Given the following data:
>
> > > > -- ---------------------
> > > > -- This is my "glossary"
> > > > create table vt_del_lib (key varchar2(100), text varchar2(100));
>
> > > > insert into vt_del_lib values ('010001', 'One');
> > > > insert into vt_del_lib values ('010002', 'Two');
> > > > insert into vt_del_lib values ('010003', 'Three');
> > > > insert into vt_del_lib values ('010004', 'Four');
> > > > insert into vt_del_lib values ('010005', 'Five');
> > > > insert into vt_del_lib values ('010006', 'Six');
> > > > insert into vt_del_lib values ('010007', 'Seven');
> > > > insert into vt_del_lib values ('010008', 'Eight');
> > > > insert into vt_del_lib values ('010009', 'Nine');
> > > > insert into vt_del_lib values ('010010', 'Ten');
> > > > commit;
>
> > > > -- This is my "data" table
> > > > create table vt_del_keys (val1 varchar2(1000));
>
> > > > insert into vt_del_keys values('010001');
> > > > insert into vt_del_keys values('010004,010005,010008,010010');
> > > > insert into vt_del_keys values('010005,010008');
> > > > commit;
> > > > -- ---------------------
>
> > > > Here's the query I've finally decided to use. I know it works, but not
> > > > how :) I was wondering if someone could explain - how/why does it work
> > > > - which version is better - is there a better way - etc.
>
> > > > It gets pretty slow as the data table grows and I'm having trouble
> > > > trying to improve its performance.
>
> > > > -- ---------------------
> > > > select distinct l.key
> > > > from
> > > >         vt_del_lib l
> > > >         join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > -- OR
>
> > > > select distinct l.key
> > > > from
> > > >         vt_del_keys d
> > > >         join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > -- ---------------------
>
> > > > Sorry for the cryptic subject line - I couldn't classify the problem
> > > > any better :)
>
> > > > TIA!
> > > > Vamsi.
>
> > > > P.S.: Some sample output...
>
> > > > SQL> --Get used keys
> > > > SQL> select l.key
> > > >   2  from
> > > >   3     vt_del_lib l
> > > >   4     join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > KEY
> > > > --------------------------------------------------------------------------------
> > > > 010001
> > > > 010004
> > > > 010005
> > > > 010008
> > > > 010010
> > > > 010005
> > > > 010008
>
> > > > 7 rows selected
>
> > > > Executed in 0.375 seconds
> > > > SQL> select l.key
> > > >   2  from
> > > >   3     vt_del_keys d
> > > >   4     join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > KEY
> > > > --------------------------------------------------------------------------------
> > > > 010001
> > > > 010004
> > > > 010005
> > > > 010005
> > > > 010008
> > > > 010008
> > > > 010010
>
> > > > 7 rows selected
>
> > > > Executed in 0.375 seconds
> > > > SQL> --Get distinct used keys
> > > > SQL> select distinct l.key
> > > >   2  from
> > > >   3     vt_del_lib l
> > > >   4     join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > KEY
> > > > --------------------------------------------------------------------------------
> > > > 010001
> > > > 010004
> > > > 010005
> > > > 010008
> > > > 010010
>
> > > I do not have a system in front of me that I can test with, but if
> > > you run explain plan on these queries I expect that both tables are
> > > being read via full table scans, since you concatenate a comma
> > > before and after your key columns, rendering any index useless
> > > unless you have built function-based indexes on both sets of keys.
>
> > > Because the run times are exactly equal, I suspect Oracle is using
> > > the same plan for both orderings of the table names in the query.
>
> > > Is the goal to find the key value or to get the descriptions for the
> > > keys?
>
> > > -- Mark D Powell --
>
> > Thanks Mark D Powell :)
>
> > The goal is to extract the key values.
>
> > Yes, the explain plan shows that both tables undergo full table scan.
> > I didn't think of creating an index on the data table because I
> > thought the LIKE clause doesn't allow for an index to be used. Can you
> > suggest what the function-based index should be like on the data
> > table? - or the glossary table, for that matter :)
>
> > Thanks.
> > Vamsi
>
> Thanks for posting the DDL and DML to create and populate the tables.
> Try a bit of experimentation working with some of Mark's suggestions.
> In addition to the rows that you inserted into the tables, I executed
> the following to add an additional 900,000 rows to one of the
> tables:
> INSERT INTO
>   VT_DEL_LIB
> SELECT
>   TO_CHAR(ROWNUM+10010,'000000'),
>   'N'||TO_CHAR(ROWNUM+10010,'000000')
> FROM
>   DUAL
> CONNECT BY
>   LEVEL<=900000;
>
> COMMIT;
>
> Now, let's gather stats on the tables and any indexes:
> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_LIB',CASCADE=>TRUE);
> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_KEYS',CASCADE=>TRUE);
>
> select distinct l.key
> from
>         vt_del_lib l
>         join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> The DBMS Xplan:
> ------------------------------------------------------------------------------------------------------------------------
> | Id  | Operation           | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
> ------------------------------------------------------------------------------------------------------------------------
> |   1 |  HASH UNIQUE        |             |      1 |      2 |      5 |00:00:01.35 |    8217 |   968K|   968K|  740K (0)|
> |   2 |   NESTED LOOPS      |             |      1 |      2 |      7 |00:00:01.35 |    8217 |       |       |          |
> |   3 |    TABLE ACCESS FULL| VT_DEL_KEYS |      1 |      3 |      3 |00:00:00.10 |       3 |       |       |          |
> |*  4 |    TABLE ACCESS FULL| VT_DEL_LIB  |      3 |      1 |      7 |00:00:01.25 |    8214 |       |       |          |
> ------------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>    4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')
>
> The above completed in 1.35 seconds.
>
> Now, an experimentation by creating a unique index on the VT_DEL_LIB
> table, modifying the KEY column so that it cannot be NULL, and
> gathering table and index stats:
> CREATE UNIQUE INDEX IND_VT_DEL_LIB ON VT_DEL_LIB (KEY);
> ALTER TABLE VT_DEL_LIB MODIFY(KEY NOT NULL);
> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_LIB',CASCADE=>TRUE);
>
> select distinct l.key
> from
>         vt_del_lib l
>         join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> ---------------------------------------------------------------------------------------------------------------------------------
> | Id  | Operation              | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
> ---------------------------------------------------------------------------------------------------------------------------------
> |   1 |  HASH UNIQUE           |                |      1 |    135K|      5 |00:00:01.40 |    6816 |      3 |   968K|   968K|  742K (0)|
> |   2 |   NESTED LOOPS         |                |      1 |    135K|      7 |00:00:01.40 |    6816 |      3 |       |       |          |
> |   3 |    TABLE ACCESS FULL   | VT_DEL_KEYS    |      1 |      3 |      3 |00:00:00.01 |       3 |      0 |       |       |          |
> |*  4 |    INDEX FAST FULL SCAN| IND_VT_DEL_LIB |      3 |  45001 |      7 |00:00:01.40 |    6813 |      3 |       |       |          |
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>    4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')
>
> The query is now using the index that was just created, but the
> execution time increased by 0.05 seconds.
>
> SELECT DISTINCT
>   L.KEY
> FROM
>   VT_DEL_LIB L,
>   VT_DEL_KEYS D
> WHERE
>   D.VAL1 LIKE '%'||L.KEY||'%';
>
> ------------------------------------------------------------------------------------------------------------------------
> | Id  | Operation              | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
> ------------------------------------------------------------------------------------------------------------------------
> |   1 |  HASH UNIQUE           |                |      1 |    135K|      5 |00:00:00.81 |    6816 |   968K|   968K|  713K (0)|
> |   2 |   NESTED LOOPS         |                |      1 |    135K|      7 |00:00:00.81 |    6816 |       |       |          |
> |   3 |    TABLE ACCESS FULL   | VT_DEL_KEYS    |      1 |      3 |      3 |00:00:00.01 |       3 |       |       |          |
> |*  4 |    INDEX FAST FULL SCAN| IND_VT_DEL_LIB |      3 |  45001 |      7 |00:00:00.80 |    6813 |       |       |          |
> ------------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>    4 - filter("D"."VAL1" LIKE '%'||"L"."KEY"||'%')
>
> The above completed in 0.81 seconds, which is a nice improvement, but
> still not great.
>
> Now, let's experiment with regular expressions (an adaptation of a
> previous solution by Maxim Demenko for splitting a sentence into
> words):
> SELECT DISTINCT
>   REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY
> FROM
>   VT_DEL_KEYS
> CONNECT BY
>   REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL;
>
> KEY
> ------
> 010005
> 010008
> 010010
> 010004
> 010001
>
> If we can take the above results and drive back into the
> VT_DEL_LIB table using the index that we previously created, we obtain
> an efficient plan:
> SELECT DISTINCT
>   L.KEY
> FROM
>   (SELECT DISTINCT
>     REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY
>   FROM
>     VT_DEL_KEYS
>   CONNECT BY
>     REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL) D,
>   VT_DEL_LIB L
> WHERE
>   L.KEY=D.KEY;
>
> ----------------------------------------------------------------------------------------------------------------------------------
> | Id  | Operation                        | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
> ----------------------------------------------------------------------------------------------------------------------------------
> |   1 |  HASH UNIQUE                     |                |      1 |      3 |      5 |00:00:00.01 |      15 |   968K|   968K|  717K (0)|
> |   2 |   NESTED LOOPS                   |                |      1 |      3 |      5 |00:00:00.01 |      15 |       |       |          |
> |   3 |    VIEW                          |                |      1 |      3 |      5 |00:00:00.01 |       3 |       |       |          |
> |   4 |     HASH UNIQUE                  |                |      1 |      3 |      5 |00:00:00.01 |       3 |   936K|   936K|  738K (0)|
> |   5 |      CONNECT BY WITHOUT FILTERING|                |      1 |        |     21 |00:00:00.01 |       3 |       |       |          |
> |   6 |       TABLE ACCESS FULL          | VT_DEL_KEYS    |      1 |      3 |      3 |00:00:00.01 |       3 |       |       |          |
> |*  7 |    INDEX UNIQUE SCAN             | IND_VT_DEL_LIB |      5 |      1 |      5 |00:00:00.01 |      12 |       |       |          |
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>    7 - access("L"."KEY"="D"."KEY")
>
> The above completed in 0.01 seconds.  As the data size increases, it
> may be necessary to add an /*+ ORDERED */ hint to make certain that
> the plan does not change significantly.  Experiment to see what
> happens as the data size increases.
>
> Charles Hooper
> IT Manager/Oracle DBA
> K&M Machine-Fabricating, Inc.

Thank you *very* much for that amazing analysis, Charles Hooper!

I get the impression that regex support is not available in Oracle 9i - is it? We're still on 9i. But I'm going to adapt the basic idea and try it out using a "lookup" function I wrote a while back.

Something like this:

SELECT DISTINCT
  bpr_app.bpr_tools.getTokenN(VAL1, LEVEL, ',')
FROM
  VT_DEL_KEYS
CONNECT BY
  bpr_app.bpr_tools.getTokenN(VAL1, LEVEL, ',') IS NOT NULL;
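For context, the function is essentially an INSTR/SUBSTR tokenizer - something along these lines (a simplified sketch, not the exact production code; the name and parameters here are just illustrative):

```sql
-- Simplified sketch of an INSTR/SUBSTR tokenizer: returns the n-th
-- token of p_str, using p_sep as the delimiter, or NULL when there
-- is no such token.  Works on 9i, which lacks the REGEXP functions.
CREATE OR REPLACE FUNCTION getTokenN (
  p_str IN VARCHAR2,
  p_n   IN NUMBER,
  p_sep IN VARCHAR2 DEFAULT ','
) RETURN VARCHAR2 DETERMINISTIC IS
  -- Wrap the string in separators so every token is delimited on both sides
  l_str   VARCHAR2(4000) := p_sep || p_str || p_sep;
  l_start NUMBER;
  l_end   NUMBER;
BEGIN
  l_start := INSTR(l_str, p_sep, 1, p_n);      -- separator before token n
  l_end   := INSTR(l_str, p_sep, 1, p_n + 1);  -- separator after token n
  IF l_start = 0 OR l_end = 0 THEN
    RETURN NULL;  -- fewer than n tokens in the string
  END IF;
  RETURN SUBSTR(l_str,
                l_start + LENGTH(p_sep),
                l_end - l_start - LENGTH(p_sep));
END getTokenN;
/
```

For example, with VAL1 = '010004,010005,010008,010010', calls with p_n = 1..4 return the four keys and p_n = 5 returns NULL, which is what terminates the CONNECT BY above.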

Since the objective is to extract only the keys, I don't need to join with the LIB table any more - the *real* data goes through validations that ensure that all keys used are actually valid!! :-D Removing the DISTINCT gives me strange results, but I'll try to figure that out. (I always thought there should be support for functionality that is the reverse of GROUP BY - thanks so much for introducing it.)

I'll post here when I have a working design (if I can apply the outline above to my real queries) or if I run into problems.

Thanks!
Vamsi.