Re: Help: Join a "glossary" table and a "data" table referring to terms in the glossary
Date: Wed, 6 Feb 2008 20:21:05 -0800 (PST)
Message-ID: <074d8fb3-ba21-4087-b9df-d0955a2a6c4b@v4g2000hsf.googlegroups.com>
On Feb 5, 10:11 pm, Charles Hooper <hooperc2..._at_yahoo.com> wrote:
> On Feb 5, 8:13 pm, Vamsidhar <tvamsid..._at_gmail.com> wrote:
>
> > On Feb 5, 7:15 pm, Mark D Powell <Mark.Pow..._at_eds.com> wrote:
> > > On Feb 5, 6:24 pm, Vamsidhar <tvamsid..._at_gmail.com> wrote:
>
> > > > I have a "glossary" table that holds a unique identifier (key) for
> > > > each "text" string used elsewhere in the app.
> > > > The fields in the data tables that should hold the text are then
> > > > populated with the "key" instead of the text - helps clients
> > > > substitute their own "glossary" for the default one, I suppose. In
> > > > some cases, there may be more than one key in a given field (separated
> > > > by commas). And the objective here is to extract a list of all the
> > > > keys used in a given data table.
>
> > > > I've tried to describe the scenario with a simplified example below:
>
> > > > Given the following data:
>
> > > > -- ---------------------
> > > > -- This is my "glossary"
> > > > create table vt_del_lib (key varchar2(100), text varchar2(100));
>
> > > > insert into vt_del_lib values ('010001', 'One');
> > > > insert into vt_del_lib values ('010002', 'Two');
> > > > insert into vt_del_lib values ('010003', 'Three');
> > > > insert into vt_del_lib values ('010004', 'Four');
> > > > insert into vt_del_lib values ('010005', 'Five');
> > > > insert into vt_del_lib values ('010006', 'Six');
> > > > insert into vt_del_lib values ('010007', 'Seven');
> > > > insert into vt_del_lib values ('010008', 'Eight');
> > > > insert into vt_del_lib values ('010009', 'Nine');
> > > > insert into vt_del_lib values ('010010', 'Ten');
> > > > commit;
>
> > > > -- This is my "data" table
> > > > create table vt_del_keys (val1 varchar2(1000));
>
> > > > insert into vt_del_keys values('010001');
> > > > insert into vt_del_keys values('010004,010005,010008,010010');
> > > > insert into vt_del_keys values('010005,010008');
> > > > commit;
> > > > -- ---------------------
>
> > > > Here's the query I've finally decided to use. I know it works, but not
> > > > how :) I was wondering if someone could explain - how/why does it work
> > > > - which version is better - is there a better way - etc.
>
> > > > It gets pretty slow as the data table grows and I'm having trouble
> > > > trying to improve its performance.
>
> > > > -- ---------------------
> > > > select distinct l.key
> > > > from
> > > > vt_del_lib l
> > > > join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > -- OR
>
> > > > select distinct l.key
> > > > from
> > > > vt_del_keys d
> > > > join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > -- ---------------------
>
> > > > Sorry for the cryptic subject line - I couldn't classify the problem
> > > > any better :)
>
> > > > TIA!
> > > > Vamsi.
>
> > > > P.S.: Some sample output...
>
> > > > SQL> --Get used keys
> > > > SQL> select l.key
> > > > 2 from
> > > > 3 vt_del_lib l
> > > > 4 join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > KEY
> > > > --------------------------------------------------------------------------------
> > > > 010001
> > > > 010004
> > > > 010005
> > > > 010008
> > > > 010010
> > > > 010005
> > > > 010008
>
> > > > 7 rows selected
>
> > > > Executed in 0.375 seconds
> > > > SQL> select l.key
> > > > 2 from
> > > > 3 vt_del_keys d
> > > > 4 join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > KEY
> > > > --------------------------------------------------------------------------------
> > > > 010001
> > > > 010004
> > > > 010005
> > > > 010005
> > > > 010008
> > > > 010008
> > > > 010010
>
> > > > 7 rows selected
>
> > > > Executed in 0.375 seconds
> > > > SQL> --Get distinct used kyes
> > > > SQL> select distinct l.key
> > > > 2 from
> > > > 3 vt_del_lib l
> > > > 4 join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > > KEY
> > > > --------------------------------------------------------------------------------
> > > > 010001
> > > > 010004
> > > > 010005
> > > > 010008
> > > > 010010
>
> > > I do not have a system in front of me that I can test with but if you
> > > run explain plan on these queries I expect that both tables are being
> > > read via full table scans since you concatenate a comma before and
> > > after you key columns rendering any index useless unless you have
> > > built function based indexes on both set of keys.
>
> > > Because the run time is exactly equal I suspect Oracle is using the
> > > same plan for both orderings of the table name in the query.
>
> > > Is the goal to find the key value or to get the descriptions for the
> > > keys?
>
> > > -- Mark D Powell --- Hide quoted text -
>
> > > - Show quoted text -
>
> > Thanks Mark D Powell :)
>
> > The goal is to extract the keys values.
>
> > Yes, the explain plan shows that both tables undergo full table scan.
> > I didn't think of creating an index on the data table because I
> > thought the LIKE clause doesn't allow for an index to be used. Can you
> > suggest what the function-based index should be like on the data
> > table? - or the glossary table, for that matter :)
>
> > Thanks.
> > Vamsi
>
> Thanks for posting the DDL and DML to create and populate the tables.
> Try a bit of experimentation working with some of Mark's suggestions.
> In addition to the rows that you inserted into the tables, I excuted
> the following to add an additional 900,000 rows into one of the
> tables:
> NSERT INTO
> VT_DEL_LIB
> SELECT
> TO_CHAR(ROWNUM+10010,'000000'),
> 'N'||TO_CHAR(ROWNUM+10010,'000000')
> FROM
> DUAL
> CONNECT BY
> LEVEL<=900000;
>
> COMMIT;
>
> Now, let's gather stats on the tables and any indexes:
> EXEC
> DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_LIB',CASCADE=>TRUE);
> EXEC
> DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_KEYS',CASCADE=>TRUE);
>
> select distinct l.key
> from
> vt_del_lib l
> join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',
> %';
>
> The DBMS Xplan:
> ------------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows | A-Rows
> | A-Time | Buffers | OMem | 1Mem | Used-Mem |
> ------------------------------------------------------------------------------------------------------------------------
> | 1 | HASH UNIQUE | | 1 | 2 | 5 |
> 00:00:01.35 | 8217 | 968K| 968K| 740K (0)|
> | 2 | NESTED LOOPS | | 1 | 2 | 7 |
> 00:00:01.35 | 8217 | | | |
> | 3 | TABLE ACCESS FULL| VT_DEL_KEYS | 1 | 3 | 3 |
> 00:00:00.10 | 3 | | | |
> |* 4 | TABLE ACCESS FULL| VT_DEL_LIB | 3 | 1 | 7 |
> 00:00:01.25 | 8214 | | | |
> ------------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
> 4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')
>
> The above completed in 1.35 seconds.
>
> Now, an experimentation by creating a unique index on the VT_DEL_LIB
> table, modifying the KEY column so that it cannot be NULL, and
> gathering table and index stats:
> CREATE UNIQUE INDEX IND_VT_DEL_LIB ON VT_DEL_LIB (KEY);
> ALTER TABLE VT_DEL_LIB MODIFY(KEY NOT NULL);
> EXEC
> DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_LIB',CASCADE=>TRUE);
>
> select distinct l.key
> from
> vt_del_lib l
> join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',
> %';
>
> ---------------------------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows | A-
> Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
> ---------------------------------------------------------------------------------------------------------------------------------------
> | 1 | HASH UNIQUE | | 1 |
> 135K| 5 |00:00:01.40 | 6816 | 3 | 968K| 968K| 742K
> (0)|
> | 2 | NESTED LOOPS | | 1 |
> 135K| 7 |00:00:01.40 | 6816 | 3 | |
> | |
> | 3 | TABLE ACCESS FULL | VT_DEL_KEYS | 1 | 3
> | 3 |00:00:00.01 | 3 | 0 | | | |
> |* 4 | INDEX FAST FULL SCAN| IND_VT_DEL_LIB | 3 | 45001
> | 7 |00:00:01.40 | 6813 | 3 | | | |
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
> 4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')
>
> The query is now using the index that was just created, but the
> execution time increased by 0.05 seconds.
>
> SELECT DISTINCT
> L.KEY
> FROM
> VT_DEL_LIB L,
> VT_DEL_KEYS D
> WHERE
> D.VAL1 LIKE '%'||L.KEY||'%';
>
> ------------------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows | A-
> Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
> ------------------------------------------------------------------------------------------------------------------------------
> | 1 | HASH UNIQUE | | 1 |
> 135K| 5 |00:00:00.81 | 6816 | 968K| 968K| 713K (0)|
> | 2 | NESTED LOOPS | | 1 |
> 135K| 7 |00:00:00.81 | 6816 | | | |
> | 3 | TABLE ACCESS FULL | VT_DEL_KEYS | 1 | 3
> | 3 |00:00:00.01 | 3 | | | |
> |* 4 | INDEX FAST FULL SCAN| IND_VT_DEL_LIB | 3 | 45001
> | 7 |00:00:00.80 | 6813 | | | |
> ------------------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
> 4 - filter("D"."VAL1" LIKE '%'||"L"."KEY"||'%')
>
> The above completed in 0.81 seconds, which is a nice improvement, but
> still not great.
>
> Now, let's experiment with regular expressions (an adaptation of a
> previous solution by Maxim Demenko for splitting a sentence into
> words):
> SELECT DISTINCT
> REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY
> FROM
> VT_DEL_KEYS
> CONNECT BY
> REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL;
>
> KEY
> ------
> 010005
> 010008
> 010010
> 010004
> 010001
>
> If we can take the above results and drive back into into the
> VT_DEL_LIB table using the index that we previously created, we obtain
> an efficient plan:
> SELECT DISTINCT
> L.KEY
> FROM
> (SELECT DISTINCT
> REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY
> FROM
> VT_DEL_KEYS
> CONNECT BY
> REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL) D,
> VT_DEL_LIB L
> WHERE
> L.KEY=D.KEY;
>
> ----------------------------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-
> Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
> ----------------------------------------------------------------------------------------------------------------------------------------
> | 1 | HASH UNIQUE | | 1
> | 3 | 5 |00:00:00.01 | 15 | 968K| 968K| 717K (0)|
> | 2 | NESTED LOOPS | | 1
> | 3 | 5 |00:00:00.01 | 15 | | | |
> | 3 | VIEW | | 1
> | 3 | 5 |00:00:00.01 | 3 | | | |
> | 4 | HASH UNIQUE | | 1
> | 3 | 5 |00:00:00.01 | 3 | 936K| 936K| 738K (0)|
> | 5 | CONNECT BY WITHOUT FILTERING| | 1
> | | 21 |00:00:00.01 | 3 | | | |
> | 6 | TABLE ACCESS FULL | VT_DEL_KEYS | 1
> | 3 | 3 |00:00:00.01 | 3 | | | |
> |* 7 | INDEX UNIQUE SCAN | IND_VT_DEL_LIB | 5
> | 1 | 5 |00:00:00.01 | 12 | | | |
> ----------------------------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
> 7 - access("L"."KEY"="D"."KEY")
>
> The above completed in 0.01 seconds. As the data size increases, it
> may be necessary to add an /*+ ORDERED */ hint to make certain that
> the plan does not change significantly. Experiment to see what
> happens as the data size increases.
>
> Charles Hooper
> IT Manager/Oracle DBA
> K&M Machine-Fabricating, Inc.
Thank you *very* much for that amazing analysis, Charles Hooper!
I get the impression that the regex. support is not available in Oracle 9i - is it? We're still on 9i. But, I'm going to adapt the basic idea and try it out using a "lookup" function I wrote a while back.
Something like this:
SELECT DISTINCT
bpr_app.bpr_tools.getTokenN(VAL1, LEVEL, ',')
FROM
VT_DEL_KEYS
CONNECT BY
bpr_app.bpr_tools.getTokenN(VAL1, LEVEL, ',') IS NOT NULL;
Since the objective is to extract only the keys, I don't need to join with the LIB table anymore - the *real* data goes thro' validations that ensure that all keys used are actually valid!! :-D Removing the DISTINCT gives me strange results, but, I'll try to figure that out. (I always thought there should be support for functionality that is the reverse of GROUP BY - thanks so much for introducing it)
I'll post here when I've a working design (if I can apply the outline above to my real queries) or if I run into problems.
Thanks!
Vamsi.
Received on Wed Feb 06 2008 - 22:21:05 CST