Re: Help: Join a "glossary" table and a "data" table referring to terms in the glossary
Date: Tue, 5 Feb 2008 19:11:49 -0800 (PST)
Message-ID: <d8e2d79c-f354-470f-b751-3554af8e73fa@j20g2000hsi.googlegroups.com>
On Feb 5, 8:13 pm, Vamsidhar <tvamsid..._at_gmail.com> wrote:
> On Feb 5, 7:15 pm, Mark D Powell <Mark.Pow..._at_eds.com> wrote:
> > On Feb 5, 6:24 pm, Vamsidhar <tvamsid..._at_gmail.com> wrote:
>
> > > I have a "glossary" table that holds a unique identifier (key) for
> > > each "text" string used elsewhere in the app.
> > > The fields in the data tables that should hold the text are then
> > > populated with the "key" instead of the text - helps clients
> > > substitute their own "glossary" for the default one, I suppose. In
> > > some cases, there may be more than one key in a given field (separated
> > > by commas). And the objective here is to extract a list of all the
> > > keys used in a given data table.
>
> > > I've tried to describe the scenario with a simplified example below:
>
> > > Given the following data:
>
> > > -- ---------------------
> > > -- This is my "glossary"
> > > create table vt_del_lib (key varchar2(100), text varchar2(100));
>
> > > insert into vt_del_lib values ('010001', 'One');
> > > insert into vt_del_lib values ('010002', 'Two');
> > > insert into vt_del_lib values ('010003', 'Three');
> > > insert into vt_del_lib values ('010004', 'Four');
> > > insert into vt_del_lib values ('010005', 'Five');
> > > insert into vt_del_lib values ('010006', 'Six');
> > > insert into vt_del_lib values ('010007', 'Seven');
> > > insert into vt_del_lib values ('010008', 'Eight');
> > > insert into vt_del_lib values ('010009', 'Nine');
> > > insert into vt_del_lib values ('010010', 'Ten');
> > > commit;
>
> > > -- This is my "data" table
> > > create table vt_del_keys (val1 varchar2(1000));
>
> > > insert into vt_del_keys values('010001');
> > > insert into vt_del_keys values('010004,010005,010008,010010');
> > > insert into vt_del_keys values('010005,010008');
> > > commit;
> > > -- ---------------------
>
> > > Here's the query I've finally decided to use. I know it works, but not
> > > how :) I was wondering if someone could explain - how/why does it work
> > > - which version is better - is there a better way - etc.
>
> > > It gets pretty slow as the data table grows and I'm having trouble
> > > trying to improve its performance.
>
> > > -- ---------------------
> > > select distinct l.key
> > > from
> > > vt_del_lib l
> > > join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > -- OR
>
> > > select distinct l.key
> > > from
> > > vt_del_keys d
> > > join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > -- ---------------------
>
> > > Sorry for the cryptic subject line - I couldn't classify the problem
> > > any better :)
>
> > > TIA!
> > > Vamsi.
>
> > > P.S.: Some sample output...
>
> > > SQL> --Get used keys
> > > SQL> select l.key
> > > 2 from
> > > 3 vt_del_lib l
> > > 4 join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > KEY
> > > --------------------------------------------------------------------------------
> > > 010001
> > > 010004
> > > 010005
> > > 010008
> > > 010010
> > > 010005
> > > 010008
>
> > > 7 rows selected
>
> > > Executed in 0.375 seconds
> > > SQL> select l.key
> > > 2 from
> > > 3 vt_del_keys d
> > > 4 join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > KEY
> > > --------------------------------------------------------------------------------
> > > 010001
> > > 010004
> > > 010005
> > > 010005
> > > 010008
> > > 010008
> > > 010010
>
> > > 7 rows selected
>
> > > Executed in 0.375 seconds
> > > SQL> --Get distinct used kyes
> > > SQL> select distinct l.key
> > > 2 from
> > > 3 vt_del_lib l
> > > 4 join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> > > KEY
> > > --------------------------------------------------------------------------------
> > > 010001
> > > 010004
> > > 010005
> > > 010008
> > > 010010
>
> > I do not have a system in front of me that I can test with but if you
> > run explain plan on these queries I expect that both tables are being
> > read via full table scans since you concatenate a comma before and
> > after you key columns rendering any index useless unless you have
> > built function based indexes on both set of keys.
>
> > Because the run time is exactly equal I suspect Oracle is using the
> > same plan for both orderings of the table name in the query.
>
> > Is the goal to find the key value or to get the descriptions for the
> > keys?
>
> > -- Mark D Powell --- Hide quoted text -
>
> > - Show quoted text -
>
> Thanks Mark D Powell :)
>
> The goal is to extract the keys values.
>
> Yes, the explain plan shows that both tables undergo full table scan.
> I didn't think of creating an index on the data table because I
> thought the LIKE clause doesn't allow for an index to be used. Can you
> suggest what the function-based index should be like on the data
> table? - or the glossary table, for that matter :)
>
> Thanks.
> Vamsi
Thanks for posting the DDL and DML to create and populate the tables.
Try a bit of experimentation working with some of Mark's suggestions.
In addition to the rows that you inserted into the tables, I excuted
the following to add an additional 900,000 rows into one of the
tables:
NSERT INTO
VT_DEL_LIB
SELECT
TO_CHAR(ROWNUM+10010,'000000'),
'N'||TO_CHAR(ROWNUM+10010,'000000')
FROM
DUAL
CONNECT BY
LEVEL<=900000;
COMMIT;
Now, let's gather stats on the tables and any indexes:
EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_LIB',CASCADE=>TRUE);
EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_KEYS',CASCADE=>TRUE);
select distinct l.key
from
vt_del_lib l join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
The DBMS Xplan:
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
| 1 | HASH UNIQUE | | 1 | 2 | 5 | 00:00:01.35 | 8217 | 968K| 968K| 740K (0)|
| 2 | NESTED LOOPS | | 1 | 2 | 7 | 00:00:01.35 | 8217 | | | | | 3 | TABLE ACCESS FULL| VT_DEL_KEYS | 1 | 3 | 3 | 00:00:00.10 | 3 | | | | |* 4 | TABLE ACCESS FULL| VT_DEL_LIB | 3 | 1 | 7 | 00:00:01.25 | 8214 | | | | ------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')
The above completed in 1.35 seconds.
Now, an experimentation by creating a unique index on the VT_DEL_LIB
table, modifying the KEY column so that it cannot be NULL, and
gathering table and index stats:
CREATE UNIQUE INDEX IND_VT_DEL_LIB ON VT_DEL_LIB (KEY);
ALTER TABLE VT_DEL_LIB MODIFY(KEY NOT NULL);
EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'VT_DEL_LIB',CASCADE=>TRUE);
select distinct l.key
from
vt_del_lib l join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
| Id | Operation | Name | Starts | E-Rows | A- Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
| 1 | HASH UNIQUE | | 1 | 135K| 5 |00:00:01.40 | 6816 | 3 | 968K| 968K| 742K (0)| | 2 | NESTED LOOPS | | 1 | 135K| 7 |00:00:01.40 | 6816 | 3 | | | | | 3 | TABLE ACCESS FULL | VT_DEL_KEYS | 1 | 3 | 3 |00:00:00.01 | 3 | 0 | | | | |* 4 | INDEX FAST FULL SCAN| IND_VT_DEL_LIB | 3 | 45001 | 7 |00:00:01.40 | 6813 | 3 | | | | ---------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')
The query is now using the index that was just created, but the execution time increased by 0.05 seconds.
SELECT DISTINCT
L.KEY
FROM
VT_DEL_LIB L,
VT_DEL_KEYS D
WHERE
D.VAL1 LIKE '%'||L.KEY||'%';
| Id | Operation | Name | Starts | E-Rows | A- Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
| 1 | HASH UNIQUE | | 1 | 135K| 5 |00:00:00.81 | 6816 | 968K| 968K| 713K (0)| | 2 | NESTED LOOPS | | 1 | 135K| 7 |00:00:00.81 | 6816 | | | | | 3 | TABLE ACCESS FULL | VT_DEL_KEYS | 1 | 3 | 3 |00:00:00.01 | 3 | | | | |* 4 | INDEX FAST FULL SCAN| IND_VT_DEL_LIB | 3 | 45001 | 7 |00:00:00.80 | 6813 | | | | ------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
4 - filter("D"."VAL1" LIKE '%'||"L"."KEY"||'%')
The above completed in 0.81 seconds, which is a nice improvement, but still not great.
Now, let's experiment with regular expressions (an adaptation of a
previous solution by Maxim Demenko for splitting a sentence into
words):
SELECT DISTINCT
REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY
FROM
VT_DEL_KEYS
CONNECT BY
REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL;
KEY
010005
010008
010010
010004
010001
If we can take the above results and drive back into into the
VT_DEL_LIB table using the index that we previously created, we obtain
an efficient plan:
SELECT DISTINCT
L.KEY
FROM
(SELECT DISTINCT
REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY
FROM
VT_DEL_KEYS
CONNECT BY
REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL) D,
VT_DEL_LIB L
WHERE
L.KEY=D.KEY;
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
| 1 | HASH UNIQUE | | 1 | 3 | 5 |00:00:00.01 | 15 | 968K| 968K| 717K (0)| | 2 | NESTED LOOPS | | 1 | 3 | 5 |00:00:00.01 | 15 | | | | | 3 | VIEW | | 1 | 3 | 5 |00:00:00.01 | 3 | | | | | 4 | HASH UNIQUE | | 1 | 3 | 5 |00:00:00.01 | 3 | 936K| 936K| 738K (0)| | 5 | CONNECT BY WITHOUT FILTERING| | 1 | | 21 |00:00:00.01 | 3 | | | | | 6 | TABLE ACCESS FULL | VT_DEL_KEYS | 1 | 3 | 3 |00:00:00.01 | 3 | | | | |* 7 | INDEX UNIQUE SCAN | IND_VT_DEL_LIB | 5 | 1 | 5 |00:00:00.01 | 12 | | | | ----------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
7 - access("L"."KEY"="D"."KEY")
The above completed in 0.01 seconds. As the data size increases, it may be necessary to add an /*+ ORDERED */ hint to make certain that the plan does not change significantly. Experiment to see what happens as the data size increases.
Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.
Received on Tue Feb 05 2008 - 21:11:49 CST