Re: The 20% rule

From: DA Morgan <damorgan_at_psoug.org>
Date: Sat, 26 Jul 2008 14:32:24 -0700
Message-ID: <1217107939.463942@bubbleator.drizzle.com>


Charles Hooper wrote:

> On Jul 25, 4:37 pm, DA Morgan <damor..._at_psoug.org> wrote:

>> One of the issues I have with the demo I posted, and as I indicated
>> I only posted it to prove the rule was mythology, is that the bytes
>> for an index row greater than for a table row. One can easily reverse
>> that with something like this:
>>
>> CREATE TABLE index_demo (
>> person_id NUMBER(10),
>> gender VARCHAR2(1),
>> state VARCHAR2(2),
>> textcol VARCHAR2(2000));
>>
>> DECLARE
>> g index_demo.gender%TYPE := 'F';
>> BEGIN
>> FOR i IN 1 .. 250000
>> LOOP
>> INSERT INTO index_demo
>> (person_id, gender, state, textcol)
>> VALUES
>> (i, g, 'WA', RPAD('x', 1799, 'x'));
>>
>> IF g = 'F' THEN
>> g := 'M';
>> ELSE
>> g := 'F';
>> END IF;
>> END LOOP;
>> COMMIT;
>>
>> UPDATE index_demo
>> SET state = 'OR'
>> WHERE person_id LIKE '%1';
>>
>> UPDATE index_demo
>> SET state = 'CA'
>> WHERE person_id LIKE '%2';
>>
>> UPDATE index_demo
>> SET state = 'ID'
>> WHERE person_id LIKE '%3';
>>
>> UPDATE index_demo
>> SET state = 'NY'
>> WHERE person_id LIKE '%4';
>>
>> UPDATE index_demo
>> SET state = 'MA'
>> WHERE person_id LIKE '%5';
>>
>> UPDATE index_demo
>> SET state = 'MN'
>> WHERE person_id LIKE '%6';
>>
>> UPDATE index_demo
>> SET state = 'VA'
>> WHERE person_id LIKE '%7';
>>
>> UPDATE index_demo
>> SET state = 'NC'
>> WHERE person_id LIKE '%8';
>>
>> UPDATE index_demo
>> SET state = 'MI'
>> WHERE person_id like '%9';
>>
>> COMMIT;
>> END;
>> /
>>
>> I use them in demos of bitmap indexes in part to demonstrate
>> why gender is not the ideal usage. Some people may want to
>> use it to construct test data of their own.
>> --
>> Daniel A. Morgan
>> Oracle Ace Director & Instructor
>> University of Washington
>> damor..._at_x.washington.edu (replace x with u to respond)
>> Puget Sound Oracle Users Groupwww.psoug.org- Hide quoted text -
> 
> Thanks for providing PL/SQL code.
> 
> I started running the script and then started performing some
> calculations while I was waiting for the PL/SQL code to complete.  At
> a little less than 2,000 bytes per row, there would be roughly 4 rows
> per 8KB block.  8KB * 250,000/4 = ~512MB.  Then I started wondering
> why you were using PL/SQL rather than SQL (possibly to test some other
> performance metric during the updates?).  For example:
> INSERT INTO INDEX_DEMO
>   (PERSON_ID,
>   GENDER,
>   STATE,
>   TEXTCOL)
> SELECT
>   ROWNUM,
>   DECODE(MOD(ROWNUM,2),0,'M','F'),
>   DECODE(MOD(ROWNUM,10),1,'OR',2,'CA',3,'ID',4,'NY',5,'MA',6,'MN',
> 7,'VA',8,'NC',9,'MI','WA'),
>   RPAD('x', 1799, 'x')
> FROM
>   (SELECT
>     ROWNUM RN
>   FROM
>     DUAL
>   CONNECT BY
>     LEVEL<=250) T1,
>   (SELECT
>     ROWNUM RN
>   FROM
>     DUAL
>   CONNECT BY
>     LEVEL<=1000) T2;
> 
> Your test setup could be interesting - I wonder if you would need more
> than 10 states in the data set to determine the tipping point
> percentage were an index scan or table scan would be more efficient to
> retrieve a particular subset of data in a particular database
> instance.
> 
> Would you need multiple bitmap indexes on multiple columns (with one
> column per bitmap index) in the hope that predicates would be applied
> to multiple columns in order to allow bitmap combining operations in
> order to effectively use bitmap indexes on columns with few distinct
> values?  I thought that I remembered reading an article by Richard
> Foote that showed demonstrated such a use of bitmap indexes.
> 
> Charles Hooper
> IT Manager/Oracle DBA
> K&M Machine-Fabricating, Inc.

What I provided is a simplified version from Morgan's Library. What I typically do when doing my own testing is DBMS_CRYPTO.RANDOMBYTES and DBMS_CRYPTO.RANDOMINTEGER along with enough DECODE and CASE statements to make it interesting.

Can it be done with SQL:

INSERT INTO t
SELECT dbms_crypto.randombytes(10), dbms_crypto.randominteger FROM dual CONNECT BY LEVEL <=200;

of course. But with PL/SQL students can more readily make modifications when more and more columns and complexity is added. Thus I start them out that way. For some reason CONNECT BY is more difficult for them to master.

-- 
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damorgan_at_x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Received on Sat Jul 26 2008 - 16:32:24 CDT

Original text of this message