RE: Logic for collecting statistics

From: Charudatta Joshi <joshic_at_mahindrabt.com>
Date: Fri, 30 Jul 2004 11:50:29 +0530
Message-ID: <MHEAIPLKCACENJKNJIALMEMNCDAA.joshic@mahindrabt.com>

Hi all,

I think I should have been more specific regarding my query. Actually I am unsure about collecting stats regularly, varying estimate size and automatically deciding bucket sizes. So here's a small questionnaire:

(BTW, ours is a D/W environment.)

Do you use MONITORING option on tables and collect stale stats regularly?
Do you vary your estimate percent based on table size. If so, what criteria do you use?
Do you create histograms on all columns that have indexes and have low cardinality?
Is there any other useful tip you can provide regarding collecting stats regularly?
Lastly, any comments on my script?

Okay, some background about my confusion:

I have read on this list the advise against gathering stats regularly as it can lead to undesired query plan changes. Makes sense. But how can I be sure that the plan that works now will work in future?
I have read Tom's opinion that we should gather statistics accurately and then allow the CBO to take decision, as it considers a lot more factors than we do. Makes sense too. But what about the 'Fallacies of the CBO'? Maybe I shouldn't leave it to CBO?

Due to the above two conflicting assertions, I am a little unsure. Please let me know your opinion.

Thanks & regards,
Charu.

-----Original Message-----
From: Charudatta Joshi [mailto:joshic_at_mahindrabt.com] Sent: Thursday, July 29, 2004 4:33 PM
To: oracle-l_at_freelists.org
Subject: Logic for collecting statistics

Hi All,

Version : 8.1.7.4
OS: Win2K Server

I am writing a procedure to automate collection of statistics for modified tables. The procedure calculates the estimate_percent and histogram size on the fly, prompted by JPL's comment regarding better statistics for smaller tables. However, the cutoff number of rows selected for deriving estimate_percent are totally arbitrary.

Please care to share your comments if any.

Thanks & regards,
Charu.

The logic is:

Get the list of tables and num_rows that need to be analyzed (User_Tab_Modifications).
Backup the current statistics after deleting the old ones.
Calculate estimate percent. CASE WHEN tabrows(i) < 50000 THEN v_estmt_prcnt := 100; WHEN tabrows(i) BETWEEN 50001 AND 200000 THEN v_estmt_prcnt := 40; WHEN tabrows(i) BETWEEN 200001 AND 600000 THEN v_estmt_prcnt := 20; WHEN tabrows(i) BETWEEN 600001 AND 4000000 THEN v_estmt_prcnt := 10; WHEN tabrows(i) > 4000000 THEN v_estmt_prcnt := 5; END CASE; (Anybody has better formula for calculating this?)
Get the list of columns having low distinct values and also having index on them.
Prepare method_opt parameter based on number of distinct values for the column.
Gather statistics.

And the entire PL/SQL block is:

DECLARE TYPE tablist_typ IS TABLE OF user_tables.table_name%TYPE

INDEX BY BINARY_INTEGER;
TYPE tabrows_typ IS TABLE OF user_tables.num_rows%TYPE

INDEX BY BINARY_INTEGER;
TYPE tabcols_typ IS TABLE OF user_tab_columns.column_name%TYPE

INDEX BY BINARY_INTEGER;
TYPE tabndv_typ IS TABLE OF user_tab_columns.num_distinct%TYPE

INDEX BY BINARY_INTEGER;

    tablist tablist_typ;
    tabrows tabrows_typ;
    tabcols tabcols_typ;

tabndv tabndv_typ;

    v_estmt_prcnt NUMBER;
    v_degree NUMBER := 8;
    v_method_opt VARCHAR2(32000) := 'FOR ALL INDEXED COLUMNS';

BEGIN

Get the list of tables and num_rows that need to be analyzed. SELECT a.table_name, NVL(b.num_rows, 0) + NVL(a.inserts - a.deletes, 0) BULK COLLECT INTO tablist, tabrows FROM user_tab_modifications a, user_tables b WHERE a.table_name = b.table_name;

FOR i IN NVL(tablist.FIRST, 1)..NVL(tablist.LAST, 0) LOOP

Backup earlier statistics after deleting the old ones.

        DBMS_STATS.DELETE_SCHEMA_STATS(ownname=>USER,
                                       stattab=>'STAT_TAB',
                                       statid=>tablist(i));
        DBMS_STATS.EXPORT_TABLE_STATS(ownname=>USER,
                                      stattab=>'STAT_TAB',
                                      statid=>tablist(i),
                                      tabname=>tablist(i),
                                      cascade=>TRUE);
        -- Calculate estimate percent.
        CASE
            WHEN tabrows(i) < 50000 THEN
                v_estmt_prcnt := 100;
            WHEN tabrows(i) BETWEEN 50001 AND 200000 THEN
                 v_estmt_prcnt := 40;
            WHEN tabrows(i) BETWEEN 200001 AND 600000 THEN
                 v_estmt_prcnt := 20;
            WHEN tabrows(i) BETWEEN 600001 AND 4000000 THEN
                 v_estmt_prcnt := 10;
            WHEN tabrows(i) > 4000000 THEN
                 v_estmt_prcnt := 5;
        END CASE;

        -- Build method_opt string.

        -- Collect columns having low distinct values
        -- and also having index on them.

        SELECT DISTINCT a.column_name,
                        a.num_distinct
        BULK COLLECT INTO tabcols,
                          tabndv
        FROM user_tab_columns a, user_ind_columns b
        WHERE a.table_name = b.table_name
        AND a.column_name = b.column_name
        AND a.table_name = tablist(i)
        AND a.num_distinct <= 255;

        IF tabcols.COUNT > 0 THEN

            v_method_opt := v_method_opt || ' FOR COLUMNS ';
            FOR j IN tabcols.FIRST..tabcols.LAST
            LOOP
                v_method_opt := v_method_opt || tabcols(j)
                                || ' SIZE ' || tabndv(j) || ', ' ;
            END LOOP;

            -- Remove the last comma.
            v_method_opt := SUBSTR(v_method_opt, 1, LENGTH(v_method_opt) -

2);

END IF;

Analyze the table. DBMS_STATS.GATHER_TABLE_STATS(ownname=>USER, tabname=>tablist(i), estimate_percent=>v_estmt_prcnt, degree=>v_degree, method_opt=>v_method_opt, cascade=>TRUE);

END LOOP;
END;
/

Disclaimer:

This message (including any attachments) contains confidential information intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, you should delete this message and are hereby notified that any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited.

Visit us at http://www.mahindrabt.com

Please see the official ORACLE-L FAQ: http://www.orafaq.com

To unsubscribe send email to: oracle-l-request_at_freelists.org put 'unsubscribe' in the subject line.

--
Archives are at http://www.freelists.org/archives/oracle-l/
FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------

Received on Fri Jul 30 2004 - 01:16:54 CDT