From oracle-l-bounce@freelists.org Thu Aug 11 10:21:32 2005 Return-Path: Received: from air891.startdedicated.com (root@localhost) by orafaq.com (8.12.10/8.12.10) with ESMTP id j7BFLWrb015444 for ; Thu, 11 Aug 2005 10:21:32 -0500 X-ClientAddr: 206.53.239.180 Received: from turing.freelists.org (freelists-180.iquest.net [206.53.239.180]) by air891.startdedicated.com (8.12.10/8.12.10) with ESMTP id j7BFLSIP015424 for ; Thu, 11 Aug 2005 10:21:28 -0500 Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id EDD091DDE16; Thu, 11 Aug 2005 10:21:18 -0500 (EST) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 22682-10; Thu, 11 Aug 2005 10:21:18 -0500 (EST) Received: from turing (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 6349F1DDEEB; Thu, 11 Aug 2005 10:21:18 -0500 (EST) Message-ID: <42FB6C7A.90505@centrexcc.com> Date: Thu, 11 Aug 2005 09:19:22 -0600 From: Wolfgang Breitling Organization: Centrex Consulting Corporation User-Agent: Mozilla Thunderbird 0.7.2 (Windows/20040707) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Leng Kaing Cc: oracle-l@freelists.org Subject: Re: dba_tables.num_rows is less than dba_indexes.num_rows References: <18D551B1B928FF47A65B2D91F705906A017BACC7@HSNDON-EX01.hsntech.int> In-Reply-To: <18D551B1B928FF47A65B2D91F705906A017BACC7@HSNDON-EX01.hsntech.int> Content-Type: text/plain; charset=us-ascii; format=flowed X-UCalgary-MailScanner-Information: Please contact IT Help Desk at (403) 220-5555 for more information X-UCalgary-MailScanner: Found to be clean X-UCalgary-MailScanner-From: breitliw@centrexcc.com X-archive-position: 23758 X-ecartis-version: Ecartis v1.0.0 Sender: oracle-l-bounce@freelists.org Errors-To: oracle-l-bounce@freelists.org X-original-sender: breitliw@centrexcc.com Precedence: normal Reply-To: breitliw@centrexcc.com X-list: oracle-l X-Virus-Scanned: by amavisd-new-20030616-p9 (Debian) at avenirtech.net X-mailscan-MailScanner-Information: Please contact the ISP for more information X-mailscan-MailScanner: Found to be clean X-MailScanner-From: oracle-l-bounce@freelists.org X-Spam-Level: X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on air891.startdedicated.com X-Spam-Status: No, hits=-3.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=2.63 Leng Kaing wrote: > Wolfgang, > > This is all news to me... > > 1) I always thought that NUM_ROWS is important. I know that a few weeks > ago when I noticed the differences in the num_rows in tables vs indexes, > we manually increased the table's num_rows and the query would revert to > using an index rather than a full table scan. I have yet to do a test to > see if what would happen if I set NUM_ROWS to a small number, eg. 5. num_rows is important, but the CBO only uses the value from *_tables, not the one from *_indexes. As Christian already said, if someone has information to the contrary - with a repeatable testcase - let me know. > > 2) I also did "..., estimate_percent=>dbms_stats.auto_sample_size, > method_opt=>'FOR > ALL INDEXED COLUMNS', degree=>2, cascade=>true" > > Are you saying that FOR ALL INDEX COLUMNS or FOR COLUMNS only gathers > histograms for the tables, but not the stats? So do I have to do it in 2 > steps - gather stats, then gather histograms? > > 3) I've never computed on any of the systems that I've worked on. And it > certainly cannot be done on the current database as it would take too > long. I always thought an estimate would be good enough. Now I'm hearing > that only a compute will do. ARGH!! We cannot afford to do this. Estimate is usually good enough for tables. In fact amazingly small estimate percentages (.01 or smaller) produce very good estimates for num_rows and avg_row_len. Estimate is often not good enough for index statistics and IMO never good enough for column histograms. Maybe one of the reasons you do not have enough time for a gather_compute is that you gather too many histograms and maybe miss a few worth having. A gather_table_stats(.. method_opt=>'for all [indexed] columns size ') will do a full scan of the table for every column for which a histogram is collected. For estimate_percent=n a temp table with a n% sample of the original table is used instead of the full table. This testcase shows that even for a very moderatley sized table. I ran every gather twice to alleviate any caching issues: SQL> select count(*) from pspnlfield; COUNT(*) ---------- 282100 SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:00.04 SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for all columns size 254'); Elapsed: 00:05:33.01 SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:01.02 SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for all columns size 1'); Elapsed: 00:00:56.01 SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:00.03 SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for all columns size 254'); Elapsed: 00:06:04.06 SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:01.02 SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for all columns size 1'); Elapsed: 00:01:01.01 SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:00.01 SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for all columns size 254'); Elapsed: 00:00:32.00 SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:01.00 SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for all columns size 1'); Elapsed: 00:00:06.08 SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:00.00 SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for all columns size 254'); Elapsed: 00:00:31.07 SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:01.01 SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for all columns size 1'); Elapsed: 00:00:06.07 A clarification to the statement that you probably collect too many histograms and at the same time may be missing some: You have method_opt=> 'for all indexed columns [size 75]' a) not all indexed columns really need histograms, only crucial ones with significant skew - that's the too many piece. b) histograms on crucial columns with significant skew, even if not indexed, can lead the optimizer to a better performing plan - that's the too few piece. If anyone thinks that too many histograms can't hurt (other than the time wasted collecting them), I had an incident where a client changed their statistics gathering from 'all columns size 1' (i.e. no histograms) to 'all indexed columns size skewonly' and the performance of a frequently used sql went from 0.01 seconds to ~ 90 seconds. -- Regards Wolfgang Breitling Centrex Consulting Corporation www.centrexcc.com -- http://www.freelists.org/webpage/oracle-l