Relations and correlations
Date: 14 May 2008 09:44:46 GMT
Message-ID: <482ab48d$0$30638$834e42db@reader.greatnowhere.com>
DBMS_STATS package in version 11g includes "CREATE_EXTENDED_STATS"
function. This package creates an "extension", consisting of group of
columns, for which cardinality and distribution are calculated separately.
This is probably the 1st and the last place to note that I absolutely
abhor the fact that it is a function and not a procedure. Other then that,
it is a great tool for handling correlated columns, like the famous
Jonathan's example of months and zodiac signs. Below is a little script
to test the new feature:
set trimout on
set trimspool on
connect scott/tiger
spool /tmp/test_stats.out
drop table test_stats purge;
CREATE TABLE TEST_STATS (
A NUMBER(12,0), B NUMBER(12,0), C VARCHAR2(32)
);
declare
sd integer;
i integer;
t varchar2(64);
ext varchar2(64);
begin
select to_number(to_char(sysdate,'J')) into sd from dual; dbms_random.seed(sd);
for sd in 1..100000
loop
i:=dbms_random.random;
t:=dbms_random.string('A',32);
insert into test_stats values(i,i,t);
end loop;
commit;
select dbms_stats.create_extended_stats('SCOTT','TEST_STATS','(A,B)') into ext
from dual;
dbms_output.put_line('Created extension is:'||ext); end;
/
prompt 'Analyzing table...'
analyze table test_stats compute statistics for all columns size 16; column column_name format a32
select column_name,count(*) endpoints from user_histograms where table_name='TEST_STATS'
group by column_name
order by column_name;
exit;
After the script is finished, you should see the following result:
Table dropped.
Elapsed: 00:00:00.21
Table created.
Elapsed: 00:00:00.26
Created extension is:SYS_STUNA$6DVXJXTP05EH56DTIR0X
PL/SQL procedure successfully completed.
Elapsed: 00:00:13.44
'Analyzing table...'
Table analyzed.
Elapsed: 00:00:02.49
COLUMN_NAME ENDPOINTS -------------------------------- ---------- A 17 B 17 C 17 SYS_STUNA$6DVXJXTP05EH56DTIR0X 17
Elapsed: 00:00:00.01
Disconnected from Oracle Database 11g Enterprise Edition Release
11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing
options
Not only that, but a closer examination detects that DBMS_METADATA package now returns the following as the DDL for the table:
CREATE TABLE "SCOTT"."TEST_STATS"
( "SYS_STUNA$6DVXJXTP05EH56DTIR0X" NUMBER GENERATED ALWAYS AS
(SYS_OP_COMBINED_HASH("A","B")) VIRTUAL HIDDEN ,
"A" NUMBER(12,0),
"B" NUMBER(12,0),
"C" VARCHAR2(32)
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "USERS" ;
There is a virtual and hidden column added to our table, as hash of
columns A and B. Mere "DESCRIBE" will not show the virtual column and
neither will USER_TAB_COLUMNS.
Virtual columns, with "GENERATED ALWAYS" option are the new feature of
Oracle11g. Virtual columns are described here:http://tinyurl.com/44wm46
I found out that generating extensions for the columns that are frequently used together in joins can help with getting a better plan. I cannot give an example because of the NDA. The database, after all, belongs to my employer and not to me.
-- Mladen Gogala http://mgogala.freehostia.comReceived on Wed May 14 2008 - 04:44:46 CDT