Relations and correlations

From: Mladen Gogala <mgogala_at_yahoo.com>
Date: 14 May 2008 09:44:46 GMT
Message-ID: <482ab48d$0$30638$834e42db@reader.greatnowhere.com>


DBMS_STATS package in version 11g includes "CREATE_EXTENDED_STATS" function. This package creates an "extension", consisting of group of columns, for which cardinality and distribution are calculated separately. This is probably the 1st and the last place to note that I absolutely abhor the fact that it is a function and not a procedure. Other then that, it is a great tool for handling correlated columns, like the famous Jonathan's example of months and zodiac signs. Below is a little script to test the new feature:

set trimout on
set trimspool on
connect scott/tiger
spool /tmp/test_stats.out
drop table test_stats purge;
CREATE TABLE TEST_STATS (

	A NUMBER(12,0), 
	B NUMBER(12,0), 
	C VARCHAR2(32)

);
declare
sd integer;
i integer;
t varchar2(64);
ext varchar2(64);
begin
select to_number(to_char(sysdate,'J')) into sd from dual; dbms_random.seed(sd);
for sd in 1..100000
loop
i:=dbms_random.random;
t:=dbms_random.string('A',32);
insert into test_stats values(i,i,t);
end loop;
commit;
select dbms_stats.create_extended_stats('SCOTT','TEST_STATS','(A,B)') into ext
from dual;
dbms_output.put_line('Created extension is:'||ext); end;
/
prompt 'Analyzing table...'
analyze table test_stats compute statistics for all columns size 16; column column_name format a32
select column_name,count(*) endpoints from user_histograms where table_name='TEST_STATS'
group by column_name
order by column_name;
exit;

After the script is finished, you should see the following result:

Table dropped.

Elapsed: 00:00:00.21

Table created.

Elapsed: 00:00:00.26
Created extension is:SYS_STUNA$6DVXJXTP05EH56DTIR0X

PL/SQL procedure successfully completed.

Elapsed: 00:00:13.44
'Analyzing table...'

Table analyzed.

Elapsed: 00:00:02.49

COLUMN_NAME                       ENDPOINTS
-------------------------------- ----------
A                                        17
B                                        17
C                                        17
SYS_STUNA$6DVXJXTP05EH56DTIR0X           17

Elapsed: 00:00:00.01
Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Not only that, but a closer examination detects that DBMS_METADATA package now returns the following as the DDL for the table:

CREATE TABLE "SCOTT"."TEST_STATS"
   ( "SYS_STUNA$6DVXJXTP05EH56DTIR0X" NUMBER GENERATED ALWAYS AS (SYS_OP_COMBINED_HASH("A","B")) VIRTUAL HIDDEN ,
"A" NUMBER(12,0),
"B" NUMBER(12,0),
"C" VARCHAR2(32)

   ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING   STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)   TABLESPACE "USERS" ;   There is a virtual and hidden column added to our table, as hash of columns A and B. Mere "DESCRIBE" will not show the virtual column and neither will USER_TAB_COLUMNS.
Virtual columns, with "GENERATED ALWAYS" option are the new feature of Oracle11g. Virtual columns are described here:http://tinyurl.com/44wm46

I found out that generating extensions for the columns that are frequently used together in joins can help with getting a better plan. I cannot give an example because of the NDA. The database, after all, belongs to my employer and not to me.

-- 
Mladen Gogala
http://mgogala.freehostia.com
Received on Wed May 14 2008 - 04:44:46 CDT

Original text of this message