Comparing Cardinality Estimates with Results
Date: Fri, 15 Nov 2013 16:34:43 -0600
Message-id: <002b01cee252$e168f1f0$a43ad5d0$_at_net>
Stephane Faroult, who is having trouble posting to the list right now, asked me to pose this question on his behalf:
I am currently working on an application of hell (Entity/Attribute/Value all over the place, unpartitioned tables in the 200 million to 3 billion row range, blanket indexing, cluster factor usually very low, hard-coded queries with long lists of identifiers, cursor_sharing set to force ... and the bulk of the work of course done in Java). In a few cases the optimizer goes south, which is understandable and seems to be more or less corrected by more aggressive dynamic sampling (the application isn't too demanding on Oracle CPU). However, I was curious to compare, globally, cardinality estimates with actual results. I wrote the following query, which is a kind of first draft (warning: requires the performance pack, I query the dba_hist tables), in which I try to collect the "last level" of cardinality estimate as reported in the plan (I know, rather meaningless with aggregates, but that's the best I have come up with so far; improvements are welcome) for queries during a precise interval. I have set three categories, "Spot on" when the order of magnitude is the same for the estimate and the actual average, "Off" when there is one order of magnitude of difference (either way) and "Gone South" if it's more than that. I was rather surprised to discover that queries were more or less equally distributed between the three categories. I would be curious to know how the optimizer performs, cardinality-wise, in other environments and what could be considered "reasonably stable".
with q as
(select d.dbid,
i.instance_number,
s.snap_id,
round(s.begin_interval_time, 'HH24') snap_hour
from dba_hist_snapshot s
cross join
(select dbid
from v$database) d
cross join
(select instance_number
from v$instance) i
where s.dbid = d.dbid
and s.instance_number = i.instance_number
and s.begin_interval_time >= to_timestamp('&start_YYYYMMDDHHMI', /*start YYYYMMMDDHH */
'YYYYMMDDHH24MI')
and s.end_interval_time <= to_timestamp('&end_YYYYMMDDHHMI', /*end YYYYMMDDHH */
'YYYYMMDDHH24MI')) select * from (select case actual_cardinality
when estimated_cardinality then 'Spot on'
else case
when abs(round(log(10, greatest(actual_cardinality, 1)
/greatest(estimated_cardinality, 1)))) < 1 then 'Off'
else 'Gone south'
end
end as cardinality_estimate,
count(*) number_of_plans
from (select s.dbid,
s.sql_id,
s.plan_hash_value,
round(log(10, greatest(avg(rows_processed_delta), 1))) actual_cardinality,
round(avg(rows_processed_delta)) as actual
from q
inner join dba_hist_sqlstat s
on s.dbid = q.dbid
and s.instance_number = q.instance_number
and s.snap_id = q.snap_id
group by s.dbid,
s.sql_id,
s.plan_hash_value) a
inner join (select x.sql_id,
x.plan_hash_value,
round(log(10, greatest(sum(x.cardinality), 1))) as estimated_cardinality,
sum(x.cardinality) as estimate
from (select dbid,
sql_id,
plan_hash_value,
cardinality,
rank() over (partition by dbid,
plan_hash_value
order by id) rnk
from dba_hist_sql_plan
where cardinality is not null) x
where x.rnk = 1
group by x.sql_id,
x.plan_hash_value) b
on b.sql_id = a.sql_id
and b.plan_hash_value = a.plan_hash_value
group by case actual_cardinality
when estimated_cardinality then 'Spot on'
else case
when abs(round(log(10, greatest(actual_cardinality, 1)
/greatest(estimated_cardinality, 1)))) < 1 then 'Off'
else 'Gone south'
end
end)
order by case cardinality_estimate
when 'Spot on' then 1
when 'Off' then 2
else 3
end
/
Larry G. Elkins
elkinsl_at_verizon.net
Cell: 214.695.8605
-- http://www.freelists.org/webpage/oracle-lReceived on Fri Nov 15 2013 - 23:34:43 CET
