Skip navigation.

Jonathan Lewis

Syndicate content Oracle Scratchpad
Just another Oracle weblog
Updated: 2 hours 58 min ago

Add primary key.

Wed, 2016-01-27 03:07

I thought I had written this note a few years ago, on OTN or Oracle-L if not on my blog, but I can’t find any sign of it so I’ve decided it’s time to write it (again) – starting as a question about the following code:


create table t1
as
with generator as (
        select  rownum  id
        from            dual
        connect by
                        rownum <= 1000
)
select
        rownum                                  id,
        trunc((rownum-1)/50)                    clustered,
        mod(rownum,20000)                       scattered,
        lpad(rownum,10)                         vc_small,
        rpad('x',100,'x')                       vc_padding
from
        generator       g1,
        generator       g2
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

alter system flush buffer_cache;

alter table t1 add constraint t1_pk primary key(id, scattered);

I’ve generated a table with 1,000,000 rows, including a column that’s guaranteed to be unique; then I’ve added a (two-column) primary key constraint to that table.

Because of the guaranteed unique column the call to add constraint will succeed. Because Oracle will automatically create a unique index to support that constraint it will have to do a tablescan of the table. So here’s the question: HOW MANY TIMES will it tablescan that table (and how many rows will it scan) ?

Space for thought …

The answer is three tablescans, 3 million rows.

Oracle will scan the table to check the validity of adding a NOT NULL definition and constraint for the id column, repeat the scan to do the same for the scattered column, then one final scan to accumulate the key data and rowids to sort and create the index.

Knowing this, you may be able to find ways to modify bulk data loading operations to minimise overheads.

The most recent version I’ve tested this on is 12.1.0.2.

See also: https://jonathanlewis.wordpress.com/2012/03/02/add-constraint/

My reference: pk_overhead.sql


Trace file size

Tue, 2016-01-26 02:30

Here’s a convenient enhancement for tracing that came up on Twitter a few days ago – first in a tweet that I retweeted, then in a question from Christian Antognini based on this bit of the 12c Oracle documentation (opens in separate tab). The question was – does it work for you ?

The new description for max_dump_file_size says that for large enough values Oracle will split the file into multiple chunks of a few megabytes, using a suffix to identify the sequence of the chunks, keeping only the first chunk and the most recent chunks. Unfortunately this doesn’t seem to be true. However, prompted by Chris’ question I ran a quick query against the full parameter list looking for parameters with the word “trace” in their name:


select
        /*+
                leading(nam val val2)
                full(name)
                full(val)  use_hash(val)  no_swap_join_inputs(val)
                full(val2) use_hash(val2) no_swap_join_inputs(val2)
        */
        nam.ksppinm                             name,
        val.ksppstvl                            ses_val,
        val2.ksppstvl                           sys_val,
        nam.ksppdesc                            description,
        nam.indx+1                              numb,
        nam.ksppity                             type,
        val.ksppstdf                            is_def,
        decode(bitand(nam.ksppiflg/256,1),
                1,'True',
                  'False'
        )                                       ses_mod,
        decode(bitand(nam.ksppiflg/65536,3),
                1,'Immediate',
                2,'Deferred' ,
                3,'Immediate',
                  'False'
        )                                       sys_mod,
        decode(bitand(val.ksppstvf,7),
                1,'Modified',
                4,'System Modified',
                  'False'
        )                                       is_mod,
        decode(bitand(val.ksppstvf,2),
                2,'True',
                  'False'
        )                                       is_adj,
        val.ksppstcmnt                          notes
from
        x$ksppi         nam,
        x$ksppcv        val,
        x$ksppsv        val2
where
        nam.indx = val.indx
and     val2.indx = val.indx
and     ksppinm like '%&m_search.%'
order by
        nam.ksppinm
;

Glancing through the result I spotted a couple of interesting parameters with the letters “uts” in their names, so re-ran my query looking for all the “uts” parameters, getting the following (edited) list:


NAME                           SYS_VAL         DESCRIPTION    
------------------------------ --------------- ---------------------------------------------
_diag_uts_control              0               UTS control parameter
_uts_first_segment_retain      TRUE            Should we retain the first trace segment
_uts_first_segment_size        0               Maximum size (in bytes) of first segments 
_uts_trace_disk_threshold      0               Trace disk threshold parameter
_uts_trace_segment_size        0               Maximum size (in bytes) of a trace segment
_uts_trace_segments            5               Maximum number of trace segments 

Note particularly the “first segment size” and “trace segment size” – defaulting to zero (which often means a hidden internal setting, though that doesn’t seem to be the case here, but maybe that’s what the “diag control” is for). I haven’t investigated all the effects, but after a little experimentation I found that all I needed to do to get the behaviour attributed to max_dump_file_size was to set the following two parameters – which I could do at the session level.


alter session set "_uts_first_segment_size" = 5242880;
alter session set "_uts_trace_segment_size" = 5242880;

The minimum value for these parameters is the one I’ve shown above (5120 KB) and with the default value for _uts_trace_segments you will get a maximum of 5 trace files with sequential names like the following:

ls -ltr *4901*.trc

-rw-r----- 1 oracle oinstall 5243099 Jan 26 08:15 orcl_ora_4901_1.trc
-rw-r----- 1 oracle oinstall 5243064 Jan 26 08:15 orcl_ora_4901_12.trc
-rw-r----- 1 oracle oinstall 5243058 Jan 26 08:15 orcl_ora_4901_13.trc
-rw-r----- 1 oracle oinstall 5242993 Jan 26 08:15 orcl_ora_4901_14.trc
-rw-r----- 1 oracle oinstall 1363680 Jan 26 08:15 orcl_ora_4901.trc

As you can see I’m currently generating my 15th trace, and Oracle has kept the first one and the previous three. It’s always working on a file with no suffix to its name but as soon as that file hits its limiting size (plus or minus a few bytes) it gets its appropriate suffix, the oldest file is deleted, and a new trace file without a suffix is started.

Apart from the usual header information the trace files start and end with lines like:

*** TRACE CONTINUED FROM FILE /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4901_11.trc ***
  
*** TRACE SEGMENT RENAMED TO /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4901_12.trc ***

There is one little trap to watch out for: if you set either of these parameters to be larger than max_dump_file_size tracing stops as soon as one of the segments hits the max_dump_file_size and that trace file ends with the usual “overflow” message – e.g, when I changed the max_dump_file_size to 4M in mid-session:

*** DUMP FILE SIZE IS LIMITED TO 4194304 BYTES ***

In my case I had started with max_dump_file_size set to 20M, so I got lucky with my choice of 5M as the segment size.

Further investigation is left as an exercise to the interested reader.

 


Semijoin_driver

Sun, 2016-01-24 05:42

Here’s one of those odd little tricks that (a) may help in a couple of very special cases and (b) may show up at some future date – or maybe it already does – in the optimizer if it is recognised as a solution to a more popular problem. It’s about an apparent restriction on how the optimizer uses the BITMAP MERGE operation, and to demonstrate a very simple case I’ll start with a data set with just one bitmap index:


create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum,1000)        n1,
        rpad('x',10,'x')        small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create bitmap index t1_b1 on t1(n1);

select index_name, leaf_blocks, num_rows from user_indexes;

/*
INDEX_NAME           LEAF_BLOCKS   NUM_ROWS
-------------------- ----------- ----------
T1_B1                        500       1000
*/

Realistically we don’t expect to use a single bitmap index to access data from a large table, usually we expect to have queries that give the optimizer the option to choose and combine several bitmap indexes (possibly driving through dimension tables first) to reduce the target row set in the table to a cost-effective level.

In this example, though, I’ve created a column data set that many people might view as “inappropriate” as the target for a bitmap index – in one million rows I have one thousand distinct values, it’s not a “low cardinality” column – but, as Richard Foote (among others) has often had to point out, it’s wrong to think that bitmap indexes are only suitable for columns with a very small number of distinct values. Moreover, it’s the only index on the table, so no chance of combining bitmaps.

Another thing to notice about my data set is that the n1 column has been generated by the mod() function; because of this the column cycles through the 1,000 values I’ve allowed for it, and this means that the rows for any given value are scattered widely across the table, but it also means that if I find a row with the value X in it then there could well be a row with the value X+4 (say) in the same block.

I’ve reported the statistics from user_indexes at the end of the sample code. This shows you that the index holds 1,000 “rows” – i.e. each key value requires only one bitmap entry to cover the whole table, with two rows per leaf block.  (By comparison, a B-tree index oon the column was 2,077 leaf block uncompressed, or 1,538 leaf blocks when compressed).

So here’s the query I want to play with, followed by the run-time execution plan with stats (in this case from a 12.1.0.2 instance):


alter session set statistics_level = all;

select
        /*+
                qb_name(main)
        */
        max(small_vc)
from
        t1
where
        n1 in (1,5)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |      1 |        |      1 |00:00:00.03 |    2006 |      4 |
|   1 |  SORT AGGREGATE                       |       |      1 |      1 |      1 |00:00:00.03 |    2006 |      4 |
|   2 |   INLIST ITERATOR                     |       |      1 |        |   2000 |00:00:00.03 |    2006 |      4 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      2 |   2000 |   2000 |00:00:00.02 |    2006 |      4 |
|   4 |     BITMAP CONVERSION TO ROWIDS       |       |      2 |        |   2000 |00:00:00.01 |       6 |      4 |
|*  5 |      BITMAP INDEX SINGLE VALUE        | T1_B1 |      2 |        |      2 |00:00:00.01 |       6 |      4 |
------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("N1"=1 OR "N1"=5))

The query is selecting 2,000 rows from the table, for n1 = 1 and n1 = 5, and the plan shows us that the optimizer probes the bitmap index twice (operation 5), once for each value, fetching all the rows for n1 = 1, then fetching all the rows for n1 = 5. This entails 2,000 buffer gets. However, we know that for every row where n1 = 1 there is another row nearby (probably in the same block) where n1 = 5 – it would be nice if we could pick up the 1 and the 5 at the same time and do less work.

Technically the optimizer has the necessary facility to do this – it’s known as the BITMAP MERGE – Oracle can read two or more entries from a bitmap index, superimpose the bits (effectively a BITMAP OR), then convert to rowids and visit the table. Unfortunately there are cases (and it seems to be only the simple cases) where this doesn’t appear to be allowed even when we – the users – can see that it might be a very effective strategy. So can we make it happen – and since I’ve asked the question you know that the answer is almost sure to be yes.

Here’s an alternate (messier) SQL statement that achieves the same result:


select
        /*+
                qb_name(main)
                semijoin_driver(@subq)
        */
        max(small_vc)
from
        t1
where
        n1 in (
                select /*+ qb_name(subq) */
                        *
                from    (
                        select 1 from dual
                        union all
                        select 5 from dual
                        )
        )
;

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |      1 |00:00:00.02 |    1074 |       |       |          |
|   1 |  SORT AGGREGATE                      |       |      1 |      1 |      1 |00:00:00.02 |    1074 |       |       |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |   2000 |   2000 |00:00:00.02 |    1074 |       |       |          |
|   3 |    BITMAP CONVERSION TO ROWIDS       |       |      1 |        |   2000 |00:00:00.01 |       6 |       |       |          |
|   4 |     BITMAP MERGE                     |       |      1 |        |      1 |00:00:00.01 |       6 |  1024K|   512K| 8192  (0)|
|   5 |      BITMAP KEY ITERATION            |       |      1 |        |      2 |00:00:00.01 |       6 |       |       |          |
|   6 |       VIEW                           |       |      1 |      2 |      2 |00:00:00.01 |       0 |       |       |          |
|   7 |        UNION-ALL                     |       |      1 |        |      2 |00:00:00.01 |       0 |       |       |          |
|   8 |         FAST DUAL                    |       |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|   9 |         FAST DUAL                    |       |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|* 10 |       BITMAP INDEX RANGE SCAN        | T1_B1 |      2 |        |      2 |00:00:00.01 |       6 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  10 - access("N1"="from$_subquery$_002"."1")

Key points from this plan – and I’ll comment on the SQL in a moment: The number of buffer visits is roughly halved (In many cases we picked up two rows as we visited each buffer); operation 4 shows us that we did a BITMAP MERGE, and we can see in operations 5 to 10 that we did a BITMAP KEY ITERATION (which is a bit like a nested loop join – “for each row returned by child 1 (operation 6) we executed child 2 (operation 10)”) to probe the index twice and get two strings of bits that operation 4 could merge before operation 3 converted to rowids.

For a clearer picture of how we visit the table, here are the first few rows and last few rows from a version of the two queries where we simply select the ID column rather than aggregating on the small_vc column:

select  id from ...

Original query structure
         1
      1001
      2001
      3001
...
    997005
    998005
    999005

2000 rows selected.

Modified query structure:

         1
         5
      1001
      1005
      2001
      2005
...
    998001
    998005
    999001
    999005
    
2000 rows selected.

As you can see, one query returns all the n1 = 1 rows then all the n1 = 5 rows while the other query alternates as it walks through the merged bitmap. You may recall the Exadata indexing problem (now addressed, of course) from a few years back where the order in which rows were visited after a (B-tree) index range scan made a big difference to performance. This is the same type of issue – when the optimizer’s default plan gets the right data in the wrong order we may be able to find ways of modifying the SQL to visit the data in a more efficient order. In this case we save only fractions of a second because all the data is buffered, but it’s possible that in a production environment with much larger tables many, or all, of the re-visits could turn into physical reads.

Coming back to the SQL, the key to the re-write is to turn my IN-list into a subquery, and then tell the optimizer to use that subquery as a “semijoin driver”. This is essentially the mechanism used by the Star Tranformation, where the optimizer rewrites a simple join so that each dimension table (typically) appears twice, first as an IN subquery driving the bitmap selection then as a “joinback”. But (according to the manuals) a star transformation requires at least two dimension tables to be involved in a join to the central fact table – and that may be why the semi-join approach is not considered in this (and slightly more complex) cases.

 

 

My reference: bitmap_merge.sql, star_hack3.sql


Drop Column

Mon, 2016-01-18 02:14

I published a note on AllthingsOracle a few days ago discussing the options for dropping a column from an existing table. In a little teaser to a future article I pointed out that dropping columns DOESN’T reclaim space; or rather, probably doesn’t, and even if it did you probably won’t like the way it does it.

I will  be writing about “massive deletes” for AllthingsOracle in the near future, but I thought I’d expand on the comment about not reclaiming space straight away. The key point is this – when you drop a column you are probably dropping a small fraction of each row. (Obviously there are some extreme variants on the idea – for example, you might have decided to move a large varchar2() to a separate table with shared primary key).

If you’ve dropped a small fraction of each row you’ve freed up a small fraction of each block, which probably means the block hasn’t been identified as having available free space for inserts. In many cases this is probably  a good thing – because it’s quite likely the if every block in your table is suddenly labelled as having sufficient free space for new row then you could end up with a difficult and ongoing performance problem.

Many large tables have a “time-based” component to their usage – as time passes the most recently entered rows are the ones that get most usage, and older rows are no longer accessed; this means you get a performance benefit from caching because the most useful fractions of such tables are often well cached and the “interesting” data is fairly well clustered.

In a case like this, imagine what will happen if EVERY block in your table suddenly acquires enough free space to accept a couple of new rows – over the next few days the incoming data will be spread across the entire length of the table, and for the next couple of months, or years, you will have to keep the entire table cached in memory if the performance is to stay constant; moreover the clustering_factor of the most useful indexes is likely to jump from “quite small” to “absolutely massive”, and the optimizer will start changing lots of plans because it will decide that your favourite indexes are probably much to expensive to user.

I am, of course, painting a very grim picture – but it is a possible scenario that should be considered before you drop a column from a table. Combined with my observations about the locking and overheads of dropping a column you might (probably ought to) decide that you should never drop a column you should only mark it as unused or (better still if you’re on 12c) mark it invisible for a while before marking it unused. You can worry about space reclamation at a later date when you considered all the ramifications of how it might impact on performance.

Footnote: If you’re still using freelist management then dropping a column won’t put a block on the freelist until the total used space in the block falls below the value dictated by pctused (default 40%); if you’re using ASSM then the block doesn’t become available for reuse until (by default) the free space exceeds 25% of the block’s usable space.

 

 


Subquery Effects

Mon, 2016-01-11 06:50

Towards the end of last year I used a query with a couple of “constant” subqueries as a focal point for a blog note on reading parallel execution plans. One of the comments on that note raised a question about cardinality estimates and, coincidentally, I received an email about the cost calculations for a similar query a few days later.

Unfortunately there are all sorts of anomalies, special cases, and changes that show up across versions when subqueries come into play – it’s only in recent versions of 11.2, for example, that a very simple example I’ve got of three equivalent statements that produce the same execution plan report the same costs and cardinality. (The queries are:  table with IN subquery, table with EXISTS subquery, table joined to “manually unnested” subquery – the three plans take the unnested subquery shape.)

I’m just going to pick out one particular anomaly, which is a costing error with multiple subqueries when “OR-ed”. Here’s my sample data set:


create table t1
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 20000
;


create table t2
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 25000
;

create table t3
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 30000
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t1',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t2',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t3',
                method_opt       => 'for all columns size 1'
        );
end;
/

The three tables are slightly different sizes so that it will be easy to see different costs of tablescans, and there are no indexes to everything I do in the queries will be tablescans. Here are six queries I’m going to test – they all scan t1, with “constant” subqueries against t2 and/or t3. The first pair is just to show you the basic cost of the query with a single subquery, the second pair shows you the default action with two subqueries in two different orders, the final pair shows you what happens with two subqueries when you block subquery pushing.


select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     t1.n2 > (select avg(t2.n2) from t2)
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     t1.n3 > (select avg(t3.n3) from t3)
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n2 > (select avg(t2.n2) from t2)
         or t1.n3 > (select avg(t3.n3) from t3)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n3 > (select avg(t3.n3) from t3)
         or t1.n2 > (select avg(t2.n2) from t2)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
         or t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
         or t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
        )
;

Here are the first two plans, pulled from memory (which you might have guessed thanks to the “disappearing subquery predicate” in the predicate section. These examples came from 12.1.0.2, but the same happens in 11.2.0.4:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   111 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   500 |  5000 |    49   (3)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND "T1"."N2">))

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   123 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   500 |  5000 |    49   (3)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND "T1"."N3">))

As you can see, the cost of the query is the cost of the t1 tablescan plus the cost of running the t2 or t3 subquery once: 111 = 49 + 62, and 123 = 49 + 74.

(As a general guideline, recent versions of the optimizer tend to allow for subqueries by including “cost of subquery” * “number of times the optimizer thinks it will execute” – in this case the optimizer knows that the subquery will run exactly once).

But what happens when we test the query that applies BOTH subqueries to the tablescan ?


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    50 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   975 | 14625 |    50   (4)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   6 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND ("T1"."N2"> OR "T1"."N3">)))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    50 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   975 | 14625 |    50   (4)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   6 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND ("T1"."N3"> OR "T1"."N2">)))

The cost of the query in both cases is just the cost of the tablescan of t1 – the subqueries are, apparently, free. You can check from the predicate section, by the way, that the subqueries are applied in the order they appear in original statement.

Does anything change if the subqueries are not pushed ?


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   111 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   FILTER             |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   4 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   5 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
|   6 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N2"> OR "T1"."N3">))
   3 - filter("T1"."N1">10000)

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   124 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   FILTER             |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   4 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   5 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   6 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N3"> OR "T1"."N2">))
   3 - filter("T1"."N1">10000)

The two plans have different costs – and the cost is the cost of the tablescan of t1 plus the cost of just the first subquery in the filter predciate list.

The non-pushed subqueries show up another anomaly: you’ll notice that the t1 tablescan reports 10,001 rows cardinality, but the FILTER operation doesn’t have an associated cardinality so we can’t see how many rows the optimizer thinks will survive the subqueries. So let’s run a query that allows us to see the surviving row estimate:


select
        max(n1)
from
        (
        select
                /*+ no_eliminate_oby */
                t1.n1
        from
                t1
        where
                t1.n1 > 10000
        and     (
                   t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
                or t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
                )
        order by
                n1
        )
;

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |       |       |   126 (100)|          |
|   1 |  SORT AGGREGATE        |      |     1 |    13 |            |          |
|   2 |   VIEW                 |      | 10001 |   126K|   126   (5)| 00:00:01 |
|   3 |    SORT ORDER BY       |      | 10001 |   146K|   126   (5)| 00:00:01 |
|*  4 |     FILTER             |      |       |       |            |          |
|*  5 |      TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   6 |      SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |       TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   8 |      SORT AGGREGATE    |      |     1 |     5 |            |          |
|   9 |       TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("T1"."N3"> OR "T1"."N2">))
   5 - filter("T1"."N1">10000)

As you can see, the SORT ORDER BY operation thinks it’s going to handle 10,0001 rows – it looks as if the optimizer arithmetic hasn’t applied the usual subquery guess of 5% for the two subqueries. (When the subqueries were automatically pushed you saw a cardinality of 975 – which is 5% for subquery t2 plus (due to OR) 5% for subquery t3 minus 5% of 5% (=25) for the overlap – this is the standard OR arithmetic)

tl;dr

Although the optimizer code has been enhanced in many places for dealing with subquery estimates, but there are still some odd errors and inconsistencies that you need to be aware of. The examples I’ve shown may not be particularly significant in terms of what they do, but the pattern is one that you may recognise in more complex queries.

 

Reference script: subq_cost_anomaly_2.sql