Jonathan Lewis

Just another Oracle weblog

Conditional SQL – 5

Fri, 2018-03-02 06:49

Here’s a note that has been sitting around for more than 3 years (the draft date is Jan 2015), waiting for me to finish it off; and in that time we’ve got a new version of Oracle that changes the solution to the problem it presented. (I also managed to write “Conditional SQL – 6” in the intervening period!)

This posting started with a question on the OTN (now ODC) database forum about an execution plan used by 11.2.0.3.  Here’s a model to represent the data and the query:

rem
rem     Script:         null_plan_4.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2015
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

create table catentry as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment here to avoid format issue
)
select
        rownum  catentry_id,
        case
                when mod(rownum-1,100) > 0 then mod(rownum-1,100)
        end     member_id,
        case
                when trunc((rownum-1)/100) > 0 then trunc((rownum-1)/100)
        end     partnumber,
        rpad('x',100)   padding
from
        generator,
        generator
where
        rownum <= 100 * 100 -- > comment here to avoid format issue
;

execute dbms_stats.gather_table_stats(user,'catentry');

create unique index cat_i0 on catentry(member_id, partnumber) compress 1;
--  create        index cat_i1 on catentry(member_id, partnumber, 0) compress 1;
--  create        index cat_i2 on catentry(partnumber, member_id, 0) compress 1;

variable b1 number
variable b2 number
variable b3 number
variable b4 number

begin
        :b1 := 22;
        :b2 := 1;
        :b3 := 44;
        :b4 := 1;
end;
/

select
        catentry_id
from
        catentry
where
        (   partnumber= :b1
         or (0 = :b2 and partnumber is null)
        )
and     (    member_id= :b3
         or (0 = :b4 and member_id is null)
        )
;

select * from table(dbms_xplan.display_cursor);

==============================================================================

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |       |       |    23 (100)|          |
|*  1 |  TABLE ACCESS FULL| CATENTRY |     1 |    10 |    23   (5)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((("PARTNUMBER"=:B1 OR ("PARTNUMBER" IS NULL AND 0=:B2))
              AND ("MEMBER_ID"=:B3 OR ("MEMBER_ID" IS NULL AND 0=:B4))))

The question this example raised was, effectively: “Why didn’t Oracle use bind peeking to work out that the best plan for this query – with these input values – was an index range scan?”

The basic answer to this question is this: “No matter how clever Oracle can be with bind peeking and execution plans it has to produce an execution plan that will give the right answer whatever the supplied values might be.”

The OP was hoping that the optimizer would see that :b2 and :b4 were arriving with the value 1, infer that “0 = 1” is always false, and reduce the query predicate to “partnumber = 22 and member_id = 44” to produce the following plan:


----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | CAT_I0   |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("PARTNUMBER"=22 AND "MEMBER_ID"=44)

But this plan could produce the wrong results if the next execution of the query supplied zeros for :b2 or :b4, so Oracle has to do something more generic. (Bear in mind that adaptive cursor sharing keeps reusing the same execution plan until it detects that the performance for some input values is bad; it doesn’t pre-emptively create new plans based on the incoming values – though in principle it might be possible for the Oracle developers to introduce code that can recognise special cases for predicates of the form “constant1 = constant2”).

If you review the SQL statement you can see that it’s clearly trying to allow the user to find data about member_ids and partnumbers where both, neither, or either value is allowed to be null: a couple of “if – then – else” conditions that should have been handled in the client code have been embedded in the SQL. As we have seen several times before, if you can’t change the client code then you have to hope that Oracle will use some clever transformation to handle the query in sections.
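For illustration, here’s a sketch of the four statements a client could choose between once it knows the incoming values of :b2 and :b4 – each one is just the original predicate with the “0 = :b2” and “0 = :b4” tests evaluated and removed (this is not a claim about how the original application was written, just an indication of the branching that has been pushed into the SQL):

-- :b2 != 0 and :b4 != 0
select catentry_id from catentry
where  partnumber = :b1 and member_id = :b3;

-- :b2 = 0 and :b4 != 0
select catentry_id from catentry
where  (partnumber = :b1 or partnumber is null) and member_id = :b3;

-- :b2 != 0 and :b4 = 0
select catentry_id from catentry
where  partnumber = :b1 and (member_id = :b3 or member_id is null);

-- :b2 = 0 and :b4 = 0
select catentry_id from catentry
where  (partnumber = :b1 or partnumber is null)
and    (member_id  = :b3 or member_id  is null);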

We can infer from various details of the posting that the member_id and partnumber columns were both allowed to be null, so if we want to make sure that Oracle always uses an indexed access path to acquire data for this query we need to have an index which starts with those two columns and then has at least one column which is guaranteed to be non-null so, for example, we could simply drop the current index and replace it with one that has a fixed zero on the end:

create index cat_i1 on catentry(member_id, partnumber, 0) compress 1;

With my particular data set, query, and version of Oracle this didn’t make any difference to the plan. But then I thought about the data definition and realised (and checked) that the index had a terrible clustering_factor, so I dropped it and created it with the first two columns in the opposite order:

create index cat_i2 on catentry(partnumber, member_id, 0) compress 1;

Side note:
You’ll notice that I’ve replaced the original unique index with a non-unique index. This was necessary because there were many rows where both partnumber and member_id were null, so if I want to maintain the logic of the previous unique index I’ll need to add a unique constraint on (member_id, partnumber). It’s possible, of course, that in similar circumstances I might want both indexes – one for the uniqueness and to access the data using only one of the columns, the other to access the data using only the other column.
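As a sketch, the declaration needed to reinstate the uniqueness would be something like the following (the constraint name is my invention) – and note that if there is no suitable index in place Oracle will create a unique index to support the constraint, so you may want to decide in advance which index it should use:

alter table catentry add constraint catentry_uk
        unique(member_id, partnumber)
;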

With this index in place, and unhinted, the plan I got from 11.2.0.4 changed to use concatenation with an impressive four-way split:


------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |       |       |    12 (100)|          |
|   1 |  CONCATENATION                |          |       |       |            |          |
|*  2 |   FILTER                      |          |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | CAT_I2   |     1 |       |     2   (0)| 00:00:01 |
|*  5 |   FILTER                      |          |       |       |            |          |
|*  6 |    TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN          | CAT_I2   |     1 |       |     2   (0)| 00:00:01 |
|*  8 |   FILTER                      |          |       |       |            |          |
|*  9 |    TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|* 10 |     INDEX RANGE SCAN          | CAT_I2   |     1 |       |     2   (0)| 00:00:01 |
|* 11 |   TABLE ACCESS BY INDEX ROWID | CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|* 12 |    INDEX RANGE SCAN           | CAT_I2   |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((0=:B2 AND 0=:B4))
   4 - access("PARTNUMBER" IS NULL AND "MEMBER_ID" IS NULL)
       filter("MEMBER_ID" IS NULL)
   5 - filter(0=:B2)
   6 - filter((LNNVL("MEMBER_ID" IS NULL) OR LNNVL(0=:B4)))
   7 - access("PARTNUMBER" IS NULL AND "MEMBER_ID"=:B3)
       filter("MEMBER_ID"=:B3)
   8 - filter(0=:B4)
   9 - filter((LNNVL("PARTNUMBER" IS NULL) OR LNNVL(0=:B2)))
  10 - access("PARTNUMBER"=:B1 AND "MEMBER_ID" IS NULL)
  11 - filter(((LNNVL("MEMBER_ID" IS NULL) OR LNNVL(0=:B4)) AND
              (LNNVL("PARTNUMBER" IS NULL) OR LNNVL(0=:B2))))
  12 - access("PARTNUMBER"=:B1 AND "MEMBER_ID"=:B3)


To execute this plan the run-time engine works as follows:

  • Operation 2: If :b2 and :b4 are both zero we use the index to find the rows where member_id and partnumber are null (the filter “member_id is null” seems to be redundant)
  • Operation 5: if :b2 is zero we use the index to find rows where the partnumber is null and the member_id is the supplied value (and if that’s null the access will immediately return zero rows because of the equality predicate), and discard any rows that have already been returned by operation 2
  • Operation 8: if :b4 is zero we will use the index to find rows where the partnumber is the supplied value (and if the partnumber is null, that access will immediately return zero rows because of the equality predicate) and the member_id is null, and discard any rows that have already been returned by operation 2.
  • Operations 11 and 12 will always run – using the index to find rows that match with equality on both the incoming member_id and partnumber, discarding any rows already returned by the previous operations, and obviously not matching any rows where either column “IS” null.

The critical feature of this plan, of course, is that we got it because we have given Oracle an efficient option to find the rows where both member_id and partnumber are null – and that allows the rest of the concatenation options to take place.
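If you want to visualise what the concatenation is doing, here’s a hand-written equivalent of the query as a sketch. Because the four branches below are mutually exclusive (equality and “is null” can’t both be true for the same column) it doesn’t need the lnnvl() predicates that the optimizer adds to guarantee that no row can be reported twice in the general case:

select  catentry_id             -- both "allow null" flags set
from    catentry
where   0 = :b2 and 0 = :b4
and     partnumber is null and member_id is null
union all
select  catentry_id             -- partnumber null, member_id supplied
from    catentry
where   0 = :b2
and     partnumber is null and member_id = :b3
union all
select  catentry_id             -- partnumber supplied, member_id null
from    catentry
where   0 = :b4
and     partnumber = :b1 and member_id is null
union all
select  catentry_id             -- both values supplied
from    catentry
where   partnumber = :b1 and member_id = :b3
;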

Hints and Upgrades

Interestingly, after the clue that 11g would happily use concatenation with a “good enough” index I went back to the example where I’d just added a zero to the existing index and checked to see what would happen if I added a /*+ use_concat */ hint (without any of the qualifying parameters that the hint can now use) and got the same path with concatenation. The fact that the path appeared wasn’t the interesting bit – see if you can spot the interesting bit:

------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |       |       |    12 (100)|          |
|   1 |  CONCATENATION                |          |       |       |            |          |
|*  2 |   FILTER                      |          |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | CAT_I1   |     1 |       |     2   (0)| 00:00:01 |
|*  5 |   FILTER                      |          |       |       |            |          |
|*  6 |    TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN          | CAT_I1   |     1 |       |     2   (0)| 00:00:01 |
|*  8 |   FILTER                      |          |       |       |            |          |
|*  9 |    TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|* 10 |     INDEX RANGE SCAN          | CAT_I1   |     1 |       |     2   (0)| 00:00:01 |
|* 11 |   TABLE ACCESS BY INDEX ROWID | CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|* 12 |    INDEX RANGE SCAN           | CAT_I1   |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Check the cost, and compare it with the cost of the full tablescan. The hinted path has a lower cost than the default path. I think this may be another case of an “unknowable” range scan being ignored in favour of a known alternative.

Finally, we get to today – when I tested the code against 12.1.0.2 and 12.2.0.1. Nothing exciting happened in 12.1.0.2 – the plans were just like the 11g plans, but here’s the plan I got in 12.2 with the “bad” indexing (original index with added zero column) without any hints in the SQL:


------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                 |       |       |     9 (100)|          |
|   1 |  VIEW                                    | VW_ORE_37A7142B |     4 |    52 |     9   (0)| 00:00:01 |
|   2 |   UNION-ALL                              |                 |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID           | CATENTRY        |     1 |    10 |     2   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN                    | CAT_I1          |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    FILTER                                |                 |       |       |            |          |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED  | CATENTRY        |     1 |    10 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                    | CAT_I1          |     1 |       |     1   (0)| 00:00:01 |
|*  8 |    FILTER                                |                 |       |       |            |          |
|   9 |     TABLE ACCESS BY INDEX ROWID BATCHED  | CATENTRY        |     1 |    10 |     2   (0)| 00:00:01 |
|* 10 |      INDEX RANGE SCAN                    | CAT_I1          |     1 |       |     1   (0)| 00:00:01 |
|* 11 |    FILTER                                |                 |       |       |            |          |
|  12 |     TABLE ACCESS BY INDEX ROWID BATCHED  | CATENTRY        |     1 |    10 |     3   (0)| 00:00:01 |
|* 13 |      INDEX RANGE SCAN                    | CAT_I1          |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      ...
      OR_EXPAND(@"SEL$1" (1) (2) (3) (4))
      ...
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("MEMBER_ID"=:B3 AND "PARTNUMBER"=:B1)
   5 - filter(0=:B4)
   7 - access("MEMBER_ID" IS NULL AND "PARTNUMBER"=:B1)
       filter(("PARTNUMBER"=:B1 AND LNNVL("MEMBER_ID"=:B3)))
   8 - filter(0=:B2)
  10 - access("MEMBER_ID"=:B3 AND "PARTNUMBER" IS NULL)
       filter(LNNVL("PARTNUMBER"=:B1))
  11 - filter((0=:B4 AND 0=:B2))
  13 - access("MEMBER_ID" IS NULL AND "PARTNUMBER" IS NULL)
       filter(("PARTNUMBER" IS NULL AND LNNVL("PARTNUMBER"=:B1) AND LNNVL("MEMBER_ID"=:B3)))

The full tablescan didn’t appear – but it wasn’t eliminated by concatenation, it was eliminated by the “new” 12.2 variant: “OR expansion”. In this case the net effect is remarkably similar – we still have filter operations comparing :b2 and :b4 with zero, and we still have a scattering of lnnvl() function calls being used to discard rows we’ve already accessed, but the pattern is slightly different and we have a union all.

This change prompted me to go back to testing with just the original index (member_id, partnumber) … which took me back to the full tablescan until I added the hint /*+ or_expand */ to the query to get the following plan:
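For reference, the hinted statement is just the original query with the bare hint added – along these lines:

select  /*+ or_expand */
        catentry_id
from    catentry
where   (   partnumber = :b1
         or (0 = :b2 and partnumber is null)
        )
and     (   member_id = :b3
         or (0 = :b4 and member_id is null)
        )
;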


------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                 |       |       |    29 (100)|          |
|   1 |  VIEW                                    | VW_ORE_37A7142B |     4 |    52 |    29   (4)| 00:00:01 |
|   2 |   UNION-ALL                              |                 |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID           | CATENTRY        |     1 |    10 |     2   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN                    | CAT_I0          |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    FILTER                                |                 |       |       |            |          |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED  | CATENTRY        |     1 |    10 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                    | CAT_I0          |     1 |       |     1   (0)| 00:00:01 |
|*  8 |    FILTER                                |                 |       |       |            |          |
|   9 |     TABLE ACCESS BY INDEX ROWID BATCHED  | CATENTRY        |     1 |    10 |     2   (0)| 00:00:01 |
|* 10 |      INDEX RANGE SCAN                    | CAT_I0          |     1 |       |     1   (0)| 00:00:01 |
|* 11 |    FILTER                                |                 |       |       |            |          |
|* 12 |     TABLE ACCESS FULL                    | CATENTRY        |     1 |    10 |    23   (5)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("MEMBER_ID"=:B3 AND "PARTNUMBER"=:B1)
   5 - filter(0=:B4)
   7 - access("MEMBER_ID" IS NULL AND "PARTNUMBER"=:B1)
       filter(("PARTNUMBER"=:B1 AND LNNVL("MEMBER_ID"=:B3)))
   8 - filter(0=:B2)
  10 - access("MEMBER_ID"=:B3 AND "PARTNUMBER" IS NULL)
       filter(LNNVL("PARTNUMBER"=:B1))
  11 - filter((0=:B4 AND 0=:B2))
  12 - filter(("PARTNUMBER" IS NULL AND "MEMBER_ID" IS NULL AND LNNVL("PARTNUMBER"=:B1) AND
	      LNNVL("MEMBER_ID"=:B3)))

The plan shows “or expansion”, and highlights the most significant difference between concatenation and expansion – concatenation requires indexed access paths in all branches, or-expansion doesn’t.

At first sight this plan with its full tablescan at operation 12 might seem like a bad idea; but check operation 11, the guarding filter: the tablescan occurs only if both :b2 and :b4 are zero. Perhaps that special condition is never supposed to appear; perhaps, when it does, the query is going to do a lot of work whether or not it can use an index. The fact that you can now handle the original problem without adding or altering existing indexes – provided you can inject this or_expand hint – may be of significant benefit. (Of course, being able to recreate the original index with the columns in the reverse order would even avoid the necessity of worrying about the hint.)

tl;dr

Applications that push “if-then-else” decisions into the SQL and down to the optimizer are a pain in the backside; the performance problems they produce can sometimes be bypassed by the addition of extra indexes that might give you plans (possibly hinted) that report the use of the concatenation operation. In 12cR2 the optimizer has an improved strategy for damage limitation, “Cost-based Or Expansion”, that can produce very similar effects without the addition of extra indexes. These plans will report union all operations, referencing views with names like VW_ORE_xxxxxxxxx.

When you next upgrade you may find a few cases where you can get rid of indexes that were only created to work around defective application coding strategies. You may also want to hunt down any code where you’ve added use_concat hints and see if they can be removed, or if they should be replaced by or_expand, since the former hint will disable the latter feature.
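As a rough starting point for that hunt – a sketch that assumes the hints are visible either in stored PL/SQL source or in SQL currently in the library cache – you could simply search for the hint text:

select  owner, name, type, line, text
from    dba_source
where   upper(text) like '%USE_CONCAT%'
;

select  sql_id, child_number, substr(sql_text,1,60) sql_text
from    v$sql
where   upper(sql_text) like '%USE_CONCAT%'
;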

 

255 Columns

Wed, 2018-02-28 06:27

It’s the gift that keeps on giving – no matter how many problems you find there are always more waiting to be found. It’s been some time since I last wrote about tables with more than 255 columns, and I said then that there was more to come. In the last article I described how adding a few columns to a table, or updating a trailing column in a way that made the table’s used column count exceed 255, could result in some strange row-splitting behaviour – in this article I’m going to look at a critical side effect of that behaviour.

We’ll start with a simple model and a question – I’ll create a table with a thousand rows of data, then I’ll add a few columns to that table and update the last column in every row and check the undo and redo generated by the update.  Eventually I’m going to need a table with more than 255 columns so the script to create the table is rather long and I’ve posted it at the end of the article in case you want to experiment – in the following text I’ve omitted a few hundred lines of column declarations and values.


rem
rem     Script:         wide_table_5.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2018
rem

create table t1(
        col000,
        col001,
        col002,
        col003,
...
        col199,
/*
        col200,
        col201,
        col202,
        col203,
        col204,
        col205,
        col206,
        col207,
        col208,
        col209,
*/
        col210,
        col220,
...
        col247,
        col248,
        col249
)
as
with generator as (
        select 
                rownum id
        from dual
        connect by
                level <= 1e3 -- > comment to avoid WordPress format issue
)
select
        lpad(000,10,'0'),
        lpad(001,10,'0'),
        lpad(002,10,'0'),
        lpad(003,10,'0'),
...
        lpad(247,10,'0'),
        lpad(248,10,'0'),
        lpad(249,10,'0')
from
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

alter table t1 add(
        col250 varchar2(10),
        col251 varchar2(10),
        col252 varchar2(10),
        col253 varchar2(10),
        col254 varchar2(10),
        col255 varchar2(10),
        col256 varchar2(10),
        col257 varchar2(10),
        col258 varchar2(10),
        col259 varchar2(10)
)
;

update t1 set col259 = lpad('259',10,'0');
commit;


The table I’ve created has columns named from col000 to col249 populated with text matching the column’s numeric id – a total of 250 columns, except that for the first part of the test I’ve commented out the creation and population of 10 of those columns, giving me a total of 240 columns. Then I’ve added 10 more columns and updated the last of those 10. So, for the first part of the test, I’ve grown my table from 240 used columns to 250 used columns. Here are a few critical stats for that update from the session statistics view (v$sesstat joined to v$statname):
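(For reference, a minimal sketch of that join, restricted to the current session – the figures below are the difference between a snapshot taken before the update and one taken after it:)

select  sn.name, st.value
from    v$sesstat       st,
        v$statname      sn
where   st.sid        = sys_context('userenv','sid')
and     sn.statistic# = st.statistic#
and     st.value     != 0
order by
        sn.name
;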

 

Name                                                 Value
----                                                 -----
db block gets                                        1,023
consistent gets                                        555
db block changes                                     2,012
redo entries                                           907
redo size                                          262,692
undo change vector size                             76,052
table scan rows gotten                               1,002
table scan blocks gotten                               501
HSC Heap Segment Block Changes                       1,000

That’s pretty much the sort of thing we might expect. For a small update to a row it’s reasonable to see 250 – 300 bytes of redo of which about half is the redo for the undo. We can see that we’ve scanned 1,000 rows and made 2,000 block changes (one to the table block and one to an undo block for each row in the table). The table was 500 blocks of data (the avg_row_len is about 2640 bytes from 240 columns at 10 bytes + a length byte, which makes two rows per block with lots of spare space at the default 10 pctfree). You might have been expecting the number of redo entries to be a close match to the number of rows, but it’s a little short because the redo records for the first few updates would have been stored in private redo and turned into a single large redo entry.

So what do the stats look like when we start with 250 columns and grow to 260 columns, breaking through the dreaded 255 barrier?


Name                                                 Value
----                                                 -----
db block gets                                        9,503
consistent gets                                      1,894
db block changes                                     9,384
redo size                                        8,110,584
redo entries                                         5,778
undo change vector size                          3,780,260
table scan rows gotten                               1,002
table scan blocks gotten                               501
HSC Heap Segment Block Changes                       3,000

Such a simple change – with such a large impact.

The average undo is now nearly 3.5KB per row (and the rows were only about 2,860 bytes each anyway), the number of redo entries is up to 6 times the original, we’re averaging 3 “HSC Heap Segment Block Changes” per row instead of 1 and in total we’ve managed to do an extra 7,000 db block changes overall.

To get an idea of what was going on behind the scenes I dumped the redo log file. For the first part of the test most of the redo entries consisted of a pair of redo change vectors with OP codes 5.1 (modify undo block) and 11.5 (update row piece). The 5.1 corresponded to the undo needed to reverse out the effect of its paired 11.5 and both change vectors were pretty small.
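(If you want to reproduce the dump, the manual equivalent is roughly the following – switch log files just before the test, then after the commit find the current online log and dump it; the file name below is invented, use whatever v$logfile reports on your system:)

alter system switch logfile;

select  f.member
from    v$log l, v$logfile f
where   f.group# = l.group#
and     l.status = 'CURRENT'
;

alter system dump logfile '/u01/app/oracle/oradata/orcl/redo01.log';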

For the second part of the test I found a frequently repeated sequence of three consecutive redo records of paired redo vectors: (5.1, 11.4), (5.1, 11.2) and (5.1, 11.6). Again each 5.1 corresponds to the undo needed to reverse out the effect of its associated 11.x, and in this case the three “table” (level 11) OP codes are, in order: “lock row piece”, “insert row piece”, “overwrite row piece”. These 3 pairs occured 1,000 times each, accounting for roughly 3,000 of the redo entries reported.

On top of this there were 2,500 redo records holding redo change vectors with op code 13.22 (“State change on level 1 bitmap block”), and a few hundred with op code 13.21 (“Format page table block”), with just a few others bringing the total up to the 5,800 redo entries reported. (You might note that the last OP code is a clue that we added quite a lot of new blocks to the table as we executed the update – in fact the number of used table blocks grew by about 50%.)

We also had 500 redo change vectors of type 5.2 (“get undo header”). This number was significantly more than in the first part of the test because we had a lot more undo block changes (with bigger undo change vectors), which means we used a lot more undo blocks, and each time we move to a new undo block we update our transaction slot in the undo header with the current undo block address. I mentioned the pairing of 5.1 and 11.6 above – in fact 50% of those records recorded just two changes (5.1, 11.6), while the other 50% recorded three changes (5.2, 5.1, 11.6) – in effect every other row update resulted in Oracle demanding a new undo block.

I’ll explain in a moment why we have a very large number of “State change on level 1 bitmap block”; first let’s examine the key steps of how Oracle is processing a single row update – the sequence of 11.4, 11.2, 11.6:

  • 11.4: lock the head row piece – this seems to happen when the row is in (or going to turn into) multiple pieces; presumably because the piece that is going to change might not be the head piece. This is a tiny operation that generates undo and redo in the order of tens of bytes.
  • 11.2: our code extends the row to 260 columns, which means Oracle has to split it into two pieces of 5 columns and 255 columns respectively – so one of those row-pieces has to cause an insert to take place. Inserting a row may require a lot of redo, of course, but the undo for a (table-only) insert is, again, tiny.
  • 11.6: When Oracle has to split a wide row (>255 columns) it counts columns from the end, so the first row piece of our row is 5 columns and the second row piece (which is the one inserted by the 11.2 operation) is 255 columns. This means we have to overwrite the original row piece with a much shorter row piece. So we’re replacing 2,750 bytes (250 columns) with 55 bytes (5 columns), which means we have to write the contents of the “deleted” 250 columns to the undo tablespace – and that’s where most of the huge volume of undo comes from.

There are two extra points to note about the way Oracle handles the insert/overwrite steps. The length of our row exaggerates the effect, of course, but when we insert the trailing 255 columns the block they go to is probably going to change from nearly empty to half full, or half full to full – which means its bitmap entry has to be updated; similarly when the initial 250 columns are overwritten with just 5 columns a huge amount of free space will appear in the block which, again, means that the block’s bitmap entry has to be updated. This gives us a good idea of why we see so many 13.22 (“L1 bitmap state change”) redo change vectors.

The second point is that the numbers still don’t add up. Allowing a couple of hundred bytes of undo per row for the lock row and insert row, then 2,750 plus a few more for the overwrite, I’ve accounted for about 3,000 bytes per row updated – which leaves me short by about 800 bytes per row.  If I dump undo blocks I can see that the undo change vector for the overwrite is actually 3,628 bytes long rather than being “slightly over” the 2,750 for the before image of the overwritten row. Strangely I can see a couple of hundred bytes of what looks like damage in the row image, and there’s also a couple of hundred bytes of what looks like garbage (but probably isn’t) after the row image, but I’ve got no idea why there’s so much excess data in the record.

One day I (or someone else) may get around to looking very closely at why that particular undo record is a lot bigger than an observer might expect – but at least we can see that the space is there, and even if some of that volume could be made to disappear the problem of the massive undo relating to the overwrite and Oracle’s odd strategy of counting columns from the end is still going to be present, and there are probably some occasions when you need to know about this.

tl;dr

Once the number of columns in a table goes over 255 then a simple update to a “trailing” null column (i.e. one past the current end of the row) is likely to generate a much larger volume of undo and redo than you might expect. In particular the size of the undo record is likely to be equivalent to the volume of the last 255 columns of the table – and then a large fraction more.

The reason why this particular anomaly came to my attention is because a client had a problem upgrading an application that required them to add a few columns to a table and then update a couple of them. The size of the undo tablespace was 300 GB, the update (the only thing running at the time) should have added about 30 bytes to the length of each row, and the update should have affected 250 million rows. The process crashed after a few hours with “ORA-30036: unable to extend segment by %s in undo tablespace ‘%s'”. Even allowing for a “reasonable” overhead it seemed rather surprising that Oracle needed more than 1,200 bytes of undo space per row – but then I found the table was defined with 350 columns.

Solutions for this user: it’s an upgrade that’s allowed to take quite a long time, and the nature of the update is such that it would be possible to update in batches, committing after each batch. It would also be nice to review how the table was used to see if it could be rebuilt with a different column order to move all the unused columns to the end of the row – with a little luck the resulting table might find almost all the rows fitting into a single row piece, even after the upgrade.
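A minimal sketch of the batching approach (the table and column names here are invented, and in real life you would probably drive the batches by a key or rowid range rather than re-scanning for not-yet-updated rows on every pass):

begin
        loop
                update  big_table
                set     new_col = lpad('x',30)
                where   new_col is null
                and     rownum <= 100000;

                exit when sql%rowcount = 0;
                commit;
        end loop;
        commit;
end;
/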

Footnote

If you want to experiment, here’s the whole script to create the table, insert some rows, then add a few more columns and update one of them. It’s very long, and not in the least bit exciting, but it may save you a little typing time if you want to use it.


create table t1(
        col000,
        col001,
        col002,
        col003,
        col004,
        col005,
        col006,
        col007,
        col008,
        col009,
        col010,
        col011,
        col012,
        col013,
        col014,
        col015,
        col016,
        col017,
        col018,
        col019,
        col020,
        col021,
        col022,
        col023,
        col024,
        col025,
        col026,
        col027,
        col028,
        col029,
        col030,
        col031,
        col032,
        col033,
        col034,
        col035,
        col036,
        col037,
        col038,
        col039,
        col040,
        col041,
        col042,
        col043,
        col044,
        col045,
        col046,
        col047,
        col048,
        col049,
        col050,
        col051,
        col052,
        col053,
        col054,
        col055,
        col056,
        col057,
        col058,
        col059,
        col060,
        col061,
        col062,
        col063,
        col064,
        col065,
        col066,
        col067,
        col068,
        col069,
        col070,
        col071,
        col072,
        col073,
        col074,
        col075,
        col076,
        col077,
        col078,
        col079,
        col080,
        col081,
        col082,
        col083,
        col084,
        col085,
        col086,
        col087,
        col088,
        col089,
        col090,
        col091,
        col092,
        col093,
        col094,
        col095,
        col096,
        col097,
        col098,
        col099,
        col100,
        col101,
        col102,
        col103,
        col104,
        col105,
        col106,
        col107,
        col108,
        col109,
        col110,
        col111,
        col112,
        col113,
        col114,
        col115,
        col116,
        col117,
        col118,
        col119,
        col120,
        col121,
        col122,
        col123,
        col124,
        col125,
        col126,
        col127,
        col128,
        col129,
        col130,
        col131,
        col132,
        col133,
        col134,
        col135,
        col136,
        col137,
        col138,
        col139,
        col140,
        col141,
        col142,
        col143,
        col144,
        col145,
        col146,
        col147,
        col148,
        col149,
        col150,
        col151,
        col152,
        col153,
        col154,
        col155,
        col156,
        col157,
        col158,
        col159,
        col160,
        col161,
        col162,
        col163,
        col164,
        col165,
        col166,
        col167,
        col168,
        col169,
        col170,
        col171,
        col172,
        col173,
        col174,
        col175,
        col176,
        col177,
        col178,
        col179,
        col180,
        col181,
        col182,
        col183,
        col184,
        col185,
        col186,
        col187,
        col188,
        col189,
        col190,
        col191,
        col192,
        col193,
        col194,
        col195,
        col196,
        col197,
        col198,
        col199,
        col200,
        col201,
        col202,
        col203,
        col204,
        col205,
        col206,
        col207,
        col208,
        col209,
        col210,
        col211,
        col212,
        col213,
        col214,
        col215,
        col216,
        col217,
        col218,
        col219,
        col220,
        col221,
        col222,
        col223,
        col224,
        col225,
        col226,
        col227,
        col228,
        col229,
        col230,
        col231,
        col232,
        col233,
        col234,
        col235,
        col236,
        col237,
        col238,
        col239,
        col240,
        col241,
        col242,
        col243,
        col244,
        col245,
        col246,
        col247,
        col248,
        col249
)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e3 -- > comment to avoid WordPress format issue
)
select
        lpad(000,10,'0'),
        lpad(001,10,'0'),
        lpad(002,10,'0'),
        lpad(003,10,'0'),
        lpad(004,10,'0'),
        lpad(005,10,'0'),
        lpad(006,10,'0'),
        lpad(007,10,'0'),
        lpad(008,10,'0'),
        lpad(009,10,'0'),
        lpad(010,10,'0'),
        lpad(011,10,'0'),
        lpad(012,10,'0'),
        lpad(013,10,'0'),
        lpad(014,10,'0'),
        lpad(015,10,'0'),
        lpad(016,10,'0'),
        lpad(017,10,'0'),
        lpad(018,10,'0'),
        lpad(019,10,'0'),
        lpad(020,10,'0'),
        lpad(021,10,'0'),
        lpad(022,10,'0'),
        lpad(023,10,'0'),
        lpad(024,10,'0'),
        lpad(025,10,'0'),
        lpad(026,10,'0'),
        lpad(027,10,'0'),
        lpad(028,10,'0'),
        lpad(029,10,'0'),
        lpad(030,10,'0'),
        lpad(031,10,'0'),
        lpad(032,10,'0'),
        lpad(033,10,'0'),
        lpad(034,10,'0'),
        lpad(035,10,'0'),
        lpad(036,10,'0'),
        lpad(037,10,'0'),
        lpad(038,10,'0'),
        lpad(039,10,'0'),
        lpad(040,10,'0'),
        lpad(041,10,'0'),
        lpad(042,10,'0'),
        lpad(043,10,'0'),
        lpad(044,10,'0'),
        lpad(045,10,'0'),
        lpad(046,10,'0'),
        lpad(047,10,'0'),
        lpad(048,10,'0'),
        lpad(049,10,'0'),
        lpad(050,10,'0'),
        lpad(051,10,'0'),
        lpad(052,10,'0'),
        lpad(053,10,'0'),
        lpad(054,10,'0'),
        lpad(055,10,'0'),
        lpad(056,10,'0'),
        lpad(057,10,'0'),
        lpad(058,10,'0'),
        lpad(059,10,'0'),
        lpad(060,10,'0'),
        lpad(061,10,'0'),
        lpad(062,10,'0'),
        lpad(063,10,'0'),
        lpad(064,10,'0'),
        lpad(065,10,'0'),
        lpad(066,10,'0'),
        lpad(067,10,'0'),
        lpad(068,10,'0'),
        lpad(069,10,'0'),
        lpad(070,10,'0'),
        lpad(071,10,'0'),
        lpad(072,10,'0'),
        lpad(073,10,'0'),
        lpad(074,10,'0'),
        lpad(075,10,'0'),
        lpad(076,10,'0'),
        lpad(077,10,'0'),
        lpad(078,10,'0'),
        lpad(079,10,'0'),
        lpad(080,10,'0'),
        lpad(081,10,'0'),
        lpad(082,10,'0'),
        lpad(083,10,'0'),
        lpad(084,10,'0'),
        lpad(085,10,'0'),
        lpad(086,10,'0'),
        lpad(087,10,'0'),
        lpad(088,10,'0'),
        lpad(089,10,'0'),
        lpad(090,10,'0'),
        lpad(091,10,'0'),
        lpad(092,10,'0'),
        lpad(093,10,'0'),
        lpad(094,10,'0'),
        lpad(095,10,'0'),
        lpad(096,10,'0'),
        lpad(097,10,'0'),
        lpad(098,10,'0'),
        lpad(099,10,'0'),
        lpad(100,10,'0'),
        lpad(101,10,'0'),
        lpad(102,10,'0'),
        lpad(103,10,'0'),
        lpad(104,10,'0'),
        lpad(105,10,'0'),
        lpad(106,10,'0'),
        lpad(107,10,'0'),
        lpad(108,10,'0'),
        lpad(109,10,'0'),
        lpad(110,10,'0'),
        lpad(111,10,'0'),
        lpad(112,10,'0'),
        lpad(113,10,'0'),
        lpad(114,10,'0'),
        lpad(115,10,'0'),
        lpad(116,10,'0'),
        lpad(117,10,'0'),
        lpad(118,10,'0'),
        lpad(119,10,'0'),
        lpad(120,10,'0'),
        lpad(121,10,'0'),
        lpad(122,10,'0'),
        lpad(123,10,'0'),
        lpad(124,10,'0'),
        lpad(125,10,'0'),
        lpad(126,10,'0'),
        lpad(127,10,'0'),
        lpad(128,10,'0'),
        lpad(129,10,'0'),
        lpad(130,10,'0'),
        lpad(131,10,'0'),
        lpad(132,10,'0'),
        lpad(133,10,'0'),
        lpad(134,10,'0'),
        lpad(135,10,'0'),
        lpad(136,10,'0'),
        lpad(137,10,'0'),
        lpad(138,10,'0'),
        lpad(139,10,'0'),
        lpad(140,10,'0'),
        lpad(141,10,'0'),
        lpad(142,10,'0'),
        lpad(143,10,'0'),
        lpad(144,10,'0'),
        lpad(145,10,'0'),
        lpad(146,10,'0'),
        lpad(147,10,'0'),
        lpad(148,10,'0'),
        lpad(149,10,'0'),
        lpad(150,10,'0'),
        lpad(151,10,'0'),
        lpad(152,10,'0'),
        lpad(153,10,'0'),
        lpad(154,10,'0'),
        lpad(155,10,'0'),
        lpad(156,10,'0'),
        lpad(157,10,'0'),
        lpad(158,10,'0'),
        lpad(159,10,'0'),
        lpad(160,10,'0'),
        lpad(161,10,'0'),
        lpad(162,10,'0'),
        lpad(163,10,'0'),
        lpad(164,10,'0'),
        lpad(165,10,'0'),
        lpad(166,10,'0'),
        lpad(167,10,'0'),
        lpad(168,10,'0'),
        lpad(169,10,'0'),
        lpad(170,10,'0'),
        lpad(171,10,'0'),
        lpad(172,10,'0'),
        lpad(173,10,'0'),
        lpad(174,10,'0'),
        lpad(175,10,'0'),
        lpad(176,10,'0'),
        lpad(177,10,'0'),
        lpad(178,10,'0'),
        lpad(179,10,'0'),
        lpad(180,10,'0'),
        lpad(181,10,'0'),
        lpad(182,10,'0'),
        lpad(183,10,'0'),
        lpad(184,10,'0'),
        lpad(185,10,'0'),
        lpad(186,10,'0'),
        lpad(187,10,'0'),
        lpad(188,10,'0'),
        lpad(189,10,'0'),
        lpad(190,10,'0'),
        lpad(191,10,'0'),
        lpad(192,10,'0'),
        lpad(193,10,'0'),
        lpad(194,10,'0'),
        lpad(195,10,'0'),
        lpad(196,10,'0'),
        lpad(197,10,'0'),
        lpad(198,10,'0'),
        lpad(199,10,'0'),
        lpad(200,10,'0'),
        lpad(201,10,'0'),
        lpad(202,10,'0'),
        lpad(203,10,'0'),
        lpad(204,10,'0'),
        lpad(205,10,'0'),
        lpad(206,10,'0'),
        lpad(207,10,'0'),
        lpad(208,10,'0'),
        lpad(209,10,'0'),
        lpad(210,10,'0'),
        lpad(211,10,'0'),
        lpad(212,10,'0'),
        lpad(213,10,'0'),
        lpad(214,10,'0'),
        lpad(215,10,'0'),
        lpad(216,10,'0'),
        lpad(217,10,'0'),
        lpad(218,10,'0'),
        lpad(219,10,'0'),
        lpad(220,10,'0'),
        lpad(221,10,'0'),
        lpad(222,10,'0'),
        lpad(223,10,'0'),
        lpad(224,10,'0'),
        lpad(225,10,'0'),
        lpad(226,10,'0'),
        lpad(227,10,'0'),
        lpad(228,10,'0'),
        lpad(229,10,'0'),
        lpad(230,10,'0'),
        lpad(231,10,'0'),
        lpad(232,10,'0'),
        lpad(233,10,'0'),
        lpad(234,10,'0'),
        lpad(235,10,'0'),
        lpad(236,10,'0'),
        lpad(237,10,'0'),
        lpad(238,10,'0'),
        lpad(239,10,'0'),
        lpad(240,10,'0'),
        lpad(241,10,'0'),
        lpad(242,10,'0'),
        lpad(243,10,'0'),
        lpad(244,10,'0'),
        lpad(245,10,'0'),
        lpad(246,10,'0'),
        lpad(247,10,'0'),
        lpad(248,10,'0'),
        lpad(249,10,'0')
from
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

select
        avg_row_len, num_rows,  blocks,
        num_rows / trunc(8000/avg_row_len) estimated_blocks
from
        user_tables
where
        table_name = 'T1'
;

prompt  =================
prompt  Add a few columns
prompt  =================

alter table t1 add(
        col250 varchar2(10),
        col251 varchar2(10),
        col252 varchar2(10),
        col253 varchar2(10),
        col254 varchar2(10),
        col255 varchar2(10),
        col256 varchar2(10),
        col257 varchar2(10),
        col258 varchar2(10),
        col259 varchar2(10)
)
;

-- alter system switch logfile;

update t1 set col259 = lpad('259',10,'0');
commit;

-- execute dump_log

P.S. if you do investigate and solve the question of the excess space in the undo record, and the odd content in the row “before image” then do let me know. (Hint: part of the excess may be a “null columns” map – but that still leaves plenty to account for.)

Match_recognize

Mon, 2018-02-26 08:59

In the spirit of Cary Millsap’s comment: “The fastest way to do anything is to not do it at all”, here’s my take (possibly not an original one) on solving problems:

“The best time to solve a problem is before it has happened.”

I spend quite a lot of my “non-contact” time thinking about boundary cases, feature collisions, contention issues, and any other things that could go wrong when you start to implement real systems with (new) Oracle features. The benefit of doing this, of course, is that when I’m looking at a client’s system I can often solve problems because I recognise symptoms that I’ve previously created “in the lab”. The strange thing about this is that there have been times when I’ve pushed Oracle to a breaking point, documented it, and then dismissed the threat because “no one would do that in real life” only to find that someone has done it in real life.

All this is just a preamble to a demonstration of a threat with a terrific feature that is just beginning to gain greater acceptance as a solution to some interesting problems – and the demonstration is going to exaggerate the problem to a level that (probably) won’t appear in a production. The driving example appeared as a question on the OTN/ODC database forum:

“I need customers who have done a transaction in September but not in October.”

There are obviously many ways to address this type of requirement (my first thought was to use the MINUS operator), and a few questions you might ask before trying to address it, but the OP had supplied some data to play with, consisting of just a few rows of a table with three columns and some data restricted to just one year, and one solution offered was a very simple query using match_recognize():


CREATE TABLE TEST_TABLE   
  ( T_ID NUMBER, -- trans-id  
    CUST_ID NUMBER,   
    TRANS_DT DATE  
  ) ;  
                  
Insert into TEST_TABLE (T_ID,CUST_ID,TRANS_DT) values (1,100,to_date('12-SEP-17','DD-MON-RR'));  
Insert into TEST_TABLE (T_ID,CUST_ID,TRANS_DT) values (2,100,to_date('12-OCT-17','DD-MON-RR'));  
Insert into TEST_TABLE (T_ID,CUST_ID,TRANS_DT) values (3,200,to_date('12-SEP-17','DD-MON-RR'));  
Insert into TEST_TABLE (T_ID,CUST_ID,TRANS_DT) values (4,300,to_date('12-OCT-17','DD-MON-RR'));  
Insert into TEST_TABLE (T_ID,CUST_ID,TRANS_DT) values (5,400,to_date('12-JAN-17','DD-MON-RR'));  
Insert into TEST_TABLE (T_ID,CUST_ID,TRANS_DT) values (6,500,to_date('12-OCT-17','DD-MON-RR'));  
Insert into TEST_TABLE (T_ID,CUST_ID,TRANS_DT) values (7,500,to_date('12-MAR-17','DD-MON-RR'));  
Insert into TEST_TABLE (T_ID,CUST_ID,TRANS_DT) values (8,600,to_date('12-SEP-17','DD-MON-RR'));  
Insert into TEST_TABLE (T_ID,CUST_ID,TRANS_DT) values (9,600,to_date('12-JUL-17','DD-MON-RR'));  

commit;

select * from test_table
match_recognize
(
  partition by cust_id
  order by trans_dt
  pattern( x+ y* $)
  define
    x as extract(month from trans_dt)  = 9,
    y as extract(month from trans_dt) != 10
);
 
   CUST_ID
----------
       200
       600
      

The obvious benefit of this solution over a solution involving a set-wise MINUS is that it need only scan the data set once (whereas the MINUS strategy will be scanning it twice with a select distinct in each scan) – but it’s a solution that is likely to be unfamiliar to many people and may need a little explanation.
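For comparison, the MINUS approach mentioned above might look something like this (assuming, as the supplied data does, that only one year’s data is involved):

select  cust_id
from    test_table
where   extract(month from trans_dt) = 9
minus
select  cust_id
from    test_table
where   extract(month from trans_dt) = 10
;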

The partition by cust_id order by trans_dt means we sort the data by those two columns, breaking on cust_id. Then for each cust_id we walk through the data looking for a pattern which is defined as: “one or more rows where the month is September followed by zero or more rows where the month is NOT October followed by the end of the set for the customer”. The SQL leaves many details to default so the result set is just the cust_id column and only one row per occurrence of the pattern (which, given the data set, can occur at most once per customer).

For a cust_id that shows a matching pattern the work we will have done is:

  • Walk through rows for Jan to Aug until we reach the first September – which is the start of pattern
  • Keep on walking through to the last of the Septembers – which is a partial match
  • One of:
      • Walk through zero rows of November and December and reach the end of cust_id
      • Walk through one or more rows of November and/or December then reach the end of cust_id
  • Record the end of pattern by reporting one row
  • Move on to next cust_id

The excitement starts when we think about a cust_id that doesn’t have a matching pattern – and for that I’m going to generate a new, extreme, data set.


rem
rem     Script:         match_recognize_07.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2018
rem

create table t1
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        99                              cust_id,
        to_date('01-Sep-2017')          trans_dt,
        lpad(rownum,1000,'0')           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;

update t1
set
        trans_dt = to_date('01-Oct-2017','dd-mon-yyyy')
where
        rownum = 1
;

begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

select  *
from    (
        select 
                t1.*,
                extract(year from trans_dt) yr, 
                extract(month from trans_dt) mth
        from
                t1
        )
match_recognize
(
        partition by cust_id
        order by trans_dt
        measures
                padding as t1_padding
        pattern( x+  y*  $ )
        define
                x as mth = 9,
                y as mth != 10
);

I’ve moved the calculation of month number from the define clause into an in-line view purely to make the match_recognize() clause a little tidier.

I’ve created a table with just one customer with 100,000 transactions on 1st September 2017, then I’ve updated one row from September to October. Thanks to that one row Oracle is not going to be able to find the requested pattern. I’ve added a padding column of 1,000 characters to the table and included it in the measures that I want to select, so Oracle will have to sort roughly 100MB of data (100,000 rows at roughly 1KB per row) before it starts walking the data to find matches – and, though it’s not visible in the script, the workarea settings mean the session won’t be allowed to expand its PGA to accommodate the whole 100MB.

Test 1 – comment out the update and see how long it takes to produce a result: 0.67 seconds, and the padding value reported was the last one from the pattern.
Test 2 – put the update back in place and try again:

After running for 46 seconds with no sign of a result I interrupted the query; these are some figures from a snapshot of the session stats:

Name                                                 Value
----                                                 -----
CPU used when call started                           3,662
DB time                                              3,711
user I/O wait time                                   1,538
consistent gets                                     14,300
physical reads direct                            1,298,939
physical read IO requests                          736,478
physical read bytes                         10,640,908,288      
physical writes                                     25,228
physical writes direct                              25,228
physical reads direct temporary tablespace       1,298,939
physical writes direct temporary tablespace         25,228
table scan rows gotten                             100,000
table scan blocks gotten                            14,286

  • I’ve scanned a table of 14,286 blocks to find 100,000 rows.
  • I’ve sorted and spilled to disc, using roughly 25,000 blocks of direct path writes and reads to do the sort.
  • Then I’ve spent the rest of the time burning up CPU and reading 1.27 million blocks from the temporary tablespace trying to find a match.

The way that basic pattern matching works on a match failure is to go back to the row after the one where the current match attempt started, and begin all over again. So in this example, after dumping 100MB of Septembers to temp Oracle started at row 1, read 99,999 rows, then found the October that failed the match; so it went to row 2, read 99,998 rows, then found the October that failed the match; so it went to row 3 and so on. Every time it went back to (nearly) the beginning it had to start re-reading those 100,000 rows from temp because the session wasn’t allowed to keep the whole 100MB in memory.

You need to avoid defining a pattern that has to scan large volumes of data to identify a single occurrence of the pattern if the matching process is likely to fail. Even if you can keep the appropriate volume of data in memory for the entire time and avoid a catastrophic volume of reads from the temporary tablespace you can still see a huge amount of CPU being used to process the data – when I reduced the table from 100,000 rows to 10,000 rows it still took me 99 CPU seconds to run the query.
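For this particular requirement, for example, a simple single-pass aggregation (just a sketch – not a general-purpose substitute for match_recognize()) avoids the problem of repeated match attempts completely:

select  cust_id
from    test_table
group by
        cust_id
having  count(case when extract(month from trans_dt) =  9 then 1 end) > 0
and     count(case when extract(month from trans_dt) = 10 then 1 end) = 0
;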

tl;dr

Match_recognize() is a terrific tool, but you must remember two important details about the default behaviour when you think about using it:

  • You will sort a volume of data that is the number of input rows multiplied by the total length of the measures/partition output.
  • If you have a long sequence of rows that ends up failing to match a pattern Oracle goes back to the row after the start of the previous match attempt.

With the usual proviso that “large”, “small” etc. are all relative: keep the data volume small, and try to define patterns that will be short runs of rows.

Do note, however, that I engineered this example to produce a catastrophe. There are many non-default actions you can choose to minimise the workload you’re likely to produce with match_recognize(), and if you just spare a little time to think about worst case events you probably won’t need to face a scenario like this in a real production environment.

See also:

Part 6 (which includes a list of earlier installments) of an introductory series to match_recognize() by Keith Laker.

A pdf file of Keith Laker’s presentation on match_recognize(), including some technical implementation details.

 

Huge Pages

Thu, 2018-02-22 03:03

A useful quick summary from Neil Chandler replying to a thread on Oracle-L:

Topic: RAC install on Linux

You should always be using Hugepages.

They give a minor performance improvement and a significant memory saving in terms of the amount of memory needed to handle the pages – fewer Translation Lookaside Buffer (TLB) entries, which also means fewer TLB misses (which are expensive).

You are handling the memory chopped up into 2MB pieces instead of 4K. But you also have a single shared memory TLB for Hugepages.

The kernel has less work to do, bookkeeping fewer pointers in the TLB.

You also have contiguous memory allocation and it can’t be swapped.

If you are having problems with Hugepages, you have probably overallocated them (I’ve seen this several times at clients so it’s not uncommon). Hugepages can *only* be used for your SGAs. All of your SGAs should fit into the Hugepages and that should generally be no more than about 60% of the total server memory (but there are exceptions), leaving plenty of “normal” memory (small pages) for PGA, O/S and other stuff like monitoring agents.

As an added bonus, AMM can’t use Hugepages, so you are forced to use ASMM. AMM doesn’t work well and has been kind-of deprecated by Oracle anyway – dbca won’t let you set up AMM if the server has more than 4GB of memory.
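As a sketch of the database side of that advice (the sizes are purely illustrative and would have to be matched to your server; you would also need a suitable vm.nr_hugepages setting at the O/S level plus an instance restart), switching from AMM to ASMM and insisting that the SGA uses HugePages might look like this:

alter system set   memory_target        = 0      scope = spfile;
alter system reset memory_max_target             scope = spfile sid = '*';
alter system set   sga_target           = 20G    scope = spfile;
alter system set   pga_aggregate_target = 8G     scope = spfile;
alter system set   use_large_pages      = 'ONLY' scope = spfile;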

 

Interval Partition Problem

Wed, 2018-02-21 02:40

Assume you’ve got a huge temporary tablespace, there’s plenty of space in your favourite tablespace, you’ve got a very boring, simple table you want to copy and partition, and no-one and nothing is using the system. Would you really expect a (fairly) ordinary “create table t2 as select * from t1” to end with an Oracle error “ORA-1652: unable to extend temp segment by 128 in tablespace TEMP”? That’s the temporary tablespace that’s out of space, not the target tablespace for the copy.

Here’s a sample data set (tested on 11.2.0.4 and 12.1.0.2) to demonstrate the surprise – you’ll need about 900MB of space by the time the entire model has run to completion:

rem
rem     Script:         pt_interval_threat_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2018
rem

column tomorrow new_value m_tomorrow
select to_char(sysdate,'dd-mon-yyyy') tomorrow from dual;

create table t1
as
with g as (
        select rownum id
        from dual
        connect by level <= 2e3
)
select
        rownum id,
        trunc(sysdate) + g2.id  created,
        rpad('x',50)            padding
from
        g g1,
        g g2
where
        rownum <= 4e6 -- > comment to avoid WordPress format mess
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

I’ve created a table of 4 million rows, covering 2,000 dates out into the future starting from sysdate+1 (tomorrow). As you can see there’s nothing in the slightest bit interesting, unusual, or exciting about the data types and content of the table.

I said my “create table as select” was fairly ordinary – but it’s actually a little bit out of the way because it’s going to create a partitioned copy of this table.


execute snap_my_stats.start_snap

create table t2
partition by range(created)
interval(numtodsinterval(7, 'day'))
(
        partition p_start       values less than (to_date('&m_tomorrow','dd-mon-yyyy'))
)
storage(initial 1M)
nologging
as
select
        *
from
        t1
;

set serveroutput on
execute snap_my_stats.end_snap

I’ve created the table as a range-partitioned table with an interval() declared. Conveniently I need only mention the partitioning column by name in the declaration, rather than listing all the columns with their types, and I’ve only specified a single starting partition. Since the interval is 7 days and the data spans 2,000 days I’m going to end up with nearly 290 partitions added.

There’s no guarantee that you will see the ORA-01652 error when you run this test – the data size is rather small and your machine may have sufficient other resources to hide the problem even when you’re looking for it – but the person who reported the problem on the OTN/ODC database forum was copying a table of 2.5 Billion rows using about 200 GB of storage, so size is probably important, hence the 4 million rows as a starting point on my small system.

Of course, hitting an ORA-01652 on TEMP when doing a simple “create as select” is such an unlikely sounding error that you don’t necessarily have to see it actually happen; all you need to see (at least as a starting point in a small model) is TEMP being used unexpectedly so, for my first test (on 11.2.0.4), I’ve included some code to calculate and report changes in the session stats – that’s the calls to the package snap_my_stats. Here are some of the more interesting results:


---------------------------------
Session stats - 20-Feb 16:58:24
Interval:-  14 seconds
---------------------------------
Name                                                                     Value
----                                                                     -----
table scan rows gotten                                               4,000,004
table scan blocks gotten                                                38,741

session pga memory max                                             181,338,112

sorts (rows)                                                         2,238,833

physical reads direct temporary tablespace                              23,313
physical writes direct temporary tablespace                             23,313

The first couple of numbers show the 4,000,000 rows being scanned from 38,741 table blocks – and that’s not a surprise. But for a simple copy the 181MB of PGA memory we’ve acquired is a little surprising, though less so when we see that we’ve sorted 2.2M rows, and then ended up spilling 23,313 blocks to the temporary tablespace. But why are we sorting anything – what are those rows ?

My first thought was that there was a bug in some recursive SQL that was trying to define or identify dynamically created partitions, or maybe something in the space management code trying to find free space, so the obvious step was to enable extended tracing and look for any recursive statements that were running a large number of times or doing a lot of work. There weren’t any – and the trace file (particularly the detailed wait events) suggested the problem really was purely to do with the CTAS itself; so I ran the code again enabling events 10032 and 10033 (the sort traces) and found the following:


---- Sort Statistics ------------------------------
Initial runs                              1
Input records                             2140000
Output records                            2140000
Disk blocks 1st pass                      22292
Total disk blocks used                    22294
Total number of comparisons performed     0
Temp segments allocated                   1
Extents allocated                         175
Uses version 1 sort
Uses asynchronous IO

One single operation had resulted in Oracle sorting 2.14 million rows (but not making any comparisons!) – and the only table in the entire system with enough rows to do that was my source table! Oracle seems to be sorting a large fraction of the data for no obvious reason before inserting it.

  • Why, and why only 2.14M out of 4M ?
  • Does it do the same on 12.1.0.2 (yes), what about 12.2.0.1 (no – hurrah: unless it just needs a larger data set!).
  • Is there any clue about this on MoS (yes Bug 17655392 – though that one is erroneously, I think, flagged as “closed not a bug”)
  • Is there a workaround ? (Yes – I think so).

Playing around and trying to work out what’s happening, the obvious pointers are the large memory allocation and the “incomplete” spill to disc – what would happen if I fiddled around with workarea sizing – switching it to manual, say, or setting the pga_aggregate_target to a low value? At one point I got results showing 19M rows (that’s not a typo, it really was close to 5 times the number of rows in the table) sorted with a couple of hundred thousand blocks of TEMP used – the 10033 trace showed 9 consecutive passes (that I can’t explain) as the code executed, from which I’ve extracted the row counts, temp blocks used, and number of comparisons made:


Input records                             3988000
Total disk blocks used                    41544
Total number of comparisons performed     0

Input records                             3554000
Total disk blocks used                    37023
Total number of comparisons performed     0

Input records                             3120000
Total disk blocks used                    32502
Total number of comparisons performed     0

Input records                             2672000
Total disk blocks used                    27836
Total number of comparisons performed     0

Input records                             2224000
Total disk blocks used                    23169
Total number of comparisons performed     0

Input records                             1762000
Total disk blocks used                    18357
Total number of comparisons performed     0

Input records                             1300000
Total disk blocks used                    13544
Total number of comparisons performed     0

Input records                             838000
Total disk blocks used                    8732
Total number of comparisons performed     0

Input records                             376000
Total disk blocks used                    3919
Total number of comparisons performed     0

There really doesn’t seem to be any good reason why Oracle should do any sorting of the data (and maybe it wasn’t, given that the total number of comparisons performed was zero in this case) – except, perhaps, to allow it to do bulk inserts into each partition in turn or, possibly, to avoid creating an entire new partition at exactly the moment it finds just the first row that needs to go into a new partition. Thinking along these lines I decided to pre-create all the necessary partitions just in case this made any difference – the code is at the end of the blog note. Another idea was to create the table empty (with, and without, pre-created partitions), then do an “insert /*+ append */” of the data.

Nothing changed (much – though the number of rows sorted kept varying).

And then — it all started working perfectly with virtually no rows reported sorted and no I/O to the temporary tablespace !

Fortunately I thought of looking at v$memory_resize_ops and found that the automatic memory management had switched a lot of memory to the PGA, allowing Oracle to do whatever it needed to do completely in memory without reporting any sorting. A quick re-start of the instance fixed that “workaround”.

Still struggling with finding a reasonable workaround I decided to see if the same anomaly would appear if the table were range partitioned but didn’t have an interval clause. This meant I had to precreate all the necessary partitions, of course – which I did by starting with an interval partitioned table, letting Oracle figure out which partitions to create, then disabling the interval feature – again, see the code at the end of this note.

The results: no rows sorted on the insert, no writes to temp. Unless it’s just a question of needing even more data to reproduce the problem with simple range partitioned tables, it looks as if there’s a problem somewhere in the code for interval partitioned tables and all you have to do to work around it is precreate loads of partitions, disable intervals, load, then re-enable the intervals.
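
Pulling those steps together, here's a minimal sketch of the workaround using the objects from this model (the "empty" create and the partition pre-creation code are both shown in the footnote below):

-- create t2 empty with the interval clause (see the footnote code: "rownum <= 0")
-- pre-create the partitions by locking them (see the footnote code)

alter table t2 set interval();                                  -- disable intervals

insert /*+ append */ into t2 select * from t1;
commit;

alter table t2 set interval(numtodsinterval(7, 'day'));         -- re-enable intervals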

Footnote:

Here’s the “quick and dirty” code I used to generate the t2 table with precreated partitions:


create table t2
partition by range(created)
interval(numtodsinterval(7, 'day'))
(
        partition p_start values less than (to_date('&m_tomorrow','dd-mon-yyyy'))
)
storage(initial 1M)
nologging
monitoring
as
select
        *
from
        t1
where
        rownum <= 0
;


declare
        m_max_date      date;
begin
        select  max(created)
        into    m_max_date
        from    t1
        ;

        
        for i in 1..m_max_date - trunc(sysdate) loop
                dbms_output.put(
                        to_char(trunc(sysdate) + i,'dd-mon-yyyy') || chr(9)
                );
                execute immediate
                        'lock table t2 partition for ('''  ||
                        to_char(trunc(sysdate) + i,'dd-mon-yyyy') ||
                        ''') in exclusive mode'
                ;
        end loop;
        dbms_output.new_line();
end;
/

prompt  ========================
prompt  How to disable intervals
prompt  ========================

alter table t2 set interval();

The code causes partitions to be created by locking the relevant partition for each date between the minimum and maximum in the t1 table; locking the partition is enough to create it if it doesn’t already exist. The code is a little wasteful since it locks each partition 7 times as we walk through the dates – but it’s only a quick demo for a model, and for copying a very large table the wastage would probably be very small compared to the work of doing the actual data copy. Obviously one could be more sophisticated and limit the code to locking and creating only the partitions needed, and only locking them once each.
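
For example, here's a sketch (using the same objects) that steps through the dates seven days at a time so that each interval partition is locked – and therefore created – just once:

declare
        m_max_date      date;
begin
        select  max(created)
        into    m_max_date
        from    t1
        ;

        -- one lock (hence one partition creation) per 7-day interval
        for i in 0 .. trunc((m_max_date - (trunc(sysdate) + 1)) / 7) loop
                execute immediate
                        'lock table t2 partition for ('''  ||
                        to_char(trunc(sysdate) + 1 + 7 * i,'dd-mon-yyyy') ||
                        ''') in exclusive mode'
                ;
        end loop;
end;
/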

 

Taking Notes – 2

Tue, 2018-02-20 05:08

[Originally written August 2015, but not previously published]

If I’m taking notes in a presentation that you’re giving there are essentially four possible reasons:

  • You’ve said something interesting that I didn’t know and I’m going to check it and think about the consequences
  • You’ve said something that I knew but you’ve said it in a way that made me think of some possible consequences that I need to check
  • You’ve said something that I think is wrong or out of date and I need to check it
  • You’ve said something that has given me a brilliant idea for solving a problem I’ve had to work around in the past and I need to work out the details

Any which way, if I’m taking notes it means I’ve probably just added a few more hours of work to my todo list.

Footnote

“Checking” can include:

  • having a chat
  • reading the manuals
  • finding a recent Oracle white-paper
  • searching MoS
  • building some models

Philosophy

Tue, 2018-02-20 05:03

Here’s a note I’ve just re-discovered – at the time I was probably planning to extend it into a longer article but I’ve decided to publish the condensed form straight away.

In a question to the Oak Table a couple of years ago (May 2015) Cary Millsap asked the following:

If you had an opportunity to tell a wide audience of system owners, users, managers, project leaders, system architects, DBAs, and developers “The most important things you should know about Oracle” what would you tell them?

I imagine that since then Cary has probably discussed the pros and cons of some of the resulting thoughts in one of his excellent presentations on how to do the right things, but this was my quick response:

If I had to address them all at once it would be time to go more philosophical than technical.

The single most important point: Oracle is a very large, complex, and flexible product. It doesn’t matter where you are approaching it from you will not have enough information on your own to make best use of it. You have to talk to your peer group to get alternative ideas, and you have to talk to the people at least one step either side of you on the technology chain (dev to dba, dba to sysadmin, Architect to dev, dba to auditor etc.) to understand options and consequences. Create 4 or 5 scenarios of how your system should behave and then get other people – and not just your peer group – to identify their advantages and threats.

Assumptions

Tue, 2018-02-20 02:57

As the years roll on I’ve found it harder and harder to supply quick answers to “simple” questions on the Oracle-L list server and OTN/ODC forum because things are constantly changing and an answer that may have been right the last time I checked could now be wrong. A simple example of the consequences of change showed up recently on the OTN/ODC forum where one reply to a question started:

Just why do you need distinct in a subquery??? That’s the first thing that appears really shocking to me. If it’s a simple in (select …) adding a distinct to the subquery would just impose a sort unique(as you can see in the explain plan), which may be quite costly.

Three question-marks is already tip-toeing its way to the Pratchett limit – but “really shocking” ? It’s bad enough that the comment goes operatic, but going operatic in order to introduce an error pushes the thing into tragedy (or possibly comedy – or maybe both). To make the self-inflicted injury worse, there were two execution plans supplied in the original post anyway of which only one showed any attempt to achieve uniqueness.

Bottom line – when you’re about to correct someone for doing something that is “obviously” wrong, be a little bit kind about it and then be kind to yourself and do a quick sanity check that your attempt at correction is itself correct. A good guideline would be to ask yourself: “How do I know what I know – and am I about to make myself look like an idiot?”

Check It

Question: Does a  “distinct” in a subquery impose a sort (or hash) unique ?

Answer: No – a uniqueness operation may appear, but it’s not guaranteed to appear.

Here’s a quick example which does not result in any attempt at imposing uniqueness (running 11.2.0.4):


drop table t2 purge;
drop table t1 purge;
create table t1 as select * from all_objects where rownum  <= 100;
create table t2 as select * from all_objects where rownum <= 100;

create index t1_i1 on t1(owner);
create index t2_i2 on t2(object_type);

set autotrace traceonly explain

select  * 
from    t1 
where   owner = 'OUTLN' 
and     object_name in (
                select distinct object_name 
                from   t2 
                where  object_type = 'TABLE'
        )
;


Execution Plan
----------------------------------------------------------
Plan hash value: 3169044451

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     3 |   558 |     4   (0)| 00:00:01 |
|*  1 |  HASH JOIN SEMI              |       |     3 |   558 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     3 |   474 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     3 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2    |    12 |   336 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T2_I2 |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_NAME"="OBJECT_NAME")
   3 - access("OWNER"='OUTLN')
   5 - access("OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic sampling used for this statement (level=2)


There’s no sign of a sort unique or hash unique. The optimizer has decided that the IN subquery can be transformed into an EXISTS subquery, which can then be transformed into a semi-join.
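
In effect (this is just a sketch of the transformed shape, not anything you'd see reported by dbms_xplan) the optimizer has treated the query as:

select  t1.*
from    t1
where   t1.owner = 'OUTLN'
and     exists (
                select  null
                from    t2
                where   t2.object_type = 'TABLE'
                and     t2.object_name = t1.object_name
        )
;

and the redundant DISTINCT simply disappears along the way.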

I can think of three other execution plan strategies that might have appeared depending on the data, indexing, and statistics:

a) Transform the IN subquery to an EXISTS subquery then operate as a filter subquery (with no uniqueness imposed):


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |   158 |     5   (0)| 00:00:01 |
|*  1 |  FILTER                      |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     3 |   474 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     3 |       |     1   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |    28 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T2_I2 |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

b) Simple unnest with sort/hash unique and join


---------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |     1 |   175 |     9 |
|*  1 |  HASH JOIN                     |          |     1 |   175 |     9 |
|   2 |   TABLE ACCESS BY INDEX ROWID  | T1       |     4 |   632 |     1 |
|*  3 |    INDEX RANGE SCAN            | T1_I1    |     2 |       |     1 |
|   4 |   VIEW                         | VW_NSO_1 |     4 |    68 |     7 |
|   5 |    SORT UNIQUE                 |          |     4 |   112 |     7 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T2       |     4 |   112 |     1 |
|*  7 |      INDEX RANGE SCAN          | T2_I2    |     2 |       |     1 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_NAME"="OBJECT_NAME")
   3 - access("OWNER"='OUTLN')
   7 - access("OBJECT_TYPE"='TABLE')

For this data set I actually had to take the optimizer_features_enable back to ‘8.1.7’ to get this plan – but you can see that there’s a SORT UNIQUE at operation 5, but that would have been there whether or not the DISTINCT keyword had appeared in the SQL. Effectively the query has been transformed to:

select  t1.*
from    (
                select  distinct t2.object_name object_name
                from    t2
                where   t2.object_type='TABLE'
        )
        vw_nso_1,
        t1
where   t1.owner = 'OUTLN'
and     t1.object_name = vw_nso_1.object_name
/

c) Unnest, then “place group by” so that the distinct is applied after the join

--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |     3 |   474 |     5  (20)| 00:00:01 |
|   1 |  VIEW                          | VM_NWVW_1 |     3 |   474 |     5  (20)| 00:00:01 |
|   2 |   HASH UNIQUE                  |           |     3 |   594 |     5  (20)| 00:00:01 |
|*  3 |    HASH JOIN                   |           |     3 |   594 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T1        |     3 |   510 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | T1_I1     |     3 |       |     1   (0)| 00:00:01 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T2        |    12 |   336 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN          | T2_I2     |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."OBJECT_NAME"="T2"."OBJECT_NAME")
   5 - access("T1"."OWNER"='OUTLN')
   7 - access("T2"."OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic sampling used for this statement (level=2)

Again, the plan would be the same whether or not the original subquery had a redundant DISTINCT. (Again, with this data set, I had to be a bit devious to get this plan).

The things you think you know may have been true 10 years ago – but maybe they’re not true any longer, or maybe they’re still true on your version of the database but not every  version of the database. So I often end up looking at a question, thinking the poster’s claim can’t be right, and then working out and modelling the circumstances that might make the poster’s observations appear (and learning something new).

Remember: “I’ve never seen it before” doesn’t mean “It doesn’t happen”.

 

Join Factorization

Wed, 2018-02-14 09:38

This item is, by a roundabout route, a follow-up to yesterday’s note on a critical difference in cardinality estimates that appeared if you used the coalesce() function in its simplest form as a substitute for the nvl() function. Connor McDonald wrote a followup note about how using the nvl() function in a suitable predicate could lead to Oracle splitting a query into a UNION ALL (in version 12.2), which led me to go back to a note I’d written on the same topic about 10 years earlier where the precursor of this feature already existed but used CONCATENATION instead of OR-EXPANSION. The script I’d used for my earlier article was actually one I’d written in February 2003 and tested fairly regularly since – which brings me to this article, because I finally tested my script against 12.2.0.1 to discover a very cute bit of optimisation.

The business of splitting a query into two parts can be used even when the queries are more complex and include joins (this doesn’t always happen automatically and sometimes has to be hinted, but that can be a cost/statistics thing). For example, from 12.1.0.2 – a query and its execution plan:


select
        *
from
        t1, t2
where
        t1.v1 = nvl(:v1,t1.v1)
and     t2.n1 = t1.n1
;

---------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |         |  1001 |   228K|    11   (0)| 00:00:01 |
|   1 |  CONCATENATION                          |         |       |       |            |          |
|*  2 |   FILTER                                |         |       |       |            |          |
|*  3 |    HASH JOIN                            |         |  1000 |   228K|     8   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL                   | T2      |  1000 |   106K|     4   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL                   | T1      |  1000 |   122K|     4   (0)| 00:00:01 |
|*  6 |   FILTER                                |         |       |       |            |          |
|   7 |    NESTED LOOPS                         |         |     1 |   234 |     3   (0)| 00:00:01 |
|   8 |     NESTED LOOPS                        |         |     1 |   234 |     3   (0)| 00:00:01 |
|   9 |      TABLE ACCESS BY INDEX ROWID BATCHED| T1      |     1 |   125 |     2   (0)| 00:00:01 |
|* 10 |       INDEX RANGE SCAN                  | T1_IDX1 |     1 |       |     1   (0)| 00:00:01 |
|* 11 |      INDEX UNIQUE SCAN                  | T2_PK   |     1 |       |     0   (0)| 00:00:01 |
|  12 |     TABLE ACCESS BY INDEX ROWID         | T2      |     1 |   109 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(:V1 IS NULL)
   3 - access("T2"."N1"="T1"."N1")
   5 - filter("T1"."V1" IS NOT NULL)
   6 - filter(:V1 IS NOT NULL)
  10 - access("T1"."V1"=:V1)
  11 - access("T2"."N1"="T1"."N1")

You can see in this plan how Oracle has split the query into two queries combined through concatenation with FILTER operations at lines 2 (:v1 is null) and 6 (:v1 is not null) to allow the runtime engine to execute only the appropriate branch. You’ll also note that each branch can be optimised separately and in this case the two branches get dramatically different paths because of the enormous difference in the estimated volumes of data.

So let’s move up to 12.2.0.1 and see what happens to this query – but first I’m going to execute a naughty “alter session…”:


------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                 |  1001 |   180K|    11   (0)| 00:00:01 |
|   1 |  VIEW                                    | VW_ORE_F79C84EE |  1001 |   180K|    11   (0)| 00:00:01 |
|   2 |   UNION-ALL                              |                 |       |       |            |          |
|*  3 |    FILTER                                |                 |       |       |            |          |
|   4 |     NESTED LOOPS                         |                 |     1 |   234 |     3   (0)| 00:00:01 |
|   5 |      NESTED LOOPS                        |                 |     1 |   234 |     3   (0)| 00:00:01 |
|   6 |       TABLE ACCESS BY INDEX ROWID BATCHED| T1              |     1 |   125 |     2   (0)| 00:00:01 |
|*  7 |        INDEX RANGE SCAN                  | T1_IDX1         |     1 |       |     1   (0)| 00:00:01 |
|*  8 |       INDEX UNIQUE SCAN                  | T2_PK           |     1 |       |     0   (0)| 00:00:01 |
|   9 |      TABLE ACCESS BY INDEX ROWID         | T2              |     1 |   109 |     1   (0)| 00:00:01 |
|* 10 |    FILTER                                |                 |       |       |            |          |
|* 11 |     HASH JOIN                            |                 |  1000 |   228K|     8   (0)| 00:00:01 |
|  12 |      TABLE ACCESS FULL                   | T2              |  1000 |   106K|     4   (0)| 00:00:01 |
|* 13 |      TABLE ACCESS FULL                   | T1              |  1000 |   122K|     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(:V1 IS NOT NULL)
   7 - access("T1"."V1"=:V1)
   8 - access("T2"."N1"="T1"."N1")
  10 - filter(:V1 IS NULL)
  11 - access("T2"."N1"="T1"."N1")
  13 - filter("T1"."V1" IS NOT NULL)

There’s nothing terribly exciting about the change – except for the disappearance of the CONCATENATION operator and the appearance of the VIEW and UNION ALL operators to replace it (plus you’ll see that the two branches appear in the opposite order in the plan). But let’s try again, without doing that “alter session…”:


--------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                    |  1001 |   229K|    10   (0)| 00:00:01 |
|*  1 |  HASH JOIN                              |                    |  1001 |   229K|    10   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                     | T2                 |  1000 |   106K|     4   (0)| 00:00:01 |
|   3 |   VIEW                                  | VW_JF_SET$A2355C8B |  1001 |   123K|     6   (0)| 00:00:01 |
|   4 |    UNION-ALL                            |                    |       |       |            |          |
|*  5 |     FILTER                              |                    |       |       |            |          |
|*  6 |      TABLE ACCESS FULL                  | T1                 |  1000 |   122K|     4   (0)| 00:00:01 |
|*  7 |     FILTER                              |                    |       |       |            |          |
|   8 |      TABLE ACCESS BY INDEX ROWID BATCHED| T1                 |     1 |   125 |     2   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                  | T1_IDX1            |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="ITEM_1")
   5 - filter(:V1 IS NULL)
   6 - filter("T1"."V1" IS NOT NULL)
   7 - filter(:V1 IS NOT NULL)
   9 - access("T1"."V1"=:V1)

The plan now shows a VIEW which is a UNION ALL involving only table t1 in both its branches. The result set from the view is then used as the probe table of a hash join with t2. You’ll note that the name of the view is now VW_JF_SET$A2355C8B – that’s JF for “Join Factorization”, and the alter session I executed to get the first plan was to disable the feature: ‘alter session set “_optimizer_join_factorization”= false;’.

Join factorization can occur when the optimizer sees a union all view with some tables that are common to both (all) branches of the query, and finds that it can move those tables outside the query while getting the same end result at a lower cost. In this case it happens to be a nice example of how the optimizer can transform and transform again to get to the lowest cost plan.
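
Conceptually (this is a sketch of the factorized shape, not a literal transformation trace) the rewrite means t2 appears – and is scanned – only once:

select  vw_jf.*, t2.*
from    (
                select t1.* from t1 where :v1 is null     and t1.v1 is not null
                union all
                select t1.* from t1 where :v1 is not null and t1.v1 = :v1
        )       vw_jf,
        t2
where
        t2.n1 = vw_jf.n1
;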

It’s worth noting that Join Factorization has been around since 11.2.x.x, and Or Expansion has been around for even longer – but it’s not until 12.2 that nvl() transforms through Or Expansion, which allows it to transform through Join Factorization.

You’ll note, by the way, that with this plan we always do a full tablescan of t2, whereas with just Or-Expansion it’s a potential threat that may never (or hardly ever) be realised.  That’s a point to check if you find that the transformation starts to appear inappropriately on an upgrade. There is a hint to disable the feature for a query, but it’s not trivial to get it right so if you do need to block the feature the smart hint (or SQL Patch) would be “opt_param(‘_optimizer_join_factorization’ ‘false’)”.

Footnote:

If you want to run the experiments yourself, here’s the script I used to generate the data. It’s more complicated than it needs to be because I use the same tables in several different tests:

rem
rem     Script:         null_plan_122.sql
rem     Author:         Jonathan Lewis
rem     Dated:          February 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1        Join Factorization
rem             12.1.0.2        Concatenation
rem
rem

drop table t2;
drop table t1;

-- @@setup  -- various set commands etc.

create table t1 (
        n1              number(5),
        n2              number(5),
        v1              varchar2(10),
        v2              varchar2(10),
        v3              varchar2(10),
        v4              varchar2(10),
        v5              varchar2(10),
        padding         varchar2(100),
        constraint t1_pk primary key(n1)
);

insert into t1
select
        rownum,
        rownum,
        rownum,
        trunc(100 * dbms_random.value),
        trunc(100 * dbms_random.value),
        trunc(100 * dbms_random.value),
        trunc(100 * dbms_random.value),
        rpad('x',100)
from all_objects
where
        rownum <= 1000 -- > comment to avoid WordPress format mess
;

create unique index t1_n2 on t1(n2);

create index t1_idx1 on t1(v1);
create index t1_idx2 on t1(v2,v1);
create index t1_idx3 on t1(v3,v2,v1);

create table t2 (
        n1              number(5),
        v1              varchar2(10),
        padding         varchar2(100),
        constraint t2_pk primary key(n1)
);

insert into t2
select
        rownum,
        rownum,
        rpad('x',100)
from all_objects
where
        rownum <= 1000     -- > comment to avoid WordPress format mess
;

create index t2_idx on t2(v1);

begin dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

variable n1 number
variable n2 number
variable v1 varchar2(10)
variable v2 varchar2(10)
variable v3 varchar2(10)

exec :n1 := null
exec :n2 := null
exec :v1 := null
exec :v2 := null
exec :v3 := null

spool null_plan_122

set autotrace traceonly explain

prompt  ============================================
prompt  One colx = nvl(:b1,colx) predicate with join
prompt  ============================================

select
        *
from
        t1, t2
where
        t1.v1 = nvl(:v1,t1.v1)
and     t2.n1 = t1.n1
;

alter session set "_optimizer_join_factorization" = false;

select
        *
from
        t1, t2
where
        t1.v1 = nvl(:v1,t1.v1)
and     t2.n1 = t1.n1
;

alter session set "_optimizer_join_factorization" = true;

set autotrace off

spool off

Coalesce v. NVL

Tue, 2018-02-13 05:23

“Modern” SQL should use the coalesce() function rather than the nvl() function – or so the story goes – but do you always want to do that to an Oracle database ? The answer is “maybe not”. Although the coalesce() function can emulate the nvl() function (in many cases) there are significant differences in behaviour, some that suggest it’s a good idea to use the substitution and others that suggest otherwise. Different decisions may be appropriate for different circumstances, and this note highlights one case against the substitution. We’ll start with a simple data set:

rem
rem     Script:         nvl_coalesce_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2018
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid wordpress format mess
)
select
        rownum                          id,
        case mod(rownum,4)
                when 0  then 'Y'
                        else 'N'
        end                             yes_no,
        case mod(rownum,5)
                when 0  then 'Y'
                when 1  then null
                        else 'N'
        end                             yes_null_no,
        lpad('x',100,'x')               padding
from
        generator
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1 for columns size 5 yes_no yes_null_no'
        );
end;
/

I’ve created a table with 10,000 rows and two columns with a highly skewed data distribution. Because I know that the skew is supposed to have a significant effect I’ve used a non-standard method_opt when gathering stats – in a production system I would have used the packaged procedure dbms_stats.set_table_prefs() to associate this setting with the table.

The difference between the yes_no and the yes_null_no columns is that the latter is null for a significant fraction of the rows.

  • yes_no has: 7,500 N, 2,500 Y
  • yes_null_no has: 6,000 N, 2,000 null, 2,000 Y

Let’s now try to count the “N or null” rows using two different functions and see what estimates the optimizer produces for the counts. First counting the yes_no column – using nvl() then coalesce()


set autotrace traceonly explain

select * from t1 where nvl(yes_no,'N') = 'N';
select * from t1 where coalesce(yes_no,'N') = 'N';

set autotrace off

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  7500 |   798K|    24   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  7500 |   798K|    24   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NVL("YES_NO",'N')='N')

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 | 10900 |    25   (8)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 | 10900 |    25   (8)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(COALESCE("YES_NO",'N')='N')

The estimate for the nvl() is accurate; the estimate for the coalesce() query is 100 rows.

Let’s repeat the test using the yes_null_no column, again starting with nvl() followed by coalesce():


set autotrace traceonly explain

select * from t1 where nvl(yes_null_no,'N') = 'N';
select * from t1 where coalesce(yes_null_no,'N') = 'N';

set autotrace off

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  8000 |   851K|    24   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  8000 |   851K|    24   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NVL("YES_NULL_NO",'N')='N')

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 | 10900 |    25   (8)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 | 10900 |    25   (8)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(COALESCE("YES_NULL_NO",'N')='N')

Again we get the right result for the nvl() estimate (8,000 = 6,000 N + 2,000 null) and 100 for the coalesce() estimate.

By now you’ve probably realised that the coalesce() estimate is simply the “1% guess for equality” that applies to most cases of function(column). So, as we saw in the previous post, coalesce() gives us the benefits of “short-circuiting” but now we see it also threatens us with damaged cardinality estimates. The latter is probably less important than the former in many cases (especially since we might be able to address the problem very efficiently using virtual columns), but it’s probably worth remembering.
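
For the record, here's a sketch of the virtual column approach using the t1 from this note (the column name is one I've invented): once the virtual column exists and has its own statistics the optimizer can use them when it sees the matching expression in a predicate.

alter table t1 add (
        yes_no_coalesce         generated always as (coalesce(yes_no,'N')) virtual
);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1 for columns size 5 yes_no yes_null_no yes_no_coalesce'
        );
end;
/

select * from t1 where coalesce(yes_no,'N') = 'N';

If you'd rather not have a visible column, extended (expression) statistics created through dbms_stats.create_extended_stats() should give a similar effect.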

 

 

Histogram Threat

Tue, 2018-01-30 02:07

Have you ever seen a result like this:


SQL> select sql_id, count(*) from V$sql group by sql_id having count(*) > 1000;

SQL_ID		COUNT(*)
------------- ----------
1dbzmt8gpg8x7	   30516

A client of mine who had recently upgraded to 12.2.0.1 RAC, using DRCP (database resident connection pooling) for an application using PHP was seeing exactly this type of behaviour for a small number of very simple SQL statements and wanted to find out what was going on because they were also seeing an undesirable level of contention in the library cache when the system load increased.

In this note I just want to highlight a particular detail of their problem – with an example – showing how easily histograms can introduce problems if you don’t keep an eye out for the dangers.

One of their queries really was as simple as this:

select count(*), sum(skew2) from t1 where skew = :b1;

And I’m going to use this query to model the problem. All I have to do is arrange for a data set that results in a hybrid (or height-balanced) histogram being created on the skew column, and then run the query lots of times with different input bind values. In the case of the client there were around 18,000 possible values for the column, and the number of rows per value varied from 1 to about 20,000 – but whatever the number of rows selected the optimum execution plan was always going to be an indexed access.


rem
rem     Script:         acs_child_cursors.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem

create table t1 (
        id, skew, skew2, padding
)
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 3e3
)
select
        rownum  id,
        g1.id   id1,
        g2.id   id2,
        rpad('x',100)
from
        generator       g1,
        generator       g2
where
        g2.id <= g1.id     -- > comment to avoid WordPress format issue
order by
        g2.id, g1.id
;

alter table t1 modify skew not null;
alter table t1 modify skew2 not null;

create index t1_skew on t1(skew);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1 for columns skew size 254'
--              method_opt       => 'for all columns size 1'
        );
end;
/

variable b1 number
exec :b1 := 0;

set termout off
@start_1000
@start_1000
@start_1000
set termout on

set linesize 120
set trimspool on

column sql_text format a55

select
        child_number, plan_hash_value, executions,
        is_bind_sensitive,
        is_bind_aware,
        is_shareable,
        sql_text
from
        v$sql
where   sql_id = 'b82my582cnvut'
;

The data set contains 3,000 distinct values for skew and the way I’ve generated the rows means that the value N will appear N times – so there’s one row with the value 1 and 3,000 rows with the value 3,000, and so on for a total of 4,501,500 rows. If you want to run the test the code is likely to take a couple of minutes to complete, requiring roughly 700 MB of disk space.

The mechanism of the script start_1000 is something I published a few years ago, and essentially it executes a script called start_1.sql 1,000 times which, for this test, contains the following two lines:


exec :b1 := :b1 + 1

select count(*), sum(skew2) from t1 where skew = :b1;

The net effect of the 3 calls to start_1000.sql is that my simple SQL statement is called once in turn for each value of skew from 1 to 3,000. The SQL_ID of the statement is ‘b82my582cnvut’ which I’ve used to query v$sql when the run is complete, with the following result:


CHILD_NUMBER PLAN_HASH_VALUE EXECUTIONS I I I SQL_TEXT
------------ --------------- ---------- - - - -------------------------------------------------------
	   0	  1041516234	    498 Y N N select count(*), sum(skew2) from t1 where skew = :b1
	   1	  1041516234	     25 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   2	  1041516234	    104 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   3	  1041516234	    308 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   4	  1041516234	    429 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   5	  1041516234	    640 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   6	  1041516234	     31 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   7	  1041516234	    305 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   8	  1041516234	    660 Y Y Y select count(*), sum(skew2) from t1 where skew = :b1

9 rows selected.

I’ve got 9 child cursors, all with the same execution plan, all except the last labelled as not shareable (you may find that you don’t get exactly the same result, on repeated tests I got between 5 and 9 cursors). Given enough time all these cursors except the last (shareable) one would be aged out of the library cache. In the case of the client, who had a shared pool that was probably quite a bit larger than needed, the number of non-shareable cursors could get very large and they were hanging around for ages. Typically most of the cursors would report an execution count less than 30, with many showing just one or two executions and a handful showing execution counts in the region of several hundred (and that handful were the ones that were still marked as shareable).

After eliminating the critical histogram (using dbms_stats.delete_column_stats()) and eliminating the redundant child cursors (using dbms_shared_pool.purge()) the massive proliferation stopped happening and the performance threat disappeared. The only issue then was to change the table preferences for stats collection on this table to add the clause “for columns size 1 skew” so that the histogram would not be recreated on the next gather.
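
Translated into the objects of my model the two steps were essentially as follows (a sketch only – in the client's case the existing table preference had to be edited rather than simply replaced, and recent versions of delete_column_stats() also allow col_stat_type => 'HISTOGRAM' if you want to keep the basic column stats):

begin
        dbms_stats.delete_column_stats(
                ownname          => user,
                tabname          => 'T1',
                colname          => 'SKEW'
        );

        dbms_stats.set_table_prefs(
                ownname          => user,
                tabname          => 'T1',
                pname            => 'METHOD_OPT',
                pvalue           => 'for all columns size auto for columns size 1 skew'
        );
end;
/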

Further Observations.

I suspect that part of the client’s problem – something that exaggerated the count rather than causing it – could be attributed to using DRCP (database resident connection pooling) which probably still has some quirky bits of behaviour. It was also true that the client’s connection pool was probably much bigger than it needed to be so if there were any funny little bits of emergent behaviour at scale the client would probably have hit them.

The problem of escalating child cursors is one that Oracle has been working on for quite a long time, and there’s a (hidden) parameter that was introduced late in 11gR2 (though I think that the 11g mechanism first appeared through a fix control) to allow Oracle to mark a parent cursor obsolete if it acquired too many child cursors.  There’s a note on MoS that the client had read on this topic: Doc ID 2298504.1: “Cursor Mutex X Wait Events: After Upgrading To 12.2” which looked as if it was describing their symptoms so they had set this parameter (_cursor_obsolete_threshold) from 8192 (the 12.2 default) down to 1024 (the default for 12.1 and earlier versions). This had been of some help with the library cache problem.  When the sql_id at the top of this article reported 30,516 child cursors that would be 29 “obsolete” parent cursors with 1,024 child cursors each and one “live” parent cursor with 820 child cursors.

You can appreciate that if Oracle has to find a parent cursor and pin it while walking a chain of 30,516 child cursors that’s likely to take a lot more time than walking a chain of 30 parent cursors (while holding a library cache mutex, perhaps) to find the one non-obsolete parent, then acquiring the parent mutex to walk a chain of 820 child cursors.

I don’t know the exact details of the mechanisms involved with manipulating the various chains – but there are likely to be times when one process needs exclusive mutexes/latches to change a chain while other processes are holding shared mutexes/latches to search the chain. When you’ve got 30,000 child cursors in total the important questions become: “where’s the trade-off between making the child chains shorter and making the parent search longer ?” (I’d expect – or hope – that the Oracle developers had actually found a way to code the obsolescence so that the new parent was first in the search, and the rest were never visited, of course.)

One of the suggestions I made to try to alleviate the problem – which I had assumed was due in part to the degree of concurrent execution of the statement – was to mark the cursor as “hot”. This resulted in 36 different sql_ids for the same statement (the client machine had 72 CPUs). This had some effect but ultimately meant that there were 36 chains of parents that would eventually end up with lots of child cursors – the only difference was the rate at which the total cursor count was growing (a lot slower), balanced against the threat that they might all manage to get to 30,000+ child cursors! Instead, as a quick and dirty workaround, I supplied the client with a script that could be run at quieter moments during the day to call dbms_shared_pool.purge() for the sql_id to flush all its cursors from the library cache.

One final oddity – which isn’t from the client site. When I changed my test above to avoid generating the histogram (using the commented out method_opt “for all columns size 1”) I found that I still got two child cursors; the first ended up marked as bind-aware but insensitive and non-shareable, the second appeared (time and time again) when my bind value got to 499 and was marked bind-sensitive, bind-aware and shareable.  I still have no idea why this happened.

Footnote:

When repeating the test I started with a “flush shared_pool” – but I’ve removed this line from the code above in case anyone ran it without considering the possible side effects. You do need to clear previous copies of the key statement from the library cache, though, if you want to avoid confusing the issue on repeated runs of the test.

 

Case Study – 1

Mon, 2018-01-29 04:45

It has been some time since I wrote an article walking through the analysis of information on an AWR report, but a nice example appeared a few weeks ago on Twitter that broke a big AWR picture into a sequence of bite-sized chunks that made a little story so here it is, replayed in sync with my ongoing thoughts. The problem started with the (highly paraphrased) question – “How could I get these headline figures when all the ‘SQL ordered by’ sections of the report show captured SQL account for 0.0% of Total?”. The report was a 30 minute snapshot from 11.2.0.3, and here’s the Load Profile:

As you can see, the database was doing quite a lot of work as measured by the physical reads and writes, the number of block changes and size of redo, and the fact that the average CPU usage by the foreground sessions in this instance accounts for 9.5 CPUs. Generally speaking the 49 seconds per second of DB time is also a fairly good hint, when combined with the other numbers, that the instance is busy but, in the absence of any other information, that could be one session holding a blocking lock with 48 other sessions waiting for the lock to be released.

There are a couple of unusual (though not impossible) features to this profile. Given the amount of work the figures for user calls and executes look very small – again not technically impossible, just unlikely in “normal” database processing given the apparent workload – and bear in mind that the “SQL ordered by ..” figures seem to account for none of the workload. Similarly the figures for logical reads and block changes are a little unusual (for “normal” processing) – on average this instance seems to have modified every block it visited (without any work showing up in the captured SQL).

Next in the set of images was the Time Model:

As you can see, the foreground time (DB time) is 85,944 seconds of which foreground CPU usage (DB CPU) is 16,735 seconds – with about 69,000 seconds unaccounted! There’s virtually no time spent on PL/SQL or SQL, and Rman doesn’t even make an appearance (I mention rman specifically because there was at least one version of Oracle where the rman time was accidentally lost from this summary).

So what does the Top Timed Events look like:

It’s nice to see that this is consistent with the Load Profile: the DB CPU matches, and there’s a lot of physical reads (and a quick check says that 6,560,642/1,800 = 3,644 … which is pretty close to the 3,746 physical reads per second in the Load Profile).  There’s one major anomaly here, though: the huge number of (and time spent on) latch: row cache objects, and even though it’s not the single largest component of time it’s the most obvious one to pursue, so the next bit of the report to check is the Dictionary Cache Statistics, with the Tablespace IO Stats and Segments by Physical Reads to follow. I don’t have an image for the dictionary cache stats, but the answer to the question “What were all the rowcache object gets for?” was: “dc_tablespaces (214,796,434)” – which (probably) told me everything I needed to know.

I could show you the specific Instance Activity statistic that I wanted to see next, but I’ll postpone that for a moment and jump to the Tablespace IO Stats – which we were planning to do and might have done anyway if we hadn’t known the rocket science behind massive number of gets on dc_tablespaces.

That’s a horrendous number of (single block) reads of the undo tablespace – and why would they be happening? The instance was engaged in some massive rollback activity (and the transactions being rolled back are on objects in the GIRO tablespace – which is why it is also suffering a lot of single block reads) and this is the point where we jump to the relevant Instance Activity statistic to confirm the claim:

There are two other “undo records applied” statistics, but we don’t need to think about them – the match between the count of records applied and the gets on the dc_tablespaces latch is nearly perfect. Almost everything that this instance is doing is rolling back – there must have been some enormous data changes (possibly only one, possibly parallel-enabled) that failed in the previous half hour and now the entire mess is being cleaned up.
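
If you're watching this sort of thing live rather than after the event in an AWR report, a quick sketch of a query against the instance statistics picks up all the relevant figures:

select  name, value
from    v$sysstat
where   name like '%undo records applied%'
order by
        name
;

Run it a couple of times a few seconds apart and the deltas tell you how quickly the rollback is progressing.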

One little detail to note – the “undo records applied” per second is 122,355, but the Load Profile reported 247,885 “Block changes” per second. The two figures are consistent with each other. Each application of an undo record is two block changes – the first when you modify the original data block, the second when you update the undo record itself to flag it as “applied”:  122,355 * 2 = 244,710, which is a good match for 247,885.

Final Thoughts

There is a second reason why you could see lots of gets on dc_tablespaces – but I think it was a bug in 9i relating to temporary tablespace groups. The phenomenon as a side effect of rolling back was something I discovered in the 8i timeline and I’ve not actually checked what an AWR report really would look like if I forced a massive rollback to take place as the only workload across the interval – so here’s a quick test I constructed and ran to finish the thread off:


set timing on

create table t1 as
with g as (select rownum id from dual connect by level <= 1000)
select rownum id, rownum n1, rpad ('x',150) padding from g,g
;

create index t1_i1 on t1(id);

begin
        for i in 1..1000000 loop
                update t1 set n1 = n1 where id = i;
        end loop;
end;
/

alter system flush buffer_cache;

pause take AWR snapshot from another session

rollback;

prompt take AWR snapshot from another session

On the small VM I used to run the test it took a little over 60 seconds to run the update, and the same again to complete the rollback. The “DB time” shown in the Time Model section of the AWR report was 62.0 seconds, while the “sql execute elapsed time” was 3.9 seconds (which was the SQL executed while capturing the AWR data).

Conclusion

This was a problem where the database seemed to have done a lot of work that couldn’t be attributed to any SQL. While I had a little rocket science up my sleeve that may have allowed me to identify the source more quickly and with more confidence than the average DBA, all I’ve done in this note is highlight a couple of oddities and big numbers that anyone could have spotted, and follow a couple of simple guesses:

a) DB time was large, but sql (and pl/sql) time was non-existent

b) Top timed events were I/O and latches – so identify the latch

c) The specific latch pointed towards tablespaces – so check the Tablespace I/O and note the UNDO tablespace

d) Look at any Instance Activity statistics about “undo”.

e) Job done – but a quick test case is the icing on the cake.

 

 

gc buffer busy

Thu, 2018-01-25 08:12

I had to write this post because I can never remember which way round Oracle named the two versions of gc buffer busy when it split them. There are two scenarios to cover when my session wants my instance to acquire a global cache lock on a block and some other session is already trying to acquire that lock (or is holding it in an incompatible fashion):

  • The other session is in my instance
  • The other session is in a remote instance

One of these cases is reported as “gc buffer busy acquire”, the other as a “gc buffer busy release” – and I always have to check which is which. I think I usually get it right first time when I see it, but I always manage to convince myself that I might have got it wrong and end up searching the internet for Riyaj Shamsudeen’s blog posting about it.

The “release” is waiting for another instance to surrender the lock to my instance; the “acquire” is waiting for another session in my instance to finish acquiring the lock from the other instance.

I decided to jot down this note so I didn’t have to keep searching for Riyaj’s post, and also because a little problem on OTN at the moment showed a couple of AWR reports with an unlikely combination of acquire (180,000,000) and release (2,000) waits.

If you’re wondering why this looks odd – if I’m waiting for an acquire someone else in my instance must be waiting for a release.  Obviously many sessions could be waiting for one release, and if acquirers time out very rapidly (though they’re not reported as doing so) then the ratio could get very high – but 90,000 acquires per release doesn’t look right.
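If you want to do a quick sanity check of the ratio on a system of your own, a hedged sketch of the query would be something like the following – it simply reports the per-instance figures since startup from the standard gv$system_event view:

select
        inst_id, event, total_waits,
        round(time_waited_micro/1e6,1)  seconds_waited
from
        gv$system_event
where
        event like 'gc buffer busy%'
order by
        inst_id, event
;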

 

Histograms

Tue, 2018-01-23 01:58

I had a sudden urge to go a bit meta – so here’s a relative frequency histogram of my observations of the general use of histograms in Oracle:

[image: histogram_histogram]

 

Nested MVs

Fri, 2018-01-19 08:43

A recent client was seeing a very large redo penalty from refreshing materialized views. Unfortunately they had to be refreshed very frequently, and were being handled with a complete refresh in atomic mode – which means delete every row from every MV then re-insert every row.  The total redo was running at about 5GB per hour, which wasn’t a problem for throughput, but the space for handling backup and recovery was getting a bit extreme.

The requirement consisted of two MVs which extracted and aggregated row and column subsets in two different ways from a single table; then two MVs that aggregated one of the first MVs in two different ways; then two MVs which each joined one of the first level MVs to one of the second level MVs.

No problem – join MVs are legal, aggregate MVs are legal, “nested” MVs are legal: all you have to do is create the right MV logs and pick the right refresh command.  Since the client was also running Standard Edition (SE2) there was no need to worry about how to ensure that query rewrite would work (the feature is not implemented on SE).

So here, simplified and camouflaged, is a minimum subset of just the first few stages of the construction: a base table with MV log, one first-level aggregate MV with its own MV log, and two aggregate MVs based on the first MV.

drop materialized view log on req_line;
drop materialized view log on jpl_req_group_numlines;

drop materialized view jpl_req_group_numlines;
drop materialized view jpl_req_numsel;
drop materialized view jpl_req_basis;

drop table req_line;

-- ----------
-- Base Table
-- ----------

create table req_line(
        eventid         number(10,0),
        selected        number(10,0),
        req             number(10,0),
        basis           number(10,0),
        lnid            number(10,0),
        area            varchar2(10),
        excess          number(10,0),
        available       number(10,0),
        kk_id           number(10,0),
        eventdate       number(10,0),
        rs_id           number(10,0)
)
;

-- --------------------
-- MV log on base table
-- --------------------

create materialized view log 
on
req_line
with rowid(
        req, basis, lnid, eventid, selected, area,
        excess, available, kk_id, eventdate, rs_id
)
including new values
;

-- --------------------
-- Level 1 aggregate MV
-- --------------------

create materialized view jpl_req_group_numlines(
        eventid, selected, 
        row_ct, req_ct, basis_ct, req, basis, 
        maxlnid, excess, numsel, area, available, kk_id, 
        rs_id, eventdate
)
segment creation immediate
build immediate
refresh fast on demand 
as 
select 
        eventid,
        selected,
        count(*)        row_ct,
        count(req)      req_ct,
        count(basis)    basis_ct,
        sum(req)        req,
        sum(basis)      basis,
        max(lnid)       maxlnid,
        excess,
        count(selected) numsel,
        area,
        available,
        kk_id,
        rs_id,
        eventdate
from 
        req_line
group by 
        eventid, selected, area, excess,
        available, kk_id, eventdate, rs_id
;

-- ------------------------
-- MV log on first level MV
-- ------------------------

create materialized view log 
on
jpl_req_group_numlines
with rowid 
(
        eventid, area, selected, available,
        basis, req, maxlnid, numsel
)
including new values
;


-- ----------------------------
-- First "level 2" aggregate MV
-- ----------------------------

create materialized view jpl_req_numsel(
        eventid, selected, 
        row_ct, totalreq_ct, totalbasis_ct, totalreq, totalbasis, 
        maxlnid, numsel_ct, numsel, area
)
segment creation immediate
build immediate
refresh fast on demand
as 
select 
        eventid,
        selected,
        count(*)        row_ct,
        count(req)      req_ct,
        count(basis)    basis_ct,
        sum(req)        req,
        sum(basis)      basis,
        max(maxlnid)    maxlnid,
        count(numsel)   numsel_ct,
        sum(numsel)     numsel,
        area
from 
        jpl_req_group_numlines
group by 
        eventid, selected, area
;


-- -----------------------------
-- Second "level 2" aggregate MV
-- -----------------------------

create materialized view jpl_req_basis(
        eventid, 
        row_ct, totalbasis_ct, totalreq_ct, totalbasis, totalreq, 
        area, selected, available, maxlnid ,
        numsel_ct, numsel
)
segment creation immediate
build immediate
refresh fast on demand
as 
select 
        eventid,
        count(*)        row_ct,
        count(basis)    totalbasis_ct,
        count(req)      totalreq_ct,
        sum(basis)      totalbasis,
        sum(req)        totalreq,
        area,
        selected,
        available,
        max(maxlnid)    maxlnid,
        count(numsel)   numsel_ct,
        sum(numsel)     numsel
from
        jpl_req_group_numlines
group by 
        eventid, area, available, selected
;

Once the table, MV logs and MVs exist we can insert some data into the base table, then try refreshing the views. I have tried three different calls from the dbms_mview package – refresh_all_mviews(), refresh_dependent(), and refresh() – specifying the ‘F’ (fast) refresh method, atomic refresh, and nested. All three fail in the same way on 12.2.0.1. The code below shows only the refresh_dependent() call.

I’ve included a query to report the current state of the materialized views before and after the calls, and set a two second sleep before the refresh so that changes in “last refresh” time will appear. The final queries are just to check that the expected volume of data has been transferred to the materialized views.


-- ------------------------------------
-- Insert some data into the base table
-- ------------------------------------

begin
        for i in 1..100 loop
                execute immediate 'insert into req_line values( :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx)' 
                using i,i,i,i,i,i,i,i,i,i,i;
                commit;
        end loop;
end;
/

set linesize 144
column mview_name format a40

select
        mview_name, staleness, compile_state, last_refresh_type, 
        to_char(last_refresh_date,'dd-mon hh24:mi:ss')          ref_time
from
        user_mviews
ORDER by
        last_refresh_date, mview_name
;

prompt  Waiting for 2 seconds to allow refresh time to change

execute dbms_lock.sleep(2)

declare
        m_fail_ct       number(6,0);
begin
        dbms_mview.refresh_dependent(
                number_of_failures      => m_fail_ct,
                list                    => 'req_line',
                method                  => 'F',
                nested                  => true,
                atomic_refresh          => true
        );

        dbms_output.put_line('Failures: ' || m_fail_ct);
end;
/

select
        mview_name, staleness, compile_state, last_refresh_type, 
        to_char(last_refresh_date,'dd-mon hh24:mi:ss')          ref_time
from
        user_mviews
order by
        last_refresh_date, mview_name
;

-- --------------------------------
-- Should be 100 rows in each table
-- --------------------------------

select count(*) from jpl_req_basis;
select count(*) from jpl_req_group_numlines;
select count(*) from jpl_req_numsel;
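For reference, the dbms_mview.refresh() variant I mentioned looked roughly like the following – a hedged sketch using the documented parameter names, with an illustrative list of MVs (it failed in exactly the same way on 12.2.0.1):

begin
        dbms_mview.refresh(
                list            => 'jpl_req_group_numlines,jpl_req_numsel,jpl_req_basis',
                method          => 'F',
                nested          => true,
                atomic_refresh  => true
        );
end;
/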

Both of the earlier versions of Oracle are happy with this code and refresh all three materialized views without fail. Oracle 12.2.0.1 crashes the procedure call with a deadlock error which, when traced, shows itself to be a self-deadlock while attempting to select a data dictionary row for update:


MVIEW_NAME                               STALENESS           COMPILE_STATE       LAST_REF REF_TIME
---------------------------------------- ------------------- ------------------- -------- ------------------------
JPL_REQ_BASIS                            FRESH               VALID               COMPLETE 19-jan 14:03:01
JPL_REQ_GROUP_NUMLINES                   NEEDS_COMPILE       NEEDS_COMPILE       COMPLETE 19-jan 14:03:01
JPL_REQ_NUMSEL                           FRESH               VALID               COMPLETE 19-jan 14:03:01

3 rows selected.

Waiting for 2 seconds to allow refresh time to change

PL/SQL procedure successfully completed.

declare
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2952
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 85
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 245
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 1243
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2414
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2908
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3699
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3723
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 75
ORA-06512: at line 4


MVIEW_NAME                               STALENESS           COMPILE_STATE       LAST_REF REF_TIME
---------------------------------------- ------------------- ------------------- -------- ------------------------
JPL_REQ_NUMSEL                           NEEDS_COMPILE       NEEDS_COMPILE       COMPLETE 19-jan 14:03:01
JPL_REQ_BASIS                            FRESH               VALID               FAST     19-jan 14:03:04
JPL_REQ_GROUP_NUMLINES                   FRESH               VALID               FAST     19-jan 14:03:04

The deadlock graph from the trace file, with a little extra surrounding information, looks like this:


Deadlock graph:
                                          ------------Blocker(s)-----------  ------------Waiter(s)------------
Resource Name                             process session holds waits serial  process session holds waits serial
TX-00020009-00000C78-A9B090F8-00000000         26      14     X        40306      26      14           X  40306


*** 2018-01-19T14:18:03.925859+00:00 (ORCL(3))
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=1, mask=0x0)
----- Error Stack Dump -----
----- Current SQL Statement for this session (sql_id=2vnzfjzg6px33) -----
select log, oldest, oldest_pk, oldest_oid, oldest_new, youngest+1/86400,  flag, yscn, oldest_seq, oscn, oscn_pk, oscn_oid, oscn_new, oscn_seq  from sys.mlog$ where mowner = :1 and master = :2 for update
----- PL/SQL Stack -----

So far I haven’t been able to spot whether or not I’m doing something wrong, or prohibited, and I haven’t been able to find a matching problem on MoS. Since the code works on 11gR2 and 12cR1 I’m inclined to believe it’s a bug introduced in the 12cR2 timeline – which is a nuisance for my client, but if it is a bug then perhaps a fix will appear fairly promptly.

Column Stats

Thu, 2018-01-18 08:22

I’ve made several comments in the past about the need for being selective when gathering object statistics, with particular reference to the trade-offs when creating histograms. With Oracle 12c it’s now reasonably safe (as far as I’m concerned) to set a method_opt as a table preference that identifies columns where you expect to see Frequency or (pace the buggy behaviour described in a recent post) Top-N histograms. The biggest problem I have is that I keep forgetting the exact syntax I need – so I’ve written this note more as a reminder to myself than anything else.

Typically I might expect to use the standard 254 buckets when gathering histograms, with an occasional variation to increase the bucket count; but for the purposes of this note I’m going to demonstrate with a much lower value. So here’s a table creation statement (running 12.1.0.2 – so it will gather basic stats on the create) and two variations of a call to gather stats with a specific method_opt – followed by a question:

create table t1
as
select
        object_type o1,
        object_type o2,
        object_type o3,
        object_id,
        object_name
from
        all_objects
where
        rownum <= 50000 -- > comment to bypass wordpress format problem
;

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1' order by column_id;

execute dbms_lock.sleep(2)

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for all columns size 1 for columns o1 o2 o3 size 15'
        );
end;
/

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1' order by column_id;

execute dbms_lock.sleep(2)

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for all columns size 1 for columns size 15 o1 o2 o3'
        );
end;
/

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1';


The big question is this: which columns will have histograms after each of the gather_table_stats() calls:

method_opt=>'for all columns size 1 for columns o1 o2 o3 size 15'
method_opt=>'for all columns size 1 for columns size 15 o1 o2 o3'

The problem I have is simple – to me both options look as if they will create histograms on all three named columns but the first option is the one that I type in “intuitively” if I don’t stop to think about it carefully. The first option, alas, will only gather a histogram on column o3 – the second option is the one that creates three histograms.

The manuals are a little unclear and ambiguous about how to construct a slightly complicated method_opt; there’s a fragment of text with the usual mix of square brackets, italics and ellipses to indicate optional and repeated clauses (interestingly the only clue about multiple columns is that comma separation seems to be required – despite one of the examples above working without commas) but there’s no explanation of when a “size” clause should go before a column name and when it should go after.

So here are a few more method_opt clauses – can you work out in advance which columns would have histograms if you used them and how many buckets in each histogram; there are a couple that may surprise you:


for columns o1 size 12, o2 size 13, o3 size 14

for columns o1 size 15 o2 size 16 o3 size 17

for columns size 18 o1 size 19 o2 size 20 o3

for columns size 21 o1 o2 size 22 o3

for columns o1 size 12, o2 size 12, o3 size 13, object_id size 13 object_name size 14

for columns size 22 o1 o2 for columns size 23 o3 object_id for columns size 24  object_name

Bottom line – to me – is to check very carefully that the method_opt is going to do what I want it to do; and for production systems I tend to use the final form that repeats the “for columns {size clause} {column list}”.
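As a concrete example of that final form used as a table preference – a hedged sketch only, with the table name and bucket counts taken from this demo – the call would look something like this, after which a plain gather_table_stats() for the table picks the preference up automatically:

begin
        dbms_stats.set_table_prefs(
                ownname => user,
                tabname => 't1',
                pname   => 'METHOD_OPT',
                pvalue  => 'for all columns size 1 '      ||
                           'for columns size 15 o1 o2 '   ||
                           'for columns size 20 o3'
        );
end;
/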

Histogram Hassle

Mon, 2018-01-15 07:01

I came across a simple performance problem recently that ended up highlighting a problem with the 12c hybrid histogram algorithm. It was a problem that I had mentioned in passing a few years ago, but only in the context of Top-N histograms and without paying attention to the consequences. In fact I should have noticed the same threat in a recent article by Maria Colgan that mentioned the problems introduced in 12c by the option “for all columns size repeat”.

So here’s the context (note – all numbers used in this example are approximations to make the arithmetic obvious).  The client had a query with a predicate like the following:

    t4.columnA = :b1
and t6.columnB = :b2

The optimizer was choosing to drive the query through an indexed access path into t6, which returned ca. 1,000,000 rows before joining (two tables later) to t4, at which point all but a couple of rows were discarded – typical execution time was in the order of tens of minutes. A /*+ leading(t4) */ hint to start on t4 with an index that returned two rows reduced the response time to the classic “sub-second”.

The problem had arisen because the optimizer had estimated a cardinality of 2 rows for the index on t6, and the reason for this was that, on average, that was the correct number. There were 2,000,000 rows in the table with 1,000,000 distinct values. It was just very unlucky that one of the values appeared 1,000,000 times and that was the value the users always wanted to query – and there was no histogram on the column to tell the optimizer that there was a massive skew in the data distribution.

Problem solved – all I had to do was set a table preference for this table to add a histogram to this column and gather stats. Since there were so many distinct values and so much “non-popular” data in the table the optimizer should end up with a hybrid histogram that would highlight this value. I left instructions for the required test and waited for the email telling me that my suggestion was brilliant and the results were fantastic… I got an email telling me it hadn’t worked.

Here’s a model of the situation – I’ve created a table with 2 million rows and a column where every other row contains the same value but otherwise contains the rownum. Because the client code was using a varchar2() column I’ve done the same here, converting the numbers to character strings left-padded with zeros. There are a few rows (about 20) where the column value is higher than the very popular value.


rem
rem     Script:         histogram_problem_12c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 2e4
)
select
        rownum  as id,
        case
                when mod(rownum,2) = 0
                        then '999960'
                        else lpad(rownum,6,'0')
        end     as bad_col
from
        generator       v1,
        generator       v2
where
        rownum <= 2e6
;

Having created the data I’m going to create a histogram on the bad_col – specifying 254 buckets – then query user_tab_histograms for the resulting histogram (from which I’ll delete a huge chunk of boring rows in the middle):


begin

        dbms_stats.gather_table_stats(
                ownname         => 'TEST_USER',
                tabname         => 'T1',
                method_opt      => 'for columns bad_col size 254'
        );

end;
/

select
        column_name, histogram, sample_size
from
        user_tab_columns
where
        table_name = 'T1'
;

column end_av format a12

select
        endpoint_number         end_pt,
        to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx') end_val,
        endpoint_actual_value   end_av,
        endpoint_repeat_count   end_rpt
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = 'BAD_COL'
order by
        endpoint_number
;


COLUMN_NAME          HISTOGRAM             Sample
-------------------- --------------- ------------
BAD_COL              HYBRID                 5,513
ID                   NONE               2,000,000

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
         1  303030303031001f0fe211e0800000 000001                1
        12  3030383938311550648a5e3d200000 008981                1
        23  303135323034f8f5cbccd2b4a00000 015205                1
        33  3032333035311c91ae91eb54000000 023051                1
        44  303239373236f60586ef3a0ae00000 029727                1
...
      2685  3938343731391ba0f38234fde00000 984719                1
      2695  39393235303309023378c0a1400000 992503                1
      2704  3939373537370c2db4ae83e2000000 997577                1
      5513  393939393938f86f9b35437a800000 999999                1

254 rows selected.

So we have a hybrid histogram, we’ve sampled 5,513 rows to build the histogram, we have 254 buckets in the histogram report, and the final row in the histogram is end point 5513 (matching the sample size). The first row of the histogram shows us the (real) low value in the column and the last row of the histogram reports the (real) high value. But there’s something very odd about the histogram – we know that ‘999960’ is the one popular value, occurring 50% of the time in the data, but it doesn’t appear in the histogram at all.

Looking more closely we see that every bucket covers a range of about 11 (sometimes 9 or 10) rows from the sample, and the highest value in each bucket appears just once; but the last bucket covers 2,809 rows from the sample with the highest value in the bucket appearing just once. We expect a hybrid histogram to have buckets which (at least initially) are all roughly the same size – i.e. “sample size”/”number of buckets” – with some buckets being larger by something like the amount that appears in their repeat count, so it doesn’t seem right that we have an enormous bucket with a repeat count of just 1. Something is broken.

The problem is that the sample didn’t find the low and high values for the column – although the initial full tablescan did, of course – so Oracle has “injected” the low and high values into the histogram by fiddling with the contents of the first and last buckets. At the bottom end of the histogram this hasn’t really caused any problems (in our case), but at the top end it has taken the big bucket for our very popular ‘999960’ and apparently simply replaced the value with the high value of ‘999999’ and a repeat count of 1.

As an indication of the truth of this claim, here are the last few rows of the histogram if I repeat the experiment but, before gathering the histogram, delete the rows where bad_col is greater than ‘999960’. (Oracle’s sample is random, of course, and has changed slightly for this run.)

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
      2641  3938373731371650183cf7a0a00000 987717                1
      2652  3939353032310e65c1acf984a00000 995021                1
      2661  393938393433125319cc9f5ba00000 998943                1
      5426  393939393630078c23b063cf600000 999960             2764
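For the record, the rerun behind that output was essentially the following – a hedged sketch in which the delete simply removes the handful of rows above the popular value before repeating the earlier gather:

delete from t1 where bad_col > '999960';
commit;

begin
        dbms_stats.gather_table_stats(
                ownname         => 'TEST_USER',
                tabname         => 'T1',
                method_opt      => 'for columns bad_col size 254'
        );
end;
/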

Similarly, if I inserted a few hundred rows with a higher value than my popular value (in this case I thought 500 rows would be a fairly safe bet as the sample was about one in 360 rows) I got a histogram which still ended with a large bucket for the popular value, followed by a tiny final bucket for the high value, so the problem of the popular bucket being hacked to the high value was much less significant:


    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
      2718  393736313130fe68d8cfd6e4000000 976111                1
      2729  393836373630ebfe9c2b7b94c00000 986761                1
      2740  39393330323515efa3c99771600000 993025                1
      5495  393939393630078c23b063cf600000 999960             2747
      5497  393939393938f86f9b35437a800000 999999                1

Bottom line, then: if you have an important popular value in a column and there aren’t very many rows with a higher value, you may find that Oracle loses sight of the popular value as it fudges the column’s high value into the final bucket.

Workaround

I did consider writing a bit of PL/SQL for the client to fake a realistic frequency histogram, but decided that that wouldn’t be particularly friendly to future DBAs who might have to cope with changes. Luckily the site doesn’t gather stats using the automatic scheduler job and only rarely updates stats anyway, so I suggested we create a histogram on the column using an estimate_percent of 100. This took about 8 minutes to run – for reasons that I will go into in a moment – after which I suggested we lock stats on the table and document the fact that when stats are collected on this table it’s got to be a two-pass job – the normal gather with its auto_sample_size to start with, then a 100% sample for this column to gather the histogram:


begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt       => 'for columns bad_col size 254',
                estimate_percent => 100,
                cascade          => false
        );
end;
/

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
       125  39363839393911e01d15b75c600000 968999                0
       126  393834373530e98510b6f19a000000 984751                0
       253  393939393630078c23b063cf600000 999960                0
       254  393939393938f86f9b35437a800000 999999                0

129 rows selected.

This took a lot longer, of course, and produced an old-style height-balanced histogram. Part of the time came from the increased volume of data that had to be processed, part of it came from a surprise (which also appeared, in a different guise, in the code that created the original hybrid histogram).

I had specifically chosen the method_opt to gather for nothing but the single column. In fact whether I forced the “legacy” (height-balanced) code or the modern (hybrid) code, I got a full tablescan that did some processing of EVERY column in the table and then threw most of the results away. Here are fragments of the SQL – old version first:


select /*+  
            no_parallel(t) no_parallel_index(t) dbms_stats
            cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring 
            xmlindex_sel_idx_tbl no_substrb_pad  
       */
       count(*), 
       count("ID"), sum(sys_op_opnsize("ID")),      
       count("BAD_COL"), sum(sys_op_opnsize("BAD_COL"))    
       ...
from
       "TEST_USER"."T1" t


select /*+
           full(t)    no_parallel(t) no_parallel_index(t) dbms_stats
           cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
           xmlindex_sel_idx_tbl no_substrb_pad
       */
       to_char(count("ID")),
       to_char(count("BAD_COL")),
       substrb(dump(min("BAD_COL"),16,0,64),1,240),
       substrb(dump(max("BAD_COL"),16,0,64),1,240),
       ...
       count(rowidtochar(rowid)) 
from
       "TEST_USER"."T1" t  /* ACL,TOPN,NIL,NIL,RWID,U,U254U*/

The new code only used the substrb() functions on the bad_col, but all other columns in the table were subject to the to_char(count()).
The old code applied the count() and sys_op_opnsize() to every column in the table.

This initial scan was a bit expensive – and disappointing – for the client since their table had 290 columns (which means intra-block chaining as a minimum) and had been updated so much that 45% of the rows in the table had to be “continued fetches”. I can’t think why every column had to be processed like this, but if they hadn’t been that would have saved a lot of CPU and I/O since the client’s critical column was very near the start of the table.
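Going back to the workaround for a moment – since the suggestion was to lock the stats and treat future collection as a documented two-pass job, the locking step is just the standard dbms_stats call. A hedged sketch, with the schema and table names from this demo:

begin
        dbms_stats.lock_table_stats('TEST_USER', 'T1');
end;
/

-- and when the two-pass gather is due:

begin
        dbms_stats.unlock_table_stats('TEST_USER', 'T1');
        --
        -- normal gather with auto_sample_size here, followed by the
        -- 100% sample histogram gather shown above, then re-lock:
        --
        dbms_stats.lock_table_stats('TEST_USER', 'T1');
end;
/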

Finally

This problem with the popular value going missing is a known issue, for which there is a bug number, but there is further work going on in the same area which means this particular detail is being rolled into another bug fix. More news when it becomes available.

 

 

ASSM tangle

Thu, 2018-01-11 11:35

Here’s a follow-on from Tuesday’s (serious) note about a bug in 12.1.0.2 that introduces random slowdown on large-scale inserts. The threat in this note, while real and potentially a nuisance, is much less likely to become visible because it depends on you doing something that you probably shouldn’t be doing.

There have always been problems with ASSM and large-scale deletes – when should Oracle mark a block as having free space on deletion? If your session does it immediately then other sessions will start trying to use free space that isn’t really there until you commit; if your session doesn’t do it immediately then when can it happen? You won’t want it done on commit – but that means the segment could “lose” a lot of free space if nothing comes along in a timely fashion to tidy up.

But here’s a quirky problem that takes things one step further. What happens if you try to delete a load of data and fail and your session rolls back? If we start with yesterday’s script (running on 11.2.0.4 or 12.2.0.1) we can create a table with 1M rows in it and the following space usage:


Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :       41,666 /      341,327,872

You will recall that each “Full” block actually had the basic 10% free space, plus a couple of hundred extra bytes which Oracle had to “forget about” because the incoming rows were always 290 bytes long. Let’s take this table and delete the first 100,000 rows, then emulate a session error and roll back, and then check the space usage:


delete from t1 where rownum <= 100000;
rollback;

-- generate space usage report

Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :        4,167 /       34,136,064
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :       37,499 /      307,191,808

We have 4,167 blocks which were full, and we know they are effectively full for the purposes of our data, but they’re now declared as having some free space. When Oracle rolled back the delete it wasn’t running code that would attempt to discover that the block was going to go over the limit, it simply calculated the byte change from re-inserting the row, added it to the total free space (tosp) and produced a number that hadn’t reached the limit set by pctfree – so it flagged the block accordingly. (Remember my comment in the earlier article that Oracle doesn’t generate undo for the state changes on the Level 1 bitmap blocks – this is, at least in part, a consequence of that strategy.)

Guideline

If you’re going to do large-scale deletes in an ASSM environment, make sure they don’t fail or subsequent inserts may take a long time.

 


ASSM Argh 2

Wed, 2018-01-10 07:24

After yesterday’s post one of the obvious follow-up questions was whether the problem I demonstrated was a side effect of my use of PL/SQL arrays and loops to load data. What would happen with a pure “insert as select” statement? It’s easy enough to check:


rem
rem     Script:         assm_argh2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

drop table t2 purge;
drop table t1 purge;

create table t2
segment creation immediate
tablespace test_8k_assm
as
select * from all_objects where rownum <= 50000 -- >comment to avoid WordPress anomaly
;

create table t1
segment creation immediate
tablespace test_8k_assm
as
select * from all_objects where rownum = 0
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

insert into t1
select t2.*
from
        (
         select /*+ cardinality(40) */ rownum id
         from dual connect by level <= 40 -- > comment to avoid WordPress anomaly
        ) d,
        t2
;

commit;

declare
        m_unformatted_blocks    number;
        m_unformatted_bytes     number;
        m_fs1_blocks            number;
        m_fs1_bytes             number;
        m_fs2_blocks            number;
        m_fs2_bytes             number;

        m_fs3_blocks            number;
        m_fs3_bytes             number;
        m_fs4_blocks            number;
        m_fs4_bytes             number;
        m_full_blocks           number;
        m_full_bytes            number;

begin
        dbms_space.SPACE_USAGE(
                segment_owner           => 'TEST_USER',
                segment_name            => 'T1',
                segment_type            => 'TABLE',
                unformatted_blocks      => m_unformatted_blocks,
                unformatted_bytes       => m_unformatted_bytes,
                fs1_blocks              => m_fs1_blocks ,
                fs1_bytes               => m_fs1_bytes,
                fs2_blocks              => m_fs2_blocks,
                fs2_bytes               => m_fs2_bytes,
                fs3_blocks              => m_fs3_blocks,
                fs3_bytes               => m_fs3_bytes,
                fs4_blocks              => m_fs4_blocks,
                fs4_bytes               => m_fs4_bytes,
                full_blocks             => m_full_blocks,
                full_bytes              => m_full_bytes
        );

        dbms_output.put_line('Unformatted                   : ' || to_char(m_unformatted_blocks,'999,999,990') || ' / ' || to_char(m_unformatted_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 1 (  0 -  25% free) : ' || to_char(m_fs1_blocks,'999,999,990') || ' / ' || to_char(m_fs1_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 2 ( 25 -  50% free) : ' || to_char(m_fs2_blocks,'999,999,990') || ' / ' || to_char(m_fs2_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 3 ( 50 -  75% free) : ' || to_char(m_fs3_blocks,'999,999,990') || ' / ' || to_char(m_fs3_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 4 ( 75 - 100% free) : ' || to_char(m_fs4_blocks,'999,999,990') || ' / ' || to_char(m_fs4_bytes,'999,999,999,990'));
        dbms_output.put_line('Full                          : ' || to_char(m_full_blocks,'999,999,990') || ' / ' || to_char(m_full_bytes,'999,999,999,990'));
end;
/

I’ve copied the first 50,000 rows from all_objects as a way of generating data, then cloned it 40 times into the main table to give me a total of 2,000,000 rows.

A comment on yesterday’s blog reported that the behaviour I described has been fixed in the October bundle patch for 12.1.0.2, but I haven’t patched my copy yet. So here are the results (with a little cosmetic editing) from running the insert and reporting on space usage from 11.2.0.4, 12.1.0.2, and 12.2.0.1 in order:


11.2.0.4
========
2000000 rows created.

Unformatted                   :          764 /        6,258,688
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          133 /        1,089,536
Full                          :       28,579 /      234,119,168

12.1.0.2
========
2000000 rows created.

Unformatted                   :          256 /        2,097,152
Freespace 1 (  0 -  25% free) :       32,810 /      268,779,520
Freespace 2 ( 25 -  50% free) :            0 /                0
Freespace 3 ( 50 -  75% free) :            1 /            8,192
Freespace 4 ( 75 - 100% free) :           47 /          385,024
Full                          :          443 /        3,629,056

12.2.0.1
========
2000000 rows created.

Unformatted                   :          764 /        6,258,688
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          226 /        1,851,392
Full                          :       39,706 /      325,271,552

The total number of blocks involved changes from version to version, of course, thanks to the huge differences in the contents of all_objects, but the headline message is clear – 12.1.0.2 is broken for this basic requirement. On the plus side, though, this is what you get from 12.1.0.2 if you change that insert to include the /*+ append */ hint:


2000000 rows created.

Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            0 /                0
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :            0 /                0
Full                          :       33,380 /      273,448,960

Unsurprisingly, 11.2.0.4 and 12.2.0.1 also behave and report 100% Full. This is a slightly special case, of course, since there was no previous data in the table, but even when I started the big insert after inserting and committing a few rows all three versions behaved.
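That “few committed rows first” variant was as simple as it sounds – a hedged sketch (the seed count of 10 is arbitrary):

insert into t1 select * from t2 where rownum <= 10;
commit;

insert /*+ append */ into t1
select t2.*
from
        (
         select /*+ cardinality(40) */ rownum id
         from dual connect by level <= 40
        ) d,
        t2
;

commit;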


ASSM argh!

Tue, 2018-01-09 11:53

Here’s a problem with ASSM that used to exist in older versions of Oracle, had disappeared by 11.2.0.4, and then re-appeared in 12.1.0.2 – disappearing again by 12.2.0.1. It showed up on MoS a few days ago under the heading: “Insert is running long with more waits on db file sequential read”.

The obvious response to this heading is to question the number of indexes on the table – because big tables with lots of indexes tend to give you lots of random I/O as Oracle maintains the indexes – but this table had no indexes. The owner of the problem supplied several bits of information in the initial post, with further material in response to follow-up questions, including the tkprof summary of the 10046/level 12 trace of the insert and two extracts from the trace file to show us some of the “db file sequential read” waits – the first extract made me wonder if there might be some issue involving 16KB blocks but the second one dispelled that illusion.

There are several buggy things that can appear with ASSM and large-scale DML operations, and sometimes the problems can appear long after the original operation had done the dirty deed, so I thought I’d create a simple model based on the information supplied to date – and discovered what the problem (probably) was. Here’s how it starts – I’ve created a tablespace using ASSM, and in this tablespace I’ve created a table which has 48 columns with a row length of 290 bytes (roughly matching the OP’s table), and I’ve hacked out a simple PL/SQL block that loops around inserting arrays of 100 rows at a time into the table for a total of 1M rows before committing.


rem
rem     Script:         assm_cleanout.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2        Lots of blocks left "not full"
rem             11.2.0.4
rem

rem
rem     using OMF, so no file-name needed
rem     Ran this bit as SYS, and altered test user to have unlimited quota
rem

/*
create
        tablespace test_8k_assm
        datafile size 1G
        extent management local
        autoallocate
        segment space management auto
;
*/

rem
rem     Create the table, pre-allocate some space.
rem     This means we should get consistent 8M extents and not initial little ones
rem

create table t1 (
        v001 varchar2(5), v002 varchar2(5), v003 varchar2(5), v004 varchar2(5), v005 varchar2(5),
        v006 varchar2(5), v007 varchar2(5), v008 varchar2(5), v009 varchar2(5), v010 varchar2(5),
        v011 varchar2(5), v012 varchar2(5), v013 varchar2(5), v014 varchar2(5), v015 varchar2(5),
        v016 varchar2(5), v017 varchar2(5), v018 varchar2(5), v019 varchar2(5), v020 varchar2(5),
        v021 varchar2(5), v022 varchar2(5), v023 varchar2(5), v024 varchar2(5), v025 varchar2(5),
        v026 varchar2(5), v027 varchar2(5), v028 varchar2(5), v029 varchar2(5), v030 varchar2(5),
        v031 varchar2(5), v032 varchar2(5), v033 varchar2(5), v034 varchar2(5), v035 varchar2(5),
        v036 varchar2(5), v037 varchar2(5), v038 varchar2(5), v039 varchar2(5), v040 varchar2(5),
        v041 varchar2(5), v042 varchar2(5), v043 varchar2(5), v044 varchar2(5), v045 varchar2(5),
        v046 varchar2(5), v047 varchar2(5), v048 varchar2(5)
)
segment creation immediate
tablespace test_8k_assm
storage(initial 8M)
;

alter table t1 allocate extent (size 8M);
alter table t1 allocate extent (size 8M);

rem
rem     Simple anonymous pl/sql block
rem     Large insert, handled with array inserts
rem     Can modify loop count and array size very easily
rem

declare
        type tab_array is table of t1%rowtype;
        junk_array tab_array;
begin

        select
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx'
        bulk collect into
                junk_array
        from
                all_objects
        where
                rownum  <= 100 -- > comment to avoid WordPress format issue
        ;

        for i in 1..10000 loop
                forall j in 1..junk_array.count
                        insert into t1 values junk_array(j) ;
        end loop;

end;
/

commit;

The number of rows per block after this insert is 24, with 1038 bytes free space left (808 due to the pctfree = 10, then the bit that was too small to take a 25th row before breaching the pctfree barrier). This means we should report 1M/24 = 41,666 full blocks and one block with some free space. So we query the table using the dbms_space package:


declare
        m_unformatted_blocks    number;
        m_unformatted_bytes     number;
        m_fs1_blocks            number;
        m_fs1_bytes             number;
        m_fs2_blocks            number;
        m_fs2_bytes             number;

        m_fs3_blocks            number;
        m_fs3_bytes             number;
        m_fs4_blocks            number;
        m_fs4_bytes             number;
        m_full_blocks           number;
        m_full_bytes            number;

begin
        dbms_space.SPACE_USAGE(
                segment_owner           => 'TEST_USER',
                segment_name            => 'T1',
                segment_type            => 'TABLE',
                unformatted_blocks      => m_unformatted_blocks,
                unformatted_bytes       => m_unformatted_bytes,
                fs1_blocks              => m_fs1_blocks ,
                fs1_bytes               => m_fs1_bytes,
                fs2_blocks              => m_fs2_blocks,
                fs2_bytes               => m_fs2_bytes,
                fs3_blocks              => m_fs3_blocks,
                fs3_bytes               => m_fs3_bytes,
                fs4_blocks              => m_fs4_blocks,
                fs4_bytes               => m_fs4_bytes,
                full_blocks             => m_full_blocks,
                full_bytes              => m_full_bytes
        );

        dbms_output.new_line;
        dbms_output.put_line('Unformatted                   : ' || to_char(m_unformatted_blocks,'999,999,990') || ' / ' || to_char(m_unformatted_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 1 (  0 -  25% free) : ' || to_char(m_fs1_blocks,'999,999,990') || ' / ' || to_char(m_fs1_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 2 ( 25 -  50% free) : ' || to_char(m_fs2_blocks,'999,999,990') || ' / ' || to_char(m_fs2_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 3 ( 50 -  75% free) : ' || to_char(m_fs3_blocks,'999,999,990') || ' / ' || to_char(m_fs3_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 4 ( 75 - 100% free) : ' || to_char(m_fs4_blocks,'999,999,990') || ' / ' || to_char(m_fs4_bytes,'999,999,999,990'));
        dbms_output.put_line('Full                          : ' || to_char(m_full_blocks,'999,999,990') || ' / ' || to_char(m_full_bytes,'999,999,999,990'));

end;
/

The results aren’t what we expect:


Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :       35,001 /      286,728,192
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :        6,665 /       54,599,680

We have one block marked as 25 – 50% free (that’s the one block with 16 rows in it, which means about 40% space currently free) but our 41,666 full blocks are actually reported as 6,665 full blocks and 35,001 blocks with some space available. That’s going to hurt eventually if some process wants to insert more rows and finds that it has to fail its way through 35,001 blocks before finding a block which has enough free space.

So what happens when I repeat the PL/SQL block (and commit)? Here are the results from calls to dbms_space after the next two cycles:


Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :       70,002 /      573,456,384
Freespace 2 ( 25 -  50% free) :            2 /           16,384
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          220 /        1,802,240
Full                          :       13,330 /      109,199,360

Unformatted                   :          256 /        2,097,152
Freespace 1 (  0 -  25% free) :      105,003 /      860,184,576
Freespace 2 ( 25 -  50% free) :            3 /           24,576
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          117 /          958,464
Full                          :       19,995 /      163,799,040

Every time we execute the PL/SQL block we leave a trail of 35,001 more blocks which are flagged as “not quite full”.

Looking at the session stats while running the insert loop I can tell that Oracle isn’t checking to see whether or not it should be using those blocks. (A quick way of proving this is to flush the buffer cache before each execution of the PL/SQL and note that Oracle doesn’t read back the 105,000 blocks before inserting any data). So somehow, sometime, someone might get a nasty surprise – and here’s one way that it might happen:

Since I know my data fits 24 rows per block I’m going to modify my PL/SQL block to select one row into the array then loop round the insert 25 times – so I know I’m inserting a little bit more than one block’s worth of data. Starting from the state with 105,003 blocks marked as “Freespace 1” this is what I saw – first, the free space report after inserting 25 rows:


Unformatted                   :          240 /        1,966,080
Freespace 1 (  0 -  25% free) :        1,074 /        8,798,208
Freespace 2 ( 25 -  50% free) :            0 /                0
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          133 /        1,089,536
Full                          :      123,927 /    1,015,209,984

Then a few wait events and session statistics for the insert:


---------------------------------------------------------
SID:    39:TEST_USER - jonathan
Session Events - 09-Jan 16:57:18
Interval:-      6 seconds
---------------------------------------------------------
Event                                             Waits   Time_outs        Csec    Avg Csec    Max Csec
-----                                             -----   ---------        ----    --------    --------
db file sequential read                          15,308           0         128        .008           3
db file scattered read                           20,086           0         271        .014           4

---------------------------------
Session stats - 09-Jan 16:57:18
Interval:-  6 seconds
---------------------------------
Name                                                                     Value
----                                                                     -----
session logical reads                                                  269,537
physical read total IO requests                                         35,401
db block gets                                                          229,522
consistent gets                                                         40,015
physical reads                                                         124,687
physical reads cache                                                   124,687
db block changes                                                       208,489
physical reads cache prefetch                                           89,293
redo entries                                                           207,892
redo size                                                           16,262,724
undo change vector size                                                  1,720
deferred (CURRENT) block cleanout applications                         103,932
table scan blocks gotten                                                20,797
HSC Heap Segment Block Changes                                              25

The session has read and updated almost all of the level 1 bitmap blocks. I don’t know exactly what triggered this re-read, but it seems to be related to the number of rows inserted (or, perhaps, the amount of space used rather than the row count) as an insert crosses the pctfree boundary and fails over to the next block. I’ve only done a couple of little tests to try and get a better idea of why an insert sometimes sweeps through the bitmap blocks – so I know that inserting 2 or 3 rows at a time will also trigger the cleanout – but there are probably several little details involved that need to be identified.

You might note a couple of details in the stats:

  • Because I had flushed the buffer cache before the insert Oracle did its “cache warmup” tablescanning trick – if this had not happened I would probably have done a single block read for every single bitmap block I touched.
  • There are 103,932 block cleanout applications – but 208,000 db block changes and redo entries. Roughly half the latter are for data block cleanouts (OP code 4.1) and half are the state changes on the level 1 bitmap blocks (OP code 13.22). You’ll notice that neither change produces any undo.
  • I’ve also included the HSC Heap Segment Block Changes statistics to show you that not all changes to Heap Segment Blocks show up where you might expect them.
And finally:

If you re-run the tests on 11.2.0.4 and 12.2.0.1 you get the following results after the initial script run – the problem doesn’t exist:


11.2.0.4
========
Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :       41,666 /      341,327,872

12.2.0.1
========
Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :       41,666 /      341,327,872

So how does this help the OP?
  • First, there may be a huge mess still waiting to be cleaned in the table – but at 34M blocks I’m not too keen to suggest running the dbms_space routine to find out what it looks like – but maybe that’s necessary.
  • Secondly – an upgrade to 12.2 will probably avoid the problem in future.
  • Thirdly – if the number of rows per block is very close to uniform, write a little code to do a loop that inserts (say) 2 * expected number of rows per block as single row inserts and rolls back (see the sketch after this list); the inserts will probably trigger a massive though perhaps not complete cleanout, so rinse and repeat until the cleanout is complete. Try to find a time when you don’t mind the extra load to get this job done.
  • Finally – on the big job that does the bulk insert – repeat the dummy insert/rollback at the end of the job to clean up the mess made by the job.
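A hedged sketch of that “dummy insert and rollback” loop from the third point above – it assumes the 48-column t1 from this note and the known 24 rows per block, and the row contents are just placeholders:

declare
        m_row   t1%rowtype;
begin
        -- fabricate one full-width row (every column a 5-character string,
        -- matching the 290-byte rows used in the original load)
        select
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx'
        into
                m_row
        from
                dual
        ;

        for i in 1..48 loop                     -- 2 * the expected 24 rows per block
                insert into t1 values m_row;    -- single-row inserts go through the space search
        end loop;

        rollback;                               -- the rows aren't wanted, only the cleanout side effect

end;
/

Re-run the block and re-check dbms_space.space_usage() until the “Freespace 1” figure has dropped to (nearly) zero.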
Addenda

Prompted by comment #2 below, I should add that if the problem has been fixed in 12.2 then possibly there’s a bug report and patch for it already. If there isn’t then the OP could raise an SR (referencing this blog note), and request a bug fix or back-port from 12.2.

And with 24 hours of publication, comment #4 (from Yury Pudovchenko) tells us that the bug is fixed by the Oct 2017 Bundle Patch.

 

 

