Re: L1 BMB - incorrect freeness status - are they relevant for tables alone?

From: Vishnu Potukanuma <vishnupotukanuma_at_gmail.com>
Date: Thu, 28 Nov 2019 01:34:56 +0530
Message-ID: <CAP-RywwkZX0SZAfOB_GvhZsVarY_YFe6+NJkgcrh=68nu0Af4w_at_mail.gmail.com>



I will certainly go through Richard Foote and your blog as well in its entirety looks like I still missed many internals...

Regarding the 50-50% block split case, i will test the hash partitioning and hash based clusters to the best, i already have written code to simulate most issues, i will create these and just run the code to see what happens.

the issue i was talking about regarding the 50-50% split case was very peculiar.....
I can't name the client but since its also 7 years now I can only reveal this info... they had over 7000+ oracle databases and the strategy is same in every database across the entire fleet, their strategy was to add datafiles to the tablespaces only when the space usage was about 98% full and if we see all the existing datafiles are already full... this added a lot of bloat in their indexes,, the worst part was data was always growing in most databases that i used to manage... the workload is mostly inserts and updates... there were no deletes but rather dropping the oldest partition after the data is moved to a warehouse..... updates caused the leaf block entries shifting from one leaf to another, but the bloat remained the same... as the tables increased extents were allocated in the new data files and the FILE_ID kept on increasing with the new datafiles due to that ROWID value also.. they were never really open to adding data files up front due to storage costs. and the underlying storage was thin provisioned... shrinking or coalescing made more sense but they were never open to this idea as this generated additional redo.. which was basically additional cost for backups (tapes -- 10 years retention). and rebuilding was out of question since additional storage requirements and redo generation.... tried convincing but proved to be useless as i was a consultant from a different company.

Regarding our research... I will send you a personal mail, from my official mail id tomorrow..
since this is a public domain.. i don't want to disclose anything here...

Thanks,
Vishnu

On Thu, Nov 28, 2019 at 12:09 AM Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk> wrote:

>
> Vishnu,
>
> You're certainly going the job in the right way.
>
> One thing that's worth mentioning is that the most important source of
> information you should look at (after you've checked the manuals) is
> Richard Foote's blog. He's got a lot of material about how indexes work,
> what can go wrong, alternative strategies and so on. There are always
> alternative ideas to pursue.
>
> The 2,000 rows per key value is a good example of spotting the issue of
> 50/50 splits, it's a poster case (that I've either written about or
> lectured on) for using an index because it's more efficient than a
> tablescan, except the index is inherently going to become less efficient
> than you might expect unless the table is in a tablespace that started life
> with multiple files in which case the 50/50 splits will tend to back-fill.
> Then you start thinking about the potential benefits of hash partitioning
> into 128 pieces, or look at creating a range-partitioned hash cluster, or
> look at ways of naking best use of attribute clustering.
>
> All the best with the research - and remember you don't have to do
> everything from scratch, but don't trust anything you read on the internet
> to be 100% accurate.
>
> Regards
> Jonathan Lewis
>
> ________________________________________
> From: Vishnu Potukanuma <vishnupotukanuma_at_gmail.com>
> Sent: 27 November 2019 18:23
> To: Jonathan Lewis
> Cc: Oracle L
> Subject: Re: L1 BMB - incorrect freeness status - are they relevant for
> tables alone?
>
> Hi Jonathan,
>
> I will go through the blog contents... My example in the previous mail was
> bad, probably i should have mentioned a range of 1... 50000 with uniform
> distribution for 100 miillion table rows, and with larger indexed column
> size, that would have made more sense. Please don't take my examples
> literally as it is, my intention was basically the overall idea , its just
> a cultural thing...
>
> As this is a DL, I cannot say much but i can only say this.
> My company at present is working on perfecting cognitive learning and
> tacit knowledge... and these edge cases or rather impossible scenarios are
> very vital...
>
> Thanks,
> Vishnu
>
> On Wed, Nov 27, 2019 at 7:00 PM Jonathan Lewis <
> jonathan_at_jlcomp.demon.co.uk<mailto:jonathan_at_jlcomp.demon.co.uk>> wrote:
> As far as sizing and efficiency of indexes is concerned, Oracle doesn't
> really help at all - though if they got rid of the exclusive lock during
> "analyze index validate structure" that would be a big help.
>
> However it's not difficult to write code that produces help information
> about indexes but as you might expect - if you want detail you have to read
> the index blocks.
>
> https://jonathanlewis.wordpress.com/index-efficiency-3/ This one tells
> you about the number of index entries per leaf block, then summarises to
> showing how many leaf blocks there are with the given numbers of entries.
> So an indication of uneven usage
>
> https://jonathanlewis.wordpress.com/index-sizing/ If your index stats are
> up to date this tells you how many leaf blocks there are in each index and
> how many leaf blocks there would be if you rebuilt it at any given
> percentage. It's very old and has lots of limitations - most particularly
> it doesn't cater for prefix compression.
>
> https://jonathanlewis.wordpress.com/2009/08/17/treedump/ Tells you how
> to do a treedump - which gets you the index walked in order and a report of
> rows per leaf block in the session's trace file. (Basically a better
> option than the index_efficiency_3 query but without the summary).
>
> https://jonathanlewis.wordpress.com/2010/03/07/treedump-2/
> Post-processing a treedump by reading a trace file as an external table.
>
> https://jonathanlewis.wordpress.com/2010/07/01/index-branches/ Variant on
> treedump-2 that reports only branch blocks, allowing irregular deletion
> patterns in indexes to become visible.
>
>
> I'll repeat what I said before about the importance of being aware of
> potential issues so that you are open to spotting the threats when you see
> a working environment. If you try and anticipate everything that could
> happen (and don't test very carefully) you have to think about all the
> reasons why you might be wrong, or why you may have missed something, or
> why the thought might be irrelevant (or put up with people like me telling
> you why ... )
>
> Take, for example, your "marks from 1..100" and 50/50 split. If the data
> is evenly distributed there are two corollaries:
>
> a) there's no point in having the index at all, it will be quicker to do a
> tablescan for any single value than to use the index. The index may (for
> example) send you to every 10th block in the table - when a single seek and
> 1MB read would get you 128 blocks, which means 12 of the ones you want in
> single read.
>
> b) Worrying about the index being twice the size it needs to be will
> (almost always) be a minor performance detail because each index block you
> read might result in 200 table block reads - so you would be saving maybe
> 1/4 of one percent of the total I/O by improving the packing of the index.
>
>
> Regards
> Jonathan Lewis
>
>
> ________________________________________
> From: Vishnu Potukanuma <vishnupotukanuma_at_gmail.com<mailto:
> vishnupotukanuma_at_gmail.com>>
> Sent: 27 November 2019 12:24
> To: Jonathan Lewis
> Cc: Oracle L
> Subject: Re: L1 BMB - incorrect freeness status - are they relevant for
> tables alone?
>
> Thanks Jonathan,
>
> I was wondering whether we can use this dbms_space as an alternative which
> is very less resource intensive than running a segment advisor to track the
> space saving of an index following an index rebuild or shrink space or
> coalesce as in few environments they disable the automatic advisor tasks
> and there are few other challenges inherent to the way the data is stored
> in index leaf blocks.
>
> The reason for this thought is because of this:
> Usually in an OLTP environment, we add data files when we run out of space
> and as the segment grows, extents at the further points in a data file get
> allocated , which increases the value of ROWID, and as the ROWID is
> dependent on factors such as FILE_id, blockID, position in the block,
> data_object_id, the rowid for a particular key value will always increase
> when the extents are allocated further beyond the current point.
>
> Since the indexed key value is only not incrementing monotonically and
> have finite distinct values most of the time unless unique, which presents
> a 90-10 split, which may cause only some unnecessary free space usage, but
> not the case with 50-50 block splits.
>
> the ROWIDs are sorted in the index. This can pose two different challenges.
>
> if the extents are always allocated beyond the current position which is
> most likely the case when the datafiles are added with autoextend on, and
> the ROWIDs increases, the index leaf blocks for a particular key value,
> will mostly be only half full. unless the extent gets allocated below the
> current position of existing indexes only then the key value gets inserted
> to a leaf block or the value that we insert falls in the range. For
> example, we know that marks can be 1-100 but never fractions, the index
> leaf blocks are getting reused only when we insert values such as 1.1 or
> 1.2 which never occur in an application. so the usage is only about 50% in
> these cases.
> Unless the applications delete the entries or update the entries of other
> rows with ROWIDs that falls within the range captured by a particular index
> block. if there are mostly inserts and deletes, then the space usage in a
> leaf block in this scenarios will be even less - as PCTFREE is relevant
> only during index creation... this may not pose a challenge when the
> rowsizes are large in which case the clustering factor is almost always
> high and the number of index blocks visited during an index range scan is
> very less, but his poses a challenge when the row sizes are small as the
> index sizes are large. in which case an index range scan ends up performing
> more consistent gets than required.
>
> This becomes even challenging to answer why the consistent gets for a
> particular index key value is high or less, as it reads many index blocks.
> this is when the extents gets allocated below the current position of the
> extent as the ROWID will be less, in this case the chances of index leaf
> blocks being used is far good, and this can again pose two challenges, even
> though the uniform distribution of values, the consistent gets for each
> value can differ significantly. coalesce and shrink space provides a better
> alternative in these cases, in consolidating the leaf blocks to get the
> consistent gets uniform provided uniform distribution of values.
>
> To mitigate this we dont have a parameter that can control how a block
> split happens, as this gives us more control over redo generation during
> block splits and space wastage if we know the workload all well, and could
> be related to balancing the b-tree not sure or it might be more guesstimate
> looking at the worst case scenario..
>
> i mean coalesce is a better option considering a rebuild as it requires
> additional space but this prevents index prefetching as the blocks are
> stored at random locations, and rebuild can help read the index leaf blocks
> that are adjacent in one go but the saving in terms of IOPS may be less but
> provided the SQL statements are executed more frequently this poses a
> challenge with coalesce.
>
> Thanks,
> Vishnu
>
> On Wed, Nov 27, 2019 at 4:46 PM Jonathan Lewis <
> jonathan_at_jlcomp.demon.co.uk<mailto:jonathan_at_jlcomp.demon.co.uk><mailto:
> jonathan_at_jlcomp.demon.co.uk<mailto:jonathan_at_jlcomp.demon.co.uk>>> wrote:
>
> L1 BMB are used to track usage for index blocks, but they only record 2
> states: FULL or FS2
>
> If an index (leaf) block has some entries it doesn't matter how much free
> space it has, you can only insert a new entry into it if the entry is
> supposed to be in that leaf block, so any attempt to show how much free
> space is available is a waste of time. The only interesting conditions are
> (a) it's got some entries in it (FULL) or (b) it's got no entries left in
> it and could be re-used somewhere else in the index (FS2).
>
> Note - an index leaf block is left in the index structure even if it's
> completely empty until such time as Oracle needs to add a new leaf block to
> the index, then the block can be detached from its current location and
> reused in a new location. This is also true for freelist management, where
> an index block could be both on a free list and in the structure at the
> same time.
>
> Regards
> Jonathan Lewis
>
> ________________________________________
> From: oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org
> ><mailto:oracle-l-bounce_at_freelists.org<mailto:
> oracle-l-bounce_at_freelists.org>> <oracle-l-bounce_at_freelists.org<mailto:
> oracle-l-bounce_at_freelists.org><mailto:oracle-l-bounce_at_freelists.org
> <mailto:oracle-l-bounce_at_freelists.org>>> on behalf of Vishnu Potukanuma <
> vishnupotukanuma_at_gmail.com<mailto:vishnupotukanuma_at_gmail.com><mailto:
> vishnupotukanuma_at_gmail.com<mailto:vishnupotukanuma_at_gmail.com>>>
> Sent: 27 November 2019 11:03
> To: Oracle L
> Subject: L1 BMB - incorrect freeness status - are they relevant for tables
> alone?
>
> Hi,
>
> Background:
> L1BMB is used to track the freeness of a data block in the extent that
> this block manages and indicates whether the block is full or how much
> space usage there is, this is accurate when it comes to table but for
> indexes, it doesn't actually say whether the space is full or not.
>
> create table temp(roll number, name varchar2(20), mark1 number);
> create index idx on temp(mark1);
> insert into temp select rownum, dbms_random.string(0,20),
> round(dbms_random.value(0,100)) from dual connect by level < 1000000;
> commit;
> exec dbms_stats.gather_table_stats('VISHNU','TEMP',CASCADE=>TRUE);
> alter system flush buffer_cache;
>
> During the entire load I checked how many block splits were happening and
> what kind of block splits were occurring:
> NAME VALUE
> ---------------------------------------------------------------- ----------
> leaf node splits 3373
> leaf node 90-10 splits 0
> branch node splits 7
> root node splits 1
>
> most are basically 50-50 block splits. and the total number of blocks of
> that index is
> SQL> select sum(blocks) from dba_extents where segment_name='TEMP_IDX';
> SUM(BLOCKS)
> -----------
> 3456
>
> now that I have loaded the data, when i check the space usage of the
> blocks using the dbms_space.space_usage to see the freeness status it
> showed the following:
> FS1 Bytes (at least 0 to 25% free space) = 0
> FS1 Blocks(at least 0 to 25% free space) = 0
> FS2 Bytes (at least 25 to 50% free space)= 376832
> FS2 Blocks(at least 25 to 50% free space)= 46
> FS3 Bytes (at least 50 to 75% free space) = 0
> FS3 Blocks(at least 50 to 75% free space) = 0
> FS4 Bytes (at least 75 to 100% free space) = 0
> FS4 Blocks(at least 75 to 100% free space)= 0
> Full Blocks in segment = 3222
> Full Bytes in segment = 26394624
>
> it said only 46 blocks have a free space and about 3222 blocks are full,
> this cannot be true, since the data is loaded into a mark1 column and there
> are more block splits that resulted in 50-50 block splits.
>
> the tablespace is auto-allocate so I randomly took a extent which is 1MB
> of size so that a L1BMB tracks the freeness status of the blocks in that
> extent. Interestingly the L1BMB freeness status is as follows:
> --------------------------------------------------------
> DBA Ranges :
> --------------------------------------------------------
> 0x01c02980 Length: 64 Offset: 0
>
> 0:Metadata 1:Metadata 2:FULL 3:FULL
> 4:FULL 5:FULL 6:FULL 7:FULL
> 8:FULL 9:FULL 10:FULL 11:FULL
> 12:FULL 13:FULL 14:FULL 15:FULL
> 16:FULL 17:FULL 18:FULL 19:FULL
> 20:FULL 21:FULL 22:FULL 23:FULL
> 24:FULL 25:FULL 26:FULL 27:FULL
> 28:FULL 29:FULL 30:FULL 31:FULL
> 32:FULL 33:FULL 34:FULL 35:FULL
> 36:FULL 37:FULL 38:FULL 39:FULL
> 40:FULL 41:FULL 42:FULL 43:FULL
> 44:FULL 45:FULL 46:FULL 47:FULL
> 48:FULL 49:FULL 50:FULL 51:FULL
> 52:FULL 53:FULL 54:FULL 55:FULL
> 56:FULL 57:FULL 58:FULL 59:FULL
> 60:FULL 61:FULL 62:FULL 63:FULL
>
> All the blocks are full? now i checked dumped a index block, from that
> extent to see the index leaf block header whether the space usage mentioned
> in the L1BMB is actually accurate.
>
> End dump data blocks tsn: 4 file#: 7 minblk 10624 maxblk 10624
> [oracle_at_oracle trace]$ alter system dump datafile ^C
> [oracle_at_oracle trace]$ ss
> SQL*Plus: Release 19.0.0.0.0 - Production on Wed Nov 27 10:41:30 2019
> Version 19.5.0.0.0
> Copyright (c) 1982, 2019, Oracle. All rights reserved.
> Connected to:
> Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
> Version 19.5.0.0.0
> SQL> alter system dump datafile 7 block 10639;
>
> the dump says:
> kdxcolev 0
> KDXCOLEV Flags = - - -
> kdxcolok 0
> kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
> kdxconco 2
> kdxcosdc 2
> kdxconro 314 ---> number of entries in the leaf block.
> kdxcofbo 664=0x298 --> offset for starting free space in the block.
> kdxcofeo 4264=0x10a8 --> similarly marker to point the end of free space.
> kdxcoavs 3600
> kdxlespl 0
> kdxlende 0
> kdxlenxt 29374428=0x1c037dc
> kdxleprv 29370843=0x1c029db
> kdxledsz 0
> kdxlebksz 8032
> row#0[4552] flag: -------, lock: 0, len=12
> col 0; len 2; (2): c1 49
> col 1; len 6; (6): 01 c0 2f 67 00 18
> row#1[4564] flag: -------, lock: 0, len=12
>
> Obviously there is free space in the block as indicated in the index leaf
> block header dump.
> Can someone please tell me whether the L1 BMB blocks are used to track
> freeness status for tables alone and not for indexes?
>
> Thanks,
> Vishnu
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Nov 27 2019 - 21:04:56 CET

Original text of this message