Re: high Block change tracking waits

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Fri, 12 Feb 2021 11:03:58 +0000
Message-ID: <CAGtsp8=miOEruhy7qKYt8v6VphEmx_7k4mkOrz+p5f-mMFcxPg_at_mail.gmail.com>



First important point - I've been checking back on my posts in this thread and found a lot of typos that I can't explain - missing text, cut-n-pastes that didn't paste what I was expecting, and sundry other errors that make some of my comments virtually incomprehensible. I also failed to notice that you had been working with two different BCT parameters (and I think _bct_public_dba_buffer_size sounds like the most appropriate guess for the one that addressed your buffer waits).

I would definitely raise the growing file size with Oracle support - after your change to the buffer size it may not matter any more, but it's odd that the file keeps growing. However, you first need to pin down whether or not you have added files (and if so, how large) or extended some files (and if so, by how much) in the interval since restarting BCT.
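For what it's worth, v$datafile records when each file was created, so a query along these lines would show recent additions (a sketch only - the cut-off date is an assumption, substitute the time you restarted BCT):

    -- datafiles created since an assumed cut-off date, with sizes
    SELECT file#, name, creation_time,
           ROUND(bytes / 1024 / 1024 / 1024, 2) AS gb
      FROM v$datafile
     WHERE creation_time > DATE '2021-02-01'
     ORDER BY creation_time;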

BCT works by "logging" which datafile blocks have been changed since the last rman backup. It does this by creating a bitmap for each data file in which one bit represents a 32KB contiguous section of the file; the bits are initially set to zero, and then changed to 1 if any block in the 32KB chunk is changed. By default Oracle keeps eight generations of the bitmap; I think the idea behind this is that you could then do a full backup once per week and 7 incremental backups on a daily basis, and Oracle would then have a bitmap corresponding to the changes from one backup to the next. It's worth mentioning that there is a parameter that sets the 32KB chunk size, and you could affect the size of the BCT file by increasing its value (though this would then mean backing up more datafile blocks for each block you actually changed). Have I mentioned that there's a paper by Alex Gorbachev of Pythian on this somewhere on the Internet? It's about 10 years old, but I don't think I've seen anything newer.
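As an aside, the state and current size of the change tracking file are visible in the standard view, so it's easy to track the growth over time:

    -- status, location, and size of the block change tracking file
    SELECT status, filename, ROUND(bytes / 1024 / 1024 / 1024, 2) AS gb
      FROM v$block_change_tracking;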

The puzzle of "used" staying the same but "allocated" growing MIGHT be down to the recyclebin - there are some views in Oracle that don't handle segments in the recyclebin consistently (see, for example: https://jonathanlewis.wordpress.com/2017/05/10/quantum-space/ ), so the AWR history may have produced some misleading results. The other thing that might have an impact on the figures is the TEMPORARY tablespace - does it feature in your report, or does Oracle suppress it? It may be TEMP that keeps growing but is never reported as used.
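If you want a quick cross-check on the recyclebin idea (a sketch, assuming a uniform 8KB block size, since dba_recyclebin reports SPACE in blocks):

    -- approximate space tied up in the recyclebin, per tablespace
    SELECT ts_name,
           ROUND(SUM(space) * 8192 / 1024 / 1024 / 1024, 2) AS approx_gb
      FROM dba_recyclebin
     GROUP BY ts_name;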

Going back to the bitmaps - if we take 144TB (total allocation) and divide by 32KB (chunk size) we need 4.8 billion bits for a single map, which corresponds to 600 million bytes (call it 600MB) as the minimum bitmap size for the database. If Oracle is keeping 8 copies that would demand 4.8GB of pure bitmap. Clearly there has to be some "metadata" overhead, but even with 100% overhead that's still only about 9GB, the size of your original file. So I think we need some very knowledgeable input on what's actually going on in the change tracking file that makes it that large to start with and then allows it to grow. I wonder if someone has set the number of generations to a value greater than 8, or if Oracle has adopted a dynamic approach that keeps adding generations until you take a full (or incremental level 0) backup - there are lots of ways in which the technology may have been tweaked over the years to cater for extremely large databases.
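To put that arithmetic into a query (just a sanity-check sketch - it assumes the default 32KB chunk and the 8 generations mentioned above):

    -- minimum bitmap space implied by the current datafile allocation
    SELECT ROUND(SUM(bytes) / (32 * 1024) / 8 / 1024 / 1024)     AS one_bitmap_mb,
           ROUND(SUM(bytes) / (32 * 1024) / 8 / 1024 / 1024 * 8) AS eight_bitmaps_mb
      FROM v$datafile;

Dividing by (32 * 1024) turns bytes of datafile into bits of bitmap, and the /8 turns bits into bytes; compare the second figure (plus overhead) with the actual size reported by v$block_change_tracking.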

Regards
Jonathan Lewis

On Wed, 10 Feb 2021 at 18:08, Lok P <loknath.73_at_gmail.com> wrote:

>
> Thank You Jonathan.
>
> I am now seeing the size of the BCT file has become ~12.46GB, which means
> it keeps growing. At least it has increased by ~3GB in the 2-3 days since
> I checked/posted. I wonder if we are going to hit that issue again? Maybe
> we need to raise the question with Oracle of why the size was 22GB, and
> why, after dropping to ~9GB when we reinitiated BCT, it is growing again?
>
> Regarding your guess, I need to confirm, but as we keep adding data files
> the DBAs most likely have also resized a few of them. The bitmap logic in
> relation to data files is going a bit over my head, but I notice something
> odd when I look at the tablespace usage history from
> dba_hist_tbspc_space_usage using the query below. I am seeing the used
> space stay consistent at around 100TB throughout, but the allocated space
> keeps on growing (it's now ~144TB), and that (144-100) = ~44TB of space is
> huge as per my understanding. Checking with the team, I got to know that
> in one instance the dev team encountered an error (ORA-01688: unable to
> extend table tab_part partition part1_04_02 by 8192 in tablespace
> tbs_mnth1) and the DBAs ended up adding a datafile to that tablespace, as
> they do in similar scenarios, which looks okay. But I am wondering why the
> used space stays the same while allocated keeps on growing (it must be
> because we are adding more data files), or are we hitting some buggy
> behaviour with regard to storage space that is somehow related to the BCT
> buffer space wait event?
>
> With regards to the BCT parameter change, we did change two different
> parameters, and I think both have different significance. We increased
> _bct_buffer_allocation_max to ~1GB, which didn't help in reducing the BCT
> buffer space waits. After that we changed _bct_public_dba_buffer_size
> to ~256MB (from ~128MB), so my thought was that the latter is actually the
> one that helped get rid of the BCT buffer space waits.
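>
> (For reference - the current values of these hidden parameters can be
> checked when connected as SYS; this is the usual sketch against the x$
> views, so treat it as illustrative rather than supported:)
>
> -- hidden BCT parameter names and their current values (run as SYS)
> SELECT i.ksppinm AS name, v.ksppstvl AS value
>   FROM x$ksppi i, x$ksppcv v
>  WHERE i.indx = v.indx
>    AND i.ksppinm LIKE '\_bct%' ESCAPE '\';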
>
> WITH ts_info AS
>     (SELECT dbid, ts#, tsname, MAX(block_size) block_size
>        FROM dba_hist_datafile
>       GROUP BY dbid, ts#, tsname),
> -- Get the maximum snapshot id for each day from dba_hist_snapshot
> snap_info AS
>     (SELECT dbid,
>             TO_CHAR(end_interval_time, 'MON-DD-YY') dd,
>             MAX(s.snap_id) snap_id
>        FROM dba_hist_snapshot s
>       GROUP BY dbid, TO_CHAR(end_interval_time, 'MON-DD-YY'))
> SELECT s.dd,
>        s.dbid,
>        SUM(tablespace_size * f.block_size) / 1024 / 1024 / 1024 / 1024 allocated,
>        SUM(tablespace_usedsize * f.block_size) / 1024 / 1024 / 1024 / 1024 used
>   FROM dba_hist_tbspc_space_usage sp, ts_info f, snap_info s
>  WHERE s.dbid = sp.dbid
>    AND s.snap_id = sp.snap_id
>    AND sp.dbid = f.dbid
>    AND sp.tablespace_id = f.ts#
>  GROUP BY s.dd, s.dbid
>  ORDER BY TO_DATE(dd, 'MON-DD-YY')
>
> On Wed, Feb 10, 2021 at 6:37 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>>
>> *So I think it's mainly the increasing of _bct_public_dba_buffer_size
>> from ~128MB to ~256MB which helped us in eliminating the block change
>> tracking buffer space wait here. *
>>
>> I don't think I can agree with that - after all, you didn't get any
>> improvement when you increased the value from 128M to 1GB.
>> It seems more likely that the drop from 22GB to 9GB for the block change
>> tracking file was the most significant point.
>> This raises the question of how to prove, or disprove, the hypothesis;
>> but it also raises the question of WHY the file was 22GB in the first place.
>>
>> My first guess would be that at some point you were resizing data files,
>> which means the bitmaps had to grow, and either Oracle created completely
>> new bitmaps for a file or (perhaps more likely) it created a chain of
>> bitmap sections that was gradually scattered through the file. Maybe the
>> rate of growth was such that the bitmaps started wasting space; maybe you
>> also dropped/shrunk a few tablespaces/datafiles, leaving unused bitmaps in
>> the middle of the file.
>>
>> Whatever you did, it's possible that Oracle had to follow some sort of
>> linked list of pointers when it was updating the BCT file for some of the
>> updates, and the need to read the file and follow chains of pointers MIGHT
>> have been why writes were slowed down and led to waits for the buffer to
>> clear. Stopping and restarting BCT would have produced a completely clean
>> file - no chained sections - so writes could be completed on the first
>> access, allowing the buffer to clear more quickly.
>>
>> Note - as a general rule the "end" of data files tends to be the part
>> most subject to change, which means if the BCT file had lots of linked
>> lists then most of the updates to the file would have been following the
>> longest linked list.
>>
>> If the hypothesis is correct then it could be tested - but it would be
>> very tedious to set up the experiment.
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>>
>> On Mon, 8 Feb 2021 at 04:52, Lok P <loknath.73_at_gmail.com> wrote:
>>
>>> Thanks much, Jonathan.
>>>
>>> We did a few things. First we increased _bct_buffer_allocation_max to
>>> 1GB (from ~128MB) and the large pool to 2GB (from ~1GB), but we didn't
>>> notice any difference. Then we turned OFF BCT, set
>>> _bct_public_dba_buffer_size to ~256MB (from ~128MB), and turned it ON
>>> again. I now see the BCT file size has become ~9GB (it was ~22GB
>>> earlier). The "block change tracking buffer space" wait is gone, the
>>> data load queries are fine, and the database is performing as it was before.
>>>
>>> So I think it's mainly the increasing of _bct_public_dba_buffer_size
>>> from ~128MB to ~256MB which helped us in eliminating the block change
>>> tracking buffer space wait here. And, as you mentioned initially, the
>>> buffer may not have been growing proportionately as we added new
>>> datafiles to the system, so it looks to us like buggy behaviour, since we
>>> ended up manually bumping up the underscore parameter. But keeping in
>>> mind that the max size of this parameter is ~1GB, if we keep adding
>>> datafiles in the near future, might we also have to consider increasing
>>> it to ~512MB or ~1GB? Or should the current size of ~256MB take care of it?
>>>
>>>
>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Feb 12 2021 - 12:03:58 CET
