Re: Redo per transaction inconsistency when running SLOB

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Sat, 25 Apr 2020 09:43:52 +0100
Message-ID: <CAGtsp8=C9Yo2Ct176mienxZ=FPdWunjcixpW5Nb_U86855KoBg_at_mail.gmail.com>



Simone,

If the question had been comparing (e.g.) Siebel 8 and Siebel 9 on "the same" data set prior to going to a new version in production I would agree with your comments; however the question is about running SLOB, which is a bench-mark utility that should genereate a nearly identical workload if you re-run with the same control parameters. That's why we can expect very similar numbers of db block changes per transaction and similar volumes of redo per transaction.

Regards
Jonathan Lewis

On Sat, Apr 25, 2020 at 2:18 AM Simone Fumasoli <simone.fumasoli_at_gmail.com> wrote:

> I'm still not so sure to see a real problem with the I/O, given the
> description. Let me see if I understood correctly:
> The DB on the new machine is a copy of the one on the old one, but you say
> it's a Physical Standby... so are you measuring the performance based on
> the standby recovery activity, or you change it into a snapshot standby to
> execute the tests?
>
> From the awr data, I would say it's the latter, since the block changes
> are quite different, unless you're taking two different instances of a RAC
> database. Given the difference in activity, how can we actually compare the
> numbers? And given that the new machine hosts a standby database, are
> standby redo logs present on one or both? Is the old one the primary or is
> there another database altogether?
>
> Personally, I'm not convinced by the direct correlation between block
> changes and redo size: even assuming that the block size is the same on
> both db, they indicate two different things. Block changes are a simple
> counter of the changes made to a block into the SGA, while Redo Size is the
> actual size of the modified data.
>
> Now, a change on a block may actually imply a change on differently sized
> data, depending of what changed on the block, the size of the record, of
> the fields, and so on. So, that roughly the same number of changes produces
> different sized modified data, per se, doesn't mean anything necessarily
> strange, IMHO.
>
> Same goes for redo size per transaction: that's simply an average and it's
> merely an indication of the kind of job that the database has been doing in
> the period covered by the report, but that depends on the workload, not on
> the efficiency of the database. A transaction that generated 50k of redo
> size, but takes 1hr to commit, could be an issue, even if the actual work
> is little.
>
> I really don't see a real reason, unless of course I missed something, for
> thinking about issues with the I/O, rather the load is different and so are
> those statistics.
>
> You want to check what SQL is maybe generating the REDO: I would try to
> look for the SQL Ordered by Elapsed time and User I/O time, focusing on
> DMLs or procedures.
>
> I would also take a quick look at the foreground wait events, with
> particular attention to disk and log file sync waits and of course, paying
> attention to the average time.
>
> And I would look for the IOStat summaries to see if the average wait time
> for LGWR are maybe too high. But once again, I wouldn't suspect that just
> by seeing that data.
>
> To say that a DB is "slow" is not always immediate, but there are some
> metrics that are more important than others, or simply empirical experience
> ("this query/funcion/operation takes 5 seconds on site A, but it takes 10
> minutes on site B") and often it's even tricky to have all the activities
> performing equally well.
>
> To compare two systems so different is even trickier: the configuration is
> different, possibly the architecture is different (why on the old one there
> are no hugepages? Do the both use ASMM or is the old one with AMM? ) I
> expect the disks to be different... to compare the performances, they
> should at least have the exact same load and it doesn't look like that's
> the case, either.
>
> If you want, clarify how the copy was made, who's the primary, how the DB
> was opened in write and what are the exact differences between the two
> environments: machine, db configuration, log files - standby log included -
> and disks. And most of everything: what made you come to the conclusion
> that there was a problem with the I/O? I read and re-read but I couldn't
> find the evidence, or even a lead. And that's important, to be sure that
> the investigation is actually correctly addressed.
>
> Cheers,
> Simone
>
>
> Il giorno ven 24 apr 2020 alle ore 12:36 Jonathan Lewis <
> jlewisoracle_at_gmail.com> ha scritto:
>
>>
>> The first thing that stands out is that there's a significant difference
>> in the number of db block changes per commit (i.e. per transaction) -
>> 135 per commit for the fast db, 162 per commit for the slow db. If
>> there's little else going on at the same time as SLOB then that's an
>> important clue - we need to see "redo entries" as well as a further clue.
>>
>> I've written previously about anomalies that result in changes to things
>> like db block changes, redo entries and performance for the same code. Off
>> the cuff the things that spring to mind are: (a) private redo disabled -
>> so no "large" redo entries generated in private, (b) some quirky little bug
>> when auditing was enabled (c) a couple of features that change "update"
>> into "select for update/ update" (d) trigger declarations - even NULL ones
>> (e) supplemental logging
>>
>> Regards
>> Jonathan Lewis
>>
>>
>> On Fri, Apr 24, 2020 at 10:41 AM Paul Houghton <
>> Paul.Houghton_at_uis.cam.ac.uk> wrote:
>>
>>> Thanks for the responses.
>>>
>>>
>>>
>>> I have a third “identical” database on another server which is also
>>> slow, so I thought I would play with that. It also generates more redo, and
>>> redo I/O seems to be the bottleneck.
>>>
>>>
>>>
>>> I took the pfile from the “fast” database and applied it to this one
>>> (Just had to change the name, and the location of some files). Redo is
>>> still generated at twice the rate. This rules out standby as a cause of the
>>> issue. I suppose I need to investigate Jonathans suggestion to see whether
>>> SLOB is doing something different between the two servers, but it really
>>> shouldn’t because the configuration is identical. Also any workload seems
>>> slower, and potentially redo I/O bound. I will have to do that on Monday.
>>>
>>>
>>>
>>> Nothing looks strange in AWRs calculations. Redo does seem to be the
>>> limiting factor, so if I could get the slow db to generate less redo like
>>> the fast one, it would have a big impact on performance. I checked the redo
>>> block size and it is 512 on both. Investigating whether 4K is better for
>>> the SAN is another rabbit hole, but isn’t a cause of difference here.
>>>
>>>
>>>
>>> Statistic
>>>
>>> Fast DB
>>>
>>> Slow DB
>>>
>>> Db block changes
>>>
>>> 79,758,656
>>>
>>> 27,719,598
>>>
>>> redo size
>>>
>>> 32,053,006,128
>>>
>>> 34,648,068,868
>>>
>>> user commits
>>>
>>> 592,015
>>>
>>> 170,871
>>>
>>> user rollbacks
>>>
>>> 2
>>>
>>> 1
>>>
>>>
>>>
>>>
>>>
>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Sat Apr 25 2020 - 10:43:52 CEST

Original text of this message