Re: Comparing apples to apples on Exadata

From: Tanel Poder <tanel_at_tanelpoder.com>
Date: Mon, 18 Dec 2017 21:00:36 +0200
Message-ID: <CAMHX9JKD6O0Mv-4Cd_rP2OYmLsn9BX+64pagnrNwWBpGW-msRg_at_mail.gmail.com>



I just listed a few semi-educated guesses of what *could* be causing this, in addition to Jonathan's suggestions... but they are guesses.

The easiest way to start narrowing this down would be to either run Snapper or just look into V$SESSTAT metrics after running both of these queries in their databases and looking for metrics like Jonathan mentioned (table fetch continued row, %undo records applied ones and cell blocks processed% ones). If you paste the numbers here, we can help :)

--
Tanel Poder
http://blog.tanelpoder.com


On Mon, Dec 18, 2017 at 6:57 PM, Will Beldman <wbeldma_at_uwo.ca> wrote:


> Thanks for your input Tanel,
>
> So if I understand you correctly,
> 1) you are opining that the table in Database 2 is getting written to
> regularly and therefore it is triggering full block reads to guarantee
> correctness? I would have to check with the user but I don't get the
> impression that this table is written to very often.
> 2) I might try to build an identical table in Database 2 and copy the data
> in
> to see if dropping the data in in one shot will resolve any storage nuances
> 3) The table DOES have 264 columns but it is a summation across 4 of them
> (all
> of which are < 255).
>
> On Sunday December 17 2017 09:32:27 PM Tanel Poder wrote:
> > Yep - looks like I didn't pay enough attention when skimming through the
> > original email - I thought that the top 2 SQLMon reports were the good
> ones
> > and bottom 2 the bad ones. Should have read the explanation!
> >
> > Indeed the smart scan had kicked in in both cases listed in the 1st
> > section, so not much to add to Jonathan's comments ... perhaps some ideas
> > about what may cause this (minding that there's no info about the
> DB/table
> > layout differences, if there's any difference in query text/columns
> > accessed in different DBs etc):
> >
> > 1) there's transactional activity on the queried table in the slow
> > database, but not in the fast database. in the slow database the smart
> scan
> > has to fall back to regular block shipping mode for many blocks due to
> > consistent reads (and/or delayed block cleanouts) that also need to
> access
> > undo blocks. this can cause orders of magnitude performance difference
> for
> > scanning.
> > 2) the rows are very wide and in the slow database they are chained, but
> > not in the fast database (if the fast DB has bigger block size than
> smaller
> > - this assumes no HCC is used)
> > 3) the query in the fast DB accesses columns lower than column number
> 255,
> > in slow DB higher than col #255 (historically there have been a few bugs
> > with smart scans & queries that access data from columns >255)
> >
> > There are other reasons, but no need to guess, looking into V$SESSTAT
> > metrics for this query should help to narrow down the root cause
> > systematically.
> >
> > --
> > Tanel Poder
> > https://blog.tanelpoder.com
> >
> >
> > On Fri, Dec 15, 2017 at 5:29 PM, Jonathan Lewis <
> jonathan_at_jlcomp.demon.co.uk
> > > wrote:
> > >
> > >
> > > Tanel,
> > >
> > > The top set both show Cell Offload, so I don't think it's (purely) a
> smart
> > > scan thing.
> > >
> > > I'd like to see the session stats for the executions, since that will
> > > probably explain what the I/O and CPU are spent on, but from the top
> set
> > > one thing I note is that the cell physical reads are single block, and
> the
> > > slow system does 62,900 compared to 6,000 on the fast system. To me
> this
> > > suggests either a chained rows or a read-consistency/cleanout/commit
> time
> > > effect on the slow system.
> > >
> > > With repeated executions and offload disabled the slow query does 4M
> > > buffer gets (which is remarkably similar to the number of rows is that
> a
> > > coincidence, or a comment on the number of migrated rows?) and only
> 324MB
> > > read in 9,000 read requests against 3GB in 6,000 requests - that
> suggests
> > > a
> > > lot of table blocks cached on the tablescan - perhaps because so many
> of
> > > them have been pre-read and got into the hot cache area thanks to
> > > "continued row fetches" in the initial runs.
> > >
> > >
> > > Regards
> > > Jonathan Lewis
> > >
> > > ________________________________________
> > > From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on
> > > behalf of Tanel Poder <tanel_at_tanelpoder.com>
> > > Sent: 14 December 2017 23:06:37
> > > To: wbeldma_at_uwo.ca
> > > Cc: ORACLE-L
> > > Subject: Re: Comparing apples to apples on Exadata
> > >
> > > In the fast case, the direct path read (thus also smart scan) has
> kicked
> > > in. In the slow case it hasn't and regular buffered IO via buffer
> cache is
> > > used.
> > >
> > > One indicator of that is the fast query having a Cell offload column in
> > > its stats, the slow query doesn't (therefore there was no smart scan).
> > >
> > > The other indicator is that you have "cell...physical read" wait events
> > > showing up in the slow scenario. Smart scan can be so fast
> asynchronously
> > > feeding data for your query, so that in the fast case you see only CPU
> > > usage and no IO wait events at all.
>
>
-- http://www.freelists.org/webpage/oracle-l
Received on Mon Dec 18 2017 - 20:00:36 CET

Original text of this message