RE: parallel recovery slaves waiting on undo reads

From: Noveljic Nenad <nenad.noveljic_at_vontobel.com>
Date: Sat, 29 Feb 2020 18:43:47 +0000
Message-ID: <39649_1583001843_5E5AB0F3_39649_746_1_0bd79f38cc0d42439301fa389effa520_at_vontobel.com>



With regard to ‘db file parallel read’ with sync IO, there seems to be a difference between Solaris x64 and Linux. On Solaris I’m seeing a series of serial pread calls (as opposed to preadv on Linux), and the sum of their durations seems to be included in the wait event time. My observation refers to a non-multitenant 12.1 instance running on Solaris 11.4. Async IO would therefore bring a huge benefit there, provided there’s enough bandwidth to handle parallel IO calls.

Best regards,

Nenad

https://nenadnoveljic.com/blog

From: Frits Hoogland <frits.hoogland_at_gmail.com>
Sent: Saturday, 29 February 2020 12:35
To: Noveljic Nenad <nenad.noveljic_at_vontobel.com>
Cc: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>; oracle-l_at_freelists.org
Subject: Re: parallel recovery slaves waiting on undo reads

Sadly, wait event timing can be influenced by database parameters, and recently I found that multitenant changes the way the timing is done too.

Wait events typically (but not always!) time one or more system calls, so the wait event time is sometimes an indication of performance in another layer of the application stack. I often use wait events to talk to, for example, storage admins about performance. It’s therefore critical that the wait event timings correlate with the timings the admins of that other layer see, so we can work together. This is one of the most important reasons I study wait events to the level that I do: I want to understand what the timing includes, so I can explain it to, for example, the storage admin.

Last time I checked, the db file parallel read wait event timing for asynchronous IO looked like this:

1. io_submit (multiple IOs via an iocb struct, see: http://man7.org/linux/man-pages/man2/io_submit.2.html)
2. start wait event
3. io_getevents (blocking; wait for all IOs to finish)
4. end wait event

So what is timed is not the sum of the IO times, and a very small part (the submission in step 1) isn’t timed at all. If you disregard that small untimed part, this wait event shows the time of the slowest of the submitted IOs.
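The difference between "slowest IO" and "sum of all IOs" can be made concrete with a minimal model. This is only a sketch: it assumes the IOs either overlap completely or run strictly one after another, the function names are made up for illustration, and the latencies are invented numbers, not measurements.

```python
def wait_if_overlapped_us(latencies_us):
    """Wait time if all IOs are in flight together: the slowest IO decides."""
    return max(latencies_us)


def wait_if_serial_us(latencies_us):
    """Wait time if the IOs are issued one after another: the times add up."""
    return sum(latencies_us)


# Invented per-IO latencies in microseconds, with one slow outlier.
latencies_us = [400, 650, 900, 5200]

print("overlapped:", wait_if_overlapped_us(latencies_us), "us")
print("serial:    ", wait_if_serial_us(latencies_us), "us")
```

With a single slow outlier the two numbers are close; with many similar IOs the serial figure grows with the number of IOs while the overlapped figure does not.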

For synchronous IO, preadv() performs the same function, but with submission and waiting combined. All the IOs are submitted via an iovec (https://linux.die.net/man/2/preadv) using a single system call. The timing of this system call is obvious:

1. start wait event
2. preadv
3. end wait event

I can’t find a definitive source that tells me how preadv is implemented on Linux. I would assume that Linux is prepared for modern IO and does not assume it’s operating on a single disk, and therefore does not perform the different IOs serially. But as I said, I would love to be pointed to the kernel source where the vector read is performed, to validate whether it is serial or parallel. So for preadv, I hope the time is that of the slowest IO, but it could be the sum of all individual IO times (serial).
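The "start, preadv, end" timing can be tried outside the database with a small sketch. It uses Python's os.preadv (Python 3.7+, on platforms that provide preadv); the names timed_preadv and demo and the scratch file are made up for illustration, and the sketch says nothing about how Oracle itself calls preadv. One thing the man page does state: preadv takes a single file offset and fills the buffers in order from the contiguous range starting there.

```python
import os
import tempfile
import time


def timed_preadv(path, offset, sizes):
    """Read len(sizes) buffers with one preadv call and time only that call."""
    buffers = [bytearray(n) for n in sizes]
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()             # 1. "start wait event"
        nread = os.preadv(fd, buffers, offset)  # 2. preadv: one system call
        elapsed = time.perf_counter() - start   # 3. "end wait event"
    finally:
        os.close(fd)
    return nread, elapsed, buffers


def demo():
    """Hypothetical usage: write 64 known bytes, read 3 buffers from offset 8."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(bytes(range(64)))
        path = f.name
    try:
        return timed_preadv(path, 8, [4, 4, 8])
    finally:
        os.unlink(path)
```

The elapsed time covers the whole call, so from the outside it cannot show whether the kernel serviced the pieces serially or in parallel; that still needs the kernel source or tracing below the system call.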

I recently studied log file parallel write (again) for a conference in Poland. Much to my surprise, I found that the log file parallel write timing was done in the following way:

1. io_submit
2. io_getevents (non-blocking; if all IOs are found goto 6)
3. start wait event
4. io_getevents (blocking)
5. end wait event
6. done

In other words: if the IO subsystem is fast enough, the wait event does not occur at all. This is consistent with what I found years ago with Oracle’s asynchronous direct path read implementation.

However, this was with multi-tenancy turned on. With it turned off, the timing became:

1. start wait event
2. io_submit
3. io_getevents
4. end wait event

I am surprised that multi-tenancy causes such a large change in the timing implementation. Of course the wait event timing (the latency) does not change that much, and both variants essentially give the time of the longest-running IO. In the light of new technologies like persistent writable memory (“pmem”) I can see this making sense: if the IO is nearly instantaneous, assume it is, and only start the time accounting (alias wait event) if it turns out it isn’t.
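The two instrumentation styles for log file parallel write can be compared with a minimal model. This is a sketch under stated assumptions: all IOs start at submission and run in parallel, and the parameters poll_delay_us (time between io_submit and the first non-blocking io_getevents) and submit_cost_us (time spent in io_submit) are hypothetical knobs, not values taken from Oracle.

```python
from typing import Optional, Sequence


def wait_event_conditional(latencies_us: Sequence[int],
                           poll_delay_us: int) -> Optional[int]:
    """Multitenant-style timing: poll first, start the wait event only if needed.

    Returns None when every IO was already done at the non-blocking poll,
    meaning no wait event is recorded at all.
    """
    slowest = max(latencies_us)
    if slowest <= poll_delay_us:
        return None                      # fast path: steps 3 to 5 are skipped
    return slowest - poll_delay_us       # only the blocking io_getevents is timed


def wait_event_unconditional(latencies_us: Sequence[int],
                             submit_cost_us: int) -> int:
    """Non-multitenant-style timing: the wait event wraps submit and reap."""
    return submit_cost_us + max(latencies_us)


# Invented latencies in microseconds: a fast device and a slower one.
for label, lat in (("fast", [20, 30]), ("slow", [1000, 4000])):
    print(label,
          wait_event_conditional(lat, poll_delay_us=50),
          wait_event_unconditional(lat, submit_cost_us=10))
```

In this model both styles report roughly the slowest IO when the device is slow, but on a very fast device the conditional style records no wait event at all, which matches the observation above.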

Frits Hoogland

http://fritshoogland.wordpress.com frits.hoogland_at_gmail.com Mobile: +31 6 14180860




--
http://www.freelists.org/webpage/oracle-l
Received on Sat Feb 29 2020 - 19:43:47 CET
