Re: Latency of "direct path reads"

From: Frits Hoogland <frits.hoogland_at_gmail.com>
Date: Mon, 19 Aug 2013 22:44:42 +0200
Message-Id: <ECA27188-DDEF-4532-A5FA-0D55D4C67862_at_gmail.com>



In order to test the network between the two sides, use 'iperf'.
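A minimal sketch of such a test, assuming iperf is installed on both the database host and a machine on the storage side of the network (the host name 'nas-side-host', the 30-second duration and the stream count are just placeholders):

  # on the receiving side (a host close to the NAS data movers)
  iperf -s

  # on the RAC node: run for 30 seconds with 4 parallel streams,
  # a 1 MB TCP window, and a report every 5 seconds
  iperf -c nas-side-host -t 30 -P 4 -w 1M -i 5

This measures sustained TCP throughput rather than single-request latency, but it quickly shows whether one of the paths to the NAS is underperforming.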

The direct path read code path does a lot of things differently from the traditional multiblock-read code path.
Sadly, this behaviour is as good as undocumented by Oracle.

Takeaways (please mind these are the results from testing on Linux x64 with Oracle 11.2.0.1 & 11.2.0.3 with locally attached disks and ASM):
- direct path read starts off with 2 IOs in parallel
- heuristics determine whether the number of IOs in parallel scales up (and supposedly down, but I never saw that)
- if the IOs are fast enough, no wait is shown in the tracefile ("fast enough" means all IOs are already complete; see the strace sketch below)
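One way to observe this yourself (not part of the testing above, just a sketch assuming a Linux database server scanning local disks/ASM with asynchronous I/O, where <pid> is a placeholder for the Oracle server process doing the full scan) is to attach strace to that process during the scan and watch the asynchronous I/O calls:

  # trace only the AIO submit/reap calls and show time spent in each
  strace -f -T -e trace=io_submit,io_getevents -p <pid>

You typically see io_submit() issuing several I/Os and io_getevents() reaping them; the 'direct path read' wait in the 10046 trace only covers the time the process actually has to wait for outstanding I/Os. With dNFS, as in Amir's case, Oracle runs its own NFS client over TCP sockets instead of the kernel AIO interface, so the system calls you see will be socket reads and writes instead.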

Frits Hoogland

http://fritshoogland.wordpress.com
frits.hoogland_at_gmail.com
+31 6 53569942

On Aug 19, 2013, at 6:45 PM, "Hameed, Amir" <Amir.Hameed_at_xerox.com> wrote:

> Hi Folks,
> We migrated our single-instance ERP database to a four-node RAC system about a week ago. As part of this migration, we also moved from SAN to NAS. Here is a high-level architecture of the environment:
> - RDBMS and Grid versions = 11.2.0.3 with April PSU
> - Each node has two 10GbE paths to the NAS
> - Database is configured with dNFS
> - Storage vendor is EMC; the NAS Data Movers are connected to the EMC VMAX storage
> - Database files are sitting on SSD drives with a RAID-5 configuration
> - Redo log files are on RAID-10
>
> Since migrating to RAC, we are seeing high latencies on some batch jobs. Jobs that used to run in 45 minutes are now taking a few hours to run. We have a copy of production that is set up the same way as production (same disk layout, same hardware, etc.), with the only difference that non-production is a three-node RAC environment. If I take a simple statement that forces a parallel full scan of a large table and run it against production and the production-sized test environment, the average wait for "direct path read" from the 10046 trace is almost three times higher in production than in non-production. Here is the statement that I am running:
>
> select /*+ full(MMT) parallel(MMT,8) */ count(*) from apps.mtl_material_transactions MMT;
>

> Production
>
> call     count       cpu    elapsed       disk      query    current        rows
> ------- ------  -------- ---------- ---------- ---------- ----------  ----------
> Parse        1      0.00       0.00          0          0          0           0
> Execute      1      0.02       0.02          0         31          0           0
> Fetch        2    185.89     888.70    7706356    7706545          0           1
> ------- ------  -------- ---------- ---------- ---------- ----------  ----------
> total        4    185.91     888.73    7706356    7706576          0           1
>
> Elapsed times include waiting on following events:
>   Event waited on                             Times   Max. Wait  Total Waited
>   ----------------------------------------   Waited  ----------  ------------
>   reliable message                                2        0.00          0.00
>   enq: KO - fast object checkpoint                4        0.00          0.00
>   SQL*Net message to client                       2        0.00          0.00
>   Disk file operations I/O                      117        0.00          0.00
>   direct path read                            83856        3.03        465.14
>   db file sequential read                        14        0.00          0.02
>   SQL*Net message from client                     2        0.00          0.00
> **************************************************************************
>

> Non-production
>
> call     count       cpu    elapsed       disk      query    current        rows
> ------- ------  -------- ---------- ---------- ---------- ----------  ----------
> Parse        1      0.00       0.00          0          2          0           0
> Execute      1      0.03       0.03          0         31          0           0
> Fetch        2    148.36     224.13    7103122    7103268          0           1
> ------- ------  -------- ---------- ---------- ---------- ----------  ----------
> total        4    148.39     224.17    7103122    7103301          0           1
>
> Elapsed times include waiting on following events:
>   Event waited on                             Times   Max. Wait  Total Waited
>   ----------------------------------------   Waited  ----------  ------------
>   enq: KO - fast object checkpoint                6        0.00          0.00
>   reliable message                                2        0.00          0.00
>   enq: PS - contention                           52        0.00          0.01
>   SQL*Net message to client                       2        0.00          0.00
>   Disk file operations I/O                      106        0.00          0.00
>   direct path read                             3773        1.49         67.53
>   SQL*Net message from client                     2      845.16        845.16
> **************************************************************************


>
> The raw trace file shows that some IOs in production are taking over a second to complete, as shown below:
>
> WAIT #18446744071452723008: nam='direct path read' ela= 1109814 file number=1125 first dba"2080 block cnt=128 obj#r04 tim=950081082913
>
> We have EMC and Oracle engaged and are trying to diagnose the issue. Since this is dNFS and all IOs are going over the network layer, we had our network folks look at the network, and according to them all is clean there; however, I am not convinced. EMC is reporting that from the NAS side they are seeing an average service response time of 2-4 ms.
>
> My question is, is there a way to send a large packet from the RAC host side and gauge its latency? Ping can work for packets up to 32k, but because Oracle tries to do a 1M IO for direct path reads, I am trying to see if there is a tool available that would allow me to send larger packets and gauge their latency.
>
> Any help will be appreciated.
>
> Thanks,
> Amir
>
> --
> http://www.freelists.org/webpage/oracle-l
--
http://www.freelists.org/webpage/oracle-l
Received on Mon Aug 19 2013 - 22:44:42 CEST
