Re: Point at which WAN latency affects perceived app response?

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: 14 Sep 2006 17:24:58 -0700
Message-ID: <1158279898.535391.179780@e3g2000cwe.googlegroups.com>

sphealey wrote:
> Does anyone have any experience with the effect of WAN latency on
> applications accessing an Oracle database over a WAN?
>
> We are using Oracle Server 9ir2 (9.2.0.7.0) as our back end database
> for a packaged 2-tier client/server business application. The
> development tool and network APIs for the application are fairly
> obscure, but they must touch on SQL*Net eventually as the Oracle client
> is required.
>
> We have two sites. Each has an Oracle database for the portion of the
> app that serves that site's business function. But of course, people
> from A need to access the primary application for B, and people from B
> need to access the primary application for A.
>
> To this point we have served the cross-site requirements by installing
> both sets of software at both sites, and simply accessing the Oracle
> data directly across the WAN. Brutal and perhaps not elegent, but
> simple and straightforward (particularly compared to an experiment with
> Citrix).
>
> We are now in the process of replacing our T1 leased line between the
> sites with an MPLS-based network. Each primary site A and B has a 3
> Mbps MPLS port. There are a few other minor sites that connect, none
> that run the business apps.
>
> Bandwidth on the MPLS connection is good: transferring large files of
> random numbers I see close to full port thruput between the sites.
> E-mail, web, etc work fine.
>
> But when we attempted to bring up the Oracle-based apps the results
> were not so good. Perceived performance got worse by a factor of 4,
> and stopwatch timings confirm that the new configuration is slower by
> at least a factor of 3.
>
> Looking at ping times, I find the following for the two different
> connections:
>
> Packet size Leased Line MPLS
> =========== =========== =========
> 256 6 ms 48 ms
> 512 11 ms 55 ms
> 1024 17 ms 69 ms
> 2048 26 ms 84 ms
>
> Now, clearly less latency is better than more latency. Still, 60 ms
> latency between sites is considered quite good on most WANs of any size
> and I would think most Oracle users would be in that range or higher.
> Would 60 ms be sufficient to cause the application to appear to the end
> users as noticably slow?
>
> I have been trying to watch the connection with Ethereal, and one thing
> I noticed was that there seem to be a lot of fragemented packets in the
> 100-300 byte range. Other tools confirm that the non-fragmented MTU
> for the link is 1500 bytes (same as the leased line). Does it make
> sense that either the server or the router would break larger packets
> up that small?
>
> My knowledge in this area is like the Platte River (wide and shallow),
> so any input would be appreciated.
>
> sPh

I suspect that this client/server application is designed as a database platform independent application that supports a handful of different database engines, rather than just Oracle, with the authors of the application having never read chapter 1 of either of Tom Kyte's books. The client/server application is likely shooting for the least common denominator of the supported database platforms - Ex: DB platform 1 does not support inline views, DB platform 2 does not support CONNECT BY PRIOR, DB platform 3 has a problem if the application does NOT try to commit 25,000 times per second, DB platform 4 requires a slightly different SQL syntax for table joins. Instead of one SQL statement to perform an operation, which would incur one unit of latency over the WAN link, the client/server application uses 100 SQL statements, or feeds the bind variables 100 times, which then incurs 100 units of latency over the WAN link rather than one.

For example, consider the following SQL statement: SELECT

  O.WORKORDER_SUB_ID,
  O.SEQUENCE_NO,
  O.RESOURCE_ID,

DECODE(O.WORKORDER_TYPE,'M','M','Q','Q',O.STATUS) STATUS,

  O.SCHED_FINISH_DATE SCHED_FINISH_DATE,
  O.SCHED_START_DATE SCHED_START_DATE,
  O.USER_7 MRP_START_DATE,
  O.SETUP_HRS,
  O.RUN,
  O.RUN_TYPE,
  O.RUN_HRS,
  O.MOVE_HRS,
  O.MINIMUM_MOVE_QTY,
  O.CALC_END_QTY,
  O.COMPLETED_QTY,

  NVL(OPR.CONCURRENT,0) CONCURRENT,
  OB.BITS BITS
FROM
  OPERATION O,
  OPERATION_BINARY OB,
  (SELECT

    OPR.WORKORDER_TYPE,
    OPR.WORKORDER_BASE_ID,
    OPR.WORKORDER_LOT_ID,
    OPR.WORKORDER_SPLIT_ID,
    OPR.WORKORDER_SUB_ID,
    OPR.SEQUENCE_NO,

    COUNT(OPR.RESOURCE_ID) CONCURRENT
  FROM
    OPERATION_RESOURCE OPR
  WHERE
    OPR.WORKORDER_TYPE= :1

    AND OPR.WORKORDER_BASE_ID= :2
    AND OPR.WORKORDER_LOT_ID= :3
    AND OPR.WORKORDER_SPLIT_ID= :4

GROUP BY

    OPR.WORKORDER_TYPE,
    OPR.WORKORDER_BASE_ID,
    OPR.WORKORDER_LOT_ID,
    OPR.WORKORDER_SPLIT_ID,
    OPR.WORKORDER_SUB_ID,
    OPR.SEQUENCE_NO) OPR

WHERE
O.WORKORDER_TYPE= :5

  AND O.WORKORDER_BASE_ID= :6
  AND O.WORKORDER_LOT_ID= :7
  AND O.WORKORDER_SPLIT_ID= :8
  AND O.WORKORDER_TYPE=OB.WORKORDER_TYPE(+)
  AND O.WORKORDER_BASE_ID=OB.WORKORDER_BASE_ID(+)
  AND O.WORKORDER_LOT_ID=OB.WORKORDER_LOT_ID(+)
  AND O.WORKORDER_SPLIT_ID=OB.WORKORDER_SPLIT_ID(+)
  AND O.WORKORDER_SUB_ID=OB.WORKORDER_SUB_ID(+)
  AND O.SEQUENCE_NO=OB.SEQUENCE_NO(+)
  AND O.WORKORDER_TYPE=OPR.WORKORDER_TYPE(+)
  AND O.WORKORDER_BASE_ID=OPR.WORKORDER_BASE_ID(+)
  AND O.WORKORDER_LOT_ID=OPR.WORKORDER_LOT_ID(+)
  AND O.WORKORDER_SPLIT_ID=OPR.WORKORDER_SPLIT_ID(+)
  AND O.WORKORDER_SUB_ID=OPR.WORKORDER_SUB_ID(+)
  AND O.SEQUENCE_NO=OPR.SEQUENCE_NO(+)

ORDER BY
O.WORKORDER_SUB_ID,
O.SEQUENCE_NO; This SQL statement will only work on one database platform: Oracle. How many individual, database independent SQL statements would the client send to the database server to perform the same task? The logic of the above SQL statement may be further wrapped in another procedure that steps through the parent record of the "OPERATION", so this may further add to the round trips between the server and client.

Another possible problem area is that the syntax of the SQL statement may need to be slightly different from one database platform to another. Oracle requires NVL, DB platform 2 requires @NVL, DB platform 3 requires ISNULL. Depending on where the translation library is located, and whether or not the client locally caches the translation library contents after each read, the communication between the client side of the WAN link and the database side of the WAN link may incur an additional unit of latency for each SQL statement to be sent to the database server so that the translation library may be read.

Nagle and delayed ACKs: there is a document floating around the Internet that appears to be authored by Microsoft, which suggests as a performance tuning method to increase a value in the Windows registry from 2 to 13, which controls the number of packets to be sent/received before expecting to send/receive an ACK for the previous packets. If both computers are not set the same, delays of roughly 0.2 seconds (plus the latency) will be incurred while the receiving side waits for packets 3 through 13 to arrive. When the packets fail to arrive in the 0.2 second timeout period, the client sends an ACK for packets 1 and 2. How severe is this? It can cause a 134MB file, which normally transfers in roughly 3 seconds over a gigabit connection, to require 45 minutes for the transfer to complete.

I am sure that the more experienced database experts who frequent this group will be able to offer additional details.

Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc. Received on Thu Sep 14 2006 - 19:24:58 CDT