Re: TAF and FCF together ?

From: <kuassi.mensah_at_gmail.com>
Date: 25 Sep 2006 22:17:10 -0700
Message-ID: <1159247830.158659.8450@h48g2000cwc.googlegroups.com>

joeNOSPAM_at_BEA.com wrote:
> kuassi.mensah_at_gmail.com wrote:
> > Joe,
> > See elements of response inline,
> >
> > > > > 1 - That document seems to state that some queries will
> > > > > *not* survive failover, whether with a delay or not. Is this
> > > > > true?
> > > >
> > > > Yes, TAF will not always successfully restart a query in progress. It
> > > > reopens the cursors and attempts to discard rows already returned. In
> > > > order to achieve that, it computes a checksum of to-be-discarded rows
> > > > and compares that checksum against a checksum for the rows already
> > > > returned. If the checksums are different, TAF knows the discarded rows
> > > > are not the same as the rows already returned. In such a case, it will
> > > > not resume returning rows and returns an error. Checksum discrepancies
> > > > are more likely to happen with replica databases which are not
> > > > block-for-block identical.
> > >
> > > Kuassi, thank you very much. Your actual knowledge of the topic
> > > is invaluable.
> >
> > Well, I am just the spokesman of our dev teams; you have a great
> > knowledge of the issues, yourself!

> Well, thanks. Our chat will serve to educate others about TAF. Oracle is
> my favorite DBMS, I recommend it. It is only when I have problems that
> I sound critical. I just want to make explicit how untransparent
> TAF can be, especially for JDBC in a complex working environment.

> > > So, what will make DBMSes likely to be non-identical? For instance, I
> > > suspect there is a timing issue for TAF failover. If an application has
> > > paused (such as being one of many threads waiting to run in an
> > > application server) when the DBMS failure occurs, I am wondering if
> > > its connection failover and repositioning of its cursor will happen
> > > only when the application makes its next use of the connection. Is that
> > > true? If so, there may be other application instances that get going
> > > sooner after the failover, and if any of these make any changes to the
> > > data involved in the query, our first application's query will likely
> > > fail, correct? In fact, even if TAF endeavours to reposition all the
> > > application cursors immediately on failover, if the application has 200
> > > connections all being operated on independently by as many threads
> > > (maybe in several JVMs), I'm guessing that it is unavoidable that TAF
> > > may allow some of these threads to continue before others, and if any
> > > of these change the data, other threads doing a (re)query of the data
> > > will fail.
> >
> > Yes, there are some queries that will not survive failover. OCI
> > implicitly replays the query on the surviving instance using the
> > original SCN and discards the rows that were previously returned. If,
> > after failover, the same set of rows is not returned, then we throw an
> > error indicating that the query must be re-executed. With use of the
> > original SCN *and* the checksum verification on the re-fetched rows, it
> > is unlikely that OCI will erroneously re-return rows that had been
> > previously seen.
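
To make the replay-and-verify idea above concrete, here is a toy Java model of it. This is only a sketch, not OCI's actual algorithm: the class name ReplayCheck, the use of CRC32, and rows represented as plain strings are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

// Toy model of "replay, discard, verify". Not OCI code; all names are hypothetical.
final class ReplayCheck {

    // Checksum over a list of rows. CRC32 is an assumption chosen for brevity.
    static long checksum(List<String> rows) {
        CRC32 crc = new CRC32();
        for (String row : rows) {
            crc.update(row.getBytes(StandardCharsets.UTF_8));
            crc.update('\n'); // separator so ["ab","c"] differs from ["a","bc"]
        }
        return crc.getValue();
    }

    // Given the rows the client already saw and the rows produced by replaying
    // the query, return only the rows not yet delivered. Throws if the replayed
    // prefix does not match what was already returned.
    static List<String> resume(List<String> alreadyReturned, List<String> replayed) {
        int n = alreadyReturned.size();
        if (replayed.size() < n
                || checksum(replayed.subList(0, n)) != checksum(alreadyReturned)) {
            throw new IllegalStateException("replayed rows differ; re-execute the query");
        }
        return replayed.subList(n, replayed.size());
    }

    public static void main(String[] args) {
        List<String> seen = Arrays.asList("row1", "row2");
        System.out.println(resume(seen, Arrays.asList("row1", "row2", "row3")));
    }
}
```

If another session changes one of the first two rows before the replay, the checksums differ and resume() refuses to continue, which corresponds to the "query must be re-executed" error described above.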

> Ok. Can you home in for me on the timing question? For a given
> TAF-enabled connection, is the failover and reposition done when
> a user call makes OCI find that *this* connection has lost its socket
> (e.g.)? I am guessing that OCI doesn't catalog connections by the DBMS,
> so it can't/doesn't fail over all the related connections at once, but
> will do so individually as each connection incurs a problem. A quiescent
> connection will not fail over until/unless used, right?

Failover occurs when OCI detects that a particular socket has been closed. A quiescent connection will not fail over until it is used.
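
A tiny Java model of that per-connection, on-use behavior may help. It is purely illustrative: LazyFailoverModel, Conn and the instance names are hypothetical, and nothing here calls OCI or the Oracle driver.

```java
// Toy model: each connection notices its own dead socket only when it is used.
// Hypothetical names throughout; this does not touch OCI or any JDBC driver.
final class LazyFailoverModel {

    static final class Conn {
        String instance;          // instance this connection is attached to
        boolean socketOpen = true;
        int failovers = 0;

        Conn(String instance) {
            this.instance = instance;
        }

        // Stands in for any user call on the connection. A closed socket is
        // detected here, and only here, so idle connections stay as they are.
        String call(String survivingInstance) {
            if (!socketOpen) {
                instance = survivingInstance;
                socketOpen = true;
                failovers++;
            }
            return instance;
        }
    }

    public static void main(String[] args) {
        Conn busy = new Conn("inst1");
        Conn idle = new Conn("inst1");
        busy.socketOpen = false; // instance failure closes both sockets
        idle.socketOpen = false;
        busy.call("inst2");      // only the connection that is used fails over
        System.out.println(busy.failovers + " " + idle.failovers);
    }
}
```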

> > > > > 2 - If I am in a transaction during a transparent failover,
> > > > > what behavior will the Oracle driver exhibit that informs
> > > > > me to roll back my transaction?
> > > >
> > > > JDBC is just a conduit b/w the RDBMS and TAF. (i) Upon node/instance
> > > > failure, the connection is dead and all in-flight transactions are
> > > > automatically rolled-back by the RDBMS. (ii) The JDBC Connection is
> > > > failed/switched over by TAF. (iii) TAF issues an exception. (iv) TAF
> > > > requires an ACK. (v) The application rolls back the transaction (which
> > > > can be seen as an ACK to TAF) and replays the transaction.
> > >
> > > The standard JDBC objects do not return any exception (due to the
> > > failure/failover per se). It is only the callback that notifies the
> > > application, and the application must rollback/redo the transaction.
> > > If the application is multi-threaded, and the callback is being
> > > processed by an administrator thread, and the user thread is in the
> > > midst of any possible JDBC call to the Oracle driver, does the Oracle
> > > driver (its internal synchronization etc) always allow the callback
> > > thread to interrupt the user thread's JDBC operation and/or call a
> > > rollback on the connection? The reason I ask is because I have seen
> > > deadlocks on driver-internal locks when one user thread is doing its
> > > JDBC and another thread is trying to do management stuff like
> > > rollbacks, closing JDBC objects etc.
> >
> > You will most likely see an ORA-25402 error. However applications
> > should be prepared to handle errors in the 25400-25425 range and
> > roll back appropriately.
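
As a rough sketch of what "handle the 25400-25425 range and roll back" could look like on the application side: the helper below treats any SQLException whose vendor code falls in that range as a failover signal, rolls back, and replays the unit of work. The names TafRetry, TxWork, Rollback and runWithReplay are hypothetical, and I am assuming the driver reports the ORA number through SQLException.getErrorCode().

```java
import java.sql.SQLException;

// Sketch only: hypothetical helper, not part of any Oracle API.
final class TafRetry {

    // A replayable unit of transactional work.
    interface TxWork<T> {
        T run() throws SQLException;
    }

    // The rollback action, e.g. conn::rollback in real code.
    interface Rollback {
        void run() throws SQLException;
    }

    // Range taken from the advice above. Assumes getErrorCode() carries the
    // ORA error number.
    static boolean isFailoverError(SQLException e) {
        int code = e.getErrorCode();
        return code >= 25400 && code <= 25425;
    }

    // Run the work; on a failover-range error, roll back and replay, up to
    // maxAttempts attempts. Any other SQLException is rethrown untouched.
    static <T> T runWithReplay(TxWork<T> work, Rollback rollback, int maxAttempts)
            throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (!isFailoverError(e)) {
                    throw e;
                }
                rollback.run(); // the rollback doubles as the acknowledgement
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws SQLException {
        int[] attempts = {0};
        String result = runWithReplay(() -> {
            if (attempts[0]++ == 0) {
                // Simulated failover error; SQLState omitted.
                throw new SQLException("simulated failover", null, 25402);
            }
            return "committed";
        }, () -> System.out.println("rollback"), 3);
        System.out.println(result);
    }
}
```

With a real connection the call would look something like runWithReplay(() -> doTransaction(conn), conn::rollback, 3), where doTransaction is whatever the application needs to replay. Note that this does nothing about the multithreading question; it assumes one thread owns the connection for the duration of the transaction.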

> Again, can I ask you about multithreaded access to Oracle JDBC
> objects? I am suggesting that if one Java thread gets the callback
> that Connection A has failed over and its DBMS transaction has been
> rolled back, and some other Java thread is using connection A, and
> because the failover is transparent at the user level, this other
> thread is still doing JDBC, perhaps querying and/or processing a Blob etc,
> I am concerned that the admin thread may not be able to stop the
> user thread if it's in the middle of a JDBC call.

Two different threads cannot both have an active call on the same connection, so I am not sure what is meant by "stopping the user thread if it's in the middle of a JDBC call."

> Also, I assume that the autoCommit state of the session survives
> a failover, correct? I would hate if a user thread that thought it was
> operating in an autoCommit(false) mode, was suddenly doing updates
> that were the back-half of the terminated transaction, but were now
> being committed individually. However, let's assume that the session stays
> in autoCommit(false), and the application has a transactional
> architecture that is careful to avoid deadlocks by always locking datum A before
> locking datum B. If there's a failover when a JDBC thread is doing a
> transaction, and had locked datum A, the DBMS will now not have
> datum A locked. Unless we're careful, fast and lucky, can the user
> thread go on to issue its update of datum B and a second update of
> datum A, and risk a deadlock?
> I noticed that the Oracle driver does do some session altering
> during initial log-in, eg: date format and NLS settings. This may only be the
> thin driver, but because these are lost with failover, I assume that if the
> driver does the same via OCI, that it redoes it on failover.
>

> > > > > 3 - If I have a set of PreparedStatements in my application
> > > > > that I am continually re-using, will the failover ever affect
> > > > > them and/or the DBMS cursors on which they depend?
> > > >
> > > > To my knowledge, PreparedStatements and their cursors are
> > > > reopened/executed. However, if the application uses any stored
> > > > procedures (i.e., CallableStatement), then the state of those
> > > > procedures is lost after failover.
> > >
> > > Thanks. So prepared or callable statements that use stored
> > > procedures will throw exceptions, and must be closed and replaced.
> > > Prepared statements that execute plain SQL should always survive,
> > > correct?
> >
> > The surviving PreparedStatement assumes that the Statement handle's
> > bind values have not changed, but this is unlikely, as different bind
> > values may return a different result set.
> > At any rate, OCI transparently replays the execute-SQL/fetch during TAF,
> > assuming the same set of rows is returned from the server.

> Ok, but to be clear, any prepared statement that was running a
> stored procedure is dead, and must be replaced, correct?

This is correct.

> Or does OCI also transparently re-execute the procedure?

Nope.

> What about executeBatch()? Does TAF/OCI re-execute the whole batch?

No idea

> You could also maybe tell me about how much support for TAF there
> is in the Oracle Application Server?

Oracle AS integrates FCF instead, and transparently handles connection retry for CMP beans. I don't have many details on the rest of the question regarding EJBs.

> For instance, with applications
> built from generated EJBs etc, does Oracle generate EJBs that
> buffer all their input so they can automatically replay their
> transactional input? If EJB A does a query and for each row it calls
> EJB B or EJB C with the row's data etc, and these EJBs do updates based
> on permutations of the input, and after 30 loops of this and a similar
> descent from each of the other EJBs, when there's a failover, how often
> does the Oracle Application Server make it transparently OK?
>
> Joe Weinstein at BEA Systems
Received on Tue Sep 26 2006 - 00:17:10 CDT