Re: TAF and FCF together ?

From: <joeNOSPAM_at_BEA.com>
Date: 22 Sep 2006 13:37:57 -0700
Message-ID: <1158957477.315426.225120@k70g2000cwa.googlegroups.com>

kuassi.mensah_at_gmail.com wrote:
> Joe,
> See elements of response inline,
>
> > > > 1 - That document seems to state that some queries will
> > > > *not* survive failover, whether with a delay or not. Is this
> > > > true?
> > >
> > > Yes, TAF will not always successfully restart a query in progress. It
> > > reopens the cursors and attempts to discard rows already returned. In
> > > order to achieve that, it performs a checksum of to-be-discarded rows
> > > and compares that checksum against a checksum for the rows already
> > > returned. If the checksums are different, TAF knows the discarded rows
> > > are not the same as the rows already returned. In such a case, it will
> > > not resume returning rows and returns an error. Checksum discrepancies
> > > are more likely to happen with replica databases which are not
> > > block-for-block identical.
> >
> > Kuassi, thank you very much. Your actual knowledge of the topic
> > is invaluable.
>
> Well, I am just the spokesman of our dev teams; you have a great
> knowledge of the issues, yourself!

Well, thanks. Our chat will serve to educate others about TAF. Oracle is
my favorite DBMS, and I recommend it. It is only when I have problems that
I sound critical. I just want to make explicit how non-transparent TAF can
be, especially for JDBC in a complex working environment.

> > So, what will make DBMSes likely to be non-identical? For instance, I
> > suspect there is a timing issue for TAF failover. If an application has
> > paused (such as being one of many threads waiting to run in an
> > application server) when the DBMS failure occurs, I am wondering if
> > its connection failover and the repositioning of its cursor will happen
> > only when the application makes its next use of the connection. Is that
> > true? If so, there may be other application instances that get going
> > sooner after the failover, and if any of these make any changes to the
> > data involved in the query, our first application's query will likely
> > fail, correct? In fact, even if TAF endeavours to reposition all the
> > application cursors immediately on failover, if the application has 200
> > connections all being operated on independently by as many threads
> > (maybe in several JVMs), I'm guessing that it is unavoidable that TAF
> > may allow some of these threads to continue before others, and if any
> > of these change the data, other threads doing a (re)query of the data
> > will fail.
>
> Yes, there are some queries that will not survive failover. OCI
> implicitly replays the query on the surviving instance using the
> original SCN and discards the rows that were previously returned. If,
> after failover, the same set of rows is not returned, then we throw an
> error indicating that the query must be re-executed. With use of the
> original SCN *and* the checksum verification on the re-fetched rows, it
> is unlikely that OCI will erroneously re-return rows that had been
> previously seen.
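
To make the discard-and-compare step concrete, here is a minimal sketch of the idea as I understand it from the description above. It is not OCI's code: the CRC32 checksum, the in-memory row lists and the names ReplayCheck, checksum and resume are all my own assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Illustrative only: models "replay, discard rows already returned, verify
// by checksum". The real OCI checksum algorithm is not stated in this thread.
final class ReplayCheck {
    /** Checksum over the first n rows, in order. */
    public static long checksum(List<String> rows, int n) {
        CRC32 crc = new CRC32();
        for (int i = 0; i < n; i++) {
            crc.update(rows.get(i).getBytes(StandardCharsets.UTF_8));
            crc.update(0); // row separator, so ("ab","c") differs from ("a","bc")
        }
        return crc.getValue();
    }

    /**
     * Returns the rows still to be delivered after a replay, or throws if the
     * replayed prefix does not match what the client already received.
     */
    public static List<String> resume(List<String> alreadyReturned, List<String> replayed) {
        int n = alreadyReturned.size();
        if (replayed.size() < n
                || checksum(replayed, n) != checksum(alreadyReturned, n)) {
            throw new IllegalStateException("replayed rows differ; query must be re-executed");
        }
        return replayed.subList(n, replayed.size());
    }
}
```

If the surviving instance returns the same first rows, resume() hands back only the unseen remainder; if not, the caller gets an error and has to re-run the query itself.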

Ok. Can you home in for me on the timing question? For a given TAF-enabled
connection, is the failover and reposition done when a user call makes OCI
find that *this* connection has lost its socket (for example)? I am guessing
that OCI doesn't catalog connections by DBMS, so it can't/doesn't fail over
all the related connections at once, but will do so individually as each
connection incurs a problem. A quiescent connection will not fail over
until/unless used, right?

> > > > 2 - If I am in a transaction during a transparent failover,
> > > > what behavior will the Oracle driver exhibit that informs
> > > > me to roll back my transaction?
> > >
> > > JDBC is just a conduit b/w the RDBMS and TAF. (i) Upon node/instance
> > > failure, the connection is dead and all in-flight transactions are
> > > automatically rolled-back by the RDBMS. (ii) The JDBC Connection is
> > > failed/switched over by TAF. (iii) TAF issues an exception. (iv) TAF
> > > requires an ACK. (v) The application rolls back the transaction (which
> > > can be seen as an ACK to TAF) and replays the transaction.
> >
> > The standard JDBC objects do not return any exception (due to the
> > failure/failover per se). It is only the callback that notifies the
> > application, and the application must rollback/redo the transaction. If
> > the application is multi-threaded, and the callback is being processed
> > by an administrator thread, and the user thread is in the midst of any
> > possible JDBC call to the Oracle driver, does the Oracle driver (its
> > internal synchronization etc) always allow the callback thread to
> > interrupt the user thread's JDBC operation and/or call a rollback on
> > the connection? The reason I ask is because I have seen deadlocks
> > on driver-internal locks when one user thread is doing its JDBC and
> > another thread is trying to do management stuff like rollbacks, closing
> > JDBC objects etc.
>
> You will most likely see an ORA-25402 error. However applications
> should be prepared to handle errors in the 25400-25425 range and
> roll back appropriately.
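
For anyone following along, this is roughly what "be prepared to handle errors in the 25400-25425 range" could look like in application code. It is only a sketch using plain java.sql: TafRetry, TxWork and runWithReplay are hypothetical names of mine, and I am assuming the driver reports the ORA- number through SQLException.getErrorCode() and that the connection is in autoCommit(false) mode.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper, not part of any Oracle API.
final class TafRetry {
    /** One unit of transactional work that can be re-run from the start. */
    public interface TxWork<T> {
        T run(Connection conn) throws SQLException;
    }

    /** True if the vendor code is in the ORA-25400..25425 range cited above. */
    public static boolean isTafError(SQLException e) {
        int code = e.getErrorCode();
        return code >= 25400 && code <= 25425;
    }

    /**
     * Runs the work and commits. On a TAF-range error it rolls back and
     * replays the whole unit of work, up to maxAttempts times. Any other
     * SQLException is rethrown untouched for the caller to handle.
     */
    public static <T> T runWithReplay(Connection conn, TxWork<T> work, int maxAttempts)
            throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T result = work.run(conn);
                conn.commit();
                return result;
            } catch (SQLException e) {
                if (!isTafError(e)) {
                    throw e;
                }
                conn.rollback(); // the rollback the failed-over session expects
                last = e;
            }
        }
        throw last;
    }
}
```

The point of replaying the whole unit of work, and not just the failed statement, is that everything done before the failover was rolled back by the DBMS.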

Again, can I ask you about multithreaded access to Oracle JDBC objects?
Suppose one Java thread gets the callback that connection A has failed over
and its DBMS transaction has been rolled back, while some other Java thread
is using connection A. Because the failover is transparent at the user
level, this other thread is still doing JDBC, perhaps querying and/or
processing a Blob etc. I am concerned that the admin thread may not be able
to stop the user thread if it's in the middle of a JDBC call.
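
One application-side pattern that at least avoids two threads touching the same connection is to have the callback thread only raise a flag, and let the thread that owns the connection do the rollback itself. The sketch below is my own illustration (FailoverFlag and its methods are hypothetical names), and it does not solve the harder problem of interrupting a JDBC call that is already in progress.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical per-connection flag shared by the callback thread and the
// thread that owns the connection.
final class FailoverFlag {
    private final AtomicBoolean failedOver = new AtomicBoolean(false);

    /** Called from the callback/admin thread; never touches the connection. */
    public void markFailedOver() {
        failedOver.set(true);
    }

    /**
     * Called by the owning thread between JDBC calls. Returns true once per
     * failover, and clears the flag so the next check starts clean.
     */
    public boolean consume() {
        return failedOver.getAndSet(false);
    }
}
```

The owning thread would check consume() before each statement and, when it returns true, roll back and restart its transaction. Whether the driver lets that thread notice anything while it is blocked inside a call is exactly what I am asking about.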

Also, I assume that the autoCommit state of the session survives a
failover, correct? I would hate it if a user thread that thought it was
operating in autoCommit(false) mode was suddenly doing updates that were
the back half of the terminated transaction, but were now being committed
individually. However, let's assume that the session stays in
autoCommit(false), and the application has a transactional architecture
that is careful to avoid deadlocks by always locking datum A before locking
datum B. If there's a failover while a JDBC thread is doing a transaction
and has locked datum A, the DBMS will no longer have datum A locked. Unless
we're careful, fast and lucky, can the user thread go on to issue its
update of datum B and a second update of datum A, and risk a deadlock?

I noticed that the Oracle driver does do some session altering during
initial log-in, eg: date format and NLS settings. This may only be the thin
driver, but because these are lost with failover, I assume that if the
driver does the same via OCI, it redoes it on failover.

> > > > 3 - If I have a set of PreparedStatements in my application
> > > > that I am continually re-using, will the failover ever affect
> > > > them and/or the DBMS cursors on which they depend?
> > >
> > > To my knowledge, PreparedStatements and their cursors are
> > > re-opened/executed. However, if the application uses any stored
> > > procedures (i.e., CallableStatement), then the state of those
> > > procedures is lost after failover.
> >
> > Thanks. So prepared or callable statements that use stored
> > procedures will throw exceptions, and must be closed and replaced.
> > Prepared statements that execute plain SQL should always survive,
> > correct?
>
> The surviving PreparedStatement assumes that the Statement handle's
> bind values have not changed, but this is unlikely, as different binds may
> return a different result set.
> At any rate, OCI transparently replays the execute-SQL/fetch during TAF
> assuming the same set of rows is returned from the server.

Ok, but to be clear, any prepared statement that was running a stored procedure is dead, and must be replaced, correct? Or does OCI also transparently re-execute the procedure? What about executeBatch()? Does TAF/OCI re-execute the whole batch?
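
If the answer turns out to be that the batch is not replayed, the application would have to keep its own copy of the batch parameters so it can rebuild the batch after a rollback. A minimal sketch of that, assuming nothing about what TAF/OCI does (ReplayableBatch is a hypothetical helper of mine, built only on standard java.sql calls):

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: remembers every parameter row so the same batch can
// be rebuilt and re-executed after a failover and rollback.
final class ReplayableBatch {
    private final List<Object[]> rows = new ArrayList<>();

    /** Records one row of bind values, in parameter order. */
    public void add(Object... params) {
        rows.add(params.clone());
    }

    /** Rebuilds the batch from the recorded rows and executes it. */
    public int[] execute(PreparedStatement ps) throws SQLException {
        ps.clearBatch(); // drop anything left over from a failed attempt
        for (Object[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
            }
            ps.addBatch();
        }
        return ps.executeBatch();
    }
}
```

Calling execute() a second time on the same (or a replacement) PreparedStatement resubmits the identical rows, which is only safe if the first attempt was rolled back in full.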

You could also maybe tell me how much support for TAF there is in the
Oracle Application Server? For instance, with applications built from
generated EJBs etc, does Oracle generate EJBs that buffer all their input
so they can automatically replay their transactional input? Say EJB A does
a query, and for each row it calls EJB B or EJB C with the row's data etc,
and these EJBs do updates based on permutations of the input. If there's a
failover after 30 loops of this and a similar descent from each of the
other EJBs, how often does the Oracle Application Server make it
transparently OK?

Joe Weinstein at BEA Systems

Received on Fri Sep 22 2006 - 15:37:57 CDT