Re: parallel query server stays on hp for killed processes

From: Billy Verreynne <vslabs_at_onwe.co.za>
Date: 1998/02/23
Message-ID: <6cr7br$8p8$1@hermes.is.co.za>#1/1

A.E.Vervaeke wrote in message <6chlov$jvk_at_news9.noc.netcom.net>...

>Thomas Hilke wrote in message <34EC2A0B.7A44C536_at_dlh.de>...
>>When a client-session (on NT 4.0 Workstation) is killed, or the client
>>crashed, then
>>are the pq-servers on the hp still active.

>Parallel Query servers P001, P002, etc... stay around for awhile after a
>client process is killed. By setting the PARALLEL_SERVER_IDLE_TIME
parameter
>in the INIT.ORA to 0, the process should die immediately. But then, it just
>takes that much more system resource when it becomes necessary to start up
>another one. Your call.

Yes, but I have had the situation where one of the PQ slaves continues processing it's statement causing huge performamnce problems.

But first to answer the original posting. If a user front-end process terminates, Oracle will continue to run the query (PQ or not). Only when Oracle completes the query and attempts to return results, it discovers no-one on the other end of the network connection. The solution in this case is to enable DCD (dead connection detection) in the listener. The listener checks the connection as specified intervals. If it sees that the client has terminated its connection, it will inform Oracle to kill the processes for that client.

Now back to PQs that does not terminate properly.

PQ's are listed in the v$process table. When a PQ is used, Oracle creates an entry in the v$session table. So when you kill a query controller (i.e. user session running a parallel query), Oracle is suppose to kill the PQ sessions so that the PQ's used can go back to "idle mode". A couple of times on OPS 7.3.2 I had PQ slaves that were not "killed" by Oracle.

e.g. I had a create table with unrecoverable select running. I killed the user session (alter system kill session), made some changes and submitted the query again - not knowing that one of the PQ'd was not killed by Oracle. This PQ still locked the table name in the data dictionary, causing my 2nd create table to hang (I thought it was just slow). After about 20 hours I discovered this rogue PQ process. At that stage it has already done close to 500 million table rows gotten, heavens know trying to do what. Surely if it can not sync with the query controller it should terminate with an error?

The biggest problem IMHO is determining which v$session is a PQ and to which query controller it belongs to. Doing it manually it is possible to look at the PQ sql statements and other v$session related tables to determine to what query the PQ session belongs to. And it may even be ok to do it on a 10 node OPS, but it gets to be a real bitch if you're running a lot of OPS instances.

Another thing I learned the hard way is to *NEVER* use the Unix kill command to zap a SQL*Plus script. Rather go into v$session, find the session for that SQL*Plus script and use alter system kill session to terminate it.

regards,
Billy Received on Mon Feb 23 1998 - 00:00:00 CST