PL/SQL: cursors and loops, hang issue

From: Aya the Vampire Slayer <ryoko_at_gatech.rmv.this.part.edu>
Date: Tue, 21 Oct 2008 20:51:25 +0000 (UTC)
Message-ID: <gdlfc9$834$1@news-int2.gatech.edu>


I apologize in advance for being excessively wordy. :/

DB: Oracle Database 10.2.0.1.0, Enterprise Edition OS: MS Windows Server 2003, Enterprise Edition (5.2.3790), SP1 App: Visual Studio 2003 (7.1.6030) C#, ODP.Net 10.2.0.1.0

I had been tasked with fixing a pretty severe bug a few weeks ago in a couple of stored procedures in our application schema. The bug is fixed now and the procs functional, but in the process of implementing the fix I ran into an unexpected and (what I consider) strange speed/hang issue. I essentially want to know if anyone has some insight into why the proc hung based on their knowledge of how oracle handles cursors+loops+etc under the hood. Details below.

This particular database schema design has the concept of an anchor or parent table that is the container for multiple "versions" of a dataset. The 2 procedures in question take in an old version id and a new version id and each copy a different subset of that given version's data into the new "version" (e.g., selecting from and inserting into the same tables). All stored procs and tables etc are in the same SCHEMA. Also, this is a development database, and I'm the only one logged into it while this dev/testing is going on.

Headers (NOTE, names have been changed throughout): create or replace procedure PR_Copy_Groups (P_To_Version_ID in RAW,

                                            P_From_Version_ID in RAW)

create or replace procedure PR_Copy_Other_Groups (P_To_Version_ID in RAW,
                                                  P_From_Version_ID in RAW)

The "PR_Copy_Groups" copies one table's data, and the "PR_Copy_Other_Groups" copies 3 tables' deep worth of data, but the top-level table (first table) is defined exactly the same as the one table that "PR_Copy_Groups" handles. The bug was in the initial select statement only, for both procedures (the initial select is exactly the same in both procs, only the "from TABLE" is different between the two).

The old version of the procedures used "Open Cursor --> loop and insert row by row" to do the copying. Because "Other_Groups" is copying 3 tables deep worth of data, that does in fact mean it had 3 nested loops (yuck). Initially, I just changed the first select statement because that's all the bug was -- it was selecting the wrong data and then later inserting it -- and left the loop(s) alone (for now).

The select statement change for "PR_Copy_Groups" worked just fine in testing. After applying the fix to the "PR_Copy_Other_Groups" procedure and successfully compiling it, in testing the copying of "Other_Groups" started hanging and simply never finished, when it used to run/finish just fine before, even with the 3 loops and the old (incorrect) select statement. (I let it run for 30min and it never did finish)

So I tested the copying from the application by doing the following in the "PR_Copy_Other_Groups" proc:
 I commented out the entire body of the proc -- it finishes  I commented out the two inner loops -- it finishes  I commented out only the innermost loop -- it finishes  I uncommented all three loops with the new SELECT -- it does NOT finish  I go back to the old bad SELECT with all 3 loops -- it finishes

I am baffled about how that initial select can change a stored procedure that does easily finish running into one that does not at all, when the initial select is only run once, isn't really slow, and does in fact work properly (as is shown by PR_Copy_Groups and the commenting out tests). The dataset is also not very big (see below the pseudocode).

Here's an overview of how the proc initially was written, the fix that caused the problem, and the final solution, using pseudo-PL/SQL (note the PSEUDO here, to keep it from becoming unbearably long).

  • ( OLD ) ------------------- PROCEDURE SCHEMA.PR_Copy_Other_Groups ( ... ) BEGIN open Cur1 for select * from SCHEMA.TB_OTHER_GROUP where VERSION_ID = P_From_Version_ID
    ;

  loop until Cur1 done

  • IN param, out param PR_Copy_CLOB_Tbl_Entry( Old_CLOB_ID, New_CLOB_ID );

    INSERT INTO SCHEMA.TB_OTHER_GROUP ( ..., MY_CLOB_ID )     VALUES ( ..., from Cur1, New_CLOB_ID )     

    Open Cur2 for SELECT (...) from SCHEMA.TB_INNER1 [...];     loop until Cur2 done
      INSERT INTO SCHEMA.TB_INNER1 [...] VALUES (..., from Cur2);

      Open Cur3 for SELECT (...) from SCHEMA.TB_INNER2 [...];
      loop until Cur3 done
        INSERT INTO SCHEMA.TB_INNER2 [...] VALUES (..., from Cur3);
      end loop;

    end loop;
  end loop;
END;
  • ( FIX w/hang ) --------------- PROCEDURE SCHEMA.PR_Copy_Other_Groups ( ... ) BEGIN open Cur1 for select (some stuff), e1.ID ENTITY1_ID, e2.ID ENTITY2_ID, e3.ID ENTITY3_ID, e4.ID ENTITY4_ID from SCHEMA.TB_OTHER_GROUP g LEFT OUTER JOIN SCHEMA.TB_ENTITY oe1 on oe1.ID = g.ENTITY1_ID LEFT OUTER JOIN SCHEMA.TB_ENTITY e1 on e1.NAME = oe1.NAME and e1.VERSION_ID = P_To_Version_ID LEFT OUTER JOIN SCHEMA.TB_ENTITY oe2 on oe2.ID = g.ENTITY2_ID LEFT OUTER JOIN SCHEMA.TB_ENTITY e2 on e2.NAME = oe2.NAME and e2.VERSION_ID = P_To_Version_ID LEFT OUTER JOIN SCHEMA.TB_ENTITY oe3 on oe3.ID = g.ENTITY3_ID LEFT OUTER JOIN SCHEMA.TB_ENTITY e3 on e3.NAME = oe3.NAME and e3.VERSION_ID = P_To_Version_ID LEFT OUTER JOIN SCHEMA.TB_ENTITY oe4 on oe4.ID = g.ENTITY4_ID LEFT OUTER JOIN SCHEMA.TB_ENTITY e4 on e4.NAME = oe4.NAME and e4.VERSION_ID = P_To_Version_ID where g.VERSION_ID = P_From_Version_ID
    ;

  loop until Cur1 done
    [ same as above, nothing changed here ]   end loop;   

END;


  • ( FINAL FIX ) ---------------- PROCEDURE SCHEMA.PR_Copy_Other_Groups ( ... ) BEGIN INSERT INTO SCHEMA.TB_OTHER_GROUP ( ... ) SELECT ( ... ) from [ that big new select above ]
    ;

  INSERT INTO SCHEMA.TB_INNER1 ( ... )
    SELECT ( ... ) from SCHEMA.TB_INNER1     where VERSION_ID = P_From_Version_ID
;
  

  INSERT INTO SCHEMA.TB_INNER2 ( ... )
    SELECT ( ... ) from SCHEMA.TB_INNER2     whre VERSION_ID = P_From_Version_ID
;

END;



Row counts for the above tables:
 TB_GROUP: 2500
 TB_OTHER_GROUP: 1500
 TB_ENTITY: 200
 TB_INNER1: 2000
 TB_INNER2: 8000
------------------------------------------------

I can provide some actual code snippets with changed names where needed but all the code (tables + stored procs) for this whole thing is really a ton of code and and it would take hours to desensitize it all if I were to post everything relevant (the old Copy Other Groups proc is 200 lines by itself). Just ask if you'd like to see some particular snippet or xplans or init params or whatever.

Anyway, like I said earlier, I am hoping, for my own curiosity, someone here can provide some under-the-hood insight into how oracle processes cursors+loops (or whatever else) that would cause a previously-working procedure to hang just by changing an exterior (to the loops) SELECT statement that is only run once per procedure invocation.

Thanks for reading, if you managed to make it this far. :)

-- 
"Care must be exorcised when handring Opiticar System as it is apts to
be sticked by dusts and hand-fat." --Japanese Translators

"Keep your fingers off the lens." --Elton Byington, English Translator
Received on Tue Oct 21 2008 - 15:51:25 CDT

Original text of this message