Re: Query with Bind variables hangs but hard coded literals works fine

From: Chris Taylor <christopherdtaylor1994_at_gmail.com>
Date: Tue, 26 Nov 2019 09:44:55 -0500
Message-ID: <CAP79kiT_hN54nYH-mPTs3stAsWmtgfhLq15s5mByq4v0Xrgsrw_at_mail.gmail.com>





There's a lot of good discussion points here, but I think maybe you're probably looking for a means to fix this while you investigate. And you have a few options here.
This is going to be a bit long but I wanted to give you some things that can help you get this SQL squared away and performing well that ONLY affect this SQL_ID.

(As far as the bad plan, the first execution of the SQL was probably parsed with poor representative bind variables and thus the original, and still cached plan, is a poor choice for the bind variables that get passed in for the SQL_ID under review)

Things you will need:
1. SQL_ID & PLAN_HASH_VALUE of the poor performing plan 2. SQL_ID & PLAN_HASH_VALUE of the good performing plan that uses your literals

NOTE: These require access to the SQL Tuning & Diagnostics pack so if you're org didn't pay for that pack, I wouldn't recommend these.

OPTION 1: Low Hanging Fruit Option
Attached is a SQL script to run the performance SQL Tuning advisor for the SQL_ID/BAD_PLAN . Run the advisor script and examine the plan advisor output. The output of the advisor is more easy to read if you check the Advisors in OEM or some other tool. SQLPLus output for the advisor is a little bit harder to decipher.

I've attached 2 sql scripts - one is for running the advisor while the SQL can still be found in the cache. The other is to run the advisor against the SQL_ID using dba_hist views and information contained in snapshots. Use the SQL_ID that has the bad plan (not your sql_id that has literals)

The first script is called "tune_sqlid_cached.sql" - it will return an error if the SQL_ID cannot be found.
The second script is called "tune_sqlid_using_hist_snapshots.sql" and will first run a query to find the min/max snap_ids that you'll need to plug into the 2nd part of the script.

Run one or the other (probably the 2nd one to get better coverage but it will likely run longer) then examine the output.

OPTION 2: Low Hanging Fruit - GrabAPlan - coe_xfr_sql_profile.sql (Carlos or someone wrote this)
If you have a database (and perhaps even this database) that you know has the "good" plan that was previously used, run the attached script to pull out the good plan and run the generated SQL script to create a SQL Profile in the environment that forces the good plan.

I've attached coe_xfr_sql_profile.sql here. It takes two inputs. SQL_ID and PLAN_HASH_VALUE you want to extract. It creates a SQL script that you run to store the SQL profile.
NOTE: In the attached version, FORCE_MATCH=>TRUE so any variants of the SQL_ID will still use this profile.

OPTION 3: Mildly Complicated Fruit - HackAPlan - coe_xfr_sql_profile.sql This option gives you the ability to modify a SQL_IDs extracted plan and replace it with a known good plan.

Run coe_xfr_sql_profile for the SQL_ID and BAD plan_hash_value. Rename the output to BAD_PLAN.sql
Run coe_xfr_sql_profile for the SQL_ID and GOOD plan_hash_value (that used your literals). Rename the output to GOOD_PLAN.sql Copy BAD_PLAN.sql to HACKED_PLAN.sql (don't modify the original bad plan script)

Open HACKED_PLAN.SQL and GOOD_PLAN.sql in two separate editors.

In HACKED_PLAN.SQL and search for the line that says: q'[BEGIN_OUTLINE_DATA]',

Starting at that line remove it and all lines down till you get to this line:
q'[END_OUTLINE_DATA]'); and remove this line

At this point you've removed this block of code from HACKED_PLAN.sql q'[BEGIN_OUTLINE_DATA]',
q'[END_OUTLINE_DATA]');

Open up GOOD_PLAN.sql and search for:
q'[BEGIN_OUTLINE_DATA]',

COPY that whole block down to the q'[END_OUTLINE_DATA]'); to clipboard

PASTE that block into HACKED_PLAN.SQL in the section where you removed the previous OUTLINE DATA. If there any empty lines in your newly modified / pasted code in HACKED_PLAN.SQL be sure to remove those (sometimes when pasting you get an empty line before or after).

Now your HACKED_PLAN.sql might something like this for the modified section:

<snip>
DBMS_LOB.CLOSE(sql_txt);
h := SYS.SQLPROF_ATTR(
q'[BEGIN_OUTLINE_DATA]',
..... a bunch of directives
q'[END_OUTLINE_DATA]');
:signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt); :signaturef := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt, TRUE); <snip>

When you're confident that the HACKED_PLAN.SQL looks correct, scroll down to the bottom and look for:

name => 'coe_938d6tvzfdpgp_1961197729', description => 'coe 938d6tvzfdpgp 1961197729 '||:signature||' '||:signaturef||'',

Give the name & description fields something meaningful. I give mine SQL_ID & PLAN_HASH_VALUE names so I know what this profile is for.

Save HACKED_PLAN.sql and execute it into your database and it should create the sql profile with your good_plan that was generated using literals but will match on the SQL_Text of the bad SQL_ID that uses bind variables.

Run your query from the vendor app and see if it uses the newly created profile. (Applies to all the above options).

If possible do this in a test environment first but that's not always possible.

If you need to disable or drop the SQL_PROFILE its easy.

One of these:
dbms_sqltune.alter_sql_profile(name=>'Your SQL_Profile_Name', 'STATUS','DISABLED'); --- to disable
dbms_sqltune.drop_sql_profile(name=>'Your SQL_Profile_Name'); -- to drop sql_profile

Finally, I *think* there was some discussion on this list a few weeks ago about a new feature to IGNORE PLAN? I think you can now flag a plan as a "bad plan" and tell Oracle to never use it again. You could look into that as well as I'm not familiar with it yet.

Thanks,
Chris

On Tue, Nov 26, 2019 at 7:46 AM Andy Sayer <andysayer_at_gmail.com> wrote:

> There's so many differences in these plans that it's hard to pick out
> what's important, that and the plans we've got to look at may have no
> relation to the real plans means we are just stabbing in the dark. I
> suggest you get the real execution plans with row source statistics so we
> can see what's really important here.

>

> One thing to note is that
> colloq_name like '%ACTUATOR%'
> Is not the same as
> colloq_name like '%ACTUATOR% '
>

> There's a possibility that the optimizer really won't like your LIKE :bind
> filter and that seems like the most likely problem to me. The optimizer
> would be right to fear this predicate as it can't be used at all to access
> an index (it can only be used to filter down what you've already read from
> the index as it starts with a wildcard). You can lie to the optimizer with
> the _like_with_bind_as_equality hidden parameter, but because you are
> starting with a wildcard, this should be treated the same as index cost
> adj = 1.
>

> Your first explain plan manages to use the index MSF12001I1 when it does
> this colloq_name filter, this is actually a little more clever than it
> immediately seems - it's doing two different scans with the colloq_name
> filter starting with two different points, the scan is on at least the
> third key of the index. One thing that has changed between 12.1.0.2 and
> 18.8 (well, 12.2.0.1 is when the change was made) is how the optimizer
> decides to do this - have a read of
> https://jonathanlewis.wordpress.com/2018/11/15/num_index_keys/ . This is
> something you can hint at, but I know that's not on the menu.
>

> You should be easily able to put together SQL Plan Baseline to force the
> SQL to execute using your desired plan, but you'll have to note that index
> filters are not included in the baseline - Oracle might decide to
> effectively do a full scan of the index rather than the two accesses.
>

> This is why I'd suggest going for a SQL Patch - you can create them
> straight from dbms_Sqldiag now with dbms_sqldiag.create_sql_patch passing
> either your sql_id or the sql_text.
>

> With that said, the first plan you shared with an index fast full scan and
> a full table scan hash joined together to produce an estimated 1 row seems
> quite unexpected. From what you've said, it sounds like joining MSF100
> and MSF170 shouldn't cut down many rows - I guess you've got some wildly
> different low and high value stats for stock_code in these tables? The
> second plan changes how it does this join into a nested loop plan which
> visits MSF100 twice, this could be particularly bad. I suggest you
> investigate why the optimizer thinks this join is such a strong filter.
>

> Hope this gets you started,
> Andy
>
>
>
>

> On Tue, 26 Nov 2019 at 10:12, Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
> wrote:
>
>>
>> One other point I forgot to mention.
>>
>> It's a little unusual to see "rownum < :bindvariable". Do the bind
>> variables take the form  :SYS_B_nnnn ?  This would indicate that they have
>> been created by Oracle in response to the cursor_sharing parameter being
>> set to something other than EXACT.
>>
>> Regards
>> Jonathan Lewis
>>
>>
>> ________________________________________
>> From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on
>> behalf of Steve Wales (AddOns) <steve.wales_at_addonsinc.com>
>> Sent: 26 November 2019 01:23
>> To: oracle-l_at_freelists.org
>> Subject: Query with Bind variables hangs but hard coded literals works
>> fine
>>
>> Sorry for the length of this up front.   Trying to get as much relevant
>> info into the initial post as people might need to point me where I’m
>> getting the problem
>>
>> I have an Oracle 18c (18.8 if it matters) Standard Edition 2 database on
>> Linux 7..
>>
>> There’s a query from the ERP system that’s hanging the online screen and
>> getting http timeouts because it’s not completing in a timely fashion.
>>
>> I pulled the text of the query from v$sqltext and the bind variables from
>> v$sql_bind_capture.
>>
>> The query is searching a parts catalog for colloquial names for parts.
>>
>> If I take the query, replace the bind variables from the query with
>> string values and run it through SQLPLUS, it runs in about 0.7 seconds and
>> returns the expected values.
>>
>> If I instead leave the bind variables in the query, and in SQLPLUS define
>> variables and then assign values via exec :v1 := ‘SOMEVALUE’, then it sits
>> and spins 57 MINUTES before returning the exact same result set.
>>
>> The execution plans change slightly but significantly as well.
>>
>> The tables:
>> MSF100 is a stock catalog table.  Contains all stock items that the
>> business catalogs across all properties
>> MSF170 is a table that busts up the global catalog among assorted
>> business units
>> MSF120 contains the colloquial names.  If I search on “ACTUATOR” it can
>> return me all the stock codes that have “ACTUATOR” as a colloquial name.
>> I assume rownum <= 20 is specified because that’s how many rows fit on a
>> screen at a time.
>>
>> select * from( select * from MSF100 CATALOG where (CATALOG.stock_code in
>> (select MSF170Rec.stock_code from MSF170 MSF170Rec, MSF100 MSF100Rec
>> where (MSF100Rec.stock_code = MSF170Rec.stock_code))) and ( exists
>> (select 1 from MSF120 MSF120Rec where (MSF120Rec.colloq_code =
>> CATALOG.stock_code
>> and MSF120Rec.colloq_name like '%ACTUATOR%'  and MSF120Rec.colloq_type =
>> 'S' )) or  exists (select 1 from MSF120 MSF120Rec
>> where (MSF120Rec.colloq_code = CATALOG.template_id and
>> MSF120Rec.colloq_name like '%ACTUATOR%'  and MSF120Rec.colloq_type = 'I' )))
>> and (CATALOG.stock_status <> 'X' ) order by CATALOG.stock_code ) where
>> ROWNUM <= 20;
>>
>> When using the static values (and I apologize in advance for what
>> proportional fonts do the formatting below)
>>
>> (Estimated plans generated just by doing good old “explain plan for”)
>>
>>
>> ---------------------------------------------------------------------------------------------------
>> | Id  | Operation                    | Name       | Rows  | Bytes
>> |TempSpc| Cost (%CPU)| Time     |
>>
>> ---------------------------------------------------------------------------------------------------
>> |   0 | SELECT STATEMENT             |            |     1 |  3243 |
>>  |  5756   (1)| 00:00:01 |
>> |*  1 |  COUNT STOPKEY               |            |       |       |
>>  |            |          |
>> |   2 |   VIEW                       |            |     1 |  3243 |
>>  |  5756   (1)| 00:00:01 |
>> |*  3 |    SORT ORDER BY STOPKEY     |            |     1 |  3243 |
>>  |  5756   (1)| 00:00:01 |
>> |   4 |     VIEW                     | VM_NWVW_2  |     1 |  3243 |
>>  |  5756   (1)| 00:00:01 |
>> |   5 |      HASH UNIQUE             |            |     1 |   785 |
>>  |  5756   (1)| 00:00:01 |
>> |*  6 |       FILTER                 |            |       |       |
>>  |            |          |
>> |   7 |        NESTED LOOPS SEMI     |            |     1 |   785 |
>>  |  5751   (1)| 00:00:01 |
>> |*  8 |         HASH JOIN            |            |     1 |   775 |
>> 5632K|  5751   (1)| 00:00:01 |
>> |   9 |          INDEX FAST FULL SCAN| MSF170_PK  |   262K|  2559K|
>>  |   265   (1)| 00:00:01 |
>> |* 10 |          TABLE ACCESS FULL   | MSF100     | 74750 |    54M|
>>  |  2466   (1)| 00:00:01 |
>> |* 11 |         INDEX UNIQUE SCAN    | MSF100_PK  | 83735 |   817K|
>>  |     0   (0)| 00:00:01 |
>> |  12 |        INLIST ITERATOR       |            |       |       |
>>  |            |          |
>> |* 13 |         INDEX RANGE SCAN     | MSF12001I1 |     1 |    49 |
>>  |     4   (0)| 00:00:01 |
>>
>> ---------------------------------------------------------------------------------------------------
>>
>> Predicate Information (identified by operation id):
>> ---------------------------------------------------
>>
>>    1 - filter(ROWNUM<=20)
>>    3 - filter(ROWNUM<=20)
>>    6 - filter( EXISTS (SELECT 0 FROM "MSF120" "MSF120REC" WHERE
>>               ("MSF120REC"."COLLOQ_CODE"=:B1 AND
>> "MSF120REC"."COLLOQ_TYPE"='S' OR
>>               "MSF120REC"."COLLOQ_CODE"=:B2 AND
>> "MSF120REC"."COLLOQ_TYPE"='I') AND
>>               "MSF120REC"."COLLOQ_NAME" LIKE '%ACTUATOR%'))
>>    8 - access("CATALOG"."STOCK_CODE"="MSF170REC"."STOCK_CODE")
>>   10 - filter("CATALOG"."STOCK_STATUS"<>'X')
>>   11 - access("MSF100REC"."STOCK_CODE"="MSF170REC"."STOCK_CODE")
>>   13 - access(("MSF120REC"."COLLOQ_CODE"=:B1 AND
>> "MSF120REC"."COLLOQ_TYPE"='S' OR
>>               "MSF120REC"."COLLOQ_CODE"=:B2 AND
>> "MSF120REC"."COLLOQ_TYPE"='I'))
>>        filter("MSF120REC"."COLLOQ_NAME" LIKE '%ACTUATOR%')
>>
>>
>> When using bind variables (Yes, I know that the columns are all defined
>> as CHAR instead of VARCHAR2.  No, I don’t know why the vendor of this
>> application insists on presenting their database that way):
>>
>> variable v1 char(24 char)
>> variable v2 char(1 char)
>> variable v3 char(24 char)
>> variable v4 char(1 char)
>> variable v5 char(1 CHAR)
>> variable v6 number
>>
>> exec :v1 := '%ACTUATOR%          ';
>> exec :v2 := 'S';
>> exec :v3 := '%ACTUATOR%          ';
>> exec :v4 := 'I';
>> exec :v5 := 'X';
>> exec :v6 := 20;
>>
>> select * from( select * from MSF100 CATALOG where (CATALOG.stock_code in
>> (select MSF170Rec.stock_code from MSF170 MSF170Rec, MSF100 MSF100Rec
>> where (MSF100Rec.stock_code = MSF170Rec.stock_code))) and ( exists
>> (select 1 from MSF120 MSF120Rec where (MSF120Rec.colloq_code =
>> CATALOG.stock_code
>> and MSF120Rec.colloq_name like :v1  and MSF120Rec.colloq_type = :v2 ))
>> or  exists (select 1 from MSF120 MSF120Rec
>> where (MSF120Rec.colloq_code = CATALOG.template_id and
>> MSF120Rec.colloq_name like :v3  and MSF120Rec.colloq_type = :v4 )))
>> and (CATALOG.stock_status <> :v5 ) order by CATALOG.stock_code ) where
>> ROWNUM <= :v6;
>>
>>
>> ------------------------------------------------------------------------------------------
>> | Id  | Operation                   | Name       | Rows  | Bytes | Cost
>> (%CPU)| Time     |
>>
>> ------------------------------------------------------------------------------------------
>> |   0 | SELECT STATEMENT            |            |     1 |  3243 |
>>  192K  (1)| 00:00:08 |
>> |*  1 |  COUNT STOPKEY              |            |       |       |
>>     |          |
>> |   2 |   VIEW                      |            |     1 |  3243 |
>>  192K  (1)| 00:00:08 |
>> |*  3 |    SORT ORDER BY STOPKEY    |            |     1 |   755 |
>>  192K  (1)| 00:00:08 |
>> |*  4 |     FILTER                  |            |       |       |
>>     |          |
>> |   5 |      NESTED LOOPS SEMI      |            |     3 |  2265 |
>>  192K  (1)| 00:00:08 |
>> |*  6 |       TABLE ACCESS FULL     | MSF100     | 63173 |    45M|  2466
>>  (1)| 00:00:01 |
>> |   7 |       VIEW PUSHED PREDICATE | VW_NSO_1   |    12 |    24 |     3
>>  (0)| 00:00:01 |
>> |   8 |        NESTED LOOPS         |            |   262K|  5120K|     3
>>  (0)| 00:00:01 |
>> |*  9 |         INDEX UNIQUE SCAN   | MSF100_PK  |     1 |    10 |     1
>>  (0)| 00:00:01 |
>> |* 10 |         INDEX RANGE SCAN    | MSF17001I1 |   262K|  2560K|     2
>>  (0)| 00:00:01 |
>> |* 11 |      INDEX FAST FULL SCAN   | MSF120_PK  |     1 |    49 |   476
>>  (1)| 00:00:01 |
>>
>> ------------------------------------------------------------------------------------------
>>
>> Predicate Information (identified by operation id):
>> ---------------------------------------------------
>>
>>    1 - filter(ROWNUM<=TO_NUMBER(:V6))
>>    3 - filter(ROWNUM<=TO_NUMBER(:V6))
>>    4 - filter( EXISTS (SELECT 0 FROM "MSF120" "MSF120REC" WHERE
>>               "MSF120REC"."COLLOQ_TYPE"=:V2 AND "MSF120REC"."COLLOQ_NAME"
>> LIKE :V1 AND
>>               "MSF120REC"."COLLOQ_CODE"=:B1 OR
>> "MSF120REC"."COLLOQ_TYPE"=:V4 AND
>>               "MSF120REC"."COLLOQ_NAME" LIKE :V3 AND
>> "MSF120REC"."COLLOQ_CODE"=:B2))
>>    6 - filter("CATALOG"."STOCK_STATUS"<>:V5)
>>    9 - access("MSF100REC"."STOCK_CODE"="CATALOG"."STOCK_CODE")
>>   10 - access("MSF170REC"."STOCK_CODE"="CATALOG"."STOCK_CODE")
>>        filter("MSF100REC"."STOCK_CODE"="MSF170REC"."STOCK_CODE")
>>   11 - filter("MSF120REC"."COLLOQ_TYPE"=:V2 AND "MSF120REC"."COLLOQ_NAME"
>> LIKE
>>               :V1 AND "MSF120REC"."COLLOQ_CODE"=:B1 OR
>> "MSF120REC"."COLLOQ_TYPE"=:V4 AND
>>               "MSF120REC"."COLLOQ_NAME" LIKE :V3 AND
>> "MSF120REC"."COLLOQ_CODE"=:B2)
>>
>> The CPU cost on this second query is orders of magnitude greater than the
>> other way.  I can only assume that this is why my CPU on the server goes
>> into overdrive as it sits and thinks forever before the http server throws
>> it’s hands up and gets sick of waiting.
>>
>> This database was just upgraded to 18.8 from 12.1.0.2 over the weekend.
>> (Vendor hasn’t certified the product for 19c yet, hence the upgrade to 18c).
>> This was not a problem with 12.1.0.2 SE2.
>>
>> One last thing, since this is an ERP package, I can’t change the code /
>> insert hints / build new indexes (well I suppose I could do that but the
>> next release upgrade would blow it away anyway).   Any solution really has
>> to be about a config / parameter change (but if an index would work, I
>> suppose I’d be willing to try it as a work around for the time being).
>>
>> Thanks in advance for any pointers.  I’m hoping that there’s an init
>> parameter or something that was new in the 12.2 engine from 12.1 that might
>> be causing the problem.
>>
>> Steve
>>
>>
>> Disclaimer
>>
>> The information contained in this communication from the sender is
>> confidential. It is intended solely for use by the recipient and others
>> authorized to receive it. If you are not the recipient, you are hereby
>> notified that any disclosure, copying, distribution or taking action in
>> relation of the contents of this information is strictly prohibited and may
>> be unlawful.
>>
>> This email has been scanned for viruses and malware, and may have been
>> automatically archived.
>> --
>> http://www.freelists.org/webpage/oracle-l
>>
>>
>>







--
http://www.freelists.org/webpage/oracle-l


Received on Tue Nov 26 2019 - 15:44:55 CET

Original text of this message