Re: How does the FILTER operation work?

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Thu, 24 May 2018 12:32:34 +0000
Message-ID: <LO1P123MB09775B33411D69FBD109EFE3A56A0_at_LO1P123MB0977.GBRP123.PROD.OUTLOOK.COM>


The "not exists" subquery is correlated - it may have to run once for each row in photo, which is why the final cost of the query is roughly 21,211 * 380. (I don't know why it's not a lot closer, but there are indications that the optimizer has a "self-caching" adjustment in the calculation in recent versions of Oracle.)

It looks as if you have a uniqueness constraint on emplid since your "distinct" hasn't shown up as a "/hash distinct" at any point in the plan.

The speed of execution may also be affected by the fact that your tablescan has to scan a table which is growing because of the insert, so you may be applying a lot of undo as you go so that you don't see the changes you have previously made in the insert.

I can't explain the difference in the cost of the tablescan; is there anything in the slightest bit "non-standard" about the table ? (e.g. includes a LOB, nested-table, has more than 255 columns).

For reference here's an old article I wrote about the effect of filter subqueries - I don't think the workaround will help you though (that's assuming my comment about uniqueness is correct). You might want try forcing the optimizer to unnest the subquery (using the /*+ unnest */ hint in the subquery), but the table definitions may make it illegal.

Regards
Jonathan Lewis



From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on behalf of Paul Houghton <Paul.Houghton_at_uis.cam.ac.uk> Sent: 24 May 2018 13:06
To: Oracle Mailinglist
Subject: How does the FILTER operation work?

Oracle 12.1.0.2. Linux x86_64.

I was asked why the following SQL was running for so long (~6 minutes). I can see a number of problems with it, and suggested that a way to tune it would be only to select the required data. For my own interest I want to understand why the cost of the insert is so high compared to the cost of the table scans.

If I remove the "not exists" (see below), the access of the temp table disappears (as you would expect) and the insert cost goes right down, so it must be the filter that is the issue. However, I don't understand why it is so expensive. I can't find documentation about what it is doing. Surely it has to compare 21211 rows against the 1 row (Actually 0, but I understand the optimizer never says 0) returned by the not exists sub query. I would have thought this means that the cost should be more like the cost of accessing the data in the first place? Clearly the optimizer is correct as the query does take a long time to run.

Also, why is the cost of the full table scan of the photo table so different between the two queries?

The queries are below. I edited the table names, (There are two, photo and temp) and the id, hopefully consistently.

Photo.photo is a blob. Emplid and emplid_search are varchar2(11). Oprid is varchar2(30). TEMP has no indexes. Photo has an index on emplid.

Thanks in advance for your insights.

  INSERT INTO TEMP
  SELECT DISTINCT 'id',
    PHT.emplid
  FROM PHOTO PHT
  WHERE 1 =1
  AND PHT.PHOTO IS NOT NULL
  AND NOT EXISTS
    (SELECT 'x'
    FROM TEMP TMP
    WHERE PHT.EMPLID=TMP.EMPLID_SRCH
    AND TMP.OPRID = 'id'
    )
  ;


  | Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |

-------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 21210 | 3334K| 5802K (2)| 00:03:47 | | 1 | LOAD TABLE CONVENTIONAL | TEMP | | | | | |* 2 | FILTER | | | | | | |* 3 | TABLE ACCESS FULL | PHOTO | 21211 | 3334K| 313 (1)| 00:00:01 | |* 4 | TABLE ACCESS FULL | TEMP | 1 | 17 | 380 (2)| 00:00:01 |
-------------------------------------------------------------------------------------

  Predicate Information (identified by operation id):


   2 - filter( NOT EXISTS (SELECT 0 FROM " TEMP" "TMP" WHERE

              "TMP"."EMPLID_SRCH"=:B1 AND "TMP"."OPRID"='id'))
   3 - filter("PHT"."PHOTO" IS NOT NULL)
   4 - filter("TMP"."EMPLID_SRCH"=:B1 AND "TMP"."OPRID"='id')

...

  INSERT INTO TEMP
  SELECT DISTINCT 'id',
    PHT.emplid
  FROM PHOTO PHT
  WHERE 1 =1
  AND PHT.UC_PBI_PHOTO IS NOT NULL;


  | Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |

-------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 21211 | 3334K| 3659 (1)| 00:00:01 | | 1 | LOAD TABLE CONVENTIONAL | TEMP | | | | | |* 2 | TABLE ACCESS FULL | PHOTO | 21211 | 3334K| 3659 (1)| 00:00:01 |
-------------------------------------------------------------------------------------

  Predicate Information (identified by operation id):


     2 - filter("PHT"."PHOTO" IS NOT NULL)

--
http://www.freelists.org/webpage/oracle-l


--
http://www.freelists.org/webpage/oracle-l
Received on Thu May 24 2018 - 14:32:34 CEST

Original text of this message