Re: How do you like this SQL?

From: Mladen Gogala <gogala.mladen_at_gmail.com>
Date: Sat, 15 Dec 2012 19:57:11 +0000 (UTC)
Message-ID: <pan.2012.12.15.19.57.12_at_gmail.com>



On Sat, 15 Dec 2012 12:19:44 +0000, Jonathan Lewis wrote:

> It's one of the standard methods for "find the most recent occurrence"

This can be generalized to finding the most numerous item as well, which is also frequently needed in applications. And this generic method also inevitably leads to performance problems once the data grows past a threshold. The problem with the method is the self-join, which needs to scan the table twice.
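For reference, the self-join version I mean would be something like this sketch (column names taken from the rewrite below):

SELECT c.PO_ID, c.LINE_ITEM_PO, c.COMMENT_PO, c.DATE_ADD_COMMENT
  FROM PO_COMMENT c,
       (SELECT PO_ID, LINE_ITEM_PO, MAX(ROW_TIMESTAMP) AS MAX_TS
          FROM PO_COMMENT
         WHERE TYPE_PO_COMMENT = 'LE'
         GROUP BY PO_ID, LINE_ITEM_PO) m
 WHERE c.TYPE_PO_COMMENT = 'LE'
   AND c.PO_ID           = m.PO_ID
   AND c.LINE_ITEM_PO    = m.LINE_ITEM_PO
   AND c.ROW_TIMESTAMP   = m.MAX_TS;

Both references to PO_COMMENT have to be scanned. The first stab at solving the problem would, in my opinion, look something like this: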

WITH INNER AS (
     SELECT PO_ID, LINE_ITEM_PO, COMMENT_PO, DATE_ADD_COMMENT,
            ROW_TIMESTAMP,
            MAX(ROW_TIMESTAMP) OVER
                (PARTITION BY PO_ID, LINE_ITEM_PO) AS REC
       FROM PO_COMMENT
      WHERE TYPE_PO_COMMENT = 'LE')
SELECT PO_ID, LINE_ITEM_PO, COMMENT_PO, DATE_ADD_COMMENT
  FROM INNER
 WHERE ROW_TIMESTAMP = REC;

Generally speaking, this goes through the table only once, which makes it faster. Nevertheless, this too will hit the limit when the table outgrows the threshold.

The trick that would solve the problem with the query is to create a new table, say PO_LATEST_COMMENT, containing the columns

PO_ID, LINE_ITEM_PO, TYPE_PO_COMMENT and ROW_TIMESTAMP

The table would be maintained by an insert/update trigger on the PO_COMMENT table, so that it always holds the most recent timestamp.
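A minimal sketch of what that could look like; the column types, the trigger name and the MERGE approach are my assumptions, not something from the original schema:

CREATE TABLE PO_LATEST_COMMENT (
    PO_ID           NUMBER        NOT NULL,  -- types assumed
    LINE_ITEM_PO    NUMBER        NOT NULL,
    TYPE_PO_COMMENT VARCHAR2(10)  NOT NULL,
    ROW_TIMESTAMP   TIMESTAMP     NOT NULL,
    CONSTRAINT PO_LATEST_COMMENT_PK
        PRIMARY KEY (PO_ID, LINE_ITEM_PO, TYPE_PO_COMMENT)
);

CREATE OR REPLACE TRIGGER PO_COMMENT_LATEST_TRG
AFTER INSERT OR UPDATE ON PO_COMMENT
FOR EACH ROW
BEGIN
    -- Keep the most recent timestamp per (PO_ID, LINE_ITEM_PO, type)
    MERGE INTO PO_LATEST_COMMENT l
    USING (SELECT :NEW.PO_ID           AS PO_ID,
                  :NEW.LINE_ITEM_PO    AS LINE_ITEM_PO,
                  :NEW.TYPE_PO_COMMENT AS TYPE_PO_COMMENT,
                  :NEW.ROW_TIMESTAMP   AS ROW_TIMESTAMP
             FROM DUAL) n
       ON (    l.PO_ID           = n.PO_ID
           AND l.LINE_ITEM_PO    = n.LINE_ITEM_PO
           AND l.TYPE_PO_COMMENT = n.TYPE_PO_COMMENT)
    WHEN MATCHED THEN UPDATE
         SET l.ROW_TIMESTAMP = n.ROW_TIMESTAMP
       WHERE n.ROW_TIMESTAMP > l.ROW_TIMESTAMP
    WHEN NOT MATCHED THEN
         INSERT (PO_ID, LINE_ITEM_PO, TYPE_PO_COMMENT, ROW_TIMESTAMP)
         VALUES (n.PO_ID, n.LINE_ITEM_PO, n.TYPE_PO_COMMENT,
                 n.ROW_TIMESTAMP);
END;
/

The report then joins PO_LATEST_COMMENT back to PO_COMMENT on the three key columns plus the timestamp, touching only one comment row per purchase order line.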

Of course, this is a report, and it looks like a rather big one to me, so the company is probably flexible about the schedule. I have seen such things several times, and while the trigger-based solution seems to be the easiest one to implement, the report can also simply be scheduled to run overnight, or made into a materialized view which is accessed during working hours and refreshed nightly.
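If the materialized view route is taken, a complete nightly refresh could be set up roughly as follows; the MV name and the 02:00 refresh time are made up for the example:

CREATE MATERIALIZED VIEW PO_LATEST_COMMENT_MV
BUILD IMMEDIATE
REFRESH COMPLETE
START WITH TRUNC(SYSDATE) + 1 + 2/24   -- first refresh tomorrow at 02:00
NEXT  TRUNC(SYSDATE) + 1 + 2/24        -- then again every night at 02:00
AS
SELECT PO_ID, LINE_ITEM_PO, COMMENT_PO, DATE_ADD_COMMENT
  FROM (SELECT PO_ID, LINE_ITEM_PO, COMMENT_PO, DATE_ADD_COMMENT,
               ROW_TIMESTAMP,
               MAX(ROW_TIMESTAMP) OVER
                   (PARTITION BY PO_ID, LINE_ITEM_PO) AS REC
          FROM PO_COMMENT
         WHERE TYPE_PO_COMMENT = 'LE')
 WHERE ROW_TIMESTAMP = REC;

Since the refresh is COMPLETE, the analytic query is allowed as is; the nightly scan happens off hours and the daytime report reads the precomputed result.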
Of course, there is a price to pay for triggers, too: no direct load. That can sometimes be a problem, and the solution is to partition the big table and write specialized load scripts that do multithreaded bulk loads, one per partition. I have never run across a situation in which such customized load scripts were not fast enough.
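A rough sketch of such a load, assuming PO_COMMENT is range partitioned by ROW_TIMESTAMP; the partition name, the staging table PO_COMMENT_STAGE and the trigger name are all inventions for the example:

-- Direct path falls back to conventional insert while a trigger is
-- enabled, so disable it for the load window:
ALTER TRIGGER PO_COMMENT_LATEST_TRG DISABLE;

-- One concurrent session per partition; the PARTITION clause locks
-- only the named partition, so the direct-path inserts don't serialize:
INSERT /*+ APPEND */ INTO PO_COMMENT PARTITION (P_2012_12)
SELECT *
  FROM PO_COMMENT_STAGE
 WHERE ROW_TIMESTAMP >= DATE '2012-12-01'
   AND ROW_TIMESTAMP <  DATE '2013-01-01';
COMMIT;

-- Once every partition is loaded, rebuild the summary table and
-- re-enable the trigger:
DELETE FROM PO_LATEST_COMMENT;

INSERT INTO PO_LATEST_COMMENT
       (PO_ID, LINE_ITEM_PO, TYPE_PO_COMMENT, ROW_TIMESTAMP)
SELECT PO_ID, LINE_ITEM_PO, TYPE_PO_COMMENT, MAX(ROW_TIMESTAMP)
  FROM PO_COMMENT
 GROUP BY PO_ID, LINE_ITEM_PO, TYPE_PO_COMMENT;

COMMIT;
ALTER TRIGGER PO_COMMENT_LATEST_TRG ENABLE;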

-- 
Mladen Gogala
The Oracle Whisperer
http://mgogala.byethost5.com
Received on Sat Dec 15 2012 - 20:57:11 CET