Home » SQL & PL/SQL » SQL & PL/SQL » SQL Query re-write assistance requested (Oracle 9i)
SQL Query re-write assistance requested [message #326498] Wed, 11 June 2008 15:34 Go to next message
svguerin3
Messages: 44
Registered: November 2005
Location: TX
Member

I am needing to know the best way to go about re-writing a bad query at my workplace that needs to run at TOP performance due to very large amounts of data that will be run through it.

I have simplified the data and table to the following mock example:

ID SEQ_N IND
1 1 Y
1 2 N
2 1 N
2 2 N
3 1 N

I need a query that returns each ID, as well as the maximum SEQ_N for each ID. The tricky part is that if there is a "Y" in the indicator (IND) column, then use the maximum SEQ_N for those. If there is no "Y" indicator, then use the maximum SEQ_N for the "N" indicator column. So given the above example data, the query executed would return the following results:

ID MAX(SEQ_N)
1 1
2 2
3 1

I hope that makes sense.. The query we have running currently creates a string and gets the maximum from that. The inner part of the query looks like the following:

select id, max(decode(IND,'Y','1',0') || to_char(seq_n,'0000000009')) from table

And the outer query selects from this (using a substring and to_number) to get the sequence number back.. Very expensive; there must be a better way!

Thanks in advance!
Re: SQL Query re-write assistance requested [message #326511 is a reply to message #326498] Wed, 11 June 2008 16:53 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
Do you have the flexibility to create a lookup table that would hold the MAX values & be maintained via a trigger?
Re: SQL Query re-write assistance requested [message #326513 is a reply to message #326511] Wed, 11 June 2008 17:06 Go to previous messageGo to next message
svguerin3
Messages: 44
Registered: November 2005
Location: TX
Member

Thanks for the quick response! Unfortunately we do not have this option of creating a lookup table (tight restrictions on this particular system) Sad It needs to be wrapped up in a single query if possible.

Thanks again!
Re: SQL Query re-write assistance requested [message #326589 is a reply to message #326513] Thu, 12 June 2008 01:30 Go to previous messageGo to next message
sarwagya
Messages: 87
Registered: February 2008
Location: Republic of Nepal
Member
Please try with this query. I tried it with few rows.
Let's hope it works for your data too.


CREATE TABLE test_table(id number(2),seq_n number(2),ind char(1));

	INSERT INTO test_table values(1,1,'Y');

	INSERT INTO test_table values(1,3,'Y');

	INSERT INTO test_table values(1,2,'N');

	INSERT INTO test_table values(2,1,'N');

	INSERT INTO test_table values(2,2,'N');

	INSERT INTO test_table values(3,1,'N');

	INSERT INTO test_table values(3,2,'Y');

	INSERT INTO test_table values(3,3,'Y');

	INSERT INTO test_table values(3,4,'N');


QUERY:
WITH DATA AS (
		SELECT ID,MAX(SEQ_N) M_SEQ_Y, NULL M_SEQ_N
		FROM TEST_TABLE
		WHERE IND = 'Y'
		AND SEQ_N IS NOT NULL
		GROUP BY ID
		UNION
		SELECT ID,NULL M_SEQ_Y,MAX(SEQ_N) M_SEQ_N
		FROM TEST_TABLE
		WHERE IND = 'N'
		AND SEQ_N IS NOT NULL
		GROUP BY ID
	)SELECT ID,M_SEQ_Y MAX_SEQ
	FROM DATA
	WHERE M_SEQ_Y IS NOT NULL
	UNION
	SELECT ID,M_SEQ_N MAX_SEQ
	FROM DATA OUTER
	WHERE M_SEQ_Y IS NULL
	AND ID NOT IN (SELECT ID FROM DATA WHERE M_SEQ_Y IS NOT NULL AND DATA.ID = OUTER.ID);


OUTPUT:
ID	MAX_SEQ
-----	-------
1	3
2	2
3	3

[Updated on: Thu, 12 June 2008 01:47]

Report message to a moderator

Re: SQL Query re-write assistance requested [message #326602 is a reply to message #326589] Thu, 12 June 2008 01:55 Go to previous messageGo to next message
Michel Cadot
Messages: 68643
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
I don't think OP's problem is the query but the performances and your query does not address this point.

Regards
Michel
Re: SQL Query re-write assistance requested [message #326652 is a reply to message #326602] Thu, 12 June 2008 02:59 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
Using the previous posts test setup, would this help:
select id,max_seq
from  (select id
             ,last_value(seq_n) over (partition by id order by ind,seq_n rows between unbounded preceding and unbounded following) max_seq
             ,row_number() over (partition by id order by null) rnum
       from   test_table)
where rnum = 1;
Re: SQL Query re-write assistance requested [message #326662 is a reply to message #326652] Thu, 12 June 2008 03:36 Go to previous messageGo to next message
sarwagya
Messages: 87
Registered: February 2008
Location: Republic of Nepal
Member
This seems a great query.

I am eager to know how it works. Please explain.

I tried to learn about LAST,ROW_NUMBER, etc, but couldn't understand properly.

[Updated on: Thu, 12 June 2008 03:36]

Report message to a moderator

Re: SQL Query re-write assistance requested [message #326666 is a reply to message #326662] Thu, 12 June 2008 03:47 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
They are caled
Analytic Functions


Put simply, they work out a value for a function based on a subset of the rows returned by the query.

There are 4 main parts to them

1) The function. This is the function that operates on the set of rows that you define. These are listed at the bottom of the link I gave above. LAST_VALUE returns the last value in an ordered set of rows. Row_Number takes an ordered set of rows, and gives them a number from 1..n

2) Partition Clause. This allows you to break down the total set of rows returned by the query into discrete subsets.
The function ROW_NUMBER() OVER (PARTION BY id ...) simply takes each set of rows that share the same ID, and gives each of them a unique row number.

3) Order By clause. This determines which order the rows are to be passed to the function in.
The function LAST_VALUE(seq_n) OVER (ORDER BY ind,seq_n) orders the set of rows so that all the rows with IND='N' come before the rows with IND='Y', and the rows with the highest Seq_n wll come last. The LAST_VALUE functino then picks the last value from this list.

4) Rows Between clause. This determines which set of rows the whole analytic function will operate on. The default is Unbounded Preceeding to Current Row, which means that the function will operate on all the current row and all the rows that were retrieved earlier.

[Mod-edit: Frank added missing bracket in URL-definition.]

[Updated on: Thu, 12 June 2008 04:44] by Moderator

Report message to a moderator

Re: SQL Query re-write assistance requested [message #326670 is a reply to message #326666] Thu, 12 June 2008 03:56 Go to previous messageGo to next message
sarwagya
Messages: 87
Registered: February 2008
Location: Republic of Nepal
Member
Thanks JRowBottom!

That was a helpful post.
Re: SQL Query re-write assistance requested [message #326680 is a reply to message #326498] Thu, 12 June 2008 04:33 Go to previous messageGo to next message
S.Rajaram
Messages: 1027
Registered: October 2006
Location: United Kingdom
Senior Member
Another way of doing it.

SQL> l
  1  with t
  2  as
  3  (
  4   select 1 id, 1 seq_n, 'Y' indicator from dual union all
  5   select 1, 2, 'N' from dual union all
  6   select 2, 1, 'N' from dual union all
  7   select 2, 2, 'N' from dual union all
  8   select 3, 1, 'N' from dual
  9  )
 10  select id, max(seq_n)
 11                 keep(dense_rank first order by decode(indicator,'Y',1,2)) max_seq
 12* from t group by id
SQL> /

        ID    MAX_SEQ
---------- ----------
         1          1
         2          2
         3          1



SQL> CREATE TABLE test_table(id number(2),seq_n number(2),ind char(1));

Table created.

SQL> INSERT INTO test_table values(1,1,'Y');

1 row created.

SQL>  INSERT INTO test_table values(1,3,'Y');
1 row created.

SQL>  INSERT INTO test_table values(1,2,'N');

1 row created.

SQL>  INSERT INTO test_table values(2,1,'N');

1 row created.

SQL>    INSERT INTO test_table values(2,2,'N');

1 row created.

SQL>    INSERT INTO test_table values(3,1,'N');

1 row created.

SQL>    INSERT INTO test_table values(3,2,'Y');

1 row created.

SQL>       INSERT INTO test_table values(3,3,'Y');

1 row created.

SQL>    INSERT INTO test_table values(3,4,'N');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from test_table;

        ID      SEQ_N I
---------- ---------- -
         1          1 Y
         1          3 Y
         1          2 N
         2          1 N
         2          2 N
         3          1 N
         3          2 Y
         3          3 Y
         3          4 N

9 rows selected.


SQL> l
  1  select id, max(seq_n)
  2                 keep(dense_rank first order by decode(ind,'Y',1,2)) max_seq
  3* from test_table group by id
SQL> /

        ID    MAX_SEQ
---------- ----------
         1          3
         2          2
         3          3

Regards

Raj
Re: SQL Query re-write assistance requested [message #326809 is a reply to message #326680] Thu, 12 June 2008 12:06 Go to previous messageGo to next message
svguerin3
Messages: 44
Registered: November 2005
Location: TX
Member

Thank you all for the great responses!! I have gone through them, as well as several others from Oracle forums, and formed 3 options for this query (below is what they actually look like). I ran tests of about 500k rows through each one, and the timings were about the same (~40-50s per query). I'm not very good at reading explain plans, so I'm a bit lost at which one to choose. Please let me know what you guys think the best choice should be (regarding performance above all else). If more info about indexes is needed, I can gladly supply it.

I will engage the DBA's on our side to analyze and apply hints to the resulting query to help speed it up even further. Thank you all for your help, this seemingly small change is huge for us!

/* OPTION 1 - 39s */
select co.ord_id,max(co.seq_n) keep(dense_rank first order by co.actv_chan_i desc,co.seq_n desc) seq_n
from t_fbsco_chan_ord co, t_fbsos_offr_stg os
where co.ord_id = os.ord_id
and os.run_id = 98889
group by co.ord_id

/* OPTION 2 - 38.5s */
select
co.ord_id, max(co.seq_n) keep(dense_rank first order by decode(co.actv_chan_i,'Y',1,2)) max_seq
from t_fbsco_chan_ord co, t_fbsos_offr_stg os
where co.ord_id = os.ord_id
and os.run_id = 98889
group by co.ord_id

/* OPTION 3 - */
select ord_id,max_seq
from (select co.ord_id
,last_value(co.seq_n) over (partition by co.ord_id
order by co.actv_chan_i,co.seq_n rows between unbounded preceding and unbounded following) max_seq
,row_number() over (partition by co.ord_id order by null) rnum
from t_fbsco_chan_ord co, t_fbsos_offr_stg os
where co.ord_id = os.ord_id
and os.run_id = 98889 )
where rnum = 1;
Re: SQL Query re-write assistance requested [message #327178 is a reply to message #326809] Sat, 14 June 2008 09:13 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
All of these queries do the same thing:
  • A full scan of the table
  • A sort

...which is exactly what the ORIGINAL query was doing.

I would be surprised if ANY of these was a material improvement.

The only way you are going to get a real improvement - short of restructuring your data model or hardware - is to eliminate either the full table scan or the sort.

How many rows per ID are there typically?

If it is more than 10, then you can probably avoid both the full table scan AND the sort.

Create an index on (ID, IND, SEQ_N) in that order and then use DBMS_STATS.GATHER_TABLE_STATS to gather statistics on the table AND the index.
Then create a BITMAP INDEX on just (ID). Gather stats for that one too.

Then try this query:
SELECT DISTINCT id
,      NVL(
        ( SELECT max(seq_n)
          FROM   tbl
          WHERE  id = t.id
          AND    ind = 'Y')
       ,( SELECT max(seq_n)
          FROM   tbl
          WHERE  id = t.id
          AND    ind = 'N')
       )
FROM   tbl t


CBO should go to the BITMAP index for the distinct list of IDs - that should be lightning fast. Then for each ID, it will pick the MAX(seq_n) for either Y or N as required from the index without scanning all rows for that ID.

If you have lots more than 10 rows per ID, this should work fine - otherwise it will fail miserably. Post the explain plan if you have lots of rows per ID and it is still not working.

If you have only a few rows per ID, then you cannot (and should not) avoid the full scan. But perhaps you can avoid the sort.

If your table is very large (say, millions of rows) then Oracle will not be able to perform the sort in memory - this generates disk IO. You can help it out by storing a sorted copy as an index.

There's a few way of doing that, but probably the simplest is to keep your ORIGINAL query, and create a function-based index on:
( ID, decode(IND,'Y','1',0') || to_char(SEQ_N,'0000000009')  )

Oracle should then use the index to sort the GROUP BY. Look for a GROUP BY (NOSORT) in the explain plan.

Ross Leishman
Re: SQL Query re-write assistance requested [message #327197 is a reply to message #327178] Sat, 14 June 2008 12:10 Go to previous messageGo to next message
svguerin3
Messages: 44
Registered: November 2005
Location: TX
Member

Thank you so much for the detailed explanation and suggestions! I'm so sorry I didn't post more information that would have been useful. Here is some additional info about these tables:

- There are about 80-90 million rows in our ID (t_fbsco_chan_ord) table
- There are never any more than 2 or 3 rows per ID. Usually it's just 1 or 2.

Based on the above information, do you think the original query is what we should stick to, and create the function-based index?

Thanks again for all your help, this is great information,
-Vince
Re: SQL Query re-write assistance requested [message #327565 is a reply to message #327197] Mon, 16 June 2008 22:53 Go to previous message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Yes, it's worth a try. Remember, you are looking for a NOSORT keyword in your explain plan. I haven't tried exploiting function-based indexes for sorting, but it should work. There was a thread here a while ago where they were having difficulty using a function-based index to support an ORDER BY DESC, so I don't know how you'll go.

Ross Leishman
Previous Topic: query to return entries with maximum column value
Next Topic: sql join
Goto Forum:
  


Current Time: Tue Apr 23 21:10:37 CDT 2024