Oracle FAQ Your Portal to the Oracle Knowledge Grid
Home -> Community -> Usenet -> c.d.o.misc -> Re: Newbie Query Question

Re: Newbie Query Question

From: JES <jeswoff_at_sbcglobal.net>
Date: Wed, 14 Apr 2004 23:28:59 GMT
Message-ID: <%Ojfc.16669$xd3.4562@newssvr22.news.prodigy.com>


Thanks Douglas,

I have much to learn. I am definitely going to give the pivot query a try. I just wanted to note that my original post was more pseudocode than real SQL, intended to give the reader an idea of what I was doing; sorry I was so sloppy with the syntax. Of the two methods originally posted, I also wanted to point out that the one that currently runs the quickest for me is Method 2. If it makes any difference, I did leave off a set of WHERE clauses that I believe are:

WHERE tbl1.DATE = tbl1_tmp.DATE AND
      tbl1.DATE = tbl2_tmp.DATE AND
      tbl1.DATE = tbl3_tmp.DATE

Maybe I am just getting lucky that this query returns data at all? Essentially the gist of Method 2, in my newbie mind, is that instead of having a SELECT statement in the SELECT list, I put the SELECT statements in the FROM clause to make three tables, where tbl1_tmp only has DATANAME1 data, tbl2_tmp only has DATANAME2 data, etc.
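
A valid-SQL sketch of that idea (an illustrative rewrite only, not from the original posts; it uses the DATADATE column name, since DATE is a reserved word, and each inline view keeps its date column so the outer query can join the three views):

```sql
-- Hypothetical corrected form of Method 2: filter each DATANAME into its
-- own inline view, then join the three views on the date column.
SELECT t1.datadate,
       t1.datavalue AS dataname1,
       t2.datavalue AS dataname2,
       t3.datavalue AS dataname3
FROM   (SELECT datadate, datavalue FROM tbl1_tmp
        WHERE dataname = 'DATANAME1') t1,
       (SELECT datadate, datavalue FROM tbl1_tmp
        WHERE dataname = 'DATANAME2') t2,
       (SELECT datadate, datavalue FROM tbl1_tmp
        WHERE dataname = 'DATANAME3') t3
WHERE  t1.datadate = t2.datadate
AND    t1.datadate = t3.datadate;
```

Note that this inner join only returns dates that have all three names present; dates missing any one of the three would drop out, which is one way it differs from the scalar-subquery form of Method 1.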

Another question...
Would the analysis change significantly if tbl1 and its associated _tmp objects were actually views of a raw data table? I have made a temporary table to hold the data from the view and then referenced the table instead, but did not see significant time savings.

Thanks,

JES
"Douglas Hawthorne" <douglashawthorne_at_yahoo.com.au> wrote in message news:3Qqec.5251$ED.1237_at_news-server.bigpond.net.au...
> "JES" <nausadge_at_yahoo.com> wrote in message
> news:xu1ec.14590$mP1.6039_at_newssvr22.news.prodigy.com...
> > I have a table that is "long" and I must transform the table into a
> > "wide" table. For example, the long table is DATE, DATAVALUE, DATANAME
> > (where there is one record for each measured DATAVALUE and where DATANAME
> > comes from a pre-defined list of names) and I need to transform this table
> > into DATE, DATANAME1, DATANAME2, DATANAME3, etc., where I get one record
> > for each unique DATE and where DATAVALUE from the original table is
> > assigned to the corresponding DATANAME column in the "wide" table.
> >
> > I have done this two ways, and one way is much slower in Oracle than the
> > other. Why? Any suggestions for another method? Both methods seem to run
> > quickly in Access...
> >
> > Method 1. Something like the following...
> >
> > SELECT DATE AS DATE,
> > (SELECT DATAVALUE
> > FROM tbl1_tmp
> > WHERE tbl1.DATE = tbl1_tmp.DATE AND
> > WHERE DATANAME = 'DATANAME1') AS DATANAME1,
> > (SELECT DATAVALUE
> > FROM tbl1_tmp
> > WHERE tbl1.DATE = tbl1_tmp.DATE AND
> > DATANAME = 'DATANAME2') AS DATANAME2,
> > (SELECT DATAVALUE
> > FROM tbl1_tmp
> > WHERE tbl1.DATE = tbl1_tmp.DATE AND
> > WHERE DATANAME = 'DATANAME3') AS DATANAME3
> >
> > FROM tbl1
> >
> > Method 2. And something like the following
> >
> > SELECT DATE AS DATE,
> > SELECT tbl1_tmp1.DATAVALUE AS DATANAME1
> > SELECT tbl1_tmp2.DATAVALUE AS DATANAME2
> > SELECT tbl1_tmp3.DATAVALUE AS DATANAME3
> >
> > FROM
> > (SELECT DATAVALUE
> > FROM tbl1_tmp
> > WHERE tbl1.DATE = tbl1_tmp.DATE AND
> > WHERE DATANAME = 'DATANAME1') tbl1_tmp1,
> >
> > (SELECT DATAVALUE
> > FROM tbl1_tmp
> > WHERE tbl1.DATE = tbl1_tmp.DATE AND
> > DATANAME = 'DATANAME2') tbl1_tmp2,
> >
> > (SELECT DATAVALUE
> > FROM tbl1_tmp
> > WHERE tbl1.DATE = tbl1_tmp.DATE AND
> > WHERE DATANAME = 'DATANAME3') tbl1_tmp3,
> >
> > tbl1
> >
> >
> > Thanks,
> >
> > JES
> >
> >
> >
>
> JES,
>
> Method #1 has a couple of syntax errors which were easily fixed. However,
> method #2 is not valid SQL because you are mixing a correlated subquery
> with a Cartesian product.
>
> In summary, the biggest improvement for method #1 came from putting a
> primary key constraint on the TBL1_TMP table, with a reduction in elapsed
> time from 349.26 seconds to 2.46 seconds (a reduction of 99.3%). The best
> time I could get with the test data was 0.82 seconds, using a pivot query
> on a heap table.
>
> The technique you need to master is that of pivot query. For an
> explanation, see "Pivot Query" on pp.576 to 582 of "Expert One-on-One
> Oracle" by Thomas Kyte (A-Press:2003). You could also search
> http://asktom.oracle.com for "pivot query".
>
> These tests were run under 10.1.0.2 on WinXP Pro.
>
> (1) Creation of Test Data
> =========================
>
> Let's create some test data:
> CREATE TABLE tbl1_tmp AS
> SELECT
> TRUNC( SYSDATE ) + FLOOR( rownum / 3 ) AS datadate,
> DECODE( MOD( rownum, 3),
> 0, 'DATANAME1',
> 1, 'DATANAME2',
> 2, 'DATANAME3'
> ) AS dataname,
> rownum AS datavalue
> FROM
> all_objects
> ;
>
> I could not call the first column DATE because it is a reserved word (the
> name of a data type). The table structure is:
> SQL> desc tbl1_tmp
>  Name                Null?    Type
>  ------------------- -------- ------------------
>  DATADATE                     DATE
>  DATANAME                     VARCHAR2(9)
>  DATAVALUE                    NUMBER
>
> The date table (TBL1) was created as follows:
> CREATE TABLE tbl1 AS SELECT DISTINCT datadate FROM tbl1_tmp;
>
> I collected statistics for the tables:
> SQL> exec dbms_stats.gather_table_stats( USER, 'TBL1' )
> SQL> exec dbms_stats.gather_table_stats( USER, 'TBL1_TMP' )
>
> To get the time taken for a query,
> SQL> SET TIMING ON
>
>
> I turned SQL tracing on with:
> ALTER SESSION SET timed_statistics=true;
> ALTER SESSION SET sql_trace=true;
>
> (2) Original Query
> ==================
>
> I then corrected some errors in your first query to produce:
> SELECT datadate,
> (SELECT DATAVALUE
> FROM tbl1_tmp
> WHERE tbl1.datadate = tbl1_tmp.datadate AND
> DATANAME = 'DATANAME1') AS DATANAME1,
> (SELECT DATAVALUE
> FROM tbl1_tmp
> WHERE tbl1.datadate = tbl1_tmp.datadate AND
> DATANAME = 'DATANAME2') AS DATANAME2,
> (SELECT DATAVALUE
> FROM tbl1_tmp
> WHERE tbl1.datadate = tbl1_tmp.datadate AND
> DATANAME = 'DATANAME3') AS DATANAME3
> FROM tbl1
> ;
>
> The results were as follows:
> 13035 rows selected.
> Elapsed: 00:05:49.26
>
> The TKPROF output shows:
> call     count       cpu    elapsed       disk      query    current       rows
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> Parse        1      0.31       0.32          0          0          0          0
> Execute      1      0.00       0.00          0          0          0          0
> Fetch      870    347.29     347.95          0    6062169          0      13035
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> total      872    347.60     348.28          0    6062169          0      13035
>
> Misses in library cache during parse: 1
> Optimizer mode: ALL_ROWS
> Parsing user id: 73
>
> Rows     Row Source Operation
> -------  ---------------------------------------------------
>   13034  TABLE ACCESS FULL TBL1_TMP (cr=2020425 pr=0 pw=0 time=115302886 us)
>   13034  TABLE ACCESS FULL TBL1_TMP (cr=2020425 pr=0 pw=0 time=115816431 us)
>   13034  TABLE ACCESS FULL TBL1_TMP (cr=2020425 pr=0 pw=0 time=115808387 us)
>   13035  TABLE ACCESS FULL TBL1 (cr=894 pr=0 pw=0 time=91381 us)
>
> The main point to note is that we are doing in excess of 6 million logical
> I/O's to process this query because each correlated subquery requires a
> table scan.
>
> (3) Original Query with PK
> ==========================
>
> I then added a primary key constraint (TBL1_TMP_PK) for DATADATE and
> DATANAME because the nature of the first query assumes that the correlated
> subqueries return a single value:
> ALTER TABLE tbl1_tmp
> ADD CONSTRAINT tbl1_tmp_pk
> PRIMARY KEY (datadate, dataname)
> ;
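
A note on why this constraint matters beyond performance (my gloss, not part of the quoted tests): the scalar subqueries in Method 1 assume at most one row per (DATADATE, DATANAME) pair. Without that guarantee, a duplicate row would make the query fail at run time:

```sql
-- Hypothetical duplicate: rownum 1 already produced a row for
-- (TRUNC(SYSDATE), 'DATANAME2'), so this insert creates a second one.
-- (Once the primary key exists, this insert is simply rejected.)
INSERT INTO tbl1_tmp ( datadate, dataname, datavalue )
VALUES ( TRUNC( SYSDATE ), 'DATANAME2', 99999 );

-- With two matching rows, each scalar subquery in the Method 1 query
-- would then fail with:
--   ORA-01427: single-row subquery returns more than one row
```

So the primary key both enables the index access path measured below and protects the correctness of the scalar-subquery form.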
>
> I collected statistics for the index:
> SQL> exec dbms_stats.gather_index_stats( USER, 'TBL1_TMP_PK' )
>
> The query was rerun after statistics were collected. The results were as
> follows:
> 13035 rows selected.
> Elapsed: 00:00:02.46
>
> The TKPROF output showed:
> call     count       cpu    elapsed       disk      query    current       rows
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> Parse        1      0.21       0.24          0          0          0          0
> Execute      1      0.00       0.00          0          0          0          0
> Fetch      870      1.00       1.24        177      81711          0      13035
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> total      872      1.21       1.49        177      81711          0      13035
>
> Misses in library cache during parse: 1
> Optimizer mode: ALL_ROWS
> Parsing user id: 73
>
> Rows Row Source Operation
> ------- ---------------------------------------------------
>   13034  TABLE ACCESS BY INDEX ROWID TBL1_TMP (cr=26939 pr=52 pw=0 time=322452 us)
>   13034   INDEX UNIQUE SCAN TBL1_TMP_PK (cr=13905 pr=52 pw=0 time=177941 us)(object id 54039)
>   13034  TABLE ACCESS BY INDEX ROWID TBL1_TMP (cr=26939 pr=50 pw=0 time=290687 us)
>   13034   INDEX UNIQUE SCAN TBL1_TMP_PK (cr=13905 pr=50 pw=0 time=159581 us)(object id 54039)
>   13034  TABLE ACCESS BY INDEX ROWID TBL1_TMP (cr=26939 pr=51 pw=0 time=283721 us)
>   13034   INDEX UNIQUE SCAN TBL1_TMP_PK (cr=13905 pr=51 pw=0 time=153145 us)(object id 54039)
>   13035  TABLE ACCESS FULL TBL1 (cr=894 pr=24 pw=0 time=70478 us)
>
> Now the logical I/O's have dropped from over 6 million down to 81,711, and
> the elapsed time has been reduced by 99.3%.
>
> (4) Original Query with IOT
> ===========================
>
> In an attempt to improve things even further, I created an index organised
> table (IOT):
> CREATE TABLE tbl1_tmp_iot
> (
> datadate DATE,
> dataname VARCHAR2(9),
> datavalue NUMBER,
> CONSTRAINT tbl1_tmp_iot_pk
> PRIMARY KEY ( datadate, dataname )
> )
> ORGANIZATION INDEX
> ;
>
> I populated the table as follows:
> INSERT INTO tbl1_tmp_iot SELECT * FROM tbl1_tmp;
> COMMIT;
>
> I collected the statistics:
> SQL> exec dbms_stats.gather_index_stats( USER, 'TBL1_TMP_IOT_PK' )
>
> SELECT datadate,
> (SELECT DATAVALUE
> FROM tbl1_tmp_iot
> WHERE tbl1.datadate = tbl1_tmp_iot.datadate AND
> DATANAME = 'DATANAME1') AS DATANAME1,
> (SELECT DATAVALUE
> FROM tbl1_tmp_iot
> WHERE tbl1.datadate = tbl1_tmp_iot.datadate AND
> DATANAME = 'DATANAME2') AS DATANAME2,
> (SELECT DATAVALUE
> FROM tbl1_tmp_iot
> WHERE tbl1.datadate = tbl1_tmp_iot.datadate AND
> DATANAME = 'DATANAME3') AS DATANAME3
> FROM tbl1
> ;
>
> The elapsed time is 00:00:01.29 seconds. The TKPROF output is:
> call     count       cpu    elapsed       disk      query    current       rows
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> Parse        1      0.00       0.00          0          0          0          0
> Execute      1      0.00       0.00          0          0          0          0
> Fetch      870      0.70       0.66          0      42609          0      13035
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> total      872      0.70       0.67          0      42609          0      13035
>
> Misses in library cache during parse: 1
> Optimizer mode: ALL_ROWS
> Parsing user id: 73
>
> Rows Row Source Operation
> ------- ---------------------------------------------------
>   13034  INDEX UNIQUE SCAN TBL1_TMP_IOT_PK (cr=13905 pr=0 pw=0 time=167930 us)(object id 54041)
>   13034  INDEX UNIQUE SCAN TBL1_TMP_IOT_PK (cr=13905 pr=0 pw=0 time=130540 us)(object id 54041)
>   13034  INDEX UNIQUE SCAN TBL1_TMP_IOT_PK (cr=13905 pr=0 pw=0 time=125185 us)(object id 54041)
>   13035  TABLE ACCESS FULL TBL1 (cr=894 pr=0 pw=0 time=52274 us)
>
> Now we have almost halved the number of logical I/O's by eliminating the
> table lookup for the value of the DATAVALUE column.
>
> (5) Pivot Query with PK
> =======================
>
> I took a modified version of Thomas Kyte's query from p.577 to produce the
> following query:
> SELECT
> datadate,
> MAX(
> DECODE(
> dataname,
> 'DATANAME1', datavalue
> )
> ) AS dataname1,
> MAX(
> DECODE(
> dataname,
> 'DATANAME2', datavalue
> )
> ) AS dataname2,
> MAX(
> DECODE(
> dataname,
> 'DATANAME3', datavalue
> )
> ) AS dataname3
> FROM
> tbl1_tmp
> GROUP BY
> datadate
> ;
>
> The elapsed time is 00:00:00.82 seconds and the TKPROF output shows:
> call     count       cpu    elapsed       disk      query    current       rows
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> Parse        1      0.00       0.00          0          0          0          0
> Execute      1      0.00       0.00          0          0          0          0
> Fetch      870      0.12       0.33        110        155          2      13035
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> total      872      0.12       0.34        110        155          2      13035
>
> Misses in library cache during parse: 1
> Optimizer mode: ALL_ROWS
> Parsing user id: 73
>
> Rows Row Source Operation
> ------- ---------------------------------------------------
> 13035 SORT GROUP BY (cr=155 pr=110 pw=110 time=278318 us)
> 39102 TABLE ACCESS FULL TBL1_TMP (cr=155 pr=0 pw=0 time=78327 us)
>
> (6) Pivot Query with IOT
> ========================
>
> I used the above pivot query on the IOT:
> SELECT
> datadate,
> MAX(
> DECODE(
> dataname,
> 'DATANAME1', datavalue
> )
> ) AS dataname1,
> MAX(
> DECODE(
> dataname,
> 'DATANAME2', datavalue
> )
> ) AS dataname2,
> MAX(
> DECODE(
> dataname,
> 'DATANAME3', datavalue
> )
> ) AS dataname3
> FROM
> tbl1_tmp_iot
> GROUP BY
> datadate
> ;
>
> The elapsed time is 00:00:01.62 seconds, and the TKPROF output shows:
> call     count       cpu    elapsed       disk      query    current       rows
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> Parse        1      0.00       0.00          0          0          0          0
> Execute      1      0.00       0.00          0          0          0          0
> Fetch      870      0.21       0.17          0       1013          0      13035
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> total      872      0.21       0.17          0       1013          0      13035
>
> Misses in library cache during parse: 1
> Optimizer mode: ALL_ROWS
> Parsing user id: 73
>
> Rows Row Source Operation
> ------- ---------------------------------------------------
>   13035  SORT GROUP BY NOSORT (cr=1013 pr=0 pw=0 time=183040 us)
>   39102   INDEX FULL SCAN TBL1_TMP_IOT_PK (cr=1013 pr=0 pw=0 time=117988 us)(object id 54041)
>
> (7) Overview of Pivot Query
> ===========================
>
> The key to pivot queries is the combination of the MAX and DECODE
> functions with the GROUP BY clause.
>
> To see how the process of pivoting works, let's look at the intermediate
> step (without the MAX function and GROUP BY clause):
> SELECT
> datadate,
> DECODE(
> dataname,
> 'DATANAME1', datavalue
> ) AS dataname1,
> DECODE(
> dataname,
> 'DATANAME2', datavalue
> ) AS dataname2,
> DECODE(
> dataname,
> 'DATANAME3', datavalue
> ) AS dataname3
> FROM
> tbl1_tmp
> WHERE
> rownum <= 5
> /
>
> DATADATE   DATANAME1  DATANAME2  DATANAME3
> ---------  ---------- ---------- ----------
> 11-APR-04                      1
> 11-APR-04                                 2
> 12-APR-04           3
> 12-APR-04                      4
> 12-APR-04                                 5
>
> You will note that rows #3, #4, and #5 have placed the value of the
> DATAVALUE column into different columns depending upon the value of the
> DATANAME column. The blank entries are really NULLs.
>
> To condense these groups of three (3) rows into a single row requires the
> use of the MAX function in conjunction with the GROUP BY clause.
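>
An illustration of that condensing step (my addition, applying the pivot query from section (5) to just the five sample rows above; MAX ignores the NULLs that DECODE produced, so each group of rows for one DATADATE collapses into a single output row):

```sql
-- Pivot the five sample rows: one output row per date, with each
-- DATAVALUE landing in the column named by its DATANAME.
SELECT datadate,
       MAX( DECODE( dataname, 'DATANAME1', datavalue ) ) AS dataname1,
       MAX( DECODE( dataname, 'DATANAME2', datavalue ) ) AS dataname2,
       MAX( DECODE( dataname, 'DATANAME3', datavalue ) ) AS dataname3
FROM   tbl1_tmp
WHERE  rownum <= 5
GROUP BY datadate;
```

Given the intermediate output shown above, this should return two rows: 11-APR-04 with values 1 and 2 in DATANAME2 and DATANAME3 (DATANAME1 NULL), and 12-APR-04 with values 3, 4, and 5 in their respective columns.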
>
>
> Douglas Hawthorne
>
>
Received on Wed Apr 14 2004 - 18:28:59 CDT
