Oracle FAQ Your Portal to the Oracle Knowledge Grid
Home -> Community -> Usenet -> c.d.o.misc -> Re: Newbie Query Question

Re: Newbie Query Question

From: JES <jeswoff_at_sbcglobal.net>
Date: Wed, 14 Apr 2004 23:28:59 GMT
Message-ID: <%Ojfc.16669$xd3.4562@newssvr22.news.prodigy.com>


Thanks Douglas,

I have much to learn. I am definitely going to give the pivot query a try. I just wanted to note that my original post was more pseudocode than real SQL, intended to give the reader an idea of what I was doing; sorry I was so sloppy with the syntax. Of the two methods originally posted, I also wanted to point out that the one that currently runs the quickest for me is Method 2. If it makes any difference, I did leave off a set of WHERE clauses that I believe are:

WHERE tbl1.DATE = tbl1_tmp.DATE AND
      tbl1.DATE = tbl2_tmp.DATE AND
      tbl1.DATE = tbl3_tmp.DATE

Maybe I am just getting lucky that this query returns data at all? Essentially the gist of Method 2, in my newbie mind, is that instead of having a SELECT statement in the SELECT list, I put the SELECT statements in the FROM clause to make three tables, where tbl1_tmp only has DATANAME1 data, tbl2_tmp only has DATANAME2 data, etc.
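
A valid-SQL sketch of that idea (an illustrative rewrite only, not from the original posts; it uses the DATADATE column name, since DATE is a reserved word, and each inline view keeps its date column so the outer query can join the three views):

```sql
-- Hypothetical corrected form of Method 2: filter each DATANAME into its
-- own inline view, then join the three views on the date column.
SELECT t1.datadate,
       t1.datavalue AS dataname1,
       t2.datavalue AS dataname2,
       t3.datavalue AS dataname3
FROM   (SELECT datadate, datavalue FROM tbl1_tmp
        WHERE dataname = 'DATANAME1') t1,
       (SELECT datadate, datavalue FROM tbl1_tmp
        WHERE dataname = 'DATANAME2') t2,
       (SELECT datadate, datavalue FROM tbl1_tmp
        WHERE dataname = 'DATANAME3') t3
WHERE  t1.datadate = t2.datadate
AND    t1.datadate = t3.datadate;
```

Note that this inner join only returns dates that have all three names present; dates missing any one of the three would drop out, which is one way it differs from the scalar-subquery form of Method 1.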

Another question...
Would the analysis change significantly if tbl1 and its associated _tmp objects were actually views of a raw data table? I have made a temporary table to hold the data from the view and then referenced the table instead, but did not see significant time savings.

Thanks,

JES
"Douglas Hawthorne" <douglashawthorne_at_yahoo.com.au> wrote in message news:3Qqec.5251$ED.1237_at_news-server.bigpond.net.au...
> "JES" <nausadge_at_yahoo.com> wrote in message
> news:xu1ec.14590$mP1.6039_at_newssvr22.news.prodigy.com...
> > I have a table that is "long" and I must transform the table into a
> > "wide" table. For example, the long table is DATE, DATAVALUE, DATANAME
> > (where there is one record for each measured DATAVALUE and where DATANAME
> > comes from a pre-defined list of names) and I need to transform this table
> > into DATE, DATANAME1, DATANAME2, DATANAME3, etc., where I get one record
> > for each unique DATE and where DATAVALUE from the original table is
> > assigned to the corresponding DATANAME column in the "wide" table.
> >
> > I have done this two ways, and one way is much slower in Oracle than the
> > other. Why? Any suggestions for another method? Both methods seem to run
> > quickly in Access...
> >
> > Method 1. Something like the following...
> >
> > SELECT DATE AS DATE,
> > (SELECT DATAVALUE
> > FROM tbl1_tmp
> > WHERE tbl1.DATE = tbl1_tmp.DATE AND
> > WHERE DATANAME = 'DATANAME1') AS DATANAME1,
> > (SELECT DATAVALUE
> > FROM tbl1_tmp
> > WHERE tbl1.DATE = tbl1_tmp.DATE AND
> > DATANAME = 'DATANAME2') AS DATANAME2,
> > (SELECT DATAVALUE
> > FROM tbl1_tmp
> > WHERE tbl1.DATE = tbl1_tmp.DATE AND
> > WHERE DATANAME = 'DATANAME3') AS DATANAME3
> >
> > FROM tbl1
> >
> > Method 2. And something like the following
> >
> > SELECT DATE AS DATE,
> > SELECT tbl1_tmp1.DATAVALUE AS DATANAME1
> > SELECT tbl1_tmp2.DATAVALUE AS DATANAME2
> > SELECT tbl1_tmp3.DATAVALUE AS DATANAME3
> >
> > FROM
> > (SELECT DATAVALUE
> > FROM tbl1_tmp
> > WHERE tbl1.DATE = tbl1_tmp.DATE AND
> > WHERE DATANAME = 'DATANAME1') tbl1_tmp1,
> >
> > (SELECT DATAVALUE
> > FROM tbl1_tmp
> > WHERE tbl1.DATE = tbl1_tmp.DATE AND
> > DATANAME = 'DATANAME2') tbl1_tmp2,
> >
> > (SELECT DATAVALUE
> > FROM tbl1_tmp
> > WHERE tbl1.DATE = tbl1_tmp.DATE AND
> > WHERE DATANAME = 'DATANAME3') tbl1_tmp3,
> >
> > tbl1
> >
> >
> > Thanks,
> >
> > JES
> >
> >
> >
>
> JES,
>
> Method #1 has a couple of syntax errors which were easily fixed. However,
> method #2 is not valid SQL because you are mixing a correlated subquery
> with a Cartesian product.
>
> In summary, the biggest improvement for method #1 came from putting a
> primary key constraint on the TBL1_TMP table, with a reduction in elapsed
> time from 349.26 seconds to 2.46 seconds (a reduction of 99.3%). The best
> time I could get with the test data was 0.82 seconds, using a pivot query
> on a heap table.
>
> The technique you need to master is that of pivot query. For an
> explanation, see "Pivot Query" on pp.576 to 582 of "Expert One-on-One
> Oracle" by Thomas Kyte (A-Press:2003). You could also search
> http://asktom.oracle.com for "pivot query".
>
> These tests were run under 10.1.0.2 on WinXP Pro.
>
> (1) Creation of Test Data
> =========================
>
> Let's create some test data:
> CREATE TABLE tbl1_tmp AS
> SELECT
> TRUNC( SYSDATE ) + FLOOR( rownum / 3 ) AS datadate,
> DECODE( MOD( rownum, 3),
> 0, 'DATANAME1',
> 1, 'DATANAME2',
> 2, 'DATANAME3'
> ) AS dataname,
> rownum AS datavalue
> FROM
> all_objects
> ;
>
> I could not call the first column DATE because it is a reserved word (the
> name of a data type). The table structure is:
> SQL> desc tbl1_tmp
>  Name                Null?    Type
>  ------------------- -------- ------------------
>  DATADATE                     DATE
>  DATANAME                     VARCHAR2(9)
>  DATAVALUE                    NUMBER
>
> The date table (TBL1) was created as follows:
> CREATE TABLE tbl1 AS SELECT DISTINCT datadate FROM tbl1_tmp;
>
> I collected statistics for the tables:
> SQL> exec dbms_stats.gather_table_stats( USER, 'TBL1' )
> SQL> exec dbms_stats.gather_table_stats( USER, 'TBL1_TMP' )
>
> To get the time taken for a query,
> SQL> SET TIMING ON
>
>
> I turned SQL tracing on with:
> ALTER SESSION SET timed_statistics=true;
> ALTER SESSION SET sql_trace=true;
>
> (2) Original Query
> ==================
>
> I then corrected some errors in your first query to produce:
> SELECT datadate,
> (SELECT DATAVALUE
> FROM tbl1_tmp
> WHERE tbl1.datadate = tbl1_tmp.datadate AND
> DATANAME = 'DATANAME1') AS DATANAME1,
> (SELECT DATAVALUE
> FROM tbl1_tmp
> WHERE tbl1.datadate = tbl1_tmp.datadate AND
> DATANAME = 'DATANAME2') AS DATANAME2,
> (SELECT DATAVALUE
> FROM tbl1_tmp
> WHERE tbl1.datadate = tbl1_tmp.datadate AND
> DATANAME = 'DATANAME3') AS DATANAME3
> FROM tbl1
> ;
>
> The results were as follows:
> 13035 rows selected.
> Elapsed: 00:05:49.26
>
> The TKPROF output shows:
> call     count       cpu    elapsed       disk      query    current       rows
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> Parse        1      0.31       0.32          0          0          0          0
> Execute      1      0.00       0.00          0          0          0          0
> Fetch      870    347.29     347.95          0    6062169          0      13035
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> total      872    347.60     348.28          0    6062169          0      13035
>
> Misses in library cache during parse: 1
> Optimizer mode: ALL_ROWS
> Parsing user id: 73
>
> Rows     Row Source Operation
> -------  ---------------------------------------------------
>   13034  TABLE ACCESS FULL TBL1_TMP (cr=2020425 pr=0 pw=0 time=115302886 us)
>   13034  TABLE ACCESS FULL TBL1_TMP (cr=2020425 pr=0 pw=0 time=115816431 us)
>   13034  TABLE ACCESS FULL TBL1_TMP (cr=2020425 pr=0 pw=0 time=115808387 us)
>   13035  TABLE ACCESS FULL TBL1 (cr=894 pr=0 pw=0 time=91381 us)
>
> The main point to note is that we are doing in excess of 6 million logical
> I/O's to process this query because each correlated subquery requires a
> table scan.
>
> (3) Original Query with PK
> ==========================
>
> I then added a primary key constraint (TBL1_TMP_PK) for DATADATE and
> DATANAME because the nature of the first query assumes that the correlated
> subqueries return a single value:
> ALTER TABLE tbl1_tmp
> ADD CONSTRAINT tbl1_tmp_pk
> PRIMARY KEY (datadate, dataname)
> ;
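
A note on why this constraint matters beyond performance (my gloss, not part of the quoted tests): the scalar subqueries in Method 1 assume at most one row per (DATADATE, DATANAME) pair. Without that guarantee, a duplicate row would make the query fail at run time:

```sql
-- Hypothetical duplicate: rownum 1 already produced a row for
-- (TRUNC(SYSDATE), 'DATANAME2'), so this insert creates a second one.
-- (Once the primary key exists, this insert is simply rejected.)
INSERT INTO tbl1_tmp ( datadate, dataname, datavalue )
VALUES ( TRUNC( SYSDATE ), 'DATANAME2', 99999 );

-- With two matching rows, each scalar subquery in the Method 1 query
-- would then fail with:
--   ORA-01427: single-row subquery returns more than one row
```

So the primary key both enables the index access path measured below and protects the correctness of the scalar-subquery form.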
>
> I collected statistics for the index:
> SQL> exec dbms_stats.gather_index_stats( USER, 'TBL1_TMP_PK' )
>
> The query was rerun after statistics were collected. The results were as
> follows:
> 13035 rows selected.
> Elapsed: 00:00:02.46
>
> The TKPROF output showed:
> call     count       cpu    elapsed       disk      query    current       rows
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> Parse        1      0.21       0.24          0          0          0          0
> Execute      1      0.00       0.00          0          0          0          0
> Fetch      870      1.00       1.24        177      81711          0      13035
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> total      872      1.21       1.49        177      81711          0      13035
>
> Misses in library cache during parse: 1
> Optimizer mode: ALL_ROWS
> Parsing user id: 73
>
> Rows Row Source Operation
> ------- ---------------------------------------------------
>   13034  TABLE ACCESS BY INDEX ROWID TBL1_TMP (cr=26939 pr=52 pw=0 time=322452 us)
>   13034   INDEX UNIQUE SCAN TBL1_TMP_PK (cr=13905 pr=52 pw=0 time=177941 us)(object id 54039)
>   13034  TABLE ACCESS BY INDEX ROWID TBL1_TMP (cr=26939 pr=50 pw=0 time=290687 us)
>   13034   INDEX UNIQUE SCAN TBL1_TMP_PK (cr=13905 pr=50 pw=0 time=159581 us)(object id 54039)
>   13034  TABLE ACCESS BY INDEX ROWID TBL1_TMP (cr=26939 pr=51 pw=0 time=283721 us)
>   13034   INDEX UNIQUE SCAN TBL1_TMP_PK (cr=13905 pr=51 pw=0 time=153145 us)(object id 54039)
>   13035  TABLE ACCESS FULL TBL1 (cr=894 pr=24 pw=0 time=70478 us)
>
> Now the logical I/O's have dropped from over 6 million down to 81,711, and
> the elapsed time has been reduced by 99.3%.
>
> (4) Original Query with IOT
> ===========================
>
> In an attempt to improve things even further, I created an index organised
> table (IOT):
> CREATE TABLE tbl1_tmp_iot
> (
> datadate DATE,
> dataname VARCHAR2(9),
> datavalue NUMBER,
> CONSTRAINT tbl1_tmp_iot_pk
> PRIMARY KEY ( datadate, dataname )
> )
> ORGANIZATION INDEX
> ;
>
> I populated the table as follows:
> INSERT INTO tbl1_tmp_iot SELECT * FROM tbl1_tmp;
> COMMIT;
>
> I collected the statistics:
> SQL> exec dbms_stats.gather_index_stats( USER, 'TBL1_TMP_IOT_PK' )
>
> SELECT datadate,
> (SELECT DATAVALUE
> FROM tbl1_tmp_iot
> WHERE tbl1.datadate = tbl1_tmp_iot.datadate AND
> DATANAME = 'DATANAME1') AS DATANAME1,
> (SELECT DATAVALUE
> FROM tbl1_tmp_iot
> WHERE tbl1.datadate = tbl1_tmp_iot.datadate AND
> DATANAME = 'DATANAME2') AS DATANAME2,
> (SELECT DATAVALUE
> FROM tbl1_tmp_iot
> WHERE tbl1.datadate = tbl1_tmp_iot.datadate AND
> DATANAME = 'DATANAME3') AS DATANAME3
> FROM tbl1
> ;
>
> The elapsed time is 00:00:01.29 seconds. The TKPROF output is:
> call     count       cpu    elapsed       disk      query    current       rows
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> Parse        1      0.00       0.00          0          0          0          0
> Execute      1      0.00       0.00          0          0          0          0
> Fetch      870      0.70       0.66          0      42609          0      13035
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> total      872      0.70       0.67          0      42609          0      13035
>
> Misses in library cache during parse: 1
> Optimizer mode: ALL_ROWS
> Parsing user id: 73
>
> Rows Row Source Operation
> ------- ---------------------------------------------------
>   13034  INDEX UNIQUE SCAN TBL1_TMP_IOT_PK (cr=13905 pr=0 pw=0 time=167930 us)(object id 54041)
>   13034  INDEX UNIQUE SCAN TBL1_TMP_IOT_PK (cr=13905 pr=0 pw=0 time=130540 us)(object id 54041)
>   13034  INDEX UNIQUE SCAN TBL1_TMP_IOT_PK (cr=13905 pr=0 pw=0 time=125185 us)(object id 54041)
>   13035  TABLE ACCESS FULL TBL1 (cr=894 pr=0 pw=0 time=52274 us)
>
> Now we have almost halved the number of logical I/O's by eliminating the
> table lookup for the value of the DATAVALUE column.
>
> (5) Pivot Query with PK
> =======================
>
> I took a modified version of Thomas Kyte's query from p.577 to produce the
> following query:
> SELECT
> datadate,
> MAX(
> DECODE(
> dataname,
> 'DATANAME1', datavalue
> )
> ) AS dataname1,
> MAX(
> DECODE(
> dataname,
> 'DATANAME2', datavalue
> )
> ) AS dataname2,
> MAX(
> DECODE(
> dataname,
> 'DATANAME3', datavalue
> )
> ) AS dataname3
> FROM
> tbl1_tmp
> GROUP BY
> datadate
> ;
>
> The elapsed time is 00:00:00.82 seconds and the TKPROF output shows:
> call     count       cpu    elapsed       disk      query    current       rows
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> Parse        1      0.00       0.00          0          0          0          0
> Execute      1      0.00       0.00          0          0          0          0
> Fetch      870      0.12       0.33        110        155          2      13035
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> total      872      0.12       0.34        110        155          2      13035
>
> Misses in library cache during parse: 1
> Optimizer mode: ALL_ROWS
> Parsing user id: 73
>
> Rows Row Source Operation
> ------- ---------------------------------------------------
> 13035 SORT GROUP BY (cr=155 pr=110 pw=110 time=278318 us)
> 39102 TABLE ACCESS FULL TBL1_TMP (cr=155 pr=0 pw=0 time=78327 us)
>
> (6) Pivot Query with IOT
> ========================
>
> I used the above pivot query on the IOT:
> SELECT
> datadate,
> MAX(
> DECODE(
> dataname,
> 'DATANAME1', datavalue
> )
> ) AS dataname1,
> MAX(
> DECODE(
> dataname,
> 'DATANAME2', datavalue
> )
> ) AS dataname2,
> MAX(
> DECODE(
> dataname,
> 'DATANAME3', datavalue
> )
> ) AS dataname3
> FROM
> tbl1_tmp_iot
> GROUP BY
> datadate
> ;
>
> The elapsed time is 00:00:01.62 seconds, and the TKPROF output shows:
> call     count       cpu    elapsed       disk      query    current       rows
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> Parse        1      0.00       0.00          0          0          0          0
> Execute      1      0.00       0.00          0          0          0          0
> Fetch      870      0.21       0.17          0       1013          0      13035
> ------- ------  -------- ---------- ---------- ---------- ---------- ----------
> total      872      0.21       0.17          0       1013          0      13035
>
> Misses in library cache during parse: 1
> Optimizer mode: ALL_ROWS
> Parsing user id: 73
>
> Rows Row Source Operation
> ------- ---------------------------------------------------
>   13035  SORT GROUP BY NOSORT (cr=1013 pr=0 pw=0 time=183040 us)
>   39102   INDEX FULL SCAN TBL1_TMP_IOT_PK (cr=1013 pr=0 pw=0 time=117988 us)(object id 54041)
>
> (7) Overview of Pivot Query
> ===========================
>
> The key to pivot queries is the combination of the MAX and DECODE
> functions with the GROUP BY clause.
>
> To see how the process of pivoting works, let's look at the intermediate
> step (without the MAX function and GROUP BY clause):
> SELECT
> datadate,
> DECODE(
> dataname,
> 'DATANAME1', datavalue
> ) AS dataname1,
> DECODE(
> dataname,
> 'DATANAME2', datavalue
> ) AS dataname2,
> DECODE(
> dataname,
> 'DATANAME3', datavalue
> ) AS dataname3
> FROM
> tbl1_tmp
> WHERE
> rownum <= 5
> /
>
> DATADATE   DATANAME1  DATANAME2  DATANAME3
> ---------  ---------- ---------- ----------
> 11-APR-04                      1
> 11-APR-04                                 2
> 12-APR-04           3
> 12-APR-04                      4
> 12-APR-04                                 5
>
> You will note that rows #3, #4, and #5 have placed the value of the
> DATAVALUE column into different columns depending upon the value of the
> DATANAME column. The blank entries are really NULLs.
>
> To condense these groups of three (3) rows into a single row requires the
> use of the MAX function in conjunction with the GROUP BY clause.
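>
An illustration of that condensing step (my addition, applying the pivot query from section (5) to just the five sample rows above; MAX ignores the NULLs that DECODE produced, so each group of rows for one DATADATE collapses into a single output row):

```sql
-- Pivot the five sample rows: one output row per date, with each
-- DATAVALUE landing in the column named by its DATANAME.
SELECT datadate,
       MAX( DECODE( dataname, 'DATANAME1', datavalue ) ) AS dataname1,
       MAX( DECODE( dataname, 'DATANAME2', datavalue ) ) AS dataname2,
       MAX( DECODE( dataname, 'DATANAME3', datavalue ) ) AS dataname3
FROM   tbl1_tmp
WHERE  rownum <= 5
GROUP BY datadate;
```

Given the intermediate output shown above, this should return two rows: 11-APR-04 with values 1 and 2 in DATANAME2 and DATANAME3 (DATANAME1 NULL), and 12-APR-04 with values 3, 4, and 5 in their respective columns.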
>
>
> Douglas Hawthorne
>
>
Received on Wed Apr 14 2004 - 18:28:59 CDT
