Help proving a database is slow (10.2 / Solaris)
Help proving a database is slow [message #458786] Tue, 01 June 2010 19:13
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I'm experiencing some very strange behaviour in my database, and it's happened at the worst time possible: during deployment.

Prior to now, it has been working admirably, but something has changed and no one is admitting to having changed anything.

I have a very simple test case that demonstrates the problem, but I do not have a benchmark to compare against. If someone asks "why do you think it should run faster?" I don't have a good answer. I want some proof that lower-spec boxes around the world are doing this orders of magnitude faster than our box.

What I would like is for some of you guys to run a query on your very large tables (>10M rows) and tell me the performance.

Here's the query:
select *
from (
    select /*+first_rows*/ *
    from <really big table> sample (1) a
    join <same really big table> b
    on a.primary_key = b.primary_key
    and rownum <= 10000
    )
where rownum > 1;


Please time it the FIRST TIME, because I want to rule out caching benefits.
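If your big table is already cached from earlier runs, flushing the buffer cache first should give an equivalent cold-cache timing. A 10g sketch (test/dev systems only, needs the ALTER SYSTEM privilege):

alter system flush buffer_cache;   -- empty the cache so the run does physical reads
set timing on
set autotrace traceonly            -- executes the query, shows plan and stats, suppresses row output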

The objective here is to pick up 10,000 rows from all over the table using the SAMPLE mechanism, and then look up those rows in the same table using the Primary Key index. This should be especially fast because the table rows that are accessed via the index are already in the buffer cache from the SAMPLE access.

The ROWNUM > 1 will ensure that rows are not returned across the network, thereby isolating the test to the DB server alone.

I am getting consistent results across my database: with tables having row widths of 200-300 bytes, this query takes around 130s. This equates to <100 rows/sec. Prior to this problem we were getting Nested Loops joins of 6-10 tables returning rows at around 500-600 rows/sec. Now here's something really weird: a job that was using a Nested Loops plan was running at 50 rows/sec for about 2 hours yesterday, when it suddenly accelerated to 500+ rows/sec for the remainder of the job (another 30 mins or so). That improvement was temporary, and now the DB is back to its old tricks.

Here's what we have already checked:
- Contention: This is the only query running on the database. I have checked this using OEM looking at active sessions, and by performing a "ps -ef" on Unix.
- Buffer Cache too small: Currently 928MB, but we are going to increase it this morning - just need to schedule an outage.
- Shared Pool too small: Currently 240MB, but will increase.

I don't think these values are small enough to warrant the bad performance, the DBA has not changed them recently, and most importantly, they did not change in the middle of the job that suddenly accelerated. (A quick way to confirm the current sizes is sketched after the checklist below.)

- Other databases on the same box: There is 1 other database. It is idle.
- Non-database activity on the box: "ps -ef" and "top" shows no other activity. We have also bounced the box.
- Heavy disk usage: "sar -d 5 1" does show some activity on the disks where our data is stored that I cannot explain, but it is very modest (<1%) compared to when a query is running. The avwait statistic on sar is showing 0.0 both when the box is idle and when my query is running.
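For anyone checking the same thing at home, the current cache and pool sizes can be confirmed with something like this on 10g:

show sga

select component, round(current_size/1024/1024) mb
from v$sga_dynamic_components
where component in ('DEFAULT buffer cache', 'shared pool');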

The disks that this database resides on are physical devices that may be shared with other computers, and I cannot see activity generated by the other computers. Note that Full Table Scan performance is great; it is only indexed lookups that are bad.

If anyone can help by running the above query on their large table, I would greatly appreciate knowing:
- The time taken
- The AVG_ROW_LEN of the table (from USER_TABLES)
- # rows in the table
- Any specs of your environment (buffer cache, shared pool, number of CPUs, RAM).
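Roughly the queries that would gather most of this (substitute your own table name):

select avg_row_len, num_rows from user_tables where table_name = 'YOUR_BIG_TABLE';
select count(*) from your_big_table;
show sga
select stat_name, value from v$osstat
where stat_name in ('NUM_CPUS', 'PHYSICAL_MEMORY_BYTES');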


Ross Leishman
Re: Help proving a database is slow [message #458792 is a reply to message #458786] Tue, 01 June 2010 20:20
ramoradba
Messages: 2456
Registered: January 2009
Location: AndhraPradesh,Hyderabad,I...
Senior Member
Quote:
If anyone can help by running the above query on their large table, I would greatly appreciate knowing:
- The time taken
- The AVG_ROW_LEN of the table (from USER_TABLES)
- # rows in the table
- Any specs of your environment (buffer cache, shared pool, number of CPUs, RAM).



From my testing server....
06:37:00 ind> select AVG_ROW_LEN    from user_tables
06:37:01   2  where table_name='MEMBER_PROFILE'
06:37:01   3  /

AVG_ROW_LEN
-----------
        753

1 row selected.

Elapsed: 00:00:00.01
06:37:01 ind> select count(*) from member_profile;

  COUNT(*)
----------
   1306293

1 row selected.

Elapsed: 00:00:00.42

06:39:40 ind> set autotrace traceonly
06:41:52 ind> select *
06:41:55   2  from (
06:41:55   3      select /*+first_rows*/ *
06:41:55   4      from member_profile sample (1) a
06:41:55   5      join member_profile b
06:41:55   6      on a.org_id = b.org_id and a.member_id=b.member_id
06:41:55   7      and rownum <= 10000
06:41:55   8      )
06:41:55   9  where rownum > 1;

no rows selected

Elapsed: 00:00:52.87

Execution Plan
----------------------------------------------------------
Plan hash value: 651687208

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name              | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                   | 10000 |  6392M| 54571   (1)| 00:10:55 |       |       |
|   1 |  COUNT                                  |                   |       |       |            |          |       |       |
|*  2 |   FILTER                                |                   |       |       |            |          |       |       |
|   3 |    VIEW                                 |                   | 10000 |  6392M| 54571   (1)| 00:10:55 |       |       |
|*  4 |     COUNT STOPKEY                       |                   |       |       |            |          |       |       |
|   5 |      NESTED LOOPS                       |                   | 12954 |    18M| 54571   (1)| 00:10:55 |       |       |
|   6 |       PARTITION HASH ALL                |                   | 13080 |  9618K| 28383   (1)| 00:05:41 |     1 |    16 |
|   7 |        TABLE ACCESS SAMPLE              | MEMBER_PROFILE    | 13080 |  9618K| 28383   (1)| 00:05:41 |     1 |    16 |
|   8 |       TABLE ACCESS BY GLOBAL INDEX ROWID| MEMBER_PROFILE    |     1 |   753 |     2   (0)| 00:00:01 | ROWID | ROWID |
|*  9 |        INDEX UNIQUE SCAN                | PK_MEMBER_PROFILE |     1 |       |     1   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWNUM>1)
   4 - filter(ROWNUM<=10000)
   9 - access("A"."ORG_ID"="B"."ORG_ID" AND "A"."MEMBER_ID"="B"."MEMBER_ID")


Statistics
----------------------------------------------------------
       7487  recursive calls
          0  db block gets
      72724  consistent gets
      88868  physical reads
        536  redo size
      37519  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
        136  sorts (memory)
          0  sorts (disk)
          0  rows processed

06:42:50 ind> 


sriram
Re: Help proving a database is slow [message #458798 is a reply to message #458792] Tue, 01 June 2010 22:36
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Thanks for that. 187 rows/sec and a much fatter row length. That's not great performance, but a lot better than mine.
Re: Help proving a database is slow [message #458854 is a reply to message #458798] Wed, 02 June 2010 03:38
John Watson
Messages: 8931
Registered: January 2010
Location: Global Village
Senior Member
Here you go, hope it is useful. btw, low spec demo system - nothing else happening.

  1  select *
  2         from (
  3             select /*+first_rows*/ *
  4             from sc_cp_local sample (1) a
  5             join sc_cp_local b
  6             on a.prk = b.prk and a.prl=b.prl
  7             and rownum <= 10000
  8             )
  9*     where rownum > 1
EVD> /

no rows selected

Elapsed: 00:01:04.21

Execution Plan
----------------------------------------------------------
Plan hash value: 882620215

------------------------------------------------------------------------------------------------
| Id  | Operation                | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |             | 10000 |    38M|       |   419K  (2)| 01:23:54 |
|   1 |  COUNT                   |             |       |       |       |            |          |
|*  2 |   FILTER                 |             |       |       |       |            |          |
|   3 |    VIEW                  |             | 10000 |    38M|       |   419K  (2)| 01:23:54 |
|*  4 |     COUNT STOPKEY        |             |       |       |       |            |          |
|*  5 |      HASH JOIN           |             |   596K|    55M|    35M|   419K  (2)| 01:23:54 |
|   6 |       TABLE ACCESS SAMPLE| SC_CP_LOCAL |   591K|    28M|       |   125K  (2)| 00:25:04 |
|   7 |       TABLE ACCESS FULL  | SC_CP_LOCAL |    59M|  2652M|       |   125K  (2)| 00:25:09 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWNUM>1)
   4 - filter(ROWNUM<=10000)
   5 - access("A"."PRK"="B"."PRK" AND "A"."PRL"="B"."PRL")


Statistics
----------------------------------------------------------
        404  recursive calls
          0  db block gets
     456903  consistent gets
     464165  physical reads
          0  redo size
        550  bytes sent via SQL*Net to client
        405  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed

EVD> select avg_row_len from user_tables where table_name='SC_CP_LOCAL';

AVG_ROW_LEN
-----------
         51

EVD> select count(*) from SC_CP_LOCAL;

  COUNT(*)
----------
  58633079

Elapsed: 00:01:21.88
EVD> sho sga

Total System Global Area 1071333376 bytes
Fixed Size                  1318172 bytes
Variable Size             444596964 bytes
Database Buffers          612368384 bytes
Redo Buffers               13049856 bytes
EVD> select stat_name,value from v$osstat;

STAT_NAME                                                             VALUE
---------------------------------------------------------------- ----------
NUM_CPUS                                                                  2
IDLE_TIME                                                          17787711
BUSY_TIME                                                            386805
USER_TIME                                                            249457
SYS_TIME                                                             136035
IOWAIT_TIME                                                          462129
NICE_TIME                                                               551
RSRC_MGR_CPU_WAIT_TIME                                                    0
LOAD                                                             .559570313
PHYSICAL_MEMORY_BYTES                                            2121875456
TCP_SEND_SIZE_MIN                                                      4096

STAT_NAME                                                             VALUE
---------------------------------------------------------------- ----------
TCP_SEND_SIZE_DEFAULT                                                 16384
TCP_SEND_SIZE_MAX                                                    131072
TCP_RECEIVE_SIZE_MIN                                                   4096
TCP_RECEIVE_SIZE_DEFAULT                                              87380
TCP_RECEIVE_SIZE_MAX                                                 174760
GLOBAL_SEND_SIZE_MAX                                                 131071
GLOBAL_RECEIVE_SIZE_MAX                                              131071

18 rows selected.

Elapsed: 00:00:00.23
EVD>

Re: Help proving a database is slow [message #458867 is a reply to message #458854] Wed, 02 June 2010 04:57
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Huh! For some reason it ignored the FIRST_ROWS hint and did a HASH JOIN.

You definitely have a UNIQUE index on (PRK, PRL), right? Don't suppose you could run it again with /*+ ORDERED USE_NL(b) INDEX(b)*/? But man, full scan of 59M rows in a minute on your "low spec" system. The tool who put together our 8CPU SPARC T2000 with 32GB RAM should be ashamed, cos we've got some real problems. Our full scans don't perform ANYTHING like that, so I'm thinking the problem is not local to NESTED LOOPS joins.
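For reference, the re-run I'm asking for would look something like this (John's table and column names; untested on my side):

select *
from (
    select /*+ ordered use_nl(b) index(b) */ *
    from sc_cp_local sample (1) a
    join sc_cp_local b
    on a.prk = b.prk and a.prl = b.prl
    and rownum <= 10000
    )
where rownum > 1;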

Ross Leishman
Re: Help proving a database is slow [message #458871 is a reply to message #458867] Wed, 02 June 2010 05:08
John Watson
Messages: 8931
Registered: January 2010
Location: Global Village
Senior Member
Quote:
The tool who put together...
Typo, or what? Love it!
No, it ain't a unique index - the primary key is on a pre-built non-unique index (I usually do that). Don't think I can test again today.

Re: Help proving a database is slow [message #458976 is a reply to message #458786] Wed, 02 June 2010 15:13
moshea
Messages: 51
Registered: February 2008
Location: Dublin, Ireland
Member
40+ seconds on a table with 9.3M fat rows
12 seconds on a table with 125M+ short rows.

Hmm, quite the difference.

It's our test box, but it's actually the same class of HP hardware we use in production.

-Michael

20:31:21 SQL> set lin 130

20:31:34 SQL> select avg_row_len from all_tables
20:31:42   2  where table_name = 'TRANSMAIN';

AVG_ROW_LEN
-----------
        590

20:31:44 SQL> set autotrace traceonly
20:32:21 SQL>
20:32:21 SQL> set timing on
20:32:29 SQL>
20:32:30 SQL> select *
20:32:30   2  from (
20:32:30   3      select /*+first_rows*/ *
20:32:30   4      from TRANSMAIN sample (1) a
20:32:30   5      join TRANSMAIN b
20:32:30   6      on a.transik = b.transik
20:32:30   7      and rownum <= 10000
20:32:30   8      )
20:32:30   9  where rownum > 1;

no rows selected

Elapsed: 00:00:41.61

Execution Plan
----------------------------------------------------------
Plan hash value: 4091699509

------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |             | 10000 |    75M|   357K  (1)| 01:11:30 |
|   1 |  COUNT                           |             |       |       |            |          |
|*  2 |   FILTER                         |             |       |       |            |          |
|   3 |    VIEW                          |             | 10000 |    75M|   357K  (1)| 01:11:30 |
|*  4 |     COUNT STOPKEY                |             |       |       |            |          |
|   5 |      NESTED LOOPS                |             | 93094 |   104M|   357K  (1)| 01:11:30 |
|   6 |       TABLE ACCESS SAMPLE        | TRANSMAIN   | 93094 |    52M|   171K  (1)| 00:34:14 |
|   7 |       TABLE ACCESS BY INDEX ROWID| TRANSMAIN   |     1 |   590 |     2   (0)| 00:00:01 |
|*  8 |        INDEX UNIQUE SCAN         | P_TRANSMAIN |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWNUM>1)
   4 - filter(ROWNUM<=10000)
   8 - access("A"."TRANSIK"="B"."TRANSIK")


Statistics
----------------------------------------------------------
         77  recursive calls
          0  db block gets
      43629  consistent gets
      64260  physical reads
          0  redo size
      20774  bytes sent via SQL*Net to client
        227  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          0  rows processed

20:33:13 SQL>
20:34:15 SQL>
20:38:16 SQL>

20:39:22 SQL> set autotrace off
20:39:35 SQL> select stat_name, value from v$osstat;

STAT_NAME                                                             VALUE
---------------------------------------------------------------- ----------
NUM_CPUS                                                                 10
IDLE_TIME                                                         310772517
BUSY_TIME                                                          15574453
USER_TIME                                                          11029470
SYS_TIME                                                            4544983
IOWAIT_TIME                                                        13043895
AVG_IDLE_TIME                                                      31067476
AVG_BUSY_TIME                                                       1547722
AVG_USER_TIME                                                       1093178
AVG_SYS_TIME                                                         444717
AVG_IOWAIT_TIME                                                     1294669
OS_CPU_WAIT_TIME                                                 5.2162E+12
RSRC_MGR_CPU_WAIT_TIME                                                    0
LOAD                                                               .0078125
NUM_CPU_SOCKETS                                                          10
PHYSICAL_MEMORY_BYTES                                            5.0413E+10
VM_IN_BYTES                                                      1393090560
VM_OUT_BYTES                                                     1601556480

18 rows selected.

Elapsed: 00:00:00.01
20:39:36 SQL> sho sga

Total System Global Area 3992977408 bytes
Fixed Size                  2159400 bytes
Variable Size            3929836760 bytes
Database Buffers           50331648 bytes
Redo Buffers               10649600 bytes
20:42:27 SQL>
20:42:30 SQL> select count(*) from transmain;

  COUNT(*)
----------
   9331892

Elapsed: 00:00:06.11

20:49:40 SQL>
20:53:54 SQL>
20:53:54 SQL> set autotrace traceonly
20:54:06 SQL>
20:54:07 SQL> set timing on
20:54:09 SQL>
20:54:10 SQL> select *
20:54:11   2  from (
20:54:11   3      select /*+first_rows*/ *
20:54:11   4      from calcfiguresub sample (1) a
20:54:11   5      join calcfiguresub b
20:54:11   6      on a.calcfigik = b.calcfigik and a.calcfigsubdatik = b.calcfigsubdatik
20:54:11   7      and rownum <= 10000
20:54:11   8      )
20:54:11   9  where rownum > 1;

no rows selected

Elapsed: 00:00:12.48

Execution Plan
----------------------------------------------------------
Plan hash value: 143895197

----------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                 | 10000 |   761K|  3820K  (1)| 12:44:02 |
|   1 |  COUNT                           |                 |       |       |            |          |
|*  2 |   FILTER                         |                 |       |       |            |          |
|   3 |    VIEW                          |                 | 10000 |   761K|  3820K  (1)| 12:44:02 |
|*  4 |     COUNT STOPKEY                |                 |       |       |            |          |
|   5 |      NESTED LOOPS                |                 |  1239K|    28M|  3820K  (1)| 12:44:02 |
|   6 |       TABLE ACCESS SAMPLE        | CALCFIGURESUB   |  1239K|    14M| 98853   (2)| 00:19:47 |
|   7 |       TABLE ACCESS BY INDEX ROWID| CALCFIGURESUB   |     1 |    12 |     3   (0)| 00:00:01 |
|*  8 |        INDEX UNIQUE SCAN         | P_CALCFIGURESUB |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWNUM>1)
   4 - filter(ROWNUM<=10000)
   8 - access("A"."CALCFIGIK"="B"."CALCFIGIK" AND
              "A"."CALCFIGSUBDATIK"="B"."CALCFIGSUBDATIK")


Statistics
----------------------------------------------------------
        290  recursive calls
          0  db block gets
      43546  consistent gets
       6703  physical reads
          0  redo size
        394  bytes sent via SQL*Net to client
        227  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
          0  rows processed

20:54:23 SQL> set autotrace off
20:56:13 SQL> select count(*) from calcfiguresub;

  COUNT(*)
----------
 125261366

Elapsed: 00:02:44.41
20:59:11 SQL> select avg_row_len from all_tables
20:59:19   2  where table_name = 'CALCFIGURESUB';

AVG_ROW_LEN
-----------
         12

Elapsed: 00:00:00.06
20:59:27 SQL>
Re: Help proving a database is slow [message #458997 is a reply to message #458976] Wed, 02 June 2010 19:38
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Hey, thanks for the support you guys. I'm overwhelmed. This is terrific stuff.

For those interested, one of our Unix guys (as opposed to "their" Unix guys, because the hardware and DBA support is outsourced) has been doing some raw tests with Unix dd to benchmark the disks, and they are writing and reading 10 times slower than our production environment.

Because these are NFS disks in an SVT environment, we suspect that we are sharing a network device with another application in SVT, which is flogging the network that connects the disks to our Oracle server. Just a theory... very difficult to prove with our levels of access, but your benchmarks make me much more confident of our position.

Ross Leishman
Re: Help proving a database is slow [message #459005 is a reply to message #458786] Wed, 02 June 2010 22:40
trantuananh24hg
Messages: 744
Registered: January 2007
Location: Ha Noi, Viet Nam
Senior Member
OK, I'll post the test SQL as you suggest.

logvnp@VNP> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Solaris: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

5 rows selected.

logvnp@VNP> select count(1) from big_table;

  COUNT(1)
----------
   1000000

logvnp@VNP> select sid from v$session
  2  where username='LOGVNP'
  3  and program like 'sq%';

       sid
----------
       383

logvnp@VNP> set autotrace traceonly explain
logvnp@VNP> select *
  2  from (
  3  select /*+first_rows*/*
  4  from big_table sample(1) a
  5  join big_table b
  6  on a.id=b.id
  7  and a.object_name=b.object_name
  8  and rownum<=10000)
  9  where rownum<1
 10  /

Execution Plan
----------------------------------------------------------
Plan hash value: 1902106588

-------------------------------------------------------------------------------------
| Id  | Operation               | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |           |     1 |   282 |  3210   (2)| 00:00:39 |
|*  1 |  COUNT STOPKEY          |           |       |       |            |          |
|   2 |   VIEW                  |           | 10000 |  2753K|  3210   (2)| 00:00:39 |
|*  3 |    COUNT STOPKEY        |           |       |       |            |          |
|*  4 |     HASH JOIN           |           | 10000 |  2841K|  3210   (2)| 00:00:39 |
|   5 |      TABLE ACCESS SAMPLE| BIG_TABLE | 10000 |   947K|  3207   (2)| 00:00:39 |
|   6 |      TABLE ACCESS FULL  | BIG_TABLE |   200 | 19400 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<1)
   3 - filter(ROWNUM<=10000)
   4 - access("A"."ID"="B"."ID" AND "A"."OBJECT_NAME"="B"."OBJECT_NAME")

logvnp@VNP> set autotrace off
logvnp@VNP> select stat_name, value
  2  from v$osstat;

STAT_NAME                                                             VALUE
---------------------------------------------------------------- ----------
NUM_CPUS                                                                  4
IDLE_TIME                                                          50166369
BUSY_TIME                                                           8755311
USER_TIME                                                           7659225
SYS_TIME                                                            1096086
IOWAIT_TIME                                                               0
AVG_IDLE_TIME                                                      12537823
AVG_BUSY_TIME                                                       2185126
AVG_USER_TIME                                                       1911057
AVG_SYS_TIME                                                         270313
AVG_IOWAIT_TIME                                                           0
OS_CPU_WAIT_TIME                                                     541400
RSRC_MGR_CPU_WAIT_TIME                                                    0
LOAD                                                              .87109375
PHYSICAL_MEMORY_BYTES                                            1.7093E+10
VM_IN_BYTES                                                      3877429248
VM_OUT_BYTES                                                     2345951232

17 rows selected.


However, why do you think that joining a big table to itself will help you explain the difference between systems?

Let me take another statement and measure it.

logvnp@VNP> set feedback off
logvnp@VNP> select * from (
  2  select *  from
  3  big_table a
  4  join big_table b
  5  on a.id=b.id
  6  and a.object_name=b.object_name)
  7  /
  
logvnp@VNP> @snap ash 5 1 383
Sampling with interval 5 seconds, 1 times...

-- Session Snapper


-----------------------------------------------------------------------
Active% | SQL_ID          | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
    76% | 62jynh05qw7zw   | direct path write temp    | User I/O
    24% | 62jynh05qw7zw   | ON CPU                    | ON CPU

--  End of ASH snap 1, end=2010-06-03 10:42:00, seconds=5, samples_taken=41


PL/SQL procedure successfully completed.

logvnp@VNP> col sql_text format a30
logvnp@VNP> select sql_text, cpu_time/1000000 cpuon, parse_calls
  2  from v$sql
  3  where sql_id='62jynh05qw7zw';

SQL_TEXT                            CPUON PARSE_CALLS
------------------------------ ---------- -----------
select * from ( select *  from   4.106833           1
 big_table a join big_table b
on a.id=b.id and a.object_name
=b.object_name)

logvnp@VNP> set autotrace traceonly explain
logvnp@VNP> select * from (
  2  select * from
  3  big_table a
  4  join big_table b
  5  on a.id=b.id
  6  and a.object_name=b.object_name)
  7  /

Execution Plan
----------------------------------------------------------
Plan hash value: 1690342657

---------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |  1000K|   185M|       | 16904   (2)| 00:03:23 |
|*  1 |  HASH JOIN         |           |  1000K|   185M|   103M| 16904   (2)| 00:03:23 |
|   2 |   TABLE ACCESS FULL| BIG_TABLE |  1000K|    92M|       |  3258   (3)| 00:00:40 |
|   3 |   TABLE ACCESS FULL| BIG_TABLE |  1000K|    92M|       |  3258   (3)| 00:00:40 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A"."ID"="B"."ID" AND "A"."OBJECT_NAME"="B"."OBJECT_NAME")



The only real difference was the I/O, I think, wasn't it?
Re: Help proving a database is slow [message #459007 is a reply to message #459005] Wed, 02 June 2010 23:00
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Thanks trantuananh24hg. 1M rows is not really a big enough table for my preferences in this test. I also think maybe you are missing a PK index, because your SQL performed a HASH JOIN, or maybe your database just ignored the FIRST_ROWS hint like John Watson's.

The reason I constructed the query the way I did is because I wanted to demonstrate the cost of accessing single blocks from disparate locations on disk, without the overhead of returning data across the network. Your second example uses FULL TABLE SCANS, which will perform multi-block reads of contiguous blocks on disk. Also, even though you specified EXPLAIN ONLY, SQL*Plus still pulls the data back across the network; it just does not display it.

Thanks for your input.

Ross Leishman
Re: Help proving a database is slow [message #459195 is a reply to message #459007] Thu, 03 June 2010 17:23
coleing
Messages: 213
Registered: February 2008
Senior Member
I can't do any tests for you as I'm not at work at the moment, but it sounds like sequential reads are fine but scattered reads are not?

Are you sure your indexes have an ok clustering factor?
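A quick way to check (MY_BIG_TABLE is a placeholder; a CLUSTERING_FACTOR close to BLOCKS is good, close to NUM_ROWS is poor):

select i.index_name, i.clustering_factor, t.blocks, t.num_rows
from user_indexes i
join user_tables t on t.table_name = i.table_name
where i.table_name = 'MY_BIG_TABLE';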

Re: Help proving a database is slow [message #459196 is a reply to message #459195] Thu, 03 June 2010 17:24
coleing
Messages: 213
Registered: February 2008
Senior Member
Ahh, I just read it was an NFS mount?

That sounds a bit ropey for putting Oracle data files on, doesn't it?

Can you increase the block size? That might help.


Re: Help proving a database is slow [message #459201 is a reply to message #458786] Thu, 03 June 2010 21:41
trantuananh24hg
Messages: 744
Registered: January 2007
Location: Ha Noi, Viet Nam
Senior Member
@rleishman: The BIG_TABLE was created by looping inserts from ALL_OBJECTS, so it only has a check constraint, not a primary key. I could fix that with an EXCEPTIONS table, but then the table could not be grown to 10M rows.

@coleing: I guess the block size is 8K.
Re: Help proving a database is slow [message #459418 is a reply to message #459201] Fri, 04 June 2010 18:42
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Coleing, we are a bit powerless. Pretty much just the application design and programming has been contracted to us.

Regarding clustering factor, I designed the test case above to thwart clustering factors by choosing rows from all over the place. I was looking to benchmark the "worst-case", where each row that you process is in a different block. Even though this is an undesirable application characteristic, I think it should still pump data out at MUCH better than <100 rows/sec.

We might have had a shot at infrastructure redesign if we had caught it earlier, but we are in the middle of deployment now, so it's too late.

The reason we haven't noticed it before now is because it hasn't been a problem. Historic performance hasn't been blindingly fast, but it has been quite viable.

We escalated the issue using the Unix "dd" test case, which showed the disk to be slow without bringing Oracle into the equation. They identified a setting in the NFS that was different to our PROD system - something to do with "Direct IO". We had an outage and they changed the setting. This fixed the Unix "dd" test case (10x faster than it was), but the Oracle test case I posted above is unchanged.

Someone mentioned that there is an Oracle setting related to "Direct IO", and that even if we change it on the NFS, Oracle won't use it. If there is any truth to this then we need to escalate it to the outsourced DBAs.

Ross Leishman
Re: Help proving a database is slow [message #459729 is a reply to message #459418] Tue, 08 June 2010 02:06
hkchital
Messages: 128
Registered: September 2008
Location: Singapore
Senior Member
>Someone mentioned that there is an Oracle setting related to "Direct IO"

That would be "filesystemio_options" which can be "none", "directio", "asynch" or "setall"


If they have enabled Direct I/O at the server level, none of your file I/O is using filesystem buffers. Then again, with NFS this should be enabled anyway.
With Direct I/O enabled, you generally need to increase your buffer cache to make up for the loss of the OS (filesystem) buffers!
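A sketch of how it might be checked and changed - it is a static parameter, so any change needs an spfile update and an instance restart:

show parameter filesystemio_options

alter system set filesystemio_options = setall scope = spfile;
-- restart the instance for the new value to take effect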

I am not very sure whether your SQL test is the right way to "prove" something. You are attempting random, single-block *reads*. How about multiblock reads? How about writes?

Hemant K Chitale


Re: Help proving a database is slow [message #460153 is a reply to message #459729] Wed, 09 June 2010 22:37
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
FILESYSTEMIO_OPTIONS is set to "asynch", and our NFS mounts on Solaris use the "forcedirectio" option. My understanding is that this will bypass the filesystem cache.

Our multi-block reads seem to be OK - no complaints. It is our single block reads that are killing us. For the purpose of testing, I want to choose blocks NOT in the buffer cache, so that I can measure the cost of single block reads from disk.
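For what it's worth, the instance-wide averages for the two read types can be pulled from v$system_event - something along these lines on 10.2:

select event, total_waits,
       round(time_waited_micro / greatest(total_waits, 1) / 1000, 2) avg_ms
from v$system_event
where event in ('db file sequential read', 'db file scattered read');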

As it turns out, the test case I posted above is still quite subject to buffering and clustering factors, so its value is limited. I created another test case that randomises the order of blocks retrieved, but it is not easily transportable to other environments, so I won't bother posting it here.

Thanks for your help. I now have traction with our Database, Network and System administrators to track this problem down. There is a general recognition that something is wrong.

Ross Leishman
Re: Help proving a database is slow [message #462736 is a reply to message #460153] Sat, 26 June 2010 05:13
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
If anyone is interested in the outcome of this....

We got assistance down at the NAS level. One of the network cards in the NAS was configured incorrectly and the result was that it didn't seem to utilise the disk cache.

This didn't really solve our problem though; queries were still slow. We measured round trips to the disk consistently between 0.011 and 0.014 seconds. According to our NAS guys, that is an expected and valid outcome (they actually quoted 0.007 to 0.010 seconds, but our experience was within reasonable variance).

Given that the disk hits were performing as expected, we refocussed on Oracle, figuring that if a round trip to disk is not going to get any better, we need to do fewer round trips. We increased the buffer cache and got a better hit ratio. It improved things a fair bit, but we still had a long build-up in an ETL job while the cache warmed up with the tables being processed.

Finally made a breakthrough today: in each of our SQL queries in the ETL we included a call to a packaged function that returned a setting that affected the range of data to be processed.

The function was not defined as PARALLEL_ENABLE because it used a global package state variable (violating the WNPS purity condition). I changed the package variable over to a SYS_CONTEXT call (which is visible across the parallel query servers), set the function to PARALLEL_ENABLE, and WHOOSHKA! It went like a rocket.
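For anyone hitting the same thing, the shape of the change was roughly this (names invented; the real code was a packaged function, but the idea is the same - move the state into an application context so the function can be PARALLEL_ENABLE):

-- Setup package owns the context and sets the value once per run
create or replace context etl_ctx using etl_setup;

create or replace package etl_setup as
  procedure set_run_date(p_date in varchar2);
end;
/
create or replace package body etl_setup as
  procedure set_run_date(p_date in varchar2) is
  begin
    dbms_session.set_context('ETL_CTX', 'RUN_DATE', p_date);
  end;
end;
/

-- The function no longer reads package state, so it can be parallel enabled,
-- and the parallel query slaves all see the same context value.
create or replace function etl_run_date return varchar2 parallel_enable is
begin
  return sys_context('ETL_CTX', 'RUN_DATE');
end;
/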