Help proving a database is slow (10.2 / Solaris)
Help proving a database is slow [message #458786] Tue, 01 June 2010 19:13
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I'm experiencing some very strange behaviour in my database, and it's happened at the worst time possible: during deployment.

Prior to now, it has been working admirably, but something has changed and no one is admitting to having changed anything.

I have a very simple test case that demonstrates the problem, but I do not have a benchmark to compare against. If someone asks "why do you think it should run faster?" I don't have a good answer. I want some proof that lower-spec boxes around the world are doing this orders of magnitude faster than our box.

What I would like is for some of you guys to run a query on your very large tables (>10M rows) and tell me the performance.

Here's the query:
select *
from (
    select /*+first_rows*/ *
    from <really big table> sample (1) a
    join <same really big table> b
    on a.primary_key = b.primary_key
    and rownum <= 10000
    )
where rownum > 1;


Please time it the FIRST TIME, because I want to rule out caching benefits.
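If your big table is already cached from earlier runs, flushing the buffer cache first should give an equivalent cold-cache timing. A 10g sketch (test/dev systems only, needs the ALTER SYSTEM privilege):

alter system flush buffer_cache;   -- empty the cache so the run does physical reads
set timing on
set autotrace traceonly            -- executes the query, shows plan and stats, suppresses row output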

The objective here is to pick up 10,000 rows from all over the table using the SAMPLE mechanism, and then look up those rows in the same table using the Primary Key index. This should be especially fast because the table rows that are accessed via the index are already in the buffer cache from the SAMPLE access.

The ROWNUM > 1 will ensure that rows are not returned across the network, thereby isolating the test to the DB server alone.

I am getting consistent results across my database: with tables having row widths of 200-300 bytes, this query takes around 130s. This equates to <100 rows/sec. Prior to this problem we were getting Nested Loops joins of 6-10 tables returning rows at around 500-600 rows/sec. Now here's something really weird: a job that was using a Nested Loops plan was running at 50 rows/sec for about 2 hours yesterday, when it suddenly accelerated to 500+ rows/sec for the remainder of the job (another 30 mins or so). That improvement was temporary, and now the DB is back to its old tricks.

Here's what we have already checked:
- Contention: This is the only query running on the database. I have checked this using OEM looking at active sessions, and by performing a "ps -ef" on Unix.
- Buffer Cache too small: Currently 928MB, but we are going to increase it this morning - just need to schedule an outage.
- Shared Pool too small: Currently 240MB, but will increase.

I don't think these values are small enough to warrant the bad performance, the DBA has not changed them recently, and most importantly, they did not change in the middle of the job that suddenly accelerated. (A quick way to confirm the current sizes is sketched after the checklist below.)

- Other databases on the same box: There is 1 other database. It is idle.
- Non-database activity on the box: "ps -ef" and "top" shows no other activity. We have also bounced the box.
- Heavy disk usage: "sar -d 5 1" does show some activity on the disks where our data is stored that I cannot explain, but it is very modest (<1%) compared to when a query is running. The avwait statistic on sar is showing 0.0 both when the box is idle and when my query is running.
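For anyone checking the same thing at home, the current cache and pool sizes can be confirmed with something like this on 10g:

show sga

select component, round(current_size/1024/1024) mb
from v$sga_dynamic_components
where component in ('DEFAULT buffer cache', 'shared pool');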

The disks that this database resides on are physical devices that may be shared with other computers, and I cannot see activity generated by the other computers. Note that Full Table Scan performance is great; it is only indexed lookups that are bad.

If anyone can help by running the above query on their large table, I would greatly appreciate knowing:
- The time taken
- The AVG_ROW_LEN of the table (from USER_TABLES)
- # rows in the table
- Any specs of your environment (buffer cache, shared pool, number of CPUs, RAM).
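Roughly the queries that would gather most of this (substitute your own table name):

select avg_row_len, num_rows from user_tables where table_name = 'YOUR_BIG_TABLE';
select count(*) from your_big_table;
show sga
select stat_name, value from v$osstat
where stat_name in ('NUM_CPUS', 'PHYSICAL_MEMORY_BYTES');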


Ross Leishman
Re: Help proving a database is slow [message #458792 is a reply to message #458786] Tue, 01 June 2010 20:20
ramoradba
Messages: 2456
Registered: January 2009
Location: AndhraPradesh,Hyderabad,I...
Senior Member
Quote:
If anyone can help by running the above query on their large table, I would greatly appreciate knowing:
- The time taken
- The AVG_ROW_LEN of the table (from USER_TABLES)
- # rows in the table
- Any specs of your environment (buffer cache, shared pool, number of CPUs, RAM).



From my testing server....
06:37:00 ind> select AVG_ROW_LEN    from user_tables
06:37:01   2  where table_name='MEMBER_PROFILE'
06:37:01   3  /

AVG_ROW_LEN
-----------
        753

1 row selected.

Elapsed: 00:00:00.01
06:37:01 ind> select count(*) from member_profile;

  COUNT(*)
----------
   1306293

1 row selected.

Elapsed: 00:00:00.42

06:39:40 ind> set autotrace traceonly
06:41:52 ind> select *
06:41:55   2  from (
06:41:55   3      select /*+first_rows*/ *
06:41:55   4      from member_profile sample (1) a
06:41:55   5      join member_profile b
06:41:55   6      on a.org_id = b.org_id and a.member_id=b.member_id
06:41:55   7      and rownum <= 10000
06:41:55   8      )
06:41:55   9  where rownum > 1;

no rows selected

Elapsed: 00:00:52.87

Execution Plan
----------------------------------------------------------
Plan hash value: 651687208

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name              | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                   | 10000 |  6392M| 54571   (1)| 00:10:55 |       |       |
|   1 |  COUNT                                  |                   |       |       |            |          |       |       |
|*  2 |   FILTER                                |                   |       |       |            |          |       |       |
|   3 |    VIEW                                 |                   | 10000 |  6392M| 54571   (1)| 00:10:55 |       |       |
|*  4 |     COUNT STOPKEY                       |                   |       |       |            |          |       |       |
|   5 |      NESTED LOOPS                       |                   | 12954 |    18M| 54571   (1)| 00:10:55 |       |       |
|   6 |       PARTITION HASH ALL                |                   | 13080 |  9618K| 28383   (1)| 00:05:41 |     1 |    16 |
|   7 |        TABLE ACCESS SAMPLE              | MEMBER_PROFILE    | 13080 |  9618K| 28383   (1)| 00:05:41 |     1 |    16 |
|   8 |       TABLE ACCESS BY GLOBAL INDEX ROWID| MEMBER_PROFILE    |     1 |   753 |     2   (0)| 00:00:01 | ROWID | ROWID |
|*  9 |        INDEX UNIQUE SCAN                | PK_MEMBER_PROFILE |     1 |       |     1   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWNUM>1)
   4 - filter(ROWNUM<=10000)
   9 - access("A"."ORG_ID"="B"."ORG_ID" AND "A"."MEMBER_ID"="B"."MEMBER_ID")


Statistics
----------------------------------------------------------
       7487  recursive calls
          0  db block gets
      72724  consistent gets
      88868  physical reads
        536  redo size
      37519  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
        136  sorts (memory)
          0  sorts (disk)
          0  rows processed

06:42:50 ind> 


sriram
Re: Help proving a database is slow [message #458798 is a reply to message #458792] Tue, 01 June 2010 22:36
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Thanks for that. 187 rows/sec and a much fatter row length. That's not great performance, but a lot better than mine.
Re: Help proving a database is slow [message #458854 is a reply to message #458798] Wed, 02 June 2010 03:38
John Watson
Messages: 8931
Registered: January 2010
Location: Global Village
Senior Member
Here you go, hope it is useful. btw, low spec demo system - nothing else happening.

  1  select *
  2         from (
  3             select /*+first_rows*/ *
  4             from sc_cp_local sample (1) a
  5             join sc_cp_local b
  6             on a.prk = b.prk and a.prl=b.prl
  7             and rownum <= 10000
  8             )
  9*     where rownum > 1
EVD> /

no rows selected

Elapsed: 00:01:04.21

Execution Plan
----------------------------------------------------------
Plan hash value: 882620215

------------------------------------------------------------------------------------------------
| Id  | Operation                | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |             | 10000 |    38M|       |   419K  (2)| 01:23:54 |
|   1 |  COUNT                   |             |       |       |       |            |          |
|*  2 |   FILTER                 |             |       |       |       |            |          |
|   3 |    VIEW                  |             | 10000 |    38M|       |   419K  (2)| 01:23:54 |
|*  4 |     COUNT STOPKEY        |             |       |       |       |            |          |
|*  5 |      HASH JOIN           |             |   596K|    55M|    35M|   419K  (2)| 01:23:54 |
|   6 |       TABLE ACCESS SAMPLE| SC_CP_LOCAL |   591K|    28M|       |   125K  (2)| 00:25:04 |
|   7 |       TABLE ACCESS FULL  | SC_CP_LOCAL |    59M|  2652M|       |   125K  (2)| 00:25:09 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWNUM>1)
   4 - filter(ROWNUM<=10000)
   5 - access("A"."PRK"="B"."PRK" AND "A"."PRL"="B"."PRL")


Statistics
----------------------------------------------------------
        404  recursive calls
          0  db block gets
     456903  consistent gets
     464165  physical reads
          0  redo size
        550  bytes sent via SQL*Net to client
        405  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed

EVD> select avg_row_len from user_tables where table_name='SC_CP_LOCAL';

AVG_ROW_LEN
-----------
         51

EVD> select count(*) from SC_CP_LOCAL;

  COUNT(*)
----------
  58633079

Elapsed: 00:01:21.88
EVD> sho sga

Total System Global Area 1071333376 bytes
Fixed Size                  1318172 bytes
Variable Size             444596964 bytes
Database Buffers          612368384 bytes
Redo Buffers               13049856 bytes
EVD> select stat_name,value from v$osstat;

STAT_NAME                                                             VALUE
---------------------------------------------------------------- ----------
NUM_CPUS                                                                  2
IDLE_TIME                                                          17787711
BUSY_TIME                                                            386805
USER_TIME                                                            249457
SYS_TIME                                                             136035
IOWAIT_TIME                                                          462129
NICE_TIME                                                               551
RSRC_MGR_CPU_WAIT_TIME                                                    0
LOAD                                                             .559570313
PHYSICAL_MEMORY_BYTES                                            2121875456
TCP_SEND_SIZE_MIN                                                      4096

STAT_NAME                                                             VALUE
---------------------------------------------------------------- ----------
TCP_SEND_SIZE_DEFAULT                                                 16384
TCP_SEND_SIZE_MAX                                                    131072
TCP_RECEIVE_SIZE_MIN                                                   4096
TCP_RECEIVE_SIZE_DEFAULT                                              87380
TCP_RECEIVE_SIZE_MAX                                                 174760
GLOBAL_SEND_SIZE_MAX                                                 131071
GLOBAL_RECEIVE_SIZE_MAX                                              131071

18 rows selected.

Elapsed: 00:00:00.23
EVD>

Re: Help proving a database is slow [message #458867 is a reply to message #458854] Wed, 02 June 2010 04:57
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Huh! For some reason it ignored the FIRST_ROWS hint and did a HASH JOIN.

You definitely have a UNIQUE index on (PRK, PRL), right? Don't suppose you could run it again with /*+ ORDERED USE_NL(b) INDEX(b)*/? But man, full scan of 59M rows in a minute on your "low spec" system. The tool who put together our 8CPU SPARC T2000 with 32GB RAM should be ashamed, cos we've got some real problems. Our full scans don't perform ANYTHING like that, so I'm thinking the problem is not local to NESTED LOOPS joins.
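For reference, the re-run I'm asking for would look something like this (John's table and column names; untested on my side):

select *
from (
    select /*+ ordered use_nl(b) index(b) */ *
    from sc_cp_local sample (1) a
    join sc_cp_local b
    on a.prk = b.prk and a.prl = b.prl
    and rownum <= 10000
    )
where rownum > 1;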

Ross Leishman
Re: Help proving a database is slow [message #458871 is a reply to message #458867] Wed, 02 June 2010 05:08
John Watson
Messages: 8931
Registered: January 2010
Location: Global Village
Senior Member
Quote:
The tool who put together...
Typo, or what? Love it!
No, it ain't a unique index - the primary key is on a pre-built non-unique index (I usually do that). Don't think I can test again today.

Re: Help proving a database is slow [message #458976 is a reply to message #458786] Wed, 02 June 2010 15:13
moshea
Messages: 51
Registered: February 2008
Location: Dublin, Ireland
Member
40+ seconds on a table with 9.3M fat rows
12 seconds on a table with 125M+ short rows.

Hmm, quite the difference.

It's our test box, but it's actually the same class of HP hardware we use in production.

-Michael

20:31:21 SQL> set lin 130

20:31:34 SQL> select avg_row_len from all_tables
20:31:42   2  where table_name = 'TRANSMAIN';

AVG_ROW_LEN
-----------
        590

20:31:44 SQL> set autotrace traceonly
20:32:21 SQL>
20:32:21 SQL> set timing on
20:32:29 SQL>
20:32:30 SQL> select *
20:32:30   2  from (
20:32:30   3      select /*+first_rows*/ *
20:32:30   4      from TRANSMAIN sample (1) a
20:32:30   5      join TRANSMAIN b
20:32:30   6      on a.transik = b.transik
20:32:30   7      and rownum <= 10000
20:32:30   8      )
20:32:30   9  where rownum > 1;

no rows selected

Elapsed: 00:00:41.61

Execution Plan
----------------------------------------------------------
Plan hash value: 4091699509

------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |             | 10000 |    75M|   357K  (1)| 01:11:30 |
|   1 |  COUNT                           |             |       |       |            |          |
|*  2 |   FILTER                         |             |       |       |            |          |
|   3 |    VIEW                          |             | 10000 |    75M|   357K  (1)| 01:11:30 |
|*  4 |     COUNT STOPKEY                |             |       |       |            |          |
|   5 |      NESTED LOOPS                |             | 93094 |   104M|   357K  (1)| 01:11:30 |
|   6 |       TABLE ACCESS SAMPLE        | TRANSMAIN   | 93094 |    52M|   171K  (1)| 00:34:14 |
|   7 |       TABLE ACCESS BY INDEX ROWID| TRANSMAIN   |     1 |   590 |     2   (0)| 00:00:01 |
|*  8 |        INDEX UNIQUE SCAN         | P_TRANSMAIN |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWNUM>1)
   4 - filter(ROWNUM<=10000)
   8 - access("A"."TRANSIK"="B"."TRANSIK")


Statistics
----------------------------------------------------------
         77  recursive calls
          0  db block gets
      43629  consistent gets
      64260  physical reads
          0  redo size
      20774  bytes sent via SQL*Net to client
        227  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          0  rows processed

20:33:13 SQL>
20:34:15 SQL>
20:38:16 SQL>

20:39:22 SQL> set autotrace off
20:39:35 SQL> select stat_name, value from v$osstat;

STAT_NAME                                                             VALUE
---------------------------------------------------------------- ----------
NUM_CPUS                                                                 10
IDLE_TIME                                                         310772517
BUSY_TIME                                                          15574453
USER_TIME                                                          11029470
SYS_TIME                                                            4544983
IOWAIT_TIME                                                        13043895
AVG_IDLE_TIME                                                      31067476
AVG_BUSY_TIME                                                       1547722
AVG_USER_TIME                                                       1093178
AVG_SYS_TIME                                                         444717
AVG_IOWAIT_TIME                                                     1294669
OS_CPU_WAIT_TIME                                                 5.2162E+12
RSRC_MGR_CPU_WAIT_TIME                                                    0
LOAD                                                               .0078125
NUM_CPU_SOCKETS                                                          10
PHYSICAL_MEMORY_BYTES                                            5.0413E+10
VM_IN_BYTES                                                      1393090560
VM_OUT_BYTES                                                     1601556480

18 rows selected.

Elapsed: 00:00:00.01
20:39:36 SQL> sho sga

Total System Global Area 3992977408 bytes
Fixed Size                  2159400 bytes
Variable Size            3929836760 bytes
Database Buffers           50331648 bytes
Redo Buffers               10649600 bytes
20:42:27 SQL>
20:42:30 SQL> select count(*) from transmain;

  COUNT(*)
----------
   9331892

Elapsed: 00:00:06.11

20:49:40 SQL>
20:53:54 SQL>
20:53:54 SQL> set autotrace traceonly
20:54:06 SQL>
20:54:07 SQL> set timing on
20:54:09 SQL>
20:54:10 SQL> select *
20:54:11   2  from (
20:54:11   3      select /*+first_rows*/ *
20:54:11   4      from calcfiguresub sample (1) a
20:54:11   5      join calcfiguresub b
20:54:11   6      on a.calcfigik = b.calcfigik and a.calcfigsubdatik = b.calcfigsubdatik
20:54:11   7      and rownum <= 10000
20:54:11   8      )
20:54:11   9  where rownum > 1;

no rows selected

Elapsed: 00:00:12.48

Execution Plan
----------------------------------------------------------
Plan hash value: 143895197

----------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                 | 10000 |   761K|  3820K  (1)| 12:44:02 |
|   1 |  COUNT                           |                 |       |       |            |          |
|*  2 |   FILTER                         |                 |       |       |            |          |
|   3 |    VIEW                          |                 | 10000 |   761K|  3820K  (1)| 12:44:02 |
|*  4 |     COUNT STOPKEY                |                 |       |       |            |          |
|   5 |      NESTED LOOPS                |                 |  1239K|    28M|  3820K  (1)| 12:44:02 |
|   6 |       TABLE ACCESS SAMPLE        | CALCFIGURESUB   |  1239K|    14M| 98853   (2)| 00:19:47 |
|   7 |       TABLE ACCESS BY INDEX ROWID| CALCFIGURESUB   |     1 |    12 |     3   (0)| 00:00:01 |
|*  8 |        INDEX UNIQUE SCAN         | P_CALCFIGURESUB |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWNUM>1)
   4 - filter(ROWNUM<=10000)
   8 - access("A"."CALCFIGIK"="B"."CALCFIGIK" AND
              "A"."CALCFIGSUBDATIK"="B"."CALCFIGSUBDATIK")


Statistics
----------------------------------------------------------
        290  recursive calls
          0  db block gets
      43546  consistent gets
       6703  physical reads
          0  redo size
        394  bytes sent via SQL*Net to client
        227  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
          0  rows processed

20:54:23 SQL> set autotrace off
20:56:13 SQL> select count(*) from calcfiguresub;

  COUNT(*)
----------
 125261366

Elapsed: 00:02:44.41
20:59:11 SQL> select avg_row_len from all_tables
20:59:19   2  where table_name = 'CALCFIGURESUB';

AVG_ROW_LEN
-----------
         12

Elapsed: 00:00:00.06
20:59:27 SQL>
Re: Help proving a database is slow [message #458997 is a reply to message #458976] Wed, 02 June 2010 19:38
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Hey, thanks for the support you guys. I'm overwhelmed. This is terrific stuff.

For those interested, one of our Unix guys (as opposed to "their" Unix guys, because the hardware and DBA support is outsourced) has been doing some raw tests with Unix dd to benchmark the disks, and they are writing and reading 10 times slower than our production environment.

Because these are NFS disks in an SVT environment, we suspect that we are sharing a network device with another application in SVT, which is flogging the network that connects the disks to our Oracle server. Just a theory... very difficult to prove with our levels of access, but your benchmarks make me much more confident of our position.

Ross Leishman
Re: Help proving a database is slow [message #459005 is a reply to message #458786] Wed, 02 June 2010 22:40
trantuananh24hg
Messages: 744
Registered: January 2007
Location: Ha Noi, Viet Nam
Senior Member
OK, I'll post the test SQL as you suggest.

logvnp@VNP> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Solaris: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

5 rows selected.

logvnp@VNP> select count(1) from big_table;

  COUNT(1)
----------
   1000000

logvnp@VNP> select sid from v$session
  2  where username='LOGVNP'
  3  and program like 'sq%';

       sid
----------
       383

logvnp@VNP> set autotrace traceonly explain
logvnp@VNP> select *
  2  from (
  3  select /*+first_rows*/*
  4  from big_table sample(1) a
  5  join big_table b
  6  on a.id=b.id
  7  and a.object_name=b.object_name
  8  and rownum<=10000)
  9  where rownum<1
 10  /

Execution Plan
----------------------------------------------------------
Plan hash value: 1902106588

-------------------------------------------------------------------------------------
| Id  | Operation               | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |           |     1 |   282 |  3210   (2)| 00:00:39 |
|*  1 |  COUNT STOPKEY          |           |       |       |            |          |
|   2 |   VIEW                  |           | 10000 |  2753K|  3210   (2)| 00:00:39 |
|*  3 |    COUNT STOPKEY        |           |       |       |            |          |
|*  4 |     HASH JOIN           |           | 10000 |  2841K|  3210   (2)| 00:00:39 |
|   5 |      TABLE ACCESS SAMPLE| BIG_TABLE | 10000 |   947K|  3207   (2)| 00:00:39 |
|   6 |      TABLE ACCESS FULL  | BIG_TABLE |   200 | 19400 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<1)
   3 - filter(ROWNUM<=10000)
   4 - access("A"."ID"="B"."ID" AND "A"."OBJECT_NAME"="B"."OBJECT_NAME")

logvnp@VNP> set autotrace off
logvnp@VNP> select stat_name, value
  2  from v$osstat;

STAT_NAME                                                             VALUE
---------------------------------------------------------------- ----------
NUM_CPUS                                                                  4
IDLE_TIME                                                          50166369
BUSY_TIME                                                           8755311
USER_TIME                                                           7659225
SYS_TIME                                                            1096086
IOWAIT_TIME                                                               0
AVG_IDLE_TIME                                                      12537823
AVG_BUSY_TIME                                                       2185126
AVG_USER_TIME                                                       1911057
AVG_SYS_TIME                                                         270313
AVG_IOWAIT_TIME                                                           0
OS_CPU_WAIT_TIME                                                     541400
RSRC_MGR_CPU_WAIT_TIME                                                    0
LOAD                                                              .87109375
PHYSICAL_MEMORY_BYTES                                            1.7093E+10
VM_IN_BYTES                                                      3877429248
VM_OUT_BYTES                                                     2345951232

17 rows selected.


However, why do you think that joining a big table to itself will help you explain the difference between systems?

Let me take another statement and measure it.

logvnp@VNP> set feedback off
logvnp@VNP> select * from (
  2  select *  from
  3  big_table a
  4  join big_table b
  5  on a.id=b.id
  6  and a.object_name=b.object_name)
  7  /
  
logvnp@VNP> @snap ash 5 1 383
Sampling with interval 5 seconds, 1 times...

-- Session Snapper


-----------------------------------------------------------------------
Active% | SQL_ID          | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
    76% | 62jynh05qw7zw   | direct path write temp    | User I/O
    24% | 62jynh05qw7zw   | ON CPU                    | ON CPU

--  End of ASH snap 1, end=2010-06-03 10:42:00, seconds=5, samples_taken=41


PL/SQL procedure successfully completed.

logvnp@VNP> col sql_text format a30
logvnp@VNP> select sql_text, cpu_time/1000000 cpuon, parse_calls
  2  from v$sql
  3  where sql_id='62jynh05qw7zw';

SQL_TEXT                            CPUON PARSE_CALLS
------------------------------ ---------- -----------
select * from ( select *  from   4.106833           1
 big_table a join big_table b
on a.id=b.id and a.object_name
=b.object_name)

logvnp@VNP> set autotrace traceonly explain
logvnp@VNP> select * from (
  2  select * from
  3  big_table a
  4  join big_table b
  5  on a.id=b.id
  6  and a.object_name=b.object_name)
  7  /

Execution Plan
----------------------------------------------------------
Plan hash value: 1690342657

---------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |  1000K|   185M|       | 16904   (2)| 00:03:23 |
|*  1 |  HASH JOIN         |           |  1000K|   185M|   103M| 16904   (2)| 00:03:23 |
|   2 |   TABLE ACCESS FULL| BIG_TABLE |  1000K|    92M|       |  3258   (3)| 00:00:40 |
|   3 |   TABLE ACCESS FULL| BIG_TABLE |  1000K|    92M|       |  3258   (3)| 00:00:40 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A"."ID"="B"."ID" AND "A"."OBJECT_NAME"="B"."OBJECT_NAME")



The only real difference was the I/O, I think, wasn't it?
Re: Help proving a database is slow [message #459007 is a reply to message #459005] Wed, 02 June 2010 23:00
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Thanks trantuananh24hg. 1M rows is not really a big enough table for my preferences in this test. I also think maybe you are missing a PK index, because your SQL performed a HASH JOIN, or maybe your database just ignored the FIRST_ROWS hint like John Watson's.

The reason I constructed the query the way I did is because I wanted to demonstrate the cost of accessing single blocks from disparate locations on disk, without the overhead of returning data across the network. Your second example uses FULL TABLE SCANS, which will perform multi-block reads of contiguous blocks on disk. Also, even though you specified EXPLAIN ONLY, SQL*Plus still pulls the data back across the network; it just does not display it.

Thanks for your input.

Ross Leishman
Re: Help proving a database is slow [message #459195 is a reply to message #459007] Thu, 03 June 2010 17:23
coleing
Messages: 213
Registered: February 2008
Senior Member
I can't do any tests for you as I'm not at work at the moment, but it sounds like sequential reads are fine but scattered reads are not?

Are you sure your indexes have an ok clustering factor?
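A quick way to check (MY_BIG_TABLE is a placeholder; a CLUSTERING_FACTOR close to BLOCKS is good, close to NUM_ROWS is poor):

select i.index_name, i.clustering_factor, t.blocks, t.num_rows
from user_indexes i
join user_tables t on t.table_name = i.table_name
where i.table_name = 'MY_BIG_TABLE';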

Re: Help proving a database is slow [message #459196 is a reply to message #459195] Thu, 03 June 2010 17:24
coleing
Messages: 213
Registered: February 2008
Senior Member
Ahh, I just read it was an NFS mount?

That sounds a bit ropey for putting Oracle data files on, doesn't it?

Can you increase the block size? That might help.


Re: Help proving a database is slow [message #459201 is a reply to message #458786] Thu, 03 June 2010 21:41
trantuananh24hg
Messages: 744
Registered: January 2007
Location: Ha Noi, Viet Nam
Senior Member
@rleishman: The BIG_TABLE was created by looping inserts from ALL_OBJECTS, so it only has a check constraint, not a primary key. I could fix that with an EXCEPTIONS table, but then the table could not be grown to 10M rows.

@coleing: I guess the block size is 8K.
Re: Help proving a database is slow [message #459418 is a reply to message #459201] Fri, 04 June 2010 18:42
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Coleing, we are a bit powerless. Pretty much just the application design and programming has been contracted to us.

Regarding clustering factor, I designed the test case above to thwart clustering factors by choosing rows from all over the place. I was looking to benchmark the "worst-case", where each row that you process is in a different block. Even though this is an undesirable application characteristic, I think it should still pump data out at MUCH better than <100 rows/sec.

We might have had a shot at infrastructure redesign if we had caught it earlier, but we are in the middle of deployment now, so it's too late.

The reason we haven't noticed it before now is because it hasn't been a problem. Historic performance hasn't been blindingly fast, but it has been quite viable.

We escalated the issue using the Unix "dd" test case, which showed the disk to be slow without bringing Oracle into the equation. They identified a setting in the NFS that was different to our PROD system - something to do with "Direct IO". We had an outage and they changed the setting. This fixed the Unix "dd" test case (10x faster than it was), but the Oracle test case I posted above is unchanged.

Someone mentioned that there is an Oracle setting related to "Direct IO", and that even if we change it on the NFS, Oracle won't use it. If there is any truth to this then we need to escalate it to the outsourced DBAs.

Ross Leishman
Re: Help proving a database is slow [message #459729 is a reply to message #459418] Tue, 08 June 2010 02:06
hkchital
Messages: 128
Registered: September 2008
Location: Singapore
Senior Member
>Someone mentioned that there is an Oracle setting related to "Direct IO"

That would be "filesystemio_options" which can be "none", "directio", "asynch" or "setall"


If they have enabled Direct I/O at the server level, none of your file I/O is using filesystem buffers. Then again, with NFS this should be enabled anyway.
With Direct I/O enabled, you generally need to increase your buffer cache to make up for the loss of the OS (filesystem) buffers!
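A sketch of how it might be checked and changed - it is a static parameter, so any change needs an spfile update and an instance restart:

show parameter filesystemio_options

alter system set filesystemio_options = setall scope = spfile;
-- restart the instance for the new value to take effect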

I am not very sure whether your SQL test is the right way to "prove" something. You are attempting random, single-block *reads*. How about multiblock reads? How about writes?

Hemant K Chitale


Re: Help proving a database is slow [message #460153 is a reply to message #459729] Wed, 09 June 2010 22:37
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
FILESYSTEMIO_OPTIONS is set to "asynch", and our NFS mounts on Solaris use the "forcedirectio" option. My understanding is that this will bypass the filesystem cache.

Our multi-block reads seem to be OK - no complaints. It is our single block reads that are killing us. For the purpose of testing, I want to choose blocks NOT in the buffer cache, so that I can measure the cost of single block reads from disk.
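For what it's worth, the instance-wide averages for the two read types can be pulled from v$system_event - something along these lines on 10.2:

select event, total_waits,
       round(time_waited_micro / greatest(total_waits, 1) / 1000, 2) avg_ms
from v$system_event
where event in ('db file sequential read', 'db file scattered read');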

As it turns out, the test case I posted above is still quite subject to buffering and clustering factors, so its value is limited. I created another test case that randomises the order of blocks retrieved, but it is not easily transportable to other environments, so I won't bother posting it here.

Thanks for your help. I now have traction with our Database, Network and System administrators to track this problem down. There is a general recognition that something is wrong.

Ross Leishman
Re: Help proving a database is slow [message #462736 is a reply to message #460153] Sat, 26 June 2010 05:13
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
If anyone is interested in the outcome of this....

We got assistance down at the NAS level. One of the network cards in the NAS was configured incorrectly and the result was that it didn't seem to utilise the disk cache.

This didn't really solve our problem though; queries were still slow. We measured round trips to the disk consistently between 0.011 and 0.014 seconds. According to our NAS guys, that is an expected and valid outcome (they actually quoted 0.007 to 0.010 seconds, but our experience was within reasonable variance).

Given that the disk hits were performing as expected, we refocussed on Oracle, figuring that if a round trip to disk is not going to get any better, we need to do fewer round trips. We increased the buffer cache and got a better hit ratio. It improved things a fair bit, but we still had a long build-up in an ETL job while the cache warmed up with the tables being processed.

Finally made a breakthrough today: in each of our SQL queries in the ETL we included a call to a packaged function that returned a setting that affected the range of data to be processed.

The function was not defined as PARALLEL_ENABLE because it used a global package state variable (violating the WNPS purity condition). I changed the package variable over to a SYS_CONTEXT call (which is visible across the parallel query servers), set the function to PARALLEL_ENABLE, and WHOOSHKA! It went like a rocket.
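For anyone hitting the same thing, the shape of the change was roughly this (names invented; the real code was a packaged function, but the idea is the same - move the state into an application context so the function can be PARALLEL_ENABLE):

-- Setup package owns the context and sets the value once per run
create or replace context etl_ctx using etl_setup;

create or replace package etl_setup as
  procedure set_run_date(p_date in varchar2);
end;
/
create or replace package body etl_setup as
  procedure set_run_date(p_date in varchar2) is
  begin
    dbms_session.set_context('ETL_CTX', 'RUN_DATE', p_date);
  end;
end;
/

-- The function no longer reads package state, so it can be parallel enabled,
-- and the parallel query slaves all see the same context value.
create or replace function etl_run_date return varchar2 parallel_enable is
begin
  return sys_context('ETL_CTX', 'RUN_DATE');
end;
/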