Home » RDBMS Server » Performance Tuning » Options for querying on the skewed data (Oracle on RHEL)
Options for querying on the skewed data [message #542739] Fri, 10 February 2012 00:39 Go to next message
Messages: 134
Registered: November 2011
Location: Canada
Senior Member

We have a table with huge data which is skewed on a 'status' column

The 'status' column has 6 distinct values with 1 particular value occupying 80-85% records

In the batch process we query the data on the status and process the retrieved records

My senior is insisting on partitioning which I see not much feasible considering cost implications just for a part of functionality

I have following options in mind, could you please suggest on it?

See there are 6 status 'A','B','C','D','E','F'

with 'A' occupying 80% records
'B' to 'F' occupies 2% till 14% records in the table(approx)

Create a conditional index on status (using case) to have records with all statuses except 'A'
Then create If-ELSE structure
IF input parameter is 'A' 
select /*+ FULL Parallel(t) */ * from t where status='A';
Select /*+ INDEX (t conditional_index) */ * from t where status in ('B','C');

I want to create conditional index here for 2 reasons
1] since it will have values for status except 'A' this nullify the chance that this index will be picked up when status='A' will be queried
Thus making the performance worst (status ='A' is for 80% records) - The IF-ELSE is additional protection
2] Less impact on the DMLS as the index will not be on status='A' which contribute to large chunk of records

Populate a dummy table which would contain rowid and status
Since the business closes at 21:00 and batch process starts at 21:30
Between these times periods refresh the dummy table every day using merge (to catch business transactions during the day)

Now during the batch process retrieve records from the main table using the rowids in the dummy table depending on the input status value

Create index on status
Make sure hard coded status values are used in the database procedures
Gather stats with the histograms
And leave it to the Optimizer to choose the best possible path

Thanks and Regards

Re: Options for querying on the skewed data [message #542778 is a reply to message #542739] Fri, 10 February 2012 03:10 Go to previous messageGo to next message
John Watson
Messages: 8577
Registered: January 2010
Location: Global Village
Senior Member
I'll have a tentative try at this.
Your index first. You say that the most selective value would retrieve 2% of the rows. It is quite possible that even for 2%, a scan is more efficient. This would depend on many criteria (row size, block size, key size, clustering factor,...) but I think it likely that unless you do something else an index is not going to help in the majority of queries. The "something else" might be to sort the data on the key; convert to an IOT (possibly with an overflow segment); add extra columns to the keys. None of these need an extra licence. And you might want to consider a bitmap index instead, much easier.
Now partitioning. Presumably you mean list partitioning? Given the distribution it is hard to see how range or hash would help. This would seem to be a good option, partition pruning would surely help. Consider the queries and think about whether you can sub-partition; how to partition your indexes; the impact on manageability.
You'll have to test and test and test, to prove that the any change does give predictable, measurable, improvement.
Good luck.
Re: Options for querying on the skewed data [message #542819 is a reply to message #542778] Fri, 10 February 2012 04:47 Go to previous messageGo to next message
Messages: 134
Registered: November 2011
Location: Canada
Senior Member
Hello John

Thanks for the nice reply

I understand clustering factor is good when data is entered in some ordered manner. e.g. Sequence number, sysdate etc.
In my case with status='A','B'..etc.the data order is not maintained
I think this will retard index access. Right?

Sorry that I did not understand what do you mean by Quote:
"to sort the data on the key"?

Could you please explain it little?

Regarding IOT, I have never used it. Of course sometime the first must happen

Bitmap index was my first choice but in the past we had faced locking issues and I really have apprehensions using it

As you suggested I will test the options but wanted to know if they are worth trying out?

Thanks and Regards

Re: Options for querying on the skewed data [message #542932 is a reply to message #542819] Fri, 10 February 2012 19:42 Go to previous message
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
If you are accessing data on status alone then partitioning is definitely the way to go. Even if it is faster to scan 2% of the table via an index rather than a full table scan, you are going to pollute the buffer cache and get rid of cached blocks that have a much better likelihood of being reused. ie. You *might* make your query a *little* faster, but you'll slow down the rest of the system.

Bitmap indexes are not magically faster than b-tree indexes for high-cardinality keys. You need to *combine* many bitmap index searches in a single query to get any benefit. And then there are the locking issues.

What are your objections to two list partitions: one for the skewed value, the other for DEFAULT. You add a PARTITION clause to your create-table and GLOBAL clause to your create-index scripts, populate the table, and you're done. No further maintenance or overheads required.

Ross Leishman
Previous Topic: Query Rewrite
Next Topic: Handling NULL values in the Database
Goto Forum:

Current Time: Fri Jul 23 16:31:04 CDT 2021