Options for optimising SQL running against Wide tables [message #570642] Tue, 13 November 2012 07:33
jasontilbury
Messages: 4
Registered: November 2012
Junior Member
Hello,
Apologies in advance if my Oracle speak is not what you're used to; I'm leading performance testing on the system here. The scale of the tests that generate the following scenario is not huge right now, only 50 simulated users (or you can think of them as independently running threads if you like). But here is the crunch: the queries generated (from a generic transaction layer) are all running against a table that has 600 columns! We can't really control this right now, but it is causing massive amounts of I/O (5GB per request), making requests queue for disk availability (the disks are set up RAID 0/1); it's even noticeable for as few as 3 threads.

I have, on one occasion, rendered the SQL to execute in 13 seconds for a single user, but this appears short-lived: when stats were freshly gathered it went back up to the normal 90-120 seconds. I've added the original query to the attached file. However, the findings here, along with our DBA (whom I trust implicitly), suggest that no amount of editing the query will improve the response times, that increasing the PGA/SGA (currently 4/6GB respectively) will only delay the queuing for a bit, and that compression can't help either. In short, it looks as though we've already hit hardware restrictions for this particular scenario.

As I can't really explain why my rendered query no longer takes 13 seconds, it's niggling me that we might be missing a trick. So I was hoping for some guidance on possible ways of optimising these types of queries against such wide tables; in other words, possibilities that we haven't considered...

Attached is the query and plan. Let me know if anything else is needed.
  • Attachment: sqltrace.sql (3.02KB)
Re: Options for optimising SQL running against Wide tables [message #570654 is a reply to message #570642] Tue, 13 November 2012 12:38
John Watson
Messages: 4710
Registered: January 2010
Location: Global Village
Senior Member
OK, I'll have a go. Start with a basic reality check. Add this hint,
/*+ gather_plan_statistics */ 
after the SELECT keyword, run the statement, and then get the execution plan and statistics with
select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

and copy/paste the result up here.
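Put together, John's two steps look roughly like this (the query itself is a placeholder; substitute your real statement and binds):

```sql
-- Placeholder query: table, columns and binds stand in for the real SQL.
SELECT /*+ gather_plan_statistics */ rnc, datetime, col_a
FROM   rnc_ucell
WHERE  rnc = :rnc
AND    datetime BETWEEN :d1 AND :d2;

-- Then, in the same session, fetch the last plan with run-time
-- row counts (A-Rows) alongside the optimizer estimates (E-Rows):
SELECT * FROM TABLE(dbms_xplan.display_cursor(format => 'allstats last'));
```

Comparing E-Rows against A-Rows per plan step is the quickest way to spot where the optimizer's estimates go wrong.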
But before you do that, please read our OraFAQ Forum Guide and "How to use [code] tags and make your code easier to read". I really mean it about the [code] tags.
Re: Options for optimising SQL running against Wide tables [message #570661 is a reply to message #570642] Tue, 13 November 2012 20:31
Kevin Meade
Messages: 1950
Registered: December 1999
Location: Connecticut USA
Senior Member
Well if John is having a go at it, then I want to try too. In addition to what John has asked for, can you also provide some additional information:

1) What is PERCENT? This is not an Oracle-provided function. Is it your own function? If so, please provide the code for it.
2) Same for DIV.

3) You said the table has some 600 columns, but your query uses only 8. Did I miss something here? Is this the only query, or is there a large class of queries like this, just with different columns, eventually covering every one of the 600 columns? If so, please provide some other examples and counts of the number of variations; for example, something like "there are 50 different queries that look like this sampling of 3" (query samples ...).

4) What database are you running? Do you have access to an Oracle EXADATA instance? If so, have you tried HYBRID COLUMNAR COMPRESSION in order to group columns together and thus avoid all that I/O? It seems like EXADATA HCC was created just for this problem; based on your query it would eliminate 99% of the I/O. Of course, if you don't have EXADATA, is your boss willing to spend the vast sums to get it?

5) How many rows are in the table, and how many rows do you expect a typical query to return?
6) What partitioning is used?

Thanks, Kevin
Re: Options for optimising SQL running against Wide tables [message #570688 is a reply to message #570661] Wed, 14 November 2012 04:27
jasontilbury
Messages: 4
Registered: November 2012
Junior Member
Thanks for the replies.
Actually, I was confused about the 8 columns vs. the whole row too. I guess our DBA is making an estimate: he found that ~32,000 rows are returned and used a row size of ~13,000 bytes to calculate the I/O at roughly 5GB, but I must admit I didn't know why he was using the whole row. Each column is typically a NUMBER, so even with 60 columns chosen we're only looking at a row size of maybe ~480 bytes (based on 64-bit?). Anyway, I didn't question his methods and assumed he was using the whole row because the query was returning it?

We have partitioning on the tables based on 24-hour intervals, so the DATETIME BETWEEN predicate gets data from partitions 3-6.

The generic code that generates the SQL could result in all columns being picked, but this is highly unlikely; most queries will probably hit fewer than 30 columns. I'm not really sure how these columns would get distributed among the queries, though, and in test we have the problem of restricted data sets due to the capacity of the machines. I'd need to do some digging to get a feel for it, as it's not obvious; the columns for the queries (along with how they are chosen) are tied up in XML.

I'm not sure I can provide the function definitions due to IP considerations (I will ask in a bit), but I strongly suspect that is a dead end, as queries with standard Oracle functions such as SUM have the same execution plans.

One thing I can confirm is that all the queries will have a similar number of sub-queries with the same GROUP BY clauses and use of WITH. In the WITH clause, a number of filter values can be given, and when we do that the queries block one another really badly.

With respect to the tables, they hold 5 days' worth of data, each row representing a 15-minute sample from the systems (we deal with performance monitoring in telecoms), making approximately 7.6 million rows. In production I believe some customers keep data in this raw form for months, and up to 1 year.

I've attached a file that should show the variation in SQL that we have in our test environment. The bind variables are replaced by the same dates as in the previous SQL.

Thanks guys.
Re: Options for optimising SQL running against Wide tables [message #570689 is a reply to message #570688] Wed, 14 November 2012 04:32
John Watson
Messages: 4710
Registered: January 2010
Location: Global Village
Senior Member
Your code is unreadable.
Really, if you won't follow simple requests for information properly formatted, I cannot assist. Perhaps someone else will.
Re: Options for optimising SQL running against Wide tables [message #570690 is a reply to message #570689] Wed, 14 November 2012 04:42
jasontilbury
Messages: 4
Registered: November 2012
Junior Member
Something like SQL Developer will reformat it automatically (Ctrl-F7), but it's done in the new attachment.

I don't think anyone will be able to suggest an improvement without major architecture changes, but really I'm only hoping for such a confirmation anyway. EXADATA sounds interesting; it's not something we could test in-house, but it's certainly a recommendation that could be made.
Re: Options for optimising SQL running against Wide tables [message #570696 is a reply to message #570642] Wed, 14 November 2012 07:21
Kevin Meade
Messages: 1950
Registered: December 1999
Location: Connecticut USA
Senior Member
Thanks for that info.

Still waiting on what John asked for.

Given the row counts, the functions may not be your issue. Still, calling a function from SQL is expensive, and you are calling three. You should inline the logic of the functions in the query to see what performance change occurs. Then you can talk about that specific issue with data rather than guess about it.
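As a purely hypothetical illustration (the real PERCENT and DIV definitions are not shown in this thread), "inlining" means replacing the call with its body directly in the statement:

```sql
-- Before: a user-defined function call per row, which forces a
-- SQL-to-PL/SQL context switch on every row.
-- (PERCENT's definition is assumed here, not taken from the thread.)
SELECT percent(success_count, attempt_count)
FROM   rnc_ucell;

-- After: the same arithmetic inlined as plain SQL, assuming
-- PERCENT(a, b) is simply 100 * a / b with divide-by-zero protection.
SELECT 100 * success_count / NULLIF(attempt_count, 0)
FROM   rnc_ucell;
```

If the plans and timings of the two versions match, the functions can be ruled out as the bottleneck.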

The logic you have presented so far seems good to me. Your issue appears to be that you are reading entire rows when in fact you only want a small portion of each row: by your description, <5% of the row contributes to the result set of your typical queries most of the time, and I would wager <2% most of the time. Thus, as you have suggested, the most likely solution is to find a way not to read the entire row but instead only those columns you want. This of course requires some kind of re-architected storage solution. With that in mind, there are a few alternative designs I can think of.

1) Indexes. This won't work for you based on your description of the problem, but it highlights the issue. Suppose, for example, you created an index on the table and this index contained all the columns your query needed. In that case you would never visit the table and thus never do the massive I/O. But since your queries are all different, you would need an unknown number of indexes, and that won't work.
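For illustration only (the column names beyond RNC and DATETIME are invented), a "covering" index for one fixed query shape would look like this; the point is that every referenced column lives in the index, so the wide table is never touched:

```sql
-- Covers a query that reads only rnc, datetime, col_a, col_b:
-- the range scan satisfies the whole query with no table access.
-- Only viable when the column set is fixed, which it isn't here.
CREATE INDEX rnc_ucell_cover_ix
    ON rnc_ucell (rnc, datetime, col_a, col_b)
    LOCAL;
```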

2) Vertical partitioning of the table. It is too bad that Oracle does not support native VERTICAL PARTITIONING with an appropriate storage clause, but there are ways to do this. Every one of them has some kind of implication and requires lots of changes somewhere (hence the value native support would have).

A) The first way is via multiple tables. You create a table with the primary key and one or more columns you split off the main table. Then you create a view that joins these tables together. In 10g the JOIN ELIMINATION feature was added, and it is enhanced in 11g. With this feature, and with the necessary meta-data, Oracle knows not to visit pieces of data and/or tables that it does not need to answer a query, even though the query itself looks like it requires them. Here is a link to one such example. Pay attention to the definition of the primary keys, the view that joins across the primary keys, and the query plan that clearly does not visit both tables even though there is a join between them in the view used.

The success of this comes from being able to define the right set of sub-tables. If, for example, your queries came in two basic patterns, one set that used the first 300 columns and another set that used the second 300 columns, then splitting your large table into two smaller tables, each 300 columns wide, would cut the I/O in half, since you would always be visiting only half the table for each query. Now consider extending the idea to 10 sub-tables with 60 columns each, or 100 sub-tables with 6 columns each. You figure out the optimum splits; your goal should be an overall reduction in I/O of 90% or more.
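A minimal sketch of approach A, with hypothetical table and column names. The declared primary keys are what let the optimizer drop the unused join; note the outer join, since eliminating an inner join generally also requires a foreign key proving no rows are lost:

```sql
-- Two narrow tables sharing the key of the original wide table.
CREATE TABLE rnc_ucell_grp1 (
    rnc       VARCHAR2(30),
    datetime  DATE,
    col_a     NUMBER,
    CONSTRAINT ucell_g1_pk PRIMARY KEY (rnc, datetime)
);

CREATE TABLE rnc_ucell_grp2 (
    rnc       VARCHAR2(30),
    datetime  DATE,
    col_b     NUMBER,
    CONSTRAINT ucell_g2_pk PRIMARY KEY (rnc, datetime)
);

-- View reassembling the original row shape.
CREATE OR REPLACE VIEW rnc_ucell_v AS
SELECT g1.rnc, g1.datetime, g1.col_a, g2.col_b
FROM   rnc_ucell_grp1 g1
LEFT JOIN rnc_ucell_grp2 g2
       ON g2.rnc = g1.rnc AND g2.datetime = g1.datetime;

-- A query touching only col_a: with join elimination, the plan
-- should show no access to rnc_ucell_grp2 at all.
SELECT rnc, SUM(col_a)
FROM   rnc_ucell_v
WHERE  rnc = :rnc
GROUP  BY rnc;
```

Verifying the plan with DBMS_XPLAN after each query shape is the only way to confirm the elimination actually happens.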

B) A second way is via NESTED TABLES. You create a nested table type which again contains a subset of columns. Then your original table uses these NESTED TABLE definitions instead of the original columns. Each nested table is stored out of line, so if your query does not reference a column in some specific nested table, that nested table will not be visited. I think of this as the 9i variation of join elimination from before join elimination was available as an optimization feature; otherwise I am not sure why you would do this over A. You still need to decide on the right set of column sub-sets.
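A hypothetical sketch of approach B; the type names and the two-column subset are invented for illustration:

```sql
-- A collection type holding one subset of the measurement columns.
CREATE TYPE ucell_counters_t AS OBJECT (col_a NUMBER, col_b NUMBER);
/
CREATE TYPE ucell_counters_nt AS TABLE OF ucell_counters_t;
/
-- The parent table stores the subset out of line in its own
-- storage table; a query that never references COUNTERS should
-- not read that storage.
CREATE TABLE rnc_ucell_nested (
    rnc      VARCHAR2(30),
    datetime DATE,
    counters ucell_counters_nt
)
NESTED TABLE counters STORE AS ucell_counters_store;
```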

C) CLUSTERING? I read a comment somewhere that suggested HASH CLUSTERS could do this, and there was a link in the article, but the link was bad (figures) and I cannot myself conceive of how a hash cluster could solve this problem. I think the author was confusing horizontal partitioning with vertical partitioning. But I mention it in case someone else is smarter than I am.

3) Columnar data storage. In this case we are talking about EXADATA and its HYBRID COLUMNAR COMPRESSION. By storing data in a column orientation, EXADATA can visit only those "storage units" that contain the columns your query wants and thus avoid all the I/O for columns you don't want. Add to this the massive compression potential of HCC and you get a reduction of 99% of the I/O for situations like yours.
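On supported storage, HCC is enabled by a table storage clause; a sketch with a hypothetical table name:

```sql
-- Hybrid Columnar Compression: rows inside each compression unit
-- are stored column-major, so a scan of a few columns reads far
-- fewer blocks, and the data is heavily compressed on top of that.
-- Requires Exadata (or other supported Oracle storage).
CREATE TABLE rnc_ucell_hcc
COMPRESS FOR QUERY HIGH
AS SELECT * FROM rnc_ucell;
```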

So, until your file downloads, that is about it for me. Kevin
Re: Options for optimising SQL running against Wide tables [message #570725 is a reply to message #570696] Wed, 14 November 2012 19:48
rleishman
Messages: 3700
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Your plan
----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |              |   167 |  5344 |   334K  (1)| 01:17:58 |       |       |
|   1 |  HASH GROUP BY                        |              |   167 |  5344 |   334K  (1)| 01:17:58 |       |       |
|   2 |   PARTITION RANGE ITERATOR            |              |   235 |  7520 |   334K  (1)| 01:17:58 |     3 |     6 |
|   3 |    VIEW                               |              |   235 |  7520 |   334K  (1)| 01:17:58 |       |       |
|   4 |     HASH GROUP BY                     |              |   235 | 35250 |   334K  (1)| 01:17:58 |       |       |
|   5 |      TABLE ACCESS BY LOCAL INDEX ROWID| RNC_UCELL    |   330K|    47M|   334K  (1)| 01:17:57 |     3 |     6 |
|*  6 |       INDEX RANGE SCAN                | RNC_UCELL_PK |   165K|       |  2875   (1)| 00:00:41 |     3 |     6 |
----------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   6 - access("RNC"='ErU_RNC_2' AND "DATETIME">=TO_DATE(' 2012-07-30 07:39:00', 'syyyy-mm-dd hh24:mi:ss') AND 
              "DATETIME"<=TO_DATE(' 2012-08-02 17:44:59', 'syyyy-mm-dd hh24:mi:ss'))
       filter("DATETIME"<=TO_DATE(' 2012-08-02 17:44:59', 'syyyy-mm-dd hh24:mi:ss') AND "DATETIME">=TO_DATE(' 
              2012-07-30 07:39:00', 'syyyy-mm-dd hh24:mi:ss'))


Notice that the DATETIME filter appears in both the Access Predicates and the Filter Predicates.

This leads me to believe that the index RNC_UCELL_PK is on columns (RNC, ????, DATETIME)

In which case, the index is scanning ONLY on RNC, returning ALL matching rows, filtering on DATETIME, and then looking up the table. i.e. it reads EVERY row with a matching RNC regardless of DATETIME (notwithstanding the partition iterator). This insidious behaviour is detectable only by looking at the Filter Predicates. The row counts/estimates for the INDEX RANGE SCAN step are only those returned AFTER both scanning and filtering; there is no way of knowing how many rows were scanned before filtering occurred.

If the index included DATETIME as the second column (RNC, DATETIME [, ????] ) then you would see ONLY Access Predicates, no filter predicates.

You may argue that Oracle is showing DATETIME in the Access Predicates, and therefore it is scanning on that column. It's not. It is misleading behaviour in the optimizer that seems to have been there since Access/Filter Predicates were introduced to the plan table (v10, I think), and it is still present in v11.

With daily partitioning, you are probably not reading very many redundant rows (just those before the start time on the first day and those after the end-time on the last day), but it's worth a try.
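If the guess about the column order is right, the fix being described is an index with DATETIME immediately after RNC (the index name here is invented; the unknown middle column of the existing PK is deliberately not guessed at):

```sql
-- With (rnc, datetime) leading, the range scan can seek on both
-- predicates; the plan should then show only Access Predicates on
-- DATETIME, with no Filter Predicate on the index step.
CREATE INDEX rnc_ucell_rnc_dt_ix
    ON rnc_ucell (rnc, datetime)
    LOCAL;
```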

Ross Leishman
Re: Options for optimising SQL running against Wide tables [message #570990 is a reply to message #570725] Mon, 19 November 2012 11:47
jasontilbury
Messages: 4
Registered: November 2012
Junior Member
Thanks for all the replies on this; it looks like it's a bit of a dead end due to the capacity of the test system.

We tried a couple of the suggestions (including Ross') just to be sure of what we were looking at here.

The bottom line is that all tests show the system is I/O-limited (disk), as a result of the width of the table combined with the number of rows. The index lookup (both our original and Ross's) is low-cost compared to this and makes little if any difference (creating the index was actually quite difficult too, due to the size of the table).

Therefore it appears that, short of re-designing (which I'm sure is the most sensible approach, along the lines mentioned earlier), there is not a great deal to be done.

Thanks again for your ideas, but I think it's time I shelve this one until we can take on board some of your suggestions.

Re: Options for optimising SQL running against Wide tables [message #570993 is a reply to message #570990] Mon, 19 November 2012 15:16
rleishman
Messages: 3700
Registered: October 2005
Location: Melbourne, Australia
Senior Member
If you wanted to keep working on it, I would suggest parallelizing the query. Modern disk systems are designed to process many requests at once; as a result, an individual query can appear to be I/O-bound when in fact there is plenty of headroom left on the disk.

If you tend NOT to have dozens of these queries running at once, then you might reasonably expect a 10x improvement from parallelizing a single query.

Since you are using an index range scan, effectively parallelizing the index scan is not immediately possible. Oracle can parallelize full table scans, fast full index scans, and index range scans of partitioned tables. Although your table is partitioned, it is not partitioned in a way that helps us, because there are not enough partitions holding the data we want to read. For this reason, you would need to hash sub-partition the table. Choose a column with an even spread of values, such as a unique primary key, and sub-partition on that column, keeping the index locally partitioned. For parallelizing a range scan, I recommend 32 or 64 hash sub-partitions; 32 will probably be enough to max out your disks.
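A sketch of the range-hash layout Ross is suggesting; the sub-partition key UCELL, the partition names and the column list are all assumptions for illustration:

```sql
-- Daily range partitions, each hash sub-partitioned 32 ways on an
-- evenly-distributed key column, so a parallel range scan can be
-- spread across the sub-partitions.
CREATE TABLE rnc_ucell_sub (
    rnc      VARCHAR2(30),
    ucell    VARCHAR2(30),   -- assumed evenly-spread key column
    datetime DATE,
    col_a    NUMBER
    -- ... remaining measurement columns ...
)
PARTITION BY RANGE (datetime)
SUBPARTITION BY HASH (ucell) SUBPARTITIONS 32
(
    PARTITION p20121112 VALUES LESS THAN (DATE '2012-11-13'),
    PARTITION p20121113 VALUES LESS THAN (DATE '2012-11-14')
);

-- Keep the index locally partitioned, as recommended above.
CREATE INDEX rnc_ucell_sub_ix ON rnc_ucell_sub (rnc, datetime) LOCAL;

-- Then request the parallelism explicitly:
SELECT /*+ parallel(t 32) */ rnc, SUM(col_a)
FROM   rnc_ucell_sub t
WHERE  rnc = :rnc
AND    datetime BETWEEN :d1 AND :d2
GROUP  BY rnc;
```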

Ross Leishman