Rittman Mead Consulting
Reminder - UKOUG BIRT SIG meeting
… is on Wednesday October 8th. Mark will be speaking and I will be there with the Rittman Mead stand, so if you are at the meeting just come up and say “hello”.
This is the last BIRT SIG meeting of the year, but BI is, of course, going to be big part a big part of the annual UKOUG conference in December; this year Rittman Mead have five sessions on the agenda and a presence in the exhibition hall.
At the same time as the UKOUG conference the BIWA Summit is being held over at Redwood Shores. There will be a Rittman Mead speaker at this event too- me! I will be talking about Oracle 11g OLAP as a summary management technique for data warehouses, and comparing it with some of the GROUP BY extensions that Mark blogged about last month.
Digging into the Oracle BI Server Query Log File
As a follow-up to the posting I did last week on Oracle BI Enterprise Edition performance tuning, I thought I’d take a closer look a the query log file generated by the Oracle BI Server, the “virtual database engine” that ships with Oracle BI EE. For some time now I’ve been looking to see if there is any diagnostic data produced by the BI Server so that, for example, I can see the query plans put together when a query comes in, particularly in the situation where, for example, the BI Server joins together (or “stitches”, in Oracle BI Server terminology) data from two different database queries into one logical dataset. So, let’s take a simple example, where we have a fact table containing 14m rows of sales data and a product table containing 72 rows of product data, with the data all being stored in a single Oracle database and a single physical model describing the source tables and their relationships.
Running a simple query like this, against the source data:

Would generate the following log entries, at
$ORACLEBI_HOME/server/Log/NQQuery.log, with query logging set to 3:
##############################################
-------------------- SQL Request:
SET VARIABLE QUERY_SRC_CD='Report';SELECT PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC saw_0, SALES_BIG_LOCAL.QUANTITY_SOLD saw_1 FROM "Log & Federated Query Test" ORDER BY saw_0
+++Administrator:2c0000:2c0003:----2008/10/02 13:29:24
-------------------- General Query Info:
Repository: Star, Subject Area: Log & Federated Query Test, Presentation: Log & Federated Query Test
+++Administrator:2c0000:2c0003:----2008/10/02 13:29:24
-------------------- Logical Request (before navigation):
RqList
PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC as c1 GB,
QUANTITY_SOLD:[DAggr(SALES_BIG_LOCAL.QUANTITY_SOLD by [ PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC] )] as c2 GB
OrderBy: c1 asc
+++Administrator:2c0000:2c0003:----2008/10/02 13:29:24
-------------------- Sending query to database named ora11g (id: <<11172>>):
select T6303.PROD_SUBCATEGORY_DESC as c1,
sum(T6826.QUANTITY_SOLD) as c2
from
SH_COPY.PRODUCTS T6303,
SH_COPY.SALES_BIG T6826
where ( T6303.PROD_ID = T6826.PROD_ID )
group by T6303.PROD_SUBCATEGORY_DESC
order by c1
+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39
-------------------- Query Status: Successful Completion
+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39
-------------------- Rows 21, bytes 84504 retrieved from database query id: <<11172>>
+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39
-------------------- Physical query response time 194 (seconds), id <<11172>>
+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39
-------------------- Physical Query Summary Stats: Number of physical queries 1, Cumulative time 194, DB-connect time 0 (seconds)
+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39
-------------------- Rows returned to Client 21
+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39
-------------------- Logical Query Summary Stats: Elapsed time 195, Response time 194, Compilation time 0 (seconds)
So let’s step through this log entry and look at each part in detail.
The first section sets out the general details of the query, and contains details of the QUERY_SRC_CD variable containing your query, the name of the Oracle BI Server repository and subject area used for the query.
############################################## -------------------- SQL Request: SET VARIABLE QUERY_SRC_CD='Report';SELECT PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC saw_0, SALES_BIG_LOCAL.QUANTITY_SOLD saw_1 FROM "Log & Federated Query Test" ORDER BY saw_0 +++Administrator:2c0000:2c0003:----2008/10/02 13:29:24 -------------------- General Query Info: Repository: Star, Subject Area: Log & Federated Query Test, Presentation: Log & Federated Query Test +++Administrator:2c0000:2c0003:----2008/10/02 13:29:24
Notice that each entry is the log is also timestamped with the username of the BI Server user that was responsible for the query, and the timestamp for the log entry.
Next is the logical, “Request List” that sets out the columns that form the criteria for the report, and the grouping and aggregation applied to the data.
-------------------- Logical Request (before navigation):
RqList
PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC as c1 GB,
QUANTITY_SOLD:[DAggr(SALES_BIG_LOCAL.QUANTITY_SOLD by [ PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC] )] as c2 GB
OrderBy: c1 asc
+++Administrator:2c0000:2c0003:----2008/10/02 13:29:24
If my report criteria included a filter, the filter columns would appear in the request list as well, like this:
-------------------- Logical Request (before navigation):
RqList
PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC as c1 GB,
QUANTITY_SOLD:[DAggr(SALES_BIG_LOCAL.QUANTITY_SOLD by [ PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC] )] as c2 GB
DetailFilter: PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC in ([ 'Camcorders', 'Camera Batteries', 'Camera Media'] )
OrderBy: c1 asc
+++Administrator:2c0000:2c0005:----2008/10/02 13:42:09
So at this point we’re dealing with the logical view of the data, which relates back to the Presentation Layer of the Oracle BI Server metadata layer. The next part of the log file shows the physical SQL that was generated to correspond with the logical request list of columns specified by the user. In this case, both logicals tables have logical table sources that come from the same physical Oracle database, and so a single physical SQL query is generated.
-------------------- Sending query to database named ora11g (id: <<11172>>):
select T6303.PROD_SUBCATEGORY_DESC as c1,
sum(T6826.QUANTITY_SOLD) as c2
from
SH_COPY.PRODUCTS T6303,
SH_COPY.SALES_BIG T6826
where ( T6303.PROD_ID = T6826.PROD_ID )
group by T6303.PROD_SUBCATEGORY_DESC
order by c1
+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39
Notice the fact that Oracle BI Server automatically aggregates your data (this is behavoir specified in the default aggregation settings in the BI Server metadata layer, you can specify non-grouping, i.e. detail display of data instead), and applies a GROUP BY on the dimension data.
The next part of the log file details whether the query completed successfully, how long it took to return data, how many rows were returned and the response time of the underlying database.
-------------------- Query Status: Successful Completion +++Administrator:2c0000:2c0003:----2008/10/02 13:32:39 -------------------- Rows 21, bytes 84504 retrieved from database query id: <<11172>> +++Administrator:2c0000:2c0003:----2008/10/02 13:32:39 -------------------- Physical query response time 194 (seconds), id <<11172>> +++Administrator:2c0000:2c0003:----2008/10/02 13:32:39 -------------------- Physical Query Summary Stats: Number of physical queries 1, Cumulative time 194, DB-connect time 0 (seconds) +++Administrator:2c0000:2c0003:----2008/10/02 13:32:39 -------------------- Rows returned to Client 21 +++Administrator:2c0000:2c0003:----2008/10/02 13:32:39 -------------------- Logical Query Summary Stats: Elapsed time 195, Response time 194, Compilation time 0 (seconds)
So what we’ve got here is a fairly straightforward record of the logical query requested by the user (the request list of columns, the criteria for your request), the physical SQL generated to return the required list of columns, any filters, aggregation or grouping applied to the data, and statistics on how long the query took to run. Putting my Oracle Database hat on, we haven’t got anything equivalent to the “wait events” you get when tuning the Oracle database, where it tells you where the time has been spent when executing a query (waiting for disk I/O, waiting for network I/O, etc) but then again, in this case it’s the database that’s doing the vast majority of the work and there’s not really much overhead added by the BI Server here. The database took 194 seconds to return data, and the BI server only added a second to this, giving a total response time of 195.
There are actually several levels of logging that you can turn on for users and queries, with levels going from 0 (no logging) to 5 (lots and lots of logging), with the general advice being to only really turn this on when you’re debugging a query, as the logs are written to disk and can slow down a real-life system if left on. In a bit more detail, the logging levels currently available are as follows:
Level 0 = no logging, the default setting for a user or query
Level 1 = logs the logical SQL (the request list), the elapsed time, whether the cache was used, query status, user name and so on.
Level 2 = Everything in level 1, plus repository name, subject area name, physical SQL generated, number of rows returned and so on
Level 3 = Everything in level 2 plus some additional data on cache usage. This is the level I usually set, though the docs suggest you should just use level 2
Level 4 = Everything in level 3 plus the physical SQL execution plan
Level 5 = Everything in level 4 plus intermediate row counts at various stages, particularly useful when the BI Server is stitching data sets together.
So how does the above query look when run at level 5 logging? Let’s take a look, this time with the filter added in that I showed a minute ago. I’ve bolded the areas that are new in this level of logging:
-------------------- SQL Request:
SET VARIABLE QUERY_SRC_CD='Report',SAW_SRC_PATH='/users/administrator/Simple Log Test Query';SELECT PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC saw_0, SALES_BIG_LOCAL.QUANTITY_SOLD saw_1 FROM "Log & Federated Query Test" WHERE PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC IN ('Camcorders', 'Camera Batteries', 'Camera Media') ORDER BY saw_0
+++Administrator:2b0000:2b0001:----2008/10/02 14:04:10
-------------------- General Query Info:
Repository: Star, Subject Area: Log & Federated Query Test, Presentation: Log & Federated Query Test
+++Administrator:2b0000:2b0001:----2008/10/02 14:04:10
-------------------- Logical Request (before navigation):
RqList
PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC as c1 GB,
QUANTITY_SOLD:[DAggr(SALES_BIG_LOCAL.QUANTITY_SOLD by [ PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC] )] as c2 GB
DetailFilter: PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC in ([ 'Camcorders', 'Camera Batteries', 'Camera Media'] )
OrderBy: c1 asc
+++Administrator:2b0000:2b0001:----2008/10/02 14:04:10
-------------------- Execution plan:
RqList <<9790>> [for database 3023:2350:ora11g,44]
PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:2350,44],
sum(SALES_BIG.QUANTITY_SOLD by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c2 GB [for database 3023:2350,44]
Child Nodes (RqJoinSpec): <<9851>> [for database 3023:2350:ora11g,44]
PRODUCTS T6303
SALES_BIG T6826
DetailFilter: PRODUCTS.PROD_ID = SALES_BIG.PROD_ID and (PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camcorders' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Batteries' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Media') [for database 0:0]
GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC] [for database 3023:2350,44]
OrderBy: c1 asc [for database 3023:2350,44]
+++Administrator:2b0000:2b0001:----2008/10/02 14:04:10
-------------------- Sending query to database named ora11g (id: <<9790>>):
select T6303.PROD_SUBCATEGORY_DESC as c1,
sum(T6826.QUANTITY_SOLD) as c2
from
SH_COPY.PRODUCTS T6303,
SH_COPY.SALES_BIG T6826
where ( T6303.PROD_ID = T6826.PROD_ID and (T6303.PROD_SUBCATEGORY_DESC in ('Camcorders', 'Camera Batteries', 'Camera Media')) )
group by T6303.PROD_SUBCATEGORY_DESC
order by c1
+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25
-------------------- Execution Node: <<9790>>, Close Row Count = 3, Row Width = 4024 bytes
+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25
-------------------- Execution Node: <<9790>> DbGateway Exchange, Close Row Count = 3, Row Width = 4024 bytes
+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25
-------------------- Execution Node: <<9790>> DbGateway Exchange, Close Row Count = 3, Row Width = 4024 bytes
+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25
-------------------- Query Status: Successful Completion
+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25
-------------------- Rows 3, bytes 12072 retrieved from database query id: <<9790>>
+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25
-------------------- Physical query response time 195 (seconds), id <<9790>>
+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25
-------------------- Physical Query Summary Stats: Number of physical queries 1, Cumulative time 195, DB-connect time 0 (seconds)
+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25
-------------------- Rows returned to Client 3
+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25
-------------------- Logical Query Summary Stats: Elapsed time 195, Response time 195, Compilation time 0 (seconds)
So the execution plan that the log file contains is the logical execution plan, not the Oracle execution plan corresponding to the physical SQL that is generated. Let’s take a look at this logical execution plan in more details.
-------------------- Execution plan:
RqList <<9790>> [for database 3023:2350:ora11g,44]
PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:2350,44],
sum(SALES_BIG.QUANTITY_SOLD by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c2 GB [for database 3023:2350,44]
Child Nodes (RqJoinSpec): <<9851>> [for database 3023:2350:ora11g,44]
PRODUCTS T6303
SALES_BIG T6826
DetailFilter: PRODUCTS.PROD_ID = SALES_BIG.PROD_ID and (PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camcorders' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Batteries' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Media') [for database 0:0]
GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC] [for database 3023:2350,44]
OrderBy: c1 asc [for database 3023:2350,44]
So what this contains is the request list as before, and then a list of “child nodes” with each node corresponding to a step in the execution plan (I think). Then we’ve got the filter applied to the nodes, the group by and the order by, and the query again takes 195 seconds, with the BI Server adding virtually nothing to the response time.
So far it all looks straightforward, but what happens if instead of both logical table sources pointing to the same one physical database, we source the dimension table from a second physical database, so that the BI Server generates a federated query and stitches the result set together in the BI Server’s memory. How would this extended query log look then? Let’s take a look.
-------------------- SQL Request:
SET VARIABLE QUERY_SRC_CD='Report';SELECT SALES_BIG_LOCAL.QUANTITY_SOLD saw_0, PRODUCTS_REMOTE.PROD_SUBCATEGORY_DESC saw_1 FROM "Log & Federated Query Test" WHERE PRODUCTS_REMOTE.PROD_SUBCATEGORY_DESC IN ('Camcorders', 'Camera Batteries', 'Camera Media') ORDER BY saw_1
+++Administrator:2a0000:2a0002:----2008/10/02 15:22:38
-------------------- General Query Info:
Repository: Star, Subject Area: Log & Federated Query Test, Presentation: Log & Federated Query Test
+++Administrator:2a0000:2a0002:----2008/10/02 15:22:38
-------------------- Logical Request (before navigation):
RqList
QUANTITY_SOLD:[DAggr(SALES_BIG_LOCAL.QUANTITY_SOLD by [ PRODUCTS_REMOTE.PROD_SUBCATEGORY_DESC] )] as c1 GB,
PRODUCTS_REMOTE.PROD_SUBCATEGORY_DESC as c2 GB
DetailFilter: PRODUCTS_REMOTE.PROD_SUBCATEGORY_DESC in ([ 'Camcorders', 'Camera Batteries', 'Camera Media'] )
OrderBy: c2 asc
+++Administrator:2a0000:2a0002:----2008/10/02 15:22:38
-------------------- Execution plan:
RqBreakFilter <<9948>>[2] [for database 0:0,0]
RqList <<10056>> [for database 0:0,0]
sum(D1.c5 by [ D1.c2] at_distinct [ D1.c2, D1.c3] ) as c1 [for database 0:0,0],
D1.c2 as c2 [for database 3023:6422,44]
Child Nodes (RqJoinSpec): <<10068>> [for database 0:0,0]
(
RqList <<9865>> [for database 0:0,0]
D902.c1 as c2 GB [for database 3023:6422,44],
D901.c2 as c3 [for database 3023:2350,44],
D901.c3 as c5 [for database 3023:2350,44]
Child Nodes (RqJoinSpec): <<9950>> [for database 0:0,0]
(
RqList <<9964>> [for database 3023:2350:ora11g,44]
SALES_BIG.PROD_ID as c2 [for database 3023:2350,44],
sum(SALES_BIG.QUANTITY_SOLD by [ SALES_BIG.PROD_ID] ) as c3 [for database 3023:2350,44]
Child Nodes (RqJoinSpec): <<9967>> [for database 3023:2350:ora11g,44]
SALES_BIG T6826
GroupBy: [ SALES_BIG.PROD_ID] [for database 3023:2350,44]
OrderBy: c2 asc [for database 3023:2350,44]
) as D901
InnerJoin <<10001>> On D901.c2 = D902.c2; actual join vectors: [ 0 ] = [ 1 ]
(
RqList <<9981>> [for database 3023:6422:remote_ora11g,44]
PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:6422,44],
PRODUCTS.PROD_ID as c2 [for database 3023:6422,44]
Child Nodes (RqJoinSpec): <<9984>> [for database 3023:6422:remote_ora11g,44]
PRODUCTS T6864
DetailFilter: PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camcorders' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Batteries' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Media' [for database 0:0]
OrderBy: c2 asc [for database 3023:6422,44]
) as D902
OrderBy: c2, c3 [for database 0:0,0]
) as D1
OrderBy: c2 asc [for database 0:0,0]
+++Administrator:2a0000:2a0002:----2008/10/02 15:22:38
-------------------- Sending query to database named ora11g (id: <<9964>>):
select T6826.PROD_ID as c2,
sum(T6826.QUANTITY_SOLD) as c3
from
SH_COPY.SALES_BIG T6826
group by T6826.PROD_ID
order by c2
+++Administrator:2a0000:2a0002:----2008/10/02 15:22:38
-------------------- Sending query to database named remote_ora11g (id: <<9981>>):
select T6864.PROD_SUBCATEGORY_DESC as c1,
T6864.PROD_ID as c2
from
PRODUCTS T6864
where ( T6864.PROD_SUBCATEGORY_DESC in ('Camcorders', 'Camera Batteries', 'Camera Media') )
order by c2
+++Administrator:2a0000:2a0002:----2008/10/02 15:22:38
-------------------- Execution Node: <<9981>>, Close Row Count = 9, Row Width = 4012 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Execution Node: <<9964>>, Close Row Count = 72, Row Width = 24 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Cancel initiated for Exchange Producer: <<9964>> DbGateway Exchange
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Execution Node: <<9964>> DbGateway Exchange, Close Row Count = 63, Row Width = 24 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Execution Node: <<9981>> DbGateway Exchange, Close Row Count = 9, Row Width = 4012 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Execution Node: <<10001>> Binary Join, Close Row Count = 9, Row Width = 4032 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Execution Node: <<9865>> Projection, Close Row Count = 9, Row Width = 4032 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Execution Node: <<9865>> Sort, Close Row Count = 9, Row Width = 4032 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Execution Node: <<10056>> Pre-aggr Projection, Close Row Count = 9, Row Width = 4032 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Execution Node: <<10056>> Aggregation, Close Row Count = 9, Row Width = 4040 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Execution Node: <<10056>> Post-aggr Projection, Close Row Count = 9, Row Width = 4016 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Execution Node: <<10056>> Sort, Close Row Count = 9, Row Width = 4016 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Execution Node: <<9948>> GroupByNoSort, Close Row Count = 3, Row Width = 4016 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Execution Node: <<9964>> DbGateway Exchange, Close Row Count = 63, Row Width = 24 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Execution Node: <<9981>> DbGateway Exchange, Close Row Count = 9, Row Width = 4012 bytes
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Query Status: Successful Completion
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Rows 72, bytes 1728 retrieved from database query id: <<9964>>
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Physical query response time 172 (seconds), id <<9964>>
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Rows 9, bytes 36108 retrieved from database query id: <<9981>>
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Physical query response time 0 (seconds), id <<9981>>
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Physical Query Summary Stats: Number of physical queries 2, Cumulative time 172, DB-connect time 0 (seconds)
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Rows returned to Client 3
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31
-------------------- Logical Query Summary Stats: Elapsed time 173, Response time 172, Compilation time 0 (seconds)
So this looks a bit more interesting, as we can see that the BI Server is doing some of the work now, generating an SQL query for each physical data source and then stitching the data together. There’s also lot more nodes, which suggests that the BI Server is going through a lot more steps to return the data. Let’s a take a look at the more interesting parts of the log, starting with the logical execution plan, which now contains a number of child nodes; ones for each of the two database queries, further ones for the stitched-together data set and a node for the join between the two result sets .
-------------------- Execution plan:
RqBreakFilter <<9948>>[2] [for database 0:0,0]
RqList <<10056>> [for database 0:0,0]
sum(D1.c5 by [ D1.c2] at_distinct [ D1.c2, D1.c3] ) as c1 [for database 0:0,0],
D1.c2 as c2 [for database 3023:6422,44]
Child Nodes (RqJoinSpec): <<10068>> [for database 0:0,0]
(
RqList <<9865>> [for database 0:0,0]
D902.c1 as c2 GB [for database 3023:6422,44],
D901.c2 as c3 [for database 3023:2350,44],
D901.c3 as c5 [for database 3023:2350,44]
Child Nodes (RqJoinSpec): <<9950>> [for database 0:0,0]
(
RqList <<9964>> [for database 3023:2350:ora11g,44]
SALES_BIG.PROD_ID as c2 [for database 3023:2350,44],
sum(SALES_BIG.QUANTITY_SOLD by [ SALES_BIG.PROD_ID] ) as c3 [for database 3023:2350,44]
Child Nodes (RqJoinSpec): <<9967>> [for database 3023:2350:ora11g,44]
SALES_BIG T6826
GroupBy: [ SALES_BIG.PROD_ID] [for database 3023:2350,44]
OrderBy: c2 asc [for database 3023:2350,44]
) as D901
InnerJoin <<10001>> On D901.c2 = D902.c2; actual join vectors: [ 0 ] = [ 1 ]
(
RqList <<9981>> [for database 3023:6422:remote_ora11g,44]
PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:6422,44],
PRODUCTS.PROD_ID as c2 [for database 3023:6422,44]
Child Nodes (RqJoinSpec): <<9984>> [for database 3023:6422:remote_ora11g,44]
PRODUCTS T6864
DetailFilter: PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camcorders' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Batteries' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Media' [for database 0:0]
OrderBy: c2 asc [for database 3023:6422,44]
) as D902
OrderBy: c2, c3 [for database 0:0,0]
) as D1
OrderBy: c2 asc [for database 0:0,0]
Notice that each of the main nodes has IDs assigned to them (9981, 9964 and 9865) so that you can follow them through the rest of the log file.
Later on down the log file we can see the physical SQL generated for each node, and the row count resulting from each physical SQL execution. Then, after the data is retrieved from the various physical database connection, you can see how the BI Server then goes on to inner join the data (the binary join on node 10001), remove unneeded columns from the initial result set (projection), sort the data, again remove unneeded columns and then aggregate the data. You can also see what operations are carried out on the data before and after aggregation, where sorting is carried out and so on.
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31 -------------------- Execution Node: <<10001>> Binary Join, Close Row Count = 9, Row Width = 4032 bytes +++Administrator:2a0000:2a0002:----2008/10/02 15:25:31 -------------------- Execution Node: <<9865>> Projection, Close Row Count = 9, Row Width = 4032 bytes +++Administrator:2a0000:2a0002:----2008/10/02 15:25:31 -------------------- Execution Node: <<9865>> Sort, Close Row Count = 9, Row Width = 4032 bytes +++Administrator:2a0000:2a0002:----2008/10/02 15:25:31 -------------------- Execution Node: <<10056>> Pre-aggr Projection, Close Row Count = 9, Row Width = 4032 bytes +++Administrator:2a0000:2a0002:----2008/10/02 15:25:31 -------------------- Execution Node: <<10056>> Aggregation, Close Row Count = 9, Row Width = 4040 bytes +++Administrator:2a0000:2a0002:----2008/10/02 15:25:31 -------------------- Execution Node: <<10056>> Post-aggr Projection, Close Row Count = 9, Row Width = 4016 bytes +++Administrator:2a0000:2a0002:----2008/10/02 15:25:31 -------------------- Execution Node: <<10056>> Sort, Close Row Count = 9, Row Width = 4016 bytes +++Administrator:2a0000:2a0002:----2008/10/02 15:25:31 -------------------- Execution Node: <<9948>> GroupByNoSort, Close Row Count = 3, Row Width = 4016 bytes
From looking this log file, you can pretty much determine the steps the BI Server is going through to add aggregations, sorting, result set stitching and so on to your data, if you’re interested in seeing how data from multiple results sets is brought together, I don’t know if this is publicly documented anywhere and whether other join types are possible (I’ve heard of nested loop joins also being performed), but this certainly looks interesting if you’re into Oracle BI Server internals. Something I’m certainly looking to do is to document these steps properly and come up with the complete set of steps that can be carried out, if anyone’s interested in helping then let me know as I said, I don’t think this is publicly documented.
So looking at the log file output, this query has actually taken just 173 seconds rather than the 195 required for the single-source query, the BI Server only adding a second to this. I wouldn’t place too much emphasis on the actual times taken for each variation on the query as I’m only running them once and on a single laptop, but you can see from the timings that Oracle BI Server doesn’t seem to add much overhead, even when you’re using the federated query feature (at least on my dataset, which isn’t all that big).
Now one of the things you can do in the BI Server metadata layer, is select the “driving table” for these types of cross-database joins. This is particularly useful if one of your tables is small (the suggested limit is 1000 rows, in my case it’s 70, and it tells the BI Server to firstly query the driving table to return a small amount of interesting rows, then it applies these these rows to the other table in batches so that only a much smaller set of the big table rows are then loaded into the BI Server memory.

The rationale here is that a normal federated query join would require both tables to be loaded into the BI Server memory, and if one of them is big, then this will require a full table scan of the physical source table and potentially take up a lot of memory in the BI Server. Using this driving table technique you can fairly quickly come up with a much smaller set of initial rows and row ids, this can then be used to filter the second table and therefore reduce the I/O usage and memory usage required by the federated query join. There are a couple of parameters you can set in the NQSCONFIG.INI file to determine the size of the batches that are applied to the larger table, and you can switch this feature off completely, but how does this look in the query log file, and does it make my query run faster? Let’s take a look.
The first difference in this “driving table” query log entry is the logical execution plan, than now retrieves data from the second, non-driving table via a set of parameter values.
RqBreakFilter <<9961>>[2] [for database 0:0,0]
RqList <<10027>> [for database 0:0,0]
sum(D1.c5 by [ D1.c2] at_distinct [ D1.c2, D1.c3] ) as c1 [for database 0:0,0],
D1.c2 as c2 [for database 3023:6422,44]
Child Nodes (RqJoinSpec): <<10039>> [for database 0:0,0]
(
RqList <<9918>> [for database 0:0,0]
D901.c1 as c2 GB [for database 3023:6422,44],
D902.c2 as c3 [for database 3023:2350,44],
D902.c3 as c5 [for database 3023:2350,44]
Child Nodes (RqJoinSpec): <<9963>> [for database 0:0,0]
(
RqList <<9966>> [for database 3023:6422:remote_ora11g,44]
PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:6422,44],
PRODUCTS.PROD_ID as c2 [for database 3023:6422,44]
Child Nodes (RqJoinSpec): <<9969>> [for database 3023:6422:remote_ora11g,44]
PRODUCTS T6864
OrderBy: c2 asc [for database 3023:6422,44]
) as D901
InnerJoin (left drive) <<10000>> On D902.c2 = D901.c2; actual join vectors: [ 1 ] = [ 0 ]
(
RqList <<9984>> [for database 3023:2350:ora11g,44]
SALES_BIG.PROD_ID as c2 [for database 3023:2350,44],
sum(SALES_BIG.QUANTITY_SOLD by [ SALES_BIG.PROD_ID] ) as c3 [for database 3023:2350,44]
Child Nodes (RqJoinSpec): <<9987>> [for database 3023:2350:ora11g,44]
SALES_BIG T6826
DetailFilter: SALES_BIG.PROD_ID = ?1 or SALES_BIG.PROD_ID = ?2 or SALES_BIG.PROD_ID = ?3 or SALES_BIG.PROD_ID = ?4 or SALES_BIG.PROD_ID = ?5 or SALES_BIG.PROD_ID = ?6 or SALES_BIG.PROD_ID = ?7 or SALES_BIG.PROD_ID = ?8 or SALES_BIG.PROD_ID = ?9 or SALES_BIG.PROD_ID = ?10 or SALES_BIG.PROD_ID = ?11 or SALES_BIG.PROD_ID = ?12 or SALES_BIG.PROD_ID = ?13 or SALES_BIG.PROD_ID = ?14 or SALES_BIG.PROD_ID = ?15 or SALES_BIG.PROD_ID = ?16 or SALES_BIG.PROD_ID = ?17 or SALES_BIG.PROD_ID = ?18 or SALES_BIG.PROD_ID = ?19 or SALES_BIG.PROD_ID = ?20 [for database 0:0]
GroupBy: [ SALES_BIG.PROD_ID] [for database 3023:2350,44]
OrderBy: c2 asc [for database 3023:2350,44]
) as D902
OrderBy: c2, c3 [for database 0:0,0]
) as D1
OrderBy: c2 asc [for database 0:0,0]
+++Administrator:2a0000:2a0004:----2008/10/02 15:39:08
You can specify the maximum amount of parameters in the table filter through the MAX_PARAMETERS_PER_DRIVE_JOIN parameter in the NQSCONFIG.INI file, with the limit really being down to how many filter values the back-end database can handle. The MAX_QUERIES_PER_DRIVE_JOIN parameter thereafter limits the total number of separate calls to the non-driving table that can take place (a function of the total number of rows returned by the initial driving table query divided by the MAX_PARAMETERS_PER_DRIVE_JOIN parameter setting), to stop run-away queries occuring that generate hundreds of individual calls to the non-driving table.
The other interesting part is the actual physical SQL sent to the non-driving table, which looks like this:
-------------------- Sending query to database named ora11g (id: <<9984>>):
select T6826.PROD_ID as c2,
sum(T6826.QUANTITY_SOLD) as c3
from
SH_COPY.SALES_BIG T6826
where ( T6826.PROD_ID in (:PARAM1, :PARAM2, :PARAM3, :PARAM4, :PARAM5, :PARAM6, :PARAM7, :PARAM8, :PARAM9, :PARAM10, :PARAM11, :PARAM12, :PARAM13, :PARAM14, :PARAM15, :PARAM16, :PARAM17, :PARAM18, :PARAM19, :PARAM20) )
group by T6826.PROD_ID
order by c2
+++Administrator:2a0000:2a0004:----2008/10/02 15:39:08
So in this case, the driving table feature will initially run the query against the smaller (dimension) table and return the rows specified by the filter on the query. Then, in batches of twenty, individual selects will be made against the larger, non-driving table, until all the interested rows are returned. These will then be joined in memory in the BI Server and the results returned to the dashboard, in theory faster than if we’d tried to load both tables into the BI Server memory and join them both there.
So now, to take this one step further, what does the query log file look like if we run the Aggregate Persistence Wizard on this data set, generate and aggregate at the PROD_SUBCATEGORY_DESC, plug the aggregates into the logical model and then run a query? Where (if anywhere) does aggregate navigation future in the logical execution plan and query steps?
Well, after running the wizard and registering the aggregates, I run the query again and here’s the output:
-------------------- Execution plan:
RqBreakFilter <<10079>>[2] [for database 0:0,0]
RqList <<10037>> [for database 3023:2350:ora11g,44]
ag_SALES_BIG_LOCAL.QUANTITY_S0001562C as c1 GB [for database 3023:2350,44],
SA_Subcate0001593D.PROD_SUBCA0001561B as c2 GB [for database 3023:2350,44]
Child Nodes (RqJoinSpec): <<10067>> [for database 3023:2350:ora11g,44]
SA_Subcate0001593D T7688
ag_SALES_BIG_LOCAL T7695
DetailFilter: SA_Subcate0001593D.PROD_SUBCA0001561B = ag_SALES_BIG_LOCAL.PROD_SUBCA0001561B [for database 0:0]
OrderBy: c2 asc [for database 3023:2350,44]
+++Administrator:2a0000:2a0005:----2008/10/02 16:46:18
-------------------- Sending query to database named ora11g (id: <<10037>>):
select distinct T7695.QUANTITY_S0001562C as c1,
T7688.PROD_SUBCA0001561B as c2
from
SH_COPY.SA_Subcate0001593D T7688,
SH_COPY.ag_SALES_BIG_LOCAL T7695
where ( T7688.PROD_SUBCA0001561B = T7695.PROD_SUBCA0001561B )
order by c2
So, you can see the aggregate table being used, but there’s nothing in the log to say that aggregates were considered, the query is just generated against the logical table source that returns the smallest amount of rows.
Anyway, that’s a quick trip around the query log file, with a short diversion into driving tables and aggregates. Of course the $64,000 questions is “what do you do with the log file, in particular the knowledge of the various query execution steps”, and in reality if you’re getting to the point where federated queries are running slowly, your most obvious and productive means of sorting this out is to physically integrate your data into a single data mart or data warehouse rather than mucking around trying to decipher undocumented log files. Still, if you’re stuck with a federated query or just interested in how the internals work, it’s an interesting set of diagnostics you can generate, I’ll certainly be blogging more on this over time and if you’ve got anything to add, just let me know in the article comments.
Announcing : Oracle BI Training Days, London October 22nd - 24th 2008
Something that we’ve been working on in the background recently, and are now able to announce, are the first of our Oracle BI Training Days running in London from October 22nd to the 24th, 2008. If you’re a developer using Oracle’s latest BI tools and you’re looking to take a step up in terms of your technical knowledge, plus get the chance to meet our team and other advanced developers using Oracle’s BI tools, this is an opportunity you can’t afford to miss.
The event is running over three days with a different focus on each day. The way we are pricing it is that you can attend for just one day, or attend all three for the price of two. We’re organizing a meal (ticket price included) at the end of the first day, and we’re keeping the numbers attending restricted so that those that attend can get the maximum possible out of the session. The format of the three days will be a seminar/workshop style, with either myself or one of the team leading each session, doing demos, and providing you with the demo and sample files so that you can follow along if you wish. The expectation for each session is that you have some level of basic knowledge, we will take you beyond this and also introduce you to some potentially new topics such as the Oracle BI Applications and some of the new features in Essbase 11.1 and Oracle Database 11g.
Day 1 - Oracle Business Intelligence Enterprise Edition In-Depth
- A Future Oracle BI Architecture
- OBIEE overview
- OBIEE architecture
- The Oracle BI&DW Tech Stack
- Key features and new possibilities
- A proposed architecture and development approach
- Oracle BI EE Data Modeling Techniques
- The Common Enterprise Information Model
- Creating the Business Model and Mapping Layer
- Mapping 3NF data models into the Business Model
- Integrating Multiple data sources into the business model
- Fragmentation and Real-Time Integration
- Oracle BIEE Performance Optimization
- Clustering and other physical implementation optimizations
- Optimizing the Business Model
- Caching
- Aggregate Persistence Wizard
- OBIEE Integration with SOA and BPEL and Web applications
- The OBIEE Web Services Interface
- The EasyBI SOA Bridge
- Integrating OBIEE with SOA and BPEL
- Integrating OBIEE with web-based applications using GO URL
Day 2 - New Features in Oracle OBIEE 10gR3 and Oracle 11g Data Warehousing
- Essbase 9.3.1 and 11.1 Integration with OBIEE
- Overview of Essbase
- Building an Essbase Cube using EIS and Essbase Studio
- Integrating an Essbase Cube into OBIEE
- Loading Essbase Cubes from the Common Enterprise Information Model
- Integrating OBIEE dashboards into the Hyperion Workspace
- Automated Decision Making Using Oracle Real-Time Decisions
- The need for automated business decisions
- Introduction to Oracle Real-Time Decisions
- Steps to Create a Real-Time Decisions Project
- Integration of Real-Time Decisions with ApEx and ADF
- Migrating Discoverer EULs and Workbooks to OBIEE
- Aspects of OBIEE/ Discoverer Integration
- Using the Discoverer EUL Migration Utility
- Post-migration steps
- Migrating Discoverer Workbooks
- Integration of Discoverer with Oracle BI Publisher
- New DW Features in Oracle Database 11g
- Overview of Oracle Database 11g DW Features
- New Partitioning Features
- The Relationship between the DW and OBIEE
- Cube Organized Materialized Views
Day 3 - OBI Apps In-Depth
- Oracle BI Applications 7.9.5 Introduction and Architecture
- Overview of Oracle BI Applications
- Oracle BI Applications Architecture
- The Oracle Business Analytics Warehouse data model
- Exploring the BI Apps data model and pre-built dashboards
- Oracle Business Analytics Warehouse Administration using DAC
- Introduction to the DAC
- Basic DAC Administration
- Performing an initial EBS 11.5.10 data load using the DAC
- Introduction to Informatica PowerCenter 8.1.1
- Overview of Informatica PowerCenter 8.1.1
- Creating a simple mapping and running end-to-end
- Extending and Customizing the BI Apps Data Warehouse.
- Overview of the Oracle BI Apps customization methodology
- Understanding SIL and SDE mappings
- Customizing an Existing Type 1 mapping to customize a dimension
- Creating a new Type 2 mapping to add new facts and dimensions
The idea of the event is to take you beyond what’s available on other public courses, address the requirements of more advanced and experienced developers, and provide a “state of the nation” view of Oracle’s latest BI & DW offerings, including what’s been announced and showcased at Oracle Open World 2008. The cost of the event is £900 + VAT for one day, or £1800 + VAT for all three days, with a 10% early bird discount available until October 3rd. As I said, places are limited and so far we’ve received a lot of interest and bookings, so register now to avoid disappointment. If you want any more details, drop me an email or check out the web page, and hopefully we’ll see you at the event in October.
A Trip to Redwood Shores
Open World ended yesterday, and today I took a trip down the peninsula to Oracle’s headquarters at Redwood Shores. I actually got the train down rather than hire a car, as I thought it’d be more interesting to get public transport rather than just drive myself down. The train station at King Street was actually quite small, it’s tucked away a few blocks behind the Moscone Center and the trains run down past Oracle’s offices to San Jose.

The journey on the faster “Baby Bullet” train takes about half an hour, after which I was picked up and driven to Oracle’s offices. The buildings themselves are a bit of a landmark, I guess most of us have seen pictures of the various “database-shaped” buildings around a central lake with the Americas Cup yacht moored by the side. It was a lovely day and certainly quite an impressive sight.

I wasn’t just on a sightseeing visit though, as I had a meeting with the product development team responsible for Oracle Warehouse Builder. Here’s Antonio, Dave and Michelle (who’s coming over to the UK in a couple of weeks to do a briefing on OWB11gR2 for the UKOUG BIRT SIG meeting).

After the meeting we popped out for lunch, then went through some of the new ODI and OBIEE integration features in the OWB 11gR2 beta. Here’s me after my lunch, and wondering why the weather back in the UK can’t be as good as this, and how I’m going to shift the extra stone I’ve probably put on after eating my breakfast and dinner in Lori’s Diner every day this week.

Just to remind me of home though, the train going back was delayed by half an hour. Here’s one more photo though, as I said it’s quite an impressive place to visit for the day.

If anyone’s interested, I’ve uploaded all of my Open World 2008 photos to this Flickr set.
Oracle Open World 2008, Day 5 : Exadata Storage Server, and Ask the Oracle ACE Directors
Today was the final day at Open World, with a number of sessions on the new Exadata Storage Server and the Oracle Database Machine data warehouse appliance, together with the Oracle ACE Directors’ Birds of a Feather Panel Session kindly organized by Lewis Cunningham. In the end we had six ACE Directors on the stage in this photo, with Lewis arriving just after.

Left to right, we’ve got Bradley Brown from TUSC, Arup Nanda, Hans Forbrich, Tim Hall, Eddie Awad and myself. The questions from the audience we mostly around the database although Lewis kindly threw one in on OBIEE optimization; I also planted one on the panel’s reaction to the Exadata news yesterday and we had an interesting one on the panel’s least favourite feature in 11g - my vote wasn’t a feature as such, more an observation that most of the good stuff was only available as database options, and the decision to use these is usually made well before I turn up I’m usually unable to make use of them.
There were two sessions on the Exadata Storage Server and the Oracle Database Machine during the day, renamed from the more generic titles that I picked up on earlier in the month. The details on the two products have been more than adequately covered on other blogs (Kevin Closson’s postings (1,2,3), and the ones by Curt Monash (1,2,3,4), plus Pete’s earlier posting here, set out the features and industry commentary pretty well), but some key points and takeaways that I picked up on from today’s sessions included:
- The storage array hardware that the Exadata Storage Server uses is commodity HP DL185G5 kit, but with additional software from Oracle for the predicate-pushing feature
- The fact that each storage server ships with 1GB of network bandwidth means that your disks and network I/O capabilities will always be balanced
- The database server using the Exadata Storage Servers can be either single node or RAC, must run Oracle Enterprise Linux and Oracle 11g 11.1.0.7
- The predicate pushing feature is implemented using a new feature in 11.1.0.7 called smart scan, where the database engine, when it can, will take the SQL coming from the application, extract a simplified version of it and send it to the storage server CELLSRV process, which then returns just the rows and columns containing the results set. If this can’t happen then normal database blocks are returned instead, this is why it’s positioned as suitable for both OLTP and DW
- The volume manager is ASM, this is a key part of the software stack
- Each Exadata server is called a Cell, and is a self-contained server which contains 14 (?) separate disks and the server hardware that implements smart scan. The server is programmable by Oracle but not by the client. The software can/will be updated for each release of Oracle, you can choose not to update it however this will be like setting the compatibility of the database to an earlier release, it’ll work but not make use of potential new features.
- The Oracle Database Machine contains 8 Oracle servers and 14 Exadata Storage Servers in a rack. Database servers are HP DL360G5, again commodity hardware.
So, very interesting stuff albeit not something I’ll probably get my hands on for a while. So, that’s it for Open World for this year, I’m off to Corporate HQ tomorrow then back to the UK the day after. I’ll try and wrap everything up with a summary post tomorrow, for now though it’s a chance for an early night for once before getting up early tomorrow.
Oracle Open World 2008, Day 4 : OBIEE Action Framework, and OBIEE Performance Tuning
I’m just catching up now on my Open World notes, as I’m staying for an extra day to go to Redwood Shores tomorrow for a meeting with the OWB product development team. Wednesday (Day 4) was my first day without any presentations, and with Larry’s keynote during the early afternoon I managed to go to a session on the OBIEE Action Framework in the morning, and - the session I’d been waiting all week for - the OBIEE Performance Tuning one, at the end of the day.
I blogged about the forthcoming Action Framework in OBIEE 11g in a blog post last year, and this year Nick Tuson and Adam Bloom, two of the OBIEE product managers based just down the road from me in Bristol, gave an update on the progress of this feature. The basic idea with the action framework is that you can define, in the web catalog, “actions” that represent calls to web services, java methods, scripts or URLs that you can then attach to data items in your reports. Actions have three parts to them; an action definition that contains the action name, parameter names, button names and so on; a set of parameter mappings, and an action implementation, that defines the java class, web service call and so on that the action uses.
Adam also introduced another new addition to the web catalog in 11g, “conditions”, which are defined against data items and can be used by actions and other processes to run checks like “is the customer profitable”, “do they have a checking account” and so on, I used a variation on this when doing my BI and SOA articles and it’s a similar idea to the conditions that you can define in the Discoverer EUL. In terms of dates, we’re still looking at “some date in calendar year 2009″ for OBIEE 11g, but it certainly sounds like they’re packing in the features. I was also amused to see that the quote on Nick’s slides around the action framework came from my blog post last year, I was certainly happy to endorse the new feature as it’s shaping up to be one of the most innovative features in this forthcoming release.
Apart from live blogging Larry’s keynote just after lunch, I managed to get myself along to Bob Ertl’s session on Optimizing Oracle Business Intelligence Enterprise Edition. I was particularly looking forward to this session as it was the only one that took a look at OBIEE internals; in previous years we’ve had Ed Suen doing sessions on the semantic object layer and Kurt Wolf doing ones on OBIEE architecture, however I understand both have left recently and therefore Bob looks to have taken up the mantle of chief internals specialist for OBIEE. The session itself was really good, I really enjoyed it, Bob was leapt on a bit by enthusiastic and questioning OBIEE architects and performance tuners at the end but I think this just illustrates the interest there is in this area. Anyway, here’s a synopsis of what Bob had to say:
OBIEE queries can run slowly for three main reasons:
- They scan too many records on the disk, typically caused by tables being too large, too much data needing to be aggregated, transformations being too complex, too many joins or poor SQL. The typical database solution to this is to add indexes, partitioning, summary management, batch aggregation, move the aggregation to the ETL process, create star schemas and so on. In the OBIEE world, you typically address this by using aggregate navigation, use caching, follow BI data modeling best practices. Bob noted that customers often turn to caching as their first solution to problems in this area, but this only addresses part of the problem.
- Returning too many records over the network, due to poor SQL, using unsupported DB functions, doing a cross-join (OBIEE federated queries) involving large tables, poor prompt design. In the RDBMS world you can address this by centralizing your data into a data warehouse (therefore decreasing the network traffic caused by cross-database joins), following model practices, using the drive table join feature, setting prompts appropriately, and using caching.
- Interface and processing bottlenecks, such as incorrect connection pool settings, setting logging too too high, incorrect configuration settings, insufficeint hardware. Funnily enough at least the first two of these issues are addressed by the new database machine and exadata storage servers announced yesterday, but at least on smaller systems you’ve got to address these things yourself.
Bob then went on to talk about BI Server query plans, for me the thing that I was most interested in finding out about. My background is in data warehousing where it’s pretty important that you know about the various join types and execution plans that the database can generate for your queries; as the BI Server is a “virtual database engine” I’m equally curious about how the BI Server does its query parsing, plan generation and so on. In the abstract form, BI tool query plans look something like this:
- Pre-aggregation functions and filters are applied to the data
- Aggregations are performed
- For the BI Server, multiple result sets from federated queries are “stitched” together, and
- Post-aggregation functions and filters are applied.
Bob’s advice was (and this is something I also advocate), to move as much of this functionality down the stack to the database as is possible. If you can integrate, aggregate and summarize your data in the data warehouse then do so, only if you can’t should you then consider federated queries and so on. The BI Server itself will push down functions to the underlying database so that, for example, if you run a ranking function on your query and you’re using an Oracle database, the BI Server will “function ship” this down to the database rather than try and do it itself. If you’re using Access though, for example, or using Oracle but have got your database features set incorrectly in the BI Administration tool, this calculation will need to be done in the BI Server’s memory instead (using a feature called “functional compensation” and it’ll go and get all the rows from the database before it can run the calculation itself. The tip here then is to (a) make sure your database feature setting are correct in the BI Administrator tool, using the Features tab in the database dialog:

and (b) consider using a database that supports the maximum amount of function shipping, also consider using Essbase etc if lots of aggregates or complex calculations are needed as these can be done natively.
There was also talk about how the BI Server goes about stitching disparate results sets together. This is something where, for example, you define a fact table in the business model and mapping layer and then map this to two physical data sources. When a query runs, two separate SQL statements are generated and the BI Server “stitches” the data together. The problem here comes when you try and do this with large data sets, as all the data is accessed and loaded into data before the BI Server joins the data in memory using a common key column. Bob talked about a BI Server feature called “parameterized nested loop joins”, where you can set the driving table for a federated query, like this:

When you set this feature, the BI Server (if I remember it correctly) will query the driving table first, remember the set of key values that are returned by this query bearing in mind any filters that are applied, and then separately join to the second table only retrieving the key values collected by the first query. Bob also mentioned “eager group by” (might have mis-heard this though), “sub queries with IN clauses” and “persisted connection pool sub-queries”, he didn’t go into any more details on this but did promise a series of tech notes from the BI Server team in time to expand on these features. I hope this comes through and I’ll certainly be dropping Bob a note on this, as internals and query processing details like these just aren’t in the documentation and it’ll be good to shed a bit more light on this.
Bob then went on to talk about Aggregate Persistence (I covered this in a previous blog posting) and talked about some features that might be coming in future releases, including parallel loading of aggregated data into the aggregate tables created by the wizard; automatic indexing (of the underlying data warehouse? or the aggregate tables created by the Aggregate Persistence Wizard?), an advisor function on which aggregates to create and which to drop; selective deletion of aggregates (currently it’s all or nothing), and incremental loads of aggregate tables, although this is a bit trickier. I also asked about future integration with Oracle materialized views, this apparently is being looked at as of course it provides many of these features built into the (Oracle) database.
The final section of the talk was on logging, and the usage tracking repository tables and reports that ship with OBIEE and the BI Applications. Whilst logical and physical query logging is something you typically only turn on when diagnosing performance problems, usage tracking is something you can and should turn on permanently, the results of which you can store to file or better, to tables, so that you can generate dashboards and reports off of the usage data generated. The usage tracking reports themselves contain data such as when reports are run, what reports are the most popular, statistics (time of day, total time, compile time, number of rows returned and so on), these can also be reported on by the BI Administration plug in that you can obtain for Oracle Enterprise Manager. There are some best practices around setting up the usage tracking system; the BI Server should connect to the usage tracking tables via it’s own connection pool, writes are batched to every five minutes, run the usage tracking system on the same database server as the business content. Apparently an Oracle by Example tutorial is coming soon for the usage tracking system, keep an eye out on OTN for this.
Anyway, as I said it was an excellent session, and I suggested at the end that those who had come along and expressed an interest in the area should stay in contact. So, if you’re interested in setting up an informal “OBIEE Performance Tuning User Group”, let me know and I’ll put us all in contact.
Exa-ctly
Sitting about half-a-flying-day from San Francisco and OOW and for that matter my colleagues, gives me opportunity to mull over Larry’s keynote address that has been so widely reported on-line.
I would love to say my prediction for what was going to be announced was completely right; although in the main it was right some of my conclusions on how it might be done were a little adrift.
For strange people like me, people that see the world as moving large amounts of data around, it was exciting news. For me, data retrieval and storage are bulk processes and need to be achieved in way that does not swamp the capacity of that weak link, IO bandwidth. Historically, we had the tools of right-sizing (that is making sure we have the “correct” number and size of disks, the right number of controllers, the right disk connect technology etc) and reducing the data volume to transfer (through pre-aggregated summary tables, partitioning, indexing and, now, table compression) The missing component, and one used by some DW appliance vendors such as Netezza, is pushing parts of the query out to the data. In effect this a technique to reduce the amount of information to be moved a distance which is good for both IO and ultimately processing power - it is a lot easier to sort and manipulate a small data set. I say “moved a distance” as inevitably the same amount of data has to be read off disk and processed on the storage units, it is just that unneeded data does not migrate along a potentially slower link only to be discarded at the database. In a way this is not unlike the childhood card game “Happy Families”; I ask the storage unit “Do you have Mr Bunn, the Baker?” and get given Mr Bunn, and not be given all the cards and be told to look for myself.
Of course there will be a lot of questions that will come up such as: does it work with star transformations (I would guess,yes as the results of the bitmap combines can theoretically be pushed to the storage units), does it support Oracle OLAP… maybe not, but there again is that important if we can materialize a relational cube do something similar?
For now I am excited, but with anything that is hardware based, saddened that I can’t just download it and give it a whirl.
Oracle Enters the DW Appliance and Storage Server Market
So, the mystery “X” product was a database / data warehouse appliance, using HP Hardware and a custom storage server that takes predicates provided by the database and filters the data locally, returning just the results / columns back to the database rather than whole blocks of data. Oracle enters the data warehouse appliance market, more details on Kevin Closson’s blog, and on the live blog plus pictures that I ran during the keynote.
Blogging Live from the Larry Ellison Oracle DW Appliance Keynote, San Francisco
15.54
Right, that’s it. Everyone’s going to the demogrounds now, there’s a couple of machines on show. Now, where’s Kevin Closson so I can ask him how it all works, and Curt Monash to see how he thinks this’ll play in the market?

15.34
Idea is that hardware is open, as CPUs with more cores become available these will be used. Example given was going from a four core CPU to a six core CPU.
15.33
The server technology behind the product is HP’s Proliant technology. So Oracle and HP are now in the data warehouse appliance market. Mark Hurd on video link now.
15.31
Jointly sold by Oracle and HP. Oracle sell it, HP build and service it.
15.30
HP Oracle Database Machine priced at $4000 per TB, software license $1.8M. Cost of two together plus hardware is <$14000/TB. Designed to run OTLP loads as well as DW loads.

15.26
Teradata uses conventional disk arrays. No intelligence in the disk controller. Suffers from the problem that the disk controller moves disk blocks to the database server, the interconnect is slow and is the bottleneck. Architecturally it can’t compete, because of this inadequate data bandwidth.

Netezza does a lot better, it’s basically a storage server, with one processor per disk server. Oracle’s advantage is that its machine runs Oracle, Netezza does not. Netezza box is not fault tolerant, Oracle’s is. Netezza misses basic DW functionality (bitmap indexes etc). Obviously Netezza and and Teradata are the vendors this is aimed at.

15.25
The problem with current storage arrays and servers is that there is too much data movement between the two components. Because the Exadata server has more bandwidth and only sends back results, processing the query predicates locally, the amount of bandwidth it requires is reduced, with lack of this bandwith being the main limiting factor for data warehouses. Every time you add an Oracle storage server you add two more infiniband connectors, keeping the hardware in balance.
15.20
So the “X” is actually a database / data warehouse appliance, in the same space as Netezza, DatAllegro etc. It’s also being positioned as a general “large database” appliance.

15.18
Started providing early versions of the system to customers for the past year. Amazon mentioned as a customer.
15.15
Larry’s now announced a database appliance - the HP Oracle Database Machine. A database (data warehouse) appliance. 8 Oracle database servers. 64 intel processor cores. Oracle Enterprise Linux. Oracle RAC. 14Gb/sec data bandwidth. 112 Intel Processor Cores. 168, 300 or 1000GB DIsk Drives. 14000 Larger than the largest ipod.

15.12
The database server pushes the query, the predicates to the Oracle storage server. The storage server (or servers, in a grid) then query the data, then return just the results, the required columns, to the database server.

Uses 2 infiniband pipes, 1GB/sec (bandwidth limited by speed of the 12 disk drives in each server) per storage server. Dozens of servers can run in parallel.

15.10
Built by HP.
15.04
It’s Oracle’s first hardware product - the “Exadata Programmable Storage Server”

2 Intel processors, 8 cores 12 disk drives, up to 12TB raw storage. Runs Oracle’s parallel query software, moved into the storage server. Returns query results, not disk blocks.
2.00pm
I’m currently sitting in the audience waiting for the Larry Ellison keynote at Oracle Open World 2008. It’s just past 2pm and the crowds are coming in now. I’m not sure whether we’ll have to sit through the vendor keynotes first, if so it’ll be about an hour or so before Larry comes on, and we get to hear what this mysterious “Project X” is all about. Keep this page bookmarked and check back over the next couple of hours, I’ll update the posting as things are announced.
Oracle Open World 2008, Day 3 : New BI Technology & Keynotes
Now that’s more like it. After I mentioned that there didn’t appear to be much new BI information at this year’s Open World, I went to a session by Matt Bedin on “New Technologies in Oracle BI Suite Enterprise Edition” whilst Thomas Kurian’s keynote was almost all about the new BI and EPM products that have come along in the past year or so. Before that though was my final session, on Data Modeling Techniques using Oracle BI Server, and I have to say that I though this was the most successful of my four sessions. There was a good turnout for the 9am start, the demos went well and there were lots of questions at the end. From my work with clients over the past year, the data modeling aspect of OBIEE together with the role that the BI Apps data warehouse plays in Oracle’s new BI technology stack are the two areas that customers are most confused about, so I wasn’t surprised that this session and the first one I ran on Sunday would be the best attended. Anyway it was a good way to end off my presenting duties, and as usual the slides are uploaded here.
Matt Bedin’s session on New Technologies in OBIEE didn’t introduce anything absolutely new, as in only announced this week, but it did go over the new Essbase Studio application, the new sample application that comes with OBIEE 10.1.3.4, and the integration that’s now possible between OBIEE and (Hyperion) EPM Workspace. This new Workspace integration looks good, and it was good to hear that you don’t need Hyperion licenses to use it (although of course it’s only really relevant if you have other Hyperion products to integrate), which is also the case with Smart View, the strategic product for MS Office integration. It was good to see a demo of OBIEE, Workspace and Smart View working together, and I’m currently downloading all the relevant pieces at the moment so that hopefully I can show this in more detail later in the week.
The other interesting thing yesterday was Thomas Kurian’s keynote, although I did have to sit through a tedious one hour infomercial from Intel beforehand before he came on. What was interesting about Thomas’ keynote was the fact that the first hour or so was all around BI and EPM, starting off with Oracle Data Integrator….

Then moving on to OBIEE and then Essbase…

Followed up by the BI Applications and Smart View.

So, take-aways from Open World so far, from the perspective of BI and EPM? Well, the integration between OBIEE and the Hyperion tools is moving on, with a visual and security integration now possible from the OBIEE 10.1.3.4 / EPM 11.1.1 releases, something you can get up and running now. Essbase and the Hyperion tools are centre-stage for Oracle now, certainly from our (tech) perspective we’re skilled up on Essbase now and looking to extend that out towards planning and Hyperion’s master data management solution. Oracle Data Integrator is obviously the strategic data migration / data integration tool going forward, with ODI and OWB merging their features and their individual product managers now reporting in to the same management. OBIEE 11g itself is no closer, with “within 12 months” still given as the delivery date although I expect the beta release to become available in some form before the end of this calendar year. No brand-new features have been announced this week, although for most attendees the integration between OBIEE and EPM Workspace will be new, I’ll also be interested to see whether many Essbase developers start taking up the opportunity to source at least some of their data from the OBIEE semantic layer, something that’s now possible with the release of Essbase Studio.
Anyway, I’m off to meet Jon Mead for breakfast in the cafe around the corner, then I’m off to the Moscone to hear the session by Adam Bloom and Nick Tuson on process integration using OBIEE. Larry’s keynote is later today, so we’ll also get to hear what all the fuss around “X” is all about.
Oracle Open World 2008, Day 2 : Presentations and Keynotes
Day 2 of Oracle Open World 2008 started off with Charles Phillips and Chuck Rozwat’s keynote setting out, at a high level, all the innovations around the various product lines over the past year. There wasn’t anything around BI in the session, it was mostly around what’s been happing with Fusion Middleware, the database and the applications, I ended up watching it in the OTN Lounge sitting next to Lewis Cunningham as I was still unable to sort out my blogger credentials at the reception desk. I’ll try again later today as it’d be good to get a good seat for the Larry keynote on Wednesday. One thing they did mention though, was that the Oracle database and application server can now be licensed to use in Amazon’s cloud environment; prior to this you could use XE in this environment but not one of the proper products, now you can use this in production and even backup your data to Amazon’s cloud storage.
The BI Roadmap and Strategy talk took place a bit later, for me it was a bit disappointing as for the first time that I’ve been to Open World, there was no product news that I’d not heard before. The main “new” stuff was around the new EPM 11.1 release, how Essbase Studio can be used to bring in OBIEE data, how Smart View is the new strategic direction for Office integration, and a reminder for developers to take a look at the new (extensive) Sample Sales demo that comes with OBIEE 10.1.3.4, which I’ll be covering in more detail in some future blog postings. No new news though, and I guess we’ll end up putting OOW 2008 down as a “fallow” year before all the big product announcements around OBIEE11g next year.
Other than that I went to Bud Endress’ and Marty Gubar’s talk on Cube Organized Materialized Views; I’ve been presenting on this myself for several months and it was good to hear their talk on this, I was also encouraged by the interest shown by the audience in this new feature. If you want to read the low-down in this feature then take a look at this blog post from a few months ago, and I also put together a write-up around the Oracle OLAP / ApEx integration demo that Marty did towards the end of the session if you’re interested in how this works.
My last big event of the day was my session on Automating Business Decisions using Oracle Real Time Decisions (slide here). I had the usual hardware problems just before the session started but things came together in the end, I was also pleased that the audience seemed to get what I was talking about and I had people coming up at the end to ask for more details. The issues with the demo though have got me thinking - given the fact that, with the last two sessions, the only bit that caused issues was getting the demo running, is it best to leave these out and just use slides? For me, having the demo makes the session “authentic”, it shows that the speaker actually knows what he’s talking about and can do what he says he can do, but if it goes wrong then it tends to (a) make you look a bit stupid and (b) cause you to stress out as you’re preparing. Anyone have any thoughts on this? Authentic but with potential for glitches versus slick but less informative? What’s your opinion?
Anyway, I’m just getting myself sorted out for my last session, at 9am in the Moscone West on Data Modeling Techniques using the Oracle BI Server. I’m just checking my demo (I know, I never learn) and hopefully we should get a good audience. After that it’s the ODI Partner Advisory Board, not sure whether what we’ll hear will be confidential but if not, I should be able to blog about the product roadmap afterwards. Then there a session on new technology in OBIEE (so we might actually get some new news after all) and the Thomas Kurian Fusion Middleware keynote, which might perhaps have some new information on where EPM and BI fit into this product category, and what’s going to happen regarding the old BI tools and the fact that they presumably are going to move to Weblogic as their application server. More from me later.
Oracle Open World 2008, Day 1 : “The X is Coming”
I was just walking back from the Thirsty Bear with Tim Hall, when we saw this teaser advert inside the Moscone Center:

Presumably this is the new Database Accelerator that hit the blogs and news sites last week, and that Oracle wouldn’t talk about at the ACE Director briefing today. If you follow the threads on the various postings you’ll get a few ideas - some of the talk is around the filters used to select rows for a query being pushed down to the storage controller hardware, some of it is around Oracle releasing some sort of column-store database, some of it is around a shared-nothing architecture. Who knows, as Oracle are strictly embargoing any news - we’ll have to wait until Larry’s session later in the week to find out more.
Oracle Open World 2008, Day 1 : User Groups and ACE Director Briefing
Today was the first day of Oracle Open World in San Francisco, with most of today’s sessions being organized by the various user groups. For me, as usual on the first day over here, I was up at around 4am, so I used the time to get my demos sorted out for the two sessions I’m running today. I did plan to just use slides and miss out the demos completely, given that they’re the only part you end up stressing about, but in the end I decided to do a couple of demos which I think turned out to be worth the effort.

My first session, at 8.30 am over in Moscone West, was on Extending and Customizing the Oracle BI Applications Data Warehouse (slides here). This was one of the BIWA SIG sessions and I think it went down fairly well. Most of the sessions I’ve been to on the Oracle BI Applications are more on the functional or product marketing side of this, the angle I decided to take for my session was to look at the internals and underpinnings of the BI Apps data warehouse including how the base data model is extended and customized using the Informatica and DAC tools. The turnout was pretty good and there were lots of questions, so I think this went down pretty well. After this first session I rushed back to the Hilton where the ACE Directors product briefing was being held.

Later on in the afternoon was my second session, this time as part of the Hyperion Symposium, organized by Tim Tow and Edward Roske. The title of this session was “Integration Essbase and Oracle Business Intelligence Enterprise Edition” (slides here) with a focus on bringing Essbase data into the OBIEE semantic model, and using the OBIEE semantic model as a data source for Essbase. The demos for this session should have been fairly straightforward except for the fact that I decided to upgrade my copy of VMWare Fusion from 1.0 to 2.0 in the morning, and when I plugged my Essbase virtual machine into my laptop it decided to uninstall all of the virtual drivers and then install new versions, which caused me a bit of consternation as this all started to happen about ten minutes before my session was due to start. In the end it all came together just in time and the session went down well, I hope.
I’m currently sitting in the OTN lounge waiting to catch up with Jon and Borkur, and then later tonight it’s the Blogger meetup at the Thirsty Bear. So no product news at this point then, hopefully we should start to hear some news from the main conference sessions starting tomorrow. One thing I heard is that 11gR2 definitely won’t be announced this week, it’s not even in beta yet, but I’m pretty sure there’ll be a major announcement around the data warehouse / database accelerator product about which the blogosphere got so exited about last week.
Loading OBIEE Data Into Essbase Using Essbase Studio 11.1
I’m currently sitting on a British Airways flight over to San Francisco, getting my demos and presentations sorted out for tomorrow. I’ve got an early start in the morning with an 8.30 slot as part of the IOUG Forum, doing a session on extending and customizing the Oracle BI Applications data warehouse. After that I’ve got another session later in the day, as part of the ODTUG Essbase Symposium, where I’ll be showing off the integration that’s now possible between OBIEE and Essbase 11.1, through the new Essbase Studio IDE. The rationale here is that an Essbase developer might want to take advantage of the data integration and query federation capabilities of OBIEE and load some data from the OBIEE semantic model into their Essbase cube, or alternatively an OBIEE developer might want to improve the query performance of an existing OBIEE business model by extracting the data out of OBIEE, loading it into Essbase and then re-mapping the business model to use the Essbase cube as it’s data source rather than it’s original relational table sources.
Assuming you’ve got some data in OBIEE that you’d like to bring into Essbase, the first step is to define a new data source in Essbase and point it to your OBIEE installation. Essbase Studio uses an HTTP connection over in my case port 9703, which is how it gets around the ODBC issue that Essbase Integration Services has that I blogged about earlier in the year. Once you define your connection you can select from any of the OBIEE presentation models that your OBIEE username and password grants you access to.

Once you create the data source connection, you can create a Schema Diagram that allows you to put in place the necessary table joins, as these won’t come through from OBIEE.

Now that you’ve got the data source defined, the next step is to define your hierarchies, as you would do in the OLAP Model stage when using Essbase Integration Services. Unfortunately at least at this stage the dimensions you’ve set up in OBIEE don’t come through to Essbase Studio, so you need to recreate them again before you can go much further.

You’re now at the point where, in Integration Services, you’d have enough information to create your OLAP Model. In Essbase Studio, the equivalent step is to right-click on the metadata folder and select New > Cube Schema.

When the New Cube Schema comes up, you can select the particular set of regular, calendar and measure (account) hierarchies that will form your cube. In my case, I’ve got regular hierarchies for the customer, product, channel and times tables, and a measure hierarchy to contain the fact table measures.

I then select the option to create a new Essbase Model to go with the Cube Schema, and I’m now at the point where I can fine-tune my dimension member names, by right-clicking on the new model and selecting Essbase Properties.
The Essbase Properties page brings up a similar screen to the one you get when working with an OLAP Metaoutline in Integration Services, and lets you transform dimension member names, add aliases, set dimension aggregation rules and so on. I use this dialog to initially set my cube type to Aggregate Storage (as I still yet can’t find out where the equivalent of “use default calc script” is in Essbase Studio, so this is the only way I can get my cube to aggregate properly), and then I transform the various “Total” columns in my dataset to make sure that their values are unique across the cube.

Once you’ve defined all your transformations, set all your aliases and so on, you can then verify your model and then use the Cube Deployment Wizard to create and load your Essbase cube.

The Cube Deployment Wizard brings up a screen similar to the Member and Data Load screen in Essbase Integration Services, except as I mentioned the various options around aggregate calculation have now gone (the “use default calc script” option). To get around this I defined my cube as using Aggregate Storage which automatically aggregates data on demand, but if anyone knows where this option is gone then let me know by adding a comment to this posting.

Once the cube is created you can then view the outline in Administration Services, like this:

You can also use the Preview function to view the data in the cube, to check that it’s loaded and aggregated correctly.

Other than this, you can then bring the cube back in to OBIEE, so that for example you can replace an original relational data source for your logical model with a multi-dimensional one; you can add aliases to the dimension member names to pick up the descriptions in the logical model; you can add calculations, forecasts and so on using MDX (in the case of ASO cubes) and the Essbase Calculation Script for BSO cubes.
That’s it for me now, we’re landing in about half an hour, then it’s off into San Francisco and the start of Open World week.
Tuning an Oracle 11g Data Warehouse Using the DBConsole Advisors
I’m currently back in Denmark running the first of my Oracle 11g Data Warehousing Masterclasses, today was the first day and I’m currently back in the hotel working through some material for tomorrow. One of the main demonstrations that I went through today was around using the SQL Access Advisor to recommend on indexes, materialized views and partitioning, and one of the comments that I got at the end was that it would be useful if I went through some of the other advisors available in Database Control.
To give this a run through then, and to put it in a data warehousing context, I came up with a couple of scenarios that I’d like to “tune”.
1) We have a number of queries running against the SH schema, in one session serially, that we want to monitor and potentially tune.
2) I’ll repeat the same process again, but run several sessions together all querying the SH schema, to see how well the advisors deal with memory, I/O and concurrency issues.
The idea here is that there are a number of advisors, graphs and other performance dashboard elements on the Performance page of Database Control that can be used by someone to work out what’s going wrong with a data warehouse. I’m assuming here whoever’s using it is not a hardcore database whizz and that they’re looking to get some obvious pointers to what’s going wrong, and how they might fix things.
Taking a look at the Database Control page in question, here’s what you get when you first take a look.

When I took this screenshot I was running a single session that was firing individual, serialized SELECT statements against the SH schema to the database, and you can see from the “Average Active Sessions” graph in the middle that this one session was generating a lot of user I/O wait events (the big blue part of the chart) and CPU wait events (this big green part of the chart). Looking at the graph above, which on Windows shows the CPU utilization and on Unix shows the run queue, I can see that the CPU was being hammered at around 60% for most of the period. So from looking at this, whilst running this set of SELECT statements the major things holding me back were lack of CPU and waiting for disk.
If I click on the User I/O part of the graph to drill into this particular wait event class, I can see what it’s made up of direct path reads (presumably causes by sorts going to disk) and a very small amount of db file sequential reads, which is down to all the indexes that are being accessed by the queries. I’d make a note at this point to see whether I’ve sized my PGA correctly as these sorts being written and then read from disk might be down to the PGA_AGGREGATE_TARGET being set a bit too small.

Staying on that page, I can see that the Active Session History that has captured these details is also listing out the top offenders for this wait class. Clicking on the top one, which accounted for 18.38% of the activity, I can examine it in more detail and see that it is indeed using an order by clause, causing the sorting, and I can also see the SQL itself and the execution plan that it’s currently generating.

Now as the SQL itself is a fairly simple I can’t really see much point in using the “Schedule SQL Tuning Advisor” button to generate some alternative versions of the query, but looking at the diagram of the explain plan below I notice that we’ve got something unusual going on, not something I’d normally expect to see in a star schema data warehouse query. Clicking the “Table” radio button I then take a look at the execution plan and notice that it doesn’t appear to be using any indexes let alone a star transformation, which is strange, but then I remember that I’m running this against a copy of the SH schema where I don’t think I’ve put any indexes or materialized views in place.

Going back to the original Performance Page, I press the Run ADDM Report button to generate an ADDM report against recent activity. Taking a look at the resulting report I can see the two period in which I’d been running my tests (just before 5pm, and just after 7pm), and within the ADDM Performance Analysis section I can see, ordered by priority, the issues that I should take a look at.

The topmost issue is “Top Segments by I/O”, and if I click on the Finding link I see that my SH_COPY.SALES table is highlighted, with 19 full table scans recorded against it whereas I’d normally expect this to be accessed via bitmap indexes and a star transformation. So there’s another clue then.

If I saw this coming up on a tuning report probably the first thing I would do is to try and see if I could access this segment (table) more efficiently, potentially creating some indexes, some materialized views to provide faster access to table aggregations, or partitioning the table so that full table scans only need to access portions of the table. Unfortunately the Segment Advisor that’s available on the same page just recommends that I shrink the table to free up some space, I guess again this points to advisors such as these being more there to help automate the gathering of diagnostic data rather than going off an making the changes themselves. Anyway, back to the Performance page, and this time I’m going to run three sessions in parallel, all running SELECTs against the same tables.
From looking at the Active Average Session chart now, you can see the activity is now much greater than before, because I’m running three sessions in parallel (the fan on my laptop is whirring, as well). Now one thing I’m not sure about is the Maximum CPU line on this graph, it would seem to indicate that my concurrent processes are requiring around 2.5 CPUs to run as against the 1 that I’ve actually got available; also looking at the host chart above I can see that I’m maxing the CPU out at around 100%, which would be logical.

Scrolling down, there is another section with another set of reports, one of which tells DBConsole to display a Top Activity report (top SQL statement), another lists out blockers, and other one lets me create an SQL Tuning Set.

Now as the problem I’m hitting is probably more down to a lack of indexes and summaries in my schema, no amount of SQL tuning is going to fix it and therefore it’s probably time to (a) create an SQL Tuning Set and (b) use the SQL Access Advisor to recommend some indexes and MVs. To do this I click on the SQL Tuning Sets link and create a tuning set based on the activity captures in the Automatic Workload Repository (AWR), the same workload that my previous reports and graphs have been based on.

I can now take this SQL Tuning Set and run the SQL Access Advisor. Taking a look at the recommendation overview, it certainly looks like what it’s going to propose will make a fair bit of difference.

I therefore schedule the recommendations to be run immediately, let the indexes and materialized views be created, and then run the workload again. This time, although the CPU seems to be fairly saturated as I’m running three sessions on a single CPU virtual machine, the waits around disk I/O have gone down, which is probably due to the materialized views that have been created, and what’s more the whole set of SQL took a lot less time to run, again because of the indexes and MVs.

So, there’s a typical data warehouse tuning session using the various advisors in DBConsole 11g. Of course all of these advisors require you to license the Tuning and Diagnostic Pack for the Enterprise Edition of 11g, so you can’t legally use ADDM, AWR and ASH or these advisors unless you’ve got the relevant license. If you have though, it’s quite a neat and interactive way to start tuning the queries and data structures in your database.
Oracle BI Training Days
Over the past few months the team at Rittman Mead have been busy developing new course content and seeking out suitable venues for our first ever public course, the Oracle BI Training Days . This course is a fantastic opportunity to learn more about designing, developing and optimizing business intelligence systems using Oracle’s BI tools.
Unlike other training courses that you may have seen this one is different! With involvement from the whole Rittman Mead team and a course structure that allows you to choose to attend on one, two or all three days this is a course that will take your existing skills to the next level. Although there is a lot of intensive training and knowledge sharing the course is designed to be very interactive; there is also time for networking and discussions - a course meal one evening and some special surprises during the three days. Can’t say anything about the surprises or they won’t be surprises!
One of the great things about the venue is that is is easy to get to - just a short distance from London City Airport, near the London Underground and DLR stations and just across the road from the River Bus pier. Looking out from the conference centre windows it is hard to imagine that you are in one of Europe’s financial hubs!
There is more about the BI Training Days on our courses pages - but a couple of things we don’t say there are: “buy two days and get the third day for free”, the three day price is the same as the two day price… so there is an incentive, as is the Early-Bird discount that is available to the end of September. And the course includes lunch and refreshments each day; overnight accommodation is not provided, but we can make recommendations .
Testing Advanced (OLTP) Compression in Oracle 11g
Another new feature in Oracle 11g of interest to data warehouse developers is Advanced Compression. This new feature extends the capabilities of segment compression first introduced with Oracle 9i, but crucially makes it work for all DML operations, not just direct path inserts, direct path SQL*Loader operations and CREATE TABLE … AS SELECT operations. As such I’ve seen it also referred to as “OLTP Compression” as it makes compression now available to OLTP, transaction applications as well as data warehouses. It’s also (wait for it) a pay-extra option, which means that it’s not just a feature upgrade for traditional segment compression and instead requires you to pay an extra license fee on top of the regular Enterprise Edition license. That aside though, how does it work, and more importantly, given that the old 9i and 10g compression only worked for direct path operations with regular, conventional path operations causing the compression to be lost, does it use a new compression algorithm or is there something clever going on behind the scenes?
To do a quick recap on segment compression prior to Oracle 11g, tables and other segments (except index-organized tables) could be designated as compressed by adding the COMPRESS clause to the CREATE TABLE statement. For partitioned tables partition compression would be inherited from the table or could be designated at a partition level within the create table command. Table compression could be switched on or off after the table or partition was created by using using an ALTER TABLE…MOVE… command. Anyway, creating a compressed table and in this case loading it directly using a CREATE TABLE … AS SELECT statement looked like this, with the command that follows afterwards comparing the size of the new compressed table with the table that provided its data.
SQL> CREATE TABLE comp_customer COMPRESS AS SELECT * FROM copy_customer; Table created. SQL> SQL> SELECT segment_name "Table_Name", bytes / 1024 / 1024 "Megabytes" 2 FROM user_segments 3 WHERE segment_type = 'TABLE' AND segment_name LIKE 'CO%CUST%' 4 ORDER BY segment_name, tablespace_name; Table_Name Megabytes ------------------------------------------------------------- ---------- COMP_CUSTOMER 5 COPY_CUSTOMER 8
You can actually influence the degree of compression by choosing the order in which you load the table; depending on the nature of your data often one columns will give better compression ratios than another, and you also need to factor in the way in which the table is accessed. This presentation by Jeff Moss at a UKOUG Business Intelligence & Reporting Tools SIG Meeting a couple of years ago goes through the factors and how you can come up with the optimal column ordering. Anyway, to show the sort of different this can make, I’ll now create the same table again but this time sort by the customer last name.
SQL> DROP TABLE comp_customer; Table dropped. SQL> CREATE TABLE comp_customer COMPRESS AS SELECT * FROM copy_customer ORDER BY cust_last_name; Table created. SQL> SQL> SELECT segment_name "Table_Name", bytes / 1024 / 1024 "Megabytes" 2 FROM user_segments 3 WHERE segment_type = 'TABLE' AND segment_name LIKE 'CO%CUST%' 4 ORDER BY segment_name, tablespace_name; Table_Name Megabytes ------------------------------------------------------------- ---------- COMP_CUSTOMER 4 COPY_CUSTOMER 8
So now by ordering the compressed table data load by the customer last name, I’ve reduced the space required by the table by 20%.
Of course as I mentioned before, Oracle 9i and 10g compression only works with direct path operations. If, for example, you went and inserted another batch of the same records into the same table but didn’t use a direct path operation, you will find that the new data takes up more or less the same space as the data it was copied from, showing that compression hasn’t been applied.
SQL> INSERT INTO comp_customer SELECT * FROM copy_customer ORDER BY cust_last_name; 55500 rows created. SQL> commit; Commit complete. SQL> SELECT segment_name "Table_Name", bytes / 1024 / 1024 "Megabytes" 2 FROM user_segments 3 WHERE segment_type = 'TABLE' AND segment_name LIKE 'CO%CUST%' 4 ORDER BY segment_name, tablespace_name; Table_Name Megabytes ------------------------------------------------------------- ---------- COMP_CUSTOMER 11 COPY_CUSTOMER 8
It’s even worse for updates, as any data that you update then loses its compression, taking up space and slowing the update down whilst the decompression takes place.
SQL> CREATE TABLE comp_customer COMPRESS AS SELECT * FROM copy_customer ORDER BY cust_last_name; Table created. SQL> SELECT segment_name "Table_Name", bytes / 1024 / 1024 "Megabytes" 2 FROM user_segments 3 WHERE segment_type = 'TABLE' AND segment_name LIKE 'COMP%CUST%' 4 ORDER BY segment_name, tablespace_name; Table_Name Megabytes ------------------------------------------------------------- ---------- COMP_CUSTOMER 4 SQL> UPDATE comp_customer SET cust_year_of_birth=1900; 55500 rows updated. SQL> SELECT segment_name "Table_Name", bytes / 1024 / 1024 "Megabytes" 2 FROM user_segments 3 WHERE segment_type = 'TABLE' AND segment_name LIKE 'COMP%CUST%' 4 ORDER BY segment_name, tablespace_name; Table_Name Megabytes ------------------------------------------------------------- ---------- COMP_CUSTOMER 7
Now it’s this issue with updates removing the compression from tables, that makes compression quite tricky to work with in certain data warehouses; the ones that need to go back and alter data in the fact table, for example, after it’s been loaded. Now if you’ve read papers and presentations by the likes of Tim Gorman, Jeff, even myself you’ll have read references to the general best practice of loading data into the warehouse once, compressing it, backing it up, making the new partitions read-only and thereby reducing the work required to backup your data and store it to disk. In theory this is all well and good, but in practice I often come across customers who for whatever valid reason need to regularly go back and reload fact data into their warehouse, which reduces the value of compression as each update will remove the compression on the blocks that contain the rows affected. It’s for this reason that I was interested when Advanced Compression was announced, as it potentially means that these customers can also use compression (as long as they pay the extra license fee) and not worry about going back and updating their data. They still face the problem that it’s tricky to work out when data is “closed-off” and safe to mark read-only, but at least they won’t need to worry about the disk requirement for their data creeping up and up after it is initially loaded.
So how does this new Advanced Compression feature work then? According to the manuals, my understanding is that it’s not a new “decompression-safe” algorithm that used, what actually happens with advanced compression is this:
1) A table created using the advanced compression feature is initially actually uncompres




