Top Ten New Data Warehousing Features In Oracle Database 10g
With the recent release of Oracle Database 10g, OTN have recently been running a series of articles by Arup Nanda entitled "Oracle Database 10g: The Top 20 Features for DBAs".
With this in mind, I thought it worth putting something similar but this time focusing on those features that apply to Data Warehousing and Business Intelligence. So here's my Top 10...
1. The SQL Model Clause
Probably the new data warehousing feature in Oracle Database 10g that has received the most attention is the SQL Model Clause.
The SQL Model clause allows users to embed 'spreadsheet-like' models in a SELECT statement, in a way that was previously the domain of dedicated multidimensional OLAP servers such as Oracle Express and Oracle 9i OLAP. The SQL Model clause brings an entirely new dimension to Oracle analytical queries and addresses a number of traditional shortcomings with the way SQL normally works.
The SQL Model clause has been designed to address the sort of situation where, in the past, clients have taken data out of relational databases and imported it into a model held in a spreadsheet such as Microsoft Excel. Often, these models involve a series of macros that aggregate data over a number of business dimensions, over varying time periods, and following a set of complex business rules that would be difficult to express as normal SQL. I've worked on many a client engagement where the limitations of SQL meant that a number of standalone Excel spreadsheets had to be used, and whilst these gave the client the analytical capabilities they required, the usual issues of scalability, reliability of replicated data, and lack of overall control often became apparent after a while.
The aim of the SQL Model clause is to give normal SQL statements the ability to create a multidimensional array from the results of a normal SELECT statement, carry out any number of interdependent inter-row and inter-array calculations on this array, and then update the base tables with the results of the model. An example SQL statement using the MODEL clause would look like;
SELECT SUBSTR(country,1,20) country, SUBSTR(prod,1,15) prod, year, sales FROM sales_view WHERE country IN ('Italy','Japan') MODEL RETURN UPDATED ROWS PARTITION BY (country) DIMENSION BY (prod, year) MEASURES (sale sales) RULES ( sales['Bounce', 2002] = sales['Bounce', 2001] + sales['Bounce', 2000], sales['Y Box', 2002] = sales['Y Box', 2001], sales['2_Products', 2002] = sales['Bounce', 2002] + sales['Y Box', 2002]) ORDER BY country, prod, year;
2. The SQLAccess Adviser
Part of the Oracle Database 10g Server Manageability Feature, the SQLAccess Adviser is a new feature that recommends the best combination of indexes and materialized views for a given database workload. Available either at the command line (via the DBMS_ADVISOR package) or through the Advisor Central element of the new web-based Oracle Enterprise Manager, the SQLAccess Adviser is based on the index and summary advisors previously bundled with Oracle 9i and provides a 'one-stop-shop' for tuning and summarising your warehouse data.
3. Improvements To The Multidimensional OLAP Engine
With Oracle 9i, the previously standalone Express multidimensional engine was now incorporated into the Oracle database, and with Oracle database 10g, benefits of integration with the traditional relational Oracle engine are starting to become apparent.
First up is improvements to the way large Analytic Workspaces can be partitioned, introducing into the Oracle OLAP world some of the advanced partitioning options currently enjoyed by Oracle database users. Currently, Analytic Workspaces, stored as AW$ tables within an Oracle schema, can be partitioned across multiple rows in the AW$ table by specifying a maximum segment size, allowing you to split an individual analytic workspace into (say) 10Gb segments, one in each table row. This table could then be partitioned just like any other Oracle table, allowing you to put one row in one tablespace, another in another, and each of these tablespaces could of course be stored in datafiles on different physical disk units. Although this was of some benefit, splitting by segment size was the only way of partitioning the data, and you couldn't specify what objects within the analytic workspace went in to each partition. Oracle 10g OLAP will now include an enhancement where you can specify exactly which objects within the analytic workspace go in to each partition, and you can further subdivide this by segment size if objects are particularly large.
In a similar fashion, variables within the analytic workspace can now be partitioned, either by range of dimension members, a list of dimension members, or by reference to a CONCAT dimension. The 10g multidimensional engine then stores each variable partition as a separate physical object, which can be directed to separate rows in the AW$ table (allowing you to partition these across different tablespaces and physical disk drives); the variable however appears as just one object to the application, simplifying the data model and allowing Oracle to do all the 'hard work' in the background.
Another excellent new feature, and a real improvement over what was available with Express, is support for multi-user read-write access to individual analytic workspaces. In the past, one drawback with Express was that only one user could attach to an Express database in read-write mode, leading Express developers to develop a whole range of alternative solutions to allow ad-hoc write access to Express databases. In Oracle 10g OLAP, analytic workspaces can be attached in MULTI mode, whereafter applications then ACQUIRE individual variables in the analytic workspace for read-write access. Once an object has been acquired (and locked by the Oracle multidimensional engine), updates can then take place and the application can make whatever modifications are necessary. After all changes have been made, the UPDATE command is issued against the variable, followed by a COMMIT, and then a RELEASE command is issued against the variable to make it available for other applications to write to. It'll be interesting to see how the multidimensional engine handles multi-write access; in the past, with Express, databases could balloon in size when one user had read-write access to a database, and others were accessing it in 'read' mode, as Express had to clone the database for each user to ensure that they had a consistent view of the data; I wouldn't be surprised if individual variables were copied out of a 10g analytic workspace into a temporary workspace whilst updates happened, with updates being propagated back (as with the old Express Excel Add-In) when the changes are finally COMMITted - the key thing here is how database size is dealt with as the old Express way of doing it was less than optimal.
Aggregation has been improved with Oracle 10g OLAP, with formulas now allowed as sources of data for the AGGREGATE command, eliminating the need to calculate and store data at the detail level. Aggregation, particularly dynamic aggregation, is another area where Oracle 9i and now 10g OLAP are a distinct improvement over Express and it's well worth looking at this area in more detail if this is an issue with an existing Express system.
More details on Oracle 10g OLAP's new features can be found in the OLAP Application Developers' Guide, the OLAP DML Reference, and this white paper by Bud Endress.
4. The Tune MView Advisor, and improvements to Query Rewrite
Query Rewrite (the ability for Oracle to transparently redirect queries from detail level to summary tables) is one of the best data warehousing features in Oracle 8i and 9i, but it's sometimes a bit temperamental and you can often find that queries don't actually get rewritten. Sometimes this is because you've broken one of the Query Rewrite restrictions, sometimes it's because your materialized view doesn't contain the correct columns and aggregates. Oracle 10g has a number of improvements to Query Rewrite and the materialized view tuning process that should make this process a bit more productive.
With Oracle Database 10g, query rewrite is now possible when your SELECT statement contains analytic functions, full outer joins, and set operations such as UNION, MINUS and INTERSECT. In addition, you can now use a hint, /*+ REWRITE_OR_ERROR */, which will stop the execution of a SQL statement if query rewrite cannot occur.
SQL> SELECT /*+ REWRITE_OR_ERROR */ 2 s.prod_id, 3 sum(s.quantity_sold) 4 FROM sales s 5 GROUP BY s.prod_id; FROM sales s * ERROR at line 4: ORA-30393: a query block in the statement did not rewrite
Oracle 9i came with two packages, DBMS_MVIEW.EXPLAIN_MVIEW and DBMS_MVIEW.EXPLAIN_REWRITE that could be used to diagnose why a materialized view wasn't being used for query rewrite. However, although these packages told you why rewrite hadn't happened, they left it down to you to work out how to alter your CREATE MATERIALIZED VIEW statement to ensure that rewrite happened correctly. Oracle Database 10g comes with a new advisor package, DBMS_ADVISOR.TUNE_MVIEW, that takes as its input a CREATE MATERIALIZED VIEW DML statement, and outputs a corrected version that supports query rewrite and features such as fast refresh.
More details on Query Rewrite improvements and the Tune MView Adviser can be found in this presentation by Lillian Hobbs (password 'presentation'), the TUNE_MVIEW online documentation, the Query Rewrite online documents and this Oracle Database 10g Oracle-by-Example tutorial.
5. Data Pump, The Replacement For Import and Export
Data Pump is a replacement for the venerable IMP and EXP applications used for creating logical backups of Oracle tables, schemas or databases. Data Pump is a server application (as opposed to IMP and EXP, which were client applications) which, in beta testing, was twice as fast as the old EXP for exporting data, and ten times as fast as the old IMP for importing data. Data Pump is callable either through the DBMS_DATAPUMP package, through the replacements for IMP and EXP, known as IMPDP and EXPDP, or through a wizard delivered as part of Oracle Enterprise Manager 10g.
Data Pump (and the new IMPDP and EXPDP applications) offer a number of improvements over the old IMPORT and EXPORT, including resumable/restartable jobs, automatic two-level parallelism, a 'network mode' that uses DBLINKs/listener service names instead of pipes, fine-grained object selection (so you can select individual tables, view, packages, indexes and so on for import or export, not just tables or schemas as with IMPORT and EXPORT), and a fully callable API that allows Data Pump functionality to be embedded in third-party ETL packages.
Automatic Storage Management (ASM) is one of the 'cool new features' in Oracle 10g that is meant to reduce the workload for Oracle DBAs. ASM completely automates the process of creating logical volumes, file systems and filenames, with the DBA only specifying the location of raw disks and ASM doing the rest. Disk I/O is managed by evenly distributing the data across blocks within a disk group, with ASM in addition handling disk mirroring and the creation of mirror groups and failure groups.
ASM deals with the problems caused by rapidly expanding data warehouses, where administators can no longer deal with the sheer number of disk units, nodes and logical groupings, and is a key feature of the Oracle 10g Grid Architecture, which aims to 'virtualize' computing power and present database features like processing and storage as utilities that effectively manage themselves.
More details on Automatic Storage Management can be found in this OTN article by Lannes Morris-Murphy, this Oracle-by-Example tutorial (which comes with a number of demonstration viewlets), and the online documentation.
7. Faster Full Table Scans
Full Table Scans are common in data warehousing environments, and with this in mind table scan performance has been improved in Oracle 10g. Code optimisation in Oracle Database 10g has decreased CPU consumption and this leads to faster table scan execution (when queries are CPU bound, rather than I/O bound), and gives the potential for greater query concurrency, offering up to 30-40% speed improvements when comparing CPU-bound queries.
More details on improvements to full table scans can be found in this OTN white paper, which also details improvements in the handling of large numbers of partitions, improvements to the optimizer CPU cost model, automatic statistics gathering, and partition-aware materialized view refreshes.
8. Automatic Tuning And Maintenance
Automatic maintenance and tuning has always been one of the key product differentiators for Microsoft SQL Server and with Oracle 10g, features that meet and match those found in competitor products are being introduced to the server technology stack.
Surveys show that over 50% of a DBAs time is spend tuning and monitoring the database server, a task that whilst important is often complex and difficult to get exactly right. With Oracle Database 10g, Oracle have introduced a number of components that together make it possible for the database server to monitor itself, make intelligent changes to configuration, and alert DBAs when situations arise that need manual intervention.
The first component in this framework is the Automatic Workload Repository, which uses an enhanced version of Statspack to collect instance statistics every thirty minutes and stores these for a rolling seven day period. This enhanced version of Statspack now collects a broader range of statistics and has a number of optimizations to streamline the way high-cost SQL statements are captured, ensuring that only SQL activity that has significantly affected performance since the last snapshot are collected. The usage information stored in the Automatic Workload Repository is then used as the basis for all the self-management functionality in Oracle Database 10g.
Next up is the Automatic Maintenance Tasks feature, which acts on the statistics gathered by the Automatic Workload Repository, and carries out tasks such as index rebuilding, refreshing statistics, and so on, where such tasks don't require any manual intervention by the DBA. A new scheduling feature known as 'Unified Scheduler' runs these tasks during a predefined 'maintenance window', set by default to be between 10pm and 6am the next day, although these times can be customized to reduce impact on other tasks (such as batch loads) that might be taking place.
The third component of the self-managing framework is 'Server Generated Alerts', a method where the database server sends notifications via email to the DBA - including a recommendation as to how best to deal with the situation. Alerts will normally be raised where the database itself cannot deal with the situation that has arisen, such as when there is insufficient space on a disk unit to extend a datafile.
Lastly, and perhaps the most exiting of all the self-managing component frameworks, is the Automatic Database Diagnostic Monitor. This component analyzes the data captured in the Automatic Workload Repository and uses an artificial intelligence algorithm, similar to that found in Oracle Expert, to analyze areas such as lock contention, CPU bottlenecks, I/O usage and contention, issues with checkpointing and so on, in much the same way that a DBA would currently do by analyzing statspack reports.
More details on the self-tuning and maintenance features in Oracle 10g can be found in this Oracleworld white paper by Sushil Kumar.
9. Asynchronous Change Data Capture
Oracle Change Data Capture was introduced with Oracle 9i, and provided the ability to track changes to tables and store them in a change table, for further consumption by an ETL process. Oracle 9i Change Data Capture worked by creating triggers on the source tables, transferring data synchronously but creating a processing overhead and requiring access to the structure of the source tables. Because of the effect that the triggers had on the underlying tables, many warehouse projects did without change data capture and used other methods to capture changes.
Oracle 10g introduces Asynchronous Change Data Capture, which instead of using triggers uses the database log files to capture changes and apply them to collection tables. Asynchronous Change Data Capture therefore doesn't require changes to the table structure and doesn't impact on database performance.
More details on Asynchronous Change Data Capture can be found in this OTN white paper, which also details improvements to the transportable tablespaces feature in Oracle Database 10g (tablespaces no longer need to be sourced from the same O/S platform).
10. Improvements To Oracle Data Mining
Alongside the inclusion of the Oracle Express multidimensional OLAP engine, Oracle 9i also embedded data mining functionality in the database together, and this data mining functionality has been enhanced with Oracle Database 10g. Oracle Database 10g adds support for two new classification routines, 'Support Vector Machine' (used for top-down rather than a bottom-up calculations, assuming the best possible fit and then working backwards to what can be achieved) and Non-Negative Matrix Factorisation, together with support for Frequent Itemsets, used for such functions as market basket analysis and propensity analysis.
More details on what's new with Oracle 10g Data Mining can be found in the online documentation.
Mark Rittman is a Certified Oracle Professional DBA and works as a Consulting Manager at Plus Consultancy, specializing in developing BI & DW applications using the Oracle Database, Application Server, Discoverer, Warehouse Builder and Oracle OLAP. Outside of Plus Consultancy, Mark chairs the UKOUG BI & Reporting Tools SIG, and runs a weblog dedicated to Oracle BI & Data Warehousing technology.
Mark recently was awarded an OTN Community Award for contributions to the Oracle developer community, and is a regular speaker at Oracle User events in both the UK and Europe.