BI & Warehousing
GoldenGate and Oracle Data Integrator – A Perfect Match in 12c… Part 2: Journalizing Knowledge Module
This is the second post in a blog series on the integration between Oracle Data Integrator (ODI) 12c and GoldenGate 12c. The first post focused on the latest, yet-to-be-released, Oracle Information Management Reference Architecture and some high-level features new to the 12c versions of each product. We also began working through the details of the GoldenGate installation and configuration, specifically the JAgent setup, which is necessary for communication between ODI and GoldenGate during the setup of “online” journalizing. In this post, we’ll look at the new features of the Journalizing Knowledge Module “JKM Oracle to Oracle Consistent (OGG Online)” and get started on the modifications to the JKM that must occur for us to load the Raw Data Reservoir (RDR – acronym coined by Nick Hurt in the comments of my first post…I like it!) and Foundation Layer simultaneously.ODI Journalizing
Before I get into editing the JKM, let me briefly go through an introduction of ODI Journalizing, aka Change Data Capture (CDC). In case you missed it earlier, ODI CDC is implemented using a Journalized Knowledge Module. The JKM generates the infrastructure for the CDC, creating journal tables that store the change rows and views that provide access to the change rows for use in ODI Mappings. ODI CDC can be implemented using various capture processes, such as triggers on source tables, timestamps on rows, or mining of the database logs via Oracle Streams or, in our case, Oracle GoldenGate. A great explanation of ODI JKMs and how they work, written by Christophe Dupupet of the Oracle Data Integration A-Team, can be found here.
The integration between GoldenGate and ODI begins with a Journalized Knowledge Module, in our example we’ll be using the “JKM Oracle to Oracle Consistent (OGG Online)”. The JKM, delivered with ODI, will create the GoldenGate parameter files, configure GoldenGate process groups (and start them up), and generate the ODI CDC tables and views. The JKM, added to the source Model in 12c, uses the ODI metadata to generate the GoldenGate parameter file mappings. This alone saves quite a bit of manual work and reduces possible typos caused by human error.JKM Oracle to Oracle Consistent (OGG Online)
In the previous post, I mentioned the new capability of the JKM that allows for an “online” integration between ODI and GoldenGate. But, there are many other new features that need to be described, so we’ll walk through those here.ODI Tool: OdiOggCommand
The JKM uses an undocumented ODI Tool called OdiOggCommand in the target command of some tasks that are executed only when in “online” mode. This tool has different values for a parameter called OPERATION.
EXECUTECMD: executes various commands, such as Add Extract, within GGSCI
EXECUTEOBEY: runs the OBEY command against an obey file in GGSCI
DEFGEN: generates the source definitions file by executing the DEFGEN command
SAVEFILE: uploads the parameter and obey files to the GoldenGate installation directory
I imagine the code behind this ODI Tool is simply executing command line calls to GGSCI, DEFGEN, etc. It would be great to see some Oracle documentation on this one!GoldenGate Topology
The “online” aspect of the JKM requires that a Data Server, Physical Schema and Logical Schema all be setup under the GoldenGate technology in ODI. The Data Server contains the location and connection information for communicating with the JAgent on either the source or target GoldenGate server. Under that, a Physical Schema must be setup for each extract (the pump is included) and replicat process group that will be implemented on that server.
In this screenshot, we have a Physical Schema with the process type set to Capture, also called the extract. Here we set up the directory path on the source to the location where captured transactions from the source database logs will be stored in GoldenGate’s own log files, called trail files. The remote trail file directory, setup on the target server and accessed by the pump process in order to move transactions from the source trail to the target trail, must also be added. Additional options such as trail file size (how large a trail file can get before rolling over to the next file) are also available to be set. Just below the Capture Process Properties are the Additional Options, parameters that can be added to the extract or replicat parameter files based on specific needs of the solution.
Here we can add options to handle different aspects of the extract or replicat, such as TRANLOGOPTIONS on the extract. Once I add this option, I get a template of the additional parameters I can configure. For example, I may want to access my transaction logs stored in ASM via the database rather than directly from ASM. The DBLOGREADER option lets me utilize the ASM API in the database, simplifying my access to the logs.
TRANLOGOPTIONS DBLOGREADER, DBLOGREADERBUFSIZE 2597152
When I add the additional option to the capture physical schema, it will be generated as a part of the extract parameter file. This helps to drastically reduce the amount of manual editing that had to occur after the parameter files were generated in the 11g version, and is a great addition to the JKM.JKM Setup and Options
The JKM is actually applied to the source ODI Model, which is a grouping of logical table structures called Datastores. On the Journalizing tab in the Model, we first set the type (Consistent Set) and choose the appropriate JKM.
In the GoldenGate Process Selection section, we choose the Capture and Delivery Logical Schemas that were setup in the ODI Topology. If none have been created, but the Data Server for the source and target GoldenGate installations exist, we can choose to create the Physical and Logical Schemas directly from the Model by clicking the Create button. This is a nifty way to separate the system administration role (setting up the Data Server) from the metadata management or developer role.
The JKM also has a set of Options with configurable values, some being required and others optional. The number of Options has been reduced by quite a bit in the 12c JKM. This makes sense, as we’ve seen that additional metadata such as the trail file location and trail file size are set elsewhere. We’ll go through these in more detail later on when setting the values in our example.
That covers the updated features in the JKM, now let’s talk Knowledge Module customization.JKM Customization
In the first post, I described how we want to extract the source data once and replicate it into both the Foundation Layer and Raw Data Reservoir in parallel. The JKM is set up to load the ODI CDC Framework (the fully replicated table and J$ table) out of the box, but not the Foundation table. In case you’re unfamiliar with the purpose of the Foundation layer, the idea is to store all transactional history from the source tables by converting every transaction into an insert, and tracking the type of change (insert / update / delete), commit date, and commit SCN. With this information stored from the beginning of the data warehouse, it can be used for historical drill-throughs from the dimensional model, or to completely reload a star schema – including all history. With the stage set, let’s look at what we’re going to change in the JKM.
1. Add New Option “APPLY_FOUNDATION”
This option, when true, will allow the Start Journal process to generate the source-to-foundation mapping statement in the Replicat (apply) process.
2. Add New Option “FND_LSCHEMA”
The Logical Schema name for the Foundation layer schema.
3. Add New Task “Create apply prm (4) RM”
This task will create the source-to-foundation mapping code and add it to the replicat parameter file.
4. Set Option on “Create apply prm (4) RM” Task
Set the execution Options to have APPLY_FOUNDATION as the only checked option. This will determine whether or not that task will execute when Start Journal is run.
5. Edit Task “Execute apply commands online RM”
We’ll comment out the “start replicat …” command, as we’ll first need to complete an initial load of the source data to the target.
Now, I did say I would get into the editing of the JKM in this post, but after describing the new features of the Knowledge Module, I’ll save the actual details for the part 3 of the series! Up next, editing the JKM and building the necessary ODI objects so we can start journalizing.
Back in November, I checked to see the frequency of having incremental numbers in the date, like 11/12/13 (my birthday) and 12/13/14 (kate's birthday). I don't want to hear how the rest of the world does their dates either, I know (I now write my dates like YYYY/MM/DD on everything, just so you know, that way I can sort it...or something).
Anyway, SQL to test out the claim of once every 823 years. Yay SQL.
OK, I'm not going to go into the steps necessary because I'm lazy (and I'm just lucky to be writing here), so here it is:
select *So over the next 50,000 days, this happens 138 times. I'm fairly certain that doesn't rise to the once every 823 years claim. But it's cool, maybe.
to_char( d, 'yyyymm' ) year_month,
when to_char( d, 'fmDay' ) = 'Saturday' then 1
end ) sats,
when to_char( d, 'fmDay' ) = 'Sunday' then 1
end ) suns,
when to_char( d, 'fmDay' ) = 'Friday' then 1
end ) fris
select to_date( 20131231, 'yyyymmdd' ) + rownum d
connect by level <= 50000
to_char( d, 'yyyymm' )
where fris = 5
and sats = 5
and suns = 5
YEAR_MONTH SATS SUNS FRISI'm not the only dork that does this either, here's one in perl. I'm sure there are others, but again, I'm lazy.
---------- ---------- ---------- ----------
201408 5 5 5
201505 5 5 5
201601 5 5 5
201607 5 5 5
201712 5 5 5
128 more occurrences...
214607 5 5 5
214712 5 5 5
214803 5 5 5
214908 5 5 5
215005 5 5 5
138 rows selected
Most OBIEE projects that we see are delivered through some sort of “waterfall” method, where requirements are defined up-front, there’s several stages of development, one or more major releases at the end, and any revision to requirements takes the form of a change request. These work well where requirements can be defined upfront, and can be reassuring to customers when they want to agree a fixed-price up-front with every subsequent change clearly costed. But, as with the development world in general, some customers are starting to look at “agile” methods for delivering BI projects, where requirements emerge over the course of a project, there isn’t so much of a fixed design or specification at the start, but instead the project adds features or capabilities in response to what are called “user stories”, making it more likely in-the-end that what ends-up getting delivered is more in-line with what users want – and where changes and additions to requirements are welcomed, rather than extra-cost change requests.
OBIEE naturally lends itself to working in an agile manner, through the three-layer nature of the repository (RPD); by separating the physical representation of the source data from how it is then presented to the end-users, you can start from the off with the dimensional model that’s your end goal, and then over time evolve the back-end physical layer from pointing directly at the source system to instead point at a data warehouse or OLAP cube. In fact, I covered this approach back in 2008 in a blog post called “A Future Oracle OBIEE Architecture” where I positioned OBIEE’s BI Server as a “business logic layer”, and speculated that at some point in the future, OBIEE might be able to turn the logical > physical mappings in the RPD into actual ODI mappings and transformation.
In the end, although OBIEE’s aggregate persistence feature gave us the ability to spin-off aggregate tables and cubes from the RPD logical model, full ETL “push-down” never came although you can see traces of it if you have a good poke around the DLLs and directories under the BI Server component. What did happen though was Exadata; with Exadata, features such as SmartScan, and its ability to do joins between normalised tables much faster than regular databases meant that it became possible to report directly against an OLTP schema, or a ODS-like foundation layer, only adding ETL to build a performance star schema layer if it was absolutely necessary. We covered this in a series of posts on Agile Data Warehousing with Exadata, and the focus of this method was performance – by adding Exadata, and the metadata flexibility in OBIEE’s RPD, we could deliver agile projects where Exadata gave us the performance even when we reported directly against a third-normal form data source.
And this approach worked well for our customers; if they’d invested in Exadata, and were open to the idea of agile, iterative development, we could typically deliver a working system in just a few months, and at all times what the users got was what they’d requested in their user story backlog. But there were still ways in which we could improve this method; not everyone has access to Exadata, for example, and reporting directly against a source system makes it tricky to add DW features like history, and surrogate keys, so recently we introduced the successor to this approach, in the form of an OBIEE development method we called “ExtremeBI”. Building our previous agile work, ExtremeBI introduced an integration element, using GoldenGate and ODI to replicate in real time any source systems we were interested in to the DW foundation layer, add the table metadata that DW systems expect, and then provide a means to transform the logical to physical RPD mappings into ODI ETL specifications.
But in a way, all the technical stuff is by-the-by; what this means in practice for customers is that we deliver working systems from the first iteration; initially, by reporting directly against a replicated copy of their source system (with replication and metadata enhancement by GoldenGate, and optionally ODI),and then over subsequent iterations adding more end-user functionality, OR hardened ODI ETL code, all the while driven by end-user stories and not some technical design signed-off months ago and which no longer reflects what users actually want.
What we’ve found though from several ExtremeBI customer engagements, is that it’s not just down to the technology and how well ODI, OBIEE and GoldenGate work; the major factors in successful projects are firstly, having the project properly pre-qualified at the start; not every project, and not every client, suits agile working, and agile works best if you’re “all in” as opposed to just agreeing to work in sprints but still having a set-in-stone set of requirements which have to be met at a certain time. The second important success factor is proper project organisation; we’ve grown from just a couple of guys with laptops back in 2007 to a fully-fledged, end-to-end development organisation, with full-time delivery managers,a managed services desk and tools such as JIRA, and you need to have this sort of thing in place, particularly a project management structure that’s agile-friendly and a good relationship with the customer where they’re fully-signed up to the agile approach. As such, we’ve found the most success where we’ve used ExtremeBI for fairly technically-savvy customers, for example a MIS department, who’ve been tasked with delivering something for reasonable price and over a short amount of months, who understand that not all requirements can be delivered, but really want their system to get adopted, delight their customer and focus its features on what’s important to end-users.
As well as processes and a method, we’ve also developed utilities and accelerators to help speed-up the initial setup, and ensure the initial foundation and staging layers are built consistently, with GoldenGate mappings already put in place, and ready for our developers to start delivering reports against the foundation layer, or use these foundation-layer tables as the basis of a data mart or warehouse build-out. The screenshot below shows this particular tool, built using Groovy and run from within the ODI Studio user interface, where the developer selects a set of source tables from an ODI model, and then the utility builds out the staging and foundation layers automatically, typically saving days over the manual method.
We’ve also built custom KMs for ExtremeBI, including one that uses Oracle Database’s flashback query feature to pull historical transactions from the UNDO log, as an alternative to Oracle Streams or Oracle GoldenGate when these aren’t available on the project.
All together, using Rittman Mead’s ExtremeBI method along with OBIEE, ODI and optionally GoldenGate has meant we’ve been able to deliver working OBIEE systems for customers over just a few months, typically for a budget less than £50k. Coupled with cloud hosting, where we can get the customer up-and-running immediately rather than having to wait for their IT department to provision servers, we think this the best way for most OBIEE11g projects to be delivered in the future. If you’re interested, we’ve got more details on our “ExtremeBI in the Cloud” web page, or you can contact me via email – email@example.com – if you’d like to discuss it more,
Found a very good paper: http://research.microsoft.com/pubs/204499/a20-appuswamy.pdf
This paper discuss if it is a right approach of using Hadoop as the analytics infrastructure.
It is hard to argue with the industry trend. However, Hadoop is not
new any more. It is time for people to calm down and rethink about the