Javascript Driven ADF Taskflows for WebCenter Portal

This is a continuation from my previous post - Developing WebCenter Content Cross Platform iDoc Enabled Components for Mobile, ADF, Sharepoint, Liferay.

You can see a video of the JIVE Forums integration, comparing a JS taskflow with an ADF taskflow running in WebCenter Portal, here -

This post is aimed at web developers, designers and marketing web teams who aren’t familiar with ADF and want to create reusable, dynamic taskflows without having to learn ADF or Java. The interactive regions are built with Javascript, HTML and CSS, using custom frameworks such as jQuery that are set up so they do not conflict with the ADF JS environment.

Read on for a step-by-step run-through of creating JS driven taskflows -

    1. You will need to download JDeveloper – I’m using JDev 11.1.1.7.0 for WebCenter Portal 11g where I will deploy my custom taskflow driven entirely with Javascript.
    2. Run through the following Oracle guide to set up your project to extend Portal (11.1.1.8.3) - Developing Components for WebCenter Portal Using JDeveloper
    3. Add new taskflow to library by right-clicking WebCenterSpacesExtensions and selecting “New…”
    4. Add ADF Task Flow (JSF)
    5. Name the xml file, leaving the Directory the JDev default
    6. Double click the new xml file and drag a View element into the diagram from the Component Palette
    7. Rename “view1” to “[taskflow name]View”.
    8. Double click the new view to create a page fragment.
      Update the directory and add \taskflows\[taskflow name]\view
      This will make it easier to sort through in the future when you develop more taskflows.
    9. Edit the JSFF and display code in source view.
    10. Replace with the following -
      <?xml version='1.0' encoding='UTF-8'?>
      <jsp:root xmlns:jsp="http://java.sun.com/JSP/Page" version="2.1"
                xmlns:af="http://xmlns.oracle.com/adf/faces/rich"
                xmlns:f="http://java.sun.com/jsf/core">
      <af:resource type="javascript">
      <![CDATA[
      /**
       * CREATE BASE JS CONTAINER OBJ
       * This is the base class that lets PSA javascript methods init after the page has loaded.
       * You can add this script in the head of your template instead of the portlet.
       */
      var FB = window.FB || {},
      	Base = Base || (function() {
      		return {
      			//create multi-cast delegate.
      			onPortalInit: function(function1, function2) {
      				return function() {
      					if (function1) {
      						function1();
      					}
      					if (function2) {
      						function2();
      					}
      				}
      			},
      			//used for chaining methods
      			chainPSA: function() {}
      		}
      	})();
      
      //Use Base method if FB.Base hasn't been created
      FB.Base = FB.Base || Base;
      /************************/
      
      
      
      
      /**
       * CREATE CHAIN WRAPPER
       * Chain method will initialise from Base requirejs core script
       */
      FB.Base.chainPSA = FB.Base.onPortalInit(FB.Base.chainPSA, function() {
      	//set base mustache template name to load and inject
      	var vUID = 'FB_sampleContainer_${pageFlowScope.containerID}', //(UID) Unique Classname to inject template into - can't use IDs in portal 
      		oConstructor = {
      			vTemplate: 		'import/tpl/sampleTpl', //location of sampleTpl.mustache to load
      			oParams: { //Obj list of default params pulled from sample.xml Input definition
      				title:			'${pageFlowScope.title}',
      				displayTitle: 	'${pageFlowScope.displayTitle}',
      				activeUser: 	'${pageFlowScope.activeUser}'
      			},
      			containerID: 		vUID
      		};
      	
      	//check if array exists from other custom JS Portlets
      	if (typeof(FB.loadTemplate) === 'object') {
      		FB.loadTemplate.portletUIDList.push(vUID);
      	//create empty object
      	} else {
      		FB.loadTemplate = {
      			portletUIDList:[vUID],
      			portlets: {}
      		};
      	}
      
      	//inject params
      	FB.loadTemplate.portlets[vUID] = oConstructor;
      
      });
      /************************/
      ]]>
      </af:resource>
      
      
      <!-- Sample template will be injected here -->
      <af:panelGroupLayout layout="vertical" id="FB-SampleContainer" styleClass="FB_sampleContainer_#{pageFlowScope.containerID} portlet-sampleContainer"></af:panelGroupLayout>
      <!-- xSample template will be injected here -->
      
      
      </jsp:root>

      OVERVIEW:

      This is where the mustache template will be injected to provide the sample component functionality.

      <af:panelGroupLayout layout="vertical" id="FB-SampleContainer" styleClass="FB_sampleContainer_#{pageFlowScope.containerID} portlet-sampleContainer"></af:panelGroupLayout>

      The oConstructor specifies the configuration of the component to inject.
      vTemplate points to a JS file that RequireJS imports; it configures the sample component from the params defined.

      oParams contains all configuration for the app. At the moment these are scoped params associated with the taskflow, which you can allow the user to define and then use within your sample component as JS vars.

      var vUID = 'FB_sampleContainer_${pageFlowScope.containerID}', //(UID) Unique Classname to inject template into - can't use IDs in portal 
      		oConstructor = {
      			vTemplate: 		'import/tpl/sampleTpl', //location of sampleTpl.mustache to load
      			oParams: { //Obj list of default params pulled from sample.xml Input definition
      				title:			'${pageFlowScope.title}',
      				displayTitle: 	'${pageFlowScope.displayTitle}',
      				activeUser: 	'${pageFlowScope.activeUser}'
      			},
      			containerID: 		vUID
      		};
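Because the `${pageFlowScope...}` expressions sit inside quoted JS strings, every taskflow parameter reaches the template as a string, and an unset parameter presumably arrives as an empty string. A minimal sketch of normalising those values before use; `normaliseParams` and the default values are hypothetical and not part of the original framework:

```javascript
// Hypothetical helper: not part of the original framework.
// Converts the string values produced by the EL expressions into typed values.
function normaliseParams(oParams, oDefaults) {
	var oResult = {}, key, value;
	for (key in oDefaults) {
		if (Object.prototype.hasOwnProperty.call(oDefaults, key)) {
			value = oParams[key];
			if (typeof value !== 'string' || value === '') {
				// parameter missing or left empty: fall back to the default
				oResult[key] = oDefaults[key];
			} else if (typeof oDefaults[key] === 'boolean') {
				// 'true' / 'false' strings become real booleans
				oResult[key] = (value.toLowerCase() === 'true');
			} else {
				oResult[key] = value;
			}
		}
	}
	return oResult;
}

// Example: values as they might look once the EL has been evaluated
var oSampleParams = normaliseParams(
	{ title: 'Team Forum', displayTitle: 'false', activeUser: '' },
	{ title: 'Untitled', displayTitle: true, activeUser: 'anonymous' }
);
```

Doing this once inside the template's init keeps checks such as `if (oParams.displayTitle)` from treating the string 'false' as truthy.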

      A simple check sees whether other components already exist on the page and appends the new component to the JS array “portletUIDList”; the component params are stored in the associated JS object “portlets”.

      //check if array exists from other custom JS Portlets
      	if (typeof(FB.loadTemplate) === 'object') {
      		FB.loadTemplate.portletUIDList.push(vUID);
      	//create empty object
      	} else {
      		FB.loadTemplate = {
      			portletUIDList:[vUID],
      			portlets: {}
      		};
      	}
      
      	//inject params
      	FB.loadTemplate.portlets[vUID] = oConstructor;
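If several JS driven taskflows share a page, the same create-or-append check is repeated in every fragment. One way to fold it into a single call is sketched below; `registerPortlet` is a hypothetical helper, and the duplicate check is an added assumption rather than something the original code does:

```javascript
// Hypothetical helper (not in the original code) wrapping the
// "create or append" check so each taskflow registers itself in one call.
function registerPortlet(oNamespace, vUID, oConstructor) {
	if (typeof oNamespace.loadTemplate !== 'object' || oNamespace.loadTemplate === null) {
		// first JS driven taskflow on the page: create the registry
		oNamespace.loadTemplate = { portletUIDList: [], portlets: {} };
	}
	// assumption: avoid listing the same container twice
	if (oNamespace.loadTemplate.portletUIDList.indexOf(vUID) === -1) {
		oNamespace.loadTemplate.portletUIDList.push(vUID);
	}
	oNamespace.loadTemplate.portlets[vUID] = oConstructor;
	return oNamespace.loadTemplate;
}

// Example with a stand-in namespace object instead of the global FB
var oDemoFB = {};
registerPortlet(oDemoFB, 'FB_sampleContainer_a', {
	vTemplate: 'import/tpl/sampleTpl',
	containerID: 'FB_sampleContainer_a'
});
registerPortlet(oDemoFB, 'FB_sampleContainer_b', {
	vTemplate: 'import/tpl/sampleTpl',
	containerID: 'FB_sampleContainer_b'
});
```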

      Finally, the JS configuration is wrapped in a JS chain wrapper that will only initialise when RequireJS has loaded all its core base libraries, such as jQuery.

      FB.Base.chainPSA = FB.Base.onPortalInit(FB.Base.chainPSA, function() {
      
      //code
      
      });
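To see why this wrapper lets any number of taskflows queue themselves up, here is the same multi-cast delegate run in isolation. The function body matches onPortalInit from the fragment; `chain` and `aCalls` are stand-in names for illustration only:

```javascript
// Same multi-cast delegate as onPortalInit in the page fragment
function onPortalInit(function1, function2) {
	return function() {
		if (function1) { function1(); }
		if (function2) { function2(); }
	};
}

var aCalls = [];
var chain = function() {};   // starts as an empty function, like chainPSA

// each taskflow on the page wraps the existing chain with its own callback
chain = onPortalInit(chain, function() { aCalls.push('taskflow A'); });
chain = onPortalInit(chain, function() { aCalls.push('taskflow B'); });

// nothing has run so far; the page template calls the chain once,
// after the core libraries are ready
chain();
// aCalls is now ['taskflow A', 'taskflow B'] (registration order)
```

Each wrap returns a new function that calls the previous chain first, so callbacks run in the order the taskflows appear on the page.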

      Make sure that within your ADF template you have set up the RequireJS core and have the following to initialise FB.Base.chainPSA and loop through the custom taskflows to display on the page -

      //load JS Components
      		if (FB.Base.chainPSA) {
      			FB.Base.chainPSA();
      		}

      //loop and request all templates required
      			for (x;x<lPortletList;x++) {
      				var vPortletUID 	= aPortletList[x],
      					oPortlet 		= FB.loadTemplate.portlets[vPortletUID];
      				
      				//define temp object info to pass into script when init	
      				define('temp'+x, oPortlet);
      				
      				//request and initialise portlet template & pass params
      				require([oPortlet.vTemplate,'temp'+x], function(tpl,oPortlet) {
      					console.log('[IMPORTED TEMPLATE]',tpl.component,oPortlet);
      					tpl.init(oPortlet);
      				});
      			}
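The loop expects each module named by vTemplate to expose a `component` name and an `init` function. The post does not show such a module, so the following is only a guess at its shape: `createSampleTpl`, the markup it writes and the `findContainers` parameter are all hypothetical. In the real project the returned object would presumably come from a RequireJS define() callback.

```javascript
// Hypothetical template module; in the real project this object would be
// returned from a RequireJS define() callback in import/tpl/sampleTpl.js.
function createSampleTpl(findContainers) {
	return {
		component: 'sampleTpl',
		// oPortlet is the oConstructor object registered by the taskflow
		init: function(oPortlet) {
			var aContainers = findContainers(oPortlet.containerID),
				i = 0;
			for (i; i < aContainers.length; i++) {
				// a real module would render an (escaped) mustache template here
				aContainers[i].innerHTML = (oPortlet.oParams.displayTitle === 'true')
					? '<h2>' + oPortlet.oParams.title + '</h2>'
					: '';
			}
			return aContainers.length;
		}
	};
}

// In the browser, containers are looked up by class name, because the
// fragment notes that IDs cannot be relied on in portal.
var tpl = createSampleTpl(function(vClassName) {
	return (typeof document !== 'undefined')
		? document.getElementsByClassName(vClassName)
		: [];
});
```

Passing the lookup function in keeps the module testable outside a browser; it is a design choice for this sketch, not something the original framework is known to do.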

    12. To add taskflow parameters open the xml file again.
    13. Select Overview tab bottom left of the screen.
      Select the Parameters side tab.
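      Add the following four example params. The page fragment above reads four values from pageFlowScope (containerID, title, displayTitle and activeUser), so these are presumably the names to define.
      You will see these when we add the taskflow to a portal page and edit it in WebCenter Composer.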
    14. Deploy the taskflow to WebCenter Portal following the last steps in the Oracle guide.
      Once the new taskflow / spaces extension project has been deployed, load WebCenter Portal.
      The following screenshots are from PS5; the UI has changed as of PS7, but you should be able to work out the differences.
    1. Go into administration area of the portal and select the “Resources” Tab
    2. Select the “Resource Catalogs” from the items on the left under the “Structure” heading.
      A list of Resource Catalogs will be available. You can create a new one or use an existing one. Make sure the one you are updating is the one being used by the portal you want to add the taskflow into.
    3. Select the resource catalog and choose Edit from the Edit drop-down menu.
    4. A window will appear; here you can add folders and choose where you want your components to appear.
      I have created a Demo Taskflow folder.
    5. Select “Add From Library” from the Add dropdown menu.
    6. Drill into Taskflows and add your [Taskflow] – I am adding the sample taskflow I created earlier.
    7. Go into your portal, create a new page and add the new taskflow.
      Here is an example of the Jive Forums that I recreated as a JS driven taskflow.
    8. And the final output of the taskflow on the page.

 

 

 

The post Javascript Driven ADF Taskflows for WebCenter Portal appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other

Developing WebCenter Content Cross Platform iDoc Enabled Components for Mobile, ADF, Sharepoint, Liferay

So over the last couple of months I’ve been thinking and tinkering with code, wondering, “What’s the best approach for creating WebCenter Content (WCC) components that I can consume and reuse across multiple platforms and environments?”
Is it pagelet producer or maybe an iFrame? These solutions just weren’t good enough or didn’t allow the flexibility I really wanted.

I needed a WCC solution that could easily be consumed into mobile, either Cordova (hybrid app) or ADF Mobile (AMX views), and that worked on different devices/platforms as well as on any enterprise app, i.e. Sharepoint (.Net), Liferay, WebCenter Portal (ADF) or even the new WebCenter Content ADF WebUI. It also needed to avoid multiple branches of code or redevelopment of the component for each platform and environment.

And in the famous words of Victor Frankenstein.. “It’s Alive!!”

After tinkering around and trying different approaches, this is the solution I created to support the above model.
I’m not saying this is the right approach or supported by the enterprise vendors, but an approach that is reusable and can work on all enterprise apps.

 


Here’s a quick video of a drag/drop MultiUploader component I created for WebCenter Content Classic that I can reuse on .Net and ADF WebCenter Portal/Content as well as mobile.

Read on to find out more on how this was achieved.

1) First, I’m going to dig into WebCenter Content and explain the underlying structure of the component.

To create a flexible base model, I created a light Javascript framework, very similar to AngularJS or ReactJS.

This would be the base component that would enable additional components on the page with the use of Mustache (JS templates) to drive and inject dynamic functional areas of content into a specified DOM node by ID or className.
Any changes of layout within the component are handled via an AJAX request to a cached mustache template, which updates the DOM when needed (similar to ADF’s PPR). Any user interaction is handled through event-driven actions from the imported templates.

RequireJS supplies a flexible module loading framework, so I do not need to worry about conflicts between JS libraries; it is also used to load in mustache templates and additional JS functionality when needed.

You’re probably thinking that there are going to be a lot of AJAX requests going back and forth and it’s going to be slow. Just check out the video – the answer is not really. The mustache templates are going to be smaller than average images you load on a page.

So as an example for the MultiUploader, I only have 1 mustache template that is 9kb. All interaction is handled by 2 JS files that are 39kb uncompressed.

2) As mentioned, a base model WCC component, “FishbowlModuleLoader”, will load in and initiate all other components on the page, and will only load and cache the templates and JS files that are required, as and when they are needed. There is no point in loading all templates and JS functionality on a page if they are not needed; loading on demand improves the performance and interaction of the component.
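The load-once-then-cache behaviour can be sketched in a few lines. This is not the FishbowlModuleLoader code, which the post does not show; `createTemplateCache` and `fetchTemplate` are hypothetical, with `fetchTemplate` standing in for whatever AJAX call actually loads the file:

```javascript
// Hypothetical cache; fetchTemplate stands in for the real AJAX loader.
function createTemplateCache(fetchTemplate) {
	var oCache = {};
	return {
		// callback(template) is invoked once the template text is available
		get: function(vPath, callback) {
			if (Object.prototype.hasOwnProperty.call(oCache, vPath)) {
				callback(oCache[vPath]);   // served from cache, no request
				return;
			}
			fetchTemplate(vPath, function(vTemplate) {
				oCache[vPath] = vTemplate;
				callback(vTemplate);
			});
		}
	};
}

// Example with a fake loader that counts how often it is hit
var nRequests = 0;
var oTemplates = createTemplateCache(function(vPath, done) {
	nRequests++;
	done('<div>{{title}}</div>');
});
oTemplates.get('import/tpl/multiUploadTpl.mustache', function() {});
oTemplates.get('import/tpl/multiUploadTpl.mustache', function() {});
// nRequests is 1: the second call was answered from the cache
```

With a genuinely asynchronous loader, two requests made before the first response arrives would both hit the network; a fuller version would also remember in-flight requests.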

3) Following is a quick overview of how the WCC component “FishbowlMultiUploader” works.

WebCenter Content Resource Asset

This is the base structure of the Content Component configuration, “fb_multi_upload_page_body”. It is consumed into a custom template, “MULTI_UPLOAD_PAGE”, which is requested via a custom service request, “?IdcService=GET_FB_MULTI_UPLOAD_PAGE”.

<!--
Name:           fb_multi_upload_page_body
Author:         John Sim  [18/06/2014]
Parameters:		
Description:	Page Body for Multi Checkin used in MULTI_UPLOAD_PAGE template
-->
<@dynamichtml fb_multi_upload_page_body@>
[[% FB fb_multi_upload_page_body Template body MULTI_UPLOAD_PAGE %]]

<div id="FB-multiCheckin" class="FB_multiCheckin"></div>

<script>
/**
 * CREATE CHAIN WRAPPER
 * Chain method will load from Base ModuleLoader requirejs core script
 */
FB.Base.chainPSA = FB.Base.onPortalInit(FB.Base.chainPSA, function() {
	//set base mustache template name to load and inject
	var vUID = 'FB_multiCheckin', //(UID) Unique Classname to inject template into - can't use IDs in portal 
		oConstructor = {
			vTemplate: 'import/tpl/multiUploadTpl', //location of template.mustache to load
			oParams: { //Obj list of default params pulled from multiUploader.xml Input definition
				maxUploadSize:			'10mb',
				defaultDocType:			('<$multiUploadDefaultType$>' !== '')? '<$multiUploadDefaultType$>': 'Document', 
				defaultSecurityGroup:		('<$multiUploadDefaultSecurityGroup$>' !== '')? '<$multiUploadDefaultSecurityGroup$>': 'Public',
				defaultAccount:			'Workspace/'+userName, 
				author:				(typeof(userName) !== 'undefined')? userName: '', 
				httpEnterpriseCgiPath: 		(typeof(httpEnterpriseCgiPath) !== 'undefined')? httpEnterpriseCgiPath: '',
				idcToken: 			(typeof(idcToken) !== 'undefined')? idcToken: '',
				httpWebRoot: 			(typeof(httpWebRoot) !== 'undefined')? httpWebRoot: '',
				enableTagging:			true,
				enableEmails:			true,
				enableBarcode:			true,
				enableCheckinProfiles: 		true,
				showHelpOption: 		true
			},
			containerID: 		vUID
		};
	
	//check if array exists from other custom JS Portlets
	if (typeof(FB.loadTemplate) === 'object') {
		FB.loadTemplate.portletUIDList.push('FB_multiUploadContainer_' + vUID);
	//create empty object
	} else {
		FB.loadTemplate = {
			portletUIDList:['FB_multiUploadContainer_' + vUID],
			portlets: {}
		};
	}

	//inject params
	FB.loadTemplate.portlets['FB_multiUploadContainer_' + vUID] = oConstructor;
});
/************************/
</script>


<@end@>

This is where the mustache template will be injected to provide the multiUpload component functionality.

<div id="FB-multiCheckin" class="FB_multiCheckin"></div>

The oConstructor specifies the configuration of the component to inject.
vTemplate points to a JS file that RequireJS imports; it configures the base multiUploader components from the params defined.

oParams contains all configuration for the app; at the moment, these are mostly hard coded, but could be defined as iDoc Variables when you install and enable the component within WCC.

var vUID = 'FB_multiCheckin', //(UID) Unique Classname to inject template into - can't use IDs in portal 
		oConstructor = {
			vTemplate: 'import/tpl/multiUploadTpl', //location of template.mustache to load
			oParams: { //Obj list of default params pulled from multiUploader.xml Input definition
				maxUploadSize:			'10mb',
				defaultDocType:			('<$multiUploadDefaultType$>' !== '')? '<$multiUploadDefaultType$>': 'Document', 
				defaultSecurityGroup:		('<$multiUploadDefaultSecurityGroup$>' !== '')? '<$multiUploadDefaultSecurityGroup$>': 'Public',
				defaultAccount:			'Workspace/'+userName, 
				author:				(typeof(userName) !== 'undefined')? userName: '', 
				httpEnterpriseCgiPath: 		(typeof(httpEnterpriseCgiPath) !== 'undefined')? httpEnterpriseCgiPath: '',
				idcToken: 			(typeof(idcToken) !== 'undefined')? idcToken: '',
				httpWebRoot: 			(typeof(httpWebRoot) !== 'undefined')? httpWebRoot: '',
				enableTagging:			true,
				enableEmails:			true,
				enableBarcode:			true,
				enableCheckinProfiles: 		true,
				showHelpOption: 		true
			},
			containerID: 		vUID
		};

This is a simple check to see whether other components already exist on the page; it appends the new component to the JS array “portletUIDList”, and the component params are stored in the associated JS object “portlets”.

//check if array exists from other custom JS Portlets
	if (typeof(FB.loadTemplate) === 'object') {
		FB.loadTemplate.portletUIDList.push('FB_multiUploadContainer_' + vUID);
	//create empty object
	} else {
		FB.loadTemplate = {
			portletUIDList:['FB_multiUploadContainer_' + vUID],
			portlets: {}
		};
	}

	//inject params
	FB.loadTemplate.portlets['FB_multiUploadContainer_' + vUID] = oConstructor;

Finally, the JS configuration is wrapped in a JS chain wrapper that will only initialize when RequireJS has loaded all its core base libraries, such as jQuery.

FB.Base.chainPSA = FB.Base.onPortalInit(FB.Base.chainPSA, function() {

//code

});

 

4) So let’s take a look at how the base component “FishbowlModuleLoader” works.

Essentially, this defines the FB.Base.chainPSA chain wrapper method in the header; it does not need jQuery or any other library.

<!--
Name:           std_html_head_declarations
Author:         John Sim  [18/06/2014]
Parameters:		
Description:	Add required header resources
-->
<@dynamichtml std_html_head_declarations@>
[[% FB std_html_head_declaration Update head add JS libs for module loader %]]

<$include super.std_html_head_declarations$>

<script>
/**
 * CREATE BASE JS CONTAINER OBJ
 * DONOT ADD JQUERY this is base class to assist PSA javascript methods to init after page loaded.
 */
var FB = window.FB || {},
	Base = Base || (function() {
		return {
			//create multi-cast delegate.
			onPortalInit: function(function1, function2) {
				return function() {
					if (function1) {
						function1();
					}
					if (function2) {
						function2();
					}
				}
			},
			//used for chaining methods
			chainPSA: function() {}
		}
	})();

//Use Base method if FB.Base hasn't been created
FB.Base = FB.Base || Base;
/************************/
</script>

<@end@>

You could cache this and put it in a script file; I’ve just put it inline so it is easier for you to read.

In the footer, we define RequireJS and the configuration that loads in the base libraries we need for all components, i.e. jQuery and maybe a few others.
We also set up fb.core.js as our base script to import and load in the core framework I built to handle routing and DOM event interaction as well as global vars.

<!--
Name:           std_page_end
Author:         John Sim  [18/06/2014]
Parameters:		
Description:	Component Module Loader RequireJS setup
-->
<@dynamichtml std_page_end@>
[[% FB std_page_end Add Module Loader RequireJS lib %]]

<$include super.std_page_end$>


<!-- Init FB Component Module Loader -->
<script src="<$HttpWebRoot$>resources/FishbowlModuleLoader/js/core/config.js"></script>
<script src="<$HttpWebRoot$>resources/FishbowlModuleLoader/js/libs/requirejs/require.min.js" data-main="fb.core"></script>
<!-- Init FB Component Module Loader -->
<@end@>
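The contents of config.js are not shown in this post. RequireJS will use a global object named `require` as its configuration if that object exists before require.js itself is loaded, so config.js could look something like the sketch below. Every path in it, including the `/cs/` web root, is an assumption made for illustration; `fbRequireConfig` is a hypothetical name.

```javascript
// Hypothetical contents of config.js; all paths below are assumptions.
var fbRequireConfig = {
	// folder that holds fb.core.js and the import/ modules
	baseUrl: '/cs/resources/FishbowlModuleLoader/js/core',
	// short module names used in the fb.core.js dependency list
	paths: {
		domReady:        '../libs/requirejs/domReady',
		Moment:          '../libs/moment/moment.min',
		ftlabsFastClick: '../libs/fastclick/fastclick',
		fb:              'fb.global'
	},
	// give slow connections longer than the 7 second default
	waitSeconds: 15
};

// Expose the object under the name RequireJS looks for (browser only)
if (typeof window !== 'undefined') {
	window.require = fbRequireConfig;
}
```

Module IDs without a `paths` entry, such as 'import/Layout', would then resolve relative to `baseUrl`.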

fb.core.js is where the magic begins:

// REQUIREJS Base configuration
require([
	//Dom ready req plugin
	'domReady',
	
	
	//core 
	'import/Layout',
	'import/Action',
	'import/Navigation',
	'import/Global',
	
	
	//Plugins
	'Moment',		//date plugin momentjs
	'ftlabsFastClick', 	//fix touch 300ms delay
	'fb'			//fb global methods
	

	
], function(domReady, Layout){
console.info('[ALL MODULES LOADED]');

	domReady(function() {
		console.info('[DOM READY]');
		
		//initialise layout DOM events ie click, touch etc.
		Layout.init();
		
		//load JS Components
		if (FB.Base.chainPSA) {
			FB.Base.chainPSA();
		}
		
		//check if any JS driven template containers exist
		if (typeof(FB.loadTemplate) !== 'undefined') {
			var aPortletList 	= FB.loadTemplate.portletUIDList,
				lPortletList 	= aPortletList.length,
				x 				= 0;
				
			//loop and request all templates required
			for (x;x<lPortletList;x++) {
				var vPortletUID 	= aPortletList[x],
					oPortlet 		= FB.loadTemplate.portlets[vPortletUID];
				
				//define temp object info to pass into script when init	
				define('temp'+x, oPortlet);
				
				//request and initialise portlet template & pass params
				require([oPortlet.vTemplate,'temp'+x], function(tpl,oPortlet) {
					console.log('[IMPORTED TEMPLATE]',tpl.component,oPortlet);
					tpl.init(oPortlet);
				});
			}
		}
		
	});
	
});

Once the DOM has fully loaded, FB.Base.chainPSA(); is called. This sets up and configures the FB.loadTemplate object, which contains all the information about the components that need to be loaded into the page.

Here we loop through and load in all templates, and pass across the component configuration to the templates to be initialized:

//loop and request all templates required
			for (x;x<lPortletList;x++) {
				var vPortletUID 	= aPortletList[x],
					oPortlet 		= FB.loadTemplate.portlets[vPortletUID];
				
				//define temp object info to pass into script when init	
				define('temp'+x, oPortlet);
				
				//request and initialise portlet template & pass params
				require([oPortlet.vTemplate,'temp'+x], function(tpl,oPortlet) {
					console.log('[IMPORTED TEMPLATE]',tpl.component,oPortlet);
					tpl.init(oPortlet);
				});
			}

And that’s all there is to it.

5) Let’s dig into WebCenter Portal now. How can you reuse all that code you’ve written for WebCenter Content Classic within ADF?

Easy: let’s create a JS driven taskflow template that we can dump into the resource catalog and drag, drop, and reuse on any page, wherever it is needed.

I’ve created a new post for this part:
Read on here to find out how to create JS Driven Taskflow templates.

 

Some gotchas -

Some things to think about if you do decide to use this approach.

  1. You will need to make sure that all AJAX requests are made on the same domain.
    1. or enable CORS on UCM so that it accepts cross-domain requests. (Mobile works cross-domain.)
  2. WCC needs to be accessible by the user’s browser.
    1. You can set up a proxy service and only allow access to the custom services you require, locking down access to the rest of the UCM environment if needed.
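A quick way to tell whether a given service call will run into the first gotcha is to compare origins (scheme, host and port) before making the request. The helper below is a hypothetical sketch using the standard URL API; the host names are made up:

```javascript
// Hypothetical helper: reports whether a request from the current page to
// vUrl would be cross-origin (and therefore need CORS or a proxy).
function isCrossOrigin(vUrl, vPageUrl) {
	var oTarget = new URL(vUrl, vPageUrl),   // resolves relative URLs against the page
		oPage = new URL(vPageUrl);
	// an origin is scheme + host + port
	return oTarget.origin !== oPage.origin;
}

var vPage = 'https://portal.example.com/webcenter/portal';

isCrossOrigin('/cs/idcplg?IdcService=GET_FB_MULTI_UPLOAD_PAGE', vPage);
// false: a relative URL shares the page's origin

isCrossOrigin('https://ucm.example.com/cs/idcplg', vPage);
// true: different host, so UCM must send CORS headers or sit behind a proxy
```

In the browser, `window.location.href` would be passed as the second argument.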

And finally - one thing that comes to mind here: I am using static mustache templates, but there is nothing stopping you from creating a custom WCC service to generate mustache templates with embedded iDoc if you want.

The post Developing WebCenter Content Cross Platform iDoc Enabled Components for Mobile, ADF, Sharepoint, Liferay appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other

Teradata bought Hadapt and Revelytix

DBMS2 - Wed, 2014-07-23 02:29

My client Teradata bought my (former) clients Revelytix and Hadapt.* Obviously, I’m in confidentiality up to my eyeballs. That said — Teradata truly doesn’t know what it’s going to do with those acquisitions yet. Indeed, the acquisitions are too new for Teradata to have fully reviewed the code and so on, let alone made strategic decisions informed by that review. So while this is just a guess, I conjecture Teradata won’t say anything concrete until at least September, although I do expect some kind of stated direction in time for its October user conference.

*I love my business, but it does have one distressing aspect, namely the combination of subscription pricing and customer churn. When your customers transform really quickly, or even go out of existence, so sometimes does their reliance on you.

I’ve written extensively about Hadapt, but to review:

  • The HadoopDB project was started by Dan Abadi and two grad students.
  • HadoopDB tied a bunch of PostgreSQL instances together with Hadoop MapReduce. Lab benchmarks suggested it was more performant than the coyly named DBx (where x=2), but not necessarily competitive with top analytic RDBMS.
  • Hadapt was formed to commercialize HadoopDB.
  • After some fits and starts, Hadapt was a Cambridge-based company. Former Vertica CEO Chris Lynch invested even before he was a VC, and became an active chairman. Not coincidentally, Hadapt had a bunch of Vertica folks.
  • Hadapt decided to stick with row-based PostgreSQL, Dan Abadi’s previous columnar enthusiasm notwithstanding. Not coincidentally, Hadapt’s performance never blew anyone away.
  • Especially after the announcement of Cloudera Impala, Hadapt’s SQL-on-Hadoop positioning didn’t work out. Indeed, Hadapt laid off most or all of its sales and marketing folks. Hadapt pivoted to emphasize its schema-on-need story.
  • Chris Lynch, who generally seems to think that IT vendors are created to be sold, shopped Hadapt aggressively.

As for what Teradata should do with Hadapt:

  • My initial thought for Hadapt was to just double down, pushing the technology forward, presumably including a columnar option such as the one Citus Data developed.
  • But upon reflection, if it made technical sense to merge the Aster and Hadapt products, that would be better yet.

I herewith apologize to Aster co-founder and Hadapt skeptic Tasso Argyros (who by the way has moved on from Teradata) for even suggesting such heresy. :)

Complicating the story further:

  • Impala lets you treat data in HDFS (Hadoop Distributed File System) as if it were in a SQL DBMS. So does Teradata SQL-H. But Hadapt makes you decide whether the data is in HDFS or the SQL DBMS, and it can’t be in both at once. Edit: Actually, see Dan Abadi’s comments below.
  • Impala and Oracle’s new SQL-H competitor have daemons running on every data node. So does one option in Hadapt. But I don’t think SQL-H does that yet.

I was less involved with Revelytix than with Hadapt (although I’m told I served as the “catalyst” for the original Teradata/Revelytix partnership). That said, Teradata — like Oracle — is always building out a data integration suite to cover a limited universe of data stores. And Revelytix’ dataset management technology is a nice piece toward an integrated data catalog.

Categories: Other

Data integration as a business opportunity

DBMS2 - Sun, 2014-07-20 21:59

A significant fraction of IT professional services industry revenue comes from data integration. But as a software business, data integration has been more problematic. Informatica, the largest independent data integration software vendor, does $1 billion in revenue. INFA’s enterprise value (market capitalization after adjusting for cash and debt) is $3 billion, which puts it way short of other category leaders such as VMware, and even sits behind Tableau.* When I talk with data integration startups, I ask questions such as “What fraction of Informatica’s revenue are you shooting for?” and, as a follow-up, “Why would that be grounds for excitement?”

*If you believe that Splunk is a data integration company, that changes these observations only a little.

On the other hand, several successful software categories have, at particular points in their history, been focused on data integration. One of the major benefits of 1990s business intelligence was “Combines data from multiple sources on the same screen” and, in some cases, even “Joins data from multiple sources in a single view”. The last few years before application servers were commoditized, data integration was one of their chief benefits. Data warehousing and Hadoop both of course have a “collect all your data in one place” part to their stories — which I call data mustering — and Hadoop is a data transformation tool as well.

And it’s not as if successful data integration companies have no value. IBM bought a few EAI (Enterprise Application Integration) companies, plus top Informatica competitor Ascential, plus Cast Iron Systems. DataDirect (I mean the ODBC/JDBC guys, not the storage ones) has been a decent little business through various name changes and ownerships (independent under a couple of names, then Intersolv/Merant, then independent again, then Progress Software). Master data management (MDM) and data cleaning have had some passable exits. Talend raised $40 million last December, which is a nice accomplishment if you’re French.

I can explain much of this in seven words: Data integration is both important and fragmented. The “important” part is self-evident; I gave examples of “fragmented” a couple years back. Beyond that, I’d say:

  • A new class of “engine” can be a nice business — consider for example Informatica/Ascential/Ab Initio, or the MDM players (who sold out to bigger ETL companies), or Splunk. Indeed, much early Hadoop adoption was for its capabilities as a data transformation engine.
  • Data transformation is a better business to enter than data movement. Differentiated value in data movement comes in areas such as performance, reliability and maturity, where established players have major advantages. But differentiated value in data transformation can come from “intelligence”, which is easier to excel in as a start-up.
  • “Transparent connectivity” is a tough business. It is hard to offer true transparency, with minimal performance overhead, among enough different systems for anybody to much care. And without that you’re probably offering a low-value/niche capability. Migration aids are not an exception; the value in those is captured by the vendor of what’s being migrated to, not by the vendor who actually does the transparent translation. Indeed …
  • … I can’t think of a single case in which migration support was a big software business. (Services are a whole other story.) Perhaps Cast Iron Systems came closest, but I’m not sure I’d categorize it as either “migration support” or “big”.

And I’ll stop there, because I’m not as conversant with some of the new “smart data transformation” companies as I’d like to be.

Categories: Other

The point of predicate pushdown

DBMS2 - Tue, 2014-07-15 07:52

Oracle is announcing today what it’s calling “Oracle Big Data SQL”. As usual, I haven’t been briefed, but highlights seem to include:

  • Oracle Big Data SQL is basically data federation using the External Tables capability of the Oracle DBMS.
  • Unlike independent products — e.g. Cirro — Oracle Big Data SQL federates SQL queries only across Oracle offerings, such as the Oracle DBMS, the Oracle NoSQL offering, or Oracle’s Cloudera-based Hadoop appliance.
  • Also unlike independent products, Oracle Big Data SQL is claimed to be compatible with Oracle’s usual security model and SQL dialect.
  • At least when it talks to Hadoop, Oracle Big Data SQL exploits predicate pushdown to reduce network traffic.

And by the way – Oracle Big Data SQL is NOT “SQL-on-Hadoop” as that term is commonly construed, unless the complete Oracle DBMS is running on every node of a Hadoop cluster.

Predicate pushdown is actually a simple concept:

  • If you issue a query in one place to run against a lot of data that’s in another place, you could spawn a lot of network traffic, which could be slow and costly. However …
  • … if you can “push down” parts of the query to where the data is stored, and thus filter out most of the data, then you can greatly reduce network traffic.

“Predicate pushdown” gets its name from the fact that portions of SQL statements, specifically ones that filter data, are properly referred to as predicates. They earn that name because predicates in mathematical logic and clauses in SQL are the same kind of thing — statements that, upon evaluation, can be TRUE or FALSE for different values of variables or data.
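
To make the idea concrete, here is a minimal sketch in plain Python. It is not Oracle's implementation or any real product's API. The `RemoteStore` class, its `scan` method and the `is_large_order` predicate are hypothetical names invented for illustration, and "network traffic" is simulated by counting the rows the store hands back to the caller.

```python
class RemoteStore:
    """Hypothetical stand-in for a remote data store.

    rows_shipped counts rows returned to the caller, which is a rough
    proxy for network traffic in this sketch.
    """

    def __init__(self, rows):
        self._rows = list(rows)
        self.rows_shipped = 0

    def scan(self, predicate=None):
        """Return rows; if a predicate is supplied, filter at the store first."""
        result = []
        for row in self._rows:
            if predicate is None or predicate(row):
                self.rows_shipped += 1
                result.append(row)
        return result


def is_large_order(row):
    """The predicate: evaluates to True or False for each row."""
    return row["amount"] > 900


store = RemoteStore({"id": i, "amount": i} for i in range(1000))

# Without pushdown: ship every row, then filter where the query was issued.
all_rows = store.scan()
no_pushdown = [row for row in all_rows if is_large_order(row)]
shipped_without = store.rows_shipped

# With pushdown: send the predicate to the store, ship only matching rows.
store.rows_shipped = 0
with_pushdown = store.scan(predicate=is_large_order)
shipped_with = store.rows_shipped

print(shipped_without, shipped_with)  # 1000 rows shipped vs. 99
```

Both approaches return the same answer; the only difference is how many rows cross the simulated network to produce it.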

The most famous example of predicate pushdown is Oracle Exadata, with the story there being:

  • Oracle’s shared-everything architecture created a huge I/O bottleneck when querying large amounts of data, making Oracle inappropriate for very large data warehouses.
  • Oracle Exadata added a second tier of servers each tied to a subset of the overall storage; certain predicates are pushed down to that tier.
  • The I/O between Exadata’s two sets of servers is now tolerable, and so Oracle is now often competitive in the high-end data warehousing market.

Oracle evidently calls this “SmartScan”, and says Oracle Big Data SQL does something similar with predicate pushdown into Hadoop.

Oracle also hints at using predicate pushdown to do non-tabular operations on the non-relational systems, rather than shoehorning operations on multi-structured data into the Oracle DBMS, but my details on that are sparse.

Categories: Other

21st Century DBMS success and failure

DBMS2 - Mon, 2014-07-14 00:37

As part of my series on the keys to and likelihood of success, I outlined some examples from the DBMS industry. The list turned out too long for a single post, so I split it up by millennia. The part on 20th Century DBMS success and failure went up Friday; in this one I’ll cover more recent events, organized in line with the original overview post. Categories addressed will include analytic RDBMS (including data warehouse appliances), NoSQL/non-SQL short-request DBMS, MySQL, PostgreSQL, NewSQL and Hadoop.

DBMS rarely have trouble with the criterion “Is there an identifiable buying process?” If an enterprise is doing application development projects, a DBMS is generally chosen for each one. And so the organization will generally have a process in place for buying DBMS, or accepting them for free. Central IT, departments, and — at least in the case of free open source stuff — developers all commonly have the capacity for DBMS acquisition.

In particular, at many enterprises either departments have the ability to buy their own analytic technology, or else IT will willingly buy and administer things for a single department. This dynamic fueled much of the early rise of analytic RDBMS.

Buyer inertia is a greater concern.

  • A significant minority of enterprises are highly committed to their enterprise DBMS standards.
  • Another significant minority aren’t quite as committed, but set pretty high bars for new DBMS products to cross nonetheless.
  • FUD (Fear, Uncertainty and Doubt) about new DBMS is often justifiable, about stability and consistent performance alike.

A particularly complex version of this dynamic has played out in the market for analytic RDBMS/appliances.

  • First the newer products (from Netezza onwards) were sold to organizations who knew they wanted great performance or price/performance.
  • Then it became more about selling “business value” to organizations who needed more convincing about the benefits of great price/performance.
  • Then the behemoth vendors became more competitive, as Teradata introduced lower-price models, Oracle introduced Exadata, Sybase got more aggressive with Sybase IQ, IBM bought Netezza, EMC bought Greenplum, HP bought Vertica and so on. It is now hard for a non-behemoth analytic RDBMS vendor to make headway at large enterprise accounts.
  • Meanwhile, Hadoop has emerged as a serious competitor for at least some analytic data management, especially but not only at internet companies.

Otherwise I’d say: 

  • At large enterprises, their internet operations perhaps excepted:
    • Short-request/general-purpose SQL alternatives to the behemoths — e.g. MySQL, PostgreSQL, NewSQL — have had tremendous difficulty getting established. The last big success was the rise of Microsoft SQL Server in the 1990s. That’s why I haven’t mentioned the term mid-range DBMS in years.
    • NoSQL/non-SQL has penetrated large enterprises mainly for a few specific use cases, for example the lists I posted for MongoDB or graph databases.
  • Internet-only companies have few inertia issues when it comes to database managers. They’ll consider anything they regard as being in their price ballpark (which is however often restricted to open source). I think part of the reason is that as quickly as they rewrite their applications, DBMS are vastly less “strategic” to them than they are to most larger enterprises.
  • The internet operations of large companies — especially large retailers — in many cases behave like internet-only companies, but in many other cases behave like the rest of the enterprise.

The major reasons for DBMS categories to get established in the first place are:

  • Performance and/or scalability (many examples).
  • Developer features (for example dynamic schema).
  • License/maintenance cost (for example several open source categories).
  • Ease of installation and administration (for example open source again, and also data warehouse appliances).

Those same characteristics are major bases for competition among members of a new category, although as noted above behemoth-loyalty can also come into play.

Cool-vs.-weird tradeoffs are somewhat secondary among SQL DBMS.

  • There’s not much of a “cool” factor, because new products aren’t that different in what they do vs. older ones.
  • There’s not a terrible “weird” factor either, but of course any smaller offering faces FUD, and also …
  • … appliances are anti-strategic for many buyers, especially ones who demand a smooth path to the cloud.

They’re huge, however, in the non-SQL world. Most non-SQL data managers have a major “weird” factor. Fortunately, NoSQL and Hadoop both have huge “cool” cred to offset it. XML/XQuery unfortunately did not.

Finally, in most DBMS categories there are massive issues with product completeness, more in the area of maturity than that of whole product. The biggest whole product issues are concentrated on the matter of interoperating with other software — business intelligence tools, packaged applications (if relevant to the category), etc. Most notably, the handful of DBMS that are certified to run SAP share a huge market that other DBMS can’t touch. But BI tools are less of a differentiator — I yawn when vendors tell me they are certified for/partnered with MicroStrategy, Tableau, Pentaho and Jaspersoft, and I’m surprised at any product that isn’t.

DBMS maturity has a lot of aspects, but the toughest challenges are concentrated in two main areas:

  • Reliability, especially but not only in short-request use cases.
  • Performance across a great variety of use cases. I observe frequently that performance in best-case scenarios, performance in the lab and performance in real-world environments are much further apart than vendors like to think.

In particular:

  • Maturity demands seem to be much higher for SQL DBMS than for NoSQL.
    • I think this is one of several reasons NoSQL has been much more successful than NewSQL.
    • It’s why I think MarkLogic’s “Enterprise NoSQL” positioning is a mistake.
  • As for MySQL:
    • MySQL wasn’t close to reliable enough for enterprises to trust it until InnoDB became the default storage engine.
    • MySQL 5 point releases have added major features, or decent performance for major features. I’ll confess to having lost track of what’s been fixed and what’s still missing.
    • In saying all that I’m holding MySQL to a much higher maturity standard than I’m holding NoSQL — because that’s what I think enterprise customers do.
  • PostgreSQL “should” be doing a lot better than it is. I have an extremely low opinion of its promoters, and not just for personal reasons. (That said, the personal reasons don’t just apply to EnterpriseDB anymore. I’ve also run out of patience waiting for Josh Berkus to retract untruths he posted about me years ago.)
  • SAP HANA checks boxes for performance (In-memory rah rah rah!!) and whole product (Runs SAP!!). That puts it well ahead of most other newish SQL DBMS, purely analytic ones perhaps excepted.
  • Any other new short-request SQL DBMS that sounds like it has traction is also memory-centric.
  • Analytic RDBMS are in most respects held to lower maturity standards than DBMS used for write-intensive workloads. Even so, products in the category are still frequently tripped up by considerations of concurrent performance and mixed workload management.

Related links

There have been 1,470 previous posts in the 9-year history of this blog, many of which could serve as background material for this one. A couple that seem particularly germane and didn’t already get linked above are:

Categories: Other

Big Data in the Cloud at Google I/O

William Vambenepe - Tue, 2014-07-01 00:55

Last week was a great party for the entire Google developer family, including Google Cloud Platform. And within the Cloud Platform, Big Data processing services. Which is where my focus has been in the almost two years I’ve been at Google.

It started with a bang, when our fearless leader Urs unveiled Cloud Dataflow in the keynote. Supported by a very timely demo (streaming analytics for a World Cup game) by my colleague Eric.

After the keynote, we had three live sessions:

In “Big Data, the Cloud Way“, I gave an overview of the main large-scale data processing services on Google Cloud:

  • Cloud Pub/Sub, a newly-announced service which provides reliable, many-to-many, asynchronous messaging,
  • the aforementioned Cloud Dataflow, to implement data processing pipelines which can run either in streaming or batch mode,
  • BigQuery, an existing service for large-scale SQL-based data processing at interactive speed, and
  • support for Hadoop and Spark, making it very easy to deploy and use them “the Cloud Way”, well integrated with other storage and processing services of Google Cloud Platform.

The next day, in “The Dawn of Fast Data“, Marwa and Reuven described Cloud Dataflow in a lot more detail, including code samples. They showed how to easily construct a streaming pipeline which keeps a constantly-updated lookup table of most popular Twitter hashtags for a given prefix. They also explained how Cloud Dataflow builds on over a decade of data processing innovation at Google to optimize processing pipelines and free users from the burden of deploying, configuring, tuning and managing the needed infrastructure. Just like Cloud Pub/Sub and BigQuery do for event handling and SQL analytics, respectively.
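
For readers who want a feel for what such a lookup table computes, here is a rough sketch in plain Python. It does not use the Cloud Dataflow SDK and does not reproduce the code shown in the session. The class `HashtagPrefixIndex` and its methods are hypothetical names, and a simple in-memory loop stands in for the stream of incoming hashtags.

```python
from collections import Counter, defaultdict


class HashtagPrefixIndex:
    """Hypothetical in-memory table: prefix -> most popular hashtags seen so far."""

    def __init__(self, top_n=3):
        self.top_n = top_n
        self.counts = Counter()             # hashtag -> times seen
        self.by_prefix = defaultdict(set)   # prefix -> hashtags starting with it

    def observe(self, hashtag):
        """Update the table with one hashtag from the stream."""
        tag = hashtag.lower()
        self.counts[tag] += 1
        for end in range(1, len(tag) + 1):
            self.by_prefix[tag[:end]].add(tag)

    def top(self, prefix):
        """Return up to top_n hashtags for a prefix, most frequent first."""
        candidates = self.by_prefix.get(prefix.lower(), set())
        # Sort by descending count, breaking ties alphabetically.
        ranked = sorted(candidates, key=lambda t: (-self.counts[t], t))
        return ranked[: self.top_n]


index = HashtagPrefixIndex(top_n=2)
for tag in ["worldcup", "worldcup", "world", "work", "worldcup", "work", "wow"]:
    index.observe(tag)

top_wor = index.top("wor")
print(top_wor)  # ['worldcup', 'work']
```

A real streaming pipeline would also have to handle windowing, distribution across workers and late data, none of which this single-process sketch attempts.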

Later that afternoon, Felipe and Jordan showed how to build predictive models in “Predicting the future with the Google Cloud Platform“.

We had also prepared some recorded short presentations. To learn more about how easy and efficient it is to use Hadoop and Spark on Google Cloud Platform, you should listen to Dennis in “Open Source Data Analytics“. To learn more about block storage options (including SSD, both local and remote), listen to Jay in “Optimizing disk I/O in the cloud“.

It was gratifying to see well-informed people recognize the importance of these announcements and partners understand how this will benefit their customers. As well as some good press coverage.

It’s liberating to now be able to talk freely about recent progress on our quest to equip Google Cloud users with easy-to-use data processing tools. Everyone can benefit from Google’s experience making developers productive while efficiently processing data at large scale. With great power comes great productivity.

Categories: Other