This post comes from Fishbowl’s president, Tim Gruidl. One of Tim’s biggest passions is technology innovation, and not only does he encourage others to innovate, he participates and helps drive this where he can. Tim likes to say “we innovate to help customers dominate”. Tim summarizes Fishbowl’s Hackathon event, held last Friday and Saturday at Fishbowl Solutions, in the post below.
What a great event to learn, build the team, interact with others and compete. We also created some innovative solutions that I’m sure at some point will be options to help our customers innovate and extend their WebCenter investments. This year, we had 3 teams that designed and coded the following solutions:
- InSight Image Processing – Greg Bollom and Kim Negaard
They used the Google Vision API to analyze images submitted to Oracle WebCenter, then pulled the resulting metadata back to populate fields within the system. They also added the ability to pull GPS coordinates from photos (taken from cameras, etc.) and have that location metadata and other EXIF data populate WebCenter Content.
- Slack Integration with WebCenter Portal and Content – Andy Weaver, Dan Haugen, Jason Lamon and Jayme Smith
Team collaboration is a key driver for many of our portals, and Slack is one of the most popular collaboration tools. In fact, it is currently valued at $3.6 billion, and there seems to be a rapidly growing market for what they do. The team did some crazy innovation and integration to link Slack to both WebCenter Portal and WebCenter Content. I think the technical learning and sophistication of what they did was probably the most involved and required the most pre-work and effort at the event, and it was so cool to see it actually working.
- Oracle WebCenter Email Notes – John Sim (Oracle ACE), Lauren Beatty and me
Valuable corporate content is stored in email, and more value can be obtained from those emails if the content can be tagged and context added in a content management system – Oracle WebCenter. John and Lauren did an awesome job of taking a forwarded email, checking it into WebCenter Content to a workspace, and using related content to build relationships. You can then view the relationships in a graphical way for context. They also created a mobile app to allow you to tag the content on the go and release it for the value of the org.
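As a rough illustration of the first step in the Email Notes flow, here is a minimal sketch of splitting a forwarded email into a parent item and its attachments using Python's standard `email` package. This is not the team's hackathon code: the `CheckinItem` record and its `parent` link are hypothetical stand-ins for whatever WebCenter Content check-in and related-content calls the real solution used.

```python
from dataclasses import dataclass
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import List, Optional


@dataclass
class CheckinItem:
    """Hypothetical record for one item to check in (not a WebCenter type)."""
    title: str
    content: bytes
    parent: Optional[str] = None  # title of the email an attachment came from


def split_email(raw: bytes) -> List[CheckinItem]:
    """Return the email body first, then one item per attachment."""
    msg = message_from_bytes(raw, policy=policy.default)
    subject = str(msg["Subject"]) if msg["Subject"] is not None else "(no subject)"
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    items = [CheckinItem(title=subject, content=text.encode("utf-8"))]
    for part in msg.iter_attachments():
        name = part.get_filename() or "attachment"
        payload = part.get_payload(decode=True) or b""
        # Each attachment remembers its parent so a relationship can be built later.
        items.append(CheckinItem(title=name, content=payload, parent=subject))
    return items


def build_sample() -> bytes:
    """Build a small test email with one attachment (sample data only)."""
    msg = EmailMessage()
    msg["Subject"] = "Contract draft"
    msg["From"] = "a@example.com"
    msg["To"] = "b@example.com"
    msg.set_content("Please review.")
    msg.add_attachment(b"%PDF-1.4 demo", maintype="application",
                       subtype="pdf", filename="draft.pdf")
    return msg.as_bytes()


items = split_email(build_sample())
```

Keeping the parent title on each attachment is one simple way to preserve the email-to-attachment relationship that the graphical view would later display.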
Participants voted on the competing solutions, and it ended up being a tie between the Google Insight team and the Email Notes team, but all the solutions truly showed some innovation, sophistication, and completeness of vision. A key aspect of the event for me was how it supported all of Fishbowl’s company values:
Customer First – the solutions we built were based on real-life scenarios our customers have discussed, so this will help us be a better partner for them.
Teamwork – the groups not only worked within their teams, but there was cross team collaboration – Andy Weaver helped John Sim solve an issue he was having, for example.
Intellectual Agility – this goes without saying.
Ambition – people worked late and on the weekend – to learn more, work with the team and have fun.
Continuous Learning – we learned a lot about Slack, cloud, email, etc.
Overall, the annual Hackathon is a unique event that differentiates Fishbowl on so many fronts. From the team building, to the innovation keeping us ahead of the technology curve, to all the learnings – Hackathons truly are a great example of what Fishbowl is all about.
Thanks to all that participated, and remember, let’s continue to innovate so our customers can dominate.
Hackathon weekend at Fishbowl Solutions – Google Vision, Slack, and Email Integrations with Oracle WebCenter
It’s hackathon weekend at Fishbowl Solutions. Fishbowl’s consulting and development teams – the hackers – along with members of the sales and marketing teams join forces to collaborate on and develop new software applications. While the overall goal of the hackathon may be to produce usable software, the event also is a great learning opportunity for participants and results in a lot of fun.
This is Fishbowl’s 4th annual hackathon and previous events have produced “beta” software that eventually evolved into shippable software components that benefited customers. Here are recaps on the 2012 and 2014 events.
This year there were over 16 different ideas, and out of those 3 teams were formed to develop the following:
- Oracle WebCenter Portal and Slack integration – Slack is a popular collaboration tool for the enterprise that enables members to communicate across channels (specific topics), send direct messages, and drag and drop files for sharing. Integrating Slack with WebCenter Portal brings its popular features and ease of use directly in context of a user’s portal session, ensuring that collaboration is easy and reducing the amount of switching between applications to communicate with others – leaving the portal to send an email, for example.
- Oracle WebCenter Content and Google Vision integration – This integration would enable the tagging of images upon check-in. The Google Vision API enables applications to understand the content of images by encapsulating machine learning models in an easy to use REST API. Using this technology, images are auto-classified into thousands of categories (e.g., “sailboat”, “lion”, “Eiffel Tower”). For example, you might check in a picture of a knit hat and it would be tagged with xKeywords of “hat”, “knit hat”, and “fashion accessories” without any human tagging. To further automate image discovery, the GSA can be used to map related terms so that searches for “beanie”, “stocking cap”, or “winter hat”, could also return the image. This tagging automation would have great implications for Oracle WebCenter customers that are using it for Digital Asset Management.
- Oracle WebCenter Content Email Check in – This integration would enable emails with attachments to be checked in to WebCenter Content automatically. Instead of the user having to check in the email itself and then relate each attachment to the associated email, which results in additional check-in steps, the emails and attachments would be parsed out and sent to a user’s workspace in WebCenter. From there, users can tag and validate that the email should be checked in with the appropriate attachments – either from their desktops or mobile device.
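To make the Google Vision idea above a bit more concrete, here is a minimal sketch of the label-detection round trip. The request body follows the shape of the Vision REST `images:annotate` call; everything else is an assumption for illustration, including the score threshold, the comma-separated `xKeywords` format, and the small `SYNONYMS` map that stands in for the related-term mapping the search layer would provide. No network call is made.

```python
import base64
from typing import Dict, List

# Hypothetical synonym map standing in for search-side related terms.
SYNONYMS: Dict[str, List[str]] = {
    "knit hat": ["beanie", "stocking cap", "winter hat"],
}


def build_label_request(image_bytes: bytes, max_results: int = 10) -> dict:
    """Build the JSON body for a Vision `images:annotate` label request."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def keywords_from_response(response: dict, min_score: float = 0.6) -> List[str]:
    """Collect label descriptions scoring at least `min_score`, lowercased."""
    responses = response.get("responses") or [{}]
    labels = responses[0].get("labelAnnotations", [])
    return [
        label["description"].lower()
        for label in labels
        if label.get("score", 0.0) >= min_score
    ]


def to_xkeywords(keywords: List[str]) -> str:
    """Join keywords into one value for an xKeywords field (assumed format)."""
    return ", ".join(keywords)


def matches(query: str, keywords: List[str]) -> bool:
    """True if the query equals a keyword or one of its mapped synonyms."""
    q = query.lower()
    return any(q == k or q in SYNONYMS.get(k, []) for k in keywords)


# Sample response in the shape the API returns (values made up for the demo).
sample = {"responses": [{"labelAnnotations": [
    {"description": "Hat", "score": 0.95},
    {"description": "Knit hat", "score": 0.90},
    {"description": "Fashion accessory", "score": 0.80},
    {"description": "Wool", "score": 0.40},
]}]}
keywords = keywords_from_response(sample)
```

With the sample data, the low-scoring "Wool" label is dropped, and a search for "beanie" still finds the image through the synonym map.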
The hacking commenced at 3 PM today and will continue until 4 PM on Saturday, April 16th. Each team will then present their developed integration/component, and the other Fishbowl team members will vote on their favorite finished product. Check back on this blog next week to see who won.
I’m thrilled to be presenting at Collaborate 2016 with my colleague John Sim, on the recently open-sourced Oracle JET! We front-end developers had been seeking a better UI/UX solution from Oracle for quite some time, and they have delivered in a big way.
Part of the beauty of JET is in its modularity. It allows developers to use as much or as little as they need for a particular project. In addition, different libraries can be incorporated. As JS libraries evolve, and new frameworks are developed, the idea is that they can be incorporated, as well. Oracle JET’s flexibility ensures that it can change with the JS development world.
Fishbowl Solutions at Collaborate 2016: Demos and Discussions on Oracle JET, ADF, Documents Cloud Service, Controlled Document Management and Portals
Fishbowl Solutions is looking forward to Collaborate 2016. We have another full list of activities planned, and we are always excited to meet with customers and discuss their initiatives around enterprise content management and portals, the cloud, as well as front-end user design and experience. With the release of Oracle WebCenter 12c back in October, customers are also eager to understand more of what the new version has to offer. Fortunately for WebCenter customers attending Collaborate, Fishbowl Solutions will be covering all these topics across the 5 presentations we will be giving, as well as one-on-one discussions in our booth – #1028.
We are also privileged to be joined by two WebCenter customers who will give presentations on their WebCenter use cases. The first customer, ICON plc (www.iconplc.com/) based in Dublin, Ireland, will discuss the process of improving the front-end experience of the WebCenter-based portal they use to manage the clinical trials process.
The second customer is Rosendin Electric (www.rosendin.com) based in San Jose, CA, and they will share how they implemented Fishbowl’s ControlCenter solution to automate the contract management process within WebCenter.
The best part and biggest benefit of attending Collaborate is hearing stories from actual customers, like ICON and Rosendin. Collaborate is truly a user group conference, and hearing case studies on WebCenter deployments, enhancements, integrations, etc., is invaluable for other customers looking to do the same or similar. Less marketing speak and sales pitches, and more learning. As you plan your schedule for Collaborate, look for Session Types denoted as Case Studies.
Here is a preview of what Fishbowl currently has planned for Collaborate 2016.
- Monday, April 11, 10:30-11:30 AM:
A Designer’s Introduction to the Oracle JET Framework
- Monday, April 11, 4:30-5:30 PM:
Integrating Oracle JET With ADF to Create a Modern and Engaging User Experience
In this session you will learn about the pros and cons of Oracle’s new JET framework and ADF and how you can combine them to create a modern development experience writing Modular Single Page Applications. Sim and Weaver will discuss how front-end designers can create modern, platform agnostic extendable interfaces with JET, and how developers can create ADF integrations and extendable services with the back-end to serve up small data snippets (JSON).
- Tuesday, April 12, 10:45-11:45 AM:
Developing Hybrid Solutions for the Oracle Documents Cloud Service (DoCS)
This session will provide an overview of Oracle’s Documents Cloud Service (DoCS), including its interface, security model, and how to embed the DoCS UI and integrate with the REST API and Applink Resource to create seamless hybrid off- and on-premise applications. As part of the lecture, Sim will provide live examples and code walkthroughs, as well as talk about hybrid application development and the best times to use the Applink Resource vs the REST API with Oracle’s new JET framework for developing cloud apps. The presentation will conclude with an overview of an integration that Fishbowl has created to support Oracle DoCS.
- Tuesday, April 12, 4:45-5:45 PM:
ICON Enhances Its WebCenter Portal Design by Keeping the User in Mind
ICON Clinical Research Limited is a global provider of outsourced development services to the pharmaceutical, biotechnology, and medical device industries. They specialize in the development, management, and analysis of programs that support clinical development. ICON implemented Oracle WebCenter as the platform for its ICONIK portal, which will be used by the clinical trials team to manage, maintain, and share content created during the trials process. Come to this session to hear how ICON and Fishbowl Solutions leveraged next generation, best practice portal design concepts and technologies to provide a high-end and rich user experience to end users. Learn how ICON leverages WebCenter Portal and Content to surface personalized study documents, quickly manage content, and collaborate with other team members, whether on a desktop or on the go through a mobile device. We will also discuss how ICON has streamlined their business to solve problems that contributed to delays in the clinical trials process, impeding ICON’s customers from bringing products to market.
- Wednesday, April 13, 9:15-10:15 AM:
Rosendin Electric Pairs a Modern User Experience with WebCenter Content to Automate Contract Management
Rosendin Electric is the top-ranked private electrical contractor in the nation whose work spans preconstruction, prefabrication, building information modeling, and renewable energy. Join us to hear Rosendin describe how they leveraged Oracle WebCenter and Fishbowl Solutions’ ControlCenter to automate and improve their contract management process. Rosendin’s new contract management system provides an intuitive, mobile-enabled interface and dashboard view for their contracts team that shows working, pending, and executed contracts. This dashboard is specific for each user, enabling them to quickly take action not only on contracts, but also on associated documents such as non-disclosure agreements and corporate governance documentation. Come see how WebCenter Content has streamlined Rosendin’s contract management process, making it much more efficient while ensuring the lifecycle of contracts and related documents can be easily tracked, viewed, and archived within one enterprise repository.
We hope to see you at Collaborate 2016!
The post Fishbowl Solutions at Collaborate 2016: Demos and Discussions on Oracle JET, ADF, Documents Cloud Service, Controlled Document Management and Portals appeared first on Fishbowl Solutions' C4 Blog.
As an Oracle WebCenter consultant at Fishbowl Solutions, I have a number of tools that I use that keep me happy and productive. Whether or not you are a software developer, these tools can do the same for you and your business.
Unless you’ve been hibernating for the last year or so, you’ve probably heard of Slack. Haven’t adopted it for your business yet? Here’s why you should.
Slack facilitates contextual, transparent and efficient communication for teams. Slack helps organize your communications into “channels.” Working with Fishbowl Solutions on a WebCenter project? Create a Slack channel and centralize your communications. Quickly share files with the entire team, and “Slackers” can give instant feedback. On the go? Slack goes with you via mobile, of course. Slack provides direct messaging and private channels, too.
Even better, Slack lets you integrate dozens of apps, so that you can centralize all of the services you and your team use. Send calendar reminders and events, search for documents, even start a Skype call. Slack is team communication for the 21st century (with custom emojis!).
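For the curious, the simplest kind of Slack integration is an incoming webhook: you POST a small JSON document with a `text` field to a URL that Slack issues for a channel. The sketch below only builds the request and does not send it; the webhook URL is a placeholder and the message text is made up.

```python
import json
import urllib.request

# Placeholder: a real incoming-webhook URL is issued by Slack per workspace.
WEBHOOK_URL = "https://hooks.slack.com/services/T0000/B0000/XXXX"


def build_slack_request(text: str, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a Slack message."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_slack_request("New document checked in: Q2 proposal")
# To actually send it (requires a valid webhook URL):
# urllib.request.urlopen(req)
```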
Twitter and Evernote
Trello is the application for list-making over-achievers (like me). I organize my to-dos into different “boards”, depending on the project. I have a different board for each project I’m working on at Fishbowl. As I think of something I need to do, I can quickly add it to the appropriate to-do column. When I’m busy with a task, I put it in the “doing” column, and then slide it on over to “done” when finished. I can keep up with my task flow, it’s motivating, visually appealing, and goes with me where I go. Trello also allows me to share tables with others for easy collaboration. Oh, and did I mention I can integrate Trello with Slack (insert custom Slack emoji here)?
Toggl is a fantastic little desktop timer tool my colleague Nate Yates introduced to me. We consultants at Fishbowl Solutions need to keep very accurate timing of the hours we spend on different projects. Toggl allows me to input my different projects, and then just click the appropriate button when I start working on it. It keeps track of my time for the week on each project. It makes keeping track of my time simple, so that I can focus most of my time on creating responsive single-page applications for Fishbowl Solutions customers.
The post A Few of My Favorite Things for Ultimate Productivity with Oracle WebCenter appeared first on Fishbowl Solutions' C4 Blog.
Whenever somebody asks for my help on application technology strategy, I start by trying to ascertain three things. The absolute first is actually a prerequisite to almost any kind of useful conversation, which is to ascertain in general terms what the hell it is that we are talking about.
My second goal is to ascertain technology constraints. Three common types are:
- Compatible with legacy systems and/or enterprise standards.
- Cheap, free and/or open source.
- Proven, vetted by sufficiently many references, and/or generally having an “enterprise-y” reputation.
That’s often a short and straightforward discussion, except in those awkward situations when all three of my bullet points above are applicable at once.
The third item is usually more interesting. I try to figure out what is to be accomplished. That’s usually not a simple matter, because the initial list of goals and requirements is almost never accurate. It’s actually more common that I have to tell somebody to be more ambitious than that I need to rein them in.
Commonly overlooked needs include:
- If you want to sell something and have happy users, you need a good UI.
- You will also soon need tools and a UI for administration.
- Customers demand low-latency/fresh data. Your explanation of why they don’t really need it doesn’t contradict the fact that they want it.
- Providing data access and saying “You can hook up any BI tool you want and build charts” is not generally regarded as offering a good UI.
- When “adding analytics” to something previously focused on short-request processing, it is common to underestimate the variety of things users will soon want to do. (One common reason for this under-estimate is that after years of being told it can’t be done, they’ve learned not to ask.)
And if you take one thing away from this post, then take this:
- If you “know” exactly which features are or aren’t helpful to users, …
- … and if you supply only what you “know” they should use, …
- … then you will discover that what you “knew” wasn’t really accurate.
I guarantee it.
So far what I’ve said can be summarized as “Figure out what you’re trying to do, and what constraints there are on your choices for doing it.” The natural next step is to list the better-thought-of choices that meet your constraints, and — voila! — you have a short list. That’s basically correct, but there’s one significant complication.
Speaking of complications, what I’m portraying as a kind of linear/waterfall decision process of course usually involves lots of iteration, meandering around and general wheel-spinning. Real life is messy.
Simply put, there are many different kinds of application project. Other folks’ experience may not be as applicable to your case as you hope, because your case is different. So the rest of this post contains a checklist of distinctions among various different kinds of application project.
For starters, there are at least two major kinds of software development.
- Many projects fit the traditional development model, elements of which are:
- You — and this is very much a plural “you” — code something up more or less from scratch, using whatever language(s) and/or framework(s) you think make sense.
- You break the main project into pieces in obvious ways (e.g. server back end vs. mobile front), and then into further pieces for manageability.
- There may also be database designs, test harnesses, connectors to other apps and so on.
- But there are many other projects in which smaller bits of configuration and/or scripting are the essence of what you do.
- This is particularly common in analytics, where there might be business intelligence tools, ETL tools, scripts running against Hadoop and so on. The original building of a data warehouse/hub/lake/reservoir may also fit this model.
- It’s also what you do to get a major purchased packaged application into actual production.
- It also is often what happens for websites that serve “content”.
Other significant distinctions include:
- In-house vs. software-for-resale. If the developing organization is handing code to somebody else, then we’re probably talking about a more traditional kind of project. But if the whole thing is growing organically in-house, the script-spaghetti alternative may well be viable (in those projects for which it seems appropriate). Important subsidiary distinctions start with:
- (If in-house) Truly in-house vs. out-sourced.
- (If for resale) On-premises vs. SaaS. Or maybe not.
- Kind(s) of analytics, if any. Technologies and development processes used can be very different depending upon whether the application features:
- Business intelligence (not particularly real-time) as its essence.
- Reporting or other BI as added functionality to an essentially operational app.
- Low-latency BI, perhaps supported by (other) short-request processing.
- Predictive model scoring.
- The role(s) of the user(s). This influences how appealing and easy the UI needs to be.* Requirements are very different, for example, among:
- Classic consumer-facing websites, with recommenders and so on.
- Marketing websites targeted at a small group of business-to-business customers.
- Data-sharing websites for existing consumer stakeholders.
- Cheery benefits-information websites that the HR department wants employees to look at.
- Purely internal apps meant to be used by (self-)important executives.
- Internal apps meant to be used by line workers who will be given substantial training on them.
- Certain kinds of application project stand almost separately from the rest of these considerations, because their starting point is legacy apps. Examples may be found among:
- Migration/consolidation projects.
- Refactoring projects.
- Addition of incremental functionality.
*It also influences security, all good practices for securing internal apps notwithstanding.
Much also depends on the size and sophistication of the organization. What the “organization” is depends a bit on context:
- In the case of software products, SaaS (Software as a Service) or other internet services, it is primarily the vendor. However …
- … in B2B cases the sophistication of the customer organizations can also matter.
- In the case of in-house enterprise development, there’s only one enterprise involved (duh). However …
- … the “department” vs. “IT” distinction may be very important.
Specific considerations of this kind start:
- Is me-too functionality enough, or does the enterprise seek competitive advantage through technology?
- What kinds of technical risk does it seem prudent and desirable to take?
And that, in a nutshell, is why strategizing about application technology is often more complicated than it first appears.
- My November, 2015 post on issues in enterprise application software links to a number of other relevant posts.
- One of those (the same month) briefly surveyed actual choices in technology support for enterprise apps.
- A number of my posts draw distinctions among different analytic use cases. An April, 2015 example points to some of the earlier ones.
- My July, 2012 categorization of kinds of BI is particularly relevant.
- A November, 2012 post focused on assessing the supposed need for speed.
- My September, 2011 strategic worksheet is evergreen.
In a companion introduction to Kafka post, I observed that Kafka at its core is remarkably simple. Confluent offers a marchitecture diagram that illustrates what else is on offer, about which I’ll note:
- The red boxes — “Ops Dashboard” and “Data Flow Audit” — are the initial closed-source part. No surprise that they sound like management tools; that’s the traditional place for closed source add-ons to start.
- “Schema Management”
- Is used to define fields and so on.
- Is not equivalent to what is ordinarily meant by schema validation, in that …
- … it allows schemas to change, but puts constraints on which changes are allowed.
- Is done in plug-ins that live with the producer or consumer of data.
- Is based on the Hadoop-oriented file format Avro.
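To illustrate what "allows schemas to change, but puts constraints on which changes are allowed" can mean, here is a deliberately simplified sketch over Avro-style record schemas written as Python dicts. It checks one direction only (can a reader using the new schema handle data written with the old one?) and uses two rules: an added field needs a default, and an existing field may not change type. Real Avro schema resolution is richer (it permits some type promotions, for example), and none of this is Confluent's actual code.

```python
from typing import Dict, List


def field_map(schema: dict) -> Dict[str, dict]:
    """Index a record schema's fields by name."""
    return {f["name"]: f for f in schema["fields"]}


def backward_compat_problems(old: dict, new: dict) -> List[str]:
    """List reasons why `new` could not read records written with `old`."""
    problems: List[str] = []
    old_fields, new_fields = field_map(old), field_map(new)
    for name, field in new_fields.items():
        if name not in old_fields:
            # Old records lack this field, so the reader needs a fallback value.
            if "default" not in field:
                problems.append(f"added field '{name}' has no default")
        elif field["type"] != old_fields[name]["type"]:
            problems.append(f"field '{name}' changed type")
    return problems


v1 = {"type": "record", "name": "User",
      "fields": [{"name": "id", "type": "long"}]}
v2 = {"type": "record", "name": "User",
      "fields": [{"name": "id", "type": "long"},
                 {"name": "email", "type": "string", "default": ""}]}
v3 = {"type": "record", "name": "User",
      "fields": [{"name": "id", "type": "long"},
                 {"name": "age", "type": "int"}]}
```

Under these rules, the change from `v1` to `v2` is accepted, while `v1` to `v3` is rejected because old records give the reader nothing to put in `age`.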
Kafka offers little in the way of analytic data transformation and the like. Hence, it’s commonly used with companion products.
- Per Confluent/Kafka honcho Jay Kreps, the companion is generally Spark Streaming, Storm or Samza, in declining order of popularity, with Samza running a distant third.
- Jay estimates that there’s such a companion product at around 50% of Kafka installations.
- Conversely, Jay estimates that around 80% of Spark Streaming, Storm or Samza users also use Kafka. On the one hand, that sounds high to me; on the other, I can’t quickly name a counterexample, unless Storm originator Twitter is one such.
- Jay’s views on the Storm/Spark comparison include:
- Storm is more mature than Spark Streaming, which makes sense given their histories.
- Storm’s distributed processing capabilities are more questionable than Spark Streaming’s.
- Spark Streaming is generally used by folks in the heavily overlapping categories of:
- Spark users.
- Analytics types.
- People who need to share stuff between the batch and stream processing worlds.
- Storm is generally used by people coding up more operational apps.
If we recognize that Jay’s interests are obviously streaming-centric, this distinction maps pretty well to the three use cases Cloudera recently called out.
Complicating this discussion further is Confluent 2.1, which is expected late this quarter. Confluent 2.1 will include, among other things, a stream processing layer that works differently from any of the alternatives I cited, in that:
- It’s a library running in client applications that can interrogate the core Kafka server, rather than …
- … a separate thing running on a separate cluster.
The library will do joins, aggregations and so on, while relying on core Kafka for information about process health and the like. Jay sees this as more of a competitor to Storm in operational use cases than to Spark Streaming in analytic ones.
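To give a feel for the "library in the client application" style, here is a toy aggregation that runs entirely inside the calling process: it counts events per key in fixed one-minute windows. It is not the Confluent API, which I haven't seen; the event tuple format is my own assumption, and in a real deployment the input iterable would presumably be fed by a Kafka consumer.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, key); assumed format


def windowed_counts(events: Iterable[Event],
                    window_s: int = 60) -> Dict[Tuple[int, str], int]:
    """Count events per key within fixed, non-overlapping time windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_s)  # align to the window boundary
        counts[(window_start, key)] += 1
    return dict(counts)


sample_events = [(0, "a"), (30, "a"), (59, "b"), (60, "a"), (125, "b")]
result = windowed_counts(sample_events)
```

The point of the design is that there is no separate processing cluster to run: the application imports the logic and the only shared infrastructure is the message log itself.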
We didn’t discuss other Confluent 2.1 features much, and frankly they all sounded to me like items from the “You mean you didn’t have that already??” list any young product has.
- My October, 2014 post on Streaming for Hadoop is a sort of predecessor to this two-post series.
- Kafka has gotten considerable attention and adoption in streaming.
- Kafka is open source, out of LinkedIn.
- Folks who built it there, led by Jay Kreps, now have a company called Confluent.
- Confluent seems to be pursuing a fairly standard open source business model around Kafka.
- Confluent seems to be in the low to mid teens in paying customers.
- Confluent believes 1000s of Kafka clusters are in production.
- Confluent reports 40 employees and $31 million raised.
At its core Kafka is very simple:
- Kafka accepts streams of data in substantially any format, and then streams the data back out, potentially in a highly parallel way.
- Any producer or consumer of data can connect to Kafka, via what can reasonably be called a publish/subscribe model.
- Kafka handles various issues of scaling, load balancing, fault tolerance and so on.
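The publish/subscribe model described above can be sketched as an append-only log in which each consumer tracks its own read position. This toy `Topic` class is my own illustration, kept in memory and single-threaded; it shows the shape of the idea and ignores everything Kafka actually does about scaling, load balancing and fault tolerance.

```python
from typing import Dict, List


class Topic:
    """Append-only log; each consumer tracks its own read offset."""

    def __init__(self) -> None:
        self._log: List[bytes] = []
        self._offsets: Dict[str, int] = {}

    def publish(self, message: bytes) -> int:
        """Append a message in any format and return its offset."""
        self._log.append(message)
        return len(self._log) - 1

    def poll(self, consumer: str, max_messages: int = 10) -> List[bytes]:
        """Return the next unread messages for this consumer."""
        start = self._offsets.get(consumer, 0)
        batch = self._log[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch


topic = Topic()
topic.publish(b"first")
topic.publish(b"second")
```

Because the log is not consumed destructively, a second subscriber that shows up later can still read from the beginning, which is where the hub-versus-point-to-point benefit comes from.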
So it seems fair to say:
- Kafka offers the benefits of hub vs. point-to-point connectivity.
- Kafka acts like a kind of switch, in the telecom sense. (However, this is probably not a very useful metaphor in practice.)
Jay also views Kafka as something like a file system. Kafka doesn’t actually have a file-system-like interface for managing streams, but he acknowledges that as a need and presumably a roadmap item.
The most noteworthy technical point for me was that Kafka persists data, for reasons of buffering, fault-tolerance and the like. The duration of the persistence is configurable, and can be different for different feeds, with two main options:
- Guaranteed to have the last update of anything.
- Complete for the past N days.
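The two persistence options can be sketched as two pruning rules over a list of records. Kafka's own terms for these are, as far as I know, time-based retention and log compaction; the record tuple format and function names below are assumptions made for the example, not Kafka internals.

```python
from typing import Dict, List, Tuple

Record = Tuple[float, str, str]  # (timestamp in seconds, key, value)


def retain_recent(log: List[Record], now: float, days: float) -> List[Record]:
    """Keep every record from the past `days` days."""
    cutoff = now - days * 86400
    return [r for r in log if r[0] >= cutoff]


def retain_latest_per_key(log: List[Record]) -> List[Record]:
    """Keep only the last record for each key, preserving log order."""
    latest: Dict[str, int] = {}
    for index, record in enumerate(log):
        latest[record[1]] = index  # later records overwrite earlier ones
    return [log[i] for i in sorted(latest.values())]


sample_log: List[Record] = [
    (0.0, "a", "1"),
    (100000.0, "b", "2"),
    (200000.0, "a", "3"),
]
```

The first rule gives a complete history over a bounded window; the second gives an unbounded but lossy history in which every key's most recent value is always available.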
Jay thinks this is a major difference vs. messaging systems that have come before. As you might expect, given that data arrives in timestamp order and then hangs around for a while:
- Kafka can offer strong guarantees of delivering data in the correct order.
- Persisted data is automagically broken up into partitions.
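One caveat on how those two points interact: as I understand it, Kafka's ordering guarantee holds within a partition, so records that must stay in order need to land in the same partition, typically by sharing a key. A small sketch, with CRC32 as a stand-in hash (not necessarily what any Kafka client uses) and made-up records:

```python
import zlib
from typing import Dict, List, Tuple

KeyValue = Tuple[str, str]


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition; CRC32 is a stand-in for the real hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_all(records: List[KeyValue],
               num_partitions: int) -> Dict[int, List[KeyValue]]:
    """Append (key, value) records to per-partition logs in arrival order."""
    partitions: Dict[int, List[KeyValue]] = {p: [] for p in range(num_partitions)}
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


records = [("u1", "a"), ("u2", "b"), ("u1", "c"), ("u2", "d"), ("u1", "e")]
parts = append_all(records, 3)
```

Every record for `u1` ends up in one partition in its original order, whichever partition that happens to be; no ordering is implied between records in different partitions.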
Technical tidbits include:
- Data is generally fresh to within 1.5 milliseconds.
- 100s of MB/sec/server is claimed. I didn’t ask how big a server was.
- LinkedIn runs >1 trillion messages/day through Kafka.
- Others in that throughput range include but are not limited to Microsoft and Netflix.
- A message is commonly 1 KB or less.
- At a guesstimate, 50%ish of messages are in Avro. JSON is another frequent format.
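A quick back-of-the-envelope calculation puts those figures in perspective. It mixes a floor (1 trillion messages/day) with a ceiling (1 KB per message, taken here as 1,000 bytes) and the low end of the per-server claim, so treat the results as order-of-magnitude only; replication and consumer reads are ignored.

```python
MESSAGES_PER_DAY = 1e12        # ">1 trillion messages/day", taken as a floor
MESSAGE_BYTES = 1_000          # "1 KB or less", taken as an upper bound
SECONDS_PER_DAY = 24 * 60 * 60
SERVER_BYTES_PER_SEC = 100e6   # low end of "100s of MB/sec/server"

messages_per_sec = MESSAGES_PER_DAY / SECONDS_PER_DAY
bytes_per_sec = messages_per_sec * MESSAGE_BYTES
bytes_per_day = MESSAGES_PER_DAY * MESSAGE_BYTES
servers_at_full_rate = bytes_per_sec / SERVER_BYTES_PER_SEC

print(f"{messages_per_sec / 1e6:.1f} million messages/sec")
print(f"{bytes_per_sec / 1e9:.1f} GB/sec, {bytes_per_day / 1e15:.1f} PB/day")
print(f"about {servers_at_full_rate:.0f} servers' worth of ingest at 100 MB/sec each")
```

Under these assumptions that works out to roughly 11.6 million messages per second, on the order of a petabyte a day.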
Jay’s answer to any concern about performance overhead for current or future features is usually to point out that anything other than the most basic functionality:
- Runs in different processes from core Kafka …
- … if it doesn’t actually run on a different cluster.
For example, connectors have their own pools of processes.
I asked the natural open source question about who contributes what to the Apache Kafka project. Jay’s quick answers were:
- Perhaps 80% of Kafka code comes from Confluent.
- LinkedIn has contributed most of the rest.
- However, as is typical in open source, the general community has contributed some connectors.
- The general community also contributes “esoteric” bug fixes, which Jay regards as evidence that Kafka is in demanding production use.
Jay has a rather erudite and wry approach to naming and so on.
- Kafka got its name because it was replacing something he regarded as Kafkaesque. OK.
- Samza is an associated project that has something to do with transformations. Good name. (The central character of The Metamorphosis was Gregor Samsa, and the opening sentence of the story mentions a transformation.)
- In his short book about logs, Jay has a picture caption “ETL in Ancient Greece. Not much has changed.” The picture appears to be of Sisyphus. I love it.
- I still don’t know why he named a key-value store Voldemort. Perhaps it was something not to be spoken of.
What he and his team do not yet have is a clear name for their product category. Difficulties in naming include:
- Kafka is limited and simple. But of course Confluent has plans to broaden its capabilities.
- It’s long been hard to decide whether to talk about “events”, “streams” or both.
- “Streaming” has another tech meaning, in the context of video, songs, etc.
- One candidate name, “event hub”, has already been grabbed by IBM and Microsoft for their specific offerings.
- Naming is always hard in general.
Confluent seems to be using “stream data platform” as a placeholder. As per the link above, I once suggested Data Stream Management System, or more concisely Datastream Manager. “Event”, “event stream” or “event series” could perhaps be mixed in as well. I don’t really have an opinion yet, and probably won’t until I’ve studied the space in a little more detail.
And on that note, I’ll end this post for reasons of length, and discuss Kafka-related technology separately.
- My October, 2014 post on Streaming for Hadoop is a sort of predecessor to this two-post series.
Cloudera released Version 2 of Cloudera Director, which is a companion product to Cloudera Manager focused specifically on the cloud. This led to a discussion about — you guessed it! — Cloudera and the cloud.
Making Cloudera run in the cloud has three major aspects:
- Cloudera’s usual software, ported to run on the cloud platform(s).
- Cloudera Director, which for example launches cloud instances.
- Points of integration, e.g. taking information about security-oriented roles from the platform and feeding them to the role-based security that is specific to Cloudera Enterprise.
Features new in this week’s release of Cloudera Director include:
- An API for job submission.
- Support for spot and preemptible instances.
- High availability.
- Some cluster repair.
- Some cluster cloning.
I.e., we’re talking about some pretty basic/checklist kinds of things. Cloudera Director is evidently working for Amazon AWS and Google GCP, and planned for Windows Azure, VMware and OpenStack.
As for porting, let me start by noting:
- Shared-nothing analytic systems, RDBMS and Hadoop alike, run much better in the cloud than they used to.
- Even so, it seems that the future of Hadoop in the cloud is to rely on object storage, such as Amazon S3.
That makes sense in part because:
- The applications where shared nothing most drastically outshines object storage are probably the ones in which data can just be filtered from disk — spinning-rust or solid-state as the case may be — and processed in place.
- By way of contrast, if data is being redistributed a lot then the shared nothing benefit applies to a much smaller fraction of the overall workload.
- The latter group of apps are probably the harder ones to optimize for.
But while it makes sense, much of what’s hardest about the ports involves the move to object storage. The status of that is roughly:
- Cloudera already has a lot of its software running on Amazon S3, with Impala/Parquet in beta.
- Object storage integration for Windows Azure is “in progress”.
- Object storage integration for Google GCP is “to be determined”.
- Security for object storage — e.g. encryption — is a work in progress.
- Cloudera Navigator for object storage is a roadmap item.
When I asked about particularly hard parts of porting to object storage, I got three specifics. Two of them sounded like challenges around having less detailed control, specifically in the area of consistency model and capacity planning. The third I frankly didn’t understand,* which was the semantics of move operations, relating to the fact that they were constant time in HDFS, but linear in size on object stores.
*It’s rarely obvious to me why something is O(1) until it is explained to me.
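A toy sketch of that third point, as far as I can reconstruct it. It assumes that a move in an HDFS-like namespace just re-points one metadata entry, while an object store with no rename primitive has to copy every object under the old prefix to a new key and then delete the original. The class and function names here are hypothetical, made up for illustration; none of this is Cloudera, HDFS or S3 code.

```python
class ToyNamespace:
    """HDFS-like toy: directories are entries in a metadata map."""

    def __init__(self):
        self.dirs = {}  # directory path -> {file name: file contents}

    def put(self, directory, name, data):
        self.dirs.setdefault(directory, {})[name] = data

    def move(self, src, dst):
        # Re-point one metadata entry; file contents are never touched.
        self.dirs[dst] = self.dirs.pop(src)
        return 0  # bytes copied


class ToyObjectStore:
    """Object-store-like toy: a flat key map; "directories" are only key prefixes."""

    def __init__(self):
        self.objects = {}  # full key -> object contents

    def put(self, directory, name, data):
        self.objects[directory + "/" + name] = data

    def move(self, src, dst):
        # No rename in this model: copy each object to its new key, delete the old one.
        copied = 0
        for key in [k for k in self.objects if k.startswith(src + "/")]:
            data = self.objects.pop(key)
            self.objects[dst + key[len(src):]] = bytes(data)
            copied += len(data)
        return copied


def bytes_copied_by_move(store, n_files, file_size=1024):
    """Write n_files under /tmp/job, move them to /out/job, report bytes copied."""
    for i in range(n_files):
        store.put("/tmp/job", f"part-{i}", b"x" * file_size)
    return store.move("/tmp/job", "/out/job")


if __name__ == "__main__":
    for n in (10, 1000):
        print(n, bytes_copied_by_move(ToyNamespace(), n),
              bytes_copied_by_move(ToyObjectStore(), n))
```

In this model the namespace move costs the same no matter how much data sits under the directory, while the object store's cost grows with the total bytes moved. That would matter for any job that writes output to a temporary location and then "commits" it with a move.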
Naturally, we talked about competition, differentiation, adoption and all that stuff. Highlights included:
- In general, Cloudera’s three big marketing messages these days can be summarized as “Fast”, “Easy”, and “Secure”.
- Notwithstanding the differences as to which parts of the Cloudera stack run on premises, on Amazon AWS, on Microsoft Azure or on Google GCP, Cloudera thinks it’s important that its offering is the “same” on all platforms, which allows “hybrid” deployment.
- In general, Cloudera still sees Hortonworks as a much bigger competitor than MapR or IBM.
- Cloudera fondly believes that Cloudera Manager is a significant competitive advantage vs. Ambari. (This would presumably be part of the “Easy” claim.)
- In particular, Cloudera asserts it has better troubleshooting/monitoring than the cloud alternatives do, because of superior drilldown into details.
- Cloudera’s big competitor on the Amazon platform is Elastic MapReduce (EMR). Cloudera points out that EMR lacks various capabilities that are in the Cloudera stack. Of course, versions of these capabilities are sometimes found in other Amazon offerings, such as Redshift.
- Cloudera’s big competitor on Azure is HDInsight. Cloudera sells against that via:
  - General Cloudera vs. Hortonworks distinctions.
Cloudera also offered a distinction among three types of workload:
- ETL (Extract/Transform/Load) and “modeling” (by which Cloudera seems to mean predictive modeling).
  - Cloudera pitches this as batch work.
  - Cloudera tries to deposition competitors as being good mainly at these kinds of jobs.
  - This can be reasonably said to be the original sweet spot of Hadoop and MapReduce, which fits with Cloudera’s attempt to portray competitors as technical laggards.
  - Cloudera observes that these workloads tend to call for “transient” jobs. Lazier marketers might trot out the word “elasticity”.
- BI (Business Intelligence) and “analytics”, by which Cloudera seems to mainly mean Impala and Spark.
- “Application delivery”, by which Cloudera means operational stuff that can’t be allowed to go down. Presumably, this is a rough match to what I — and by now a lot of other folks as well — call short-request processing.
While I don’t agree with terminology that says modeling is not analytics, the basic distinction being drawn here makes considerable sense.
I’m on two overlapping posting kicks, namely “lessons from the past” and “stuff I keep saying so might as well also write down”. My recent piece on Oracle as the new IBM is an example of both themes. In this post, another example, I’d like to memorialize some points I keep making about business intelligence and other analytics. In particular:
- BI relies on strong data access capabilities. This is always true. Duh.
- Therefore, BI and other analytics vendors commonly reinvent the data management wheel. This trend ebbs and flows with technology cycles.
Similarly, BI has often been tied to data integration/ETL (Extract/Transform/Load) functionality.* But I won’t address that subject further at this time.
*In the Hadoop/Spark era, that’s even truer of other analytics than it is of BI.
My top historical examples include:
- The 1970s analytic fourth-generation languages (RAMIS, NOMAD, FOCUS, et al.) commonly combined reporting and data management.
- The best BI visualization technology of the 1980s, Executive Information Systems (EIS), was generally unsuccessful. The core reason was a lack of what we’d now call drilldown. Not coincidentally, EIS vendors — notably leader Comshare — didn’t do well at DBMS-like technology.
- Business Objects, one of the pioneers of the modern BI product category, rose in large part on the strength of its “semantic layer” technology. (If you don’t know what that is, you can imagine it as a kind of virtual data warehouse modest enough in its ambitions to actually be workable.)
- Cognos, the other pioneer of modern BI, depended on capabilities for which it needed a bundled MOLAP (Multidimensional OnLine Analytic Processing) engine.
  - But Cognos later stopped needing that engine, which underscores my point about technology ebbing and flowing.
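For readers who haven’t met a “semantic layer” before, here is a minimal sketch of the idea mentioned above: business terms are mapped to physical columns and expressions, and SQL is generated from the terms a user picks. The table names, column names and the `build_query` helper are all hypothetical, chosen for illustration. This is not how Business Objects actually implemented it.

```python
# Hypothetical mapping: business term -> (SQL expression, is it an aggregate measure?)
SEMANTIC_LAYER = {
    "Region": ("d.region_name", False),
    "Product": ("p.product_name", False),
    "Revenue": ("SUM(f.net_amt)", True),
}

# Hypothetical physical schema, hidden from the business user.
FROM_CLAUSE = (
    "fact_sales f "
    "JOIN dim_geo d ON f.geo_id = d.geo_id "
    "JOIN dim_product p ON f.product_id = p.product_id"
)


def build_query(terms):
    """Turn a list of business terms into a SQL string."""
    select, group_by = [], []
    for term in terms:
        expr, is_measure = SEMANTIC_LAYER[term]
        select.append(f'{expr} AS "{term}"')
        if not is_measure:
            # Dimensions are grouped on; measures are aggregated.
            group_by.append(expr)
    sql = f"SELECT {', '.join(select)} FROM {FROM_CLAUSE}"
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql


if __name__ == "__main__":
    print(build_query(["Region", "Revenue"]))
```

The user asks for “Revenue” by “Region” and never sees `fact_sales` or the join keys, which is the sense in which the layer behaves like a modest virtual data warehouse.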
I’m not as familiar with the details for MicroStrategy, but I do know that it generates famously complex SQL to compensate for the inadequacies of some DBMS. That had the paradoxical effect of creating performance challenges for MicroStrategy when used over more capable analytic DBMS, which in turn led at least Teradata to do special work to optimize MicroStrategy processing. Again, ebbs and flows.
More recent examples of serious DBMS-like processing in BI offerings may be found in QlikView, Zoomdata, Platfora, ClearStory, Metamarkets and others. That some of those are SaaS (Software as a Service) doesn’t undermine the general point, because in each case they have significant data processing technology that lies strictly between the visualization and data store layers.
- Context for this post may be found in my piece on The two sides of BI. (August, 2013)