Skip navigation.

Curt Monash

Syndicate content
Choices in data management and analysis
Updated: 3 hours 17 min ago

Oracle as the new IBM — has a long decline started?

Thu, 2015-12-31 03:15

When I find myself making the same observation fairly frequently, that’s a good impetus to write a post based on it. And so this post is based on the thought that there are many analogies between:

  • Oracle and the Oracle DBMS.
  • IBM and the IBM mainframe.

And when you look at things that way, Oracle seems to be swimming against the tide.

Drilling down, there are basically three things that can seriously threaten Oracle’s market position:

  • Growth in apps of the sort for which Oracle’s RDBMS is not well-suited. Much of “Big Data” fits that description.
  • Outright, widespread replacement of Oracle’s application suites. This is the least of Oracle’s concerns at the moment, but could of course be a disaster in the long term.
  • Transition to “the cloud”. This trend amplifies the other two.

Oracle’s decline, if any, will be slow — but I think it has begun.

 

Oracle/IBM analogies

There’s a clear market lead in the core product category. IBM was dominant in mainframe computing. While not as dominant, Oracle is definitely a strong leader in high-end OTLP/mixed-use (OnLine Transaction Processing) RDBMS.

That market lead is even greater than it looks, because some of the strongest competitors deserve asterisks. Many of IBM’s mainframe competitors were “national champions” — Fujitsu and Hitachi in Japan, Bull in France and so on. Those were probably stronger competitors to IBM than the classic BUNCH companies (Burroughs, Univac, NCR, Control Data, Honeywell).

Similarly, Oracle’s strongest direct competitors are IBM DB2 and Microsoft SQL Server, each of which is sold primarily to customers loyal to the respective vendors’ full stacks. SAP is now trying to play a similar game.

The core product is stable, secure, richly featured, and generally very mature. Duh.

The core product is complicated to administer — which provides great job security for administrators. IBM had JCL (Job Control Language). Oracle has a whole lot of manual work overseeing indexes. In each case, there are many further examples of the point. Edit: A Twitter discussion suggests the specific issue with indexes has been long fixed.

Niche products can actually be more reliable than the big, super-complicated leader. Tandem Nonstop computers were super-reliable. Simple, “embeddable” RDBMS — e.g. Progress or SQL Anywhere — in many cases just work. Still, if you want one system to run most of your workload 24×7, it’s natural to choose the category leader.

The category leader has a great “whole product” story. Here I’m using “whole product” in the sense popularized by Geoffrey Moore, to encompass ancillary products, professional services, training, and so on, from the vendor and third parties alike. There was a time when most serious packaged apps ran exclusively on IBM mainframes. Oracle doesn’t have quite the same dominance, but there are plenty of packaged apps for which it is the natural choice of engine.

Notwithstanding all the foregoing, there’s strong vulnerability to alternative product categories. IBM mainframes eventually were surpassed by UNIX boxes, which had grown up from the minicomputer and even workstation categories. Similarly, the Oracle DBMS has trouble against analytic RDBMS specialists, NoSQL, text search engines and more.

 

IBM’s fate, and Oracle’s

Given that background, what does it teach us about possible futures for Oracle? The golden age of the IBM mainframe lasted 25 or 30 years — 1965-1990 is a good way to think about it, although there’s a little wiggle room at both ends of the interval. Since then it’s been a fairly stagnant cash-cow business, in which a large minority or perhaps even small majority of IBM’s customers have remained intensely loyal, while others have aligned with other vendors.

Oracle’s DBMS business seems pretty stagnant now too. There’s no new on-premises challenger to Oracle now as strong as UNIX boxes were to IBM mainframes 20-25 years ago, but as noted above, traditional competitors are stronger in Oracle’s case than they were in IBM’s. Further, the transition to the cloud is a huge deal, currently in its early stages, and there’s no particular reason to think Oracle will hold any more share there than IBM did in the transition to UNIX.

Within its loyal customer base, IBM has been successful at selling a broad variety of new products (typically software) and services, often via acquired firms. Oracle, of course, has also extended its product lines immensely from RDBMS, to encompass “engineered systems” hardware, app server, apps, business intelligence and more. On the whole, this aspect of Oracle’s strategy is working well.

That said, in most respects Oracle is weaker at account control than peak IBM.

  • Oracle’s core competitors, IBM and Microsoft, are stronger than IBM’s were.
  • DB2 and SQL Server are much closer to Oracle compatibility than most mainframes were to IBM. (Amdahl is an obvious exception.) This is especially true as of the past 10-15 years, when it has become increasingly clear that reliance on stored procedures is a questionable programming practice. Edit: But please see the discussion below challenging this claim.
  • Oracle (the company) is widely hated, in a way that IBM generally wasn’t.
  • Oracle doesn’t dominate a data center the way hardware monopolist IBM did in a hardware-first era.

Above all, Oracle doesn’t have the “Trust us; we’ll make sure your IT works” story that IBM did. Appliances, aka “engineered systems”, are a step in that direction, but those are only — or at least mainly — to run Oracle software, which generally isn’t everything a customer has.

 

But think of the apps!

Oracle does have one area in which it has more account control power than IBM ever did — applications. If you run Oracle apps, you probably should be running the Oracle RDBMS and perhaps an Exadata rack as well. And perhaps you’ll use Oracle BI too, at least in use cases where you don’t prefer something that emphasizes a more modern UI.

As a practical matter, most enterprise app rip-and-replace happens in a few scenarios:

  • Merger/acquisition. An enterprise that winds up with different apps for the same functions may consolidate and throw the loser out. I’m sure Oracle loses a few customers this way to SAP every year, and vice-versa.
  • Drastic obsolescence. This can take a few forms, mainly:
    • Been there, done that.
    • Enterprise outgrows the capabilities of the current app suite. Oracle’s not going to lose much business that way.
    • Major platform shift. Going forward, that means SaaS/”cloud” (Software as a Service).

And so the main “opportunity” for Oracle to lose application market share is in the transition to the cloud.

 

Putting this all together …

A typical large-enterprise Oracle customer has 1000s of apps running on Oracle. The majority would be easy to port to some other system, but the exceptions to that rule are numerous enough to matter — a lot. Thus, Oracle has a secure place at that customer until such time as its applications are mainly swept away and replaced with something new.

But what about new apps? In many cases, they’ll arise in areas where Oracle’s position isn’t strong.

  • New third-party apps are likely to come from SaaS vendors. Oracle can reasonably claim to be a major SaaS vendor itself, and salesforce.com has a complex relationship with the Oracle RDBMS. But on the whole, SaaS vendors aren’t enthusiastic Oracle adopters.
  • New internet-oriented apps are likely to focus on customer/prospect interactions (here I’m drawing the (trans)action/interaction distinction) or even more purely machine-generated data (“Internet of Things”). The Oracle RDBMS has few advantages in those realms.
  • Further, new apps — especially those that focus on data external to the company — will in many cases be designed for the cloud. This is not a realm of traditional Oracle strength.

And that is why I think the answer to this post’s title question is probably “Yes”.

 

Related links

A significant fraction of my posts, in this blog and Software Memories alike, are probably at least somewhat relevant to this sweeping discussion. Particularly germane is my 2012 overview of Oracle’s evolution. Other posts to call out are my recent piece on transitioning to the cloud, and my series on enterprise application history.

Readings in Database Systems

Thu, 2015-12-10 06:26

Mike Stonebraker and Larry Ellison have numerous things in common. If nothing else:

  • They’re both titanic figures in the database industry.
  • They both gave me testimonials on the home page of my business website.
  • They both have been known to use the present tense when the future tense would be more accurate. :)

I mention the latter because there’s a new edition of Readings in Database Systems, aka the Red Book, available online, courtesy of Mike, Joe Hellerstein and Peter Bailis. Besides the recommended-reading academic papers themselves, there are 12 survey articles by the editors, and an occasional response where, for example, editors disagree. Whether or not one chooses to tackle the papers themselves — and I in fact have not dived into them — the commentary is of great interest.

But I would not take every word as the gospel truth, especially when academics describe what they see as commercial market realities. In particular, as per my quip in the first paragraph, the data warehouse market has not yet gone to the extremes that Mike suggests,* if indeed it ever will. And while Joe is close to correct when he says that the company Essbase was acquired by Oracle, what actually happened is that Arbor Software, which made Essbase, merged with Hyperion Software, and the latter was eventually indeed bought by the giant of Redwood Shores.**

*When it comes to data warehouse market assessment, Mike seems to often be ahead of the trend.

**Let me interrupt my tweaking of very smart people to confess that my own commentary on the Oracle/Hyperion deal was not, in retrospect, especially prescient.

Mike pretty much opened the discussion with a blistering attack against hierarchical data models such as JSON or XML. To a first approximation, his views might be summarized as: 

  • Logical hierarchical models can be OK in certain cases. In particular, JSON could be a somewhat useful datatype in an RDBMS.
  • Physical hierarchical models are horrible.
  • Rather, you should implement the logical hierarchical model over a columnar RDBMS.

My responses start:

  • Nested data structures are more important than Mike’s discussion seems to suggest.
  • Native XML and JSON stores are apt to have an index on every field. If you squint, that index looks a lot like a column store.
  • Even NoSQL stores should and I think in most cases will have some kind of SQL-like DML (Data Manipulation Language). In particular, there should be some ability to do joins, because total denormalization is not always a good choice.

In no particular order, here are some other thoughts about or inspired by the survey articles in Readings in Database Systems, 5th Edition.

  • I agree that OLTP (OnLine Transaction Processing) is transitioning to main memory.
  • I agree with the emphasis on “data in motion”.
  • While I needle him for overstating the speed of the transition, Mike is right that columnar architectures are winning for analytics. (Or you could say they’ve won, if you recognize that mop-up from the victory will still take 1 or 2 decades.)
  • The guys seem to really hate MapReduce, which is an old story for Mike, but a bit of a reversal for Joe.
  • MapReduce is many things, but it’s not a data model, and it’s also not something that Hadoop 1.0 was an alternative to. Saying each of those things was sloppy writing.
  • The guys characterize consistency/transaction isolation as a rather ghastly mess. That part was an eye-opener.
  • Mike is a big fan of arrays. I suspect he’s right in general, although I also suspect he’s overrating SciDB. I also think he’s somewhat overrating the market penetration of cube stores, aka MOLAP.
  • The point about Hadoop (in particular) and modern technologies in general showing the way to modularization of DBMS is an excellent one.
  • Joe and Mike disagreed about analytics; Joe’s approach rang truer for me. My own opinion is:
  • The challenge of whether anybody wants to do machine learning (or other advanced analytics) over a DBMS is sidestepped in part by the previously mentioned point about the modularization of a DBMS. Hadoop, for example, can be both an OK analytic DBMS (although not fully competitive with mature, dedicated products) and of course also an advanced analytics framework.
  • Similarly, except in the short-term I’m not worried about the limitations of Spark’s persistence mechanisms. Almost every commercial distribution of Spark I can think of is part of a package that also contains a more mature data store.
  • Versatile DBMS and analytic frameworks suffer strategic contention for memory, with different parts of the system wanting to use it in different ways. Raising that as a concern about the integration of analytic DBMS with advanced analytic frameworks is valid.
  • I used to overrate the importance of abstract datatypes, in large part due to Mike’s influence. I got over it. He should too. :) They’re useful, to the point of being a checklist item, but not a game-changer. A big part of the problem is what I mentioned in the previous point — different parts of a versatile DBMS would prefer to do different things with memory.
  • I used to overrate the importance of user-defined functions in an analytic RDBMS. Mike had nothing to do with my error. :) I got over it. He should too. They’re useful, to the point of being a checklist item, but not a game-changer. Looser coupling between analytics and data management seems more flexible.
  • Excellent points are made about the difficulties of “First we build the perfect schema” data warehouse projects and, similarly, MDM (Master Data Management).
  • There’s an interesting discussion that helps explain why optimizer progress is so slow (both for the industry in general and for each individual product).

Related links

  • I did a deep dive into MarkLogic’s indexing strategy in 2008, which informed my comment about XML/JSON stores above.
  • Again with MarkLogic as the focus, in 2010 I was skeptical about document stores not offering joins. MarkLogic has since capitulated.
  • I’m not current on SciDB, but I did write a bit about it in 2010.
  • I’m surprised that I can’t find a post to point to about modularization of DBMS. I’ll leave this here as a placeholder until I can.
  • Edit: As promised, I’ve now posted about the object-relational/abstract datatype boom of the 1990s.

Transitioning to the cloud(s)

Mon, 2015-12-07 11:48

There’s a lot of talk these days about transitioning to the cloud, by IT customers and vendors alike. Of course, I have thoughts on the subject, some of which are below.

1. The economies of scale of not running your own data centers are real. That’s the kind of non-core activity almost all enterprises should outsource. Of course, those considerations taken alone argue equally for true cloud, co-location or SaaS (Software as a Service).

2. When the (Amazon) cloud was newer, I used to hear that certain kinds of workloads didn’t map well to the architecture Amazon had chosen. In particular, shared-nothing analytic query processing was necessarily inefficient. But I’m not hearing nearly as much about that any more.

3. Notwithstanding the foregoing, not everybody loves Amazon pricing.

4. Infrastructure vendors such as Oracle would like to also offer their infrastructure to you in the cloud. As per the above, that could work. However:

  • Is all your computing on Oracle’s infrastructure? Probably not.
  • Do you want to move the Oracle part and the non-Oracle part to different clouds? Ideally, no.
  • Do you like the idea of being even more locked in to Oracle than you are now? [Insert BDSM joke here.]
  • Will Oracle do so much better of a job hosting its own infrastructure that you use its cloud anyway? Well, that’s an interesting question.

Actually, if we replace “Oracle” by “Microsoft”, the whole idea sounds better. While Microsoft doesn’t have a proprietary server hardware story like Oracle’s, many folks are content in the Microsoft walled garden. IBM has fiercely loyal customers as well, and so may a couple of Japanese computer manufacturers.

5. Even when running stuff in the cloud is otherwise a bad idea, there’s still:

  • Test and dev(elopment) — usually phrased that way, although the opposite order makes more sense.
  • Short-term projects — the most obvious examples are in investigative analytics.
  • Disaster recovery.

So in many software categories, almost every vendor should have a cloud option of some kind.

6. Reasons for your data to wind up in a plurality of remote data centers include:

  • High availability, and similarly disaster recovery. Duh.
  • Second-source/avoidance of lock-in.
  • Geo-compliance.
  • Particular SaaS offerings being hosted in different places.
  • Use of both true cloud and co-location for different parts of your business.

7. “Mostly compatible” is by no means the same as “compatible”, and confusing the two leads to tears. Even so, “mostly compatible” has stood the IT industry in good stead multiple times. My favorite examples are:

  • SQL
  • UNIX (before LINUX).
  • IBM-compatible PCs (or, as Ben Rosen used to joke, Compaq-compatible).
  • Many cases in which vendors upgrade their own products.

I raise this point for two reasons:

  • I think Amazon/OpenStack could be another important example.
  • A vendor offering both cloud and on-premises versions of their offering, with minor incompatibilities between the two, isn’t automatically crazy.

8. SaaS vendors, in many cases, will need to deploy in many different clouds. Reasons include:

That said, there are of course significant differences between, for example:

  • Deploying to Amazon in multiple regions around the world.
  • Deploying to Amazon plus a variety of OpenStack-based cloud providers around the world, e.g. some “national champions” (perhaps subsidiaries of the main telecommunications firms).*
  • Deploying to Amazon, to other OpenStack-based cloud providers, and also to an OpenStack-based system that resides on customer premises (or in their co-location facility).

9. The previous point, and the last bullet of the one before that, are why I wrote in a post about enterprise app history:

There’s a huge difference between designing applications to run on one particular technology stack, vs. needing them to be portable across several. As a general rule, offering an application across several different brands of almost-compatible technology — e.g. market-leading RDBMS or (before the Linux era) proprietary UNIX boxes — commonly works out well. The application vendor just has to confine itself to relying on the intersection of the various brands’ feature sets.*

*The usual term for that is the spectacularly incorrect phrase “lowest common denominator”.

Offering the “same” apps over fundamentally different platform technologies is much harder, and I struggle to think of any cases of great success.

10. Decisions on where to process and store data are of course strongly influenced by where and how the data originates. In broadest terms:

  • Traditional business transaction data at large enterprises is typically managed by on-premises legacy systems. So legacy issues arise in full force.
  • Internet interaction data — e.g. web site clicks — typically originates in systems that are hosted remotely. (Few enterprises run their websites on premises.) It is tempting to manage and analyze that data where it originates. That said:
    • You often want to enhance that data with what you know from your business records …
    • … which is information that you may or may not be willing to send off-premises.
  • “Phone-home” IoT (Internet of Things) data, from devices at — for example — many customer locations, often makes sense to receive in the cloud. Once it’s there, why not process and analyze it there as well?
  • Machine-generated data that originates on your premises may never need to leave them. Even if their origins are as geographically distributed as customer devices are, there’s a good chance that you won’t need other cloud features (e.g. elastic scalability) as much as in customer-device use cases.

Related link

Issues in enterprise application software

Wed, 2015-11-11 07:39

1. I think the next decade or so will see much more change in enterprise applications than the last one. Why? Because the unresolved issues are piling up, and something has to give. I intend this post to be a starting point for a lot of interesting discussions ahead.

2. The more technical issues I’m thinking of include:

  • How will app vendors handle analytics?
  • How will app vendors handle machine-generated data?
  • How will app vendors handle dynamic schemas?
  • How far will app vendors get with social features?
  • What kind of underlying technology stacks will app vendors drag along?

We also always have the usual set of enterprise app business issues, including:

  • Will the current leaders — SAP, Oracle and whoever else you want to include — continue to dominate the large-enterprise application market?
  • Will the leaders in the large-enterprise market succeed in selling to smaller markets?
  • Which new categories of application will be important?
  • Which kinds of vendors and distribution channels will succeed in serving small enterprises?

And perhaps the biggest issue of all, intertwined with most of the others, is:

  • How will the move to SaaS (Software as a Service) play out?

3. I’m not ready to answer those questions yet, but at least I’ve been laying some groundwork.

Along with this post, I’m putting up a three post series on the history of enterprise apps. Takeaways include but are not limited to:

  • Application software is a very diverse area. Different generalities apply to different parts of it.
  • A considerable fraction of application software has always been sold with the technology stack being under vendor control. Examples include most app software sold to small and medium enterprises, and much the application software that Oracle sells.
  • Apps that are essentially distributed have often relied on different stacks than single-site apps. (Duh.)

4. Reasons I see for the enterprise apps area having been a bit dull in recent years include:

5. But I did do some work in the area even so. :) Besides posts linked above, other things I wrote relevant to the present discussion include:

 

Couchbase 4.0 and related subjects

Thu, 2015-10-15 09:17

I last wrote about Couchbase in November, 2012, around the time of Couchbase 2.0. One of the many new features I mentioned then was secondary indexing. Ravi Mayuram just checked in to tell me about Couchbase 4.0. One of the important new features he mentioned was what I think he said was Couchbase’s “first version” of secondary indexing. Obviously, I’m confused.

Now that you’re duly warned, let me remind you of aspects of Couchbase timeline.

  • 2 corporate name changes ago, Couchbase was organized to commercialize memcached. memcached, of course, was internet companies’ default way to scale out short-request processing before the rise of NoSQL, typically backed by manually sharded MySQL.
  • Couchbase’s original value proposition, under the name Membase, was to provide persistence and of course support for memcached. This later grew into a caching-oriented pitch even to customers who weren’t already memcached users.
  • A merger with the makers of CouchDB ensued, with the intention of replacing Membase’s SQLite back end with CouchDB at the same time as JSON support was introduced. This went badly.
  • By now, however, Couchbase sells for more than distributed cache use cases. Ravi rattled off a variety of big-name customer examples for system-of-record kinds of use cases, especially in session logging (duh) and also in travel reservations.
  • Couchbase 4.0 has been in beta for a few months.

Technical notes on Couchbase 4.0 — and related riffs :) — start:

  • There’s a new SQL-like language called N1QL (pronounced like “nickel”). I’m hearing a lot about SQL-on-NoSQL these days. More on that below.
  • “Index”, “data” and “query” are three different services/tiers.
    • You can run them all on the same nodes or separately. Couchbase doesn’t have enough experience yet with the technology to know which choice will wind up as a best practice.
    • I’m hearing a lot about heterogeneous-node/multi-tier DBMS architectures these days, and would no longer stand by my 2009 statement that they are unusual. Other examples include Oracle Exadata, MySQL, MongoDB (now that it has pluggable storage engines), MarkLogic, and of course the whole worlds of Hadoop and Spark.
  • To be clear — the secondary indexes are global, and not tied to the same nodes as the data they index.
  • There’s a new back end called ForestDB, but if I understood correctly, it’s used just for the indexes, not for the underlying data.
  • ForestDB represents Couchbase indexes in something that resembles b-trees, but also relies on tries. Indeed, if I’m reading the relevant poster correctly, it’s based on a trie of b-trees.
  • In another increasingly common trend, Couchbase uses Bloom filters to help decide which partitions to retrieve for any particular query.

Up to a point, SQL-on-NoSQL stories can be fairly straightforward.

  • You define some kind of a table,* perhaps in a SQL-like DDL (Data Description Language).
  • SELECT, FROM and WHERE clauses work in the usual way.
  • Hopefully, if a column is going to have a lot of WHERE clauses on it, it also has an index.

For example, I think that’s the idea behind most ODBC/JDBC drivers for NoSQL systems. I think it’s also the idea behind most “SQL-like” languages that NoSQL vendors ship.

*Nobody I talk to about this ever wants to call it a “view”, but it sure sounds like a view to me — not a materialized view, of course, but a view nonetheless.

JOIN syntax can actually be straightforward as well under these assumptions. As for JOIN execution, Couchbase pulls all the data into the relevant tier, and nested loop execution there. My new clients at SequoiaDB have a similar strategy, by the way, although in their case there’s a hash join option as well.

But if things stopped there, they would miss an important complication: NoSQL has nested data. I.e., a value can actually be an array, whose entries are arrays themselves, and so on. That said, the “turtles all the way down” joke doesn’t quite apply, because at some point there are actual scalar or string values, and those are the ones SQL wants to actually operate on.

Most approaches I know of to that problem boil down to identifying particular fields as table columns, with or without aliases/renaming; I think that’s the old Hadapt/Vertica strategy, for example. Couchbase claims to be doing something a little different however, with a SQL-extending operator called UNNEST. Truth be told, I’m finding the N1QL language reference a bit terse, and haven’t figured out what the practical differences vs. the usual approach are, if any. But it sounds like there may be some interesting ideas in there somewhere.