Relational DBMS used to be fairly straightforward product suites, which boiled down to:
- A big SQL interpreter.
- A bunch of administrative and operational tools.
- Some very optional add-ons, often including an application development tool.
Now, however, most RDBMS are sold as part of something bigger.
- Oracle has hugely thickened its stack, as part of an Innovator’s Solution strategy — hardware, middleware, applications, business intelligence, and more.
- IBM has moved aggressively to a bundled “appliance” strategy. Even before that, IBM DB2 long sold much better to committed IBM accounts than as a software-only offering.
- Microsoft SQL Server is part of a stack, starting with the Windows operating system.
- Sybase was an exception to this rule, with thin(ner) stacks for both Adaptive Server Enterprise and Sybase IQ. But Sybase is now owned by SAP, and increasingly integrated as a business with …
- … SAP HANA, which is closely associated with SAP’s applications.
- Teradata has always been a hardware/software vendor. The most successful of its analytic DBMS rivals, in some order, are:
- Netezza, a pure appliance vendor, now part of IBM.
- Greenplum, a mainly-appliance vendor for most (not all) of its existence, and now part of EMC Pivotal.
- Vertica, more of a software-only vendor than the others, but now owned by and increasingly mainstreamed into hardware vendor HP.
- MySQL’s glory years were as part of the “LAMP” stack.
- Various thin-stack RDBMS that once were or could have been important market players … aren’t. Examples include Progress OpenEdge, IBM Informix, and the various strays adopted by Actian.
This phenomenon is, I think, much more driven by vendors than users. Most of the examples I listed work or could work perfectly well on their own.* But relational database management systems are seen as “strategic” products, which means in particular:
- They’re often expensive to adopt (software, hardware, people costs).
- They’re also often expensive to switch away from.
And strategic products, high price tags, and thick product stacks commonly go together.
*Netezza is an exception. But Exadata is not; while Oracle data warehousing was in a bad technical place before Exadata, Exadata software is what cleaned the problem up.
Also relevant is that I took those examples from relatively mature RDBMS market segments — high-end OLTP/general-purpose (OnLine Transaction Processing), mid-range OLTP/general-purpose, and analytic. Products in those sectors have had enough time to be built out. They also tend to have fairly close competitors, as the most important product features (e.g. columnar storage in analytic RDBMS, or online backup across the board) have been imitated numerous times each.
NewSQL, by way of contrast, is just as thin-stack as NoSQL is. Products in those sectors are immature; vendors are completing them first before wedding them to other technology layers. They’re also strongly differentiated; if you tell me what topology you need and which style(s) of API or DML (Data Manipulation Language) you prefer, the list of product candidates I give you may be short indeed.
HBase is the obvious exception to my “NoSQL products stand alone” generalization, but its market position is a matter of debate.
I have mixed feelings about this trend. For starters, I’m grudgingly becoming more sympathetic to DBMS/hardware bundles, notwithstanding their role as a way to gouge more money from customers than the hardware is actually worth. Why? Because of my opinion that there’s a general move toward appliances, clusters and clouds. In particular:
- As DBMS become better at straddling and melding RAM, flash and disk, legitimate reasons to optimize hardware/software integration will increase.
- Microsoft (with Parallel Data Warehouse) and SAP (with HANA) induce customers to adopt hardware “appliances” even though they don’t sell and profit from the hardware themselves. This shoots down the argument that appliances are only a vendor trick to squeeze out more profits.
- Netezza’s super-easy installation was a really nice feature.
When it comes to RDBMS/business intelligence bundles, my thoughts start:
- As a general rule, a benefit of BI is that it can get at data from lots of different sources. This speaks against tying it to a specific DBMS.
- The vendor-specific evidence is mixed:
- IBM has never explained any user advantages to including Cognos in its analytic “appliance” product lines.
- Teradata did some special optimizations for MicroStrategy. This suggests that, conversely, MicroStrategy could benefit from DBMS-specific features.
- QlikView built a custom in-memory data store.
- Specialized business intelligence stacks are on the rise, although generally with a beyond-just-relational flavor.
And so I’m skeptical about RDBMS/BI integration, but willing to be persuaded otherwise.
The integration of advanced analytics with RDBMS leaves me perplexed. Gains in performance, scalability and/or development ease would seem, in many cases, too great to pass up. (E.g., the Teradata Aster 6 story, analytic libraries and all.) And indeed most analytic platform vendors report some level of adoption. But the whole thing is moving more slowly than I expected. Meanwhile, in the Hadoop world, a much lesser SQL capability — Hive — seems to be integrated into other analytic processing with enthusiasm. Perhaps the problem is that enterprises have to figure out which analytic techniques to use in the first place, before they worry too much about making them efficient.
And finally, when it comes to bundling of packaged applications with RDBMS — that depends on the class of application.
- At the high end, it’s almost purely a pricing ploy, as those apps are usually written for lowest-common-denominator SQL functionality, so as to preserve portability.
- A lot of mid-range apps are written against a specific DBMS, which is then resold along with the app. What’s more …
- … most of those apps will migrate over time to a SaaS (Software as a Service) delivery model, which allows for a wholly integrated stack. And as the Workday example teaches us, database choices for SaaS apps can be pretty imaginative.
- The refactoring of everything (July, 2013)
- Comments about Gartner’s comments about a bunch of DBMS products (November, 2013)
- The cardinal rules of DBMS development (March, 2013)
The 2013 Gartner Magic Quadrant for Operational Database Management Systems is out. “Operational” seems to be Gartner’s term for what I call short-request, in each case the point being that OLTP (OnLine Transaction Processing) is a dubious term when systems omit strict consistency, and when even strictly consistent systems may lack full transactional semantics. As is usually the case with Gartner Magic Quadrants:
- I admire the raw research.
- The opinions contained are generally reasonable (especially since Merv Adrian joined the Gartner team).
- Some of the details are questionable.
- There’s generally an excessive focus on Gartner’s perception of vendors’ business skills, and on vendors’ willingness to parrot all the buzzphrases Gartner wants to hear.
- The trends Gartner highlights are similar to those I see, although our emphases may differ, and Gartner leaves some important ones out. (Big omission — support for lightweight analytics integrated into operational applications, one of the more genuine forms of real-time analytics.)
- The 2013 Gartner Magic Quadrant for Operational Database Management Systems puts Oracle in the lead, closely followed in some order by Microsoft, SAP, and IBM, with everybody else way behind. That’s reasonable, harkening back to the time when Oracle, IBM, Microsoft and to some extent Sybase were seemingly secure oligopolists, and most of the other vendors mentioned didn’t yet exist.
- Gartner seems to view a proprietary appliance strategy as good for customers, without mentioning that it’s also a way to sell hardware at ridiculous prices.
- Gartner evidently likes memory-centric positioning. SAP, Aerospike, VoltDB and McObject all get surprisingly high marks.
- Gartner gives Intersystems pretty high marks, while Progress Software isn’t even mentioned. Despite Progress’ recent restructuring, I’d think the core Progress OpenEdge business — arguably Intersystems’ closest rival — deserves more respect than that. (But given how rarely I write about it myself, perhaps I shouldn’t criticize.)
- Gartner has long been oddly positive on Actian, which is a floundering hodgepodge of half a dozen database also-rans. I like Mike Hoskins a lot too, but just how much has Actian’s supposedly “energized” “strong leadership” accomplished in the recent past, at Actian or elsewhere?
- Gartner has brutally low “vision” rankings for NuoDB and Clustrix. I think scaling out SQL effectively is more impressive than that. Gartner also omits to mention Clustrix’s past as an appliance vendor.
- Gartner refers to Oracle’s multi-tenancy support as if … well, as if it supported multi-tenancy.
- I don’t understand Gartner’s rankings of or comments about NoSQL vendors. For example:
- Three “strengths” are mentioned for MongoDB, yet none reference MongoDB’s developer outreach, which may be second only to Microsoft’s in its prime.
- HBase is discussed as if the Hadoop vendors were still pushing it hard, or as if it were showing up in a lot of enterprise evaluations.
- Geo-distribution is mentioned as a strength for Riak, yet not for Cassandra.
- Every Gartner Magic Quadrant (or Forrester Wave) features one or more outright brain cramps. In this one:
- Gartner writes “the Clustrix database offers no support for data types beyond traditional relational types,” when in fact Clustrix was one of the early indicators of a trend toward relational DBMS JSON support.
- Gartner suggests that EnterpriseDB’s Oracle compatibility is something new, when it was actually the company’s whole strategy 6-7 years ago.
Finally, since I’ve struggled with the definition of “DBMS”, I’ll finish by quoting with approval the start of Gartner’s:
We define a DBMS as a complete software system used to define, create, manage, update and query a database.
- Comments on the most recent Gartner Magic Quadrant for Data Warehouse Database Management Systems
- My definition of operational analytics
Oracle announced its in-memory columnar option Sunday. As usual, I wasn’t briefed; still, I have some observations. For starters:
- Oracle, IBM (Edit: See the rebuttal comment below), and Microsoft are all doing something similar …
- … because it makes sense.
- The basic idea is to take the technology that manages indexes — which are basically columns+pointers — and massage it into an actual column store. However …
- … the devil is in the details. See, for example, my May post on IBM’s version, called BLU, outlining all the engineering IBM did around that feature.
- Notwithstanding certain merits of this approach, I don’t believe in complete alternatives to analytic RDBMS. The rise of analytic DBMS oriented toward multi-structured data just strengthens that point.
I’d also add that Larry Ellison’s pitch “build columns to avoid all that index messiness” sounds like 80% bunk. The physical overhead should be at least as bad, and the main saving in administrative overhead should be that, in effect, you’re indexing ALL columns rather than picking and choosing.
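The index-to-column-store idea above can be sketched in a few lines. This is a purely illustrative toy (not Oracle’s, IBM’s, or anyone’s actual design): an index is roughly a sorted set of (value, row-pointer) pairs, and the same machinery, reordered by row position with the pointers dropped, yields a plain column.

```python
# Toy row store: a list of records.
rows = [
    {"id": 1, "region": "EMEA", "amount": 500},
    {"id": 2, "region": "APAC", "amount": 120},
    {"id": 3, "region": "EMEA", "amount": 300},
]

# A B-tree-style index on "region": sorted (value, row_id) pairs --
# i.e., a column plus pointers.
index = sorted((r["region"], r["id"]) for r in rows)

# The same data in row order, pointers dropped, is simply the column.
column = [r["region"] for r in rows]

# A columnar scan (think SELECT SUM(amount) WHERE region = 'EMEA')
# touches only the two columns involved, never whole rows.
total = sum(r["amount"] for r in rows if r["region"] == "EMEA")
print(total)  # 800
```

It also illustrates the “indexing ALL columns” point: a column store effectively maintains this structure for every column, whether or not a DBA would have chosen to index it.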
Anyhow, this technology should be viewed as applying to traditional business transaction data, much more than to — for example — web interaction logs, or other machine-generated data. My thoughts around that distinction start:
- I argued back in 2011 that traditional databases will wind up in RAM, basically because …
- … Moore’s Law will make it ever cheaper to store them there.
- Still, cheaper != cheap, so this is a technology only to use with your most valuable data — i.e., that transactional stuff.
- These are very tabular technologies, without much in the way of multi-structured data support.
But in a bit of evidence that disconfirms my case, one of the first SAP applications to require HANA was something called “Smart Meter Analytics”.
To see more specifically where this technology could be useful, let’s map it against my 2011 analytic database taxonomy.
- If you’re managing a partial EDW (Enterprise Data Warehouse) on the same technology as your OLTP (OnLine Transaction Processing) databases, but are running out of steam, in-memory columnar could provide some acceleration.
- Traditional data marts are somewhat obsolete, and establishing a new one would be mainly a cost play. So the fit is questionable.
- Investigative data marts could be a good fit, but only if you’re fairly unimaginative as to the kinds of data you want to include.
- Several other categories are no fit at all.
- There’s a good fit for certain kinds of operational analytics.
I’ll finish by expanding on that last point.
Operational applications have always had analytics blended in. If nothing else, there were a lot of straight reports; sometimes there’s a bit of optimization as well. Workday, for example, has BI and search as two of its core OLTP UI metaphors, and has a lot of other BI snippets called worklets as well. (And by the way, a lot of Workday’s database is in-memory.) I’ve thought for years that operational/analytic blending would be a major area of competition between Oracle and SAP; hence — I believe — SAP’s acquisitions of Business Objects and KXEN. Columnar in-memory Oracle features, and similarly SAP HANA, seem well-suited to support such application elements.
Basics about Sentry include that it is:
- Developed by Cloudera.
- An Apache incubator project.
- Slated to be rolled into CDH — Cloudera’s Hadoop distribution — over the next couple of weeks.
- Only useful with Hive in Version 1, but planned to also work in the future with other Hadoop data access systems such as Pig, search and so on.
- Lacking in administrative scalability in Version 1, something that is also slated to be fixed in future releases.
Apparently, Hadoop security options pre-Sentry boil down to:
- Kerberos, which only works down to directory or file levels of granularity.
- Third-party products.
Sentry adds role-based permissions for SQL access to Hadoop:
- By server.
- By database.
- By table.
- By view.
for a variety of actions — selections, transformations, schema changes, etc. Sentry does this by examining a query plan and checking whether each step in the plan is permissible.
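That plan-checking idea can be sketched as follows. This is a hypothetical illustration with invented names, not Sentry’s actual API: roles carry grants at server/database/table/view scope, and a query plan is authorized only if every step is covered by some grant.

```python
# Invented example grants: role -> {(scope_kind, scope_name): allowed actions}.
GRANTS = {
    "analyst": {("db", "sales"): {"SELECT"}},
    "admin":   {("server", "*"): {"SELECT", "INSERT", "ALTER"}},
}

def allowed(role, scope, action):
    # A grant covers a step if its scope matches (or is the wildcard)
    # and it includes the requested action.
    for (kind, name), actions in GRANTS.get(role, {}).items():
        if ((kind, name) == scope or name == "*") and action in actions:
            return True
    return False

def authorize(role, plan):
    # A plan is a list of (scope, action) steps; all must be permitted.
    return all(allowed(role, scope, action) for scope, action in plan)

print(authorize("analyst", [(("db", "sales"), "SELECT")]))  # True
print(authorize("analyst", [(("db", "hr"), "SELECT")]))     # False
```

The design point is that authorization happens per plan step, so a query touching several tables or views is rejected if any one step exceeds the role’s grants.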
What Sentry doesn’t have is cell-based security, for which Charles perceives relatively little demand. I agree, but also note that traditional RDBMS implementations of cell-based security — notably Oracle Label Security — can have unpleasant performance consequences. From there, I segued the discussion to Accumulo. Unlike Hortonworks, Cloudera sees Accumulo demand strictly in the Federal government, where Accumulo is baked into some major reference architectures.
Charles also walked me through the use cases for some security requests he does frequently hear:
- Encryption at rest is important for compliance, for example for credit card numbers.
- Masking is also of particular interest for credit card numbers.
- Audit arises frequently for Sarbanes-Oxley compliance, and also in financial services (not necessarily for compliance).
- View-based security — a big point of Sentry — is usually to satisfy internal (i.e. non-regulatory) policies.
- Other issues in regulatory compliance (July, 2012)