Skip navigation.

Curt Monash

Syndicate content
Choices in data management and analysis
Updated: 3 hours 55 min ago

Issues in enterprise application software

Wed, 2015-11-11 07:39

1. I think the next decade or so will see much more change in enterprise applications than the last one. Why? Because the unresolved issues are piling up, and something has to give. I intend this post to be a starting point for a lot of interesting discussions ahead.

2. The more technical issues I’m thinking of include:

  • How will app vendors handle analytics?
  • How will app vendors handle machine-generated data?
  • How will app vendors handle dynamic schemas?
  • How far will app vendors get with social features?
  • What kind of underlying technology stacks will app vendors drag along?

We also always have the usual set of enterprise app business issues, including:

  • Will the current leaders — SAP, Oracle and whoever else you want to include — continue to dominate the large-enterprise application market?
  • Will the leaders in the large-enterprise market succeed in selling to smaller markets?
  • Which new categories of application will be important?
  • Which kinds of vendors and distribution channels will succeed in serving small enterprises?

And perhaps the biggest issue of all, intertwined with most of the others, is:

  • How will the move to SaaS (Software as a Service) play out?

3. I’m not ready to answer those questions yet, but at least I’ve been laying some groundwork.

Along with this post, I’m putting up a three post series on the history of enterprise apps. Takeaways include but are not limited to:

  • Application software is a very diverse area. Different generalities apply to different parts of it.
  • A considerable fraction of application software has always been sold with the technology stack being under vendor control. Examples include most app software sold to small and medium enterprises, and much the application software that Oracle sells.
  • Apps that are essentially distributed have often relied on different stacks than single-site apps. (Duh.)

4. Reasons I see for the enterprise apps area having been a bit dull in recent years include:

5. But I did do some work in the area even so. :) Besides posts linked above, other things I wrote relevant to the present discussion include:


Couchbase 4.0 and related subjects

Thu, 2015-10-15 09:17

I last wrote about Couchbase in November, 2012, around the time of Couchbase 2.0. One of the many new features I mentioned then was secondary indexing. Ravi Mayuram just checked in to tell me about Couchbase 4.0. One of the important new features he mentioned was what I think he said was Couchbase’s “first version” of secondary indexing. Obviously, I’m confused.

Now that you’re duly warned, let me remind you of aspects of Couchbase timeline.

  • 2 corporate name changes ago, Couchbase was organized to commercialize memcached. memcached, of course, was internet companies’ default way to scale out short-request processing before the rise of NoSQL, typically backed by manually sharded MySQL.
  • Couchbase’s original value proposition, under the name Membase, was to provide persistence and of course support for memcached. This later grew into a caching-oriented pitch even to customers who weren’t already memcached users.
  • A merger with the makers of CouchDB ensued, with the intention of replacing Membase’s SQLite back end with CouchDB at the same time as JSON support was introduced. This went badly.
  • By now, however, Couchbase sells for more than distributed cache use cases. Ravi rattled off a variety of big-name customer examples for system-of-record kinds of use cases, especially in session logging (duh) and also in travel reservations.
  • Couchbase 4.0 has been in beta for a few months.

Technical notes on Couchbase 4.0 — and related riffs :) — start:

  • There’s a new SQL-like language called N1QL (pronounced like “nickel”). I’m hearing a lot about SQL-on-NoSQL these days. More on that below.
  • “Index”, “data” and “query” are three different services/tiers.
    • You can run them all on the same nodes or separately. Couchbase doesn’t have enough experience yet with the technology to know which choice will wind up as a best practice.
    • I’m hearing a lot about heterogeneous-node/multi-tier DBMS architectures these days, and would no longer stand by my 2009 statement that they are unusual. Other examples include Oracle Exadata, MySQL, MongoDB (now that it has pluggable storage engines), MarkLogic, and of course the whole worlds of Hadoop and Spark.
  • To be clear — the secondary indexes are global, and not tied to the same nodes as the data they index.
  • There’s a new back end called ForestDB, but if I understood correctly, it’s used just for the indexes, not for the underlying data.
  • ForestDB represents Couchbase indexes in something that resembles b-trees, but also relies on tries. Indeed, if I’m reading the relevant poster correctly, it’s based on a trie of b-trees.
  • In another increasingly common trend, Couchbase uses Bloom filters to help decide which partitions to retrieve for any particular query.

Up to a point, SQL-on-NoSQL stories can be fairly straightforward.

  • You define some kind of a table,* perhaps in a SQL-like DDL (Data Description Language).
  • SELECT, FROM and WHERE clauses work in the usual way.
  • Hopefully, if a column is going to have a lot of WHERE clauses on it, it also has an index.

For example, I think that’s the idea behind most ODBC/JDBC drivers for NoSQL systems. I think it’s also the idea behind most “SQL-like” languages that NoSQL vendors ship.

*Nobody I talk to about this ever wants to call it a “view”, but it sure sounds like a view to me — not a materialized view, of course, but a view nonetheless.

JOIN syntax can actually be straightforward as well under these assumptions. As for JOIN execution, Couchbase pulls all the data into the relevant tier, and nested loop execution there. My new clients at SequoiaDB have a similar strategy, by the way, although in their case there’s a hash join option as well.

But if things stopped there, they would miss an important complication: NoSQL has nested data. I.e., a value can actually be an array, whose entries are arrays themselves, and so on. That said, the “turtles all the way down” joke doesn’t quite apply, because at some point there are actual scalar or string values, and those are the ones SQL wants to actually operate on.

Most approaches I know of to that problem boil down to identifying particular fields as table columns, with or without aliases/renaming; I think that’s the old Hadapt/Vertica strategy, for example. Couchbase claims to be doing something a little different however, with a SQL-extending operator called UNNEST. Truth be told, I’m finding the N1QL language reference a bit terse, and haven’t figured out what the practical differences vs. the usual approach are, if any. But it sounds like there may be some interesting ideas in there somewhere.