Skip navigation.

Curt Monash

Syndicate content
Choices in data management and analysis
Updated: 15 hours 49 min ago

Notes on analytic technology, May 13, 2015

Wed, 2015-05-13 20:38

1. There are multiple ways in which analytics is inherently modular. For example:

  • Business intelligence tools can reasonably be viewed as application development tools. But the “applications” may be developed one report at a time.
  • The point of a predictive modeling exercise may be to develop a single scoring function that is then integrated into a pre-existing operational application.
  • Conversely, a recommendation-driven website may be developed a few pages — and hence also a few recommendations — at a time.

Also, analytics is inherently iterative.

  • Everything I just called “modular” can reasonably be called “iterative” as well.
  • So can any work process of the nature “OK, we got an insight. Let’s pursue it and get more accuracy.”

If I’m right that analytics is or at least should be modular and iterative, it’s easy to see why people hate multi-year data warehouse creation projects. Perhaps it’s also easy to see why I like the idea of schema-on-need.

2. In 2011, I wrote, in the context of agile predictive analytics, that

… the “business analyst” role should be expanded beyond BI and planning to include lightweight predictive analytics as well.

I gather that a similar point is at the heart of Gartner’s new term citizen data scientist. I am told that the term resonates with at least some enterprises. 

3. Speaking of Gartner, Mark Beyer tweeted

In data management’s future “hybrid” becomes a useless term. Data management is mutable, location agnostic and services oriented.

I replied

And that’s why I launched DBMS2 a decade ago, for “DataBase Management System SERVICES”. :)

A post earlier this year offers a strong clue as to why Mark’s tweet was at least directionally correct: The best structures for writing data are the worst for query, and vice-versa.

4. The foregoing notwithstanding, I continue to believe that there’s a large place in the world for “full-stack” analytics. Of course, some stacks are fuller than others, with SaaS (Software as a Service) offerings probably being the only true complete-stack products.

5. Speaking of full-stack vendors, some of the thoughts in this post were sparked by a recent conversation with Platfora. Platfora, of course, is full-stack except for the Hadoop underneath. They’ve taken to saying “data lake” instead of Hadoop, because they believe:

  • It’s a more benefits-oriented than geek-oriented term.
  • It seems to be more popular than the roughly equivalent terms “data hub” or “data reservoir”.

6. Platfora is coy about metrics, but does boast of high growth, and had >100 employees earlier this year. However, they are refreshingly precise about competition, saying they primarily see four competitors — Tableau, SAS Visual Analytics, Datameer (“sometimes”), and Oracle Data Discovery (who they view as flatteringly imitative of them).

Platfora seems to have a classic BI “land-and-expand” kind of model, with initial installations commonly being a few servers and a few terabytes. Applications cited were the usual suspects — customer analytics, clickstream, and compliance/governance. But they do have some big customer/big database stories as well, including:

  • 100s of terabytes or more (but with a “lens” typically being 5 TB or less).
  • 4-5 customers who pressed them to break a previous cap of 2 billion discrete values.

7. Another full-stack vendor, ScalingData, has been renamed to Rocana, for “root cause analysis”. I’m hearing broader support for their ideas about BI/predictive modeling integration. For example, Platfora has something similar on its roadmap.

Related links

  • I did a kind of analytics overview last month, which had a whole lot of links in it. This post is meant to be additive to that one.