RE: No to SQL? Anti-database movement gains steam

From: Matthew Zito <mzito_at_gridapp.com>
Date: Mon, 6 Jul 2009 11:51:05 -0400
Message-ID: <C0A5E31718FC064A91E9FD7BE2F081B102288710_at_exchange.gridapp.com>


See inline. I've snipped as best I can to keep things within the quote limit, but this is still a generally long email. Back to work.

> -----Original Message-----
>
> > I'd be very careful making these kinds of statements. In my experience,
> > the folks working at companies like Google, Facebook, MySpace, Ning,
> > LiveJournal, etc. are easily as bright and experienced as the folks who
> > work in tech at banks, pharmaceuticals, etc.
>
> "very careful" doesn't even make it into scope. And before any veiled mentions
> of my "career" are brought up by anyone:
> I have had a career, don't need another one. That's why I can speak with true
> independence, instead of "toeing lines". People seem to like that I do so, given
> they are willing to pay for it.
>
>
> Now: the simple fact here is that folks from Google, Facebook, Myspace, Ning
> etcetc, and what they do as far as IT goes, are absolutely and totally
> irrelevant to the VAST majority of enterprise business.
>
> For starters most of them don't have prior baggage: they can afford to do
> something totally new with no concerns whatsoever about existing data/code.

<snip with good commentary on how web companies have different IT needs>
> Google et al. are a drop in the ocean in the IT market and what they do does NOT
> define the general market, not by a long shot.
>
> Which is exactly the point Sunil made and I agreed to in my reply.

Well, so of course I'd never think about impugning your credentials - you've contributed long enough and articulately enough to this forum that it's clear you know what you're talking about.

However, my "very careful" comment points out how you're saying two different things here. If we go back to your original statement that I was replying to, you said:

" Bingo. IOW, a group of inexperienced and incompetent developers decides to "write a web 2.0 site" and shazam, now "ALL enterprises should do the same"."

My point was simply that calling them incompetent is a dangerous path. It's the old "Not Invented Here" syndrome - i.e., the way we do things has worked for us, so someone who does something different must clearly be incompetent.

Now, however, you clarify your point to mean that the folks in the web space have concerns that are totally different than those in more traditional enterprise IT. I agree largely with that statement, and I assume that you're no longer calling these developers with different needs and requirements "incompetent" and "inexperienced".

I also agree that they're starting fresh - they are, after all, startups. Existing enterprises have an existing codebase, internal expertise, etc.

However, I believe that it's important to consider ways in which new technologies can be leveraged to add efficiency, performance, etc. For example, if you look at some of my banking customers, while they have a ton of traditional J2EE+Oracle+EMC storage infrastructure, a lot of them also use proprietary C++ + Memcache + custom-built non-SQL data stores for things like algorithmic trading. For them, their needs are specific enough, and the upside large enough, that it's worth looking at other options. So clearly even traditional Enterprise IT has areas where standard relational data stores aren't an appropriate decision.

> > They've simply made a different determination - that the cost of using a
> > relational database in a scale-up or scale-out configuration is greater
> > than the cost of using one of these non-traditional data stores.
>
>
> Nevertheless, I'd like to see factual proof that non-traditional data stores can
> indeed provide that scalability whereas traditional ones can't.
>
> That proof better be a little more than just "because it works at such and such".
> Such is no proof whatsoever that:
>
> 1- it indeed is the *only* solution for that such and such.
> 2- it does apply to ALL others.
>
> Which is what the demented fringe of Web2.0 is trying to convince the world of.
> In good old web 1.0 lunatic fringe style: history after all is cyclic.

Bear in mind that I think everyone on this thread agrees with the idea that "Because it works for XXXX, it must be great for everything" is silly. If I had to judge, I'd say that's reporter mojo speaking there, or punditry in action.

Most of the folks I know at these web companies that are working with this type of tech are *also* large consumers of MySQL, PostgreSQL, and the odd bit of Oracle or SQL Server here or there. Even they don't think that the non-relational model is appropriate for everything. However, they believe that operations with:

- large degrees of data independence
- very high concurrent query levels
- high levels of throughput
- very strong sensitivity to latency
- a need to scale linearly

simply don't work well with traditional relational databases, and hence you have these non-traditional data stores as alternative options for these types of workloads.

But just as one example, there's Facebook's Cassandra project, which I picked because a good friend of mine works for Facebook, and I happened to have just been reading about it a few weeks ago. Cassandra is the rapidly growing semi-structured data store for their user information. It was started as a way to do full-text search for user inboxes, and is being extended to support more and more operational data at Facebook. Some notes from their configuration:

- Approximately 600+ cores as of late '08
- Approximately 120TB of disk space
- 25TB of indexes
- 4B worker threads
- Average ~12ms response time for a search
- Software-level features like automatic partitioning, distributed local and remote replication, insert/append without read, and automated data file collapse and aggregation

Now certainly, you can build a >100TB Oracle instance, but the cost and the complexity would be challenging. In addition, presumably they only see this data store growing, and how do you deal with a 200, 300, 400TB Oracle instance? Google, for example, in 2006 had approximately 1.2PB of data in their structured data store. Heaven knows what it is now.

> I don't know. But we did not spend anywhere near as much as many others, we
> churn through 0.5TB per day, and it has trebled in one year.
>
> Our business is good old commercial property management. Something that is
> traditionally "low volume"
>
> Yet our Oracle DW db seems to manage quite well with the above, thank you.
> Of course: we collapse data periodically as well. And aggressively so.

Right, something that is not an option for most of these organizations. To use the Gmail/Facebook/my ad startup example, collapsing data means you lose data. In the case of the advertising startup, they realistically can only collapse user persistence data they haven't seen for a very long time. Real-time analytics is critical for making ad display decisions, ad placement optimization, spend analytics, etc.

Aggregate data is death for some workloads.
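
To make that concrete, here is a minimal Python sketch of what collapsing does to an ad workload. Everything in it is made up for illustration (the event fields, the sample rows, and the `collapse` and `click_rate` helpers are hypothetical, not taken from any real system); it only shows that once raw events are rolled up into per-ad totals, a question like "how does this ad do at this hour?" can no longer be answered.

```python
from collections import defaultdict

# Hypothetical raw impression events: (user_id, ad_id, hour_of_day, clicked)
events = [
    ("u1", "adA", 9, True),
    ("u1", "adB", 9, False),
    ("u2", "adA", 14, False),
    ("u2", "adA", 21, True),
    ("u3", "adB", 21, False),
]


def collapse(rows):
    """Collapse raw events into per-ad totals: ad_id -> (impressions, clicks)."""
    totals = defaultdict(lambda: [0, 0])
    for _user, ad, _hour, clicked in rows:
        totals[ad][0] += 1
        totals[ad][1] += int(clicked)
    return {ad: tuple(counts) for ad, counts in totals.items()}


def click_rate(rows, ad, hour):
    """Click-through rate for one ad in one hour, computed from raw events."""
    hits = [clicked for _u, a, h, clicked in rows if a == ad and h == hour]
    return sum(hits) / len(hits) if hits else None


# After collapsing, only the totals survive; the user and hour are gone.
summary = collapse(events)

# This question needs the raw rows; the collapsed summary cannot answer it.
evening_rate = click_rate(events, "adA", 21)
```

The summary is far smaller than the raw events, which is exactly why a data warehouse collapses aggressively, but any decision that depends on the dropped dimensions has to be made from the uncollapsed data.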

>
> True. But how many sites are there in general IT that can afford the cost of
> developing and maintaining their own apps from scratch as well as implementing
> an entirely new data store technology, incompatible with their existing one?
>
> I lost track of how many years ago I saw the last one, outside of the lunatic
> web fringe.
>
> The vast majority nowadays is running some form of third party app or code that
> does most of what they need and refuse point blank to spend one cent with
> inhouse development of replacements.
>
> It might surprise a lot of the web 2.0 folks, but the biggest cost in IT
> nowadays is inhouse development. Much more so than anything Oracle might
> charge.

What you may not realize is that those stats include the cost of the DBAs, as they get accounted along with the development organization.

It's all about core competency. If you're a property management company, it makes zero sense to build your own email system and search index. It has nothing to do with your business.

If your business is vanilla enough, sure, go buy COTS, maybe do a little tweaking and customization. If you don't need to write an application, don't.

But there are use cases that don't map to vanilla software packages or COTS. For example, when you look at our business, we're an automation company, and hence we need the ability to have workflows - conditional execution, branching, parallelism, etc. Now, there are commercially available workflow engines that we could have used to power our automation software. But a) they don't map properly to the dynamic ways we need to generate workflows, at least not without enough gyrations that it isn't worth it, b) that's a software cost that scales with the product - every time we close a customer, we have to pay a certain amount to the workflow vendor, and c) the workflow itself *is our core competency*. As a comparison, we use things like PostgreSQL, ACE, OpenSSL, etc. in our product because they're simply convenient pieces of software that are not core to our business.

So, as any sane business would, you look at what is or isn't your core competency, and how closely COTS maps to your core competency, and make decisions as best you can.

> Let me cite one small example of how costs can blow out with the web 2.0 stuff.
>

<snip story about custom development vs. COTS vs. SaaS>
>
> Cost of re-training staff and users? Nil!
> Performance and scalability? It now copes faster with 20 times more data than
> the original version did, 10 years ago.
> Cost of integration into existing infra-structure? Nil!
> Try something like this with the new fangled non-traditional data stores and
> their necessarily custom apps and check how much it'll cost. So much for the
> "cost-effective web 2.0 cloud" nonsense.
>

With all due respect, you can hardly hold up one example where a project was (what sounds to be) poorly managed from start to finish and tar an entire option.

I have a counterexample. I have a customer that was running on a ridiculously old, legacy, custom-written, terminal-driven ERP system that everyone loved. For a series of reasons I can't get into, they made the very right decision that it was time to upgrade to something "this decade" as you put it.

They don't manufacture ball bearings - they manufacture unbelievably complex, very specialized pieces of equipment - to the tune of thousands of individual parts per unit, and they produce a few units a month.

They looked at hiring developers to rewrite their app in something more modern, and they looked at buying Oracle E-Business Suite. They were sold on the "off the shelf" nature of E-Biz, and hired a consulting firm to do the customizations for the reporting, etc. for their business.

The result? The project was delayed by a year and a half, users hated it, it screwed up manufacturing orders, and was overall a huge mess.

The mistake they made was that manufacturing management *is* a core competency for them, given their business. Trying to map a traditional solution to their model created something that was half off-the-shelf, half written from scratch, and all a mess.

> > Of course, the article is overblown and hyperbolic, because that makes
> > for a much better story.
>
> Exactly. That seems to be a constant with the web 2.0 brigade. It doesn't help
> their cause one single bit: everyone still remembers the web 1.0 tech wreck,
> where the same was rampant.
>

I don't know - a lot of great stuff came out of the Web 1.0 "tech wreck":

- Linux
- Commodity compute
- Distributed clusters
- Grid Computing
- MySQL/PostgreSQL
- Open Source
- Web-based applications
- Content Delivery Networks
- Datacenter Automation/Configuration Management

These are all things that either became powerhouses in their own right, or fueled the next gen of technology.

To be honest, I hear the same hype from traditional Enterprise IT, and even from Oracle itself. Let's sample the main link on Oracle.com today:

" With the launch of Oracle Fusion Middleware 11g, Oracle is fundamentally transforming the way its customers develop, run and manage their custom, packaged and composite business applications. Unprecedented integration across the industry's most complete middleware stack-including application server, SOA, BPM, BI and content management technologies-will help Oracle customers build agile, adaptable applications in ways that were not possible until now."

There's not a bit of hype there.  

>
> Fact is: "not going anywhere" is tremendously cost-effective and efficient
> if perfectly capable of coping with general purpose requirements.
>
> Storage models that purport to be "better" need to first define exactly
> how general purpose they can be.
>
> Any fool can create a custom designed system, with custom designed code,
> and end up with a fast result. Heck: I know quite a few folks who could
> write a lot of apps in Assembler and make them lightning fast. Still true today.
> Would anyone in the enterprise universe pay them to do so? No way
> Are web 2.0 and these non-traditional data stores easily maintainable?
> No: it is custom code, any changes will involve costly recoding. Calling
> it "refactoring" instead of "recoding" doesn't make it any less costly.
> Change in requirements is a constant in modern IT. Ergo: these technologies are
> inappropriate and costly.

I think it's odd you'd assume that they're not "easily maintainable", especially if we're comparing it to Oracle. First of all, you have access to the code, and if there were a critical issue, you could walk over to the developers who wrote it, smack them on the head, and make them fix it.

In addition, the levels of operational efficiency that have been suggested by folks like Google, etc. are extraordinary. While they develop their own software in-house, they build it to be fault-tolerant and self-healing, and hence numbers are frequently thrown around of tens of thousands of servers per administrator. I tried to find some hard stats around this, but they keep it close to the vest.

Again, not to keep hammering this home, but it's about your core competency. If your organization's core competency is IT in one way or another, then it might make sense to build something rather than buy it.

The beauty of open source today is that these companies are open sourcing what they've created. Now, if Cassandra looks like the right solution for you - you don't need to build it. Just download and install it. You can then decide if you want to develop a competency in supporting it, but that gets rid of the whole overhead of writing it from scratch.

> True IT professionalism and responsibility picks a general purpose data store
> and app technology and makes it perform within the requirements, for a much
> reduced overall cost and with easy and cheap maintenance.
>
> That is what the IT enterprise market is all about. Ferraris are great for show,
> but what is really cost effective for day to day use is a station wagon. The
> rest is hype.

To extend this analogy further, if what you really need is an 18-wheeler with refrigeration and three different levels of chilled compartments, you don't buy three times as many station wagons and put varying levels of ice in them. You build an 18-wheeler with what you need.

I'll give you an example - Back in the Day (tm) at Register.com, we had a fraction of the budget of a lot of other web startups, and hence we wrote our own monitoring software, and bought Linux boxes, and invested in smart load balancers, etc. I remember when I was building out our colocation facilities, the other startups around us were all using nice big Solaris boxes. When we were rackmounting the VA Linux boxes we were buying for $2k/ea, people would literally come ask me why I had so many tiny boxes, and thought it freakish that anyone would run Linux for the website. After all, "Solaris is a REAL Operating System".

And for sure, we hit bugs in Linux that we would not have hit with Solaris, and we had to accept a higher level of downtime at an individual server level. But we built that into our platform and our load balancers and our monitoring infrastructure. And in the end, we were able to build, manage, scale, monitor, and operate that farm for less than just the CapEx would have been to buy the equivalent capacity in Sun gear and an off the shelf monitoring solution.

These days, almost everyone uses Linux somewhere in their infrastructure. Many people still use Solaris. They each serve a purpose. But this was something that was "new" and "hyped" and turned out to actually be pretty darn good.

> > So why can't we have both?
>
> Of course we can have - and need! - both. Ferraris do exist and serve a
> purpose. What we can hardly afford is yet another round of demented "new black"
> where the whole of IT is told to ditch tried and proven cost effective
> technology for something that can only fit, at the very and costly best, a
> niche.
>
> Which is what those articles are clearly promoting and why they need to be
> exposed for the fraud they are.
>

It's not "fraud", it's just "hype", something that is rampant in technology, and the world in general. It would be nice if reporters were a little more skeptical.

Thanks,
Matt

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jul 06 2009 - 10:51:05 CDT
