Re: No to SQL? Anti-database movement gains steam

From: Nuno Souto <>
Date: Fri, 10 Jul 2009 19:05:47 +1000

Matthew Zito wrote, on my timestamp of 7/07/2009 1:51 AM:

> My point was simply that calling them incompetent is a dangerous path.
> It's the old, "Not Invented Here" syndrome - i.e., the way we do things
> has worked for us, so someone who does something different must clearly
> be incompetent.

Or much more simply: so completely outside of general purpose IT as to be totally and completely irrelevant other than as an odd curiosity. Just like a Ferrari is.

> traditional enterprise IT. I agree largely with that statement, and I
> assume that you're no longer calling these developers with different
> needs and requirements "incompetent" and "inexperienced".

A lot of them I am. For a number of very good reasons. Most have NEVER EVER even attempted to write correct SQL or design a simple database. All they do is cobble together some pasted code from other applications, spread it over as many systems as they can to make it perform minimally acceptably, and then claim it's the only way to achieve top performance. Total bollocks.

> However, I believe that it's important to consider ways in which new
> technologies can be leveraged to add efficiency, performance, etc. For
> example, if you look at some of my banking customers, while they have a
> ton of traditional J2EE+Oracle+EMC storage infrastructure - a lot of
> them also use proprietary C++ + Memcache + custom built non-SQL data
> stores for things like algorithmic trading. For them, their needs are
> specific enough, and the upside large enough, that it's worth looking at
> other options. So clearly even traditional Enterprise IT has areas where
> standard relational data stores aren't an appropriate decision.

Sorry, I don't follow this one. They have needs that are specific enough that it's worth looking outside the square and that is proof enterprise IT - which was never specific - needs to do the same? My apologies, but non-sequitur. Still, not a major point so don't fret on it.

> - large degrees of data independence
> - very high concurrent query levels
> - high levels of throughput
> - very strong sensitivity to latency
> - a need to scale linearly
> Simply don't work well with traditional relational databases, and hence
> you have these non-traditional data stores as alternative options for
> these types of workloads.

Good. And that is precisely where they should stay: in the realm of the very specific and vertical markets they come from.

Or are we supposed to believe that Joe Average in the shopping centre corner store - or indeed just about any commercial venture outside of the web-specific market (believe it or not, they are the majority of IT users) - also needs 15PB data stores with sub-microsecond query times over 100K clients?

I think not...

> It was started as a way to do full-text search for user inboxes, and is
> being extended to support more and more operational data at Facebook.
> Some notes from their configuration:
> - Approximately 600+ cores as of late '08
> - Approximately 120TB of disk space
> - 25TB of indexes
> - 4B worker threads
> - Average ~12ms response time for a search
> - Software level features like automatic partitioning, distributed local
> and remote replication, insert/append without read, automated data file
> collapse and aggregation,
> Now certainly, you can build a >100TB Oracle instance, but the cost and
> the complexity would be challenging. In addition, presumably they only
> see this data store growing, and how do you deal with a 200, 300, 400TB
> Oracle instance? Google, for example, in 2006 had approximately 1.2PB
> of data in their structured data store. Heaven knows what it is now.

Exactly. Like I said: specific, vertical markets that have no influence whatsoever on how general purpose IT is carried out.

And I doubt half those numbers are valid. It is one thing to add up total disk capacity, quite another to call it the active data store for queries. The two couldn't be more different.

But let me emphasize one particular area of your points above, which is very close to me:

> - Average ~12ms response time for a search
> - Software level features like automatic partitioning, distributed local
> and remote replication, insert/append without read, automated data file
> collapse and aggregation,


I am a strong believer that as a search-enabling technology for very large data stores, indexes are way under-powered. Extensive, automatic partitioning is the way of the future. Oracle 11g has made incredible strides in that direction and it is my belief it will continue to do so. No need to change the relational model: just improve it.

A few years ago I wrote on my blog about how I thought we could address this very large data store problem. It was the "No Moore" series of posts. I won't repeat it here; it is still there for anyone to check.

But in a nutshell: Moore's law is history. We cannot continue to use "brute force" to approach searches and processing of very large data stores. You know: the "personal Petabyte" and other such.

I have proven to my own satisfaction with our own DW that extensive partitioning is indeed one way of addressing this problem of fast searches of very large data sets without the need for huge indexing and its associated maintenance nightmare.
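
To make the partitioning point concrete, here is a toy sketch in Python. It is not how Oracle implements partitioning, and the class and method names are invented for this illustration. It assumes rows are range-partitioned on a date key; a search then only touches the partitions whose range can overlap the predicate, with no per-row index to build or maintain.

```python
from bisect import bisect_right
from datetime import date


class RangePartitionedTable:
    """Toy range-partitioned table keyed by date (hypothetical, illustration only)."""

    def __init__(self, boundaries):
        # Partition i holds keys with boundaries[i-1] <= key < boundaries[i];
        # the first and last partitions are open-ended.
        self.boundaries = sorted(boundaries)
        self.partitions = [[] for _ in range(len(self.boundaries) + 1)]

    def _partition_for(self, key):
        # Binary search over the few partition boundaries, not over the rows.
        return bisect_right(self.boundaries, key)

    def insert(self, key, row):
        self.partitions[self._partition_for(key)].append((key, row))

    def search(self, lo, hi):
        """Return (rows, partitions_scanned) for lo <= key <= hi."""
        first = self._partition_for(lo)
        last = self._partition_for(hi)
        hits = []
        # Partitions outside first..last are pruned: never read at all.
        for part in self.partitions[first:last + 1]:
            hits.extend(row for key, row in part if lo <= key <= hi)
        return hits, last - first + 1
```

With quarterly boundaries, a one-month search scans a single partition out of four. The pruning decision costs a lookup over a handful of boundaries, however many rows each partition holds.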

Only in that sense do I find it interesting to follow some of the new developments.

The rest of Facebook and its specifics, quite frankly, is irrelevant to general purpose IT.

> To use the gmail/facebook/my ad startup example, collapsing data means
> you lose data. In the case of the advertising startup, they
> realistically can only collapse user persistence data they haven't seen
> for a very long time. Real-time analytics is critical for making ad
> display decisions, ad placement optimization, spend analytics, etc.
> Aggregate data is death for some workloads.

The same problem applies to search engine marketing, for example. And yet, I've seen them address that problem with extensive pre-"crunching" of data and then collapsing the results to a RDBMS. In fact, I worked at the company that handled Google traffic for two years, doing just that. Guess what we used?

Oracle, Perl, C, Linux on pizza boxes.

And it coped easily with Google traffic volumes back then. Still does.

So, it can be done. It however requires folks who know what data management and storage is all about, not just "trendy" buzzwording.
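
For readers who have not seen the pattern, a minimal sketch of "pre-crunch, then collapse into an RDBMS" follows. It is not the system described above: sqlite3 stands in for the database, and the table name, column names and event shape are assumptions made up for the example. Raw events are reduced outside the database, and only the much smaller summary is loaded, once per batch.

```python
import sqlite3
from collections import Counter


def crunch(events):
    """Collapse raw (day, keyword) click events into per-day, per-keyword counts."""
    return Counter((day, keyword) for day, keyword in events)


def load(conn, totals):
    """Load one batch of pre-crunched totals into a summary table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clicks_daily ("
        "day TEXT, keyword TEXT, clicks INTEGER, "
        "PRIMARY KEY (day, keyword))"
    )
    conn.executemany(
        "INSERT INTO clicks_daily (day, keyword, clicks) VALUES (?, ?, ?)",
        [(day, keyword, n) for (day, keyword), n in totals.items()],
    )
    conn.commit()
```

The database only ever sees one row per (day, keyword) pair, so its size tracks the number of distinct keys rather than the raw traffic volume.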

> What you may not realize is that those stats include the cost of the
> DBAs, as they get accounted along with the development organization.

The "cost of DBAs" is grossly exaggerated and has been for years now. We run an entire $75B organization with 3 DBAs, with SQL Server and Oracle data stores. NO ONE can convince me that our cost is a significant factor in our overall IT costing. To claim so is basically to lie. In fact, knowing exactly how much we spend in IT and what our DBAs cost, I can confidently guarantee to you that the so-called "excessive DBA cost" is complete balderdash.

> It's all about core competency. If you're a property management
> company, it makes zero sense to build your own email system and search
> index. It has nothing to do with your business.

Bingo. And that goes for the vast majority of IT users out there.

> With all due respect, you can hardly hold up one example where a project
> was (what sounds to be) poorly managed from start to finish and tar an
> entire option.

That project, and others like it, I have seen repeated ad nauseam in IT in the last few years. Would you like me to provide heaps of examples? I can...

> The mistake they made was that manufacturing management *is* a core
> competency for them, given their business. Trying to map a traditional
> solution to their model created something that was half off-the-shelf,
> half written from scratch, and all a mess.

That, I am sorry, reinforces my point: they should have looked at the solution they had, which was satisfactory, and found ways of running it faster and more efficiently, instead of going for "modern" solutions with absolutely no fit whatsoever to their business.

> I don't know - a lot of great stuff came out of the Web 1.0 "tech
> wreck":
> - Linux
> - Commodity compute
> - Distributed clusters
> - Grid Computing
> - MySQL/PostgreSQL
> - Open Source
> - Web-based applications
> - Content Delivery Networks
> - Datacenter Automation/Configuration Management
> These are all things that either became powerhouses in their own right,
> or fueled the next gen of technology.

Sure. But don't forget that not a single one of those is applicable only to a vertical market. Which is what those non-SQL solutions are.

> To be honest, I hear the same hype from traditional Enterprise IT, and
> even from Oracle itself. Let's sample the main link on
> today:

Absolutely! Oracle is not above hype by any means!

But I have yet to see proof that ANY modern J2EE or otherwise custom-designed system using non-general purpose IT technologies can be maintained easily. It is so complex to do that they even invented a new term to mask the need for re-writing: "refactoring". Which is itself the biggest hype I've seen.

> Again, not to keep hammering this home, it's about your core competency.
> If your organization's core competency is IT in one way or another, then
> it might make sense to build something rather than buy it.

Bingo. Now: exactly how many companies do IT as core competency, compared to the market of IT users? Do I need to continue?

> These days, almost everyone uses Linux somewhere in their
> infrastructure. Many people still use Solaris. They each serve a
> purpose. But this was something that was "new" and "hyped" and turned
> out to actually be pretty darn good.

Because it is GENERAL PURPOSE. NOT a vertical market or very specific. THAT is WHY they were successful.

> It's not "fraud", it's just "hype", something that is rampant in
> technology, and the world in general. It would be nice if reporters
> were a little more skeptical.

I call it fraud. But it's OK to disagree there. ;)

Nuno Souto
Received on Fri Jul 10 2009 - 04:05:47 CDT