RE: No to SQL? Anti-database movement gains steam

From: Matthew Zito <>
Date: Fri, 10 Jul 2009 15:35:39 -0400
Message-ID: <>

See inline. Longest thread ever.

> -----Original Message-----
> From: Nuno Souto []
> Sent: Friday, July 10, 2009 5:06 AM
> To: Matthew Zito
> Cc:
> Subject: Re: No to SQL? Anti-database movement gains steam
> Matthew Zito wrote, on my timestamp of 7/07/2009 1:51 AM:
> > My point was simply that calling them incompetent is a dangerous
> > It's the old, "Not Invented Here" syndrome - i.e., the way we do it
> > has worked for us, so someone who does something different must
> > be incompetent.
> Or much more simply: so completely outside of general purpose IT as to be
> totally and completely irrelevant other than as an odd curiosity.
> like a Ferrari is.

This is also the perspective that people variously had about Linux, the web, network-attached computers, Ethernet, Oracle RAC, grid computing, etc., at the time of their emergence.

Today they are all part of general purpose IT. Is it so hard to believe that non-relational databases might eventually be as well?

> > traditional enterprise IT. I agree largely with that statement, and
> > assume that you're no longer calling these developers with different
> > needs and requirements "incompetent" and "inexperienced".
> A lot of them I am. For a number of very good reasons. Most have never
> even attempted to write correct SQL or design a simple database. All they
> do is cobble together some pasted code from other applications, spread it
> across as many systems as they can to make it perform minimally acceptably,
> and then claim it's the only way to achieve top performance. Total bollocks.

My word, there are a lot of generalizations in there - "most", "a lot of", "all they do is". To say that folks at Google, Facebook, Yahoo, etc. have just "cobbled together some pasted code" is patronizing, as well as just plain wrong. I have no doubt that some organizations could have used a relational database instead, or made a bad decision by using one of these alternative solutions. I also find many companies that make bad decisions by buying Oracle when they don't need to. People make mistakes.

However, there are people at these organizations who are top computer scientists and developers. You only have to look at the papers they've released to see some of the impressive technology they've built. We can disagree all we want about whether they made the *right* decision by building or using a non-relational data store, but how can you say "most have never ever attempted to write correct SQL" - how can you possibly know that?

Also, many of the people who actually develop these solutions will never claim that it's the "only" way to achieve top performance, but that it's the most cost-effective way to get the performance they need based on their requirements.

By the way, here's a list of papers by Googlers:

There are some pretty impressive things in there, including work on how to use speculative threads to improve relational database performance. As I said before, all of these companies use relational databases as well - so presumably someone there knows how to write SQL.

> > However, I believe that it's important to consider ways in which new
> > technologies can be leveraged to add efficiency, performance, etc. For
> > example, if you look at some of my banking customers, while they have a
> > ton of traditional J2EE+Oracle+EMC storage infrastructure - a lot of
> > them also use proprietary C++ + Memcache + custom built non-SQL data
> > stores for things like algorithmic trading. For them, their needs are
> > specific enough, and the upside large enough, that it's worth looking at
> > other options. So clearly even traditional Enterprise IT has areas where
> > standard relational data stores aren't an appropriate decision.
> Sorry, I don't follow this one. They have needs that are specific enough
> that it's worth looking outside the square and that is proof enterprise IT -
> which was never specific - needs to do the same? My apologies, but
> non-sequitur.
> Still, not a major point so don't fret on it.

My point is that they ARE enterprise IT. Before, you held up the "web 2.0" companies as being different from "Enterprise IT" - well, here are some app dev guys at a bank, which is about as traditional a picture of enterprise IT as you can get, who have decided it's worth considering alternative/emerging technologies.

My overarching point is: why *shouldn't* traditional companies be at least considering this tech? If you don't need it, great. If you do, think about it.

> > What you may not realize is that those stats include the cost of the
> > DBAs, as they get accounted along with the development organization.
> The "cost of DBAs" is grossly exaggerated and has been for years now.
> We run an entire 75B$ organization with 3 DBAs, with SQL Server and
> data stores. NO ONE can convince me that our cost is a significant factor
> in our overall IT costing. To do so is to basically lie.
> In fact, knowing exactly how much we spend in IT and what our DBAs cost, I
> can confidently guarantee to you that the so-called "excessive DBA cost" is
> complete balderdash.

Well, it depends on your business. My team and I have the somewhat unique opportunity to spend time in probably close to 100-200 DBA organizations a year - obviously, some of those in more detail than others. Over the past years, I have seen:

  • A $4b logistics company with no DBAs at all
  • A tens-of-$b logistics company with ~125 DBAs
  • A ~$5-10m startup with 50 DBAs
  • A large financial services company with 500-600 DBAs
  • A massive pharmaceutical company with 8 DBAs

It totally varies, sometimes by vertical, sometimes by perception of the value of the DBAs, sometimes by their overall dependence on IT, sometimes even just by their definition of a DBA - I ran into a company once where their application development teams were all the DBAs, they didn't have a separate job function. The devil is in the details.

However, people are expensive. Average DBA salary, fully loaded, is approximately $150-180k in the US, with a 20% variability based on location. Bear in mind that we're including the cost of health care and other benefits, payroll taxes, insurance, and even facilities cost in that "fully loaded" number.

That's a lot of money per person. On top of that, I consistently hear that quality DBAs are among the hardest IT positions to hire - there was a stat somewhere from a recruiting agency, possibly Robert Half, I don't remember, saying that the average time to hire a DBA was approximately double that for any other operational IT position. Average 6 months vs. 3 months, perhaps?

So, as an organization, I can see wanting to minimize the DBA and database overhead as much as possible. This, of course, does not automatically mean that low-maintenance, custom-written applications are the way to go, and let's throw Oracle out the window. But I think it's important to consider balancing different considerations.  

> > It's all about core competency. If you're a property management
> > company, it makes zero sense to build your own email system and
> > index. It has nothing to do with your business.
> Bingo. And that goes for the vast majority of IT users out there.

I also think there's a point being missed here that you don't always have to build these solutions from scratch. Memcached was built by LiveJournal/Danga to improve their performance, and they open-sourced it. Now it's used all over the place - and I agree, probably sometimes in places where it doesn't need to be/shouldn't be.

But, if I had an application that was negatively impacting my database because of excessive transient data or a high quantity of I/Os, I don't have to write a caching layer from scratch. I can just use memcached.
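
To make that concrete, here is a rough sketch of the read path such a caching layer usually follows (check the cache first, fall back to the database on a miss, then populate the cache). It's an illustration only, under some loud assumptions: `InMemoryCache` is a hypothetical in-process stand-in for a memcached client so the example runs without a server, and `load_profile_from_db`, `get_profile`, and the `profile:` key prefix are made-up names, not anything from a real application.

```python
import time


class InMemoryCache:
    """Hypothetical stand-in for a memcached client (assumption: a real
    client would talk to the cache cluster over the network instead)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        # Treat expired entries as misses and evict them.
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else time.monotonic() + ttl_seconds
        self._store[key] = (value, expires_at)


# Counts how often the (pretend) database is actually hit.
DB_READS = {"count": 0}


def load_profile_from_db(user_id):
    """Hypothetical expensive query; here it just fabricates a row."""
    DB_READS["count"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}


def get_profile(cache, user_id, ttl_seconds=60):
    """Serve from cache when possible; otherwise read the DB and cache it."""
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    profile = load_profile_from_db(user_id)
    cache.set(key, profile, ttl_seconds)
    return profile


if __name__ == "__main__":
    cache = InMemoryCache()
    get_profile(cache, 42)
    get_profile(cache, 42)  # second call is served from the cache
    print("database reads:", DB_READS["count"])
```

The one-time app change is essentially the `get_profile` wrapper: repeated reads of the same key stop reaching the database until the entry expires.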

The other implication of this, of course, is that I don't need my DBAs to invest effort in improving the performance - I can simply buy two Linux boxes w/ 32GB of RAM each and cluster them with memcached. There's a one-time development cost to convert my app to talk to memcached, but that might cost considerably less than:

- tuning my database
- buying bigger/faster storage
- buying a bigger server and more database licenses
- all of the above

Or it might not be. I think that's the more concerning implication - if you can get by with fewer smart, hard-to-hire, expensive DBAs by just throwing some extra hardware and a little extra app dev time at the problem, then that could have a real impact on the nature of being a DBA in the future.

> > The mistake they made was that manufacturing management *is* a core
> > competency for them, given their business. Trying to map a
> > solution to their model created something that was half
> > half written from scratch, and all a mess.
> That, I am sorry, reinforces my point: they should have looked at the
> solution they had and which was satisfactory, and find ways of running it
> faster/more efficiently. Instead of going for "modern" solutions with
> absolutely no fit whatsoever to their business.

I can't go into specifics, but the company that made the server and the OS it ran on had not been in business for ~15 years. It had to have a firewall in front of it, because if an errant ping hit its custom-built Ethernet interface, the whole server would crash.

They had two options - rewrite from scratch, or buy off the shelf.

> > I don't know - a lot of great stuff came out of the Web 1.0 "tech
> > wreck":
> > - Linux
> > - Commodity compute
> > - Distributed clusters
> > - Grid Computing
> > - MySQL/PostgreSQL
> > - Open Source
> > - Web-based applications
> > - Content Delivery Networks
> > - Datacenter Automation/Configuration Management
> >
> > These are all things that either became powerhouses in their own right
> > or fueled the next gen of technology.
> Sure. But don't forget that not a single one of those is applicable to a
> vertical market. Which is what those non-SQL solutions are.

Well, you're making my point. At the time, people *thought* several of these technologies were only applicable to a vertical market - i.e., Linux is only for "cheap web companies", or Grid Computing is only for "universities and banks", or DCA is only good for "really big companies". Now, they're mainstream, and they're used when appropriate.

> > Again, not to keep hammering this home, it's about your core competency.
> > If your organization's core competency is IT in one way or another,
> > it might make sense to build something rather than buy it.
> Bingo. Now: exactly how many companies do IT as core competency, compared
> to the market of IT users? Do I need to continue?

Well, it varies widely. When I was at, we considered IT a core competency, because all we really sold were entries in a database (Oracle, Perl, C shop, btw). Most of my large customers consider IT a core competency, at least in their particular field, because they see it as giving them a competitive advantage - if they can do something better/faster/more efficiently, they win.  

> > These days, almost everyone uses Linux somewhere in their
> > infrastructure. Many people still use Solaris. They each serve a
> > purpose. But this was something that was "new" and "hyped" and turned
> > out to actually be pretty darn good.
> Because it is GENERAL PURPOSE. NOT a vertical market or very
> THAT, is WHY they were successful.

As I said above, people did not consider Linux general purpose at the time. "Linux doesn't scale", "Linux is only for small shops", "Linux isn't enterprise-grade", "Linux is only for webservers", etc. etc. Some of that was true, but it grew into it. I would argue that Linux still isn't fully general purpose, but it's close enough these days.

> >
> > It's not "fraud", it's just "hype", something that is rampant in
> > technology, and the world in general. It would be nice if reporters
> > were a little more skeptical.
> I call it fraud. But it's OK to disagree there.
> ;)

Tomato, tomahto, potato, potahto, let's call the whole thing off.

Next time I'm in Sydney, I'll buy you a beer and we can argue about this in person. :)


Received on Fri Jul 10 2009 - 14:35:39 CDT
