Re: No to SQL? Anti-database movement gains steam

From: Nuno Souto <dbvision_at_iinet.net.au>
Date: Mon, 13 Jul 2009 21:42:08 +1000
Message-ID: <4A5B1D90.3070307_at_iinet.net.au>



Matthew Zito wrote, on my timestamp of 11/07/2009 5:35 AM:

> My word, there's a lot of generalizations in there - "most", "a lot of",
> "all they do is". To say that folks at Google, Facebook, Yahoo, etc.
> have just "cobbled together some pasted code" is patronizing, as well as
> just plain wrong.

Not at all. Few folks in those companies are real thinkers. Most are just code churners. And certainly the most vocal seem to be, judging by the level of argumentation. I've heard most of their arguments and quite frankly, they suck - to use one of their expressions.

> However, there are people at these organizations who are top computer
> scientists and developers. You only have to look at the papers they've
> released to see some of the impressive technology they've built.

Oh please: you are not gonna quote that supreme idiot, what's-his-name-Stonebraker, are you? That guy has had more papers "redefining" the entire edifice of database technology than one can poke a stick at. ALL of them, WITHOUT exception, have resulted in failed commercial ventures. Enough is enough! The guy is a FRAUD! He shouldn't even be allowed to talk anymore, after all the failed "theories" he's blurted out over the last 25 years!

> disagree all we want about whether they made the *right* decision by
> building or using a non-relational data store, but how can you say "most
> have never ever attempted to write correct SQL" - how can you possibly
> know that?

Oh, I don't know. Maybe because I dealt first hand with a lot of them? And saw first hand their idiotic and uninformed attitude to anything that is "not invented here"?

> Also, many of the people who actually develop these solutions will never
> claim that it's the "only" way to achieve top performance, but that it's
> the most cost-effective way to get the performance they need based on
> their requirements.

How can they say that, if they never tried another way?

> By the way, here's a list of papers by Googlers:
> http://research.google.com/pubs/papers.html

Yes, I know: I'm familiar with many of them.

> There's some pretty impressive things in there, including work on how to
> use speculative threads to improve relational database performance. As
> I said before, all of these companies use relational databases as well -
> so presumably someone there knows how to write SQL.

There is also a lot of "pie in the sky" and just plain wrong and unworkable technologies in there. At least anywhere where cost-effectiveness is a requirement. IOW: the vast majority of IT.

> My overarching point is that why *shouldn't* traditional companies be at
> least considering this tech? If you don't need it, great. If you do,
> think about it.

Absolutely. Which is completely different from claiming to the four winds that "relational dbs are dead" because someone, somewhere, once, did something different.

> sometimes even just by their definition of a DBA - I ran into a company
> once where their application development teams were all the DBAs, they
> didn't have a separate job function. The devil is in the details.

For sure. I once worked in a project where there were 27 DBAs and 2 developers. The company? IBM. One would think they, of all, would know better. All you need is one deranged project and the managers to match...

> However, people are expensive. Average DBA salary, fully loaded, is
> approximately $150-180k in the US, with a 20% variability based on
> location. Bear in mind that we're including the cost of health care and
> other benefits, payroll taxes, insurance, and even facilities cost in
> that "fully loaded" number.

Matthew: *ALL* good IT people are expensive. Developers capable of the feats you describe at Google and so on do *NOT* work for $50/day and live in Pune, India. OK?
So before you blame DBAs for being expensive, at least admit that *ANY* high-end IT person *IS* expensive, *ALWAYS* was, *ALWAYS* will be. It's not just DBAs!

> But, if I had an application that was negatively impacting my database
> because of excessive transitive data or high quantity of I/Os, I don't
> have to write a caching layer from scratch. I can just use memcached.

Or you can re-write it to do fewer I/Os? By spending some time on the design of the db with a qualified person, by coding to reduce the need for such I/O.

Instead of the usual "I just want to set a flag and have my data persist", which is at the root of *ALL* excessive I/O problems I've seen so far, without exception. And will *NOT* go away by using a flat file or some other deranged technology.
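To make the "code to reduce the need for such I/O" point concrete, here is a minimal sketch, not anything from the thread itself. It assumes a hypothetical `orders` table and uses Python's built-in `sqlite3` module against an in-memory database purely for illustration; the function names are made up. It contrasts the "set a flag and persist" habit (one statement and one commit per row) with a single set-based statement that reaches the same end state. On a disk-backed database each commit would typically force its own write, which is where the extra I/O comes from.

```python
import sqlite3


def make_db(n_rows=1000):
    """Build a throwaway in-memory table of unshipped orders (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, shipped INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO orders (id, customer) VALUES (?, ?)",
        [(i, "acme" if i % 2 == 0 else "globex") for i in range(n_rows)],
    )
    conn.commit()
    return conn


def flag_row_by_row(conn, customer):
    """Set the flag one row at a time, committing after every row."""
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM orders WHERE customer = ?", (customer,)
        )
    ]
    commits = 0
    for order_id in ids:
        conn.execute("UPDATE orders SET shipped = 1 WHERE id = ?", (order_id,))
        conn.commit()
        commits += 1
    return commits  # one commit per row


def flag_set_based(conn, customer):
    """Set the flag for all matching rows with one statement and one commit."""
    conn.execute("UPDATE orders SET shipped = 1 WHERE customer = ?", (customer,))
    conn.commit()
    return 1  # a single commit, however many rows matched


def shipped_count(conn):
    """Count rows whose flag is set, to compare the two approaches."""
    return conn.execute(
        "SELECT COUNT(*) FROM orders WHERE shipped = 1"
    ).fetchone()[0]
```

Both functions leave the table in the same state; the difference is only in how many round trips and commits it took to get there.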

Memcached is nothing new; it's not a new technology. Cripes, OSes, dbs, servers *ALL* have caching technology: this is *NOTHING* new or even original!

Somewhere, somehow, down the line, one has to provide out-of-memory storage while maintaining the ability to reload that information in a consistent state.

And if you don't do it with a database, you are forever condemned to need to code freshly *ANY* new view of that data.

This is precisely *WHY* databases were invented: to facilitate the retrieval of such information in a consistent manner by *MULTIPLE* applications.
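As an illustration only, here is a small sketch of that argument using Python's built-in `sqlite3` module. The `invoice` and `line_item` tables, their contents, and the two "applications" are all hypothetical. The point it tries to show: once the data sits in a relational store, a second application's view of it is one more declarative query against the same tables, not a fresh piece of access code.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory invoice database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE line_item (
            invoice_id INTEGER REFERENCES invoice(id),
            product TEXT,
            amount INTEGER
        );
        INSERT INTO invoice VALUES (1, 'acme'), (2, 'globex');
        INSERT INTO line_item VALUES
            (1, 'widget', 100), (1, 'gadget', 50), (2, 'widget', 70);
        """
    )
    return conn


# View needed by a (hypothetical) billing application.
TOTAL_PER_CUSTOMER = """
    SELECT i.customer, SUM(l.amount)
    FROM invoice i JOIN line_item l ON l.invoice_id = i.id
    GROUP BY i.customer
    ORDER BY i.customer
"""

# A different view, needed later by a (hypothetical) sales report.
# No change to the stored data or to the billing code is required.
TOTAL_PER_PRODUCT = """
    SELECT product, SUM(amount)
    FROM line_item
    GROUP BY product
    ORDER BY product
"""


def run(conn, sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    db = build_db()
    print(run(db, TOTAL_PER_CUSTOMER))
    print(run(db, TOTAL_PER_PRODUCT))
```

With an application-private cache or flat file, the second report would instead mean writing and maintaining new traversal code over whatever layout the first application chose.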

Only deranged application teams who think the world gyrates around one single app are capable of coming up with such beauties as "cache everything".

>
> The other implication of this, of course, is that I don't need my DBAs
> to invest effort in improving the performance - I can simply buy two
> linux boxes w/ 32GB of RAM each and cluster them with memcached.
> There's a one-time development cost to convert my app to talk to
> memcached, but that might be considerably less cost than:
> - tuning my database

Fair enough.

> - buying bigger/faster storage
> - buying a bigger server and more database licenses
> - all of the above

What's any of these got to do with the cost of DBAs? Can we stay on subject, please?

And a few other costs that you conveniently omitted. Like:

  • having to recode the application (I know they call it re-factor, but that's just weasel-speak) for *ANY* changes to the structure or contents of the data. Call it whatever they want, recoding costs heaps more than DBAs.
  • having to *manually* partition the application across those Linux boxes. NO, memcached won't give you 12ms execution time for 15PB across 100K users if your invoice is in one node and the line items are scattered across others: it is a physical impossibility to provide such performance with such spread, let's not even go there. Coding this costs money and very few "cheap" duhvelopers can do it.
  • being forever condemned to hiring a small army of duhvelopers to do even the most basic of additional extra-application queries to existing data. Because there is no such thing as a general purpose query language to access non-relational dbs, EVERYTHING must be done from scratch. Cheap? Not in your lifetime!
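The partitioning cost in the second bullet can be sketched as follows. This is a toy model under stated assumptions, not how memcached or any particular client library routes keys: `node_for` is a hypothetical hash-modulo router, and the key naming scheme is invented. It shows that when line items are keyed by their own ids, reading one invoice may touch many nodes, while routing everything by the invoice key keeps the read on a single node. Choosing and maintaining that co-location is exactly the manual work the application team takes on.

```python
import hashlib


def node_for(key: str, n_nodes: int) -> int:
    """Map a key to a node index with a stable hash (hypothetical routing rule)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_nodes


def invoice_fetch_nodes(invoice_id, item_ids, n_nodes, colocate):
    """Return the set of nodes that must be contacted to read one full invoice.

    colocate=False: each line item is routed by its own key, so items scatter.
    colocate=True:  the invoice and all its items are routed by the invoice key.
    """
    invoice_key = f"invoice:{invoice_id}"
    if colocate:
        routing_keys = [invoice_key]
    else:
        routing_keys = [invoice_key] + [f"item:{i}" for i in item_ids]
    return {node_for(k, n_nodes) for k in routing_keys}


if __name__ == "__main__":
    scattered = invoice_fetch_nodes(42, range(20), 8, colocate=False)
    together = invoice_fetch_nodes(42, range(20), 8, colocate=True)
    print("nodes touched, items keyed individually:", len(scattered))
    print("nodes touched, items co-located with invoice:", len(together))
```

Every extra node in the scattered case is another network round trip on the read path, and the co-located layout only helps queries that start from the invoice; any other access pattern needs its own layout or its own code.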

> Or it might not be. I think that's the more concerning implication - if
> you don't need as many smart, hard-to-hire, expensive DBAs by just
> throwing some extra hardware at the problem and a little extra app dev
> time, then that could have a real impact on the nature of being a DBA in
> the future.

And it will have a HUGE impact on being able to extract from that data anything other than what is locked away in the code of a single application. And that is the death of that whole theory and it, once again, has NOTHING to do with DBAs!

This whole subject was discussed ad nauseam when databases started to get used. OF COURSE they slow you down. OF COURSE they make FURTHER access and manipulation of data incomparably cheaper than having to write/re-write/re-factor/re-whatever for every single instance of additional processing!

Please! This is data management 101, I'm surprised that in this day and age I'm having to discuss it when it was done and settled 35 years ago!

> Well, you're making my point. At the time, people *thought* various of
> these technologies were only applicable to a vertical market - i.e.,
> Linux is only for "cheap web companies", or Grid Computing is only for
> "universities and banks", or DCA is only good for "really big
> companies". Now, they're mainstream, and they're used when appropriate.

Er.....
No, they're not mainstream. But I agree: they are used whenever appropriate.

> Next time I'm in Sydney, I'll buy you a beer and we can argue about this
> in person. :)

I'm looking forward to it, if you allow me to reciprocate! ;)

-- 
Cheers
Nuno Souto
dbvision_at_iinet.net.au
--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jul 13 2009 - 06:42:08 CDT
