Re: Databases as objects

From: <kvnkrkptrck_at_gmail.com>
Date: 28 Dec 2006 11:01:24 -0800
Message-ID: <1167332484.798373.32330_at_n51g2000cwc.googlegroups.com>


Thomas Gagne wrote:
> Gene Wirchenko wrote:
> > Thomas Gagne <tgagne_at_wide-open-west.com> wrote:
> >
> > [snip]
> >
> >
> >> At some point, all that expression and composition has an ingress and
> >> egress, and what lies between them is a reusable, repeatable code. No
> >> one really wants to write all that SQL by hand every time so it's put
> >> inside a macro. To increase the macro's value parameters are applied to
> >> make it a function. Now we're cooking with fire.
> >>
> >
> > I think you are going in the wrong direction.
> >
> > All what SQL? The SQL in my app is a very small part of it. The
> > presentation code is much, much bigger. It is the presentation code
> > that I would like to cut down.
> >
> In my commercial finance system, 37% of the source code is SQL. 43% is
> presentation (PHP). When you say your SQL is a small amount, what does
> > that mean? Are you able to count it?
>

Thomas,

Perhaps the following anecdote will shed some light on the vantage point from which many of your responses are coming. Two years ago I joined a team which was very liberal with SQL-wrapping - using small procedures to "name" simple queries, updates, inserts, etc.; "enhancing" the database with a procedural interface. The team developed and maintained larger processes that "reused" these simple procedures to the greatest extent possible, using embedded one-time SQL queries as a last resort (and in many cases, still wrapping these niche SQL queries into reusable procedures).
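To make the style concrete, a typical wrapper looked something like the following (a made-up, simplified example in Oracle-style PL/SQL, not actual code from that system; the table and column names are invented):

    CREATE OR REPLACE PROCEDURE get_project_status (
      p_project_name   IN  VARCHAR2,
      p_status         OUT VARCHAR2,
      p_error_flag     OUT NUMBER,
      p_error_message  OUT VARCHAR2
    ) AS
    BEGIN
      -- "Names" a one-row lookup so callers never write the SELECT themselves
      SELECT status
        INTO p_status
        FROM projects
       WHERE project_name = p_project_name;
      p_error_flag := 0;
    EXCEPTION
      WHEN NO_DATA_FOUND THEN
        p_error_flag    := 1;
        p_error_message := 'Project not found: ' || p_project_name;
    END;

Multiply that pattern across every query, insert, and update, and you have the "procedural interface" the team maintained.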

One of these large processes, a weekly "PRO" process, was having performance troubles. It ran from Friday night into Sunday, with a few runtimes approaching 48 hours. There would have been real problems if it failed to complete by 6:00 a.m. Monday, so it needed to be looked into. The process itself was a divide-and-conquer program, splitting the workload into ranges and spawning 140 child processes, one per range (140 was based on dozens of trial-and-error optimization runs done over the years - yes, we have a workhorse of a production system). The PRO process had 6 input parameters which could be used to limit the workload by time frame, financial status, etc. Each had been added to speed things up, at the relatively minor risk of mishandling "non-critical" or "old" data. The weekly settings limited the workload to approximately 85% of the data from the prior year of activity, in a database that held 7 years of activity - literally, trade-offs had been made for performance over correctness.

The program had a changelog amounting to 23 modifications over the prior 3 years - a quick skim suggests 8 were tuning efforts, 10 were bug fixes, 3 addressed schema changes, and a couple were miscellaneous. Here are the stats from the code as it was:

  • Module Analysis:
        83   Declarations
       577   Statements
        20   DML statements
      1396   Code lines
      1490   Blank or comment lines
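For flavor, the range-splitting idea can be sketched like this (a made-up illustration, not the real driver; it assumes an Oracle-style DBMS_JOB scheduler, and the pro_process_range procedure name is invented):

    DECLARE
      v_job BINARY_INTEGER;
    BEGIN
      -- Spawn 140 child jobs, each responsible for one slice of the workload
      FOR i IN 0 .. 139 LOOP
        DBMS_JOB.SUBMIT(
          job  => v_job,
          what => 'BEGIN pro_process_range(' || i || ', 140); END;'
        );
      END LOOP;
      COMMIT;  -- submitted jobs start running once the transaction commits
    END;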

The whole team was familiar with and despised the weekly PRO process. After I'd worked on the team a few months, my supervisor decided it was "my turn" to tune it - kind of an "induction" thing.

I spent three weeks mapping out every twist and turn of the whole process, and ultimately translated it into a single 300-line SQL "MERGE" statement (a DML combination of INSERT and UPDATE). This MERGE statement ran (and still runs today) in under 45 minutes, boosted with a "sip" of degree-8 concurrency. It reproduced the results of the old process with only 6 differences (each of which turned out to be an undetected bug in the old code). The only other difference was that it processed all of the data in the database, not just one year's worth; so in 45 minutes it was actually doing what would have taken the old process roughly 300 hours.
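To give a flavor of the shape of such a statement (a drastically simplified, made-up example - the real one runs about 300 lines, and these table and column names are invented):

    MERGE /*+ PARALLEL(tgt, 8) */ INTO project_balances tgt
    USING (
            -- Recompute everything from the detail rows in one pass
            SELECT project_id,
                   SUM(amount)     AS total_amount,
                   MAX(trade_date) AS last_activity
              FROM project_lines
             GROUP BY project_id
          ) src
       ON (tgt.project_id = src.project_id)
     WHEN MATCHED THEN
       UPDATE SET tgt.balance       = src.total_amount,
                  tgt.last_activity = src.last_activity
     WHEN NOT MATCHED THEN
       INSERT (project_id, balance, last_activity)
       VALUES (src.project_id, src.total_amount, src.last_activity);

The point is that the entire workload - the joining, the aggregating, the insert-vs-update decision - is expressed declaratively and handed to the optimizer, rather than walked row-by-row in procedural code.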

Here are the stats of my replacement module:

  • Module Analysis:
         0   Declarations
         1   Statements
         1   DML statements
       318   Code lines
        35   Blank or comment lines

I put it in place over 2 years ago (amusingly, I had to fight quite a battle to get it there: nobody "trusted" it because it was so utterly different from the "easy to read" procedural code).

To date, only one modification has occurred, and that was to accommodate an additional field added to one of the underlying tables.

The new code is 0% reusable - you cannot take any part of it and use it for anything other than the PRO process. In contrast, though the old PRO process is no longer used, some of the old PRO code had to be left in place because it contained about a dozen procedures (with signatures resembling: "validate_project_has_lines (project_name IN, date_range_start IN, date_range_end IN, proj_has_lines_flag OUT, error_flag OUT, error_message OUT)") that *had* been reused by other processes.
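For illustration, reusing one of those procedures from another process looks something like this (a hypothetical call site; the parameter types and values are invented):

    DECLARE
      v_has_lines  VARCHAR2(1);
      v_err_flag   VARCHAR2(1);
      v_err_msg    VARCHAR2(4000);
    BEGIN
      validate_project_has_lines(
        project_name        => 'SOME_PROJECT',
        date_range_start    => DATE '2006-01-01',
        date_range_end      => DATE '2006-12-31',
        proj_has_lines_flag => v_has_lines,
        error_flag          => v_err_flag,
        error_message       => v_err_msg
      );
      IF v_err_flag = 'Y' THEN
        RAISE_APPLICATION_ERROR(-20001, v_err_msg);
      END IF;
    END;

Each such call site is another reason the old wrappers could not simply be dropped when the PRO process was retired.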

The two solutions exemplify the difference between viewing an application's requirements in a process-oriented light and viewing them in a data-oriented light. If you have any sense that the two approaches are comparable and ought to be weighed against one another in terms of pros and cons (e.g. the pure SQL approach runs faster but is in some sense inferior due to its lack of code reuse), then I can see how you would read as "rude" the ease and cavalierness with which certain cdt posters dismiss ignorant proposals to treat databases as objects.
