Re: Base Normal Form

From: Marshall Spight <marshall.spight_at_gmail.com>
Date: 14 Jul 2005 12:01:01 -0700
Message-ID: <1121366997.049321.69770_at_g43g2000cwa.googlegroups.com>


dawn wrote:
>
> While the average comp sci major can figure out relations from the way
> it is typically explained, I don't like the way we have carved out the
> dbms from the rest of the application, modeling stored, persisted,
> remembered data; validating such data; naming such data; etc decidedly
> differently than other data that will hang around for less time. The
> current language does nothing to promote a holistic approach to
> software development unless you use mountain man's approach of hauling
> everything into the dbms. I think that is the next best thing to
> hauling it all out of "there" (the typical sql-dbms). (The last
> statement was for your amusement only, mostly, sortof).

I agree that this separation is a real problem for software development. I believe that it is a historical accident, largely driven by the fact that dbmses have been completely dominated by commercial interests, while programming languages have had a much freer development path. The problem is, indeed, that we segment these concerns (application code and data management), but I don't believe there is actually any contest here. The goal isn't for one side to "win." Rather, each side has learned some really important lessons. The dbms tradition is quite impoverished when it comes to writing application code, and the application code side is really quite limited in what it can do with data management.

I believe the solution lies in a new, higher-order view that takes into account the lessons of both sides. That means we want the best features of each. We need declarative integrity constraints; we need the relational algebra; AND we need clever type systems, polymorphism, modularity, etc. We need to have these tools all available "right here" so they can be used together seamlessly. The database should not feel "far away."

(As an aside, I don't think that dynamically typed languages have a lot to offer for the enterprise. They're great for prototyping and for quite-small teams, but they don't scale up to large development processes. Just my opinion.)

> Ok, so you are, indeed, tapping into my brain (pretty scary). One of
> the dangers of declaring "stored" database functions as "relations" is
> that they seem so distant, as if we cannot access them. Set operators
> are fine and dandy, but people understand single transactions handily.
> They pay at the grocery store, get money from the bank, and type their
> name into ba-zillion web pages that require it. Start with individual
> "records" (as in "the doctors office keeps a paper copy of my ct scan
> record") and show that you have an API for it, then go to sets. How do
> we collect data? One record at a time. (I seem to be in a mood as if
> trying to entice Fabian to quote me, eh?)

I think this approach makes sense if we're trying to come up with a view of programming and data management that we want to be able to teach in high school, but I don't see why this approach is the one we should take for the professional software engineer.

And in fact, I assert that this kind of record-at-a-time approach is a disaster for scalability. It works fine for hundreds of records, and maybe for thousands, but for tens of thousands, or hundreds of millions, it's a disaster. The computing landscape is distributed now, for good or ill, and we have to build systems that don't send a network packet for every record, or every keypress, or whatever.
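
To make the round-trip point concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table name, column names, and function names are all made up for illustration. SQLite runs in-process, so no packets are actually sent; the sketch just counts statements issued as a stand-in for the network round trips a client/server dbms would incur.

```python
import sqlite3


def make_db(n):
    """Build a throwaway in-memory table with n rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [(i, 100) for i in range(n)]
    )
    return conn


def raise_record_at_a_time(conn):
    """Fetch the keys, then update one record per statement.

    Returns the number of statements issued, which grows with the table.
    """
    ids = [row[0] for row in conn.execute("SELECT id FROM account")]
    round_trips = 1  # the SELECT above
    for account_id in ids:
        conn.execute(
            "UPDATE account SET balance = balance + 10 WHERE id = ?",
            (account_id,),
        )
        round_trips += 1
    return round_trips


def raise_set_at_a_time(conn):
    """Express the same change as one set-level statement."""
    conn.execute("UPDATE account SET balance = balance + 10")
    return 1  # one statement, however many rows there are


def total(conn):
    """Sum of all balances, used to check both approaches agree."""
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]
```

Both functions leave the table in the same state. The difference is that the first issues one statement per record plus the initial query, while the second issues a single statement regardless of table size, and that gap is what turns into latency once each statement has to cross a network.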

Marshall

Received on Thu Jul 14 2005 - 21:01:01 CEST
