Re: Fails Relational, Fails Third Normal Form

From: James K. Lowden <jklowden_at_speakeasy.net>
Date: Thu, 12 Feb 2015 01:23:41 -0500
Message-Id: <20150212012341.d8329e51.jklowden_at_speakeasy.net>

On Tue, 10 Feb 2015 23:06:25 -0800 (PST) Derek Asirvadem <derek.asirvadem_at_gmail.com> wrote:

> > From the academic point of view -- indeed from the point of view of
> > the DBMS, as you know -- no column has meaning.
>
> Totally disagree. When you say "DBMS", you may be meaning
> "theoretical DBMS", in which case, I don't agree or disagree, you are
> welcome to entertain it as a theoretical concept, if such is valid.

Actually, I'm sure you agree. By "DBMS" I mean "database management system". It's a machine, and it has no concept of meaning. It provides us with the illusion of semantic engagement by representing its tuples with names with which we associate meaning. To the machine, each column simply has a type and some defined relationship to other columns. It enforces those relationships, thereby consistency, thus supporting verifiable logical manipulation.

> I am not saying the theoreticians in the RDB space are stupid because
> they have assumptions and can't proceed, etc.
> - I am saying the theoreticians in the RDB space are stupid because
> they are using a hammer for a task that calls for an axe.
> ==AND== they will not observe the evidence that the hammer is not
> working, that it is not suited to the job.
> ==AND== the are ignorant (or in denial) that axes exist.

Because they insist on an using an algorithm and FDs to determine keys?

> We want to prevent "Warshinton", when "Washington" is in the database.

I see. I never worked on an application where that was warranted.

Re soundex, I guess I reached for it half a dozen times over the years, ever hopeful, always disappointed.

> But it concerns me, that you use the term FD interchangeably, whilst
> knowing full-well that the real FD and the theoretical one are quite
> different, the latter being only a fragment of the former.

I am using the term conventionally; I have no tricks up my sleeve. What part of the "real" definition does the "theoretical" one lack?

> a. that it cannot be used [to] determine Keys when they are jumping
> out at you, as per this task), or for the normal determination of
> Keys during the exercise of data modelling, so there is no point is
> using it in those scenarios.

OK.

> b. that for the given purpose, another method exists

Namely, intuition. I'm not being pejorative: The "jumping out" is the practice of associating meanings with (column) names and deciding what identifies what.

> c. that they are using it anyway, which is silly, given [a][b]
> d. they still haven't produced anything by that method
> e. but that it remains a valid method for determining keys when a
> human is not present to perform the analysis of data. (eg. I am not
> saying it has no use; the use has not been defined to me; I leave
> that open)

I think you mean that you've never seen a good tool for describing a database that uses FDs as its vocabulary. Neither have I, but I suggest to you that ERwin is basically that in disguise, more on which in a moment.

For an undisguised version, consider Nicola's exercise:

Quoth Nicola on Thu, 05 Feb 2015:
> A few years ago I implemented a few algorithms from Koehler's PhD
> thesis in a Ruby script. Given a set of FDs, the script finds all the
> keys and all the minimal covers.... Then, I had a graduating student
> re-implement it in Java and adding a dependency-preserving
> decomposition in BCNF (when it exists) or in 3NF to the output.

My first reaction is a little unkind. I think this is what lawyers call "assuming facts not in evidence". *Given* a set of FDs, the program generated a 3NF database design. Hurray! Now, where to find those pesky FDs for input?

It reminds me of a joke my Economics professor told the class about three professors stuck on a desert island devising a way to open a can of beans. The Economist's solution was simple, "First, assume a can opener".

On second thought, though, it's cause for optimism. If a Ruby script and a grad student can generate a correct design, then it is a tractable problem. What remains is a convenient syntax for capturing the "pesky FDs", something that is the purview of academia.

> Do get a trial copy of ERwin, and look into it.

The last time I used ERwin in a serious way was in the late 90's. It's what printed out the size E paper charts. We used the "team" version that kept the diagrams in a database, and relied heavily on the version management and reconciliation function. I also reverse engineered their diagram database and posted that on the wall, to help us fix anomalies that crept in from time to time. I remember the "role" FK dialog could rename a column (or associate two names, if you prefer).

I wrote macros to generate triggers to enforce relationships that couldn't be declared, such as sub/supertype relationships that required both parts of a two-part entity, where the first part was fixed and the second part was in one of N mutually exclusive tables. (For example, every security has a name and type, but bonds have coupons, equities have common/preferred, and options have an underlying). Note that both parts *must* exist: no security is described only by its name and type, and every security (let us say) does have a name and type. ISTR you said such relationships don't exist, but I think you must have meant you never came across one.

ERwin is a good tool, the best I ever saw. (We also used the CAST workbench for stored procedure development, quite a boon.) I have sometimes wished for a better version of their macro language independent of the tool. It would be nice to define a relationship-rule in a symbolic library, and be able to apply it to a given set of tables.

> > Are you prepared to say that's the last and best way? I'm not.
>
> I am. And it is worse than that. I am saying it is the only method
> for implementers, for practical people, for humans.

Everything that can be invented has been invented?

(Cf.
http://patentlyo.com/patent/2011/01/tracing-the-quote-everything-that-can-be-invented-has-been-invented.html, seriously)

At least on this point we're clear. You think ERwin is the best (kind of) tool that can exist for the purpose. I can imagine better, but doubt the market will demand or produce it.

Specifically, the IDEF1X diagram you construct is a tautology. You say, here is a key, there is a foreign key, this is unique, that has such-and-such domain. And, great, you can generate the SQL for the DBMS. You are doing the designing, designating the FDs by way of those keys, and reasoning about all the dependencies and potential anomalies. It's right because you say it's right. The tool doesn't know beans about normal forms. It can't *check* anything. All it can do is convert *your* design, expressed as a diagram, into SQL. Sure, the SQL will be right; that's about as automatic and brain-dead a thing as can be imagined. Is it 3NF? In 2015, that's on you.

I spent many, many hours re-arranging my boxes and routing my connections, and tediously moving names up and down to reflect what's the key and what's not (and what order I wanted the physical columns in). I worked my way through a few knotty 4-column keys and used such diagrams to explain contradictory nebulous understandings, wherein different departments had different ideas of what, say, a product is.

I am not at all convinced that's the best way.

I don't deal in 200-table databases anymore. The databases I deal with nowadays have fewer tables and users, and lots more rows. I write the SQL directly and rely on DRI for almost everything. When I want a diagram for a database, I go the other way, and generate it with pic (cf. groff) from the SQL, plus some ancillary information for box placement and line drawing.

I suggest to you that pic is every bit as smart about diagrams as ERwin is about databases. And both are equally smart about database design.

> Oh yeah, they are still working with text strings, 1's and 2's.

You're convinced that your way -- designing via diagram -- isn't just the best way, but the only way and the last way that ever will exist. I'm convinced that's not true. I'm sure a better language than SQL could be invented that could be "compiled to" SQL and represented diagrammatically. I don't see why it couldn't also capture meaning not in the SQL catalog, such as ownership, provenance, and definition. Rather than defining tables per se, we'd define columns and their dependencies, and let a model-generator express them as tables in a schema known to be, say, 3NF.

History is on my side. There have been countless attempts in the last 30 years to express logic graphically. We might start with Logo, say, and include any number of CASE tools. (I don't suppose you remember Software Through Pictures?) They. All. Fail. Visual Basic becomes Manual Basic when you're done drawing the dialog boxes.

Meanwhile, we're living through an explosion of languages and are making progress with previously intractable problems such as correctness. If history is any guide, better database designs will come from better languages, not better diagrams.

> Please give me an example of what SQL cannot do.

Tuple comparison,

	select ... where R.(a, b) = S(a, b)
	select ... where (a,b) in (select a, b from S)

Column comparison,
	check R.a = S.a where R.b = 'Y'

Universal quantification is similar. Needed for relational division.

Yes, you can accomplish those in SQL by writing it out more verbosely or, as with universal quantification (find students who have taken all required classes), you can employ De Morgan and use "not not exists". If you have that filed under "can do", not only do we disagree on the meaning of "can", but you will find yourself defending the use of lower-level constructs to implement concepts defined by relational algebra. That smell you smell is awk.

Table comparison,

where T = (select ... from S where ...)

SQL can't constrain views in any way, can't use views as a FK target. I would like to be able to use a union as a domain; SQL cannot.

Why does UNION (and similar) require column order to match, but not name? Why does SELECT return duplicate column names or permit unnamed columns? Why SELECT DISTINCT but UNION ALL? Why must FROM appear only between SELECT and WHERE? (Why even say "SELECT"?) What purpose does HAVING serve anymore? Why do subqueries require aliases even when unreferenced?

Just look at the butt-ugly porcine ungainliness of UPDATE. Updating one table from another has to be the most cumbersome, verbose, and redundant aspect of SQL, not that it lacks competition (except in any other language). Why can we not say instead

R(a, b, c) = S(a, b, c) WHERE R.x = S.x

SQL is a relic of another age, the last man standing after RJE, COBOL, Cullinet, PL/1 and all the rest have disappeared. To the economics of those days we owe the fact that we use SQL and not Ingres's QUEL, a much better language. The strange, ignorant time we currently live in promises very little progress, if any, because users of databases doen't realize how much is being lost, never mind forgone.

If your assertion is that SQL can, after a fashion, express any relational algebra function, I concede the point. When people say it's "not relational", that's not what they mean. They mean it does not express relational concepts directly or particularly well, and sometimes -- bags, column order -- ignores it entirely.

SQL is indefensible. Your serve. ;-)

--jkl Received on Thu Feb 12 2015 - 07:23:41 CET

This message: [ Message body ]
Next message: Derek Asirvadem: "Progress Please"
Previous message: Nicola: "Re: "Hard" Key Determination Method is Easy. DNF Paper is Done.."
Maybe in reply to: Derek Asirvadem: "Fails Relational, Fails Third Normal Form"
In reply to Derek Asirvadem: "Re: Fails Relational, Fails Third Normal Form"
Next in thread: Derek Asirvadem: "Re: Fails Relational, Fails Third Normal Form"
Reply: Nicola: "Re: Fails Relational, Fails Third Normal Form"
Reply: Derek Asirvadem: "Re: Fails Relational, Fails Third Normal Form"
Reply: Derek Asirvadem: "SQL Bashing - the Sport of Champions!"
Reply: Derek Asirvadem: "Re: SQL Bashing - the Sport of Champions!"
Reply: Derek Asirvadem: "View and UPDATE"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message