Re: Sensible and NonsenSQL Aspects of the NoSQL Hoopla

From: <karl.scheurer_at_o2online.de>
Date: Sun, 1 Sep 2013 03:47:35 -0700 (PDT)
Message-ID: <c4c2dff5-285b-43f0-9492-382cbc04e541_at_googlegroups.com>


On Saturday, 31 August 2013 19:59:52 UTC+2, James K. Lowden wrote:
> On Sat, 31 Aug 2013 08:22:41 -0700 (PDT)
> karl.scheurer_at_o2online.de wrote:
>
> > Codd's model emerged out of the technology of the seventies and needs
> > urgently a revision.
>
> That's a novel observation. I'm sure others would be interested to
> know of any aspect of the relational model rooted in 1970s technology.
>
> Do tell.

My observation is based on Codd's 1970 paper "A Relational Model of Data for Large Shared Data Banks".

In it he addresses the following problems:
1.2.1. Ordering Dependence.
"Let us consider those existing systems which either require or permit data elements to be stored in at least one total ordering which is closely associated with the hardware-determined ordering of addresses."
Nowadays this is impossible without bypassing the operating system. Before UNIX and other operating systems, it was common practice to lay out files directly on the storage device.

1.2.2. Indexing Dependence.
"...destroy indices from time to time will probably be necessary. The question then arises: Can application programs and terminal activities remain invariant as indices come and go?..."

In the seventies, "big data" had to be stored on sequential storage media (tapes, cards). A query against sequential media cannot use an index (hence "indices go").
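
The invariance Codd asks for in 1.2.2 can be illustrated with a minimal Python sketch. It is not taken from the paper; the names rows, build_index and select and all data values are hypothetical. The point is only that the caller gets the same answer whether or not an index exists:

```python
# Hypothetical in-memory "data bank"; all names and values are invented.
rows = [
    {"part": "bolt", "supplier": "A", "qty": 100},
    {"part": "nut", "supplier": "B", "qty": 250},
    {"part": "screw", "supplier": "A", "qty": 75},
]

def build_index(rows, column):
    """Map each value of `column` to the positions of the rows holding it."""
    index = {}
    for pos, row in enumerate(rows):
        index.setdefault(row[column], []).append(pos)
    return index

def select(rows, column, value, index=None):
    """Return the rows where row[column] == value.

    The index is an optional accelerator; the result is the same without it.
    """
    if index is not None:
        return [rows[pos] for pos in index.get(value, [])]
    return [row for row in rows if row[column] == value]  # sequential scan

by_scan = select(rows, "supplier", "A")
by_index = select(rows, "supplier", "A", build_index(rows, "supplier"))
assert by_scan == by_index  # same answer whether the index exists or not
```

The application only ever calls select; whether an index "comes" or "goes" changes how the rows are found, not which rows are returned.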

1.2.3. Access Path Dependence.
"One solution to this is to adopt the policy that once a user access path is defined it will not be made obsolete until all application programs using that path have become obsolete. Such a policy is not practical, because the number of access paths in the total model for the community of users of a data bank would eventually become excessively large."

That statement is based on the hardware of the seventies.

First normal form and normalization
"So far, we have discussed examples of relations which are defined on simple domains - domains whose elements are atomic (nondecomposable) values. Nonatomic values can be discussed within the relational framework. Thus, some domains may have relations as elements. These relations may, in turn, be defined on nonsimple domains, and so on."
Clearly, Codd started in 1970 with a design much like the "document stores" of NoSQL or the non-first-normal-form (N1F) systems of the past.

For reasons that can no longer be reconstructed (Codd's reference is out of print and not available online), he restricted his model:

"1.4. NORMAL FORM
A relation whose domains are all simple can be represented in storage by a two-dimensional column-homogeneous array of the kind discussed above. Some more complicated data structure is necessary for a relation with one or more nonsimple domains. For this reason (and others to be cited below) the possibility of eliminating nonsimple domains appears worth investigating! There is, in fact, a very simple elimination procedure, which we shall call normalization."
Given the many horror stories about program failures caused by "index out of bounds" errors, it was reasonable to keep the design simple and to avoid complex dynamic data structures. Meanwhile, complex dynamic data structures (trees, graphs, lists, ...) are part of the standard libraries of the common mainstream programming languages.
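
To make the two designs concrete, here is a minimal Python sketch of the elimination procedure quoted above. It is only an illustration under my own assumptions: the function name normalize, the column names and the data values are invented, loosely modeled on the employee example in Codd's paper. The nested relation is split off into a relation of its own, and the parent's key is copied into each of its rows:

```python
# Unnormalized relation: the "jobhistory" domain is nonsimple,
# i.e. its values are themselves relations. All values are invented.
employee = [
    {"man_no": 1, "name": "Smith", "jobhistory": [
        {"jobdate": "1968-01", "title": "clerk"},
        {"jobdate": "1969-06", "title": "analyst"},
    ]},
    {"man_no": 2, "name": "Jones", "jobhistory": [
        {"jobdate": "1969-03", "title": "operator"},
    ]},
]

def normalize(relation, key, nested):
    """Split one nonsimple domain off into its own relation.

    Returns (parent, child): the parent without the nested column, and the
    child rows, each extended with the parent's key value.
    """
    parent = [{k: v for k, v in row.items() if k != nested} for row in relation]
    child = [{key: row[key], **sub} for row in relation for sub in row[nested]]
    return parent, child

employee_flat, jobhistory = normalize(employee, "man_no", "jobhistory")
# employee_flat has only simple domains (man_no, name);
# jobhistory has (man_no, jobdate, title), one row per nested entry.
```

The nested form on top is what a document store would keep as is; the two flat results are what the relational model works with.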

Last but not least
"Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation)"
For me as a programmer, this sounds like a textbook example of object design. If objects are the "best" way to implement relations, then insisting on doing it the relational way is like driving a car with five gears and using only three of them.

Kind regards,
Karl Scheurer
Received on Sun Sep 01 2013 - 12:47:35 CEST
