Re: Sensible and NonsenSQL Aspects of the NoSQL Hoopla
Date: Mon, 2 Sep 2013 23:31:04 +0100
Message-ID: <slrnl2a4d8.c0q.eric_at_teckel.deptj.eu>
On 2013-09-02, karl.scheurer_at_o2online.de <karl.scheurer_at_o2online.de> wrote:
> On Sunday, 1 September 2013 18:53:22 UTC+2, James K. Lowden wrote:
>> On Sun, 1 Sep 2013 03:47:35 -0700 (PDT)
>> karl.scheurer_at_o2online.de wrote:
>>> He addresses the following problems
>>> 1.2.1. Ordering Dependence.
>>> "Let us consider those existing systems which either require or
>>> permit data elements to be stored in at least one total ordering
>>> which is closely associated with the hardware-determined ordering of
>>> addresses."
>> If you remove "hardware-determined" from the sentence, it's exactly as
>> true now as then.
> If you remove hardware-determined from the sentence, then there's no difference
> between pointers and foreign keys.
Impossible. A pointer is a pointer (address, offset, ordinal position); a foreign key is a value which matches the value of the primary key of another (or the same) relation.
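
To make the distinction concrete, here is a minimal sketch in Python. The names (depts, dept_pos, dept_id and the two lookup functions) are invented for illustration and come from nowhere in this thread. The positional reference silently changes meaning when the storage order changes; the value-based one does not.

```python
# Hypothetical illustration: positional reference vs. value-based reference.

depts = [
    {"dept_id": "D1", "name": "Sales"},
    {"dept_id": "D2", "name": "Research"},
]

# "Pointer" style: the employee stores an ordinal position within depts.
emp_by_position = {"name": "Ann", "dept_pos": 1}

# Foreign-key style: the employee stores a value equal to a department's key.
emp_by_key = {"name": "Ann", "dept_id": "D2"}


def dept_via_position(emp, dept_list):
    """Follow the stored position; depends on how dept_list is ordered."""
    return dept_list[emp["dept_pos"]]["name"]


def dept_via_key(emp, dept_list):
    """Match on key value; independent of how dept_list is ordered."""
    return next(d["name"] for d in dept_list if d["dept_id"] == emp["dept_id"])


before = (dept_via_position(emp_by_position, depts),
          dept_via_key(emp_by_key, depts))

# Change only the physical order of the departments.
reordered = list(reversed(depts))
after = (dept_via_position(emp_by_position, reordered),
         dept_via_key(emp_by_key, reordered))
```

After the reordering, only the positional lookup gives a different department.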
>>> 1.2.2. Indexing Dependence.
>>> "...destroy indices from time to time will probably be necessary. The
>>> question then arises: Can application programs and terminal
>>> activities remain invariant as indices come and go?..."
>>>
>>> In the seventies "bigdata" had to be stored on sequential data
>>> storages (tapes, cards). Querying data from sequential media cannot
>>> use indices ("indices go").
>>
>> Hmm, no, I'm pretty sure VSAM and IMS were available in the 70s.
>> Cullinet was selling IDMS.
>>
> Maybe! In the 70s storage was very expensive and indices were an additional
> cost. Using a model without the need for indices sounds attractive.
Indices are not part of the model, they are a possible part of the implementation.
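
A small sketch of that point, using Python's standard sqlite3 module with an in-memory database. The table and index names (part, part_colour) are made up for the example. The query text never mentions the index, and its result is the same before the index exists, while it exists, and after it is dropped.

```python
import sqlite3

# Hypothetical table and index names; in-memory database, nothing is persisted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (part_no INTEGER PRIMARY KEY, colour TEXT)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?)",
    [(1, "red"), (2, "blue"), (3, "red")],
)

QUERY = "SELECT part_no FROM part WHERE colour = 'red'"


def red_parts():
    """Run the same query text and return the result as an unordered set."""
    return {row[0] for row in conn.execute(QUERY)}


no_index = red_parts()

conn.execute("CREATE INDEX part_colour ON part (colour)")
with_index = red_parts()

conn.execute("DROP INDEX part_colour")
index_dropped = red_parts()

conn.close()
```

Only the implementation's choice of access strategy can change as the index comes and goes; the application's query and its answer stay put.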
>>> 1.2.3. Access Path Dependence.
>>> "One solution to this is to adopt the policy that once a
>>> user access path is defined it will not be made obsolete until
>>> all application programs using that path have become
>>> obsolete. Such a policy is not practical, because the number
>>> of access paths in the total model for the community of
>>> users of a data bank would eventually become excessively
>>> large."
>>>
>>> That statement is based on the hardware of the seventies.
>>
>> On the web I believe it's called "404".
>>
> No! With cheap storage it can be reasonable to keep all user access paths;
> at least the policy has to be reconsidered.
It's not just about storage. You have to maintain every path whenever data is added, deleted, or updated, and this in a situation where there is no upper bound on the number of paths.
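
A rough sketch of the maintenance cost, in Python. PathStore and updates_needed are hypothetical names, and a dictionary per attribute stands in for a stored access path. It only counts how many path structures each insert has to touch; deletes and updates would cost the same again.

```python
# Hypothetical sketch: every stored access path must be touched on each insert.

class PathStore:
    """Records plus one lookup structure ("access path") per chosen attribute."""

    def __init__(self, path_attributes):
        self.records = []
        self.paths = {attr: {} for attr in path_attributes}
        self.path_updates = 0  # counts maintenance operations on paths

    def add(self, record):
        self.records.append(record)
        # Each existing path has to be brought up to date for the new record.
        for attr, path in self.paths.items():
            path.setdefault(record[attr], []).append(record)
            self.path_updates += 1


def updates_needed(path_attributes, records):
    """Insert all records and report how many path updates that required."""
    store = PathStore(path_attributes)
    for record in records:
        store.add(record)
    return store.path_updates


rows = [
    {"id": 1, "city": "Bonn", "year": 1970},
    {"id": 2, "city": "Hull", "year": 1971},
]
one_path = updates_needed(["id"], rows)
three_paths = updates_needed(["id", "city", "year"], rows)
```

The work per change grows with the number of paths kept alive, whatever the storage costs.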
>> > For reasons not comprehensible any more (Codd's reference is
>> > out of print and not online available), he restricted his model
>>
>> No mystery. Books in a library are hardly lost texts of Babylon. And
>> he states his motivation plainly: "the possibility of eliminating
>> nonsimple domains appears worth investigating!"
>>
> Without any objective reasons this seems to be a personal opinion to
> me. I don't object for all cases, but using an unchecked rule without
> considering potential alternatives is not a good thing.
If you meant footnote 4 after "worth investigating", it refers to a person. If you then look at the end of the paper, that person is thanked for "helpful discussions"; there is no reference. If not, what reference did you mean? Footnote 4 does nothing for the argument either way. Codd investigates what is worth investigating in this paper; it has been thoroughly discussed since, and there were no remaining arguments (other than how far you can go with normalization) until the arrival of object proponents who didn't understand the relational model properly.
>> The model is not "restricted". It is *simplified*, a feature, not a
>> bug. By showing -- more, *proving* -- that logical inferences could be
>> drawn from data manipulated with a small number of operators closed
>> over a domain, Codd released programmers from low-level complexity and
>> man-centuries of work.
>>
> Codd shifted the complexity from one area to another. Replacing n
> entities with n*x relations is a considerable increase in
> complexity. One reference told me that a SAP R3 system contains more than
> 100000 tables (one hundred thousand!). What about complexity?
SAP does what it says on the tin. It says a lot of things on the tin. However there is no particular reason to hold SAP up as a good example of relational design.
>> If you're programming a computer, graphs are your natural ally
>> because they can be mapped directly onto the computer's memory. They're
>> of no use, though, when you want to manage data logically. How, for
>> example, do you define a subset of a cyclic graph?
>>
> Depends on the subset definition. In our field of work we use an ordered
> list representation of graphs. A subset is a range of items satisfying
> the same criteria.
Confusing implementation and concept again.
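
As a sketch of the conceptual version (Python, with made-up node names and a hypothetical restrict helper): hold the graph as a relation of (source, target) pairs, and a subset is just the rows that satisfy a predicate. No list position or range is involved, and the cycle causes no difficulty.

```python
# Hypothetical example: a directed graph held as a relation, i.e. a set of
# (source, target) tuples. The edges a -> b -> c -> a form a cycle.
edges = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}


def restrict(relation, predicate):
    """Return the rows of the relation for which the predicate holds."""
    return {row for row in relation if predicate(row)}


# Subset defined purely by a condition on values: edges not involving node d.
no_d = restrict(edges, lambda edge: "d" not in edge)

is_subset = no_d <= edges
```

How the rows happen to be stored, ordered list or otherwise, is left to the implementation.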
>> You're right to say that graphs are more complex than relations. It's a
>> mistake, though, to conclude therefore that they are more powerful.
>> It's been proved mathematically that graphs and relations are
>> interchangeable in the sense that they can represent the same
>> information. The difference is that relational theory is much
>> simpler. That's its advantage, not a handicap.
>>
> It really is a handicap. Try to express grouping and grouped aggregates
> in a frame with only "unordered sets". SQL deviates from relational
> theory by implementing groups and grouped aggregates.
Did this one in another post.
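
This is not the argument from that other post, just a minimal illustration in Python with invented data and a hypothetical group_sum function. Grouping partitions the rows by equal key values and the aggregate is computed per partition; nothing depends on the order in which the rows are visited, and the result is again an unordered set.

```python
from collections import defaultdict

# Hypothetical relation: a set of (region, sale_id, amount) tuples.
sales = {("north", 1, 10), ("south", 2, 5), ("north", 3, 7)}


def group_sum(relation, key_index, value_index):
    """Sum one attribute per distinct value of another; returns a set of pairs."""
    totals = defaultdict(int)
    for row in relation:  # iteration order is irrelevant to the totals
        totals[row[key_index]] += row[value_index]
    return set(totals.items())


totals_by_region = group_sum(sales, 0, 2)

# Visiting the same rows in an explicitly different order changes nothing.
totals_reversed = group_sum(sorted(sales, reverse=True), 0, 2)
```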
>> OK, but it's not.
>> Consider the UNIX filesystem, for instance, which you referred to
>> earlier. Upon a time, when my mother wrote disk access routines for
>> Univac, the programmer had to know all the particulars of the device,
>> and read/write data in terms of the device's design. Unix
>> revolutionized the field by abstracting all disk access into today's
>> familiar stream of bytes. No addresses, no heads or sectors or tracks.
>> A catalog to facilitate sharing that anyone (potentially) can update,
>> not just the system programmers. Works pretty good for nonrotating
>> media, too, and over the network. And not an object in sight.
>>
> That depends on your object definition. For me an object is a data type
> with additional internal procedures, identity and means to communicate
> (signals, messages...) and can be implemented in various manners. Unix
> files fit this description.
No. A file doesn't behave like an object. You might see object parallels in the implementation, but that's different: it's not visible to the user or application.
>> ... I have long thought that
>> stored procedures are to databases what methods are to objects, and
>> subscribe to the idea that applications should access the data only
>> through views and procedures.
>>
> Ok!
Not altogether (but that would be a separate discussion).
>> Part of your critique is actually of DBMSs that we have, not of RM.
>> SQL DBMSs largely support only a few primitive types that the user may
>> then further constrain or write functions for. One cannot, for
>> example, define an aggregate type as a set of columns, and use that
>> name in, say, FK declarations. Nor can we usually define types of blobs
>> and comparison functions for them (although I'm unconvinced that's a
>> good idea).
>
> On the contrary! My criticism is of RM and not of the real SQL databases. My
> only criticism of SQL is that it pretends to be relational. My criticism of RM can
> be summarized in the statement "relations are unordered sets". Any model
> based on a definition like "relations are sets with any (exploitable)
> sort order" is much better.
Yet again, the RM is _not_ about ordering. It does not ban ordering you can use, merely ordering that you are forced to use. Please don't quote Codd's paper again; it is not an absolute truth or even an attempt at one. It describes a theory which, used in an appropriate manner, provides a useful and logically correct way to manipulate and present data to an application. If it needs to be extended, fine, but if you extend it by breaking it then it is no longer the RM, and it is likely to no longer be logically correct.
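
A tiny sketch of "ordering you can use" versus "ordering you are forced to use", in Python with invented rows. The relation itself is a set with no inherent order; whoever consumes it asks for whichever ordering suits them at presentation time, and none of those orderings is part of the data.

```python
# Hypothetical rows: (name, value) tuples in a relation with no inherent order.
relation = {("a", 3), ("b", 1), ("c", 2)}

# The same relation written down in a different sequence is the same relation.
same_relation = {("c", 2), ("a", 3), ("b", 1)}
identical = relation == same_relation

# Orderings are requested when presenting the data, one per need.
by_name = sorted(relation, key=lambda row: row[0])
by_value = sorted(relation, key=lambda row: row[1])
by_value_desc = sorted(relation, key=lambda row: row[1], reverse=True)
```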
Eric
-- 
ms fnd in a lbry
Received on Tue Sep 03 2013 - 00:31:04 CEST