Re: Sensible and NonsenSQL Aspects of the NoSQL Hoopla

From: Eric <eric_at_deptj.eu>
Date: Sun, 25 Aug 2013 14:38:37 +0100
Message-ID: <slrnl1k26t.up7.eric_at_teckel.deptj.eu>


On 2013-08-22, karl.scheurer_at_o2online.de <karl.scheurer_at_o2online.de> wrote:

> Am Mittwoch, 21. August 2013 19:17:09 UTC+2 schrieb Eric:

>> On 2013-08-21, karl.scheurer_at_o2online.de <karl.scheurer_at_o2online.de> wrote:
>>
>>> - With database cursors and indizes there are only "Not only SQL" systems.
>>
>> That's just confusing.
>>
> Why? with database cursor You can traverse a table in a very old fashioned
> navigational way (first, next, last, previous). Indizes are logical data
> structures when efficient subsets are essential (conditional indizes)

I know what cursors and indexes are, but it is "Not only SQL" that is confusing. They are not part of relational theory, but they are part of SQL.
>>> - The relational modell is never "correct" implemented. Without
>>> "holographic" storage media I see no way to implement "tables" where
>>> "row- and column order are insignificant". Defining relations as a
>>> "interface" views of storage objects leads to practical, but "pseudo"
>>> relational systems.
>>
>> You do not appear to be aware of the difference between logical and
>> physical models.
>>

> As a programmer I don't see the advantage of a "partial" logical view to
> real life structures. A "mathematical" relation can be implemented as a
> incomplete view on a more complex structure. A "computational" relation
> has relation and tuple identities and a life before and after filled with
> data. As long as I see no alternative working implementation I see no use
> for such a limited "logical model" 

The logical model is the set of relations and constraints that you define to represent as much of a set of real life structures as you wish to store and manipulate. The physical model is how your database software organises the data for efficient storage and retrieval. As a programmer, you deal with the logical model, it is the tables you write SQL to manipulate. As a programmer, you don't care about the physical model unless you think it needs to be modified for performance reasons. The fact that present database products have a physical model which can be, and therefore often is, an exact representation of the logical model leads too many people to make unfortunate modifications to their logical model when they should be thinking only about the physical model and how it can map to the (unchanged) logical model.

It seems to me that you may be talking about something else here.

>> As far as I can see NoSQL systems don't have "system catalogs" because
>> the data in them is insufficiently organised to allow one to be
>> designed. However, whether or not this is a real problem for them has
>> got nothing to do with any other argument about their merits relative
>> to SQL or the relational model or anything else.
>>

> Wrong! 
> It's no problem to create Your own "system catalogs". We have done this for
> a application when we integrate our regulation database into a workflow
> system. The real problem is the missing standard.

I believe you, For any specific case, yes, but it is the general case that is the problem, which is why is is at best difficult to create a standard.

>>> The "SQL vs noSQL debate" is fruitless, unless Codd's theory will
>>> finally
>>> be consolidated. Looking into Codd's paper of 1970 You will find:
>>>
>>> - Normalization and 1NF is a "workaround" to avoid "too complicated
>>> datastructures". A sound idea 1970 with Fortran, Assembler and
>>> PreAnsiC
>>> it was never revised
>>
>> Codd does not use the word "workaround". He says that "... more
>> complicated data structure is necessary for a relation with one or more
>> nonsimple domains" and so "the possibility of eliminating nonsimple
>> domains appears worth investigating!" and that "there is, in fact, a very
>> simple elimination procedure, which we shall call normalization." Not
>> a workaround, and nothing to do with how old-fashioned the languages
>> being used are.
>>

> Maybe I gave Codd to much credit for his reasoning. Based on my
> experiences with these languages it seems to be reasonable to shift
> complexity in a easier manageable area (tuple fractioning). Since
> "more complicated data structures" became part of standard libraries,
> I see no reason to cope with exploding numbers of tables when simpled
> solutions exists.

I don't think it has anything to do with the languages or libraries, but merely with the attempt to define easily usable and easily manageable data structures. Having a normalised relational model of your data means that every element of data is easily accessible both individually and in context, easily updated (subject to constraints), easily compared both with its peers and with suitable other elements, easily filtered and easily aggregated or summarised (and even easily ignored).

If you have a type of complex structure for which you do not need to deal with its internal complexity until after you have identified and retrieved it (or when you are constructing it prior to storage), then you can treat it as a "simple domain" in relational terms. Incidentally, this makes it an element of a tuple, it will not be a table, or a row of a table (though the physical representation in the back-end may look like anything at all).

>>> - Defining relations and tuples only via it's contents limits the
>>> "physical data independence" to "record oriented storages". "Column
>>> oriented storages" needs additional data for mapping (rowid)
>>
>> Once again, you do not appear to be aware of the difference between
>> logical and physical models. A relational implementation could quite easily
>> use column-wise storage instead of or as well as any other storage
>> structure without imposing any limitations on its faithfulness to the
>> relational model. It's just an implementation layer, not part of the
>> relational model!
>>
> Wrong!
> - Try column-wise storage on a relation with composite primary key

For efficiency reasons you might need key-wise storage (in effect a key index) as well as or instead of pure column-wise storage, but that is about designing the physical storage layer.

> - Try export a Access database table to a Excel worksheet and reimport > the data and compare the tables.

What data property does this not preserve? But it doesn't really matter, they are different tools for different purposes and it is no surprise that a double translation would lose something unless the mechanisms had been specifically designed not to.

>>
>> Relations and tuples do not have an object nature, they are not objects.
>> Tables do not have inheritance, they are relations which are not
>> objects. Missing data and identity are not problems anyway.

>
> "It's not a bug, it's a feature". Tables can have inheritance (at least
> in Postgres).

"Feature under construction" judging by the documentation. It is a new way to deal with sub-types (in the Entity-Relationship modelling sense) and might turn out to be a good idea (or not) but I am sure that people will find ways to misuse it because it says "inheritance" and they will assume that it is the same as object inheritance.

> "Missing data and identity are not problems anyway" is a very bold
> statement without limit it to Your field of work. We develop applications
> for inspection and maintainance of powerplants. Missing data is a problem
> for us without a system created identity.

Missing data is of two kinds:
- It is not currently applicable to the real-world thing we are dealing   with (this kind is never a problem)
- Is is applicable and intended to be in the database, but for some   reason we currently do not have it for this particular real-world   thing. In this case we have to either disregard (and discard?) the   real-world thing, or we have to provide for storing the fact that   it is missing (and/or the reason for this). Problems arise here only   because database designers do not pay attention to the alternatives.

A real-world thing is either identifiable in the real world, or it isn't. In the latter case we either don't store anything about it, or we consider it as one of a counted collection of a type that is identifiable. Another possibility is that we assign it a system-created identifier and mark the real-world thing with the identifier (which may be impossible and/or undesirable).

If is is identifiable we have to have the complete set of identifying features in the database. This is a natural key, but it may be awkward to use, so we might need to use a system-created identifier which is mapped, in one relation, to the real-world identifying features. Again it might sometimes be appropriate to mark the real-world thing with the identifier.

>> The whole object thing is a methodology which can be very helpful to
>> programmers, if done correctly, but it is harder than most programmers
>> think to do correctly. It has nothing whatever to do with "Large Shared
>> Data Banks". The key word in that phrase is "Shared", a fact that most
>> proponents of "objects" do not seem to get at all.
>>
> You are partially right. Storing application objects in "object > repositories" is indeed ridiculous.

Thankyou.

> Separating applications and databases is no argument against a object > view to databases.

Agreed, though many of the attempts at the object view are (deliberately or not) still trying to be object repositories.

> Though no Microsoft fan I believe the ADO framework is a step in the > right direction. ADO's "Datasets" are "relation objects".

I have insufficient knowledge to judge this, but it at least sounds plausible.

Eric

-- 
ms fnd in a lbry
Received on Sun Aug 25 2013 - 15:38:37 CEST

Original text of this message