Re: Sensible and NonsenSQL Aspects of the NoSQL Hoopla
Date: Sun, 25 Aug 2013 14:38:37 +0100
Message-ID: <slrnl1k26t.up7.eric_at_teckel.deptj.eu>
On 2013-08-22, karl.scheurer_at_o2online.de <karl.scheurer_at_o2online.de> wrote:
> Am Mittwoch, 21. August 2013 19:17:09 UTC+2 schrieb Eric:
>> On 2013-08-21, karl.scheurer_at_o2online.de <karl.scheurer_at_o2online.de> wrote:
>>
>>> - With database cursors and indizes there are only "Not only SQL" systems.
>>
>> That's just confusing.
>>
> Why? with database cursor You can traverse a table in a very old fashioned > navigational way (first, next, last, previous). Indizes are logical data > structures when efficient subsets are essential (conditional indizes)
I know what cursors and indexes are, but it is "Not only SQL" that is
confusing. They are not part of relational theory, but they are part of
SQL.
>>> - The relational modell is never "correct" implemented. Without
>>> "holographic" storage media I see no way to implement "tables" where
>>> "row- and column order are insignificant". Defining relations as a
>>> "interface" views of storage objects leads to practical, but "pseudo"
>>> relational systems.
>>
>> You do not appear to be aware of the difference between logical and
>> physical models.
>>
> As a programmer I don't see the advantage of a "partial" logical view to > real life structures. A "mathematical" relation can be implemented as a > incomplete view on a more complex structure. A "computational" relation > has relation and tuple identities and a life before and after filled with > data. As long as I see no alternative working implementation I see no use > for such a limited "logical model"
The logical model is the set of relations and constraints that you
define to represent as much of a set of real life structures as you wish
to store and manipulate. The physical model is how your database
software organises the data for efficient storage and retrieval. As a
programmer, you deal with the logical model, it is the tables you write
SQL to manipulate. As a programmer, you don't care about the physical
model unless you think it needs to be modified for performance reasons.
The fact that present database products have a physical model which can
be, and therefore often is, an exact representation of the logical model
leads too many people to make unfortunate modifications to their logical
model when they should be thinking only about the physical model and how
it can map to the (unchanged) logical model.
It seems to me that you may be talking about something else here.
>> As far as I can see NoSQL systems don't have "system catalogs" because
>> the data in them is insufficiently organised to allow one to be
>> designed. However, whether or not this is a real problem for them has
>> got nothing to do with any other argument about their merits relative
>> to SQL or the relational model or anything else.
>>
> Wrong!
> It's no problem to create Your own "system catalogs". We have done this for
> a application when we integrate our regulation database into a workflow
> system. The real problem is the missing standard.
I believe you, For any specific case, yes, but it is the general case that is the problem, which is why is is at best difficult to create a standard.
>>> The "SQL vs noSQL debate" is fruitless, unless Codd's theory will
>>> finally
>>> be consolidated. Looking into Codd's paper of 1970 You will find:
>>>
>>> - Normalization and 1NF is a "workaround" to avoid "too complicated
>>> datastructures". A sound idea 1970 with Fortran, Assembler and
>>> PreAnsiC
>>> it was never revised
>>
>> Codd does not use the word "workaround". He says that "... more
>> complicated data structure is necessary for a relation with one or more
>> nonsimple domains" and so "the possibility of eliminating nonsimple
>> domains appears worth investigating!" and that "there is, in fact, a very
>> simple elimination procedure, which we shall call normalization." Not
>> a workaround, and nothing to do with how old-fashioned the languages
>> being used are.
>>
> Maybe I gave Codd to much credit for his reasoning. Based on my > experiences with these languages it seems to be reasonable to shift > complexity in a easier manageable area (tuple fractioning). Since > "more complicated data structures" became part of standard libraries, > I see no reason to cope with exploding numbers of tables when simpled > solutions exists.
I don't think it has anything to do with the languages or libraries,
but merely with the attempt to define easily usable and easily manageable
data structures. Having a normalised relational model of your data means
that every element of data is easily accessible both individually and
in context, easily updated (subject to constraints), easily compared
both with its peers and with suitable other elements, easily filtered
and easily aggregated or summarised (and even easily ignored).
>>> - Defining relations and tuples only via it's contents limits the
>>> "physical data independence" to "record oriented storages". "Column
>>> oriented storages" needs additional data for mapping (rowid)
>>
>> Once again, you do not appear to be aware of the difference between
>> logical and physical models. A relational implementation could quite easily
>> use column-wise storage instead of or as well as any other storage
>> structure without imposing any limitations on its faithfulness to the
>> relational model. It's just an implementation layer, not part of the
>> relational model!
>>
> Wrong!
> - Try column-wise storage on a relation with composite primary key
For efficiency reasons you might need key-wise storage (in effect a key
index) as well as or instead of pure column-wise storage, but that is
about designing the physical storage layer.
> - Try export a Access database table to a Excel worksheet and reimport
> the data and compare the tables.
What data property does this not preserve? But it doesn't really matter,
they are different tools for different purposes and it is no surprise
that a double translation would lose something unless the mechanisms had
been specifically designed not to.
>>
>> Relations and tuples do not have an object nature, they are not objects.
>> Tables do not have inheritance, they are relations which are not
>> objects. Missing data and identity are not problems anyway.
>
> "It's not a bug, it's a feature". Tables can have inheritance (at least
> in Postgres).
Missing data is of two kinds:
- It is not currently applicable to the real-world thing we are dealing
with (this kind is never a problem)
- Is is applicable and intended to be in the database, but for some
reason we currently do not have it for this particular real-world
thing. In this case we have to either disregard (and discard?) the
real-world thing, or we have to provide for storing the fact that
it is missing (and/or the reason for this). Problems arise here only
because database designers do not pay attention to the alternatives.
A real-world thing is either identifiable in the real world, or it isn't. In the latter case we either don't store anything about it, or we consider it as one of a counted collection of a type that is identifiable. Another possibility is that we assign it a system-created identifier and mark the real-world thing with the identifier (which may be impossible and/or undesirable).
If is is identifiable we have to have the complete set of identifying features in the database. This is a natural key, but it may be awkward to use, so we might need to use a system-created identifier which is mapped, in one relation, to the real-world identifying features. Again it might sometimes be appropriate to mark the real-world thing with the identifier.
>> The whole object thing is a methodology which can be very helpful to
>> programmers, if done correctly, but it is harder than most programmers
>> think to do correctly. It has nothing whatever to do with "Large Shared
>> Data Banks". The key word in that phrase is "Shared", a fact that most
>> proponents of "objects" do not seem to get at all.
>>
> You are partially right. Storing application objects in "object
> repositories" is indeed ridiculous.
Thankyou.
> Separating applications and databases is no argument against a object > view to databases.
Agreed, though many of the attempts at the object view are (deliberately or not) still trying to be object repositories.
> Though no Microsoft fan I believe the ADO framework is a step in the > right direction. ADO's "Datasets" are "relation objects".
I have insufficient knowledge to judge this, but it at least sounds plausible.
Eric
-- ms fnd in a lbryReceived on Sun Aug 25 2013 - 15:38:37 CEST