Re: Tarski school influence on Database Theory

From: Eric <eric_at_deptj.eu>
Date: Fri, 2 Oct 2015 21:52:58 +0200
Message-ID: <slrnn0to4q.3u6.eric_at_bruno.deptj.eu>


On 2015-09-29, vldm10 <vldm10_at_yahoo.com> wrote:

> On Tuesday, 29 September 2015 at 19:40:04 UTC+2, user Eric wrote:

>> On 2015-09-28, vldm10 <vldm10_at_yahoo.com> wrote:
>>> On Monday, 28 September 2015 at 09:40:04 UTC+2, user Eric wrote:
>>>> On 2015-09-25, vldm10 <vldm10_at_yahoo.com> wrote:
>>>>> On Monday, July 20, 2015 at 16:09:59 PM UTC-7, compdb <compdb_at_hotmail.com> wrote:
>>>>>> Besides inventing relational algebra, Codd also initiated and championed
>>>>>> query safety, integrity, normal forms and other issues ...

>> 8>< --------
>>>>> Integrity and normal forms. Regarding the normal forms, I must say that
>>>>> Codd did not invent the "First normal form." ...
>> 8>< --------
>>>>> ... records that have a fixed length (that is, they were working with the
>>>>> first normal form) ...
>> 8>< --------
>>>>> So the idea of "First normal form" was performed and analyzed in detail
>>>>> before Codd. All the advantages and disadvantages of "First Normal Form"
>>>>> were well analyzed in very complex cases. Note that variable length of
>>>>> records and entities, we can not apply to relations.
>>>>>
>>>>> It is not true that Codd invented the "First normal form". Codd added
>>>>> "First normal form" to relational model, and he gave the name: "The
>>>>> first normal form"
>>>>
>>>> Fixed length records can not possibly be the same as first normal form
>>>> since records are about files and first normal form is about relations.
>>>> However, I can not see at all how they are even in any way similar to
>>>> first normal form. So what on earth are you talking about?
>>>
>>> Have you ever worked with programming languages? If so, have you worked
>>> with complex data structures by using complex files?
>>
>> Yes. And yes. I stand by my first two sentences. So would you please
>> answer my question.
>>
>> Maybe I could amplify the question. What definitions of "first normal
>> form" and "fixed length records" are you using? I ask for the first
>> because the concept seems to be widely misunderstood, and it is as well
>> to be sure that we are talking about exactly the same thing. I ask for
>> the second because, other than the obvious "all the records always have
>> the same total length", there is no universal definition of the concept,
>> and many different ways of using something that conforms to the above
>> obvious definition.
>
> I think you are not well enough, understand this post. I did not write that 
> the file model is in some way similar to relational model.

So, having read the rest of what you say in this post, I now realise that what we have is a terminology problem.

Long ago and far away, when I first started to work with computers, "fixed length records" meant that every record in a file was N characters long, and was divided into M fields, each of which had a starting position and a length and a purpose. This is what I understood you to mean, and of course it provides no obvious way to deal with the multiple telephone number problem. What was done commonly was to say "there will never be more than 5 telephone numbers, so we provide 5 fields and an agreed non-numeric content for those not in use in any given record". The other common answer was to allow the file to contain different types of record, with the first field containing a record type, and with the "fixed" record length being the maximum of all the types.
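A minimal sketch of the "five telephone fields" layout just described, in Python. The field names, the widths, and the "-" filler are assumptions chosen for illustration; they do not come from any real file specification.

```python
# Hypothetical layout: every value here is an assumption made for this sketch.
NAME_LEN = 20
PHONE_LEN = 12
PHONE_SLOTS = 5
FILLER = "-" * PHONE_LEN  # agreed non-numeric content for an unused slot

# Each field is (name, starting position, length).
LAYOUT = [("name", 0, NAME_LEN)] + [
    (f"phone{i + 1}", NAME_LEN + i * PHONE_LEN, PHONE_LEN)
    for i in range(PHONE_SLOTS)
]
RECORD_LEN = NAME_LEN + PHONE_SLOTS * PHONE_LEN  # every record is this long


def build_record(name, phones):
    """Build one fixed-length record; a sixth number has nowhere to go."""
    if len(phones) > PHONE_SLOTS:
        raise ValueError("too many phone numbers for this layout")
    slots = [p.ljust(PHONE_LEN)[:PHONE_LEN] for p in phones]
    slots += [FILLER] * (PHONE_SLOTS - len(phones))
    return name.ljust(NAME_LEN)[:NAME_LEN] + "".join(slots)


def parse_record(record):
    """Cut one fixed-length record into its fields by position."""
    if len(record) != RECORD_LEN:
        raise ValueError("record has the wrong length")
    fields = {n: record[start:start + length] for n, start, length in LAYOUT}
    phones = [
        fields[f"phone{i + 1}"].strip()
        for i in range(PHONE_SLOTS)
        if fields[f"phone{i + 1}"] != FILLER
    ]
    return fields["name"].strip(), phones
```

The limit of five is baked into the layout itself, which is exactly the weakness of this approach.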

Thus you would have a type 1 record with the name etc., and type 2 records for phone numbers. After a type 1 record had been read, any type 2 records read before the next type 1 record "belonged" to the preceding type 1 record. Of course there might be more than two record types. This approach has a danger in that if a type 1 record proves to be corrupt and is discarded, its type 2 records would be seen as "belonging" to the previous type 1 record.
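The danger can be sketched in a few lines of Python. The record shape (a string whose first character is the record type) and the `is_corrupt` callback are assumptions made for illustration only.

```python
def group_by_position(records, is_corrupt=lambda rec: False):
    """Attach each type 2 record to the most recent surviving type 1 record.

    Each record is a string whose first character is the record type
    (an assumed convention for this sketch).
    """
    owners = []  # list of (type 1 payload, [type 2 payloads])
    for rec in records:
        if is_corrupt(rec):
            continue  # the corrupt record is simply discarded
        rec_type, payload = rec[0], rec[1:]
        if rec_type == "1":
            owners.append((payload, []))
        elif rec_type == "2" and owners:
            owners[-1][1].append(payload)
    return owners


records = ["1Ann", "2555-0100", "1Bob", "2555-0200"]
clean = group_by_position(records)
# Discard Bob's type 1 record and his number silently becomes Ann's.
damaged = group_by_position(records, is_corrupt=lambda rec: rec == "1Bob")
```

Nothing in the file itself reveals the misattribution, because ownership is carried only by the order of the records.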

To deal with this, I have seen a specification (for files to be sent to a central authority) change to include a unique identifier field in each type 1 record (generated by the creator of the file and unspecified except by its uniqueness within a file). The value of this field was then specified for a particular field in the type 2 records, linking them together with what would, in another context, be a key field. The particular change I refer to was made only a few years ago, so I'm not sure how clever of them it was to think of it.
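A hedged sketch of that linking scheme, in Python. The tuple shape `(record_type, identifier, payload)` and the handling of orphans are assumptions for illustration, not details of the specification mentioned above.

```python
def group_by_identifier(records, is_corrupt=lambda rec: False):
    """Link type 2 records to type 1 records by an explicit identifier.

    Assumed record shape for this sketch: (record_type, identifier, payload).
    """
    owners = {}   # identifier -> (type 1 payload, [type 2 payloads])
    orphans = []  # type 2 records whose type 1 record is missing
    survivors = [rec for rec in records if not is_corrupt(rec)]
    for rec_type, ident, payload in survivors:
        if rec_type == 1:
            owners[ident] = (payload, [])
    for rec_type, ident, payload in survivors:
        if rec_type == 2:
            if ident in owners:
                owners[ident][1].append(payload)
            else:
                orphans.append((ident, payload))
    return owners, orphans


records = [
    (1, "A", "Ann"), (2, "A", "555-0100"),
    (1, "B", "Bob"), (2, "B", "555-0200"),
]
owners, orphans = group_by_identifier(
    records, is_corrupt=lambda rec: rec == (1, "B", "Bob")
)
```

With the identifier in place, a lost type 1 record leaves detectable orphans instead of handing its phone numbers to a neighbour.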

A step beyond this could be to keep the record types, and the "key" fields, but to put each record type in its own file. I now think that this is what you were talking about, and it certainly suggests, here and now, a parallel with Codd's "normalization".
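As a rough illustration of that step, here is a Python sketch with each record type in its own "file" (plain lists stand in for files). The names `people_file`, `phones_file`, and `phones_for` are hypothetical, and this shows only the file-side half of the parallel.

```python
# One "file" per record type; the first field of each record is the key.
people_file = [("A", "Ann"), ("B", "Bob")]
phones_file = [("A", "555-0100"), ("A", "555-0101"), ("B", "555-0200")]


def phones_for(key, phones):
    """Collect the phone records whose key field matches."""
    return [number for owner, number in phones if owner == key]


# Record order no longer matters; the key field alone ties the two together.
directory = {name: phones_for(key, phones_file) for key, name in people_file}
```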

Staying in my context, a variable length record has M fields, some or all of which are variable length, so the whole record is variable length. Each field has an ordinal (position) and a purpose. If sub-fields are supported, they can be used to deal with the multiple telephone number issue. If not, a field could be introduced whose value is the number of immediately following fields which are telephone numbers. It's a bit awkward to talk about the fields after that, and the processing will be messy too. However, all the methods available for fixed length records are also available for variable length records, including the "normalization" one! So there's no real difference here.
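A minimal sketch of the count-field variant, in Python. The "|" separator and the field order (name, count, phone numbers, then everything else) are assumptions for illustration.

```python
def parse_variable_record(line, sep="|"):
    """Parse 'name|count|phone...|later fields' (an assumed layout)."""
    fields = line.split(sep)
    name = fields[0]
    count = int(fields[1])        # number of phone fields that follow
    phones = fields[2:2 + count]
    if len(phones) != count:
        raise ValueError("fewer phone fields than the count promises")
    rest = fields[2 + count:]     # these fields no longer have a fixed ordinal
    return name, phones, rest
```

The position of every field after the phone numbers depends on the count, which is the awkwardness referred to above.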

Sorry for the length of the above, but it is why I had no idea what you meant. Now that I do, I still do not actually agree. Codd was necessarily influenced by all sorts of things, but assuming that all he did was name something is not logical. The parallel seems obvious once you have seen both things, even though they are not totally identical, but it may not have seemed that way from the other end, and in any case independent discovery is pretty common.

BTW, using https://en.wikipedia.org/wiki/First_normal_form as a reference is not really a good idea; it is criticised far too widely.

Nor is using William Kent, though in any case he is trying to explain, not define. He says "Under first normal form, all occurrences of a record type must contain the same number of fields." which demonstrates one of the basic misunderstandings of the RM, because the RM has neither records nor fields, and to think of it in those terms is to open the way to many more misunderstandings. There is a useful analogy perhaps, but they are NOT the same thing.

Eric

-- 
ms fnd in a lbry
Received on Fri Oct 02 2015 - 21:52:58 CEST
