Re: database design method

From: Jan Hidders <hidders_at_REMOVE.THIS.uia.ua.ac.be>
Date: 18 Nov 2002 01:03:22 +0100
Message-ID: <3dd82e4a$1_at_news.uia.ac.be>


Lauri Pietarinen wrote:
>Jan Hidders wrote:
>
>> >So is the function of these identifiers to act as an "internal glue"?
>>
>> In a sense, yes. Suppose you want to represent a nested relation R(a,B)
>> (B is the nested column) with the flat relations R1(a,c) and R2(c,b)
>> where in R1.c you use an identifier to replace the set in R.B, and in R2
>> you record what the elements of the sets. You could indeed say that the
>> identifiers in colomns c is what allows us to glue the two relations
>> together and reconstruct the nested relation R.
>
>OK. Why do they have to be abstract?

That's the wrong question. If their representation has no meaning for you, why do you want to see it? This is a logical data model we are talking about, it should model what is relevant for the user and nothing else. If you're puzzled by what I mean see how it works in formal logic and model theory, which are the roots of the relational model.

>> >What is the problem with generating such values? If they are just
>> >integer values (say) and there is a process inside the engine that gives
>> >the system the next integer? Or maybe generates a random value within a
>> >large space and just quickly checks if it is reserved?
>>
>> For surrogate identifers the DBMS must somehow remember all identifiers
>> that have ever been used, also those that are no longer used in any
>> relation. It could do this by either storing the set of all identifiers
>> that were ever used (could be big) or remembering the highest integer
>> that it has every generated, but then it cannot reuse identifiers that
>> are no longer used.
>
>Why does it have to remember all identifiers ever been used if it want's
>to reuse them anyway?

No, they cannot be reused and that is why you have to remember them. This is because otherwise a user might come back after a while and ask if entity with id 12345 is still there and get the answer "yes" but in reality this entity was removed and after that a new identity was created the reused its identifier.

>On the other hand if it does not reuse the identifiers
>what's wrong with an integer counter (say 2**64).

If you are going to use identifiers to simulated nested values you will have to create new identifiers lots of times and then you might run out.

>And still: why do the identifiers have to be globally unique? Surely it
>suffices to have them unique within each relation?

That depends on what they represent and what the columns mean they are in.

>And how do "abstract identifiers" come to the resque? Surely the same
>pertains to them?

Since the user cannot see them, the system can reuse them freely.

>> Another problem is if you define views with new identifiers in them.
>> Should these be regenerated every time the view is accessed? What would
>> these identifiers then mean?
>
>Why not? What are they used for if they mean nothing?

Glue, as you called it, as in the example above with relation R and R1 and R2.

>> >And one more question: what would be a practical query in which
>> >these abstract identifiers would be needed.
>>
>> Suppose I have a relation
>>
>> Bought(customer, product)
>>
>> and I want as the result a relation
>>
>> Cust_class(class_id, customer)
>>
>> that groups the customers into classes that have bought the same set of
>> prodcucts and gives them a new identifier.
>
>OK, but
>
> - what's wrong with having an ID that can be seen

Like I said, no reuse and it's bad data modelling. The concrete representation has no meaning so any operation that uses it should not be allowed which leaves only equality.

> - I would suppose that the id's could be generated
>each time the query was executed - what's the problem?

If it is executed a lot you may run out of them.

  • Jan Hidders
Received on Mon Nov 18 2002 - 01:03:22 CET

Original text of this message