Re: database design method
Date: 18 Nov 2002 01:03:22 +0100
Lauri Pietarinen wrote:
>Jan Hidders wrote:
>> >So is the function of these identifiers to act as an "internal glue"?
>> In a sense, yes. Suppose you want to represent a nested relation R(a,B)
>> (B is the nested column) with the flat relations R1(a,c) and R2(c,b)
>> where in R1.c you use an identifier to replace the set in R.B, and in R2
>> you record what the elements of the sets. You could indeed say that the
>> identifiers in colomns c is what allows us to glue the two relations
>> together and reconstruct the nested relation R.
>OK. Why do they have to be abstract?
That's the wrong question. If their representation has no meaning for you, why do you want to see it? This is a logical data model we are talking about, it should model what is relevant for the user and nothing else. If you're puzzled by what I mean see how it works in formal logic and model theory, which are the roots of the relational model.
>> >What is the problem with generating such values? If they are just
>> >integer values (say) and there is a process inside the engine that gives
>> >the system the next integer? Or maybe generates a random value within a
>> >large space and just quickly checks if it is reserved?
>> For surrogate identifers the DBMS must somehow remember all identifiers
>> that have ever been used, also those that are no longer used in any
>> relation. It could do this by either storing the set of all identifiers
>> that were ever used (could be big) or remembering the highest integer
>> that it has every generated, but then it cannot reuse identifiers that
>> are no longer used.
>Why does it have to remember all identifiers ever been used if it want's
>to reuse them anyway?
No, they cannot be reused and that is why you have to remember them. This is because otherwise a user might come back after a while and ask if entity with id 12345 is still there and get the answer "yes" but in reality this entity was removed and after that a new identity was created the reused its identifier.
>On the other hand if it does not reuse the identifiers
>what's wrong with an integer counter (say 2**64).
If you are going to use identifiers to simulated nested values you will have to create new identifiers lots of times and then you might run out.
>And still: why do the identifiers have to be globally unique? Surely it
>suffices to have them unique within each relation?
That depends on what they represent and what the columns mean they are in.
>And how do "abstract identifiers" come to the resque? Surely the same
>pertains to them?
Since the user cannot see them, the system can reuse them freely.
>> Another problem is if you define views with new identifiers in them.
>> Should these be regenerated every time the view is accessed? What would
>> these identifiers then mean?
>Why not? What are they used for if they mean nothing?
Glue, as you called it, as in the example above with relation R and R1 and R2.
>> >And one more question: what would be a practical query in which
>> >these abstract identifiers would be needed.
>> Suppose I have a relation
>> Bought(customer, product)
>> and I want as the result a relation
>> Cust_class(class_id, customer)
>> that groups the customers into classes that have bought the same set of
>> prodcucts and gives them a new identifier.
> - what's wrong with having an ID that can be seen
Like I said, no reuse and it's bad data modelling. The concrete representation has no meaning so any operation that uses it should not be allowed which leaves only equality.
> - I would suppose that the id's could be generated
>each time the query was executed - what's the problem?
If it is executed a lot you may run out of them.
- Jan Hidders