Re: Database design, Keys and some other things
Date: 5 Oct 2005 11:47:49 -0700
Message-ID: <1128538069.353968.27060_at_g44g2000cwa.googlegroups.com>
David Cressey wrote:
> "vldm10" <vldm10_at_yahoo.com> wrote in message
> > This story about "external" and "internal" is interesting. I
> > would like to add some practical thoughts related to my solution, to
> > clarify it. I connected "internal" and "external" using
> > following identifiers:
> > 1) The identifier of the state of an entity.
> > 2) The identifier of the entity.
>
> This connection works in your particular case. But not all internal
> identifiers (surrogate keys) are associated
> with a state of an entity. Here's an example:
>
> Alpha Airlines merges with Beta Airlines to from a new entity, Gamma
> Airlines. The employees win the right to carry over their employment
> record, and benefits that go with time of service. It's desired to create a
> data warehouse with the combined employment history of both sets of
> employees.
>
> Alpha Airlines has identified its employees by employee_id starting at 1 and
> assigned sequentially.
> Beta Airlines has identified its employees by employee_id starting at 1 and
> assigned sequentially.
>
> Clearly, employee_id is not going to work as the key in the DW. So we
> invent a surrogate for DW purposes, and use copies of the surrogate as
> foreign keys elsewhere in the ware house. We have a couple of columns,
> ALPHA_EMPLOYEE_ID and BETA_EMPLOYEE_ID in the employee table, and for every
> employee that came over with the merger, one of these columns is filled in,
> and the other one is missing.
>
> This is a different reason to go with surrogates than the one that pertains
> to the case you raised.
>
> Your surrogate really stands as a surrogate for a compound key: the key of
> the entity, plus the key of a time span that is bounded by two state
> transitions of the entity in question. So it really doesn't act as a
> surrogate for only the entit
It seems that your example is in fact the problem which Codd defined. I will show how I would solve this. I will use my data model www.dbdesign10.com 2.4 under 2.
Let E-Relation for the Alpha Company be:
aEmpKey aEmpID KnowledgeA
Pa1 ssn_a1 ka1
...
Pan ssn_ai kan
- Where Pai is Key (as I defined). All values are different in the EmpKey column.
- ssn is person's social security number. Value in EmpID can be repeated.
- Kai represent knowledge columns. These columns depend on the Designer's solution. They can have Dates, Operators, StationID, Status - which define that this data are from the Alpha company, etc.
You can have additional attributes for an employee (K-Relations). But
they are not important here.
S-Relation also is not added so that we can concentrate on the problem.
Let E-Relation for Beta Company be:
bEmpKey bEmpID KnowledgeB
Pb1 ssn_b1 kb1
...
Pbm ssn_bj Kbm
The conditions for this relation are similar as for the Alfa Company.
Now Solution for the Gamma Company is following E-Relation :
gEmpKey gEmpID KnowledgeB
Pg1 ssn_a1 kg1
...
Pgn ssn_ai kgn
...
Pgm+n ssn_bj kgm+n
This will support the employees which were working for both companies. Using the knowledge columns and the sequences of data I can get good and reasonable information, about wrong data. These columns have here the practical role. They can show which rows we don't need, which data are redundancy which keys are not appropriate. Columns "Key" can be created very sophisticated using the Constructor.
("By the Constructor" basically I mean to do write or insert on organized way. I don't have delete or update. It is possible to do update with "new", but this is worse solution. So the transactions in both constructors are caused by an "insert" new information in the database)
It will be good idea to organize the knowledge columns so that we know which row comes from the Alpha (or Beta) company. Keys Pgi are generated as a new key. If you need exactly relationship between old keys and new keys you can make following relationship:
RelKey gEmpkey OldKey KnowledgeR
P1 Pg1 Pa1 Kr1
...
Pn Pgn Pan Krn
...
Pn+m Pgm+n Pam Krm
The knowledge columns depend on your choice. (A solution without RelKey column can be accetible in almost every case)
Similar solution is possible using 2.4 under 1. That is implementation
of the
relation Employe in the form which is denoted as ...(2) in my text.
Above solution is done so that the solution can be immediate applied.
Later we can refine the structures and the data.
However I don't think that this problem (to merge two databases by
"click") is realistic.
In the real life, for the real complex databases this will be the new
very hard project. I think that just the transfer of the data from one
database to another database or the transfer of the data from one
hardware environment to another is very serious job. Just to mention a
verification of the transferred data. For simple applications it is
possible to do merge quickly.
Vladimir odrljin Received on Wed Oct 05 2005 - 20:47:49 CEST