Re: 0.99999998 (was: Unknown SQL)
Date: Sat, 21 Jul 2001 23:29:22 GMT
Message-ID: <7epkhtgm9jrpliu2j21bqsmflbbohocdh5_at_4ax.com>
On Sun, 3 Jun 2001 12:09:22 -0400, "Bob Badour" <bbadour_at_golden.net> wrote:
>>Say if two banks, A and B, merge, then if a customer of bank A is
>>called 'Mrs Robinson' and a customer of bank B is called 'Mrs
>>Robinson', then although the strings match textually, there is no
>>guarantee that they are one and the same person. You cannot just
>>import the data.
>
>Well, actually you can just import the data. Since each Mrs. Robinson has
>some form of customer ID at one of the two banks and since the import
>process must translate these customer ID's into non-overlapping sets in the
>merged result, one can just import the data. Your initial assumption that
>banks will identify customers by honorific and surname is absurd.
>
I have seen this done (and I had to clear it up!). In this instance it was admittedly a bit different from the above.
It was a reporting task, reporting from several systems, each outputting different tables (outside our control). These tables were without ID's, the person names were sufficiently few to be assumed to be unique. The system was written using simple joins on person name (the key in each respective table), until someone finally realised that there were plenty of typo's, case mismatches, and spelling mistakes. At this point I had to dissect the joins (the queries by then were quite big) and use lookup tables.
He had, as many beginners would, assumed a higher notion of equality than was supported. He assumed that 'Mr Fred Smith' = 'Mr Frederick Smith'. And they ARE equal on a higher level.
Incidentally, the person who wrote the initial queries was a newcomer to SQL with little experience (I notice that people have been claiming that SQL is easy to use for newcomers). I am not particularly biassed to OO or relational, but I must admit that in the case of an OO world with enforced unique identifiers for each object, the initial programmer would have been forced to think twice before assuming that the names would match, instead of the GUI (OK, Access) deciding it for him based on the primary keys.
This comes down to the debate about the use of unique identifiers.
- An OODBMS can enforce them (many do implicitly).
- The relational model can support them but it is up to the user to implement them (as you highlight, in most cases the user will do). However, they are not guaranteed.