Re: Design question

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sat, 24 Jan 2009 14:14:38 -0400
Message-ID: <497b5a93$0$5487$9a566e8b_at_news.aliant.net>


Mark S. (UK) wrote:

> Mark S. (UK) wrote:
> 

>> Hi,
>> I'm currently working on a system that consists of several sites.
>> Each site runs a system with its own local database, and then
>> periodically each site updates the "master" site with its data, allow
>> the data to be viewed centrally.
>>
>> It must be done like this because each site may not always be on the
>> network, but the data must always be available from the central server.
>>
>> To enable the master database to contain all the site database
>> information, I use a composite key, with the first part being the site
>> identification, and the second part being the row identification.
>> This allows user with row id 5 for instance to exist more than once
>> because paired with the site id it is still unique.
>>
>> On top of this, there is another layer. Each site must be contained
>> within an "Enterprise", which is just another identifier. So, I have
>> a table called enterprise which contains the list of site ids.
>>
>> So far then, I have this (obviously cut down to only show the relevant
>> columns) : -
>>
>> table: enterprise
>> column: id
>>
>> table: site
>> column: id
>>
>> table: user
>> column: site_id
>> column: id
>> column: label
>> etc..
>>
>> With the site_id and id making the composite key.
>>
>> Now, I've just had to add the concept of "enterprise" level users.
>> Initially the only clean way I could think of doing this is having two
>> separate user tables; enterprise_user and site_user : -
>>
>> table: enterprise_user
>> column[key]: enterprise_id
>> column[key]: id
>> column: label
>> etc..
>>
>> table: site_user
>> column[key]: site_id
>> column[key]: id
>> column: label
>> etc..
>>
>> However this does not seem very "normalized"..
>>
>> Other ways I considered are : -
>>
>> table: user
>> column[key]: id
>> column[key]: enterprise_id
>> column: site_id
>> column: label
>> etc..
>>
>> But this does not make sense as an enterprise user is not associated
>> with any site. I could have a dummy site, but I really do not like
>> that. I could just set site_id to NULL, but I'm not a big fan of NULL
>> in database and it just seems a bit hacky.
>>
>> So then I thought of adding a link table:
>>
>> table: user
>> column[key]: id
>> column[key]: enterprise_id
>> etc..
>>
>> table: user_site <== link table
>> column: enterprise_id
>> column: user_id
>> column: site_id
>> etc..
>>
>> with a unique constraint on enterprise_id, user_id (the composite key
>> that identifies a user in an enterprise).
>>
>> I think that also works.. basically with no entry in the "user_site"
>> table, they are enterprise users. An entry in user_site effectively
>> limits them to being a site user.
>>
>> However.. this might create complications in the "master" update process.
>>
>> Part of the batch update process involves updating user information.
>> Enterprise users (only editable at the master) must be propagated down
>> to each site, so that the site is "aware" of enterprise users.
>>
>> Site users (only editable at the site) must be propagated up to the
>> master, purely so all the (read only) data at the master is complete.
>>
>> So I'm wondering if having the two tables separate will be just much
>> simpler (although not being terribly efficient in terms of pretty much
>> repeating a table structure).
>>
>> I hope all that makes sense!
>>
>> Thanks,
>>
>> Mark.
> 
> 
> After some further thought, I've realised there is another problem: user 
>  name (or label as I called it above) uniqueness.
> 
> A user name at a site may be re-used, as the site is pretty much a self 
> contained system.  So a user "fred" can exist at site 1, and at site 2. 
>  Enterprise users however must remain unique.
> 
> So.. site users are unique to sites, and enterprise users are unique to 
> the enterprise (which is the "container" for sites).
> 
> This brings up a problem with the normalized design:
> 
> table: user
> column[key]: enterprise_id
> column[key]: id
> column: label
> 
> add constraint unique on user(enterprise_id, label)
> 
> table: user_site <== link table
> column: enterprise_id
> column: user_id
> column: site_id
> 
> This is fine for keeping users unique at the enterprise level, but 
> doesn't work out for site users when they've been propagated back to the 
> "master" database.
> 
> So, currently I'm thinking two separate tables.  I'm not repeating data, 
> I'm just repeating some elements of a table, so perhaps it isn't that bad?
> 
> Having read these posts back, I'm not sure I've explained the scenario 
> very clearly.  I hope it makes some sense at least.
> 
> Mark.

Mark,

I don't think theory-based answers to your questions exist, and I also find design via usenet counterproductive at best.

You are struggling with a number of issues including distribution and partial fault tolerance. When needed, these are very important considerations.

As a general principle, it is best to handle your distribution and fault tolerance needs at the physical level of discourse without exposing any of that in your logical data model. You are trying to handle it entirely at the logical level, which I think is probably inappropriate.

Pragmatically, it is best to consider these issues at the earliest stages and to use your distribution and fault tolerance requirements as a basis for comparing dbms products and then choosing the dbms product that best addresses your distribution needs. Once you have chosen a dbms product, which it sounds like you have already done, then it is best to consider the specific features that dbms product has for handling distribution, replication, fault tolerance et cetera physically with the least impact to your logical design.

Many products now have features for distribution or replication, for assigning globally unique numeric identifiers, for reconciling off-line changes and more. Perhaps even the product you use has such features.

The issue you have with user id's is a basic requirements and design issue. Assuming you are using user id in the conventional way, the purpose of a user id is to uniquely identify users of a system. Pragmatically, the scope of such identity has to be an entire independent system. Generally, that means globally throughout a system. In your case, that may not be true. Individual sites, though, are not independent systems because they are tied together into enterprises. It may be your enterprises are entirely independent from one another. Or it may be they have interdependencies too.

Sometimes, it makes sense to scope user ids to have partial independence, which is what email addresses do. The entire globally unique id is the full email address with a user name, the _at_ symbol, and a domain, while the locally unique id is simply the user name. Doing so simplifies some things while complicating others.

In any case, I think you need to decide the scope of your user ids based on what users will expect and what meets the needs of the system you are designing. You may have to trade off conflicts between them.

In the end, at this point in time, design is more art than science. Received on Sat Jan 24 2009 - 19:14:38 CET

Original text of this message