Re: Design question

From: Mark S. (UK) <"Mark>
Date: Sat, 24 Jan 2009 19:01:20 +0000
Message-ID: <497B6580.2030808_at_nospamhere.com>


Bob Badour wrote:

> Mark S. (UK) wrote:

>> Mark S. (UK) wrote:
>>
>>> Hi,
>>> I'm currently working on a system that consists of several sites.
>>> Each site runs a system with its own local database, and then
>>> periodically each site updates the "master" site with its data, allow
>>> the data to be viewed centrally.
>>>
>>> It must be done like this because each site may not always be on the
>>> network, but the data must always be available from the central server.
>>>
>>> To enable the master database to contain all the site database
>>> information, I use a composite key, with the first part being the
>>> site identification, and the second part being the row
>>> identification. This allows user with row id 5 for instance to exist
>>> more than once because paired with the site id it is still unique.
>>>
>>> On top of this, there is another layer. Each site must be contained
>>> within an "Enterprise", which is just another identifier. So, I have
>>> a table called enterprise which contains the list of site ids.
>>>
>>> So far then, I have this (obviously cut down to only show the
>>> relevant columns) : -
>>>
>>> table: enterprise
>>> column: id
>>>
>>> table: site
>>> column: id
>>>
>>> table: user
>>> column: site_id
>>> column: id
>>> column: label
>>> etc..
>>>
>>> With the site_id and id making the composite key.
>>>
>>> Now, I've just had to add the concept of "enterprise" level users.
>>> Initially the only clean way I could think of doing this is having
>>> two separate user tables; enterprise_user and site_user : -
>>>
>>> table: enterprise_user
>>> column[key]: enterprise_id
>>> column[key]: id
>>> column: label
>>> etc..
>>>
>>> table: site_user
>>> column[key]: site_id
>>> column[key]: id
>>> column: label
>>> etc..
>>>
>>> However this does not seem very "normalized"..
>>>
>>> Other ways I considered are : -
>>>
>>> table: user
>>> column[key]: id
>>> column[key]: enterprise_id
>>> column: site_id
>>> column: label
>>> etc..
>>>
>>> But this does not make sense as an enterprise user is not associated
>>> with any site. I could have a dummy site, but I really do not like
>>> that. I could just set site_id to NULL, but I'm not a big fan of
>>> NULL in database and it just seems a bit hacky.
>>>
>>> So then I thought of adding a link table:
>>>
>>> table: user
>>> column[key]: id
>>> column[key]: enterprise_id
>>> etc..
>>>
>>> table: user_site <== link table
>>> column: enterprise_id
>>> column: user_id
>>> column: site_id
>>> etc..
>>>
>>> with a unique constraint on enterprise_id, user_id (the composite key
>>> that identifies a user in an enterprise).
>>>
>>> I think that also works.. basically with no entry in the "user_site"
>>> table, they are enterprise users. An entry in user_site effectively
>>> limits them to being a site user.
>>>
>>> However.. this might create complications in the "master" update
>>> process.
>>>
>>> Part of the batch update process involves updating user information.
>>> Enterprise users (only editable at the master) must be propagated
>>> down to each site, so that the site is "aware" of enterprise users.
>>>
>>> Site users (only editable at the site) must be propagated up to the
>>> master, purely so all the (read only) data at the master is complete.
>>>
>>> So I'm wondering if having the two tables separate will be just much
>>> simpler (although not being terribly efficient in terms of pretty
>>> much repeating a table structure).
>>>
>>> I hope all that makes sense!
>>>
>>> Thanks,
>>>
>>> Mark.
>>
>>
>> After some further thought, I've realised there is another problem:
>> user name (or label as I called it above) uniqueness.
>>
>> A user name at a site may be re-used, as the site is pretty much a
>> self contained system. So a user "fred" can exist at site 1, and at
>> site 2. Enterprise users however must remain unique.
>>
>> So.. site users are unique to sites, and enterprise users are unique
>> to the enterprise (which is the "container" for sites).
>>
>> This brings up a problem with the normalized design:
>>
>> table: user
>> column[key]: enterprise_id
>> column[key]: id
>> column: label
>>
>> add constraint unique on user(enterprise_id, label)
>>
>> table: user_site <== link table
>> column: enterprise_id
>> column: user_id
>> column: site_id
>>
>> This is fine for keeping users unique at the enterprise level, but
>> doesn't work out for site users when they've been propagated back to
>> the "master" database.
>>
>> So, currently I'm thinking two separate tables. I'm not repeating
>> data, I'm just repeating some elements of a table, so perhaps it isn't
>> that bad?
>>
>> Having read these posts back, I'm not sure I've explained the scenario
>> very clearly. I hope it makes some sense at least.
>>
>> Mark.
> 
> Mark,
> 
> I don't think theory-based answers to your questions exist, and I also 
> find design via usenet counterproductive at best.
> 
> You are struggling with a number of issues including distribution and 
> partial fault tolerance. When needed, these are very important 
> considerations.
> 
> As a general principle, it is best to handle your distribution and fault 
> tolerance needs at the physical level of discourse without exposing any 
> of that in your logical data model. You are trying to handle it entirely 
> at the logical level, which I think is probably inappropriate.
> 
> Pragmatically, it is best to consider these issues at the earliest 
> stages and to use your distribution and fault tolerance requirements as 
> a basis for comparing dbms products and then choosing the dbms product 
> that best addresses your distribution needs. Once you have chosen a dbms 
> product, which it sounds like you have already done, then it is best to 
> consider the specific features that dbms product has for handling 
> distribution, replication, fault tolerance et cetera physically with the 
> least impact to your logical design.
> 
> Many products now have features for distribution or replication, for 
> assigning globally unique numeric identifiers, for reconciling off-line 
> changes and more. Perhaps even the product you use has such features.
> 
> The issue you have with user id's is a basic requirements and design 
> issue. Assuming you are using user id in the conventional way, the 
> purpose of a user id is to uniquely identify users of a system. 
> Pragmatically, the scope of such identity has to be an entire 
> independent system. Generally, that means globally throughout a system. 
> In your case, that may not be true. Individual sites, though, are not 
> independent systems because they are tied together into enterprises. It 
> may be your enterprises are entirely independent from one another. Or it 
> may be they have interdependencies too.
> 
> Sometimes, it makes sense to scope user ids to have partial 
> independence, which is what email addresses do. The entire globally 
> unique id is the full email address with a user name, the _at_ symbol, and 
> a domain, while the locally unique id is simply the user name. Doing so 
> simplifies some things while complicating others.
> 
> In any case, I think you need to decide the scope of your user ids based 
> on what users will expect and what meets the needs of the system you are 
> designing. You may have to trade off conflicts between them.
> 
> In the end, at this point in time, design is more art than science.

Thank you for your well thought out and detailed answer Bob; it makes a lot of sense and you've given me a lot to think about.

Take care,

Mark.

--
Received on Sat Jan 24 2009 - 20:01:20 CET

Original text of this message