Re: organizing tables?

From: Bob Badour <bbadour_at_golden.net>
Date: Tue, 15 Jul 2003 18:09:22 -0400
Message-ID: <wU%Qa.851$De1.142752828_at_mantis.golden.net>


"Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message news:bf0lje$1djo$1_at_gazette.almaden.ibm.com...
> "Bob Badour" <bbadour_at_golden.net> wrote in message
> news:otKQa.829$4P6.136561801_at_mantis.golden.net...
> > > Indeed. Why 'application' and 'user' names. Are these important?
> >
> > Yes, they are important. Role and application version are as well, but I
> > tend to think of versions as identifying different applications.
> >
> >
> > > Why one
> > > hierarchy, why not many. Why a hierarchy at all. All good questions.
> > > To tackle them logically, we would propably want some agreed logical
model
> > > of 'users', and maybe of 'applications'.
> >
> > I suggest using one or more candidate keys for namespaces where at least
one
> > candidate key is compound. One can drill down a compound candidate key
in
> > any order if one wants to pretend the namespace has a hierarchic order.
>
> Not sure I understand.
> Do you mean for example including COMPANY_ID as a column in the compound
> candiate key for every entity in the part of the database to be split by
> company??

No. I mean using a compound key as the logical identifier for a namespace. What does company_id have to do with identifying a namespace? Are you suggesting that multiple companies share databases, users etc.?

> > For the most part, I agree that users will generally want to temporally
> > separate their interactions with different databases. For the most part,
a
> > user accessing the marketing database wants to have confidence that
nothing
> > they do will directly change the accounting database. However, some
users
> > may have access to both marketing and accounting databases with each of
> > these databases defining only a subset of the users' view. And even
users
> > who generally want to access one and only one database may at times want
to
> > access both simultaneously (albeit very carefully).
>
> At some level it is just a matter of perception. But I would want a DBMS
to
> encourage (and allow) users to see the complete set of relvars that they
can
> see at any moment as a single database.

Since a database is simply part of the namespace, I do not know what the difference is.

> If they can see both marketing and
> accounting relvars, they should just see that as on (integrated) database.

I disagree. Users should see the databases as integrated even if they see multiple databases. The distinction is subtle but important.

> In
> particular everything about such an integrated database should work
exactly as
> a 'single' database works.

At the logical level, I agree to a certain extent. Having multiple databases provides one an opportunity to define binary and ternary etc. operations on databases, which then means a system with multiple databases works differently from a system with one database because one can use these operations.

> > Both marketing databases and accounting databases may have Customer
> > relations where there are subtle differences between them. Further, the
> > marketing database may have multiple Customer relations where a Customer
> > means one thing to a PromotionPlanning application and something subtly
> > different to the CRM or CallCenter application.
>
> Tell me about it. All nice messy data integration stuff. If two relvars
are
> both called Customer (or, alternatively, have exactly the same tuple
type),

Calling two different relvars Customer and having two relvars with the same type are entirely different things.

> then the integration layer must make them the same relation as far as
users
> are concerned.

No, I disagree. As far as users are concerned, they are different things. People understand that "book" means one thing at the library, something different at the racetrack, something else still at the police station etc.

> If the layer cannot do that because they don't (or should not)
> contain the same data, then the layer needs to rename at least one of
them.
> Marketing_Customers & Account_Customers say.

Since the relations are in different namespaces, this is unecessary.

> > The user name does define a subset of the database through the security
> > function. I suggest that users do think in terms of separate databases
like
> > the marketing database and the accounting database, and that databases
> > define subsets of the user's view.
>
> I suggest users only think in such terms because that is the way current
> systems work. It is not essential. Another way would be for users to see
the
> accounting part of the corporate database, or the marketing part and to
see
> that these are just some subsets of a single whole.

Current systems work that way because people want them to. There are no technical reasons preventing integration of marketing and accounting in a single database even now. Business reasons and psychological reasons but no technical reasons, per se.

> > > what are we really talking about here. Naming scope?
> [snip]
> > The question then becomes: What defines context? I can make plenty of
> > suggestions: database, application, version, role, user, schema, relvar.
>
> I guess we should define contexts independently, then link them to users,
> schemas, roles, versions, applications as required.

I see no reason for this extra level of indirection, but I have no objection to it either.

> A context is then just a named database subset (I suggest).

Why a name? One can achieve the extra level of indirection you desire just as meaningfully with a number.

> And a database subset is any subset of database relvars and tuples that is
> itself a valid database.
> E.g.
> 1) all constraints only reference relvars within the database subset

This is impossible in practice without destroying integrity. The dbms must enforce all constraints even those referencing unseen relations.

> 2) everything that is not constrained can be updated (i.e. no hidden
> constraints)

This is impossible in practice without destroying either integrity or security.

> 3) everything that is constrained cannot change value (i.e. no hidden
> updates).

I see no reason for this arbitrary restriction and ample reason for trashing it. For instance, I recall a restaurant who caught one of its managers stealing. The manager was reducing the amounts recorded for meals paid by cash and pocketing the difference from the till. The manager even deleted the audit trail entries; however, the system maintained another audit trail the manager knew nothing about and had no access to. While everything always balanced, the general manager eventually noticed the suspicious activity.

> > > If it is to do with naming scope, then schema design and evolution are
> > some
> > > other things that play a part here..
> >
> > I suppose one could propose support for versioned database views,
versioned
> > application views, versioned role views, versioned user views and
versioned
> > schema views, but I tend to think of versions applying mostly to
> > applications.
>
> I was thinking more of what happens to canned statements if the scope
changes
> in such a way as the attributes the statement ultimately reference change
even
> if the canned statement itself has not. Does it reference the original
> attribute or the one according to the new scope?

Since a canned statement applies to a single version of a single application, a different application or a different version of the application using a different set of relvars may have to use a different canned statement. Changing the canned statement is just one of many changes one makes to the application when creating the new version.

The views defining an application's view of the data will change as necessary so long as the database supports that version of the application. When the database no longer supports a particular version of a particular application, the database no longer needs any namespaces the application version defines. Received on Wed Jul 16 2003 - 00:09:22 CEST

Original text of this message