Re: Lucid statement of the MV vs RM position?

From: dawn <dawnwolthuis_at_gmail.com>
Date: 2 May 2006 14:12:20 -0700
Message-ID: <1146604340.933641.127960_at_g10g2000cwb.googlegroups.com>


Jan Hidders wrote:
> dawn wrote:
> >
> > OK, so you told me the jury was out regarding modeling and implementing
> > with non-1NF data (or something like that);
>
> Not exactly. What the jury is out on is whether you can still have both
> efficiency and data-independence at the same time, which requires
> powerful query optimization.

Ah, that data independence thing again. There are plenty of requirements and I definitely do not want changes in the need for additional disk or any physical changes like that to prompt a change in software, but I still don't grok this requirement fully. Can this non-functional (theoretical?) requirement of "data independence" be written as a series of functional requirements?

For example, do we want the requirement that if data are moved from schema-A on host-A managed by subsidiary-A to schema-B on host-B managed by subsidiary-B, then must not be a need to change the logical data model used by the applications, so that applications can run without changes simply by redirecting (outside of the apps) requests for such data to another data source? Obviously, that would be nice. Each such requirement has an associated cost.

I'd like to take the overall functional requirements for a database management system (which are not the same for every organization, I will grant) and optimize all together rather than declaring a single non-functional requirement as fixed in stone, while users might not get what they need. Obviously, we want to have maintainability, reliability, security, and all other non-functional requirements met, but these need to be turned into functional requirements, it seems, before they can be tested. So what tests does a DBMS need to meet in order to be branded as having "data independence"? Are all the functional requirements in that category of equal importantance?

> > then I told you that there
> > are production systems out there, if the jury really wanted to know;
>
> The jury knows. :-)

But you cannot point me to what the jury has found out, right?

> > then you said they have already gathered data; then I tried to ask
> > "where is it?"; and you said "I don't know any, nor do I expect
> > something like that to exist."
>
> ... in a published form that you can Google for. But that data is there
> in the sense that you can talk to experienced developers and DBMS
> implementors.

OK, I'm talkin' to you, Jan. How is XQuery performance against a Cache' (MUMPS), U2 or D3/TigerLogic (PICK), or Viper implementation going to compare to SQL queries on an SQL-DBMS if the first are NF2 and the latter are normalized?

> > So, I think we talked past each other on that one. If RVAs and other
> > non-1NF structures are now accepted in theory and implemented in
> > practice, then when should they be used?
>
> When they bring you closer to your goals, such as for example
> flexibility and efficiency. Whether that is the case or not may depend
> upon the DBMS you are using.

Yes, in practice it will matter, but since we teach database theory in higher ed, what should we teach about use of NF2 other than "it will depend on your DBMS"? I would suggest that it is a bad idea to use RVAs for strong entities, for example, while often a good idea to model multivalued entity properties as RVAs or LVAs. Are logical data models about performance? (Of course in practice they are, but what about in theory?) If so, then we would need to teach logical data modeling for a particular DBMS.

> > If we are going to (re-!)introduce nested structures, we should have a
> > good idea when a logical data model should include them, or at least
> > know some best practices. If a particular DBMS is not up to the task
> > of working well with performance or ease of queries with the nested
> > structures, then perhaps the actual implementation needs to normalize
> > (1NF). But I'm not seeing nested structures yet in logical data models
> > (not that I'm seeing all LDMs).
>
> To add some confusion; there actually is such a thing as *the* logical
> data model, called LDM. Google for "logical data model kuper vardi".
> But I digress.

Which I just did too by googling as suggested. A poor choice of names, but I like di-graph models and I think I've glanced at this before, but I'll add to the reading list.

> But what do you mean with "I'm not seeing nested structures". Are we
> talking about conceptual data models?

No, in conceptual data models you will have multivalues. There are many defs for these terms, and if you consider the logical data model to be database-dependent, then it is the implementation model, but that is not quite how I'm thinking of it.

If I use the terms as described below, then what are some best practices for the LDM?

LDM: logical organization for the data (not physical), like a conceptual data model it is useful in a database independent way, used by software developers who are writing software to be db-independent

IDM: implementation data model, this is the implementation of the logical model optimized for a particular dbms. So, if one dbms does well with a particular data approach and another does not, the CRUD services written by software develoeprs for our database-independence and used by application software developers will have the proper translations for the specific supported data stores.

I recognize that a lot of SQL-DBMS's are not optimized for implementation data models with RVAs, but there could still be best practices with LDMs employing such -- what is a good theory of when to model data as a nested set, list, or bag?

> About database schema designs in
> practice? Proposed data modelling notations? Or what?

LDMs in practice still (from what I have seen) exclude RVAs and what I've read either ignores RVAs or suggests they rarely be employed in a model, in some extreme cases. The industry seems to be headed toward a return to nested structures (with XML and RVAs, for example), so what would be some best practices for when an LDM (as described above) should include them? Is the answer based on Oracle or SQL Server performance or can we recognize that there are some tools where nested structures work (we could choose Cache' for example) and start learning how to write NF2-ready LDMs. We can then let other DBMS's that want to play join in when they are NF2-capable.

I just not finding anything about NF2 best practices, although I can think of several. Do you know of such write-ups? Thanks. --dawn Received on Tue May 02 2006 - 23:12:20 CEST

Original text of this message