Re: Testing Various Data Models?

From: Dawn M. Wolthuis <dwolt_at_tincat-group.com>
Date: Wed, 10 Mar 2004 08:33:25 -0600
Message-ID: <c2n8vo$4h9$1_at_news.netins.net>


"Dan" <guntermannxxx_at_verizon.com> wrote in message news:mfz3c.82605$6K.21638_at_nwrddc02.gnilink.net...
>
> "Dawn M. Wolthuis" <dwolt_at_tincat-group.com> wrote in message
> news:c2l1dq$2q3$1_at_news.netins.net...
> > If we were to find a sponsor for a competition where the winner would be
> > a named data model and the primary criteria would be related to the cost
> > for an institution choosing to use that data model (including costs if
> > there is a lack of quality or breach of security or difficulty in
> > maintaining it, ...), what should that competition include?
> >
> Five things I would like to see included, at the very least:

Thank you, thank you for actually addressing the question, Dan!

> 1. Costs and resources incurred on applications as a result of changes to
> internal database organization, in turn due to evolving business models and
> requirements. An empirical measurement of reengineering costs for both the
> database and any and all affected applications - for both small and large
> changes. This would include the costs of adding new applications that share
> the same data with different access requirements and have competing or
> ancillary data quality constraints.

If we had an initial set of requirements, a single change to those requirements prior to the implementation and three changes after the initial setup, would that be sufficient for such a competition? One of the post-implementation requirements changes would be the one related to your last statement.
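
To make that concrete, one of the post-implementation changes could be something like "a customer may now have many phone numbers instead of one." In SQL terms it might look like the following -- just a sketch, with made-up table and column names; each team would express the change in its own model:

    -- Before: one phone number embedded in the customer row
    --   customer (cust_id, name, phone)

    -- After: phone numbers split out into a child table
    CREATE TABLE customer_phone (
        cust_id   INT NOT NULL REFERENCES customer (cust_id),
        phone     VARCHAR(20) NOT NULL,
        PRIMARY KEY (cust_id, phone)
    );
    ALTER TABLE customer DROP COLUMN phone;

Part of what we would measure is how many application statements such as SELECT name, phone FROM customer have to be found and rewritten as joins, and at what cost.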

> 2. Cost comparisons in providing direct ad-hoc access to the data model
> for users, including whether the model can be queried with or without
> intervention and high-cost assistance from programmers. An empirical
> analysis of the cost of being able to ask any relevant question over
> varying levels of complexity.

After the initial implementations, an independent group would then run ad hoc queries of various types, and we would determine the skillsets required to complete each query. However, any queries that require training of the end users, or where experience really counts, might need some pre-trained users?
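
As a strawman, the sort of mid-complexity ad hoc question I have in mind might be the following, shown in SQL against a hypothetical order-entry schema (all names invented):

    -- "Which products did Iowa customers order last quarter,
    --  and how many of each?"
    SELECT p.product_name, SUM(ol.quantity) AS total_ordered
    FROM product p
        JOIN order_line ol ON ol.product_id = p.product_id
        JOIN orders o ON o.order_id = ol.order_id
        JOIN customer c ON c.cust_id = o.cust_id
    WHERE c.state = 'IA'
      AND o.order_date >= DATE '2004-01-01'
      AND o.order_date <  DATE '2004-04-01'
    GROUP BY p.product_name;

We could then record, for each implementation, what level of skill and how much time it takes to pose the equivalent question.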

> 3. Costs associated with integrating data structures across remote systems,
> including the presentation of integrated views across distributed model
> elements that are not necessarily semantically, structurally, or
> intensionally analogous. Costs could be tied to the ease of creating a
> distributed system view (i.e., view creation versus hard-coded integration
> middleware).

With this one it is difficult to determine what to measure. We could view the entire collection of implementations as a distributed set of data, but then if implementation A is better (by whatever measures we include) at aggregating data across all the disparate data sources, what does that say about data model A, or even database implementation A? What if database A is not involved in the solution any more than database B? It would say that team A's overall solution "won" this part of the competition, but it might say nothing about data model A, right?
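
To make Dan's "view creation versus hard-coded integration middleware" contrast concrete, here is the kind of thing I picture on the view side -- a hypothetical sketch with invented schema names, assuming the DBMS can address remote tables at all:

    -- Integrate order data from two remote sites behind one view
    CREATE VIEW all_orders AS
        SELECT order_id, cust_id, order_date FROM site_east.orders
        UNION ALL
        SELECT order_id, cust_id, order_date FROM site_west.orders;

The alternative is custom middleware that connects to each source, fetches, and merges the data in application code. The competition could compare the cost of getting an integrated view by each route.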

> 4. Capabilities and costs associated with defining, maintaining, and
> presenting information from model elements based on ternary keys and
> higher, returned along with information associated with data elements
> from referenced data structures (three if ternary, etc.).

These are quite implementation-specific criteria, but if all n implementations of a specific model "lose" to all implementations of another model, then I guess it does relate to the usefulness of the model, as performance in general does.
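
For the relational entries, I picture criterion 4 exercising something like the following -- hypothetical names again -- where the identifying key has three parts and a query must pull attributes back from all three referenced structures:

    -- An enrollment is identified by a ternary key
    CREATE TABLE enrollment (
        student_id  INT NOT NULL REFERENCES student (student_id),
        course_id   INT NOT NULL REFERENCES course (course_id),
        term_id     INT NOT NULL REFERENCES term (term_id),
        grade       CHAR(2),
        PRIMARY KEY (student_id, course_id, term_id)
    );

Each team would have to show what defining, maintaining, and presenting such data costs in its model.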

> 5. Costs and resources (and damage) associated with the ability or lack of
> ability of the model to maintain data integrity, especially in cases where
> redundancy exists or cannot be avoided because of the intricacies or
> limitations of the model itself.

If the solution developed by any team yields a lack of data integrity according to the constraints indicated in the requirements, that would surely detract from that team's score. It is quite likely, however, that all teams will end up developing rather tight solutions in this regard, don't you think?
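
Concretely, the requirements could spell out integrity rules that every team must enforce however its model permits. In SQL terms they might read like this (hypothetical tables again):

    -- Every order must belong to an existing customer
    ALTER TABLE orders
        ADD CONSTRAINT fk_orders_customer
        FOREIGN KEY (cust_id) REFERENCES customer (cust_id);

    -- Quantities must be positive
    ALTER TABLE order_line
        ADD CONSTRAINT chk_positive_qty CHECK (quantity > 0);

Any implementation that lets test data violating these rules be committed would lose points.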

Would a "pet store" application be sufficient as a basis for the competition, or is there something more rigorous that would yield a more convincingly serious competition?

Perhaps having 5 databases that are considered to be RDBMSs and 5 that are not based on the relational model would be a good start? For the RDBMSs: Oracle, DB2, SQL Server, and which others -- Informix, Sybase, MySQL? For the non-relational side, we could have a MUMPS implementation (Cache), a PICK implementation such as UniVerse, an OO implementation (which one?), an XML database such as Berkeley DB XML (or another?), an implementation tied to a particular vertical market such as that of metadata.com (?), an implementation like the e4graph just announced here, a hierarchical database such as IMS, and others?

> Smiles.

Yup. --dawn
