Re: Data Display & Modeling

From: Dawn M. Wolthuis <dwolt_at_tincat-group.com>
Date: Sat, 8 May 2004 09:38:21 -0500
Message-ID: <c7ird9$nsf$1_at_news.netins.net>


"Eric Kaun" <ekaun_at_yahoo.com> wrote in message news:Q4Smc.243$Nl2.8_at_newssvr16.news.prodigy.com...
> "Dawn M. Wolthuis" <dwolt_at_tincat-group.com> wrote in message
> news:c7e3hr$t0c$1_at_news.netins.net...
> > Let's say that someone likes to model data as functions [e.g.
> > PERSON(key)=tuple], in non-1NF and in di-graphs (so you can navigate your
> > way through it), with a target implementation in XML or PICK.
>
> Uh oh... segueing into implementation already, after only 1 sentence? :-)

Unfortunately it is necessary BECAUSE if you are using an RDBMS, it would be silly (in general) to consider ignoring 1NF, while with PRETTY MUCH ANY OTHER TARGET IMPLEMENTATION you don't need to restrict your data model in this way. Did that make sense?

> > I have heard
> > people say that is a good way to "view" the data, but we need to model it
> > using relational theory.
>
> To act as devil's advocate, why, given your intentions above? Every function
> is a relation, though the converse isn't true.

Yes, that's true. And if you can base your approach on a narrowed-down version of a relation, that simplifies things, right? That's a reason why relational theory doesn't work with all collections, but narrows that down to just relations. I just narrow it down further - simplicity, right?
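
A minimal Python sketch of the "every function is a relation, but not the converse" point may help here. It assumes a binary relation is represented as a set of (key, value) pairs; the helper name `is_function` and the sample data are hypothetical illustrations, not anything taken from the thread.

```python
def is_function(relation):
    """Return True if no key in the binary relation maps to two different values."""
    seen = {}
    for key, value in relation:
        if key in seen and seen[key] != value:
            return False
        seen[key] = value
    return True


# Hypothetical data in the PERSON(key) = tuple style: each key has exactly one tuple.
person = {(1, ("Ann", "Ames")), (2, ("Bob", "Boone"))}

# A relation that is not a function: key 1 is paired with two phone numbers.
phone = {(1, "555-0100"), (1, "555-0101"), (2, "555-0199")}

person_is_function = is_function(person)
phone_is_function = is_function(phone)
```

Under this representation, narrowing from relations to functions amounts to adding the single-value-per-key restriction checked above.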

> > Given:
> >
> > 1. that there is a good non-relational way to view the data, such as one
> > using di-graphs with functions on strings.
> >
> > 2. that relational theory provides us with a loss-less decomposition of the
> > data.
>
> Are you talking specifically about decomposing those tuple attribute values
> that aren't scalar? If not, then the above is simply a sort of definition of
> primary key.

In 2, I'm talking about relational theory's definition and the decomposition of non-scalars. I'm looking here at relational theory as providing one "view" of the data. If one can take a graph of data (however we define that) and fully normalize it using relational theory, then the statement is that we have not lost anything. If we haven't lost anything, then we can get back to our original, right?
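
The round trip described here can be sketched in Python under some stated assumptions: the nested (non-1NF) form is a dict of records with a multi-valued `phones` list, the order of that list is taken to be significant, and the names `normalize`, `denormalize`, and the sample data are all hypothetical.

```python
def normalize(people):
    """Split nested records into two flat (1NF-style) relations."""
    person = {(key, rec["name"]) for key, rec in people.items()}
    # The pos column records each number's place in the nested list,
    # so the original ordering survives the decomposition.
    phone = {
        (key, pos, number)
        for key, rec in people.items()
        for pos, number in enumerate(rec["phones"])
    }
    return person, phone


def denormalize(person, phone):
    """Rebuild the nested records from the two flat relations."""
    people = {key: {"name": name, "phones": []} for key, name in person}
    for key, pos, number in sorted(phone):  # sorted by key, then position
        people[key]["phones"].append(number)
    return people


people = {
    1: {"name": "Ann", "phones": ["555-0100", "555-0101"]},
    2: {"name": "Bob", "phones": []},
}
person, phone = normalize(people)
round_trip = denormalize(person, phone)
```

In this sketch nothing is lost only because the position is carried along explicitly; dropping the `pos` column would still give a valid set of tuples but would not reproduce the original list order.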

> > 3. that we can mark-up the di-graph with information to permit automatic
> > normalization of the data, so that we could switch between these two
> > "pictures" of the data (O-R mappings, for example)
>
> What sort of markup? I'm unclear what you mean here.

This is the reverse process. Let's say that we have a relational model and turn it into something else (a graph, for example) -- if there is any concern about losing information from the relational model, then I just want to be covered and say that nothing is lost -- we just "mark up" the new model in some way so that there is no loss in either direction.

> > Then, these two models are just two pictures of the same data and one could
> > pop back and forth between them.
>
> The pop can only be 2-way if you assume the functional stance above as the
> basis, which is of course more limited than the stance of relations...

and relations more limited than sets, which is more limited than collections. Yes, that is the purpose of suggesting any sort of information needed for one model be present in the other too.

> > Then the software/database developer could view the data model either way,
> > and, therefore, never have to bother with relational notation at all.
> >
> > 4. Now figure that relational theory is not related to the physical storage
> > of the data.
> >
> > Then there is no reason to even add the notations to switch between the
> > di-graph and relational provided that everything the system needs to store
> > the data is provided in the di-graph "picture" -- there just needs to be a
> > mapping between the di-graph and the physical model, which can be optimized
> > for machine processes (therefore, likely NOT relational!)
>
> a. optimization is easier under relational theory than under other theories
> (and pseudo-theories); SQL != relational. Then again, since relational isn't
> physical, not sure why this statement is important or meaningful

yup -- not relevant (in theory ;-)

> b. "Everything the system needs to store the data" isn't the point - the
> point is (or was anyway) a good logical structure

Yes, but that is what we differ on

> > Therefore, I see no reason for that R part in any aspect of the system,
>
> But you seem to want to deliberately exclude it, based on preference. If
> you're hinting that relational isn't needed to store data, you're right -
> all that's needed is a physical medium.

I realize it is not, in theory, needed for physical storage and I'm suggesting it is also not needed for viewing the data.

> > other than (recent) tradition and the fact that a ton of data has been
> > (unnecessarily)
>
> But usefully.
>
> > placed in 1NF. If the application of relational processes
>
> Normalization, I assume?

Also, narrowing down operations to those on sets (and types).

> > were really loss-less, then the data could be viewed the way I like to see
> > it (but, alas, while non-1NF databases have no problem showing themselves as
> > if they were relational, the reverse is often not the case).
>
> ?
>
> a. A relational scheme in 5NF/6NF can be viewed in many ways. Each of those
> views hides something, though the scalar values can be manipulated in such a
> way that they can be used to reconstitute the relations. One advantage of
> relations is that you can generate arbitrary hierarchies / views / reports /
> etc., and without a lot of work. Egalitarian, which admittedly makes an
> app-specific view more difficult (at least some of the time). If all you
> have is 1 output function, then anything will do.

So, if one were able to specify all information needed for the computer to generate a relational model within a particular hierarchical view of the data, then there would be no need to view the data as tables if you didn't want to, right? And if the computer doesn't need that relational model for storage either, then there might never be a need to view the data as tables. Date says (p.62 Intro-DB) "relational systems require only that the database be perceived by the user as tables". My point is that we don't need to view the data as (1NF by the old def) tables to get the job done and done well.

> b. What relational "version" of data has trouble showing itself as a non-1NF
> database? You mean the ordering discussed elsewhere, or is there something
> else?

I could be wrong, but from what I have seen, there are more than a few applications that use a SQL-RDBMS and don't specify parent-child constraints so that data is lost going from non-1NF databases to 1NF structures. Is that just my uncommon experience or are there a lot of implementations in SQL-RDBMS's where the parent-child constraints are in the application only and not in the database?
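
To make "parent-child constraints in the database" concrete, here is a small sketch using Python's standard `sqlite3` module. The `person` and `phone` tables and their columns are hypothetical, and SQLite stands in for SQL-RDBMSs generally, which is an assumption; other products declare and enforce such constraints with their own syntax and defaults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE phone (
        person_id INTEGER NOT NULL REFERENCES person(id) ON DELETE CASCADE,
        pos       INTEGER NOT NULL,  -- position within the parent's list
        number    TEXT NOT NULL,
        PRIMARY KEY (person_id, pos)
    )
    """
)
conn.execute("INSERT INTO person VALUES (1, 'Ann')")
conn.execute("INSERT INTO phone VALUES (1, 0, '555-0100')")

# A child row with no parent is rejected by the database itself,
# rather than relying on application code to check.
try:
    conn.execute("INSERT INTO phone VALUES (99, 0, '555-0199')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

If the `REFERENCES` clause (or the pragma) is left out, the same orphan insert succeeds, and the parent-child link exists only in whatever the application chooses to enforce.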

> I would argue [again] that the attribute violating 1NF (e.g. the list
> in a tuple) must be exceedingly simple (e.g. a list of textual items with
> no additional structure) for that "logical" 1NF expression not to have
> negative consequences.

There is a trade-off. I have seen no designs ever that have NO negative consequences.

> > I'd like to see diagramming tools for data modeling to specify data in its
> > more conceptually simple format of propositions as functions that need not
> > be in 1NF, showing foreign key links and such -- a web of data.
>
> A web (whatever it means) is simpler? In any event, I've heard relational
> schemas described as webs, and as a complaint against it. Hmmm.

Yes, "web", "graph", "hierarchy", and "network" are all bad words to relational theorists. However, the "web" as in WWW is not just a fad, it seems, as people have quite taken to that model of text and data. So, even if not based on "relational theory", it seems to resonate with the human mind.

> > I have an appreciation for the relational model set theory and I think 2nd,
> > 3rd, and 5th normal forms are useful (when not defined in terms of 1NF), but
> > I have very little use for the relational model outside of that
>
> This is very confusing, since above you claimed to want to not use
> relational notation at all. Relational notation is orthogonal to
> normalization.

Right -- I want to keep some of the normalization and ditch the relational notation altogether. I reserve the right to change my mind on that as I learn more, however.

> > -- certainly
> > not for any human to and from computer nor computer to computer
> > communication purposes. And my appreciation for SQL is almost completely
> > related to the fact that it has given us some industry standards that are
> > used extensively. It is a language begging for retirement.
>
> Agreed.
>
> > I'm not sure that there will always be a need for the "R" in O-R mappings
> > and that would certainly save both human and computer processing cycles.
> > You can figure I'm just ignorant (if that helps you in some way), but I have
> > not yet SEEN enough of what the R gives the IT profession to justify it and
> > it sure has cost us a bunch.
>
> One could just as easily state that incomplete adherence to R has cost us a
> bunch. It's difficult to analyze costs of a model when the limitations of
> implementations of that model (which SQL at one point claimed to be) have
> obvious and well-cited downsides.

Yes, I think that it is logical to take one or the other position -- either that we have been strapped in moving forward because of attempting R at all or because we didn't implement it perfectly. I'm angling on the former and you the latter, perhaps? In either case, we agree that our discipline is being unnecessarily held back by the focus on current SQL-DBMS implementations.

Cheers! --dawn

Received on Sat May 08 2004 - 16:38:21 CEST
