Re: Data Display & Modeling

From: Eric Kaun <ekaun_at_yahoo.com>
Date: Wed, 12 May 2004 15:02:38 GMT
Message-ID: <i0roc.344$uL5.343_at_newssvr33.news.prodigy.com>


"Dawn M. Wolthuis" <dwolt_at_tincat-group.com> wrote in message news:c7ird9$nsf$1_at_news.netins.net...
> "Eric Kaun" <ekaun_at_yahoo.com> wrote in message
> news:Q4Smc.243$Nl2.8_at_newssvr16.news.prodigy.com...
> > "Dawn M. Wolthuis" <dwolt_at_tincat-group.com> wrote in message
> > news:c7e3hr$t0c$1_at_news.netins.net...
> > > Let's say that someone likes to model data as functions [e.g.
> > > PERSON(key)=tuple], in non-1NF and in di-graphs (so you can navigate
> your
> > > way through it), with a target implementation in XML or PICK.
> >
> > Uh oh... segueing into implementation already, after only 1 sentence?
:-)
>
> Unfortunately it is necessary BECAUSE if you are using an RDBMS, it would
be
> silly (in general) to consider ignoring 1NF, while with PRETTY MUCH ANY
> OTHER TARGET IMPLEMENTATION you don't need to restrict your data model in
> this way. Did that make sense?

Sure.

> And if you can base your approach on a narrowed down
> version of a relation, that simplifies things right? That's a reason why
> relational theory doesn't work with all collections, but narrows that down
> to just relations. I just narrow it down further - simplicity, right?

Yes, definitely where applicable. There is one slight rub - any given relation of order N can be decomposed into N-K relations, where K is the number of attributes in a given candidate key. You would also need integrity constraints, lest a conceptual null be introduced (e.g. we used to have attributes ID, NAME, and BIRTHDAY, but decomposed into {ID,NAME} and {ID, BIRTHDAY}, and them someone inserts a tuple into the first but not the second).

Is that simpler? The parts are, individually. The sum isn't.

Your functions have what domain and range? If the domain of your function is the type of the key, then what is the type of the range?

> > > Then, these two models are just two pictures of the same data and one
> > could
> > > pop back and forth between them.
> >
> > The pop can only be 2-way if you assume the functional stance above as
the
> > basis, which is of course more limited than the stance of relations...

>

> and relations more limited than sets, which is more limited than
> collections. Yes, that is the purpose of suggesting any sort of
information
> needed for one model be present in the other too.

I'm interested in where constraints fit into all this popping.

> > > Then the software/database developer could view the data model either
> way,
> > > and, therefore, never have to bother with relational notation at all.

So it's the notation that's the problem? I'm not sure what's being gained - individual preference?

> > > Therefore, I see no reason for that R part in any aspect of the
system,
> >
> > But you seem to want to deliberately exclude it, based on preference. If
> > you're hinting that relational isn't needed to store data, you're
right -
> > all that's needed is a physical medium.

>

> I realize it is not, in theory, needed for physical storage and I'm
> suggesting it is also not needed for viewing the data.

Agreed, although "needed" is strong - relations, especially with relation-valued attributes, can be very useful for reporting (a slight generalization of viewing).

> > a. A relational scheme in 5NF/6NF can be viewed in many ways. Each of
> those
> > views hides something, though the scalar values can be manipulated in
such
> a
> > way that they can be used to reconstitute the relations.

I just realized that this is a little tricky - you're assuming that those scalar values are of a type rich enough to reconstruct the base relations. Relation-valued attributes do that, because of the richness of relations, but for other types, you have to be very careful. If knowledge goes into accumulating data into a string, then that same knowledge is needed to go the other way, and those aren't necessarily lossless, unlike some of the other transformations you've discussed.

> So, if one were able to specify all information needed for the computer to
> generate a relational model within a particular hierarchical view of the
> data, then there would be no need to view the data as tables if you didn't
> want to, right?

True, but much hinges on what you mean by "view" and "you". Do you mean the developer's "view" of the data? Presumably they're using the "core" agreed-upon definition of the data, which I of course argue is best done relationally.

> And if the computer doesn't need that relational model for
> storage either, then there might never be a need to view the data as
tables.

True, I think. Relational fits in the middle, in the logical realm.

> Date says (p.62 Intro-DB) "relational systems require only that the
database
> be perceived by the user as tables". My point is that we don't need to
view
> the data as (1NF by the old def) tables to get the job done and done well.

How do you feel about RVAs? I don't recall you directly addressing them.

> I could be wrong, but from what I have seen, there are more than a few
> applications that use a SQL-RDBMS and don't specify parent-child
constraints
> so that data is lost going from non-1NF databases to 1NF structures. Is
> that just my uncommon experience or are there a lot of implementations in
> SQL-RDBMS's where the parent-child constraints are in the application only
> and not in the database?

There are many of them, and they're just plain wrong. It's not hard to set up those constraints, but the desire for speed at any cost, plus the feeble implementations, pushes customers of MySQL and its ilk to do the wrong thing (before they've even tried the right thing). At the very least, such DBMSs (if you can call them that) should at least offer the ability to define constraints, run your app logic, and after adequate testing you determine that a) your code works, and b) that you absolutely need more speed and can't get it any other way; THEN let you drop the constraints for a raw-speed dont-give-a-damn-about-your-production-data mechanism (like the default MySQL tables). Instead, people start with the wrong way and just stick to that path.

> > I would argue [again] that the attribute violating 1NF (e.g. the list
> > in a tuple) must be exceedingly simple (e.g. a list of textutal items
> which
> > no additional structure) for that "logical" 1NF expression not to have
> > negative consequences.

>

> There is a trade-off. I have seen no designs ever that have NO negative
> consequences.

OK.

> > A web (whatever it means) is simpler? In any event, I've heard
relational
> > schemas described as webs, and as a complaint against it. Hmmm.
>
> Yes, "web" "graph" "hierarchy" and "network" are all bad words to
relational
> theorists. However, the "web" as in WWW is not just a fad, it seems as
> people have quite taken to that model of text and data. So, even if not
> based on "relational theory" it seems to resonate with the human mind.

Keep in mind that popular opinion, while important for selling products, is a bad basis for technical decisions. It may be right, but so may a stopped clock. The above aren't bad words, nor are they useless in computing - they're fads in that they're frequently adopted with feeble and unfounded statements about the inapplicability of relational. If someone finds their car uncomfortable and sells it to buy a new one that looks pretty, without having adjusted the old car's seat for their physical dimensions, then while they didn't necessarily make the right choice, you could argue that their decision-making process was ad hoc and lacked sufficient information.

"Resonating with the human mind." Well, this is a somewhat motherhood-and-apple-pie argument, but not entirely false. However, is this resonance due to the usefulness of the Web, or its value as a data structure? I'd argue the former - the fact that it's a "web" is considerably less critical than that fact that people can find and post information easily, whereas before it was difficult. That doesn't mean it's optimal. It's simply that it's easy, and requires no categorization. Thus content-related queries across the web are very difficult, and require major advances (e.g. Google's massive computing engine) to do well. Furthermore, most business systems don't require the scope that the Web does.

> Right -- I want to keep some of the normalization and ditch the relational
> notation altogether. I reserve the right to change my mind on that as I
> learn more, however.

Which notation? Date's Tutorial D isn't the only one possible. It's an example.

> > One could just as easily state that incomplete adherence to R has cost
us
> a
> > bunch. It's difficult to analyze costs of a model when the limitations
of
> > implementations of that model (which SQL at one point claimed to be)
have
> > obvious and well-cited downsides.

>
> Yes, I think that it is logical to take one or the other position --
either
> that we have been strapped in moving forward because of attempting R at
all
> or because we didn't implement it perfectly. I'm angling on the former
> and you the latter, perhaps?

I'd say that's right, although I'd gladly ditch "perfectly" if I could even get "mostly".

> In either case, we agree that our discipline
> is being unnecessarily held back by the focus on current SQL-DBMS
> implementations.

Agreed completely!

  • erk
Received on Wed May 12 2004 - 17:02:38 CEST

Original text of this message