Re: First Impressions on Using Alphora's Dataphor
Date: 30 Aug 2004 14:48:31 -0700
> I have recently downloaded Alphora's Dataphor product and begun
> experimenting with it.
First of all, let me thank you. Many people speculate about our product, but few actually give it more than a superficial glance. It is refreshing for us to be able to discuss reasonable and even valid criticisms.
> [...] I did not use any
> external storage device (that is to say, a SQL database back-end). I
> only experimented with the 'in-process' server. (Although I plan to
> connect to an Oracle instance later to see how it performs.)
Please allow me to correct the terminology here. "In-process" refers to the ability to host the Dataphor server within the instance of the client or development environment. I believe what you meant was that you were using one of the built-in storage devices: either the "Memory Device" or the "Simple Device", for in-memory or simple file-dump storage respectively. Whether or not the Dataphor Server is running "in-process", it can connect to any number of underlying data stores. No big deal... just wanted to clarify.
> ... - Since the DBMS now can be extended with user defined types (UDTs) along
> with their operators it follows that types and operators become
> database objects themselves. This means that they must be 'CREATE'-ed,
> later, possibly, 'ALTER'-ed, and eventually, 'DROP'-ed, just like
> tables (I use the Dataphor terminology here), views, and constraints.
> In addition, the strongly-typed nature of Dataphor means that each
> database object, in any reasonably large system, will have several
> dependencies and several dependents. Types can depend on other types
> and operators for their definition; operators depend on types and can
> depend on tables and views; tables and views depend on types and
> operators; and finally, constraints can depend on everything mentioned
> so far. The end result is a complex web of dependencies between
> database objects where you cannot simply re-define anything without
> re-defining its dependents at the same time. Simply put, the
> create/drop cycle became the most annoying part of the development.
> For example, when I realized that I have to change an operator I had to
> drop all the constraints and operators that used the one being changed.
> The situation was even uglier when I realized that I had to change one
> of the scalar types. In this case I practically had to drop every
> database object and then re-create them again.
Let me start by saying that this is indeed a valid observation; it is something a Dataphor developer must deal with. I'll add, in fact, that it is relatively easy to get into a tangled web of dependencies with other types of database objects as well. For example, imagine attempting to drop a column in a table that is referenced by an operator that is used in a default of another column of another table. This really is an issue in all compiled systems, though it manifests itself particularly in database systems because the unit of compilation is more granular than that of a traditional compiler.
Dataphor mostly leaves management of dependencies to the developer,
but it does provide some services for this management:
-Dataphor has several operators for querying the dependency graph (for example, for the dependencies and dependents of a given object).
-Dataphor has operators for generating drop and re-creation scripts for objects, libraries, or even the entire catalog, based on the dependency graph.
-Dataphor exposes these various operators interactively in the development environment (Dataphoria).
-Great effort went into the error reporting system in Dataphor. Dependency-related errors identify exactly which dependencies are problematic.
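To make the dependency-graph idea concrete, here is an illustrative sketch (in Python, not D4, and not Dataphor's actual operators; the object names are invented): given a graph of which schema objects depend on which, a valid re-creation script must create prerequisites first, and a drop script must drop dependents first.

```python
# Illustrative sketch: ordering drops and re-creations from a
# dependency graph of schema objects (hypothetical names).
from graphlib import TopologicalSorter

# Each key depends on each object in its value list.
depends_on = {
    "CheckAge": ["Person", "iLess"],  # constraint uses a table and an operator
    "Person":   ["AgeType"],          # table uses a scalar type
    "iLess":    ["AgeType"],          # operator is defined on the type
    "AgeType":  [],
}

# static_order() yields prerequisites before their dependents,
# which is exactly the order in which objects must be created.
create_order = list(TopologicalSorter(depends_on).static_order())
drop_order = list(reversed(create_order))  # dependents dropped first

print(create_order)
print(drop_order)
```

The same graph answers "what depends on X?" queries and drives script generation, which is the service Dataphor's dependency operators provide over the real catalog.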
Though dependency management in Dataphor is mostly a manual process at present, the problem is not inherently manual. We have ideas for managing and automating complexities surrounding not only dependency management, but other schema modification problems as well. Future versions of Dataphor will address these issues.
One more note regarding this issue: though managing a complex network of dependencies may seem difficult, development in Dataphor is not generally done this way in practice (at least we do not do it this way) because of its "federated" architecture. Because the Dataphor schema is decoupled from the underlying data store, if more than a few changes need to be applied, the schema script is modified and the library is re-registered. This way there is no schema-alteration concern for changes that do not affect the schema of the underlying storage device. Changes that do affect the underlying storage system are managed through alterations using Dataphor's upgrade subsystem.
> ... - Although it is logical after giving it a little thought, it came as a
> nasty surprise that Dataphor demands that all scalar types that might
> appear in a context where duplicate elimination is required must have
> the less-than (Dataphor calls it 'iLess') operator defined on them.
> (Probably any of the operators that provide ordering would do: <, >,
> >=, or <=).
The operators iEqual and either iLess or iGreater must be provided, unless iCompare is provided. iCompare returns 0, -1, or 1 based on a comparison of its two operands. If iCompare is defined, Dataphor will use it in place of any of the individual comparison operators that are not defined. BTW, there is also a symbol (?=) for the iCompare operator, owing to its usefulness.
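The idea behind iCompare can be sketched in a few lines of Python (not D4): a single three-way comparison returning -1, 0, or 1 is enough to derive every individual comparison operator, which is why Dataphor can fall back on it.

```python
# A three-way comparison in the role of iCompare (?=): returns -1, 0, or 1.
def compare(a, b):
    return (a > b) - (a < b)

# Each individual operator is derivable from the three-way result.
def less(a, b):     return compare(a, b) < 0    # plays the role of iLess
def greater(a, b):  return compare(a, b) > 0    # plays the role of iGreater
def equal(a, b):    return compare(a, b) == 0   # plays the role of iEqual

assert less(1, 2) and greater(2, 1) and equal(3, 3)
```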
> It was a nasty surprise because, at first, I could not
> figure out why Dataphor rejected an otherwise perfectly formulated
> query, protesting that 'operator iLess' was not defined for one of the
> scalar types.
This is admittedly a vague error. We should address this.
> ... I understand that without ordering defined on a
> scalar type it is very inefficient to remove duplicates but I did not
> feel like defining an arbitrary comparison operator just for this
As was mentioned in another post in this thread, there are mechanisms for duplicate elimination that are at least as fast as sorting (e.g. hashing) but Dataphor does not currently employ them. I would note that although Dataphor's compiler detected the possible need for this operator, the operator would probably not have actually been needed if the storage device were one of the more sophisticated ones. If, for example, the execution plan at the point of the query in question were bound to one of the "big three" DBMSs, that DBMS would decide how to eliminate those duplicates (Dataphor would ensure that the emitted SQL statement instructs that they be removed).
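The sort-based versus hash-based trade-off mentioned above can be sketched as follows (illustrative Python, not Dataphor's internals): sort-based elimination needs an ordering on the row type (hence iLess), while hash-based elimination needs only equality and a hash.

```python
rows = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]

# Sort-based: order the rows, then drop adjacent duplicates.
# Requires '<' (an ordering) on the row type.
def distinct_by_sort(rows):
    out = []
    for r in sorted(rows):
        if not out or out[-1] != r:
            out.append(r)
    return out

# Hash-based: requires only equality and a hash; no ordering,
# and the original input order is preserved.
def distinct_by_hash(rows):
    seen, out = set(), []
    for r in rows:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out

print(distinct_by_sort(rows))  # [('a', 1), ('b', 2), ('c', 3)]
print(distinct_by_hash(rows))  # [('a', 1), ('b', 2), ('c', 3)]
```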
> I ended up 'projecting away' the troublesome attribute since
> I did not need it in the final result anyway. This solved my problem
> but it showed that physical implementation does leak through the
> logical model.
Guilty as charged. Dataphor should only require that equality be defined; there are critical logical reasons for that requirement. I would note that there are, however, legitimate logical reasons for introducing the notion of "ordinal" types into a system; the problem, of course, is requiring them for operations that do not logically need them.
> Maybe Dataphor should provide some kind of internally
> implemented default ordering for the purposes of duplicate elimination.
We have considered this, but have kept it at a low priority because most data types can easily be construed to have some sort of ordering (though a somewhat artificial one at times).
> ... - The help system is a little bit out of sync with what is implemented at
> points and I found the number of examples provided too few. This is
> especially true of the D4 Language Guide. The sample applications
> bundled with the evaluation version do not really show the capabilities
> of the system either. In my opinion, developers might benefit more
> from a "feature fest" type demo application.
Yes, this is all true. Fortunately this is only a weakness of the beta and will not be the case in the final version 2 release.
> How can one do software maintenance in a live Dataphor installation?
I alluded to some of the answers above and I will skip the details, but we have found from experience that it is not too difficult.
> Since scalar types form the building blocks of the database design,
> almost every database object will depend on them, either directly or
> indirectly. ... You cannot just drop the type
> since everything depends on it. Also, what are you going to do with the
> data in your relvars? You cannot just drop your production tables.
As was mentioned, there are ways of changing the schema (including completely dropping and re-creating it) without affecting the stored data.
> What are you going to do with older applications that depend on (the now
> obsolete) old types/operators/tables/views.
This can be accomplished reasonably using storage device reconciliation modes, and Dataphor's upgrade mechanism.
> Some suggested creating
> views for the old applications based on the new tables. I do not think
> that this approach is viable in the long run. It would mean duplicating
> the affected parts every time there is a bug fix. Also, the added
> complexity of maintaining 'parallel universes' for old and new programs
> is not something that one would want.
I agree... not a viable solution.
> ... What if major vendors 'do get it' but enterprise environments simply
> 'squeeze out' anything but atomic types (numbers, character strings,
> booleans, etc.) from the database?
> Quite often, in enterprise environments, development teams rarely have
> access and/or control over applications written by other teams. Given
> the tangled web of dependencies in a 'UDT-enabled' RDBMS, achieving
> success suddenly becomes a factor of how well-designed those UDTs are in
> the first place, and if they need modification, who has the right (and
> the resources) to perform those modifications. I think this is the make
> or break question of the real life success of truly relational databases
> right along with the questions of performance (which I do not want to
> address right now.)
There are many levels being addressed here, so let me start by suggesting that your ideas seem to be presented from the perspective of the current state of the art, whereas a system like Dataphor may completely change the dynamics of the situation. For example, the reasoning that led to today's typical development approach includes the dangerously insidious notion that "data" in a database system is somehow different from "data" in a "programming environment".* The point is that the entire application development process revolves around current thinking regarding the "layers" of an application and their composition. Dataphor goes (for better or worse) against the grain in this regard by its very nature: that of a "middle (application) tier" that behaves as a database system.
Setting this aside, let's imagine that Dataphor were treated as the "database tier" in today's paradigm, and that an additional "middle tier" were being built against it.** In this scenario, I see managing change of UDTs as not presenting much (if any) more challenge to the "application tier" than do table changes. Note that UDTs (as well as system provided types) are translated between the logical layer and the external one, so an external user is really just concerned with the values on that side of the translation. If the system allows for control of this translation within the logical realm (as Dataphor does), then it could even be argued that UDTs provide additional indirection to the external level.
*In fact, even the notion of a TRDBMS as being some type of utopian ideal fuels this thinking. Regardless of its "conformity", a DBMS can only enable declarative development (the *real* goal) if its services (such as logical data independence and relation data types and operators) are exposed and even brought into each of the application layers.
**This scenario seems pretty backwards considering that Dataphor is designed to replace and eliminate the need for such an imperative middle tier, but this seems to be the perspective from which you pose the issue.
> Most development projects try to have control over
> as many factors of their success as possible. (That is why everybody is
> building several layers of abstractions in order to 'insulate' themselves
> from the ever-changing environment.)
May I point out that ironically this is largely due to the lack of visibility with regard to dependencies? ;-)
> Average workplaces have average
> developers on average. This means that it is rarely the case that
> developers can work on a database that is well-designed. Even in
> database design there are a lot of ways to screw up and I do not think
> that designing UDTs is easier business in any way.
A few points:
-UDTs certainly can make things easier. If the problem domain makes use of coordinates, intervals, varying representations/units of measure, or whatever, then having to encode these within overly simplistic types makes things more difficult and complex.
-Dataphor makes building trivial UDTs very easy (e.g. create type MyID like String).
-Dataphor also makes using UDTs extremely easy. There are no differences between user-defined types and system-provided types in Dataphor; the system-provided types are built in the same way that UDTs are, so using values of UDTs is exactly the same as using values of the system-provided types.
-Because UDTs have the same characteristics as system-provided types (indeed, we hardly ever refer to them as UDTs; they are simply scalar types), whether or not a given development effort takes advantage of them is largely an issue of comfort level.
-UDTs do not have to be used.
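A rough analogue of a trivial "like" type such as `create type MyID like String`, sketched in Python rather than D4 (the name MyID is just the example from above): a distinct named type that borrows the base type's representation and operators, so its values are used exactly like the base type's values.

```python
# Python analogue of `create type MyID like String` (illustrative only):
# a distinct type that inherits String's representation, comparisons,
# and hashing, so no operators need to be written for it.
class MyID(str):
    """A scalar type 'like String'."""
    pass

a, b = MyID("c042"), MyID("c100")
assert a < b                  # ordering inherited from str
assert a == "c042"            # same value semantics as the base type
assert isinstance(a, MyID) and isinstance(a, str)
```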
> So in a UDT-rich
> world developers would be forced to work with someone else's not-so-well
> designed UDTs most of the time. Not to mention that fixing a badly
> designed mess can be very time-consuming, especially if it is not the job
> one were hired to do in the first place.
Again this is assuming the too common "DBAs with their sacred tome" approach to business. As Leandro stated, UDTs are hardly the problem in such an environment.
> Even fixing serious database
> design flaws in today's UDT-free databases often meets with severe
> resistance because of the (not completely unsubstantiated) fear of
> breaking other applications. Is it not better then to forget about UDTs
> in the database and only allow them in programming languages where
> developers have complete control over them?
Again, that is assuming that the "developers" and the "database architects" are different people. I don't think this should be the case, especially if we aim for a more declarative approach to development.
Begin flaming here ;-)
> Is it not the case that
> the lack of UDTs in today's databases is more of a blessing than a curse?
A strong but subjective No! Speaking as an application developer who has had the pleasure of using them for a couple of years: you would have to pry me away from them. ;-)
> Sorry for the long post, but I felt that this might be an interesting
> topic to discuss in this news group.
--
Nathan Allan
Alphora

Received on Mon Aug 30 2004 - 23:48:31 CEST