First Impressions on Using Alphora's Dataphor
Date: 26 Aug 2004 10:10:56 -0700
Message-ID: <1c92edeb.0408260910.60242a59_at_posting.google.com>
I have recently downloaded Alphora's Dataphor product and begun experimenting with it. I tried to write a simple (non-GUI) toy application based on the principles put forward in The Third Manifesto (TTM) and in the articles published on the Database Debunkings [http://www.dbdebunk.com] site. I defined scalar types and scalar operators, and I spent some time designing a normalized database schema along with integrity constraints. I did not use any external storage device (that is to say, a SQL database back-end); I only experimented with the 'in-process' server. (Although I plan to connect to an Oracle instance later to see how it performs.)
Good things
- Expressing relational algebra (relalg) queries in D4 was much easier than it used to be in SQL. The resulting queries were usually clearer, significantly more compact, and, of course, seamlessly integrated with the rest of the language. It is important to mention that several queries that would have been too cumbersome to bother with in SQL became expressible in D4, quite often easily so. I think this is mainly due to the orthogonal design of the language, which allows arbitrary nesting of relalg expressions. And since D4 is truly relational, removing duplicates at each point of the evaluation, one can rest assured that the results will be relations in the mathematical sense.
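The "duplicates removed at each point of the evaluation" property can be modelled in a few lines of plain Python (this is a conceptual sketch, not D4; the relation, attribute names, and helper functions are all invented for illustration). Because a relation is a set, projecting it eliminates duplicate rows as a matter of course:

```python
# Toy model: a relation is a Python set of rows, a row is a frozenset
# of (attribute, value) pairs. Set semantics give duplicate elimination
# for free at every step, just as in a truly relational language.

def restrict(relation, predicate):
    """Relational restriction (think D4's 'where'): keep matching rows."""
    return {row for row in relation if predicate(row)}

def project(relation, attrs):
    """Relational projection: keep only the named attributes.
    Building a set removes duplicate rows automatically."""
    return {tuple((a, dict(row)[a]) for a in attrs) for row in relation}

employees = {
    frozenset({("Name", "Ann"), ("Dept", "R&D"), ("City", "Oslo")}),
    frozenset({("Name", "Bob"), ("Dept", "R&D"), ("City", "Oslo")}),
    frozenset({("Name", "Cid"), ("Dept", "Sales"), ("City", "Bergen")}),
}

# Two R&D employees live in Oslo, but the projection yields one row, not two.
rnd_cities = project(restrict(employees, lambda r: dict(r)["Dept"] == "R&D"),
                     ["City"])
print(rnd_cities)   # {(('City', 'Oslo'),)}
```

A SQL formulation would need an explicit DISTINCT at every such step to get the same guarantee.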
- The ability to declaratively define arbitrary constraints in Dataphor's D4 language was truly impressive. Knowing that the DBMS would enforce the constraints at all times gave great confidence that the database could not be modified in an inconsistent way. As it turned out later, this helped in finding subtle logic errors in the algorithms that tried to update the database.
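What "the DBMS enforces the constraints at all times" buys you can also be sketched in plain Python (again a conceptual illustration, not D4; the Relvar class, the schema, and the salary-cap constraint are all made up): every mutation is checked before it commits, so a logic error in update code fails loudly instead of silently corrupting the data.

```python
# Minimal sketch of always-enforced declarative constraints.

class ConstraintViolation(Exception):
    pass

class Relvar:
    def __init__(self, constraints):
        self.rows = set()
        self.constraints = constraints   # list of (name, predicate on row set)

    def insert(self, row):
        candidate = self.rows | {row}
        for name, pred in self.constraints:
            if not pred(candidate):
                raise ConstraintViolation(name)
        self.rows = candidate            # commit only if every constraint holds

# Invented constraint: no employee may earn more than 200,000.
employees = Relvar([("SalaryCap",
                     lambda rows: all(salary <= 200_000 for _, salary in rows))])

employees.insert(("Ann", 90_000))        # accepted
try:
    employees.insert(("Bob", 250_000))   # rejected; relvar is left unchanged
except ConstraintViolation as e:
    print("rejected by constraint:", e)

print(len(employees.rows))               # 1
```

The buggy insert is the kind of "subtle logic error" that constraint enforcement surfaced during development.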
Nuisances
- Since the DBMS can now be extended with user-defined types (UDTs) along with their operators, it follows that types and operators become database objects themselves. This means that they must be 'CREATE'-ed, later, possibly, 'ALTER'-ed, and eventually, 'DROP'-ed, just like tables (I use the Dataphor terminology here), views, and constraints. In addition, the strongly-typed nature of Dataphor means that each database object, in any reasonably large system, will have several dependencies and several dependents. Types can depend on other types and operators for their definition; operators depend on types and can depend on tables and views; tables and views depend on types and operators; and finally, constraints can depend on everything mentioned so far. The end result is a complex web of dependencies between database objects, in which you cannot re-define anything without re-defining its dependents at the same time. Simply put, the create/drop cycle became the most annoying part of the development. For example, when I realized that I had to change an operator, I first had to drop all the constraints and operators that used the one being changed. The situation was even uglier when I realized that I had to change one of the scalar types. In that case I practically had to drop every database object and then re-create them all.
(I will further elaborate on this point and its implications a bit more at the end of the post.)
- Although it is logical after a little thought, it came as a nasty surprise that Dataphor demands that every scalar type that might appear in a context requiring duplicate elimination must have the less-than operator (Dataphor calls it 'iLess') defined on it. (Probably any of the operators that provide ordering would do: <, >, >=, or <=.) It was a nasty surprise because, at first, I could not figure out why Dataphor rejected an otherwise perfectly formulated query, protesting that 'operator iLess' was not defined for one of the scalar types. The query did not use the 'ORDER BY' clause and did not even refer to any attribute of the scalar type mentioned in the error message. As it turned out, some relalg subexpression of the original query contained a projection that required duplicate elimination during evaluation, and an attribute of the mentioned scalar type was part of that relation. (The scalar type described the operating system of a node in a network.) I understand that without an ordering defined on a scalar type it is very inefficient to remove duplicates, but I did not feel like defining an arbitrary comparison operator just for this purpose. I ended up 'projecting away' the troublesome attribute, since I did not need it in the final result anyway. This solved my problem, but it showed that the physical implementation does leak through the logical model. Maybe Dataphor should provide some kind of internally implemented default ordering for the purposes of duplicate elimination. I must add that if the resulting relation contains a key attribute, Dataphor is clever enough to recognize this and avoid duplicate elimination, so the problem described here is not a really serious one.
- The help system is, in places, a little out of sync with what is actually implemented, and I found too few examples in it. This is especially true of the D4 Language Guide. The sample applications bundled with the evaluation version do not really show off the capabilities of the system either. In my opinion, developers might benefit more from a "feature fest" type of demo application.
Musings
How can one do software maintenance in a live Dataphor installation?
Since scalar types form the building blocks of the database design, almost every database object will depend on them, either directly or indirectly. Relvars and operators are all defined using types, while constraints are defined in terms of expressions involving relvars and operators. The consequence is that every scalar type will have several dependents, so changing a type in a live database means dropping and then re-creating a large part of the schema, with all the downtime and risk that implies.
Now I digress a little, so the following musings are about UDTs in RDBMSs in general. Please do understand that I am in full support of truly relational systems, and I pose the following question in order to provoke a healthy debate that might focus attention on areas neglected by relational theory but which are still important in practice:
What if major vendors 'do get it' but enterprise environments simply 'squeeze out' anything but atomic types (numbers, character strings, booleans, etc.) from the database?
Quite often, in enterprise environments, development teams rarely have access to and/or control over applications written by other teams. Given the tangled web of dependencies in a 'UDT-enabled' RDBMS, success suddenly becomes a function of how well-designed those UDTs are in the first place and, if they need modification, of who has the right (and the resources) to modify them. I think this is the make-or-break question for the real-life success of truly relational databases, right along with the questions of performance (which I do not want to address right now). Most development projects try to control as many factors of their success as possible. (That is why everybody builds several layers of abstraction in order to 'insulate' themselves from the ever-changing environment.) Average workplaces have average developers on average. This means that it is rarely the case that developers get to work on a database that is well-designed. Even in database design there are plenty of ways to screw up, and I do not think that designing UDTs is any easier. So in a UDT-rich world, developers would be forced to work with someone else's not-so-well-designed UDTs most of the time. Not to mention that fixing a badly designed mess can be very time-consuming, especially if it is not the job one was hired to do in the first place. Even fixing serious database design flaws in today's UDT-free databases often meets with severe resistance because of the (not completely unsubstantiated) fear of breaking other applications. Is it not better, then, to forget about UDTs in the database and only allow them in programming languages, where developers have complete control over them? Is it not the case that the lack of UDTs in today's databases is more of a blessing than a curse?
Sorry for the long post, but I felt this might be an interesting topic to discuss in this newsgroup.
Josh Hewitt
Ph.D. Student
Department of Computer Sciences
Florida Institute of Technology