Re: Navigation question

From: Tony D <tonyisyourpal_at_netscape.net>
Date: 5 Mar 2007 15:36:17 -0800
Message-ID: <1173137777.820852.283890_at_30g2000cwc.googlegroups.com>


On Mar 5, 5:25 am, "dawn" <dawnwolth..._at_gmail.com> wrote:
>
> Hi Tony -- I'm hoping you are still in a good mood.

Well I was until Google dumped the first attempt at a response to this down the memory hole. Ho hum.

[ snips-a-daisy ]

> This MUMPS-like example is slightly different from the more PICK-like
> example I gave earlier. In spite of the data being model as both
> objects (with the classes) and relations (with the SQL-ish query), the
> logical data model is specified without methods, so no one needs to
> launch into OO vs Relations on this one, I hope. This data model
> (MUMPS) pre-dates relational and OO.
>

Settle in, we could be here a while.

Well, I took it upon myself to look MUMPS up on Wikipedia, which led me to this lovely document here :

Globals - A Primer for Relational Programmers http://www.mgateway.com/extreme1.doc

This is a document to demonstrate to mere relational programmers the joys of working with MUMPS, in all its navigational, hierarchical and yes, iterative glory.

I won't bore you with all the details, but I will share a couple of fantastic quotes and one example with you.

First, from the introduction :

"The really important heart of MUMPS is its data storage mechanism. This is based on what are known as Global Variables, more commonly known simply as Globals. Globals are an incredibly simple concept, and yet incredibly powerful.

Many people get turned off by Globals. They are a primitive structure. There are none of the controls, safety nets or value-added mechanisms that "proper" databases provide. As a result, MUMPS is often dismissed as irrelevant, insufficient or somehow, just plain wrong. To do so is to ignore a data storage mechanism that is lean, mean and totally malleable. In the right hands, that lack of baggage and overhead can be an incredibly liberating experience. In the wrong hands it can be a recipe for disaster. It's a bit like a dangerous, extreme sport such as free-form mountain climbing. Few "proper" sportspeople will recommend it, but it's the fastest, most efficient and most exhilarating way up that mountain if you can master the skills to climb without the safety-related paraphernalia. The climbing equipment companies won't recommend it either, because it questions the need for their products!

So if you think you're up to the challenge, and ready to accept that there may be another way to consider data storage than the entrenched relational and SQL view of the world, let's dig deeper."

Bronco-busting programmers only here, please. I'm amazed, in this day and age, that we're still encouraging programmers to play chicken like this. Formal methods ? Proofs ? Hell, declaring variables before you use them ? Pah, for wimps ! Note that air of slightly smug elitism in that third paragraph - it crops up more than a few times ...

Before we go much further, we best find out what we're dealing with here. MUMPS uses a thing called globals as a very rough analogue to tables. Quote ...

"So what are Globals?

Put simply, a Global is a persistent, sparse, dynamic, multidimensional  array, containing a text value. Actually MUMPS allows the use of both persistent and non-persistent multi-dimensional arrays, the latter known as "local arrays"."

The document goes on to describe globals as a "primitive" data structure, and say that the rest of the document is about how make this primitive structure work for you.

What kind of things might we want to do with globals ? Well, we could "traverse" them ...

"One of the most frequent things you need to do is traverse some or all of a global. For example, let's say you want to manipulate all the employee records, perhaps to display a list so that the user can select one of them, or to count them. To do this, you use the $order MUMPS function. The $order function is one of the "crown jewels" of MUMPS, allowing, with incredible simplicity, access to any of the data you store in your globals. It's not functionality that will be intuitive to a "traditional" database programmer, but it's worth understanding because it is so powerful and yet so simple.

The $order function operates on a single subscript level within a global, and returns the next subscript value in collating sequence that exists at that level in the global. You can specify some starting value, and the $order function will find the next value that exists in collating sequence. To find the first subscript at the specified level, use a starting value of null (""). To find the first subscript at the specified level that starts with "C", use a starting value that collates just before "C", for example "B~""

Well, I'm glad you cleared that up for me. Sounds terribly navigational, what with all this subscripting, and collating sequences. I'm waiting for the juice though, the thing that makes me say "ooh, must ditch that awful SQL for the new nirvana !" Reading on, we get to the conclusions of Chapter 1, which includes the following two paragraphs :

"Once you try using MUMPS globals persistent storage, you'll probably begin to wonder why all databases couldn't work this way! Its simple, intuitive, flexible and the performance outstrips any relational database. Its also available for pretty much every platform, and will scale to enormous systems - some of the biggest interactive systems in the world run in MUMPS, some with tens of thousands of concurrent users.

However, if you think you need those controls and safety nets and the features that the relational world believe are essential "must-have" features, then MUMPS is definitely not for you. If you're still determined to go free-form mountain climbing, move on to the next chapter !"

Some bold claims, followed up by more of that smug elitism again - kind of a "if you can't hang with the *real* programmers, go back to your toys kiddies" vibe going on here. The true irony of the first paragraph will become apparent as we go on

Chapter 2 is title "RDBMS/SQL vs MUMPS". It includes a running example using a three table schema - customers, who can have one or more orders, which can consist of one or more items. The customers table has a derived column with the total value of all orders placed by the customer, and the orders table has a derived column with the total value of all items in an order, but we'll let that slide for now (although they will prove Pascal's points about paying somewhere for denormalisation in buckets later).

Before we get to the example, a quote from Chapter 2 :

"With $order and globals, you have total access to step through any of the keys in a table, starting and finishing on any value we wish. The important thing is to understand that Globals represent a hierarchical data storage structure. The key fields in the tables we're emulating are represented as the subscripts in a global, so we no longer have to access rows from a table in strict sequence: the $order function can be applied to each subscript (key) independently."

So, we have a hierarchical structure; that necessarily means we have a concept of "here"; we also need to know where "there" is, and how to "get" "there" from "here". Note that "step through"; that means that navigation's pal, iteration, will be along for the party.

The example query I'm going to list here, stated by the authors in SQL, is embarrassingly trivial. It changes the customer number in the customers table. There are referential integrities declared so that this change will ripple down to the orders table. The query, as given by the authors of the paper, is

UPDATE CUSTOMER A
SET custNo = :newCustNo
WHERE A.custNo = :oldCustNo

Like I say, nothing fancy. Let's look at the MUMPS code.

As Paul Daniels would say, "You're going to like this. Not a lot, but you'll like it."

updateCustomer(oldCustNo,newCustNo,newData) ;  new result,orderData,orderNo
 if (oldCustNo="")!(newCustNo="") Quit 0  set orderNo=""
 for set orderNo=$order(^ORDERX1(oldCustNo,orderNo)) Quit:orderNo="" do

  . set result=$$getOrder(orderNo,.orderData)
  . set orderData("custNo")=newCustNo
  . set ok=$$setOrder(orderNo,.orderData)
 set ok=$$setCustomer(newCustNo,.newData)  if newCustNo'=oldCustNo set ok=$$deleteCustomer(oldCustNo)  Quit 1

Oooh, look at that lovely for-loop iterating over the orders ! But wait a minute, what's that ORDERX1 thing ? Well, it turns out that that is another global that the MUMPS programmer has set up in this case to provide pointers (natch !) from customers to orders. So, we've started with a customer number, iterated over a bunch of pointers to some other data and done work with that data. Sound navigational yet ?

Sorry, did you think you were finished ? Nahhhh ! This is MUMPS, where the Real Programmers hang out. Which is a good thing because there's loads of code to write. See all those procedure names that start with "$$" ? You get to write those ! Think I'm kidding ? Here's the code from the paper ...

Here's getOrder ...

getOrder(orderNo,orderData) ;
 new record
 if (orderNo="") Quit 0

 set record=$g(^ORDER(orderNo))
 set orderData("custNo")=$piece(record,"~",1)
 set orderData("orderDate")=$piece(record,"~",2)
 set orderData("invoiceDate")=$piece(record,"~",3)
 set orderData("totalValue")=$piece(record,"~",4)
 Quit 1

Here's setOrder ...

setOrder(orderNo,data) ;
 new rec,itemNo,ok
 if orderNo="" Quit 0
 ; Calculate the value of the order
 set data("totalValue")=0
 set itemNo=""
 for set itemNo=$Order(^ITEM(orderNo,itemNo)) Quit:itemNo="" do  . set ok=$$getItem(orderNo,itemNo,.itemData)  . Set data("totalValue")=data("totalValue")+itemData("price")  set rec=data("custNo")_"~"_data("orderDate")_"~"_data("invoiceDate")  set rec=rec_"~"_data("totalValue")
 set ^ORDER(orderNo)=rec
 if data("custNo")'="" set ^ORDERX1(data("custNo"),orderNo)=""  Quit 1

Oooh look, another for-loop iterating over some more data ! Note the maintenance of the ORDERX1 pointers at the end, and the amount of work done to maintain that derived column.

Here's setCustomer ...

getCustomer(custNo,data) ;

	new record
	kill data ; clear down data array
	if custNo="" Quit 0
	set record=$get(^CUSTOMER(custNo))
	set data("name")=$piece(record,"~",1)
	set data("address")=$piece(record,"~",2)
	set data("totalOrders")=$piece(record,"~",3)
	Quit 1

There's a deleteCustomer too, but the earlier trivial example has to be extended for our referential integrity example and the Real Programmer must have run out of puff by this stage, as the extension is left as an exercise for the reader.

Well, really. I cannot honestly believe that in this day and age someone would willingly inflict this sort of nonsense upon themselves. And, I have stuck to exactly what is in the text, and these people are *advocates* of this stuff.

"But Tony," you might say, "what has this got to do with navigation ?"

"Well," I would say, "observe the pointer maintenance, the traversal of globals, the mentions in the paper of the concept of 'nodes', and I think that adds up to a pretty grim picture of a navigational, iterative, roll your own database set up."

"But," you may retort, "surely you could write this kind of bumph in SQL ? What about selecting data from one table into a variable, then using that variable to qualify queries on other tables ?"

"Well," I would answer, "that might *look* like this kind of thing. The tactic is the same. But there are (barring 'connect by' et al) no pointers, and no traversal going on. The queries are logically separate. They are written by programmers who are at best naive, and at worst incompetent, and there is almost always a better way. In a system like MUMPS, you have no option."

"But surely Caché allows SQL ?" you may be moved to say.

"Indeed," I would answer, "but only as some window dressing to this bumph. This stuff is still creeping out from under the covers, waiting to bite you. And would you care to consider, what would happen if another programmer were to access your globals directly, without going through your lovely procedures ?"

> Is there something problematic or bad about this form of navigation.

Well, just look at it. The really try telling me there's nothing wrong or bad about this. Go on, have a shot at convincing me. Go for it, I'd love to see someone try to stick up for this.

> If not, then I might finally have my answer.

> If so, is it bad in the same way that it is bad to specify a JOIN in
> an SQL statement

Well, we're nearly 200 posts into this thread, and hopefully there's a little inkling that although this might look like a join, it don't walk like a join or quack like a join, so it ain't a join.

> in a
> problem because that means the developer has to know where the data
> are located, or is there something additonally problematic with
> navigating like this? Thanks. --dawn

Consider the number of concepts, ideas, mechanisms and the amount of code that has been written to do the job of three lines of SQL and a couple of declarative constraints. We've got nodes, pointers, subscripts, subscript levels, collating sequences, globals, navigation, iteration, procedures and a bag full of user written code. And what has it got us ?

*whoosh* there goes an ancient English razor (it does fit the tune of "High Hopes", just)

  • Tony
Received on Tue Mar 06 2007 - 00:36:17 CET

Original text of this message