Re: Units of Measurement in the Database Model

From: Paul G. Brown <paul_geoffrey_brown_at_yahoo.com>
Date: 12 Apr 2002 10:28:16 -0700
Message-ID: <57da7b56.0204120928.5ac72603_at_posting.google.com>


johnrlewis_at_hotmail.com (John R. Lewis) wrote in message news:<4323c1b8.0204111001.4cd074e2_at_posting.google.com>...  

> How should the database be organized in regards to measurements
> recorded, and the units of measurement (UOM) used to record them?

  Other folk are advocating solution # 4. But IMHO it has a couple of   significant failings that have not been considered adequately.

  1. Indexing. With the value in one place and the unit in another along with, I assume, the appropriate ratios for unit conversion, I don't see how it is going to be possible to build an index on the measurements. Any query seeking a range of measurement values is going to be obliged to perform the join and scan the data, performing conversions one tuple at a time.
  2. Many modern DBMS engines employ techniques like merge-joins and hash-joins to improve the performance of certain kinds of query plans. But with this seperation of the semantics of the measurements I am having a hard time seeing how you would do an 'ORDER BY Measurement' kind of operation. The only way to achieve this would be to convert all of the measurements into a canonical form at run-time and then sort on the form.
  3. A similar problem will be encountered whenever you try to employ the aggregate functionality: SUM(), AVG(), VARIANCE() etc.

    There is an alternative approach to this problem that is not listed among    the proposed solutions. Many of today's DBMS products provide a user-defined    type facility. This permits the data modeller to add new 'object classes'    into the DBMS's relational framework as 'domains'. In a well-engineered    DBMS, all of the operations described above -- indexing, sorting, joining,    and aggregation -- can be provided for the new type. These new user-defined    types may be implemented (depending on your DBMS) using Java, C#, C or the    DBMS's own stored procedure language.

    To give you some idea of what this looks like (and your mileage will vary)    I'll give a quick semi-pseudo code implementation below.

     i. The new type. SQL-99 defines a standard set of mechanisms for creating    and naming a new type in SQL that is distinct from the built-in types (or any    other types you want to use). In addition, the SQL-J Part 3 standard    defines how a Java Class can be used to implement the new type. In the example    that follows I've just used the bog-standard 'structured type' mechanism.

     CREATE TYPE Measurement ( 
             Quantity            DOUBLE PRECISION    NOT NULL,
             Unit                CHAR[1]             NOT NULL,
             Error               DOUBLE PRECISION    NOT NULL
     );

     Several alternative approaches exist. You might instead use a DISTINCT TYPE
   of VARCHAR, and then encode the Measurement into the string using XML.

      CREATE DISTINCT TYPE Measurement AS VARCHAR(64);

     An instance of one of these strings might look like this:

     "<measurement><quantity>12.5</><unit>FEET</><error>0.05</></>"

    ii. The Type's Behavior

     Operations over the new type/domain/object class would include things like     'convert from string', 'compare with another', 'hash', and so on. For     example, the following code would be used to convert a string of the     form "double string double" (for example "12.5 FEET 0.05") into an instance     of the new type. Note that I'm using a kind of 'idealized' stored     procedure language here. The syntax will vary from product to product.

     CREATE FUNCTION String2Measurement ( Arg1 String )
     RETURNS Measurement
           Qty   DOUBLE PRECISION;
           Unit  String;
           RUnit  CHAR(1);
           Error DOUBLE PRECISION;

           SSCANF ( Arg1, "%l %s %l", Qty, Unit, Error);
           
           IF ( Unit LIKE 'F%' or Unit LIKE 'f% ) THEN
              RUnit = 'F';
           ELSE IF ( Unit LIKE 'P%' or Unit LIKE 'p%') THEN
               RUnit = 'P';
           .. etce
           ELSE
               ERROR "String2Measurement: Invalid or Unknown Unit";
           END IF;

           RETURN Measurement ( Qty, RUnit, Error);
       END FUNCTION;

     and on the other side;

      Now, a DBMS is really just a big algorithm library, where the algorithms 
     are things like 'sort', 'index', and various flavors of 'join'. To 
     use these algorithms over your new type you need to implement the kinds 
     of operations these algorithms expect, such as 'compare'. Each compare
     function takes two instances of your user-defined type (let's call them 
     'A and 'B) and returns -1 iff. 'A < 'B, 1 iff 'A > 'B, and 0 otherwise. 
     By encapsulating the convert/compare mathematics within the Compare()
     function you ought to be able to exploit all of the DBMS's power to 
     store/sort/join based on these new types.

     [NOTE: On Errors - this gets a bit tricky, as values with errors do not
            necessarily follow the triangle inequality: if 'A > 'B, then 
            'A + 'x > 'B + 'x. You need to be a bit careful with this. ]

      To take advantage of the aggregates you might need to implement basic
     operations like 'Plus', 'Times ( Type, INTEGER )' and so on. In some 
     DBMS products you can even overload the aggregates with your own;
     Geometric Mean, as well as more advanced statistical functions.

     iii. Schema.

       Having defined the new type, you then just use it in the schema just 
      as you would use any other type.

      CREATE TABLE Stuff_Being_Measured (
            Id            SYSTEM_GENERATED_KEY    PRIMARY KEY,
            What          Description             NOT NULL,
            When          Time                    NOT NULL,
            How_Much       Measurement            NOT NULL
       );

     

   My Advice:

  1. Get PostgreSQL. It's free, and it provides all of the functionality described above. If you want to go with another engine then I would recommend looking at { DB2, INFORMIX, Oracle } (alphabetical order) and then SQL Server (which is apparently a bit short of grunt in this department).
  2. Do the new types as OPAQUE TYPEs in 'C' (for speed). If you are using Java classed on the client/middleware to encapsulate the measurement behavior then you might (depending on the product) be able to drop them straight into the DBMS.
  3. Front end programming in PHP/Perl and CGI.
  4. Linux.

     Hope this helps.

               KR

                           Pb
Received on Fri Apr 12 2002 - 19:28:16 CEST

Original text of this message