Re: Units of Measurement in the Database Model
Date: 12 Apr 2002 10:28:16 -0700
Message-ID: <57da7b56.0204120928.5ac72603_at_posting.google.com>
johnrlewis_at_hotmail.com (John R. Lewis) wrote in message news:<4323c1b8.0204111001.4cd074e2_at_posting.google.com>...
> How should the database be organized in regards to measurements
> recorded, and the units of measurement (UOM) used to record them?
Other folk are advocating solution # 4. But IMHO it has a couple of significant failings that have not been considered adequately.
- Indexing. With the value in one place and the unit in another along with, I assume, the appropriate ratios for unit conversion, I don't see how it is going to be possible to build an index on the measurements. Any query seeking a range of measurement values is going to be obliged to perform the join and scan the data, performing conversions one tuple at a time.
- Many modern DBMS engines employ techniques like merge-joins and hash-joins to improve the performance of certain kinds of query plans. But with this seperation of the semantics of the measurements I am having a hard time seeing how you would do an 'ORDER BY Measurement' kind of operation. The only way to achieve this would be to convert all of the measurements into a canonical form at run-time and then sort on the form.
- A similar problem will be encountered whenever you try to employ the aggregate functionality: SUM(), AVG(), VARIANCE() etc.
There is an alternative approach to this problem that is not listed among the proposed solutions. Many of today's DBMS products provide a user-defined type facility. This permits the data modeller to add new 'object classes' into the DBMS's relational framework as 'domains'. In a well-engineered DBMS, all of the operations described above -- indexing, sorting, joining, and aggregation -- can be provided for the new type. These new user-defined types may be implemented (depending on your DBMS) using Java, C#, C or the DBMS's own stored procedure language.
To give you some idea of what this looks like (and your mileage will vary) I'll give a quick semi-pseudo code implementation below.
i. The new type. SQL-99 defines a standard set of mechanisms for creating and naming a new type in SQL that is distinct from the built-in types (or any other types you want to use). In addition, the SQL-J Part 3 standard defines how a Java Class can be used to implement the new type. In the example that follows I've just used the bog-standard 'structured type' mechanism.
CREATE TYPE Measurement ( Quantity DOUBLE PRECISION NOT NULL, Unit CHAR[1] NOT NULL, Error DOUBLE PRECISION NOT NULL ); Several alternative approaches exist. You might instead use a DISTINCT TYPEof VARCHAR, and then encode the Measurement into the string using XML.
CREATE DISTINCT TYPE Measurement AS VARCHAR(64);
An instance of one of these strings might look like this:
"<measurement><quantity>12.5</><unit>FEET</><error>0.05</></>"
ii. The Type's Behavior
Operations over the new type/domain/object class would include things like 'convert from string', 'compare with another', 'hash', and so on. For example, the following code would be used to convert a string of the form "double string double" (for example "12.5 FEET 0.05") into an instance of the new type. Note that I'm using a kind of 'idealized' stored procedure language here. The syntax will vary from product to product.
CREATE FUNCTION String2Measurement ( Arg1 String ) RETURNS Measurement Qty DOUBLE PRECISION; Unit String; RUnit CHAR(1); Error DOUBLE PRECISION; SSCANF ( Arg1, "%l %s %l", Qty, Unit, Error); IF ( Unit LIKE 'F%' or Unit LIKE 'f% ) THEN RUnit = 'F'; ELSE IF ( Unit LIKE 'P%' or Unit LIKE 'p%') THEN RUnit = 'P'; .. etce ELSE ERROR "String2Measurement: Invalid or Unknown Unit"; END IF; RETURN Measurement ( Qty, RUnit, Error); END FUNCTION; and on the other side; Now, a DBMS is really just a big algorithm library, where the algorithms are things like 'sort', 'index', and various flavors of 'join'. To use these algorithms over your new type you need to implement the kinds of operations these algorithms expect, such as 'compare'. Each compare function takes two instances of your user-defined type (let's call them 'A and 'B) and returns -1 iff. 'A < 'B, 1 iff 'A > 'B, and 0 otherwise. By encapsulating the convert/compare mathematics within the Compare() function you ought to be able to exploit all of the DBMS's power to store/sort/join based on these new types. [NOTE: On Errors - this gets a bit tricky, as values with errors do not necessarily follow the triangle inequality: if 'A > 'B, then 'A + 'x > 'B + 'x. You need to be a bit careful with this. ] To take advantage of the aggregates you might need to implement basic operations like 'Plus', 'Times ( Type, INTEGER )' and so on. In some DBMS products you can even overload the aggregates with your own; Geometric Mean, as well as more advanced statistical functions. iii. Schema. Having defined the new type, you then just use it in the schema just as you would use any other type. CREATE TABLE Stuff_Being_Measured ( Id SYSTEM_GENERATED_KEY PRIMARY KEY, What Description NOT NULL, When Time NOT NULL, How_Much Measurement NOT NULL );
My Advice:
- Get PostgreSQL. It's free, and it provides all of the functionality described above. If you want to go with another engine then I would recommend looking at { DB2, INFORMIX, Oracle } (alphabetical order) and then SQL Server (which is apparently a bit short of grunt in this department).
- Do the new types as OPAQUE TYPEs in 'C' (for speed). If you are using Java classed on the client/middleware to encapsulate the measurement behavior then you might (depending on the product) be able to drop them straight into the DBMS.
- Front end programming in PHP/Perl and CGI.
- Linux.
Hope this helps.
KR
PbReceived on Fri Apr 12 2002 - 19:28:16 CEST