Re: Question on Structuring Product Attributes

From: Derek Asirvadem <derek.asirvadem_at_gmail.com>
Date: Sun, 29 Sep 2013 20:01:46 -0700 (PDT)
Message-ID: <3a6e7659-c8f5-4dcb-a098-9fe4bba6eccf_at_googlegroups.com>


Rob

Thank you for an excellent post.

I would like to add a couple of points, if I may.

On Sunday, 17 February 2013 13:30:04 UTC+11, Rob wrote:
>
> Something that never seems to get addressed is the pre-history of DB technology before Codd, before SQL. Within that universe, IMS and IMS-like databases were used much as today to track the state of some model, but were queried by a mechanism called RPG (for Report Program Generator). The RPG spec could be given within COBOL or in a standalone interface, but the idea was to spit out retrieved data in a report formed as the interpretive software made a single pass over the linearized physical (after RDBMS, logical) order of the records, or segments as they were called. Any sorts or aggregates were computed after-the-fact from an intermediate containing the retrieved data in the logical/physical order in which it was accumulated.

Absolutely.

> So when the System R folks at IBM were defining Sequel (which became SQL), the objective was not logical inference, it was report-writing. After all, the whole point of Codd's thrust was to replace IMS (et al.) with a system that was immune to changes in the physical storage structure of the data. But that didn't obviate the major purpose of such databases, report-writing. If you examine the most primitive SQL form of the SELECT statement, it is a non-procedural specification of a traversal of the logical product (in the FROM clause) to retrieve the attributes (in the SELECT clause), restricted by the predicate (in the WHERE clause). And again, sorted (as specified in the ORDER clause), formatted and aggregated (SELECT clause again).

Absolutely.

The RM is applicable, by engineers, to data. Period. It is not merely the data model, it is about modelling data. It is not about the data sublanguage, but about how data is perceived, understood, modelled. (I believe Jan Hidders stated the same thing, in different words, in another post.)

I have two commercial instances of the following. The first is a fairly complex Relational database that does both OLTP and DSS from a single set of tables. Completely faithful to the RM (the real RM, not the nonsense written up in books by the insane). Completely Normalised (Codd's Normalisation, plus Normalisation as a principle, not the absurdities of abstract "thinkers" that produce unnormalised models). I pride myself on the fact that I can produce any report (planned and unplanned) with a single SELECT statement. That includes various types of Pivots, etc. (without using the PIVOT function). The platform is Sybase ASE.

After that was well-established, I needed to implement that db+app with no database footprint. I chose awk. The entire Sybase database, that is, the rows (forget 'tuple', the row is the implementation) from all the reference or static tables, is implemented in a single file (segment), and the site-specific source data in a separate single file. The reference tables form a fully integrated data dictionary with constraints, etc. (which allows customisation, and traps errors).
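
To make that concrete, here is a minimal sketch of the loading-and-validation idea only, in ordinary awk; the file names and field layouts are my own assumptions for illustration, not the product's:

# Assumed layouts (illustration only):
#   reference.dat : table|key|value          (the data dictionary rows)
#   source.dat    : row_id|table|key|measure (the site-specific data)
BEGIN { FS = "|" }

# First file on the command line: load the reference rows,
# keyed by (table, key).
FNR == NR { ref[$1, $2] = $3; next }

# Second file: each source row must reference an existing key;
# anything else is trapped as a constraint violation.
{
    if (($2, $3) in ref)
        printf "%s  %s  %s\n", $1, ref[$2, $3], $4
    else
        printf "ERROR: row %s references unknown %s.%s\n", $1, $2, $3 > "/dev/stderr"
}

Invoked as, say, awk -f load.awk reference.dat source.dat, the two files are read in a single pass, which is exactly the RPG working principle described next.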

awk (just one example) is RPG, post 1970. I notice that programmers who have difficulty with awk simply do not understand RPG as a working principle, as a Method. I perform a single pass of the two datasets, and produce any of the same reports using the awk equivalent of a single SELECT:

for ( row in awk_array ) {

    printf ...
    }

Relations are as simple as:

for ( parent_row in parent_array ) {

    for ( child_row in child_array ) {

        printf ...
        }

    }

Which means that ORDER BY is merely choosing a path: through the Ordinals, or through some parent array (table).
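
A minimal sketch of what I mean; the names and structures are my own illustration, not the product's. An ordinal array fixes the path (the ORDER BY), and a keyed lookup touches only the child rows belonging to each parent, rather than scanning the whole child array:

# Assumed arrays, populated during the single pass over the data files:
#   ordinal[1..n]    ordinal position -> parent key (the chosen path)
#   parent[key]      parent rows, keyed by primary key
#   child[key, seq]  child rows, keyed by (parent key, sequence number)
#   children[key]    number of child rows per parent
END {
    for (i = 1; i <= n; i++) {                    # walk the Ordinals, not the hash order
        pkey = ordinal[i]
        printf "%s  %s\n", pkey, parent[pkey]
        for (j = 1; j <= children[pkey]; j++)     # only this parent's children are read
            printf "    %s\n", child[pkey, j]
    }
}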

Which also means I never perform table scans (which is the common method of handling awk arrays); I read only the rows that are relevant to the query. On an ancient Unix box, the entire process, regardless of report complexity, runs at sub-second speed. 24 (hourly) or 48 (half-hourly) text reports such as this: http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/SEQUOIA_120806_sysmon_09

are converted into a "raw" grid such as this (ugly, but readable compared to 24x40 pages of text): http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Capture.pdf

or something more logical such as this:
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Trend.pdf

and with a few more clicks and drags, using Excel or Numbers, into something like this: http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/New%20Storage%20DiskGroup.pdf

awk has a beautiful implementation of arrays, one that easily provides for Relational tables. It also has a very nice method of handling the empty set (the Null Problem to the cultists) without any fanfare.
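
For example (my illustration, not code from the product): an absent value is simply an absent array element; its absence can be tested directly, and no special NULL marker is needed.

# Illustration only: price[] holds the known prices; "widget" was never loaded.
BEGIN {
    price["bolt"] = 0.10

    if ("widget" in price)            # existence test; does not create the element
        printf "widget: %.2f\n", price["widget"]
    else
        print "widget: no row"        # the missing-value case, handled without fanfare

    # Note: merely referencing price["gasket"] would yield "" (0 in arithmetic)
    # and create the element, so the 'in' test above is the cleaner check.
}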

I use Codd's concept of Ordinals as well, but that is only for advanced implementers of the RM, those who understand surrogates without suffering from black-or-white thinking.

The point is, if one understands the RM, one can implement it in any language, on any record filing platform. One does need to make the discernment between the data and the modelling of the data, on the one hand, and the data sublanguage for operating on the data, on the other.

(Darwen, who has no technical qualifications, and cannot program his way out of a cardboard box, but who has declared that he will invent a full "relational" language, has been unable to make that normal human discernment. Twenty years of dishonestly bashing the RM, but nothing to replace it. The padded cell waits.)

The implementation is a commercial product; if anyone has questions about the details of the implementation, please note that there are *some aspects* I will not be able to answer here. I will, however, answer them if you send me a private email.

Cheers
Derek
derek.asirvadem_at_gmail.com