Re: Relational query with path expressions

From: Brian Selzer <brian_at_selzer-software.com>
Date: Mon, 6 Apr 2009 08:31:38 -0400
Message-ID: <MCmCl.11549$jZ1.10622_at_flpi144.ffdc.sbc.com>


"David BL" <davidbl_at_iinet.net.au> wrote in message news:edd73d6f-678e-4a1c-bd1f-ac59e06f9bb1_at_s12g2000prc.googlegroups.com...
>I get the impression that a lot of the time, a query (or a large
> portion of a query) can be thought of as a navigation from attribute
> to attribute using a sequence of joins. This made me think there may
> be some merit in thinking of attribute names as global in nature and
> the user only has to specify the start and end attributes and the DBMS
> is able to deduce the joins that are required. The potential upshot
> would be that queries are simplified and the user is freed from the
> burden of understanding the way the data has been decomposed into
> separate relations. Just recently I read about the so called
> Universal Relation which takes this approach. However there are a
> number of criticisms - such as those described in a paper by Kent.
>
> I think the biggest problem is that quite often there are multiple
> useful paths.

I disagree with your line of reasoning because relation names have significance. For example, suppose that amongst the myriad relations in the manufacturing system for a synthetic rubber plant, two stand out: one that specifies the recipies that /can be used/ to produce batches of rubber, and another with exactly the same attributes that specifies the recipies that /are being used/ to produce batches of rubber. Here it is not just the juxtaposition of attributes that conveys information but also the names of the relations.

> For example, consider Chris Date's supplier example
> from his book Introduction to Database Systems. The relations are
>
> S(S#, SNAME, STATUS, CITY)
> Supplier number S# has name SNAME, status STATUS
> and is located in CITY
>
> P(P#, PNAME, COLOR, WEIGHT, CITY)
> Part number P# has name PNAME, colour COLOR,
> weight WEIGHT in pounds and is located in CITY
>
> SP(S#, P#, QTY)
> S# shipped QTY of P#
>
> There are two obvious ways to navigate from S# to P#. One can either
> go via the shipments relation SP (where S# and P# appear in the same
> relation), or one can join relations S and P on S.CITY = P.CITY.
>
> Presumably to define the Universal Relation one would be forced to
> pick one of these paths as the default, and rename attributes
> accordingly. E.g. we could arbitrarily assume the default
> relationship between suppliers and parts is through the shipments, and
> rename CITY as SCITY in S, and CITY as PCITY in P, so they have
> distinct roles in the Universal Relation.
>
> It seems that explicit joins are a better way to give the user precise
> control over the access paths. However I find it interesting that a 3-
> way natural join on S,P and SP imposes both of the above navigation
> paths at the same time, and that's rarely what's required.
>
> After thinking about that problem for a while, it occurred to me that
> access paths should emphasise navigation from relation to relation,
> rather than from attribute to attribute (as suggested by the hyper-
> graphs in the literature on the Universal Relation).
>
> The following describes some ideas for a query language on a
> relational database based on path expressions that allow relation
> names to be specified in the path. It is more closely related to the
> relational algebra than the calculus. It's assumed there are no NULLs
> and queries work on sets not bags.
>
> Relvars
> -------
>
> The name of a base relvar evaluates to its current relation value.
> For example, the path expression
>
> P
>
> evaluates to the current value of the relvar named P.
>
> Restriction
> -----------
>
> A restriction can be taken using a postfix operator which is a boolean
> expression enclosed in square brackets and parameterised on the
> attributes of the relation. For example, the path expression
>
> P[COLOR = 'red']
>
> evaluates to the restriction on P where parts are red.
>
> For convenience it's assumed it's not necessary to specify a selector
> as in COLOR('red').
>
> Projection
> ----------
>
> The following path expression subsequently takes the projection onto
> attributes P# and WEIGHT:
>
> P[COLOR = 'red'].(P#,WEIGHT)
>
> Note that since path expressions evaluate to sets, we know that
> duplicates are removed from the projection.
>
> When projecting onto a single attribute the brackets aren't required.
> For example,
>
> S.S#
>
> is the projection of relation S onto attribute S#.
>
> Join with projection
> --------------------
>
> Let R1 and R2 be relations, then the path expression R1.R2 denotes the
> natural join of R1 and R2 followed by a projection onto the attributes
> of R2. For example,
>
> P[COLOR = 'red'].SP
>
> retrieves the tuples in SP associated with the shipment of a red part.
>
> The idea that R1.R2 can drop attributes from R1 may seem strange. A
> benefit arises for example in path expressions like R1.R2.R3 because
> it can avoid joining on attributes which happen to have the same name
> in R1 and R3. This can help to control the joins along a path. For
> example,
>
> P[COLOR = 'red'].SP.S.SNAME
>
> retrieves the names of the suppliers who ship at least one red part,
> whereas
>
> P[COLOR = 'red'].S.SNAME
>
> returns the names of suppliers that are located in a city where there
> are red parts.
>
> This is exactly the distinction discussed in the introduction.
>
> A brief comparison to other query languages
> -------------------------------------------
>
> From Date's book, the following SQL retrieves supplier names for
> suppliers who supply at least one red part:
>
> SELECT DISTINCT S.SNAME
> FROM S
> WHERE S.S# IN
> ( SELECT SP.S#
> FROM SP
> WHERE SP.P# IN
> ( SELECT P.P#
> FROM P
> WHERE P.COLOR = 'red'));
>
> The following expression in the relational algebra is somewhat more
> concise:
>
> ( ( ( P WHERE COLOR = 'red' )
> JOIN SP ) { S# } JOIN S ) { SNAME }
>
> Note that the above needs to project the result of the join of P and
> SP onto { S# } before joining with S in order to avoid the join on
> P.CITY = S.CITY.
>
> For those interested in the comparison, Date also provides the same
> query in the relational calculus and the domain calculus.
>
> Evidently the path expression
>
> P[COLOR = 'red'].SP.S.SNAME
>
> is simpler than the above alternatives.
>
> Join without projection
> -----------------------
>
> Given relations R1,...,Rn, (R1,...,Rn) denotes the natural join of
> R1,...,Rn. For example,
>
> (S,P)
>
> gets all combinations of supplier and part information such that they
> are located in the same city.
>
> Aggregate operators
> -------------------
>
> Let R be a relation and every attribute of R is defined on a domain
> that supports summation. Then the path expression R._at_sum denotes a
> relation with the same attribute names as R that contains a single
> tuple which maps each attribute name to the sum of that attribute over
> tuples from R. If R is empty then R._at_sum contains a single tuple full
> of zeros. For example,
>
> SP[P# = 'P2'].QTY._at_sum
>
> retrieves the total quantity of shipments for part P2.
>
> Similarly we define operators _at_min, @max, @avg, except there is no
> tuple in the output if the input relation is empty (i.e. has no
> tuples).
>
> R._at_count evaluates to a relation with a single tuple with a single
> attribute which gives the number of tuples in R. For example,
>
> S._at_count
>
> retrieves the total number of suppliers.
>
> R._at_exists evaluates to a relation with a single tuple with a single
> boolean attribute that is true iff R is non-empty.
>
> Generalisation of projection
> ----------------------------
>
> It was stated earlier that R.(x1,...,xn) denotes a projection of R
> onto attributes named x1,...,xn. This can be regarded as a special
> case of something much more general - where we allow path expressions
> instead of just attribute names for each xi.
>
> Formally the result is defined as a union of joins where the union is
> taken over the tuples t of R, and the joins are taken over the
> relations {t}.xi for each i where {t} denotes the relation consisting
> of the single tuple t. Typically the {t}.xi have no attributes in
> common so the joins are cartesian products.
>
> For example,
>
> P.(COLOR, SP)
>
> evaluates to the relation which is the natural join of P and SP,
> projected onto the attributes COLOR,S#,P#,QTY.
>
> To help understand this it may be helpful to pretend that the dot
> operator distributes over the comma separated list. i.e. we think of
> P.(COLOR, SP) as (P.COLOR, P.SP). However we don't simply take the
> cartesian product P.COLOR x P.SP. Instead we account for the shared
> prefix P in the paths which "correlate" these two path expressions.
> In other words, for each tuple from P that we use to navigate to its
> COLOR, we note that the same tuple is involved in the navigation from
> P to SP by joining on P.P# = SP.P#.
>
> It is convenient to support a syntax that adds extra attributes to an
> existing relation. The path expression R.(*,x1,...,xn) is equivalent
> to R.(y1,...,ym,x1,...,xn) where y1,...,ym are the attributes of R.
>
> The aggregate operators can be used to aggregating over groups. For
> example,
>
> P.(P#, SP.QTY._at_sum)
>
> returns a relation where for each part, it returns the part number and
> the total shipment quantity (which could be zero if there are no
> shipments of that part).
>
> Note that
>
> P.(P#, SP.QTY._at_min)
>
> retrieves the minimum quantity of shipments for each part number, but
> only for parts where there exists at least one shipment by some
> supplier.
>
> We can group over more than one attribute. For example,
>
> P.(PNAME,CITY).(*,P.WEIGHT._at_sum)
>
> retrieves the sum of the weights, grouped over PNAME and CITY.
>
> As a more complicated example,
>
> P.(P#, SP.(QTY,S.STATUS).(_at_min,_at_max,_at_avg))
>
> retrieves for each part, the part number and the minimum, maximum and
> average shipment QTY, and the minimum, maximum and average supplier
> status. However for this to make sense we would need some convention
> to assign unique attribute names in the output.
>
> Generalisation of restriction
> -----------------------------
>
> So far it has have assumed that for the restriction R[b(x1,...,xn)],
> the boolean expression b can only be parameterised by attribute names
> x1,...,xn from R. This is generalised to allow the xi to be path
> expressions.
>
> R[b(x1,...,xn)] is formalised as a restriction on R where for each
> tuple t in R, b is evaluated as a boolean function parameterised on
> the {t}.xi, which most generally are relations. If however, a {t}.xi
> is a relation containing a single tuple defined on a single attribute,
> then it can be treated as a scalar value within the boolean
> expression.
>
> For example,
>
> P[SP.S._at_count > 1].P#
>
> retrieves the part numbers of the parts supplied by more than one
> supplier.
>
> P[SP.S# = {'S2'}].P#
>
> retrieves part numbers for parts that are shipped only by supplier S2.
>
> S[not SP.P[P#='P2']._at_exists].SNAME
>
> or equivalently,
>
> S[not 'P2' in SP.P.P#].SNAME
>
> retrieves names of suppliers that don't supply part P2.
>
> Renaming
> --------
>
> Renaming of attributes is supported. For example
>
> S.(CITY as SCITY, SP.P.CITY as PCITY)
>
> retrieves all pairs of city names (SCITY, PCITY) such that a supplier
> in city SCITY supplies a part to city PCITY
>
> Calculated attributes
> ---------------------
>
> calculated attributes are supported. For example,
>
> P.(P#, WEIGHT*454 as W)[W > 10000]
>
> retrieves part number P# and weight W in grams for each part with
> weight > 10000 grams
>
> Path expressions with global scope
> ----------------------------------
>
> A path expression that begins with '..' is assumed to have a global
> scope. For example:
>
> S[STATUS < ..S.STATUS._at_max].S#
>
> retrieves supplier numbers for suppliers with status less than the
> current maximum status.
>
> As another example
>
> S[SP.P# = ..P.P#].SNAME
>
> retrieves supplier names for suppliers who ship all parts
Received on Mon Apr 06 2009 - 14:31:38 CEST

Original text of this message