# Re: Relational query with path expressions

Date: Mon, 6 Apr 2009 08:31:38 -0400

Message-ID: <MCmCl.11549$jZ1.10622_at_flpi144.ffdc.sbc.com>

"David BL" <davidbl_at_iinet.net.au> wrote in message
news:edd73d6f-678e-4a1c-bd1f-ac59e06f9bb1_at_s12g2000prc.googlegroups.com...

>I get the impression that a lot of the time, a query (or a large

*> portion of a query) can be thought of as a navigation from attribute
**> to attribute using a sequence of joins. This made me think there may
**> be some merit in thinking of attribute names as global in nature and
**> the user only has to specify the start and end attributes and the DBMS
**> is able to deduce the joins that are required. The potential upshot
**> would be that queries are simplified and the user is freed from the
**> burden of understanding the way the data has been decomposed into
**> separate relations. Just recently I read about the so called
**> Universal Relation which takes this approach. However there are a
**> number of criticisms - such as those described in a paper by Kent.
**>
**> I think the biggest problem is that quite often there are multiple
**> useful paths.
*

I disagree with your line of reasoning because relation names have significance. For example, suppose that amongst the myriad relations in the manufacturing system for a synthetic rubber plant, two stand out: one that specifies the recipies that /can be used/ to produce batches of rubber, and another with exactly the same attributes that specifies the recipies that /are being used/ to produce batches of rubber. Here it is not just the juxtaposition of attributes that conveys information but also the names of the relations.

> For example, consider Chris Date's supplier example

*> from his book Introduction to Database Systems. The relations are
**>
**> S(S#, SNAME, STATUS, CITY)
**> Supplier number S# has name SNAME, status STATUS
**> and is located in CITY
**>
**> P(P#, PNAME, COLOR, WEIGHT, CITY)
**> Part number P# has name PNAME, colour COLOR,
**> weight WEIGHT in pounds and is located in CITY
**>
**> SP(S#, P#, QTY)
**> S# shipped QTY of P#
**>
**> There are two obvious ways to navigate from S# to P#. One can either
**> go via the shipments relation SP (where S# and P# appear in the same
**> relation), or one can join relations S and P on S.CITY = P.CITY.
**>
**> Presumably to define the Universal Relation one would be forced to
**> pick one of these paths as the default, and rename attributes
**> accordingly. E.g. we could arbitrarily assume the default
**> relationship between suppliers and parts is through the shipments, and
**> rename CITY as SCITY in S, and CITY as PCITY in P, so they have
**> distinct roles in the Universal Relation.
**>
**> It seems that explicit joins are a better way to give the user precise
**> control over the access paths. However I find it interesting that a 3-
**> way natural join on S,P and SP imposes both of the above navigation
**> paths at the same time, and that's rarely what's required.
**>
**> After thinking about that problem for a while, it occurred to me that
**> access paths should emphasise navigation from relation to relation,
**> rather than from attribute to attribute (as suggested by the hyper-
**> graphs in the literature on the Universal Relation).
**>
**> The following describes some ideas for a query language on a
**> relational database based on path expressions that allow relation
**> names to be specified in the path. It is more closely related to the
**> relational algebra than the calculus. It's assumed there are no NULLs
**> and queries work on sets not bags.
**>
**> Relvars
**> -------
**>
**> The name of a base relvar evaluates to its current relation value.
**> For example, the path expression
**>
**> P
**>
**> evaluates to the current value of the relvar named P.
**>
**> Restriction
**> -----------
**>
**> A restriction can be taken using a postfix operator which is a boolean
**> expression enclosed in square brackets and parameterised on the
**> attributes of the relation. For example, the path expression
**>
**> P[COLOR = 'red']
**>
**> evaluates to the restriction on P where parts are red.
**>
**> For convenience it's assumed it's not necessary to specify a selector
**> as in COLOR('red').
**>
**> Projection
**> ----------
**>
**> The following path expression subsequently takes the projection onto
**> attributes P# and WEIGHT:
**>
**> P[COLOR = 'red'].(P#,WEIGHT)
**>
**> Note that since path expressions evaluate to sets, we know that
**> duplicates are removed from the projection.
**>
**> When projecting onto a single attribute the brackets aren't required.
**> For example,
**>
**> S.S#
**>
**> is the projection of relation S onto attribute S#.
**>
**> Join with projection
**> --------------------
**>
**> Let R1 and R2 be relations, then the path expression R1.R2 denotes the
**> natural join of R1 and R2 followed by a projection onto the attributes
**> of R2. For example,
**>
**> P[COLOR = 'red'].SP
**>
**> retrieves the tuples in SP associated with the shipment of a red part.
**>
**> The idea that R1.R2 can drop attributes from R1 may seem strange. A
**> benefit arises for example in path expressions like R1.R2.R3 because
**> it can avoid joining on attributes which happen to have the same name
**> in R1 and R3. This can help to control the joins along a path. For
**> example,
**>
**> P[COLOR = 'red'].SP.S.SNAME
**>
**> retrieves the names of the suppliers who ship at least one red part,
**> whereas
**>
**> P[COLOR = 'red'].S.SNAME
**>
**> returns the names of suppliers that are located in a city where there
**> are red parts.
**>
**> This is exactly the distinction discussed in the introduction.
**>
**> A brief comparison to other query languages
**> -------------------------------------------
**>
**> From Date's book, the following SQL retrieves supplier names for
**> suppliers who supply at least one red part:
**>
**> SELECT DISTINCT S.SNAME
**> FROM S
**> WHERE S.S# IN
**> ( SELECT SP.S#
**> FROM SP
**> WHERE SP.P# IN
**> ( SELECT P.P#
**> FROM P
**> WHERE P.COLOR = 'red'));
**>
**> The following expression in the relational algebra is somewhat more
**> concise:
**>
**> ( ( ( P WHERE COLOR = 'red' )
**> JOIN SP ) { S# } JOIN S ) { SNAME }
**>
**> Note that the above needs to project the result of the join of P and
**> SP onto { S# } before joining with S in order to avoid the join on
**> P.CITY = S.CITY.
**>
**> For those interested in the comparison, Date also provides the same
**> query in the relational calculus and the domain calculus.
**>
**> Evidently the path expression
**>
**> P[COLOR = 'red'].SP.S.SNAME
**>
**> is simpler than the above alternatives.
**>
**> Join without projection
**> -----------------------
**>
**> Given relations R1,...,Rn, (R1,...,Rn) denotes the natural join of
**> R1,...,Rn. For example,
**>
**> (S,P)
**>
**> gets all combinations of supplier and part information such that they
**> are located in the same city.
**>
**> Aggregate operators
**> -------------------
**>
**> Let R be a relation and every attribute of R is defined on a domain
**> that supports summation. Then the path expression R._at_sum denotes a
**> relation with the same attribute names as R that contains a single
**> tuple which maps each attribute name to the sum of that attribute over
**> tuples from R. If R is empty then R._at_sum contains a single tuple full
**> of zeros. For example,
**>
**> SP[P# = 'P2'].QTY._at_sum
**>
**> retrieves the total quantity of shipments for part P2.
**>
**> Similarly we define operators _at_min, @max, @avg, except there is no
**> tuple in the output if the input relation is empty (i.e. has no
**> tuples).
**>
**> R._at_count evaluates to a relation with a single tuple with a single
**> attribute which gives the number of tuples in R. For example,
**>
**> S._at_count
**>
**> retrieves the total number of suppliers.
**>
**> R._at_exists evaluates to a relation with a single tuple with a single
**> boolean attribute that is true iff R is non-empty.
**>
**> Generalisation of projection
**> ----------------------------
**>
**> It was stated earlier that R.(x1,...,xn) denotes a projection of R
**> onto attributes named x1,...,xn. This can be regarded as a special
**> case of something much more general - where we allow path expressions
**> instead of just attribute names for each xi.
**>
**> Formally the result is defined as a union of joins where the union is
**> taken over the tuples t of R, and the joins are taken over the
**> relations {t}.xi for each i where {t} denotes the relation consisting
**> of the single tuple t. Typically the {t}.xi have no attributes in
**> common so the joins are cartesian products.
**>
**> For example,
**>
**> P.(COLOR, SP)
**>
**> evaluates to the relation which is the natural join of P and SP,
**> projected onto the attributes COLOR,S#,P#,QTY.
**>
**> To help understand this it may be helpful to pretend that the dot
**> operator distributes over the comma separated list. i.e. we think of
**> P.(COLOR, SP) as (P.COLOR, P.SP). However we don't simply take the
**> cartesian product P.COLOR x P.SP. Instead we account for the shared
**> prefix P in the paths which "correlate" these two path expressions.
**> In other words, for each tuple from P that we use to navigate to its
**> COLOR, we note that the same tuple is involved in the navigation from
**> P to SP by joining on P.P# = SP.P#.
**>
**> It is convenient to support a syntax that adds extra attributes to an
**> existing relation. The path expression R.(*,x1,...,xn) is equivalent
**> to R.(y1,...,ym,x1,...,xn) where y1,...,ym are the attributes of R.
**>
**> The aggregate operators can be used to aggregating over groups. For
**> example,
**>
**> P.(P#, SP.QTY._at_sum)
**>
**> returns a relation where for each part, it returns the part number and
**> the total shipment quantity (which could be zero if there are no
**> shipments of that part).
**>
**> Note that
**>
**> P.(P#, SP.QTY._at_min)
**>
**> retrieves the minimum quantity of shipments for each part number, but
**> only for parts where there exists at least one shipment by some
**> supplier.
**>
**> We can group over more than one attribute. For example,
**>
**> P.(PNAME,CITY).(*,P.WEIGHT._at_sum)
**>
**> retrieves the sum of the weights, grouped over PNAME and CITY.
**>
**> As a more complicated example,
**>
**> P.(P#, SP.(QTY,S.STATUS).(_at_min,_at_max,_at_avg))
**>
**> retrieves for each part, the part number and the minimum, maximum and
**> average shipment QTY, and the minimum, maximum and average supplier
**> status. However for this to make sense we would need some convention
**> to assign unique attribute names in the output.
**>
**> Generalisation of restriction
**> -----------------------------
**>
**> So far it has have assumed that for the restriction R[b(x1,...,xn)],
**> the boolean expression b can only be parameterised by attribute names
**> x1,...,xn from R. This is generalised to allow the xi to be path
**> expressions.
**>
**> R[b(x1,...,xn)] is formalised as a restriction on R where for each
**> tuple t in R, b is evaluated as a boolean function parameterised on
**> the {t}.xi, which most generally are relations. If however, a {t}.xi
**> is a relation containing a single tuple defined on a single attribute,
**> then it can be treated as a scalar value within the boolean
**> expression.
**>
**> For example,
**>
**> P[SP.S._at_count > 1].P#
**>
**> retrieves the part numbers of the parts supplied by more than one
**> supplier.
**>
**> P[SP.S# = {'S2'}].P#
**>
**> retrieves part numbers for parts that are shipped only by supplier S2.
**>
**> S[not SP.P[P#='P2']._at_exists].SNAME
**>
**> or equivalently,
**>
**> S[not 'P2' in SP.P.P#].SNAME
**>
**> retrieves names of suppliers that don't supply part P2.
**>
**> Renaming
**> --------
**>
**> Renaming of attributes is supported. For example
**>
**> S.(CITY as SCITY, SP.P.CITY as PCITY)
**>
**> retrieves all pairs of city names (SCITY, PCITY) such that a supplier
**> in city SCITY supplies a part to city PCITY
**>
**> Calculated attributes
**> ---------------------
**>
**> calculated attributes are supported. For example,
**>
**> P.(P#, WEIGHT*454 as W)[W > 10000]
**>
**> retrieves part number P# and weight W in grams for each part with
**> weight > 10000 grams
**>
**> Path expressions with global scope
**> ----------------------------------
**>
**> A path expression that begins with '..' is assumed to have a global
**> scope. For example:
**>
**> S[STATUS < ..S.STATUS._at_max].S#
**>
**> retrieves supplier numbers for suppliers with status less than the
**> current maximum status.
**>
**> As another example
**>
**> S[SP.P# = ..P.P#].SNAME
**>
**> retrieves supplier names for suppliers who ship all parts
*

Received on Mon Apr 06 2009 - 14:31:38 CEST