Re: Arbitrary Constraints

From: Vadim Tropashko <vadimtro_at_yho.cm>
Date: Tue, 26 Oct 2004 09:16:37 -0700
Message-ID: <2_ufd.30$ct5.102_at_news.oracle.com>


Kenneth Downs wrote:
> Last week I mentioned that I believed unique and referential constraints are
> not used to their full potential, and that they could be used instead of
> some of the constraints we see today...

Here is ASCII snippet of an article I submitted to dbazine a week ago:

Subqueries in the check constraints
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

One of the most powerful SQL features is scalar subquery. Scalar subquery can be a part of where or select clause, as opposed to inline view, which is an element of from list. It is natural to expect that scalar subquery can substitute any scalar variable or constant in an arbitrary expression. For example, given a constraint

ALTER TABLE emp
ADD CONSTRAINT ck_0_le_1 CHECK( 0 < 2 )
which trivially holds true, we can try substituting the constant 2 with subquery as follows

ALTER TABLE emp
ADD CONSTRAINT ck_fk CHECK(

    0 < (select count(1) from dept

         where dept.deptno=emp.deptno)
)
Assuming that emp relation doesn't admit NULLs, we easily recognize that the above check constraint is essentially referential integrity constraint or, simply, foreign key. Although there seems to be little point reinventing foreign key constraints as check constraints, the next example demonstrates that subqueries in check constraints greatly increase constraint's expressive power.

Consider the customer supertype which has 2 subtypes: business_customer and home_customer. Further, assume that our data model don't have the customer table. If we introduce one more entity -- order --, then it is natural to reinforce many-to-one relationship between orders and customers as the following constraint:

ALTER TABLE order
ADD CONSTRAINT ck_fk CHECK(

    0 < (select count(1) from business_customer b

         where b.cust_id=order.cust_id)
    OR
    0 < (select count(1) from home_customer h

         where h.cust_id=order.cust_id)
)
The question how to impose a "generalized" foreign key constraint onto a set of more than 2 tables is frequently asked on various database forums.

Limitations
^^^^^^^^^^^

Unfortunately, RDBMS that I use, leaves no doubt that today this is not possible:

ORA-02251: subquery not allowed here

What about User Defined SQL functions (UDF)? Like subquery UDF returns a scalar value as well, which is normally allowed in any place of expression where ordinary scalar variable or constant is allowed. However, if RDBMS of my choice were allowing UDF calls within check constraint expression, then I could claim that the implementation is inconsistent. Indeed, UDF can contain embedded SQL inside its body

function deptno_cnt( deptno integer )
RETURN integer IS

    ret_val integer;
BEGIN
    select count(1) into ret_val from dept     where dept.deptno = deptno
    RETURN ret_val;
END; which effectively makes UDFs as powerful as scalar subqueries.

Your mileage might vary, depending what RDBMS you use, but the problem is more fundamental than just allowing naive subquery evaluation, or blind UDF call when validating check constraint.

Constraints as a View Equations

There are two issues of paramount importance as far as constraints are concerned: performance and concurrency. In this article we'll focus on performance only. Lets analyze the above "foreign key" constraint

ALTER TABLE emp
ADD CONSTRAINT ck_fk CHECK(

    0 < (select count(1) from dept

         where dept.deptno=emp.deptno)
)
from that perspective.

Suppose a tuple were inserted into the emp table then, how RDBMS is supposed to check whether the constraint hasn't been violated? That's easy: RDBMS just have to check if the expression is valid for that specific tuple. This, in turn, would trigger execution of the query

select count(1) from dept
where dept.deptno=:1
where :1 bind variable matches current emp.deptno value. If there is an index on dept.deptno, then query optimizer should be able to leverage it. With the dept table growth the performance index range scan would degrade gracefully (logarithmically to be more exact), which mimics the "authentic" foreign key implementation. (Typical RDBMS requires foreign key constraint to refer to the master table column which has a unique key constraint on it, and unique key constraint implies index). So far so good: the technique where constraint is checked only for those tuples in the emp and the dept tables which are affected by the transaction is called incremental constraint evaluation.

What if a tuple is deleted from dept table, which condition should the RDBMS check? Unlike the previous case, constraint doesn't favor single tuple evaluation at the dept table side. It has to be rewritten.

Informally, the constraint says

"Any tuple in the emp table satisfies the check condition"

which is equivalent to

"No tuple in the emp table refutes the check condition"

In other words, if we take the emp table and select all the tuples that don't satisfy the check condition, then the resulting view must be empty. The last sentence could be easily translated back into a formal expression

select 1 from emp where not

    0 < (select count(1) from dept

         where dept.deptno=emp.deptno)
= {}
which we interpret as view equation. Indeed, in this form it looks very similar to equations in math. The empty set {} at the right side of the view equation mimics zero element in math equation.

Next, the view can be rewritten as antijoin

select 1 from emp where not exists

    (select 1 from dept
     where dept.deptno=emp.deptno)
= {}
and, then, antijoin can be transformed into a set operation

select deptno from emp
minus
select deptno from dept
= {}
What a charming view equation! It's immediate that the emp-dept referential integrity constraint can be violated only if the left side view becomes nonempty. It could happen when a tuple is inserted into emp table, or deleted from dept table.

This is as far as we can get in our quest for symmetry in a formal representation of our foreign key constraint. Now we can focus on implementation.

Materialized Views
^^^^^^^^^^^^^^^^^^

In RDBMS world queries and updates are often conflicting goals from performance perspective. We can speed up some queries at the cost of introducing some auxiliary structures. Those structures are maintained incrementally: the change to the structures is small when the update transaction is small, which makes the overall performance acceptable. Incremental query evaluation is, arguably, one of the most important performance ideas.

Ubiquitous examples of incremental query evaluation are indexes and materialized views. More exclusive examples include materializing transitive closure relation and various hierarchy encodings. (Admittedly, the last two incremental evaluation techniques are not supported by query rewrite, when user's query is agnostic of the auxiliary structures).

Incrementally maintained materialized views are ideal candidates for view equation evaluation. In our last example RDBMS have to check if the materialized view

select deptno from emp
minus
select deptno from dept
is empty. Formally, this condition is check constraint on the aggregated materialized view

CREATE MATERIALIZED VIEW emp_minus_dept_mv AS select count(1) cnt from (

    select deptno from emp
    minus
    select deptno from dept
)
ALTER TABLE emp_minus_dept_mv
ADD CONSTRAINT ck_card_eq_0 CHECK( cnt = 0 ) Although we succeeded reducing the problem of generic constraints to check constraints with simple expressions, incrementally maintaining materialized views is technically difficult and, therefore, has limitations on the implementation side.

Oracle, for example, doesn't allow set operations in the definition of incrementally maintained materialized view (oracle term is fast refreshable). The allowed relational operations that preserve fast refreshable property of materialized view are: join, outer join, and aggregation. We have to change the design in order to work around this restriction.

An alternative solution would use outer join and check if the column is not null:

CREATE MATERIALIZED VIEW LOG ON emp
WITH ROWID INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW LOG ON dept
WITH ROWID INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW emp_dept_oj_mv
REFRESH FAST ON COMMIT AS
select d.deptno ddept, d.rowid drid, e.rowid erid from emp e, dept d
where e.deptno=d.deptno(+);
ALTER TABLE emp_dept_oj_mv
ADD CONSTRAINT ck_oj_mv CHECK(ddept is not null); delete from dept where deptno = 20;
commit;
ORA-12008: error in materialized view refresh path ORA-02290: check constraint (SCOTT.CK_OJ_MV) violated

As expected, deleting a key in the dept table causes update of emp_dept_oj_mv materialized view with NULL value in the ddept column, which is not allowed by ck_oj_mv check constraint.

This completes our program implementing foreign key constraint as check constraint. From performance perspective our last solution is still not quite satisfactory. Unlike our original materialized view with minus operator, outer join view grows with the base tables growing. This storage penalty is totally unnecessary and, once more, could be avoided, when incremental view update mechanism becomes more sophisticated to allow set operations.

Now that the idea of implementing complex constraints via check constraints on materialized views looks sound, we can investigate other cases.

<2 more sections> Received on Tue Oct 26 2004 - 18:16:37 CEST

Original text of this message