Re: On view updating

From: Paul <paul_at_test.com>
Date: Tue, 28 Sep 2004 13:45:24 +0100
Message-ID: <41595cde$0$69737$ed2619ec_at_ptn-nntp-reader01.plus.net>


mAsterdam wrote:
> But (assuming) we *do* want to store what *do* know,
> we could invent a new type for this, let's say {nameRX},
> (RX for some regular expression syntax)
> and construct a predicate like this:
>
> Employee [ID:EmpID] is called one of [Firstnamepattern:nameRX],
> one of [Surnamepattern:nameRX], works in one of the
> departments [deptpattern:nameRX] and has one of the salaries
> of [salpattern:amountRX].

Interesting solution. But I think it involves pushing functionality away from the database engine and into the type engine.

Taking this to its logical absurdity, you could have a table with a single column of type "logical proposition"! With a function "VAL" say that returned either true or false.

It also means the DBMS has to be know this type is a regular expression, just as it had to know an empty string meant "missing data". Because suppose you enter a value of '/[Bill|John]/' for someone's name. Now if you do a query in SQL like:

   SELECT * FROM emp WHERE name = 'Bill' OR name = 'John'

It's not going to pick up our row, even though logically it should. You'd have to just know that the column is a regular expression and use some function called "Like" say that takes two regexps as arguments and returns true if the first regexp is a subset of the second.

   SELECT * FROM emp WHERE Like(name, '/[Bill|John]/') = 1

The "Like" function would have to realise that, for example, '/[Bill|John]/' and '/[John|Bill]/' meant the same thing. Also you'd want Like('/[Bill|John]/', '/[Bill|John|Fred]/') to be true.

But the two queries above should logically return identical results. So the database engine would have to know about and treat regular expression types as "special" and be able to relate them to its own boolean logic.

>> So does this mean that
>> the relational model is lacking in terms of being able to store any
>> expression of first-order logic?

>
> No, it just means that the table can't hold data
> reflecting propositions beyond its predicate.
> This is a feature.

I agree it's an intended and maybe desirable feature, but I think the consequence of this is that the relational model can't be said to truly be equivalent to first order logic.

Sure, a database could always be redesigned to store some boolean expression involving sub-parts of the allowed propositions, but you don't want to have to redo your design every time a new piece of information comes up that doesn't fit.

>> Or does it just mean that the constraints we have implicitly chosen on
>> allowable propositions (via our table design) won't allow that
>> proposition to be stored?

>
> Explicitly, I'ld say.

OK, by implicitly I really just meant "not using the SQL CONSTRAINT keyword". Bad use of word maybe.

> A database that holds simple facts
> is simpler than a database that hold facts of the complexity
> of what you *did* know about Smith.
> Imagine telling somebody else about what you know
> about Smith. You 'll take more words, more
> complicated - and rarer - grammatical constructs
> in natural langague to convey your facts to another
> person than when you simply know that
> John Smith works at accounting.

I'm not really thinking about the translation from natural language to logic though. I'm assuming you have a fact written down unambiguously in some logical notation.

>> Is it even possible to make the database engine aware of this 
>> interpretation? Maybe if each data type could specify to the DBMS a 
>> value that meant "data unknown"? 

>
> That is *not* easy. Maybe you'ld want to read what
> Hugh Darwen writes about missing information without NULLS:
> http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf

It seems like they are taking something to its logical absurdities here. They're splitting everything down to essentially binary relations to deal with unknown information. Which is very similar to my suggestion about dealing with information that is unknown, but still constrained to a subset of possible values.

Suppose you allow global NULLs.

Then if you have a row like: (123, NULL, 10, 10,000) to mean "Employee 123 is of unknown name, works in dept 10, and has salary 10,000"

Does that conflict with the row-level constraint: empid!=123 OR (empid=123 AND (name='John' OR name = 'Bill'))?

I guess that any boolean expression involving NULL would return NULL. Do   you have 3-valued constraints in SQL? I'm not sure.

If it didn't conflict with the constraint then we have effectively stored in our database the fact that employee 123 has an unknown name, but is either called John or Bill. If it did, then my point is irrelevant.

Now to deal with it if you don't have global NULLs I'm not sure... Maybe you could make it a compulsory part of a type definition to define one value for each type that means "data unknown"?

Paul Received on Tue Sep 28 2004 - 14:45:24 CEST

Original text of this message