ambiguity (spin-off from: On view updating)

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Tue, 28 Sep 2004 23:57:21 +0200
Message-ID: <4159de43$0$36861$e4fe514c_at_news.xs4all.nl>


Paul wrote:
> mAsterdam wrote:

>> But (assuming) we *do* want to store what *do* know,
>> we could invent a new type for this, let's say {nameRX},
>> (RX for some regular expression syntax)
>> and construct a predicate like this:
>>
>>    Employee [ID:EmpID] is called one of [Firstnamepattern:nameRX],
>>    one of [Surnamepattern:nameRX], works in one of the
>>    departments [deptpattern:nameRX] and has one of the salaries
>>    of [salpattern:amountRX].

>
> Interesting solution.

Hmmm... I guess I should have not given it. Appearantly it draws attention away from the IMO more important restatement of the problem (which you skipped - I'll repeat it, starting with the crux of the dialogue):

You wrote:

>>> For example our predicate might be:
>>> 
>>>   Employee [ID] is called [Firstname] [Surname], works in department
>>>   [dept] and has a salary of [sal]

and
>>> We want to insert to our database the proposition:
>>>
>>>   Employee 123 is called (John or Bill) Smith, works in department 10
>>>   and has a salary of 10,000.

>

>>> But, as it doesn't fit our template, we can't.

I replied:

>> Indeed. Why doesn't it fit?
>> (John or Bill) is not a [Firstname].
>> 
>> Let's say the type of [Firstname] and [Surname] is both {name}
>> (inventing syntax for this case only as we go along,
>> no special considerations here)
>> 
>> '(John or Bill)' is *not* of type {name}.
>> It is an expression that gives
>> alternative possible values for a {name}
>> 
>> So it could never render a true proposition under the predicate:
>>    Employee [ID] is called [Firstname] [Surname], works in
>>    department [dept] and has a salary of [sal].

In other words: your proposition does not fit because the example template is phrased for simple facts, and the example proposition is too complex for the predicate.

There is AFAIK nobody by the name of (Bill or John) - even if there were, it's not the fact you want to store.

Next step: try to state a
predicate that *can* hold
your propositions.

(and I proceeded to give one way to do that: a predicate which makes it possible to state the example fact).

Summarizing/emphasizing: Why doesn't it fit?

> But I think it involves pushing functionality away
> from the database engine and into the type engine.
>
> Taking this to its logical absurdity, you could have a table with a
> single column of type "logical proposition"! With a function "VAL" say
> that returned either true or false.
>
> It also means the DBMS has to be know this type is a regular expression,
> just as it had to know an empty string meant "missing data". Because
> suppose you enter a value of '/[Bill|John]/' for someone's name. Now if
> you do a query in SQL like:

'/[Bill|John]/' is *not* a value of type name.

> SELECT * FROM emp WHERE name = 'Bill' OR name = 'John'
>
> It's not going to pick up our row, even though logically it should.

According to which logic?
Why would it pick it?
I'ld report a bug if it did.
His name, according to your entered value, is /[Bill|John]/, not Bill, not John.

> You'd have to just know that the column is a regular expression and use
> some function called "Like" say that takes two regexps as arguments and
> returns true if the first regexp is a subset of the second.
>
> SELECT * FROM emp WHERE Like(name, '/[Bill|John]/') = 1
>
> The "Like" function would have to realise that, for example,
> '/[Bill|John]/' and '/[John|Bill]/' meant the same thing. Also you'd
> want Like('/[Bill|John]/', '/[Bill|John|Fred]/') to be true.

I am not aware of accepted semantics of
regexes matching against eachother.
Do you have a reference?

> But the two queries above should logically return identical results.
> So the database engine would have to know about
> and treat regular expression types as "special" and be
> able to relate them to its own boolean logic.

Regex evaluating libraries
(for string/RX matching, not RX/RX)
are easily found.

>>> So does this mean that
>>> the relational model is lacking in terms of being able to store any
>>> expression of first-order logic?
>>
>> No, it just means that the table can't hold data
>> reflecting propositions beyond its predicate.
>> This is a feature.

>
> I agree it's an intended and maybe desirable feature, but I think the
> consequence of this is that the relational model can't be said to truly
> be equivalent to first order logic.

ISTM you are taking predicates as argument, talking second-order logic. But it doesn't really matter, because the example has a problem at a more basic level.

> Sure, a database could always be redesigned to store some boolean
> expression involving sub-parts of the allowed propositions, but you
> don't want to have to redo your design every time a new piece of
> information comes up that doesn't fit.

Say some piece of information does not fit. Now what? We can discard it or redesign. What further alternatives do we have?
An automagic redesigner?
That would mean it did fit after all.

>>> Or does it just mean that the constraints we have implicitly chosen on
>>> allowable propositions (via our table design) won't allow that
>>> proposition to be stored?
>>
>> Explicitly, I'ld say. 

>
> OK, by implicitly I really just meant "not using the SQL CONSTRAINT
> keyword". Bad use of word maybe.

No problem.
BTW I don't think the problem at hand is an SQL problem.

>> A database that holds simple facts
>> is simpler than a database that hold facts of the complexity
>> of what you *did* know about Smith.
>> Imagine telling somebody else about what you know
>> about Smith. You 'll take more words, more
>> complicated - and rarer - grammatical constructs
>> in natural langague to convey your facts to another
>> person than when you simply know that
>> John Smith works at accounting.

>
> I'm not really thinking about the translation from natural language to
> logic though. I'm assuming you have a fact written down unambiguously in
> some logical notation.

Ah! Here we have it! The key here is ambiguity. We want to store certain, but ambiguous data.

>>> Is it even possible to make the database engine aware of this 
>>> interpretation? Maybe if each data type could specify to the DBMS a 
>>> value that meant "data unknown"? 
>>
>> That is *not* easy. Maybe you'ld want to read what
>> Hugh Darwen writes about missing information without NULLS: 
>> http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf 

>
> It seems like they are taking something to its logical absurdities here.
> They're splitting everything down to essentially binary relations to
> deal with unknown information. Which is very similar to my suggestion

:-) your suggestion reminded of the article.

> about dealing with information that is unknown, but still constrained to
> a subset of possible values.
>
> Suppose you allow global NULLs.
>
> Then if you have a row like: (123, NULL, 10, 10,000)
> to mean "Employee 123 is of unknown name, works in dept 10, and has
> salary 10,000"
>
> Does that conflict with the row-level constraint:
> empid!=123 OR (empid=123 AND (name='John' OR name = 'Bill'))?

By stating the fact you want to store as a constraint you are practising second-order logic here.

> I guess that any boolean expression involving NULL would return NULL.
> Do you have 3-valued constraints in SQL? I'm not sure.

...

Let's not go there (NULLs and 3VL, or even SQL) yet, ok? Maybe it is unavoidable to go there while exploring ambiguity, then so be it.
First let's get the basic stuff right. Received on Tue Sep 28 2004 - 23:57:21 CEST

Original text of this message