Re: Question on Structuring Product Attributes

From: Jan Hidders <hidders_at_gmail.com>
Date: Tue, 8 Oct 2013 19:51:35 +0200
Message-ID: <52544627$0$1659$e4fe514c_at_dreader35.news.xs4all.nl>


On 2013-10-08 12:17:15 +0000, Nicola said:

> In article <5251dfef$0$574$e4fe514c_at_dreader34.news.xs4all.nl>,
>  Jan Hidders <hidders_at_gmail.com> wrote:
> 
>> On 2013-10-05 15:48:36 +0000, Nicola said:
>> 

>>> In article <524d7370$0$618$e4fe514c_at_dreader34.news.xs4all.nl>,
>>> Jan Hidders <hidders_at_gmail.com> wrote:
>>>
>>>> On 2013-10-02 18:55:36 +0000, Nicola said:
>>>> 
>>>>> In article <1601143370402385874.040013hidders-gmail.com_at_news.xs4all.nl>,
>>>>> Jan Hidders <hidders_at_gmail.com> wrote:
>>>>> 
>>>>>> Derek Asirvadem <derek.asirvadem_at_gmail.com> wrote:
>>>>>>> This one should not pass unnoticed.
>>>>>>> 
>>>>>>> On Monday, 18 February 2013 09:09:15 UTC+11, Eric  wrote:
>>>>>>>> 
>>>>>>>> To quote from Codd[1970]:
>>>>>>>> 
>>>>>>>> "A firstorder predicate calculus suffices
>>>>>>>> if the collection of relations is in normal form."
>>>>>>> 
>>>>>>> Codd specifies what that Normal Form is, that reference is not merely a
>>>>>>> generic  or general term.
>>>>>> 
>>>>>> Indeed. Also note that Codd's claim is not unproblematic. Even if you
>>>>>> restrict yourself to the flat relational model, first order logic cannot
>>>>>> express many practical queries. So what exactly does he mean by
>>>>>> "sufficient" here?
>>>>> 
>>>>> I interpret that quote as follows: "If we allow relations as attribute
>>>>> values,
>>>>> we need (at least) a second-order relational calculus". In this sense,
>>>>> first-order logic "suffices" when relations are "flat" (we don't need to
>>>>> quantify over relations). Of course I agree with you, many interesting
>>>>> queries
>>>>> cannot be expressed in FOL, but you can always design a query language
>>>>> with
>>>>> additional power (SQL is an example).
>>>> 
>>>> Agreed, that is indeed probably what he meant. But I guess I'm trying
>>>> to argue how weak that argument actually is. Sure, keeping things
>>>> first-order looks like something that keeps it simple, but does it
>>>> really? If you need extensions anyway, how bad then, relatively
>>>> speaking, is it to move to a higher-order setting?

>>>
>>> Loosely speaking, the complexity, measured as a function of the size of the
>>> database (the so-called “data complexity”), of answering a query of
>>> relational
>>> calculus is in L (solvable in deterministic logarithmic space), so it is
>>> tractable. If the query language is extended with existential second-order
>>> quantification, the problem becomes NP-complete. Adding only transitive
>>> closures
>>> makes the problem complete for NL (still in P). These results (and others)
>>> can
>>> be found in “The complexity of relational query languages” by M. Vardi.
>>> This
>>> should partially answer your latter question.
>> 
>> I'm a little bit insulted that you think I did not know that. :-)
> 
> Sorry, I've realized that I was not adding much to the discussion at the same
> time as I've hit "Post"... That's what happens when one replies without
> forethought :)
> 
>> There are similar results for the Nested Relational Calculus, so I do not see
>> that as a very convincing reason to stick to a first order data model.
>> But maybe that's not what you meant.
>> […]
>> Computational complexity is not the only thing that determines where a
>> database query or integrity language is an effective one. I'm not
>> saying these results do not matter or should not be taken into account
>> when desiging these languages, but ultimately the proof of the pudding
>> is in the eating, not in theorizing about it.
> 
> If you mean that theoretical bounds on the complexity do not say much about the
> actual behavior of an algorithm, I thoroughly agree (if not, I'm afraid I don't
> understand what “effective” means to you).

Close. I define a good query language as one that makes the DBMS a better DBMS(*). Computational complexity is only one aspect, and not necessarily the decisive one. A language can be computationally expensive but still good because it is easy to use and understand, and it is implemented such that a well-defined subset is efficient, and still the more expensive queries can be delegated to the DBMS if you want, et cetera.

I know this is comp.databases.theory, but let us not forget that theory is an approximation of reality, simplified enough so the theoreticians can have their theorems. :-)

(*) Not a very well-defined concept, I know, but that's besides the point.

  • Jan Hidders
Received on Tue Oct 08 2013 - 19:51:35 CEST

Original text of this message