Re: Query Bias

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Thu, 08 Jul 2010 18:10:09 -0300
Message-ID: <4c363eb9$0$11833$9a566e8b_at_news.aliant.net>


Joe Thurbon wrote:

> Hi folks,
>
> When JOG used to post here regularly he talked quite a bit about query
> bias. It also came up recently on the TTM mailing list. It also gets a
> vague nod on the wikipedia page on database normalisation.
>
> Does the term have any formal, well defined or shared meaning?
>
> I think I understand the intuition, that in some sense an 'unbiased'
> data model minimises some sort of complexity metric aggregated across
> some set of queries. (I think that 'some set' might actually be
> replaced by 'all' for some people).
>
> But complexity can mean lots of things:
> How simple is the query to write?
> How simple is the query to execute?
> How simple is the query to understand?
>
> What does it mean to the folk here?
>
> It's only a curiosity to me, but I'd be interested to hear what people
> think.
>
> Cheers,
> Joe

Query bias is more a property of the data model or language than of a design. Redundant, implicit or asymmetric language features or needless complexity in the data model mean that when one chooses a particular redundant, implicit, asymmetric or needless feature in a design, one makes some queries easy to express and other queries difficult or impossible to express.

Some languages or data models introduce all sorts of unecessary structural elements like pointers, records, sets, arrays, parents, children.

As an example, if orders have an array of child order items, there may be no way to access order items without going through orders. A query to determine the most expensive order item in the database becomes relatively difficult to express. (Note: the problem statement involves only order items not orders.)

In some cases, the bias can relate to performance. As a general rule, when one changes the physical layout of the data, one makes some queries faster and some queries slower. That can be quite intentional and desired. If the logical formalism (language) has redundancies that affect performance, too often naive developers will choose the "fastest" option for the first problem they have to solve without any regard for future needs.

In the above orders/order items example, making order items a child array of order might cluster order items with orders. While this will make it extremely fast to access an order's order items once one has navigated to the order, scanning all orders will be slowed--possibly by a wide margin. Received on Thu Jul 08 2010 - 23:10:09 CEST

Original text of this message