Queries and abstraction - longish

From: Joe Thurbon <nntp_at_thurbon.com>
Date: Thu, 07 May 2015 09:20:05 +1000
Message-ID: <mie7j6$pa1$1_at_speranza.aioe.org>



I have a question which comes from a practical issue, but which I think has a theoretical aspect. The summary of the issue is: what tools do we have to make abstractions where identical reports/queries/processing are made over different subsets/slices of data.

Please allow me to paint a scenario. I'm sorry that it's a little complicated, but it's the simplest one that I can devise to explain my scenario adequately. I should point out that I'm not suggesting this is a good schema in general, it's tailored purely to get my point across (I hope persuasively).

Consider a database that captures survey results where respondents rate a business's products on a number of scales. So, for example, a survey might look like:

--

What product are you rating? <user selects from a drop-down list>

Q1. How did you rate the product's design: <user selects 1 - 10>
Q2: How did you rate the product's cost:   <user selects 1 - 5>
Q3: How did you rate the product's value:  <user selects 1 - 3>

--

A very simple schema with three tables that captures this information might be (where * represents a primary key)

--

table QUESTIONS(*question_number, question_text, size_of_scale)
table RESPONSES(*response_number, product_being_identified)
table ANSWERS  (*response_number, *question_number, point_on_scale)

--

The company wishes to understand on both a per-product and overall basis how their customers rate their products. So a couple of views are useful to put together. These view's aren't super-complicated, although I may have the details wrong - it doesn't matter for the sake of this discussion.

So, let's assume we now have two views that are defined:



create view AVERAGE_PRODUCT_SCORES as
      select
          product_being_identified,
          question_number,
          question_text,
          avg(point_on_scale/size_of_scale) as average_scale
      from QUESTIONS natural join RESPONSES natural join ANSWERS
        group by
        product_being_identified,
        question_number,
        question_text;

create view AVERAGE_OVERALL_SCORES as
      select
          question_number,
          question_text,
          avg(average_scale) as average_scale
      from AVERAGE_PRODUCT_SCORED
      group by question_number, question_text;

All good so far - this is quite simple.

Now consider the following scenario. People who complete the survey can also provide bits of demographic information, such as their age, gender, country of birth, etc.

Now the company would like to understand the average product scores based on a variety of demographic information. So, they might want to aggregate responses for women aged 25-30 born in France. This targetted aggregation is the focus of my scenario.

Imagine some sort of demographics table like

table DEMOGRAPHICS
(

      response_number,
      demographic_variable,
      demographic_value

);

Please excuse the EAV nature of that table, it's for demonstration only, and the same issue obtains if we consider one table for each demographic variable.

Consider the AVERAGE_PRODUCT_SCORES definition. Rather than join against all RESPONSES, what I want to do is consider only RESPONSES that meet sort of demographic criteria. So the definition would be, on a query-by-query basis:

      select
          product_being_identified,
          question_number,
          question_text,
          avg(point_on_scale/size_of_scale) as average_scale
      from
        QUESTIONS
        natural join
           (
              ***
              some query based on RESPONSES and DEMOGRAPHICS
              ***
           ) responses
        natural join ANSWERS
        group by
        product_being_identified,
        question_number,
        question_text;


In some sense, all the bits outside the *** in the above query represent some sort of abstract processing that happens on "target responses". As that bit gets more and more complex, I want to avoid duplicating it.

O.K. So my question now is:

Is there some "relational" way (e.g. not using stored procedures) of capturing that abstract processing such that I can re-use it against different target responses?

The procedural/functional programming mechanism is pretty straight forward, so I'm wondering if there's some sort of relational equivalent.

Kind regards,
Joe Received on Thu May 07 2015 - 01:20:05 CEST

Original text of this message