Re: storing survey answers of different data types

From: Bob Badour <>
Date: Thu, 23 Apr 2009 14:32:22 -0300
Message-ID: <49f0a629$0$5501$>

lawpoop wrote:

> On Apr 23, 1:43 am, Roy Hann <specia..._at_processed.almost.meat> wrote:

>>OK, here's the sketch, and it is only a sketch of the "best" approach. I
>>am going to describe it in terms of SQL only because that is what you
>>are going to end up using. I hope you will end up thinking my sketch
>>is simplistic or trivial. (Looking at a database design and asking
>>"but where's the magic?" is probably a sign that you are on the right
>>track. There shouldn't be anything magic or mysterious in the database
>>that can be discovered/disclosed/decoded only by the application at
>>run-time.) I am sure others here will elaborate on my suggestion if
>>they feel I've been excessively colloquial/sloppy/terse/SQL-ish, but
>>this is a sketch for a newbie in a hurry and I am rushing this out
>>before the boss catches me. :-)

> So, it seems like the magic of this approach is 1. that each
> questionnaire has its own response table, and 2. that I'm querying the
> table metadata to find out column types, etc?
>>Before I start I should clarify the terminology a bit.  To me,
>>a questionnaire is a set of questions.  A single questionnaire may be
>>administered to any number of interviewees.  Each interviewee will
>>provide a response which is their set of answers to the questions on
>>the questionnaire, hence each questionnaire will have multiple
>>responses--one from each interviewee.

> This sounds about right.
>>First, note that each row in an SQL table is expected to be
>>understood as an assertion of fact.  There is expected to be a
>>"template" of a sentence associated with the table, into which the
>>values from each row can be plugged to form a meaningful statement.
>>For instance a simple template for the responses to one questionnaire
>>might be "on day <date> the interviewee called <name> answered question
>>1 <answer1> and question 2 <answer2> and question 3 <answer3>...".  We
>>don't store the template anywhere--maybe we should but we don't.
>>However it should be written down somewhere because everything that you
>>do subsequently in your database design and your application code has
>>to make sense with respect to that template.

> We're not making a narrative per se here -- I think this is what you
> mean by 'template'.

He's referring to the external predicate. In that sense, you are making a narrative whether you like it or not.

> So, if one wanted to call a list of questions a template, or
> narrative, or whatever, that's fine, but there's really not more to it
> than the list of questions.

It's called an external predicate. Relations formally represent predicates: sets of true statements. The formalism doesn't know anything outside of its manipulation rules. People, though, understand the predicates in terms of real everyday things.

The dbms doesn't know what a question is or what an answer is or what it means to answer a question. It just has symbols it can manipulate. That is the internal predicate.

If a relation has an age in years, the dbms doesn't care whether it is the age of the person or the age of his favourite cheese. Roy is pointing out that it is dodgy at best to mix ages of people with ages of cheese.

>>I know nothing about your business process (and I don't want to--too
>>busy),but it seems to me that you probably need just one response table
>>per questionnaire to record the responses.  If you have thousands of
>>questionnaires then you'll need thousands of response tables.  No
>>worry; there is no ration on tables, and multiple smaller tables will
>>make querying fast.

> Okay...
>>I expect different questionnaires to have different kinds of reponses.

> That's the whole point. Actually, it goes a bit further, if this isn't
> what you already meant: each *question* can have different kinds of
> responses. Each questionnaire will likely have many different data
> types of responses.

No, he is saying that the response to a questionnaire is a large compound statement:

Questionnaire #1

1. What is your name?
2. Where do you live?
3. What is your favourite type of cheese?
4. How aged do you prefer your cheese?

My name is Bob, I live in Canada, and my favourite cheddar is 4 years old. {name=Bob,place=Canada,favourite=cheddar,age=4}

is different from

Questionnaire #2

1. What is your name?
2. Where were you born?
3. What is your favourite breed of dog?
4. How old are you?

My name is Bob, I was born in Canada,my faviourate breed is Australian Shepherd, and I am 43 years old.
{name=Bob,place=Canada,favourite=Australian Shepherd,age=43}

Two different questionnaires. Two different relations.

Both are different from:

Questionnaire #3

1. What is your name?
2. Where do you own property?
3. How old is the principle structure at that property?

My name is Bob, I have property in Canada, my house is 114 years old. {name=Bob,place=Canada,age=114)

Different questionnaires. Different tables. A column for each question. A row for each respondent. All described neatly in the system catalog.

>>One might be a set of yes/no answers.  Another might be a set scores
>>from best to worst or whatever.  Another might be set of observations
>>(e.g. date, time, temperature, windchill, etc.).  And yet another could
>>be a mix.  So for each questionnaire you create a response table with
>>columns of the appropriate type for that questionnaire.  Nothing
>>mysterious so far.  But note that even if all your questionnaires have
>>the exact same kind of responses they should still each have their own
>>response table.  Having the same "shape" is an irrelevant coincidence;
>>don't be tempted to confuse or conflate them because that obscures
>>what is really going on.

> Say a questionnaire has three 1-5 questions ( "On a scale of 1 - 5,
> please rate..." ), a free-form text answers, two agree/disagree, and a
> branching yes/no question with three levels. What would be the table
> structure for that questionnaire?
> CREATE TABLE `Questionnaire_052` (
> `question_1` ENUM( '1', '2', '3', '4', '5' ) NULL , -- this is a MySQL
> data type; I suppose another database would have an int column and a
> constraint to make it 1-5.
> `question_2` ENUM( '1', '2', '3', '4', '5' ) NULL ,
> `question_3` ENUM( '1', '2', '3', '4', '5' ) NULL ,
> `question_4` TEXT NULL ,
> `question_5` BOOL NULL, -- might want to use ENUM here, so that the
> responses posed to the user are "Yes/No", "Agree/Disagree", etc.
> Otherwise, have to store that data somewhere else, why not here?
> `question_6` BOOL NULL,
> `question_7` BOOL NULL,
> `question_8` BOOL NULL,
> `question_9` BOOL NULL,
> ) ENGINE = InnoDB ;
> ?
> Please tell me I'm doing this wrong :)

I would avoid NULL if possible.

>>When you create your response table in your database, the SQL
>>engine will automatically create entries in the database catalogue
>>tables to describe your table.  It *has* to do this, both for the
>>obvious reasons, and because the relational model (which SQL pretends
>>to respect) insists that the meta-data be represented explicitly in the
>>database in a form capable of being queried by anyone.  I suspect this
>>one point is where 99.99% of practitioners go astray.  They don't know
>>the meta-data is there and that it is REQUIRED to be there and it is
>>absolutely legitimate to assume it will always be there and always be
>>public!  The meta-data tells both you and the SQL engine what columns
>>exist in the table and what each is called and what its data-type is
>>and a whole lot of other vital stuff.

> So using DDL in day-to-day operations ( adding a questionnaire ) are
> legal moves in this situation? I can go along with that.

To be fair, it may present some concurrency issues depending on the backend and the frequency with which questionnaires and versions of questionnaires get created.

(You realize each version needs its own table. Right?)

>>This is where your moderately elaborate application comes in.
>>You need to write a generic questionnaire-processing application
>>which takes a response table name as input, and then interrogates the
>>system catalogues for the meta-data to discover the number and type of
>>each column in the response table.  Once it has that information it can
>>insert/update/delete/select using dynamically constructed queries.
>>Just what your program has to look like will depend crucially on your
>>programming language and your DBMS.  If you are using any kind of API
>>you are probably already constructing queries dynamically whether you
>>know it or not. If you are using something more old-school like Cobol
>>or C with embedded SQL you will have to learn to use dynamic SQL, but
>>it's actually quite easy.

> It sounds like this is something I'm already doing when I create HTML
> select boxes. MySQL has an 'Enum' column type which is a list of
> enumerated values. I can query the table definition to get those
> values and populate the select box.
>>I haven't talked about a table for the questionnaire itself (i.e. the
>>questions).  I don't see any need for such a table in the database. It
>>is just documentation.  Obviously if it is convenient to have
>>the questions in the database for the purpose of decorating reports then
>>by all means have one.

> Survey Website: "Please choose a response, 1 through five, for this
> first question of the survey, and click the 'Submit button'. To find
> out what the first question is, refer to the documentation."
> Me: "56% of the respondants replied 'Yes' to the first question."
> Boss: "Okay -- what was the first question again?"
> Me: "It doesn't matter what the question was; that's just window
> dressing."
> Boss: "These survey results are useless without knowing the
> questions!"
> Me: "Alright, alright, let's look at the documentation..."
> Am I following you here? Convenient, but not necessary?

Not necessary in the dbms. It could just as easily be in php, html, a smarty template, etc. Where it belongs depends on all of your requirements.

> Suppose I wanted to compile a report of the responses. Am I to look up
> the questions from documentation, type them in, and produce the
> report? Why not store them electronically? In a database? Why not
> store them in the same database, in the same questionnaire where they
> originated? Why not put them right where I need them, so I don't need
> to refer to documentation when I make a report, but instead I can just
> throw another column into the query?

All excellent questions. I assume your requirements answer them.

> I don't really care that someone responded "Yes" to the first
> question, or that 72% of respondents answered "3" to the fourth
> question. I *do* care if someone says that they did use tech support
> in the past month, or if 72% said that their experience with tech
> support was "average". Why doesn't the question belong in the
> database, especially when the point of this design is to handle
> questionnaires that can be completely different?

Then don't name your columns question_1 and question_4. Name them [used tech support] and [tech support rating]

>>The reason this approach deserves the label "best" is because it
>>exploits SQL instead of working against it or in spite of it.  You
>>therefore end up with an intuitively simple table design (and an
>>intuitive approach to future table design), and a single application
>>that can (in principle) handle all future questionnaires without
>>modification.  Having written the application once, you will be able to
>>implement new questionnaires in a few minutes (just create a new
>>table), and you have the full power of your SQL engine to query it
>>directly and intuitively.

> I like the meta-data querying. I'm still having trouble figuring out
> what the structure of any given response table would look like. And it
> seems to me to be a no-brainer that the questionnaire goes in the
> database somewhere.
> Thanks, Roy! Now, I'll get started on implementing it. :)
Received on Thu Apr 23 2009 - 19:32:22 CEST

Original text of this message