Re: storing survey answers of different data types

From: lawpoop <lawpoop_at_gmail.com>
Date: Thu, 23 Apr 2009 08:33:50 -0700 (PDT)
Message-ID: <7eeaea8a-0404-4b96-a34a-716f1e2e39d7_at_k41g2000yqh.googlegroups.com>


On Apr 23, 1:43 am, Roy Hann <specia..._at_processed.almost.meat> wrote:

> OK, here's the sketch, and it is only a sketch of the "best" approach. I
> am going to describe it in terms of SQL only because that is what you
> are going to end up using. I hope you will end up thinking my sketch
> is simplistic or trivial. (Looking at a database design and asking
> "but where's the magic?" is probably a sign that you are on the right
> track. There shouldn't be anything magic or mysterious in the database
> that can be discovered/disclosed/decoded only by the application at
> run-time.) I am sure others here will elaborate on my suggestion if
> they feel I've been excessively colloquial/sloppy/terse/SQL-ish, but
> this is a sketch for a newbie in a hurry and I am rushing this out
> before the boss catches me. :-)
>

So, it seems like the magic of this approach is 1. that each questionnaire has its own response table, and 2. that I'm querying the table metadata to find out column types, etc?

> Before I start I should clarify the terminology a bit. To me,
> a questionnaire is a set of questions. A single questionnaire may be
> administered to any number of interviewees. Each interviewee will
> provide a response which is their set of answers to the questions on
> the questionnaire, hence each questionnaire will have multiple
> responses--one from each interviewee.

This sounds about right.

>
> First, note that each row in an SQL table is expected to be
> understood as an assertion of fact. There is expected to be a
> "template" of a sentence associated with the table, into which the
> values from each row can be plugged to form a meaningful statement.
> For instance a simple template for the responses to one questionnaire
> might be "on day <date> the interviewee called <name> answered question
> 1 <answer1> and question 2 <answer2> and question 3 <answer3>...". We
> don't store the template anywhere--maybe we should but we don't.
> However it should be written down somewhere because everything that you
> do subsequently in your database design and your application code has
> to make sense with respect to that template.

We're not making a narrative per se here -- I think this is what you mean by 'template'. We do need to know the question that prompted a response, however. One survey might start out with "Have you used tech support in the past month?", another might begin "On a scale of 1-5, how would you rate your experience on your support call...". It doesn't help us to know that someone responded "yes" to a question, while another person responded "3" to another quesiton, when we don't know what we asked them about.

So, if one wanted to call a list of questions a template, or narrative, or whatever, that's fine, but there's really not more to it than the list of questions.

>
> I know nothing about your business process (and I don't want to--too
> busy),but it seems to me that you probably need just one response table
> per questionnaire to record the responses. If you have thousands of
> questionnaires then you'll need thousands of response tables. No
> worry; there is no ration on tables, and multiple smaller tables will
> make querying fast.

Okay...

>
> I expect different questionnaires to have different kinds of reponses.

That's the whole point. Actually, it goes a bit further, if this isn't what you already meant: each *question* can have different kinds of responses. Each questionnaire will likely have many different data types of responses.

> One might be a set of yes/no answers. Another might be a set scores
> from best to worst or whatever. Another might be set of observations
> (e.g. date, time, temperature, windchill, etc.). And yet another could
> be a mix. So for each questionnaire you create a response table with
> columns of the appropriate type for that questionnaire. Nothing
> mysterious so far. But note that even if all your questionnaires have
> the exact same kind of responses they should still each have their own
> response table. Having the same "shape" is an irrelevant coincidence;
> don't be tempted to confuse or conflate them because that obscures
> what is really going on.

Say a questionnaire has three 1-5 questions ( "On a scale of 1 - 5, please rate..." ), a free-form text answers, two agree/disagree, and a branching yes/no question with three levels. What would be the table structure for that questionnaire?

CREATE TABLE `Questionnaire_052` (
`question_1` ENUM( '1', '2', '3', '4', '5' ) NULL , -- this is a MySQL data type; I suppose another database would have an int column and a constraint to make it 1-5.

`question_2` ENUM( '1', '2', '3', '4', '5' ) NULL ,
`question_3` ENUM( '1', '2', '3', '4', '5' ) NULL ,
`question_4` TEXT NULL ,
`question_5` BOOL NULL, -- might want to use ENUM here, so that the
responses posed to the user are "Yes/No", "Agree/Disagree", etc. Otherwise, have to store that data somewhere else, why not here?
`question_6` BOOL NULL,
`question_7` BOOL NULL,
`question_8` BOOL NULL,
`question_9` BOOL NULL,

) ENGINE = InnoDB ;

?

Please tell me I'm doing this wrong :)

> When you create your response table in your database, the SQL
> engine will automatically create entries in the database catalogue
> tables to describe your table. It *has* to do this, both for the
> obvious reasons, and because the relational model (which SQL pretends
> to respect) insists that the meta-data be represented explicitly in the
> database in a form capable of being queried by anyone. I suspect this
> one point is where 99.99% of practitioners go astray. They don't know
> the meta-data is there and that it is REQUIRED to be there and it is
> absolutely legitimate to assume it will always be there and always be
> public! The meta-data tells both you and the SQL engine what columns
> exist in the table and what each is called and what its data-type is
> and a whole lot of other vital stuff.

So using DDL in day-to-day operations ( adding a questionnaire ) are legal moves in this situation? I can go along with that.

>
> This is where your moderately elaborate application comes in.
>
> You need to write a generic questionnaire-processing application
> which takes a response table name as input, and then interrogates the
> system catalogues for the meta-data to discover the number and type of
> each column in the response table. Once it has that information it can
> insert/update/delete/select using dynamically constructed queries.
> Just what your program has to look like will depend crucially on your
> programming language and your DBMS. If you are using any kind of API
> you are probably already constructing queries dynamically whether you
> know it or not. If you are using something more old-school like Cobol
> or C with embedded SQL you will have to learn to use dynamic SQL, but
> it's actually quite easy.

It sounds like this is something I'm already doing when I create HTML select boxes. MySQL has an 'Enum' column type which is a list of enumerated values. I can query the table definition to get those values and populate the select box.

>
> I haven't talked about a table for the questionnaire itself (i.e. the
> questions). I don't see any need for such a table in the database. It
> is just documentation. Obviously if it is convenient to have
> the questions in the database for the purpose of decorating reports then
> by all means have one.

Survey Website: "Please choose a response, 1 through five, for this first question of the survey, and click the 'Submit button'. To find out what the first question is, refer to the documentation."

Me: "56% of the respondants replied 'Yes' to the first question." Boss: "Okay -- what was the first question again?" Me: "It doesn't matter what the question was; that's just window dressing."
Boss: "These survey results are useless without knowing the questions!"
Me: "Alright, alright, let's look at the documentation..."

Am I following you here? Convenient, but not necessary?

Suppose I wanted to compile a report of the responses. Am I to look up the questions from documentation, type them in, and produce the report? Why not store them electronically? In a database? Why not store them in the same database, in the same questionnaire where they originated? Why not put them right where I need them, so I don't need to refer to documentation when I make a report, but instead I can just throw another column into the query?

I don't really care that someone responded "Yes" to the first question, or that 72% of respondents answered "3" to the fourth question. I *do* care if someone says that they did use tech support in the past month, or if 72% said that their experience with tech support was "average". Why doesn't the question belong in the database, especially when the point of this design is to handle questionnaires that can be completely different?

>
> The reason this approach deserves the label "best" is because it
> exploits SQL instead of working against it or in spite of it. You
> therefore end up with an intuitively simple table design (and an
> intuitive approach to future table design), and a single application
> that can (in principle) handle all future questionnaires without
> modification. Having written the application once, you will be able to
> implement new questionnaires in a few minutes (just create a new
> table), and you have the full power of your SQL engine to query it
> directly and intuitively.

I like the meta-data querying. I'm still having trouble figuring out what the structure of any given response table would look like. And it seems to me to be a no-brainer that the questionnaire goes in the database somewhere.

Thanks, Roy! Now, I'll get started on implementing it. :) Received on Thu Apr 23 2009 - 17:33:50 CEST

Original text of this message