Re: storing survey answers of different data types

From: Roy Hann <specially_at_processed.almost.meat>
Date: Thu, 23 Apr 2009 03:43:31 -0500
Message-ID: <15adnVNRytKut23UnZ2dneKdnZydnZ2d_at_pipex.net>


cimode_at_hotmail.com wrote:

> [Snipped]

>> Oh what the heck. I promise! :)
> Be careful what you wish for...
>
> You just swallowed the red pill in Matrix...

OK, here's the sketch, and it is only a sketch of the "best" approach. I am going to describe it in terms of SQL only because that is what you are going to end up using. I hope you will end up thinking my sketch is simplistic or trivial. (Looking at a database design and asking "but where's the magic?" is probably a sign that you are on the right track. There shouldn't be anything magic or mysterious in the database that can be discovered/disclosed/decoded only by the application at run-time.) I am sure others here will elaborate on my suggestion if they feel I've been excessively colloquial/sloppy/terse/SQL-ish, but this is a sketch for a newbie in a hurry and I am rushing this out before the boss catches me. :-)

Before I start I should clarify the terminology a bit. To me, a questionnaire is a set of questions. A single questionnaire may be administered to any number of interviewees. Each interviewee will provide a response which is their set of answers to the questions on the questionnaire, hence each questionnaire will have multiple responses--one from each interviewee.

First, note that each row in an SQL table is expected to be understood as an assertion of fact. There is expected to be a "template" of a sentence associated with the table, into which the values from each row can be plugged to form a meaningful statement. For instance a simple template for the responses to one questionnaire might be "on day <date> the interviewee called <name> answered question 1 <answer1> and question 2 <answer2> and question 3 <answer3>...". We don't store the template anywhere--maybe we should but we don't. However it should be written down somewhere because everything that you do subsequently in your database design and your application code has to make sense with respect to that template.

I know nothing about your business process (and I don't want to--too busy),but it seems to me that you probably need just one response table per questionnaire to record the responses. If you have thousands of questionnaires then you'll need thousands of response tables. No worry; there is no ration on tables, and multiple smaller tables will make querying fast.

I expect different questionnaires to have different kinds of reponses. One might be a set of yes/no answers. Another might be a set scores from best to worst or whatever. Another might be set of observations (e.g. date, time, temperature, windchill, etc.). And yet another could be a mix. So for each questionnaire you create a response table with columns of the appropriate type for that questionnaire. Nothing mysterious so far. But note that even if all your questionnaires have the exact same kind of responses they should still each have their own response table. Having the same "shape" is an irrelevant coincidence; don't be tempted to confuse or conflate them because that obscures what is really going on.

When you create your response table in your database, the SQL engine will automatically create entries in the database catalogue tables to describe your table. It *has* to do this, both for the obvious reasons, and because the relational model (which SQL pretends to respect) insists that the meta-data be represented explicitly in the database in a form capable of being queried by anyone. I suspect this one point is where 99.99% of practitioners go astray. They don't know the meta-data is there and that it is REQUIRED to be there and it is absolutely legitimate to assume it will always be there and always be public! The meta-data tells both you and the SQL engine what columns exist in the table and what each is called and what its data-type is and a whole lot of other vital stuff.

This is where your moderately elaborate application comes in.

You need to write a generic questionnaire-processing application which takes a response table name as input, and then interrogates the system catalogues for the meta-data to discover the number and type of each column in the response table. Once it has that information it can insert/update/delete/select using dynamically constructed queries. Just what your program has to look like will depend crucially on your programming language and your DBMS. If you are using any kind of API you are probably already constructing queries dynamically whether you know it or not. If you are using something more old-school like Cobol or C with embedded SQL you will have to learn to use dynamic SQL, but it's actually quite easy.

I haven't talked about a table for the questionnaire itself (i.e. the questions). I don't see any need for such a table in the database. It is just documentation. Obviously if it is convenient to have the questions in the database for the purpose of decorating reports then by all means have one.

The reason this approach deserves the label "best" is because it exploits SQL instead of working against it or in spite of it. You therefore end up with an intuitively simple table design (and an intuitive approach to future table design), and a single application that can (in principle) handle all future questionnaires without modification. Having written the application once, you will be able to implement new questionnaires in a few minutes (just create a new table), and you have the full power of your SQL engine to query it directly and intuitively.

I hesitate to bring this up at all, but you're going to hear about it sooner or later: there is another, hugely popular approach that tries to do the same thing as what I have described. It gets aggrandized with an "official-sounding" name. It is called the entity-attribute-value model, or EAV for short. You can google the history of this group for explanations of why it is hideously wrong-headed. Don't be taken in by the erudite-sounding name or the mountains of erudite-looking papers and articles you will find. They are ignorant tosh. Maybe it made some kind of sense in the early 1970s, but its day is long past.

-- 
Roy
Received on Thu Apr 23 2009 - 10:43:31 CEST

Original text of this message