Re: Best way to design table to store attributes?

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sat, 24 Jan 2009 14:57:09 -0400
Message-ID: <497b648a$0$5460$9a566e8b_at_news.aliant.net>


paul c wrote:

> patrick61z_at_yahoo.com wrote:
> 

>> On Jan 22, 3:53 pm, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
>>
>>> At a more basic level, though, are you sure you have correctly modelled
>>> your problem? 150 independent booleans creates a state machine with
>>> somewhere around 10^45 states. That's a big state machine. Without any
>>> sort of transition constraints, that creates a fully connected state
>>> machine with 10^45 states and somewhere around 10^90 allowable
>>> transitions.
>>>
>>> Those are big numbers, and it seems unlikely you really need such an
>>> unwieldy state machine for each row of your table.
>>
>> I've often wondered about this line of thinking, that if your system
>> isn't implemented in a purely relational methodology, you have no
>> choice but to implement it as a state machine.

What a stupid thing to say... Regardless of whether one implements the state machine relationally or any other way, the state machine remains.

>> Just because the original design used what appears to be a perfectly
>> decent instance of repeating groups does not mean the designer is then
>> condemned to the complexity of Mr. Badour's imaginary zillion states.

Who the fuck said anything about repeating groups? Apparently, Patrick has dire problems comprehending relatively simple written English. Independent variables are independent, not repeating. Duh! ::rolls eyes::

>> While the DATA in these tables could have a zillion states, that's like
>> saying that an ssn field has 10^9 states because it has nine
>> characters that can range from zero to nine.

Yes, exactly like that. One of my high school French teachers would have responded to that with "You have a fine grasp of the obvious." Except elsewhere Patrick utterly fails to demonstrate any such grasp.

I suggest one stop to ponder that a billion seems like a large number but it is infinitesimally small compared to 10^45, which is itself infinitesimally small compared to 10^90.
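For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes, as in the quoted post, 150 independent booleans with no transition constraints, and it counts a transition as any ordered (from, to) pair of states, self-transitions included. The names `BOOLEANS` and `magnitude` are mine, for illustration only.

```python
BOOLEANS = 150  # independent boolean attributes, per the quoted post

states = 2 ** BOOLEANS      # every combination of 150 true/false values
transitions = states ** 2   # ordered (from, to) pairs in a fully connected machine
ssn_states = 10 ** 9        # nine decimal digits, as in the ssn comparison


def magnitude(n):
    """Return the order of magnitude (floor of log10) of a positive integer."""
    return len(str(n)) - 1


print("states      ~ 10^%d" % magnitude(states))
print("transitions ~ 10^%d" % magnitude(transitions))
print("ssn values  ~ 10^%d" % magnitude(ssn_states))
```

Python integers are arbitrary precision, so the counts are exact rather than floating-point estimates.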

>> Yes, there may be many
>> many possible values, but simple payroll programs for example are not
>> burdened with a zillion states as either entire ranges of values are
>> handled identically, or the unneeded values are simply never entered
>> into storage.

That depends on the payroll program. ADP regularly handles on the order of 10^7 social security numbers, and the IRS handles on the order of 10^8 or 10^9 of them, once one stops and considers that even non-resident aliens like myself sometimes have one.

Nevertheless, 10^45 states is a vast number of states. It is on the order of the mass of the sun in picograms. However, Carl's explanation suggests the first thing anyone will do with that data is dimension reduction.

>> C'mon people, programming 101 here.
>>
>> Does anybody have knowledge of the origin of this silly "state
>> machine" argument?

Yes, of course, we all do. It originated in Patrick's skull as a figment of his imagination. I doubt any argument exists beyond that scope.

>> I'm really curious why it carries any weight in
>> these sort of discussions but I also imagine that it originally was a
>> valid point offered by someone a bit more academically inclined than
>> Mr. Badour, and I'm genuinely interested in reading about it.

Since it carries no weight, the rest seems rather pointless.

<snip>

> I'd say the point of the 'state machine' comparison was likely to
> illustrate that the original post completely lacked 
> 'actionable' requirements, ie., the 'requirement', as stated, was silly 
> (no offence intended to the OP who qualified his questions).

I respectfully disagree. The point was to ask Carl to consider an observation and to judge for himself whether his requirements are reasonable in that light.

> I've seen 
> people try to pass off such as real requirements more times than I want 
> to remember.  Note that the original poster hadn't clarified his 
> requirement until later.  A later message showed that he's trying to 
> deal with a survey result.  But since a marketing operation seems 
> involved, there are further applications, eg., 'who bought a trip and 
> might become a repeat customer?'.

I suspect those considerations don't affect the design of the part of the data model Carl asked about. Questions that come to my mind are: Why booleans? Why not "likes", "dislikes" and "didn't answer"? Or even a numeric scale?

Can people take the survey more than once? Is the format of the survey fixed for all time? Or might one use different versions of the survey at different times, for example by adding or removing questions, or by presenting additional questions to some cohorts? If that is the case, one might need another value like "not presented" as distinct from "didn't answer".
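To make the distinction concrete, here is a rough sketch in Python of a four-valued answer instead of a boolean. Everything in it is hypothetical: the `Answer` type, the `classify` helper and the question ids are illustrations, not anything from Carl's actual design.

```python
from enum import Enum


class Answer(Enum):
    LIKES = "likes"
    DISLIKES = "dislikes"
    DID_NOT_ANSWER = "didn't answer"
    NOT_PRESENTED = "not presented"


def classify(all_questions, presented, answers):
    """Assign one Answer to every question ever used in any survey version.

    all_questions: ids of every question across all versions
    presented:     ids actually shown to this respondent
    answers:       id -> True (likes) / False (dislikes) for answered questions
    """
    result = {}
    for q in all_questions:
        if q not in presented:
            result[q] = Answer.NOT_PRESENTED
        elif q not in answers:
            result[q] = Answer.DID_NOT_ANSWER
        else:
            result[q] = Answer.LIKES if answers[q] else Answer.DISLIKES
    return result


example = classify(
    all_questions=["q1", "q2", "q3", "q4"],
    presented={"q1", "q2", "q3"},
    answers={"q1": True, "q2": False},
)
```

In this example `q3` was shown but skipped, while `q4` was never shown to the respondent at all. A plain boolean, or a single null, cannot tell those two cases apart.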

> Besides, if 1.2 million respondents, times 150 questions, is considered
> a big db, I have to laugh.

Hardware available in 1994 handled much larger databases quite well. Capacities and speeds are much higher today.

150 dimensions seems unwieldy, except that I expect the data will be the raw input for some sort of dimension reduction or regression analysis.

Carl seems concerned about the labor of creating 150 columns in his table. I don't see why it would involve any more labor than creating an enumerated type of some sort with 150 values, which is what his alternative designs seem to involve.
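As a rough illustration of how little labor the 150 columns need, the following Python sketch generates the table definition in a loop and creates it in an in-memory SQLite database. The table name `survey_response`, the column names `q001` through `q150` and the choice of SQLite are all assumptions for the sake of the example; Carl's DBMS and naming will differ.

```python
import sqlite3

QUESTION_COUNT = 150  # one column per survey question

# Build "q001 INTEGER, q002 INTEGER, ..." rather than typing it by hand.
columns = ", ".join(
    "q%03d INTEGER" % i for i in range(1, QUESTION_COUNT + 1)
)
ddl = (
    "CREATE TABLE survey_response "
    "(respondent_id INTEGER PRIMARY KEY, %s)" % columns
)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)

# PRAGMA table_info yields one row per column; index 1 is the column name.
column_names = [
    row[1] for row in conn.execute("PRAGMA table_info(survey_response)")
]
print(len(column_names))  # respondent_id plus the 150 question columns
```

The same loop could just as easily emit 150 rows for a lookup table of question ids, which is the point: either design takes about the same effort to set up.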

<snip>

Received on Sat Jan 24 2009 - 19:57:09 CET