Re: what are keys and surrogates?

From: David BL <davidbl_at_iinet.net.au>
Date: Thu, 10 Jan 2008 22:17:59 -0800 (PST)
Message-ID: <bf913e19-20aa-45fb-9256-875067ed9bdd_at_t1g2000pra.googlegroups.com>


On Jan 11, 2:45 pm, "David Cressey" <cresse..._at_verizon.net> wrote:
> "David BL" <davi..._at_iinet.net.au> wrote in message
>
> news:a8ec9dd4-ab6c-4117-980f-003328677c20_at_e10g2000prf.googlegroups.com...
> > On Jan 11, 4:28 am, "David Cressey" <cresse..._at_verizon.net> wrote:
> > > "David BL" <davi..._at_iinet.net.au> wrote in message
>
> > > news:1d8bc808-c202-45bd-8d04-5ad80bb895ef_at_n22g2000prh.googlegroups.com...
> > > > On Jan 10, 5:05 pm, "David Cressey" <cresse..._at_verizon.net> wrote:
>
> > > > > "David BL" <davi..._at_iinet.net.au> wrote in message
> > > > > Off topic.
>
> > > > > I prefer it quantified as the difference in entropy between the state
> > > > > that includes d and the state that excludes it. I believe that, except
> > > > > for a scale factor, the two measures boil down to the same thing, except
> > > > > for one subtle difference:
>
> > > > > Using entropy as the measure enables one to consider information content
> > > > > as being context sensitive. That is, if d is to be included in some other
> > > > > database e, then the information provided by d to e is the entropy
> > > > > difference between e and e+d (where "+" is suitably defined).
>
> > > > Are you suggesting that when d is included in e, there are fewer states
> > > > available for d?
>
> > > No. Did I say something that implies that?
>
> > Perhaps not. My understanding is that entropy is defined as the
> > logarithm of the number of states available to a system, and tends to
> > be proportional to the number of bits required to represent a
> > particular state. When two *independent* systems s1,s2 are combined
> > into a single overall system s = s1 + s2, the total number of states
> > available to s is the product of the number of states available to s1
> > and s2, and by property of logarithms, the entropy is additive.
>
> > I thought your comment had something to do with coupling between d and
> > e, i.e. there being fewer available states for d in the context of e,
> > which is why you suggested an entropy measure of information content.
>
> Your understanding is the same as mine.
>
> I did intend some sort of coupling, but I don't completely understand your
> response to my comment.
>
> Consider the following scenarios:
>
> Case 1.
> d contains "There is a person named Bob, and his age is 45."
> e contains "There is a person named Bob"
>
> Case 2.
> d contains "There is a person named Bob, and his age is 45."
> e contains "There might or might not be a person named Bob"
>
> Case 3.
> d contains "There is a person named Bob, and his age is 45."
> e contains "There is no person named Bob"
>
> If we ask how much entropy d+e holds, when compared to e alone, we get
> this.
>
> Case 1 provides less additional information than Case 2. Case 3 puts d+e in
> a self-contradictory state.
>
> Then there's Case 4.
>
> d contains "There is a person named Bob, and his age is 45."
> e contains "There is a person named Bob, and his age is 45."
>
> d+e adds no information to e (wrt the subset of data under discussion)

This doesn't really make sense to me, because I think of the entropy of a system as relating to the number of states *available* to the system, and I don't know how to reconcile that with your discussion of particular states. If you know a priori that a system is in a specific state, then its entropy is zero!
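
To make my mental model concrete, here is a rough sketch in Python (purely
illustrative; the state counts are invented) of entropy as the logarithm of
the number of available states, its additivity for independent systems, and
why a system known to be in one specific state has zero entropy:

    import math

    def entropy_bits(num_states):
        # Entropy of a system, taken as the base-2 logarithm of the number
        # of states available to it (uniform distribution assumed).
        return math.log2(num_states)

    # Two *independent* systems: the combined system can be in any pairing
    # of their states, so the state counts multiply...
    s1_states = 8    # hypothetical count for system s1
    s2_states = 16   # hypothetical count for system s2
    combined = s1_states * s2_states

    # ...and, by the property of logarithms, the entropies add.
    assert entropy_bits(combined) == entropy_bits(s1_states) + entropy_bits(s2_states)

    # A system known a priori to be in one specific state has only one
    # available state, hence zero entropy.
    print(entropy_bits(1))   # 0.0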

Are you saying that because some cases are contradictory there are fewer available states in d+e than there would be if d and e were independent?
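
If that is what you mean, the following toy calculation (again in Python,
with invented numbers) captures the kind of coupling I have in mind: a
consistency constraint between d and e removes combinations, so the entropy
of d+e is less than the sum of the individual entropies:

    import math

    # Suppose d has 4 available states and e has 4 available states.
    # If they were independent, the joint system d+e would have
    # 4 * 4 = 16 states, i.e. log2(16) = 4 bits.
    independent_joint = 4 * 4

    # Now suppose a constraint such as "e must not contradict d" rules out
    # 10 of those 16 combinations, leaving 6 available states.
    coupled_joint = 6

    print(math.log2(independent_joint))  # 4.0 bits = 2 + 2
    print(math.log2(coupled_joint))      # ~2.58 bits, less than 4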

BTW my knowledge of entropy stems from a unit in Statistical Mechanics in second-year Physics over 20 years ago, so my understanding may differ from the concept of entropy as defined in information theory.
