
Newsgroup: comp.databases.theory

Re: Idempotence and "Replication Insensitivity" are equivalent ?

From: Bob Badour <>
Date: Sat, 23 Sep 2006 20:30:29 GMT
Message-ID: <FPgRg.37439$>

Marshall wrote:

> Bob Badour wrote:

>>Marshall wrote:
>>>Amusing idea: write a machine learning program and feed
>>>it texts for which one knows the sex of the author, in order
>>>to produce a technique for establishing sex from someone's
>>>writings. I imagine it could be modestly successful.
>>Are you suggesting that she uses "that" a lot?

> Not so far off, actually. There has been a fair bit of work done
> in using machine learning techniques to identify authorship, and
> one technique that is surprisingly effective is looking at the
> frequency of use of common words. I wouldn't have thought
> that that technique was worth a damn, but in fact it works
> quite well. When one looks at the Jane Austen novels, she
> consistently uses "the" and "of" in almost exactly a 1:1 ratio,
> whereas Henry David Thoreau uses "the" more than 2:1 over
> "of." Throw together enough of these little features and they
> start to form a kind of textual fingerprint. I wrote some
> software that could consistently pair up all the Arthur
> Conan Doyle novels, and all the Jane Austen novels, and
> correctly distinguish them from Thoreau, Mary Shelley, etc.
> It had the most difficulty distinguishing between Jane Eyre and
> Wuthering Heights; the authors of those two novels were
> sisters, and had grown up and gone to school together.
> It had no trouble distinguishing between Mary Shelley and
> Percy Shelley, wife and husband.
> Oh, and all hail Project Gutenberg as a fine source for
> online texts, whether for reading or analysis.
>>While that might be
>>suggestive of male sex if she were an anglophone, I don't know that it
>>would mean that much for someone that has a different mother tongue.

> That's a good point, actually. While I wouldn't expect that identity
> detection would be affected by English as a second language,
> I would expect that sex detection would be highly culturally
> specific. As far as detecting it by text goes, anyway.

I would say that having a foreign mother tongue would almost certainly confound the results for sexing the writer. Consider the frequency of the verb "to be" in English written by a native Russian speaker.

Received on Sat Sep 23 2006 - 15:30:29 CDT
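
The word-frequency fingerprint Marshall describes in the quoted text can be sketched roughly as below. This is not his software, which the thread does not show. It is a minimal illustration that assumes a hypothetical list of common words (only "the" and "of" come from the post), a simple Manhattan distance between frequency vectors, and nearest-match attribution. The names `FUNCTION_WORDS`, `fingerprint`, `ratio`, `distance`, and `nearest` are all made up for this example.

```python
import re
from collections import Counter

# Hypothetical word list: the post names only "the" and "of";
# the remaining entries are assumptions for illustration.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def fingerprint(text, words=FUNCTION_WORDS):
    """Relative frequency of each chosen word, as a list of floats."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against empty input
    return [counts[w] / total for w in words]


def ratio(text, a="the", b="of"):
    """Ratio of occurrences of word a to word b (e.g. "the" : "of")."""
    counts = Counter(tokenize(text))
    return counts[a] / counts[b] if counts[b] else float("inf")


def distance(fp1, fp2):
    """Manhattan distance between two fingerprints (an assumed metric)."""
    return sum(abs(x - y) for x, y in zip(fp1, fp2))


def nearest(unknown_text, known_texts):
    """Return the label in known_texts whose fingerprint is closest."""
    fp = fingerprint(unknown_text)
    return min(
        known_texts,
        key=lambda label: distance(fp, fingerprint(known_texts[label])),
    )
```

With full novels from Project Gutenberg as the `known_texts` values, `nearest` would attribute an unlabeled text to whichever author's word frequencies it most resembles. A handful of words is unlikely to be enough in practice; the post suggests combining many such features.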
