Re: Stochastic Queries
Date: Thu, 20 Sep 2007 16:01:56 -0700
Message-ID: <1190329316.117411.111390_at_n39g2000hsh.googlegroups.com>
>> I suggest the OP read "Massive Stochastic Testing of SQL" by Don Slutz of Microsoft. <<
I need to get a copy, too.
>> I am VERY interested in understanding why these 'proprietary kludges' (either to select random samples, or to generate random numbers) 'have problems' with skewed distributions. (Disclosure - I worked on one.) <<
The SQL products I have worked with use traditional linear congruential algorithms, which they inherited from C and UNIX. As you draw larger and larger samples, you get skew and duplicate values. Knuth, Vol. 2, Chapter 3 gives a good history of the subject, with his own remarks on it.
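The duplicates and skew can be seen with a toy linear congruential generator, x[k+1] = (a * x[k] + c) mod m. A full-period LCG can never produce more than m distinct values, so any larger sample must repeat. With a power-of-two modulus, the low-order bits also cycle with very short periods. This is a minimal sketch: the 16-bit parameters are illustrative and are not taken from any particular SQL product.

```python
from itertools import islice

def lcg(seed, a, c, m):
    """Yield an endless stream from x[k+1] = (a * x[k] + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def period(seed, a, c, m):
    """Return the cycle length: draws until the state first repeats."""
    seen = {}
    for i, x in enumerate(lcg(seed, a, c, m)):
        if x in seen:
            return i - seen[x]
        seen[x] = i

# Illustrative parameters with modulus 2**16 (not from any SQL product).
A, C, M = 25173, 13849, 2**16

print(period(1, A, C, M))                           # 65536: never more than M distinct values
draws = list(islice(lcg(1, A, C, M), 70_000))
print(len(draws) - len(set(draws)))                 # 4464 duplicates in 70,000 draws
print([x % 2 for x in islice(lcg(1, A, C, M), 8)])  # [0, 1, 0, 1, 0, 1, 0, 1]
```

These parameters satisfy the Hull-Dobell conditions (c odd, a - 1 divisible by 4), so the period is the full 65536. Even the best case runs out of new values, and the lowest bit simply alternates.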
The best (worst?) horror story of the 1970s was the discovery that an IBM FORTRAN random number routine was not valid. It trashed quite a few PhD projects; it was the most popular research tool of its day.
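The post does not name the routine. Assuming it is RANDU, the IBM generator x[k+1] = 65539 * x[k] mod 2**31 usually cited in this story, the flaw can be checked directly. Because 65539 = 2**16 + 3, squaring gives 2**32 + 6 * 2**16 + 9, which is congruent to 6 * 65539 - 9 (mod 2**31). Every consecutive triple therefore satisfies x[k+2] = 6 * x[k+1] - 9 * x[k] (mod 2**31), so points (x[k], x[k+1], x[k+2]) fall on only 15 planes in the unit cube. This is a minimal sketch, and the function names are hypothetical.

```python
M = 2**31
A = 65539  # = 2**16 + 3

def randu(seed, n):
    """Return n successive values of x[k+1] = 65539 * x[k] mod 2**31 (seed should be odd)."""
    values = []
    x = seed
    for _ in range(n):
        x = (A * x) % M
        values.append(x)
    return values

def on_planes(values):
    """Check x[k+2] == 6*x[k+1] - 9*x[k] (mod 2**31) for every consecutive triple."""
    return all((values[k + 2] - 6 * values[k + 1] + 9 * values[k]) % M == 0
               for k in range(len(values) - 2))

print(on_planes(randu(1, 100_000)))  # True
```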
Not if the population changes each time. But I understand your point.
However, consider that one of the advantages of an RNG is that you can repeat an experiment. More decades ago than I like to remember, I worked with some lab equipment on an old DEC PDP-11 that had a special circuit card. This thing had a speck of radioactive material and a simple Geiger counter tube to create **quantum level** random digits. Of course, people did not like seeing that black & yellow radiation symbol on equipment in those days (Cold War era), so I am not sure if it is still available. Today you could probably use background radiation or radio noise with sensitive equipment.
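The repeatability point can be sketched in Python. A seeded pseudorandom generator replays the same "experiment" every time. An operating-system entropy source, which may be fed by hardware noise much like the Geiger counter card, cannot be replayed. Note that `random.Random` is a Mersenne Twister, not one of the LCGs discussed above. The function name `sample_ids` and the seed value are hypothetical.

```python
import random
import secrets

def sample_ids(population, k, seed):
    """Draw k items without replacement using a private, seeded PRNG."""
    rng = random.Random(seed)  # own generator; does not touch the global random state
    return rng.sample(population, k)

population = list(range(1, 1001))

first = sample_ids(population, 5, seed=2007)
second = sample_ids(population, 5, seed=2007)
print(first == second)  # True: same seed, same sample

# OS entropy (possibly hardware-backed) has no seed, so this draw cannot be repeated.
unrepeatable = secrets.SystemRandom().sample(population, 5)
```

Keeping the seed with the experiment's records is what makes the run reproducible; the entropy-based draw trades that away for unpredictability.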
I start at the front of the RAND Corporation table and pull digits until I get to the end. I can also go to some university sites and get bigger validated tables.
Received on Fri Sep 21 2007 - 01:01:56 CEST
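Reading a printed random-digit table front to back can be sketched as follows. This assumes one common scheme: read fixed-width groups of digits, skip groups outside the population (so every row stays equally likely) and skip repeats (sampling without replacement), and stop when the table runs out. The function name and the digit string are hypothetical; the digits are illustrative and not copied from the RAND table.

```python
def draw_from_table(digits, population_size, k):
    """Pick k distinct 0-based row numbers by reading a digit table in order."""
    width = len(str(population_size - 1))  # digits needed for the largest row number
    chosen = []
    pos = 0
    while len(chosen) < k and pos + width <= len(digits):
        value = int(digits[pos:pos + width])
        pos += width
        # Reject out-of-range groups instead of wrapping them, which would bias low rows.
        if value < population_size and value not in chosen:
            chosen.append(value)
    if len(chosen) < k:
        raise ValueError("ran out of table digits")
    return chosen

table = "04729133856204"  # illustrative digits only
print(draw_from_table(table, 100, 3))  # [4, 72, 91]
print(draw_from_table(table, 50, 2))   # [4, 33]
```

Like a seeded generator, the same table read from the same starting point gives the same sample, which is the repeatability mentioned earlier.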