Re: Stochastic Queries

From: DBMS_Plumber <paul_geoffrey_brown_at_yahoo.com>
Date: Wed, 19 Sep 2007 13:21:17 -0700
Message-ID: <1190233277.003919.150840_at_i38g2000prf.googlegroups.com>


On Sep 10, 9:45 am, -CELKO- <jcelko..._at_earthlink.net> wrote:
> You can write some proprietary kludges, but frankly they have problems
> with skewed distributions.
...

> then load it with the particular random numbers I wanted from a stat
> package. More work, but MUCH better results.

In the first place, this does not answer the question the OP asked.

My advice to the OP would be to write a Java app that generates these queries. It's been done before. I suggest the OP read "Massive Stochastic Testing of SQL" by Don Slutz of Microsoft. To my certain knowledge at least two of the major DBMS vendors have a similar testing facility. Such facilities are not a panacea, but they're useful. For extra points you might want to include joins, unions and what-not in your range of query features.
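To make the suggestion concrete, here is a minimal sketch of such a generator. It is not Slutz's tool or any vendor's facility. The table name, column names and the tiny grammar (a single-table SELECT with a random WHERE clause) are all made up for illustration, and the class name RandomQueryGenerator is hypothetical. The generator is seeded so that any query that exposes a bug can be regenerated later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomQueryGenerator {
    // Hypothetical schema, for illustration only; substitute your own.
    static final String TABLE = "orders";
    static final String[] COLUMNS = {"order_id", "customer_id", "quantity", "price"};
    static final String[] OPERATORS = {"=", "<>", "<", "<=", ">", ">="};

    private final Random rng;

    // A fixed seed makes the whole query stream reproducible.
    public RandomQueryGenerator(long seed) {
        this.rng = new Random(seed);
    }

    private String pick(String[] options) {
        return options[rng.nextInt(options.length)];
    }

    // Builds a random boolean expression, nesting AND/OR up to 'depth' levels.
    String predicate(int depth) {
        if (depth <= 0 || rng.nextInt(3) == 0) {
            // Leaf: compare a random column with a random integer literal.
            return pick(COLUMNS) + " " + pick(OPERATORS) + " " + rng.nextInt(1000);
        }
        String connective = rng.nextBoolean() ? "AND" : "OR";
        return "(" + predicate(depth - 1) + " " + connective + " "
                + predicate(depth - 1) + ")";
    }

    // Builds one SELECT over a random, non-empty subset of the columns.
    public String query() {
        List<String> selected = new ArrayList<>();
        for (String column : COLUMNS) {
            if (rng.nextBoolean()) {
                selected.add(column);
            }
        }
        if (selected.isEmpty()) {
            selected.add(pick(COLUMNS));
        }
        return "SELECT " + String.join(", ", selected)
                + " FROM " + TABLE
                + " WHERE " + predicate(2);
    }

    public static void main(String[] args) {
        RandomQueryGenerator gen = new RandomQueryGenerator(42L);
        for (int i = 0; i < 5; i++) {
            System.out.println(gen.query());
        }
    }
}
```

Joins, unions and the rest would be added as further productions in the same style, each one choosing among alternatives with the same seeded Random.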

As for the question Joe does answer, I am VERY interested in understanding why these 'proprietary kludges' (either to select random samples, or to generate random numbers) 'have problems' with skewed distributions. (Disclosure: I worked on one.)

These 'proprietary kludges' sweat a lot of details. Joe's solution here is wrong. Reusing the same set of random numbers will produce an identical sample every time. From a practical point of view, you would be required to generate a new set of random values for each query. I am also curious to know how you use this table to restrict the rows being returned, in the general case, to any kind of uniform random sample.

Received on Wed Sep 19 2007 - 22:21:17 CEST
