comp.databases.theory: Re: Stochastic Queries
>> I suggest the OP read "Massive Stochastic Testing of SQL" by Don Slutz of Microsoft. <<
I need to get a copy, too.
You might want to look at a current article by Itzik Ben-Gan on the use of RAND() and NEWID() and the differences between SQL Server 2000 and 2005. He covers the problems with how a CASE expression uses RAND(), deterministic functions, duplicate values, etc.
>> I am VERY interested in understanding why these 'proprietary kludges' (either to select random samples, or to generate random numbers) 'have problems' with skewed distributions. (Disclosure - I worked on one.) <<
The SQL products I have worked with use traditional linear congruential algorithms, which they inherited from C and UNIX. As your samples get larger and larger, you get skewing and duplicate values. Knuth, Vol. 2, Chapter 3 covers the history well, with commentary.
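To make the duplicate-value point concrete, here is a minimal sketch of a linear congruential generator in Python. The parameters (a = 5, c = 3, m = 16) are toy values chosen only so the cycle is short enough to see; they are an assumption for illustration and are not taken from any SQL product.

```python
from itertools import islice


def lcg(seed, a, c, m):
    """Yield successive LCG states: x -> (a * x + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def cycle_length(seed, a, c, m):
    """Return the number of draws between two visits to the same state."""
    first_seen = {}
    for i, x in enumerate(lcg(seed, a, c, m)):
        if x in first_seen:
            return i - first_seen[x]
        first_seen[x] = i


# Toy parameters (an assumption for illustration, not any product's values).
A, C, M = 5, 3, 16

period = cycle_length(1, A, C, M)
draws = list(islice(lcg(1, A, C, M), 2 * M))
low_bits = [x & 1 for x in draws[:8]]

print("period:", period)
print("distinct values in", len(draws), "draws:", len(set(draws)))
print("lowest bit of first 8 draws:", low_bits)
```

The generator can never produce more than m distinct states, so any sample larger than the period has to contain repeats. With a power-of-two modulus the lowest bit simply alternates, which is one way a skewed pattern can show up if the low-order bits are used directly.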
The best (worst?) horror story of the 1970s was the discovery that an IBM FORTRAN random number routine was not valid. It was the most popular tool for research in those days, and the discovery trashed quite a few PhD projects.
>> Repeated use of the same set of random numbers will generate an identical sample each time it's used. <<
Not if the population changes each time. But I understand your point.
However, consider that one of the advantages of an RNG is that you can repeat an experiment. More decades ago than I really like to remember, I worked with some lab equipment on an old DEC PDP-11 that had a special circuit card. This thing had a speck of radioactive material and a simple Geiger counter tube to create **quantum level** random digits. Of course, people did not like seeing that black & yellow radiation symbol on equipment in those days (Cold War era), so I am not sure if it is still available. You can probably use background radiation or radio noise today with sensitive equipment.
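The repeatability point can be sketched in a few lines of Python. The row ids, the seed value and the helper name `seeded_sample` are all hypothetical; the sketch only shows that a seeded generator returns the same sample for the same population, and a different one once the population changes.

```python
import random


def seeded_sample(population, k, seed):
    """Draw k items using a private generator seeded with `seed`."""
    rng = random.Random(seed)
    return rng.sample(population, k)


rows_monday = list(range(100))        # hypothetical row ids
rows_tuesday = list(range(100, 200))  # the population has changed

first = seeded_sample(rows_monday, 5, seed=42)
again = seeded_sample(rows_monday, 5, seed=42)
other = seeded_sample(rows_tuesday, 5, seed=42)

print(first == again)  # same seed, same population
print(first == other)  # same seed, different population
```

A hardware source such as the Geiger counter card gives up this property: there is no seed to replay, so the digits have to be recorded if the experiment is ever to be repeated.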
But if I want a fixed table, I think most statisticians would agree that the RAND Corporation's "A Million Random Digits with 100,000 Normal Deviates" has been tested in every possible way for mathematical correctness. That is a fixed table available on diskette!
>> From a practical point of view, you would be required to generate a new set of random values for each query. I am also curious to know how you use this table to restrict the rows being returned in the general case to any kind of uniform random sample. <<
I start at the front of the RAND Corporation table and pull digits until I get to the end. I can go to some university sites and get bigger validated tables, too.
Received on Thu Sep 20 2007 - 18:01:56 CDT
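One way to read "start at the front and pull digits" is the classic random-number-table procedure, sketched below in Python. This is an assumption about the method, not something the post spells out: digits are taken in fixed-width groups, and groups that are out of range or already used are skipped. The digit string is a made-up placeholder, not an excerpt from the RAND table, and `sample_rows_from_digits` is a hypothetical helper name.

```python
def sample_rows_from_digits(digits, n_rows, k):
    """Pick k distinct row numbers in 1..n_rows by reading digits front to back."""
    width = len(str(n_rows))  # digits needed to express the largest row number
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        candidate = int(digits[start:start + width])
        # Skip out-of-range groups and repeats instead of folding them
        # back into range, so every row stays equally likely.
        if 1 <= candidate <= n_rows and candidate not in chosen:
            chosen.append(candidate)
            if len(chosen) == k:
                return chosen
    raise ValueError("digit table exhausted before the sample was complete")


# Placeholder digits for illustration only; NOT taken from the RAND table.
PLACEHOLDER_DIGITS = "0731994205170388126055"

rows = sample_rows_from_digits(PLACEHOLDER_DIGITS, n_rows=50, k=4)
print(rows)
```

Rejecting out-of-range groups wastes digits but avoids the bias that taking a remainder would introduce, and the error at the end corresponds to reaching the end of the table.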