Re: Help with Random Samples!

From: Brian Tkatch <SPAMBLOCK.Maxwell_Smart_at_ThePentagon.com.SPAMBLOCK>
Date: Fri, 16 Feb 2001 15:28:00 GMT
Message-ID: <3a8d42cb.52489671@news.alt.net>

On Fri, 16 Feb 2001 14:00:52 GMT, sw_at_weinerfamily.org wrote:

>I do, however records are deleted so I will need to run more often than 300
>if no records are returned. In addition, while my example is 300, the sample
>size could be 300 million. Also, since I must have a unique set of sampled
>records, each fetch would have to be compared to the set so far to ensure
>they are distinct. This process could take hours or perhaps days for a large
>sample.
>
>Any suggestions on how to do this efficiently?
>
>Thank you.
>

OK, I have read this, and I will admit that I have no idea what it is that you want. I thought you asked how to retrieve 300 randomly unique records.

What exactly are your requirements?

The sample keyword helps pull out a percentage of the total. It is not necessarily random or accurate but it may do the job. If you want to completely control the randomness and number, I would think DBMS_RANDOM would help in generating ids for records top be pulled.

I am by no means an expert. I was thinking you could create a large IN clause for your query, I.E. "WHERE Id IN (<random numbers>);". If it gets too large it would probably be very slow; if you could even pass thay large a query.

In that case, you may want to create a second table to hold random numbers. Here's a guess:

- CREATE TABLE Test_Id(Id PRIMARY KEY);
- Get MAX(Id) from your data table.
- Have a procedure accept the max id and randomly put the numbers in

the test table, via a FOR loop, ignoring violations. - SELECT from the two tables joining on the ID column and use ROWNUM to limit the output to the number of rows you want returned.

HTH,
Brian Received on Fri Feb 16 2001 - 09:28:00 CST