Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Help with Random Samples!

Re: Help with Random Samples!

From: <sw_at_weinerfamily.org>
Date: Fri, 16 Feb 2001 18:09:23 GMT
Message-ID: <3A8D6C68.60D3E172@weinerfamily.org>

That's probably as good as it gets :-) I was hoping there was a function that I could pass a number and it returns that many rows at random. The SAMPLE clause sort of does this but it only works with percentages - not fixed numbers. I used the example of 300 rows because I didn't want to type 300,000,000,000. I'm dealing with terabytes and the sample size could be as much as 50% of the database.

So I guess I can create a loop that randomly picks ids and inserts into another table. Each time I will have to make sure the id isn't already in the table. Given the possible size of the sample does an index on this temp table make sense or will the insert cost be too high?

Thanks,

Brian Tkatch wrote:

> On Fri, 16 Feb 2001 14:00:52 GMT, sw_at_weinerfamily.org wrote:
>
> >I do, however records are deleted so I will need to run more often than 300
> >if no records are returned. In addition, while my example is 300, the sample
> >size could be 300 million. Also, since I must have a unique set of sampled
> >records, each fetch would have to be compared to the set so far to ensure
> >they are distinct. This process could take hours or perhaps days for a large
> >sample.
> >
> >Any suggestions on how to do this efficiently?
> >
> >Thank you.
> >
> OK, I have read this, and I will admit that I have no idea what it is
> that you want. I thought you asked how to retrieve 300 randomly unique
> records.
>
> What exactly are your requirements?
>
> The sample keyword helps pull out a percentage of the total. It is not
> necessarily random or accurate but it may do the job. If you want to
> completely control the randomness and number, I would think
> DBMS_RANDOM would help in generating ids for records top be pulled.
>
> I am by no means an expert. I was thinking you could create a large IN
> clause for your query, I.E. "WHERE Id IN (<random numbers>);". If it
> gets too large it would probably be very slow; if you could even pass
> thay large a query.
>
> In that case, you may want to create a second table to hold random
> numbers. Here's a guess:
> - CREATE TABLE Test_Id(Id PRIMARY KEY);
> - Get MAX(Id) from your data table.
> - Have a procedure accept the max id and randomly put the numbers in
> the test table, via a FOR loop, ignoring violations.
> - SELECT from the two tables joining on the ID column and use ROWNUM
> to limit the output to the number of rows you want returned.
>
> HTH,
> Brian
Received on Fri Feb 16 2001 - 12:09:23 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US