Random Sample Using SQL?

From: Chuck Erickson <beachst.cerick01_at_eds.com>
Date: 1997/06/27
Message-ID: <33B458F3.76ED_at_eds.com>#1/1


Any ideas on how to select a random sample of 'n' rows from a database using SQL. Our environment is a Data Warehouse using ORACLE and Business Objects. The goal is a generic technique to allow end-users the capability to select a random sample from various tables.

Since the database is large, we need something that is server based, rather than a function in a client application--otherwise, for example, a million rows are sent to the client which then filters it into the desired sample of a thousand rows--not good for the network.

[Quoted] I've seen examples of where every Nth row is selected, but that is not really random because data is usually sorted or ordered in some way before it is loaded into the database.

Another ideas is to use a random number. For example, if you had a random number function that returned a random float between 0 and 1 and you wanted to sample 10% of a table you could write

SELECT * FROM EMP
WHERE RANDOM() <= .10

Oracle does not have a random number function, but it does allow you to define your own function that can be used in SQL statements (as opposed to PL/SQL programs) with some limitations. The package listed below provides a random number but cannot be used in a SQL statement since it updates the seed on each call. Oracle does not allow updating of package variables--due to so-called side effects.

Also, does anyone know whether ORACLE 8 will have a built-in random number function?

Any ideas would be appreciated.
Thanks

/* RANDOM NUMBER GENERATOR */
create or replace package random
is

        pragma restrict_references( random, WNDS, RNPS );

        procedure srand( new_seed in number );

        function rand return number;
        pragma restrict_references( rand, WNDS  );

        procedure get_rand( r OUT number );

        function rand_max( n IN number ) return number;
        pragma restrict_references( rand_max, WNDS);

        procedure get_rand_max( r OUT number, n IN number );

end random;
/

create or replace package body random
is

        multiplier      constant number         := 22695477;
        increment       constant number         := 1;
        "2^32"          constant number                 := 2 ** 32;
        "2^16"          constant number         := 2 ** 16;
        "0x7fff"        constant number         := 32767;
        Seed            number := 1;

--
procedure srand( new_seed in number ) is begin Seed := new_seed; end srand;
--
function rand return number is begin seed := mod( multiplier * seed + increment, "2^32" ); return bitand( seed/"2^16", "0x7fff" ); end rand;
--
procedure get_rand( r OUT number ) is begin r := rand; end get_rand;
--
function rand_max( n IN number ) return number is begin return mod( rand, n ) + 1; end rand_max;
--
procedure get_rand_max( r OUT number, n IN number ) is begin r := rand_max( n ); end get_rand_max;

--

begin

        select userenv( 'SESSIONID' ) into seed from dual; end random;
/ Received on Fri Jun 27 1997 - 00:00:00 CEST

Original text of this message