Re: weighted random selection of record

From: Stuart McGraw <smcgraw_no_spam_me_at_frii.com>
Date: Wed, 21 May 2003 11:54:34 -0600
Message-ID: <3ecbbd5b$0$199$75868355_at_news.frii.net>


> The logic is:
> 1) Select each record with a probability proportional to its weight:
> SELECT * FROM tbl WHERE rnd() <= [weight]
> 2) Randomly sort the records returned:
> ORDER BY rnd()
> 3) Select just 1st record from the randomly selected and sorted set:
> SELECT TOP 1 * FROM (...)
Maybe I am missing something? Assume I want to get row A back 75% of the time and row B 25%. Then the table is

          id    weight
         --    ------
          A      .75
          B      .25

Then

    SELECT * FROM tbl WHERE rnd() <= [weight] will return no rows 25% of the time, yes? If the weights are normalized so the largest weight is 1, the table is

          id    weight
         --    ------
          A       1
          B      .3333

Now the select statement will always return at least one row. It will return only row A with probability 2/3 and both rows with probability 1/3. If we randomly pick a row from those results the total chances for A is 2/3+(1/2)*(1/3) = 5/6 and B = 1/6. This is the same result as the earlier rnd()*[weight] method.

But your post made me realize that one can store the cumulative probabilities in the weight attribute so that my original example table:

         id    weight
         --    ------
          1      .6
          2      .1
          3      .3

would become

          id    weight
         --    ------
          1       0
          2       .6
          3       .7

And then the select is simply:

    SELECT TOP 1 * FROM tbl WHERE Rnd()>[weight] ORDER BY weight DESC;

So thanks!!

"andrewst" <member14183_at_dbforums.com> wrote in message news:2902114.1053450601_at_dbforums.com...
>
> Originally posted by Stuart McGraw
> > I want a query that will randomly select a row from
> > a table but with a probability determined by a "weight"
> > value stored in the rows. For example, if I have the
> > table
> >
> > id weight
> > -- ------
> > 1 .6
> > 2 .1
> > 3 .3
> >
> > What I tried was
> >
> > SELECT TOP 1 * FROM tbl ORDER BY rnd()*[weight]
> >
> > where rnd() is a function that returns a uniformly
> > distributed real number between 0 and 1. This almost
> > does the right thing. The problem is that the resulting
> > distribution is not what one would (naively) hope.
> >
> You need to make use of the weighting in the WHERE clause to affect the
> chances of selecting a record, more like this:
>
> SELECT TOP 1 * FROM
> ( SELECT * FROM tbl WHERE rnd() <= [weight] ORDER BY rnd() )
>
> (I don't use this DBMS, so syntax may be dodgy).
>
> This assumes that [weight] is a number between 0 and 1.
>
> The logic is:
>
> 1) Select each record with a probability proportional to its weight:
>
> SELECT * FROM tbl WHERE rnd() <= [weight]
>
> 2) Randomly sort the records returned:
>
> ORDER BY rnd()
>
> 3) Select just 1st record from the randomly selected and sorted set:
>
> SELECT TOP 1 * FROM (...)
>
> --
> Posted via http://dbforums.com
Received on Wed May 21 2003 - 19:54:34 CEST

Original text of this message