Re: challenging sql question

From: Rudy Fernandes <rferdy_at_kuwait.net>
Date: 1997/04/21
Message-ID: <5jf4tb$kdk$2@gulfa.kuwait.net>#1/1

In article <335A81C2.27FD_at_ventera.com>,

"Lan H. Tran" <lantran_at_ventera.com> wrote:
>
>please help me come up with an elegant solution for the following
>problem:
>
>i have a table with 2 columns:
> state varchar(10)
> weight integer
>
>which may have the following values:
> maryland 10
> delaware 10
> virginia 30
>
>the weight column represents the random likelihood of retrieving that
>column. that is, virginia may be retrieved 60% of the time, maryland
>may be retrieved 20% of the time, and virginia may be retrieved 20% of
>the time. the table may have many entries and the weight column may be
>changed by the user.
>
>the application (written in c w/embedded sql) will repeatedly call a
>function that will randomly retrieve the data according to the weight
>distribution. how do we write this function and the accompanying sql
>call?
>
>i hope that i have clearly explained the problem.
>
>here is my preliminary thought:
>
>i retrieve a randomly generated value with the range of 1 and the sum of
>the weight column (50). i determine under which row that value
>belongs. for example, if the value is 15, the function returns
>delaware, because 15 does not fall in between 1 and 10 (maryland), does
>fall in between 11 and 20 (delaware), and does not fall in between 21
>and 50 (virginia). the coding is not too hard, but would seem
>inefficient. we would have to open up a cursor and continue to fetch
>next until we find the range. creating an internal array of records may
>also be inefficient, because the table may have many entries and not
>just 3 entries.
>
>please reply with e-mail to me as well as the newsgroup. thank you.
>
>Lan Tran

Try this out. It shifts the 'work' from the 'generator' program to the program which allows changes to weights.

Possibility 1.

Modify the program which adjust weights so that it generates rows for every single possibility. That is, your example would translate into 50 rows in the table as follows

maryland, 1
maryland, 2
..
delaware, 11
..
delaware, 19
virginia, 21
..
virginia, 50

2. Store the 'total_weight' in your parameter table.

3. Index your table on weight.

4. Retrieve a randomly generated value between 1 and the parameter stored and go for a 'direct hit' using weight as the key.

Drawbacks :
1. Too many rows created (probably livable; consider codifying 'state' into a smallint or its shortform)

2. Concurrency during weight changes - could be a big problem.

Possibility 2 :

Similar to 1, except reduce the number of rows by changing the table structure to

state
start_weight
end_weight

giving rows of

maryland, 1, 10
delaware, 11, 20
virginia, 21, 50

Again, go for a 'direct hit' using BETWEEN.

This could work out better than 1, if the number of rows reduce to a level where the whole table is in Shared Memory.

That will be 0.02
Cheers,

Rudy Fernandes
GIC, Kuwait
OL 7.20UC4, 4GL 6.04UC1

Received on Mon Apr 21 1997 - 00:00:00 CDT