Home » Other » General » Puzzle n°04  Evenly share batches of articles into groups ***
Puzzle n°04  Evenly share batches of articles into groups *** [message #290794] 
Mon, 31 December 2007 15:57 

Michel Cadot
Messages: 68447 Registered: March 2007 Location: Nanterre, France, http://...

Senior Member Account Moderator 


We have a table of batches containing some articles (see Puzzle n°03).
We now want to aggregate these batches into groups with an as equal as possible number of articles without spliting the batches.
With the following test case:
DROP TABLE batches PURGE;
CREATE TABLE batches (batch_id INTEGER PRIMARY KEY, avail_qty INTEGER NOT NULL);
INSERT INTO batches VALUES (1, 1);
INSERT INTO batches VALUES (2, 1);
INSERT INTO batches VALUES (3, 1);
INSERT INTO batches VALUES (4, 1);
INSERT INTO batches VALUES (5, 1);
INSERT INTO batches VALUES (6, 1);
INSERT INTO batches VALUES (7, 1);
INSERT INTO batches VALUES (8, 2);
INSERT INTO batches VALUES (9, 3);
INSERT INTO batches VALUES (10, 4);
INSERT INTO batches VALUES (11, 4);
INSERT INTO batches VALUES (12, 4);
INSERT INTO batches VALUES (13, 5);
INSERT INTO batches VALUES (14, 5);
INSERT INTO batches VALUES (15, 13);
INSERT INTO batches VALUES (16, 20);
INSERT INTO batches VALUES (17, 20);
COMMIT;
If we want to aggregate into 3, 4 or 5 groups we must have something like (quantities are given and not batch_id):
SQL> DEFINE groups=3
SQL> /
GROUP QUANTITIES TOTAL
  
1 20,4,3,1,1 29
2 20,4,2,1,1,1 29
3 13,5,5,4,1 28
SQL> DEFINE groups=4
SQL> /
GROUP QUANTITIES TOTAL
  
1 20,1,1 22
2 20,1,1 22
3 13,4,3,1 21
4 5,5,4,4,2,1 21
SQL> DEFINE groups=5
SQL> /
GROUP QUANTITIES TOTAL
  
1 20 20
2 20 20
3 13,1,1,1 16
4 5,4,4,1,1 15
5 5,4,3,2,1 15
The displayed ouput here is just to show the result it is not mandatory, you can display it as you want.
Enjoy!
Regards
Michel
[Updated on: Sat, 20 June 2009 10:12] Report message to a moderator



Re: Puzzle n°04  Evenly share batches of articles into groups *** [message #300263 is a reply to message #290794] 
Thu, 14 February 2008 15:38 
Frank_Zhou
Messages: 5 Registered: February 2008 Location: Braintree , MA

Junior Member 


The SQL solution for this puzzles is based on my SQL Query for the "Bin FITING" problem
http://www.jlcomp.demon.co.uk/faq/Bin_Fitting.html
The restriction of this SQL solution is that we must predetermine
the number of groups. So this SQL is not flexiable at all. We can consider it as a prelimary try of this puzzle.
SELECT bucket_name AS GROUPS,
trim(BOTH ','FROM regexp_replace(XMLAgg(
XMLElement(X,avail_qty) ORDER BY avail_qty desc),'<X></X><X></X>',',')) AS QUANTITIES,
SUM(avail_qty) TOTAL
FROM
(SELECT batch_id , avail_qty , bucket_name, rn, bucket_1, bucket_2, bucket_3
FROM
(SELECT batch_id , avail_qty , count(*) OVER ( ) counter,
ROW_NUMBER() OVER (ORDER BY avail_qty desc) rn FROM batches)
MODEL
DIMENSION BY (rn)
MEASURES (batch_id , avail_qty , 0 it, counter,
CAST(NULL AS NUMBER) bucket_name, CAST(NULL AS NUMBER) min_tmp,
CAST(NULL AS NUMBER) bucket_1, CAST(NULL AS NUMBER) bucket_2,
CAST(NULL AS NUMBER) bucket_3, CAST(NULL AS NUMBER) pbucket_1,
CAST(NULL AS NUMBER) pbucket_2, CAST(NULL AS NUMBER) pbucket_3
)
RULES ITERATE (100000)
UNTIL (ITERATION_NUMBER>= counter[1])
(
pbucket_1[2] = CASE WHEN it[ITERATION_NUMBER] = 0 THEN 0 END ,
pbucket_2[2] = CASE WHEN it[ITERATION_NUMBER] = 0 THEN 0.00001 END ,
pbucket_3[2] = CASE WHEN it[ITERATION_NUMBER] = 0 THEN 0.00002 END ,
min_tmp[1] = least(sum(pbucket_1)[ANY], sum(pbucket_2)[ANY], sum(pbucket_3)[ANY]) ,
bucket_1[ITERATION_NUMBER] = CASE WHEN sum(pbucket_1)[ANY] = min_tmp[1]
THEN avail_qty [ITERATION_NUMBER] END,
bucket_2[ITERATION_NUMBER] = CASE WHEN sum(pbucket_2)[ANY] = min_tmp[1]
THEN avail_qty [ITERATION_NUMBER] END,
bucket_3[ITERATION_NUMBER] = CASE WHEN sum(pbucket_3)[ANY] = min_tmp[1]
THEN avail_qty [ITERATION_NUMBER] END,
bucket_name[ITERATION_NUMBER] =CASE WHEN sum(pbucket_1)[ANY] = min_tmp[1] THEN 1
WHEN sum(pbucket_2)[ANY] = min_tmp[1] THEN 2
WHEN sum(pbucket_3)[ANY] = min_tmp[1] THEN 3
END,
pbucket_1[1] = sum(bucket_1)[ANY],
pbucket_2[1] = sum(bucket_2)[ANY],
pbucket_3[1] = sum(bucket_3)[ANY],
it[ITERATION_NUMBER] = ITERATION_NUMBER
)
)
WHERE batch_id IS NOT NULL
GROUP BY bucket_name;
GROUPS QUANTITIES TOTAL
  
1 20,4,3,1,1 29
2 20,4,2,1,1,1 29
3 13,5,5,4,1,1 29
SQL> spool off;
[Updated on: Fri, 22 February 2008 13:41] by Moderator Report message to a moderator



Re: Puzzle n°04  Evenly share batches of articles into groups *** [message #687416 is a reply to message #300263] 
Tue, 07 March 2023 09:10 

mathguy
Messages: 54 Registered: January 2023

Member 


"as equal as possible" is undefined. There are several ways to interpret that, which are not all equivalent; they may lead to different optimal solutions on the same data.
For example: we may take the difference between the highest and the lowest "TOTAL" and require that this difference be minimized. Alternatively, we may take the sum of the absolute pairwise differences between individual TOTALs and ask to minimize this sum.
A common idea in such problems is to penalize larger differences more than smaller ones. So, we might consider the squares of all pairwise differences between TOTALs, and ask to minimize the sum of squares. This last definition of "as equal as possible" has the pleasing property that it is equivalent to minimizing the standard deviation of the TOTALs.
     
Ceux qui ne peuvent lire plus que cinq lignes de texte... should stop reading now!
     
So, let's say the task is to minimize the standard deviation of the TOTALs.
The next question is, are we asking for the optimal solution (strictly speaking, the formulation given in the puzzle does ask for that), or are we just asking for a good approximation?
So far there is only one solution offered to this puzzle. The solution uses the LPT algorithm (more about this below), which does not give the optimal solution. For example, if there are only five batches with quantities 4, 5, 6, 7, 8 and we ask to generate 2 groups, the solution (using LPT) will give the answer 8,5,4 and 7,6 (TOTALs of 17 and 13) when in fact 8,7 and 6,5,4 has TOTALs of 15 and 15  a better solution.
The problem is known as a "scheduling" problem. Specifically, it is an identicalmachines scheduling problem where the objective function is the standard deviation (or, better, the variance) of the TOTALs. There are theoretical results about the "goodness" of various approximation approaches. To ask for the exact optimum is not reasonable; even with just two groups, and even to ask simply whether there is a grouping where the two TOTALs are equal, is the NPcomplete partition problem.
So, what is really the question? Find the absolute optimum? This can be done by brute force (inspect ALL groupings, select the best one), but that becomes unfeasible very quickly. Is it to implement one of the approximate algorithms? Perhaps simply to implement the LPT (longestprocesstime first) algorithm, as Frank Zhou has done?
The puzzle is certainly "very hard" if we talk about the math problem. If we must implement a given approximate algorithm, the programming problem may not be too difficult; another question, at this point, is whether only pureSQL solutions are accepted, or if PL/SQL solutions are OK too. (Implementing any algorithm in any procedural language should be pretty straightforward, and therefore easy.) And since the addition of functions in the WITH clause in Oracle 12, we should agree on what we mean by pureSQL; that should probably mean "no functions in the WITH clause, that's just cheating".




Goto Forum:
Current Time: Sun May 28 17:01:47 CDT 2023
