Re: What is general term for this problem?
Date: 19 Sep 2004 06:26:55 -0700
Message-ID: <18c7b3c2.0409190526.44864cc0_at_posting.google.com>
>> Consider a table representing supply, such as production or
purchase orders,
and another table representing demand, such as sales orders. The task
is to use a set of rules to determine for each order if supply will
meet demand, and to identify unmet demand and unnecessary supply. <<
This sounds like a version of Dr. Codd's T-Join, which was introduced in his book on the second version of the relational model, which were based on the idea of a best fit or approximate equality. The algorithm for the operators is easier to understand with an example modified from Dr. Codd.
The problem is to assign the classes to the available classrooms. We want (class_size < room_size) to be true after the assignments are made. This will allow us a few empty seats in each room for late students. We can do this in one of two ways. The first way is to sort the tables in ascending order by classroom size and the number of students in a class. We start with the following tables:
CREATE TABLE Rooms
(room_nbr CHAR(2) PRIMARY KEY,
room_size INTEGER NOT NULL);
CREATE TABLE Classes
(class_nbr CHAR(2) PRIMARY KEY,
class_size INTEGER NOT NULL);
These tables have the following rows in them:
Classes
class_nbr class_size
'c1' 80 'c2' 70 'c3' 65 'c4' 55 'c5' 50 'c6' 40
Rooms
room_nbr room_size
'r1' 70 'r2' 40 'r3' 50 'r4' 85 'r5' 30 'r6' 65 'r7' 55
The goal of the T-Join problem is to assign a class which is smaller than the classroom given it (class_size < room_size). Dr. Codd gives two approaches to the problem.
- Ascending order algorithm (cursors)
Sort both tables into ascending order. Reading from the top of the Rooms table, match each class with the first room that will fit.
Classes Rooms class_nbr class_size room_nbr room_size ==================== ===================This gives us
'c6' 40 'r5' 30
'c5' 50 'r2' 40
'c4' 55 'r3' 50
'c3' 65 'r7' 55
'c2' 70 'r6' 65
'c1' 80 'r1' 70
'r4' 85
Results
class_nbr class_size room_nbr room_size
'c2' 70 'r4' 85
'c3' 65 'r1' 70
'c4' 55 'r6' 65
'c5' 50 'r7' 55
'c6' 40 'r3' 50
2) Descending order algorithm (cursors)
Sort both tables into descending order. Reading from the top of the Classes table, match each class with the first room that will fit.
Classes Rooms class_nbr class_size room_nbr room_size ===================== =================== 'c1' 80 'r4' 85 'c2' 70 'r1' 70 'c3' 65 'r6' 65 'c4' 55 'r7' 55 'c5' 50 'r3' 50 'c6' 40 'r2' 40 'r5' 30
Results
class_nbr class_size room_nbr room_size
'c1' 80 'r4' 85 'c3' 65 'r1' 70 'c4' 55 'r6' 65 'c5' 50 'r7' 55 'c6' 40 'r3' 50
Notice that the answers are different! Dr. Codd has never given a definition in relational algebra of the T-Join, so I proposed that we need one. Informally, for each class, we want the smallest room which will hold it, while maintaining the T-Join condition. Or for each room, we want the largest class that will fill it, while maintaining the T-Join condition. These can be two different things, so you must decide which table is the driver. But either way, I am advocating a "best fit" over Codd's "first fit" approach.
In effect, the Swedish and Croatian solutions given later in this section use my definition instead of Dr. Codd's; the Colombian solution is true to the algorithmic approach.
Other theta conditions can be used in place of the "less than" shown here. If "less than or equal" is used, all the classes are assigned to a room in this case, but not in all cases. This is left to the reader as an exercise.
The first attempts in standard SQL are versions grouped by queries. They can, however, produce some rows that would be left out of the answers Dr. Codd was expecting. The first JOIN can be written as
SELECT class_nbr, class_size, MIN(room_size)
FROM Rooms, Classes
WHERE Classes.class_size < Rooms.room_size
GROUP BY class_nbr, class_size;
This will give a result table with the desired room sizes, but not the room numbers. You cannot put the other columns in the SELECT list, since it would conflict with the GROUP BY clause. But also note that the classroom with 85 seats (r4) is used twice, once by class 'c1' and then by class 'c2':
Result
class_nbr class_size MIN(room_size)
c1 80 85 <== room r4 c2 70 85 <== room r4 c3 65 70 c4 55 65 c5 50 55 c6 40 50
Your best bet after this is to use the query in an EXISTS clause.
SELECT *
FROM Rooms, Classes
WHERE EXISTS (SELECT class_nbr, class_size, MIN(room_size)
FROM Rooms, Classes WHERE Classes.class_size < Rooms.room_size GROUP BY class_nbr, class_size);
Alternately, you can save the results to a temporary table, then JOIN it back to the Cartesian product of Rooms and Classes. The second T-Join can be done by putting the columns for Rooms into the SELECT list of the same query schema:
SELECT room_nbr, room_size, MAX(class_size)
FROM Rooms, Classes
WHERE Classes.class_size < Rooms.room_size
GROUP BY room_nbr, room_size;
This time, the results are the same as those Dr. Codd got with his procedural algorithm:
Result
room_nbr room_size MAX(class_size)
'r4' 85 80 'r1' 70 65 'r6' 65 55 'r7' 55 50 'r3' 50 40
If you do a little arithmetic on the data, you find that we have 360 students and 395 seats, 6 classes and 7 rooms. This solution uses the fewest rooms, but note that the 70 students in class 'c2' are left out completely. Room 'r2' is left over, but it has only 40 seats.
As it works out, the best fit of rooms to classes is given by changing the matching rule to "less than or equal". This will leave the smallest room empty and pack the other rooms to capacity, thus:
SELECT class_nbr, class_size, MIN(room_size)
FROM Rooms, Classes
WHERE Classes.class_size <= Rooms.room_size
GROUP BY class_nbr, class_size;
The Croatian Solution
I published this same problem in an article in DBMS magazine in 1992 and got an answer in QUEL from Miljenko Martinis of Croatia in our Letters column (Miljenko 1992). He then translated it from QUEL into SQL with two views, thus:
CREATE VIEW Classrooms -- all possible legal pairs
AS SELECT *
FROM Classes, Rooms
WHERE class_size < room_size;
CREATE VIEW Classrooms1 -- smallest room for the class
AS SELECT *
FROM Classrooms AS CR1
WHERE room_size = (SELECT MIN(room_size)
FROM Classrooms WHERE class_nbr = CR1.class_nbr);
We find the answer with the simple query
SELECT class_nbr, class_size, room_size, room_nbr
FROM Classrooms1 AS CR1
WHERE class_size = (SELECT MAX(class_size)
FROM Classrooms1 WHERE room_nbr = CR1.room_nbr);
This is a pure SQL-89 solution.
class_nbr class_size room_size room_nbr
'c6' 40 50 'r3' 'c5' 50 55 'r7' 'c4' 55 65 'r6' 'c3' 65 70 'r1' 'c1' 80 85 'r4'
The Swedish Solution
I got another solution from Anders Karlsson of Mr. K Software AB in Stockholm, Sweden. Here is a version of that query:
SELECT C1.class_nbr, C1.class_size, R1.room_size, R1.room_nbr
FROM Classes AS C1, Rooms AS R1
WHERE C1.class_size = (SELECT MAX(C2.class_size)
FROM Classes AS C2 WHERE R1.room_size > C2.class_size) AND NOT EXISTS (SELECT * FROM Rooms AS R2 WHERE R2.room_size > C1.class_size AND R2.room_size < R1.room_size);
The first predicate says we have the largest class that will go into this room. The second predicate says there is no other room which would fit this class better (i.e. smaller than the candidate room and still larger than the class at which we are looking).
The Colombian Solution
Francisco Moreno of the Department of Systems Engineering, at the University of Antioquia in Colombia came up with another approach and data to demonstrate the problems in the T-Join.
Clean out the existing tables and insert this data:
DELETE FROM Classes;
INSERT INTO Classes
VALUES ('c1', 106), ('c2', 105), ('c3', 104), ('c4', 100), ('c5', 99), ('c6', 90), ('c7', 89), ('c8', 88), ('c9', 83), ('c10', 82), ('c11', 81), ('c12', 65), ('c13', 50), ('c14', 49), ('c15', 30), ('c16', 29), ('c17', 28), ('c18', 20), ('c19', 19);
DELETE FROM Rooms;
INSERT INTO Rooms
VALUES ('r1', 102), ('r2', 101), ('r3', 95), ('r4', 94), ('r5', 85), ('r6', 70), ('r7', 55), ('r8', 54), ('r9', 35), ('r10', 34), ('r11', 25), ('r12', 18);
Using Codd's T-Join algorithm for descending lists, you would have this mapping:
'c1' 106 'c2' 105 'c3' 104 'c4' 100 <---> 'r1' 102 'c5' 99 <---> 'r2' 101 'c6' 90 <---> 'r3' 95 'c7' 89 <---> 'r4' 94 'c8' 88 'c9' 83 <---> 'r5' 85 'c10' 82 'c11' 81 'c12' 65 <---> 'r6' 70 'c13' 50 <---> 'r7' 55 'c14' 49 <---> 'r8' 54 'c15' 30 <---> 'r9' 35 'c16' 29 <---> 'r10' 34 'c17' 28 'c18' 20 <---> 'r11' 25 'c19' 19 'r12' 18
There 1317 students in classes and 768 seats for them. You can see by inspection that some classes are too large for any room we have. If you started in ascending order, class 'c19' pairs with room 'r11' and you get another result set.
This algorithm is not a best fit answer, but a first fit answer. This is an important difference. To explain further, The first fit to class 'c4' is room 'r1', which has 102 seats; the best fit is room 'r2' which has 101 seats. The algorithm would give us this result table:
Results
class_nbr class_size room_size room_nbr
'c4' 100 102 'r1' 'c5' 99 101 'r2' 'c6' 90 95 'r3' 'c7' 89 94 'r4' 'c9' 83 85 'r5' 'c12' 65 70 'r6' 'c13' 50 55 'r7' 'c14' 49 54 'r8' 'c15' 30 35 'r9' 'c16' 29 34 'r10' 'c18' 20 25 'r11'
704 student served.
If you use Swedish or Croatian solution on this data, the answer is:
Swedish Result
class_nbr class_size room_size room_nbr
'c4' 100 101 'r2'
'c6' 90 94 'r4'
'c9' 83 85 'r5'
'c12' 65 70 'r6'
'c13' 50 54 'r8'
'c15' 30 34 'r10'
'c18' 20 25 'r11'
438 students served.
At this point you have a result which is not complete, but has the tightest mapping of each class into a room. There is another problem which was not mentioned. We have not had two classes or two rooms of the same size in the data. This will cause some other problems.
Instead of trying to use a single static SQL query, we can use SQL to generate SQL code, then execute it dynamically. This solution is right but, of course, is horrible from a performance viewpoint.
- build a table of possible T-Join pairings DROP TABLE T-Join; CREATE TABLE T-Join AS SELECT * FROM Classes, Rooms WHERE room_size > class_size;
- create a temporary working table DROP TABLE Ins; CREATE TABLE Ins (class_nbr CHAR(3) NOT NULL, class_size INTEGER NOT NULL, room_nbr CHAR(3) NOT NULL, room_size INTEGER NOT NULL);
- create a table with the insertion code for each row
SELECT
'INSERT INTO Ins
SELECT class_nbr, class_size, room_nbr, room_size
FROM T-Join AS T1
WHERE room_size
- (SELECT MAX(room_size)
FROM T-Join
WHERE room_size NOT IN (SELECT room_size FROM Ins))
AND class_size
- (SELECT MAX(class_size) FROM T-Join AS T2 WHERE class_size NOT IN (SELECT class_size FROM Ins) AND T2.class_size < T1.room_size);' FROM Rooms WHERE room_size > (SELECT MIN(class_size) FROM 'c); COMMIT;
- (SELECT MAX(room_size)
FROM T-Join
WHERE room_size NOT IN (SELECT room_size FROM Ins))
AND class_size
Moreno's second solution, which handles duplicates, is more complex and I will not give it here. It uses the keys of the tables to make rows with duplicate values unique.
This is all in my book SQL FOR SMARTIES. Received on Sun Sep 19 2004 - 15:26:55 CEST