Re: What is general term for this problem?

From: --CELKO-- <jcelko212_at_earthlink.net>
Date: 19 Sep 2004 06:26:55 -0700
Message-ID: <18c7b3c2.0409190526.44864cc0_at_posting.google.com>


>> Consider a table representing supply, such as production or purchase orders,
and another table representing demand, such as sales orders. The task is to use a set of rules to determine for each order if supply will meet demand, and to identify unmet demand and unnecessary supply. <<

This sounds like a version of Dr. Codd's T-Join, which was introduced in his book on the second version of the relational model, which were based on the idea of a best fit or approximate equality. The algorithm for the operators is easier to understand with an example modified from Dr. Codd.

The problem is to assign the classes to the available classrooms. We want (class_size < room_size) to be true after the assignments are made. This will allow us a few empty seats in each room for late students. We can do this in one of two ways. The first way is to sort the tables in ascending order by classroom size and the number of students in a class. We start with the following tables:

CREATE TABLE Rooms
(room_nbr CHAR(2) PRIMARY KEY,
 room_size INTEGER NOT NULL);

CREATE TABLE Classes
(class_nbr CHAR(2) PRIMARY KEY,
 class_size INTEGER NOT NULL);

These tables have the following rows in them:

 Classes
 class_nbr class_size


    'c1'    80
    'c2'    70
    'c3'    65
    'c4'    55
    'c5'    50
    'c6'    40

 Rooms
 room_nbr room_size


 'r1'     70
 'r2'     40
 'r3'     50
 'r4'     85
 'r5'     30
 'r6'     65
 'r7'     55

The goal of the T-Join problem is to assign a class which is smaller than the classroom given it (class_size < room_size). Dr. Codd gives two approaches to the problem.

  1. Ascending order algorithm (cursors)

Sort both tables into ascending order. Reading from the top of the Rooms table, match each class with the first room that will fit.

 Classes                    Rooms
 class_nbr class_size       room_nbr  room_size
 ====================       ===================

'c6' 40 'r5' 30
'c5' 50 'r2' 40
'c4' 55 'r3' 50
'c3' 65 'r7' 55
'c2' 70 'r6' 65
'c1' 80 'r1' 70
'r4' 85
This gives us

 Results
 class_nbr class_size room_nbr room_size



'c2' 70 'r4' 85
'c3' 65 'r1' 70
'c4' 55 'r6' 65
'c5' 50 'r7' 55
'c6' 40 'r3' 50

2) Descending order algorithm (cursors)

Sort both tables into descending order. Reading from the top of the Classes table, match each class with the first room that will fit.

 Classes                   Rooms
 class_nbr  class_size     room_nbr  room_size
 =====================     ===================
  'c1'        80             'r4'     85
  'c2'        70             'r1'     70
  'c3'        65             'r6'     65
  'c4'        55             'r7'     55
  'c5'        50             'r3'     50
  'c6'        40             'r2'     40
                             'r5'     30

 Results
 class_nbr class_size room_nbr room_size


  'c1'         80       'r4'       85
  'c3'         65       'r1'       70
  'c4'         55       'r6'       65
  'c5'         50       'r7'       55
  'c6'         40       'r3'       50

Notice that the answers are different! Dr. Codd has never given a definition in relational algebra of the T-Join, so I proposed that we need one. Informally, for each class, we want the smallest room which will hold it, while maintaining the T-Join condition. Or for each room, we want the largest class that will fill it, while maintaining the T-Join condition. These can be two different things, so you must decide which table is the driver. But either way, I am advocating a "best fit" over Codd's "first fit" approach.

In effect, the Swedish and Croatian solutions given later in this section use my definition instead of Dr. Codd's; the Colombian solution is true to the algorithmic approach.

Other theta conditions can be used in place of the "less than" shown here. If "less than or equal" is used, all the classes are assigned to a room in this case, but not in all cases. This is left to the reader as an exercise.

The first attempts in standard SQL are versions grouped by queries. They can, however, produce some rows that would be left out of the answers Dr. Codd was expecting. The first JOIN can be written as

SELECT class_nbr, class_size, MIN(room_size)   FROM Rooms, Classes
 WHERE Classes.class_size < Rooms.room_size  GROUP BY class_nbr, class_size;

This will give a result table with the desired room sizes, but not the room numbers. You cannot put the other columns in the SELECT list, since it would conflict with the GROUP BY clause. But also note that the classroom with 85 seats (r4) is used twice, once by class 'c1' and then by class 'c2':

 Result
 class_nbr class_size MIN(room_size)


    c1         80             85   <== room r4
    c2         70             85   <== room r4
    c3         65             70
    c4         55             65
    c5         50             55
    c6         40             50

Your best bet after this is to use the query in an EXISTS clause.

SELECT *
  FROM Rooms, Classes
 WHERE EXISTS (SELECT class_nbr, class_size, MIN(room_size)

                 FROM Rooms, Classes
                WHERE Classes.class_size < Rooms.room_size
                GROUP BY class_nbr, class_size);

Alternately, you can save the results to a temporary table, then JOIN it back to the Cartesian product of Rooms and Classes. The second T-Join can be done by putting the columns for Rooms into the SELECT list of the same query schema:

SELECT room_nbr, room_size, MAX(class_size)   FROM Rooms, Classes
 WHERE Classes.class_size < Rooms.room_size  GROUP BY room_nbr, room_size;

This time, the results are the same as those Dr. Codd got with his procedural algorithm:

 Result
 room_nbr room_size MAX(class_size)


 'r4'        85          80
 'r1'        70          65
 'r6'        65          55
 'r7'        55          50
 'r3'        50          40

If you do a little arithmetic on the data, you find that we have 360 students and 395 seats, 6 classes and 7 rooms. This solution uses the fewest rooms, but note that the 70 students in class 'c2' are left out completely. Room 'r2' is left over, but it has only 40 seats.

As it works out, the best fit of rooms to classes is given by changing the matching rule to "less than or equal". This will leave the smallest room empty and pack the other rooms to capacity, thus:

SELECT class_nbr, class_size, MIN(room_size)   FROM Rooms, Classes
 WHERE Classes.class_size <= Rooms.room_size  GROUP BY class_nbr, class_size;

The Croatian Solution

I published this same problem in an article in DBMS magazine in 1992 and got an answer in QUEL from Miljenko Martinis of Croatia in our Letters column (Miljenko 1992). He then translated it from QUEL into SQL with two views, thus:

 CREATE VIEW Classrooms -- all possible legal pairs AS SELECT *
     FROM Classes, Rooms
    WHERE class_size < room_size;

 CREATE VIEW Classrooms1 -- smallest room for the class AS SELECT *
    FROM Classrooms AS CR1
   WHERE room_size = (SELECT MIN(room_size)

                       FROM Classrooms
                      WHERE class_nbr = CR1.class_nbr);

We find the answer with the simple query

SELECT class_nbr, class_size, room_size, room_nbr   FROM Classrooms1 AS CR1
 WHERE class_size = (SELECT MAX(class_size)

                      FROM Classrooms1
                     WHERE room_nbr = CR1.room_nbr);

This is a pure SQL-89 solution.

 class_nbr class_size room_size room_nbr


 'c6'          40           50     'r3'   
 'c5'          50           55     'r7'   
 'c4'          55           65     'r6'   
 'c3'          65           70     'r1'   
 'c1'          80           85     'r4'   

The Swedish Solution

I got another solution from Anders Karlsson of Mr. K Software AB in Stockholm, Sweden. Here is a version of that query:

SELECT C1.class_nbr, C1.class_size, R1.room_size, R1.room_nbr   FROM Classes AS C1, Rooms AS R1
 WHERE C1.class_size = (SELECT MAX(C2.class_size)

                          FROM Classes AS C2
                         WHERE R1.room_size > C2.class_size)
   AND NOT EXISTS (SELECT *
                     FROM Rooms AS R2
                    WHERE R2.room_size > C1.class_size
                      AND R2.room_size < R1.room_size);

The first predicate says we have the largest class that will go into this room. The second predicate says there is no other room which would fit this class better (i.e. smaller than the candidate room and still larger than the class at which we are looking).

The Colombian Solution

Francisco Moreno of the Department of Systems Engineering, at the University of Antioquia in Colombia came up with another approach and data to demonstrate the problems in the T-Join.

Clean out the existing tables and insert this data:

DELETE FROM Classes;
 INSERT INTO Classes

VALUES ('c1', 106),
       ('c2', 105),
       ('c3', 104),
       ('c4', 100),
       ('c5', 99),
       ('c6', 90),
       ('c7', 89),
       ('c8', 88),
       ('c9', 83),
       ('c10', 82),
       ('c11', 81),
       ('c12', 65),
       ('c13', 50),
       ('c14', 49),
       ('c15', 30),
       ('c16', 29),
       ('c17', 28),
       ('c18', 20),
       ('c19', 19);

DELETE FROM Rooms;
 INSERT INTO Rooms

VALUES ('r1', 102),
       ('r2', 101),
       ('r3', 95),
       ('r4', 94),
       ('r5', 85),
       ('r6', 70),
       ('r7', 55),
       ('r8', 54),
       ('r9', 35),
       ('r10', 34),
       ('r11', 25),
       ('r12', 18);
    

Using Codd's T-Join algorithm for descending lists, you would have this mapping:     

'c1'     106
'c2'     105
'c3'     104
'c4'     100     <--->     'r1'  102
'c5'      99     <--->     'r2'  101
'c6'      90     <--->     'r3'   95
'c7'      89     <--->     'r4'   94
'c8'      88                
'c9'      83     <--->     'r5'   85
'c10'     82
'c11'     81
'c12'     65     <--->     'r6'   70
'c13'     50     <--->     'r7'   55
'c14'     49     <--->     'r8'   54
'c15'     30     <--->     'r9'   35
'c16'     29     <--->     'r10'  34
'c17'     28
'c18'     20     <--->     'r11'  25
'c19'     19
                           'r12'  18

There 1317 students in classes and 768 seats for them. You can see by inspection that some classes are too large for any room we have. If you started in ascending order, class 'c19' pairs with room 'r11' and you get another result set.

This algorithm is not a best fit answer, but a first fit answer. This is an important difference. To explain further, The first fit to class 'c4' is room 'r1', which has 102 seats; the best fit is room 'r2' which has 101 seats. The algorithm would give us this result table:

Results
class_nbr class_size room_size room_nbr


'c4'        100          102      'r1'
'c5'         99          101      'r2'
'c6'         90           95      'r3'
'c7'         89           94      'r4'
'c9'         83           85      'r5'
'c12'        65           70      'r6'
'c13'        50           55      'r7'
'c14'        49           54      'r8'
'c15'        30           35      'r9'
'c16'        29           34      'r10'
'c18'        20           25      'r11'

704 student served.

If you use Swedish or Croatian solution on this data, the answer is:

Swedish Result
class_nbr class_size room_size room_nbr


'c4' 100 101 'r2'
'c6' 90 94 'r4'
'c9' 83 85 'r5'
'c12' 65 70 'r6'
'c13' 50 54 'r8'
'c15' 30 34 'r10'
'c18' 20 25 'r11'

438 students served.

At this point you have a result which is not complete, but has the tightest mapping of each class into a room. There is another problem which was not mentioned. We have not had two classes or two rooms of the same size in the data. This will cause some other problems.

Instead of trying to use a single static SQL query, we can use SQL to generate SQL code, then execute it dynamically. This solution is right but, of course, is horrible from a performance viewpoint.

  • build a table of possible T-Join pairings DROP TABLE T-Join; CREATE TABLE T-Join AS SELECT * FROM Classes, Rooms WHERE room_size > class_size;
  • create a temporary working table DROP TABLE Ins; CREATE TABLE Ins (class_nbr CHAR(3) NOT NULL, class_size INTEGER NOT NULL, room_nbr CHAR(3) NOT NULL, room_size INTEGER NOT NULL);
  • create a table with the insertion code for each row SELECT 'INSERT INTO Ins SELECT class_nbr, class_size, room_nbr, room_size FROM T-Join AS T1 WHERE room_size
    • (SELECT MAX(room_size) FROM T-Join WHERE room_size NOT IN (SELECT room_size FROM Ins)) AND class_size
      • (SELECT MAX(class_size) FROM T-Join AS T2 WHERE class_size NOT IN (SELECT class_size FROM Ins) AND T2.class_size < T1.room_size);' FROM Rooms WHERE room_size > (SELECT MIN(class_size) FROM 'c); COMMIT;
Now use "SELECT * FROM Ins;" query in a host program with dynamic SQL and execute each statement in the temporary table in order. This will give us the first answer at the start of section 17.5.3, and it also works for the original data.

Moreno's second solution, which handles duplicates, is more complex and I will not give it here. It uses the keys of the tables to make rows with duplicate values unique.

This is all in my book SQL FOR SMARTIES. Received on Sun Sep 19 2004 - 15:26:55 CEST

Original text of this message