Re: Storing where clause

From: --CELKO-- <jcelko212_at_earthlink.net>
Date: 23 Aug 2004 17:51:15 -0700
Message-ID: <18c7b3c2.0408231651.29d9c7db_at_posting.google.com>


Try this old puzzle of mine for one way to do it in pure SQL.

Another SQL Puzzle

Larry Wade posted a version of this problem on the Microsoft Access CompuServe forum at the end of February 1996. He is running an employment service that has a database with tables for job orders, candidates, and their job skills. He is trying to perform queries to match candidates to job orders based on their skills. The job orders take the form of Boolean expressions connecting skills. For example, find all of the candidates with manufacturing and inventory or accounting skills.

First, let's construct a table of the candidates' skills. You can assume that personal information about the candidates is in another table, but I will not bother with it for this problem.

CREATE TABLE CandidateSkills
(candidateid INTEGER NOT NULL,

skill_code CHAR(15) NOT NULL,
PRIMARY KEY (candidateid, skill_code));

INSERT INTO CandidateSkills VALUES (100, 'accounting');
INSERT INTO CandidateSkills VALUES (100, 'inventory');
INSERT INTO CandidateSkills VALUES (100, 'manufacturing');
INSERT INTO CandidateSkills VALUES (200, 'accounting');
INSERT INTO CandidateSkills VALUES (200, 'inventory');
INSERT INTO CandidateSkills VALUES (300, 'manufacturing');
INSERT INTO CandidateSkills VALUES (400, 'inventory');
INSERT INTO CandidateSkills VALUES (400, 'manufacturing');
INSERT INTO CandidateSkills VALUES (500, 'accounting');
INSERT INTO CandidateSkills VALUES (500, 'manufacturing');

The obvious solution would be to create dynamic SQL queries in a front-end product for each job order, such as:

SELECT C1.candidateid, 'jobid #212' -- constant job id code FROM CandidateSkills AS C1, -- one correlation per skill

      CandidateSkills AS C2,
      CandidateSkills AS C3

WHERE C1.candidateid = C2.candidateid
  AND C1.candidateid = C3.candidateid
  AND -- job order expression created here   (C1.skill_code = 'manufacturing'
  AND C2.skill_code = 'inventory'
   OR C3.skill_code = 'accounting');

This statement is ugly and it will run forever, but it works. The order of evaluation of the Booleans in the final predicate will be handled by SQL's rules.

The working table will have the columns:

(C1.Candidateid, C2.Candidateid, C3.Candidateid, C1.skill_code,
C2.skill_code C3.skill_code)

and they will have all possible combinations of job skills. It will be huge:

(101, 101, 101, 'accounting', 'accounting', 'accounting')
(101, 101, 101, 'accounting', 'accounting', 'inventory')
(101, 101, 101, 'accounting', 'inventory', 'manufacturing')
(101, 101, 101, 'inventory', 'accounting', 'manufacturing')
(101, 101, 101, 'inventory', 'inventory', 'inventory')
(101, 101, 101, 'manufacturing', 'beet farming', 'shoemaking')
(101, 101, 101, 'manufacturing', 'inventory', 'accounting')
  ...

A good PowerBuilder or Delphi programmer can develop a screen form to do this in less than a week. You then save the query as a view with the same name as the jobid code. Neat and quick! The trouble is that this solution will give you a huge collection of very slow queries. Got a better idea? See Puzzle Answer.
Puzzle Answer

The first problem is that you have to worry about parsing the search criteria. Does "manufacturing and inventory or accounting" mean "(manufacturing and inventory) or accounting" or does it mean "manufacturing and (inventory or accounting)" when you search? Let's assume that and has higher precedence.

One solution is to put each query into a disjunctive canonical form; what that means in English is that the search conditions are written as a string of and conditions joined together at the highest level by or.

Let's build another table of job orders that we want to fill:

CREATE TABLE JobOrders
(jobid INTEGER NOT NULL,

skill_group INTEGER NOT NULL,
skill_code CHAR(15) NOT NULL,
PRIMARY KEY (jobid, skill_group, skill_code));

The skill_group code says that all of these skills are required -- they are the and terms in the canonical form. We can then assume that each skill_group in a job order is linked with the others for that jobid by an or statement. Create the table for the job orders. Now insert the following orders in their canonical form:

Job 1 = ('inventory' AND 'manufacturing') OR 'accounting'
Job 2 = ('inventory' AND 'manufacturing') OR ('accounting' 
 AND 'manufacturing')

Job 3 = 'manufacturing'
Job 4 = ('inventory' AND 'manufacturing' AND 'accounting')

This translates into:

INSERT INTO JobOrders VALUES (1, 1, 'inventory');
INSERT INTO JobOrders VALUES (1, 1, 'manufacturing');
INSERT INTO JobOrders VALUES (1, 2, 'accounting');
INSERT INTO JobOrders VALUES (2, 1, 'inventory');
INSERT INTO JobOrders VALUES (2, 1, 'manufacturing');
INSERT INTO JobOrders VALUES (2, 2, 'accounting');
INSERT INTO JobOrders VALUES (2, 2, 'manufacturing');
INSERT INTO JobOrders VALUES (3, 1, 'manufacturing');
INSERT INTO JobOrders VALUES (4, 1, 'inventory');
INSERT INTO JobOrders VALUES (4, 1, 'manufacturing');
INSERT INTO JobOrders VALUES (4, 1, 'accounting');

The query is a form of relational division, based on using the skill_code and skill_group combinations as the dividend and the candidate's skills as the divisor. Because the skill_groups within a jobid are linked by or statements, if any one of them matches, then we have a hit.

SELECT DISTINCT J1.jobid, C1.candidateid  FROM JobOrders AS J1 INNER JOIN CandidateSkills AS C1

      ON J1.skill_code = C1.skill_code
GROUP BY candidateid, skill_group, jobid HAVING COUNT(*) >= (SELECT COUNT(*)

           FROM JobOrders AS J2
           WHERE J1.skill_group = J2.skill_group
            AND J1.jobid = J2.jobid);

The sample data should produce the following results:

jobid candidateid
====== ===========

   1       100
   1       200
   1       400
   1       500
   2       100
   2       400
   2       500
   3       100
   3       300
   3       400
   3       500
   4       100

As job orders and candidates change, the query stays the same. You can put this query into a view, then use it to find the job for which we have no candidates, candidates for which we have no jobs, and so on. -- Joe Celko Received on Tue Aug 24 2004 - 02:51:15 CEST

Original text of this message