Re: For the SQL Gurus out there, a question

From: --CELKO-- <joe.celko_at_northface.edu>
Date: 10 Feb 2004 15:35:01 -0800
Message-ID: <a264e7ea.0402101535.4f7b8af5@posting.google.com>

I really hate you table names which are almost impossible to understand (blah, blah, sample data -- but let's make life easier for people anyway). How about weighted scores in various contests as the human-friendly version?

The lack of DDL is also a pain, since we have to make all kinds of assumptions about NULLs and keys. But is this what you wanted to say?

CREATE TABLE Scores
(score_type CHAR(2) NOT NULL PRIMARY KEY, wgt INTEGER NOT NULL CHECK (wgt > 0));

INSERT INTO Scores VALUES ('X1', 2);
INSERT INTO Scores VALUES ('X2', 1);
INSERT INTO Scores VALUES ('X3', 5);
INSERT INTO Scores VALUES ('X4', 1);

CREATE TABLE Contests
(contest_id CHAR(2) NOT NULL PRIMARY KEY, contest_name CHAR(10) NOT NULL);

INSERT INTO Contests VALUES ('Y1', 'Contest-1');
INSERT INTO Contests VALUES ('Y2', 'Contest-2');
INSERT INTO Contests VALUES ('Y3', 'Contest-3');
INSERT INTO Contests VALUES ('Y4', 'Contest-4');

CREATE TABLE Standings
(score_type CHAR(2) NOT NULL REFERENCES Scores (score_type), contest_id CHAR(2) NOT NULL REFERENCES Contests (contest_id), PRIMARY KEY (score_type, contest_id));

INSERT INTO Standings VALUES ('X1', 'Y1');
INSERT INTO Standings VALUES ('X1', 'Y2');
INSERT INTO Standings VALUES ('X2', 'Y1');
INSERT INTO Standings VALUES ('X3', 'Y1');
INSERT INTO Standings VALUES ('X3', 'Y2');
INSERT INTO Standings VALUES ('X3', 'Y4');
INSERT INTO Standings VALUES ('X4', 'Y4');

>> So based on Z the distribution (count of Y/ Total Y) is like
Y1 = 100*3/7 Y2=100*2/7 Y3=100*0/7 Y4=100*2/7 <<

I am feeling pedantic, so let's do this in steps to show the reasoning behind each part and then assemble the final query from them.

SELECT C1.contest_id, (1.0 * COUNT(s1.contest_id)/(SELECT 1.0 * COUNT(*) FROM Standings)) AS distribution FROM Contests AS C1

       LEFT OUTER JOIN 
       Standings AS ST1
       ON C1.contest_id = ST1.contest_id

GROUP BY C1.contest_id;

The 1.0 is to avoid integer math; it could have been a CAST()

>> Now I want to factor in the wgt column in Scores also <<

Here is where I get confused; you show a constant "wgt * 1" for each row of the standings table, but if this table is all key, isn't the one redundant? On that assumption:

SELECT ST1.score_type, ST1.contest_id, S1.wgt FROM Scores AS S1,

Standings AS ST1
WHERE ST1.score_type = S1.score_type;

>> Sum: 21 <<

SELECT SUM(S1.wgt)
FROM Scores AS S1,

Standings AS ST1
WHERE ST1.score_type = S1.score_type;

>> Given these tables, How would I factor in the weights and get the
distribution in a single SQL query. <<

SELECT C1.contest_id, C1.contest_name,

      SUM(S1.wgt)/
       (1.0 * (SELECT SUM(S1.wgt)
          FROM Scores AS S1, Standings AS ST1
         WHERE ST1.score_type = S1.score_type)) AS wgt-distrib
  FROM (Contests AS C1
        LEFT OUTER JOIN
        Standings AS ST1
        ON ST1.contest_id = C1.contest_id)
          LEFT OUTER JOIN
          Scores AS S1
          ON ST1.score_type = S1.score_type

GROUP BY C1.contest_id, C1.contest_name;

The optimizer ought to pick out the scalar subquery and do it once. The big worry is preserving the Contests table via nested LEFT UTER JOINs.

Results
contest_id contest_name wgt-distrib

Y1	   Contest-1 	.38095238095238
Y2	   Contest-2 	.33333333333333
Y3	   Contest-3 	NULL
Y4	   Contest-4 	.28571428571428

Notice I left a NULL in the answer. This means that **nobody** got a score in contest 'Y3', while a zero implies that **somebody** got a score of zero in contest 'Y3' -- big difference.

Does it help to explain thigns to this detail or do you need more? Received on Tue Feb 10 2004 - 17:35:01 CST