Re: Advice on SQL and records

From: FRAN <fran_beta_at_hotmail.com>
Date: 22 Aug 2005 17:22:25 -0700
Message-ID: <1124756544.927652.224000_at_f14g2000cwb.googlegroups.com>


David Cressey wrote:
> <fran_beta_at_hotmail.com> wrote in message
> news:1124611376.749896.208510_at_o13g2000cwo.googlegroups.com...
> > I'm teaching in computing. A question appeared in a trial paper
> > recently and I'm trying to establish whether the suggested answer is
> > correct.
> >
> > The question was:
> >
> > SELECT Students.Surname, Students.Name
> > FROM Courses, Classes, StudentClasses, Students
> > WHERE Courses.CourseID="10MA"
> >
> > AND
> > Courses.CourseID=Classes.CourseID
> > AND
> > Classes.ClassID=StudentClasses.ClassID AND
> >
> > StudentClasses.StudentID=STudents.StudentID
> >
> > ORDER BY Students.Surname, Students.Name
> >
> >
> > Which of the tables is likely to contain the most records?
> >
> > The suggested answer from Courses; Classes; StudentClassses and
> > Students was:
> >
> > StudentClasses
> >
> > How would this be determined? Why wouldn't Students contain the most
> > records?
> >
> > Thanks
> >
> > Fran
> >
>
> Assuming that the database contains only data about students that are
> currently taking courses, there will be at least one row in StudentClasses
> for each row in Student. Every student that's taking more than one course
> will have additional rows in StudentClasses. So Student classes has more
> rows.
>
> A similar argument applies to Classes versus StudentClasses. Since every
> class has at least one student, and most have more, there are more rows in
> StudentClasses than in Classes.
>
> There are likely to be more rows in Classes than in Courses, unless there
> are a lot of courses that are currently not being taught.

I'm learning so much here. Thanks everyone. I hope I don't sound like an ignorant moron, but frankly, I'd much prefer to run that risk here, than in front of a class, especially when I might spread my own ignorance about prior to assessment.

Lets say for the sake of argument, that we have a school with 500 students (pretty close to the size of the place where I teach actually).

In this hypothetical school, students take a minimum of five courses and a maximum of 8 courses. In total, there are 40 different courses at the school.

Each course is listed in the StudentCourses table. Each course name is the value of the CourseID field. A foreign key from the Students table called "StudentID" uniquely identifies each student in each course.

All 500 students are listed in the "Students" table. This table has "CourseID" as a foreign key.

The aggregate number of instances of a StudentID (regardless of duplicates) appears in all the courses for which there are records is 3860.

Is it likely that the number of records in StudentCourses is

  1. 40
  2. 3860
  3. 40*500 (ie 20000)
  4. Something else worked out by another method
  5. Insufficient data to determine an ballpark answer

Assuming it's not (a) I'd like the reasoning so I can get it clear in my head.

> A better question might be whether the optimizer can use the (approximate)
> cardinality of the tables, at the time of the query, in addition to other
> cues, to pick the best strategy for performing the query.

What does "cardinality" mean? What's an "optimizer"?

Thanks once again.

Fran Received on Tue Aug 23 2005 - 02:22:25 CEST

Original text of this message