Re: A better SQL implementation?

From: Carl Federl <cfederl_at_yahoo.com>
Date: 17 Jun 2006 07:03:51 -0700
Message-ID: <1150553031.049916.231380_at_h76g2000cwa.googlegroups.com>


>From "3.3 Horror scenario 1", Dr. Harry McKame indicates that under a
b-tree implimitation, 2,000 disk access would be required for this SQL:

select * from employees where job = 'programmer' and dept = 'dept1'
and we assume that there are only 10 rows in the result.

This is incorrect, as the actual number of IO is 21 using SQL Server. The IO ratio is then 21/12 or 175% not 2 orders of magnitude.

Dr McKame is also incorrect on the data range values when he states "Assume now 1,000,000 employees in the table, among them 1,000 programmers. Assume also 1,000 different departments in this company (for symmetry). "

With this senario, the combination of Job and Department would be unique. As an Alternative, have 1,000 different jobs, 100 departments and each department has each of the 1,000 jobs ten times.

Finally, Dr McKame states that a set system will do one access for each of the indicies. Since the undelying physical implimitation model has not been provided, it is not possible to validate this statements.

However, assumming that a fixed storage unit size of 8K is used and the record id is 8 bytes, I have recalculated the number of disk storage units and then the disk I/O:
For each of the 1000 JobId, there would 1,000 employees, the set index would then have a size of 1000 * 8 bytes, or about 8K, which is 1 storage unit.
For each of the 100 DeptIds, there would be 10,000 employees, the set index would then have a size of about 80K, which is 10 storage unit.

Based on one I/O for each storage unit, the set based index I/O would be 21 based on:

1.  1 I/O to read the JobId index entry
2.  10 I/O to read the DeptId index entry
3.  10 I/O to get the data rows

This totals 21 I/O for the set based index which is exactly equal to the 21 I/O B-Tree index solution used by SQL Server.

The SQL Server execution plan is:

  1. Scan the JobId index to get the employee rowids
  2. Scan the DeptId index to get the employee rowids
  3. Merge join the two sets of employee rowids
  4. Get the data for the resulting employee rowid

Here is the SQL to reproduce:
Table master.dbo.sequences constains 32768 rows with seq having values from 0 to 32767. For how to populate this tables see
http://www.aspfaq.com/show.asp?id=2516

create table Employees

(EmployeeId	integer not null identity(1,1)
,JobId	integer not null
,DeptId	integer not null

, constraint Employees_P Primary key clustered (EmployeeId) WITH FILLFACTOR = 100
)
Insert into Employees
(JobId, DeptId)
select Jobs.seq, Depts.seq
from master.dbo.sequences Jobs
cross join

        master.dbo.sequences Depts
cross join

	master.dbo.sequences EDJ
where 	Jobs.seq	between 1 and 1000
and	Depts.seq	between 1 and 100
and	EDJ.seq		between 1 and 10

go
create index Employees_X_JobId on Employees (JobId ) go
create index Employees_X_DeptId on Employees (DeptId ) go
set statistics io on
go
select 	*
from 	Employees
where	JobId = 345
and	DeptId = 67

go
set statistics io off
go
set showplan_text on
go
select 	*
from 	Employees
where	JobId = 345
and	DeptId = 67

go Received on Sat Jun 17 2006 - 16:03:51 CEST

Original text of this message