Re: A better SQL implementation?
Date: 17 Jun 2006 07:03:51 -0700
Message-ID: <1150553031.049916.231380_at_h76g2000cwa.googlegroups.com>
>From "3.3 Horror scenario 1", Dr. Harry McKame indicates that under a
b-tree implimitation, 2,000 disk access would be required for this SQL:
select * from employees where job = 'programmer' and dept =
'dept1'
This is incorrect, as the actual number of IO is 21 using SQL Server.
The IO ratio is then 21/12 or 175% not 2 orders of magnitude.
Dr McKame is also incorrect on the data range values when he states
"Assume now 1,000,000 employees in the table, among them 1,000
programmers. Assume also 1,000 different departments in this company
(for symmetry). "
With this senario, the combination of Job and Department would be
unique. As an Alternative, have 1,000 different jobs, 100 departments
and each department has each of the 1,000 jobs ten times.
Finally, Dr McKame states that a set system will do one access for each
of the indicies. Since the undelying physical implimitation model has
not been provided, it is not possible to validate this statements.
However, assumming that a fixed storage unit size of 8K is used and the
record id is 8 bytes, I have recalculated the number of disk storage
units and then the disk I/O:
Based on one I/O for each storage unit, the set based index I/O would
be 21 based on:
This totals 21 I/O for the set based index which is exactly equal to
the 21 I/O B-Tree index solution used by SQL Server.
The SQL Server execution plan is:
Here is the SQL to reproduce:
create table Employees
and we assume that there are only 10 rows in the result.
For each of the 1000 JobId, there would 1,000 employees, the set index
would then have a size of 1000 * 8 bytes, or about 8K, which is 1
storage unit.
For each of the 100 DeptIds, there would be 10,000 employees, the set
index would then have a size of about 80K, which is 10 storage unit.
1. 1 I/O to read the JobId index entry
2. 10 I/O to read the DeptId index entry
3. 10 I/O to get the data rows
Table master.dbo.sequences constains 32768 rows with seq having values
from 0 to 32767. For how to populate this tables see
http://www.aspfaq.com/show.asp?id=2516
(EmployeeId integer not null identity(1,1)
,JobId integer not null
,DeptId integer not null
, constraint Employees_P Primary key clustered (EmployeeId) WITH
FILLFACTOR = 100
)
Insert into Employees
(JobId, DeptId)
select Jobs.seq, Depts.seq
from master.dbo.sequences Jobs
cross join
master.dbo.sequences Depts
cross join
master.dbo.sequences EDJ where Jobs.seq between 1 and 1000 and Depts.seq between 1 and 100 and EDJ.seq between 1 and 10
go
create index Employees_X_JobId on Employees (JobId ) go
create index Employees_X_DeptId on Employees (DeptId ) go
set statistics io on
go
select * from Employees where JobId = 345 and DeptId = 67
go
set statistics io off
go
set showplan_text on
go
select * from Employees where JobId = 345 and DeptId = 67
go Received on Sat Jun 17 2006 - 16:03:51 CEST