On 18 Aug 2005 15:02:56 -0700, Mikito Harakiri wrote:
>Hugo Kornelis wrote:
>> Many companies use a short mnemonic code for their customers. The use of
>> a short mnemonic code for products is not uncommon either. Imagine a
>> customers table with 10,000 customers, a products table with 5,000
>> products and an orders table with a few million rows. Now would you
>> prefer char(6) or varchar(6) for CustomerCode? And char(5) or varchar(5)
>> for ProductCode?
>
>But if lookup table cardinality goes up, then, the storage factor
>char(n) vs varchar(n) goes down! Perhaps, you can convince me that the
>effect of the two trailing bytes is not miniscule, as I previously
>thought, but it just can't be significant. Any benchmark demonstrating
>that the performance degradation is not in single percentage digits is
>welcome.
Hi Mikito,
Below is the script I used to compare performance. I compared three
tasks:
- Performing lots of single inserts (speed of inserts matters in
databases that have to process thousands of new rows per second)
- Adding an index (not an everyday task, but there are situations where
a process can be sped up by adding an index, doing the process, then
dropping the index again)
- Generating a typical report that involves some joins (the CASE
expressions I used may look silly, but imagine that I am using a payment
status instead - also note that I store the results in a temporary table
first, to make sure that network speed won't influence the results)
You'll find the full code further below. But first the results:
- For the inserts, the elapsed time on my machine was 118156 ms with
char, 124406 ms with varchar. A degradation of over 5 %.
- Adding the index took 2840 ms with char, 3236 ms with varchar. A
performance degradation of almost 14 %.
3a. Generating the report with the index created in step 2 took 224 ms
on average for char; 234 ms on average for varchar (I used three
consecutive executions for both tests, each time starting with a clean
cache). Performance degradation: 4.5 %.
3b. Generating the same report after dropping the supporting index
showed an even bigger difference: avg 2281 for char; avg 3250 for
varchar. Degradation: over 40 %.
1a. To top it off, I repeated the insertion test, but this time AFTER
creating the extra index. For char, the time taken was 227936 ms; for
varchar 313110 ms. Performance degradation: 37 %.
So here's your choice: either you use the index, degrade the queries by
"only" 4.5% but at the cost of slowing inserts down by 37%, or you
discard the index to reverse these figures (inserts slowed down 5%;
queries slowed down 40%).
Or you choose char instead of varchar when you know that the length is
more or less fixed. :-)
Here's the code I used for this benchmark:
- Set up the tables
CREATE TABLE Clients
(ClientID char(6) NOT NULL,
ClientName varchar(36) NOT NULL,
PRIMARY KEY (ClientID)
)
CREATE TABLE Products
(ProdID char(5) NOT NULL,
ProdName varchar(30) NOT NULL,
PRIMARY KEY (ProdID)
)
CREATE TABLE Orders
(ClientID char(6) NOT NULL,
ProdID char(5) NOT NULL,
PRIMARY KEY (ClientID, ProdID),
FOREIGN KEY (ClientID) REFERENCES Clients,
FOREIGN KEY (ProdID) REFERENCES Products
)
go
- Put some rows in the Clients table
DECLARE @i int, @r int, @NumClients int
SET @NumClients = 10000
SET @i = 0
WHILE @i < @NumClients
BEGIN
SET @r = CAST(@i AS bigint) * 308915776 / @NumClients
INSERT Clients (ClientID, ClientName)
VALUES(CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + CHAR(65
+ ((@r / 676) % 26))
+ CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) %
26))
+ CHAR(65 + ((@r / 11881376) % 26)),
LEFT(CAST(NewID() as VARCHAR(36)), 10 + RAND() * 26))
SET @i = @i + 1
END
select count(*) from Clients
go
- Put some rows in the Products table
DECLARE @i int, @r int, @NumProds int
SET @NumProds = 5000
SET @i = 0
WHILE @i < @NumProds
BEGIN
SET @r = CAST(@i AS bigint) * 11881376 / @NumProds
INSERT Products (ProdID, ProdName)
VALUES(CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + CHAR(65
+ ((@r / 676) % 26))
+ CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) %
26)),
LEFT(CAST(NewID() as VARCHAR(36)), 5 + RAND() * 25))
SET @i = @i + 1
END
select count(*) from Products
go
- Have each client order up to 10 products
DECLARE @Start datetime, @End datetime
SET @Start = CURRENT_TIMESTAMP
DECLARE @c int, @i int, @p int, @r int, @NumClients int, @NumProds int,
@ClientID char(6), @ProdID char(5), @Amt tinyint
SET @NumProds = 5000
SET @NumClients = 10000
SET @c = 0
WHILE @c < @NumClients
BEGIN
SET @r = CAST(@c AS bigint) * 308915776 / @NumClients
SET @ClientID =
CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + CHAR(65
+ ((@r / 676) % 26))
+ CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) %
26))
+ CHAR(65 + ((@r / 11881376) % 26))
SET @Amt = RAND() * 20 + 20
SET @i = 0
WHILE @i < @Amt
BEGIN
SET @r = CAST(RAND() * @NumProds AS bigint) * 11881376 /
@NumProds
SET @ProdID =
CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) +
CHAR(65 + ((@r / 676) % 26))
+ CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976)
% 26))
INSERT Orders (ClientID, ProdID)
SELECT @ClientID, @ProdID
WHERE NOT EXISTS (SELECT *
FROM Orders
WHERE ClientID = @ClientID
AND ProdID = @ProdID)
SET @i = @i + 1
END
SET @c = @c + 1
IF @c % 100 = 0
PRINT @c
END
SET @End = CURRENT_TIMESTAMP
SELECT 'Generating orders', @Start, @End, DATEDIFF(ms, @Start, @End) AS
Elapsed
select count(*) from Orders
go
- Add an index
DECLARE @Start datetime, @End datetime
SET @Start = CURRENT_TIMESTAMP
CREATE INDEX Orders_ProdID ON Orders(ProdID)
SET @End = CURRENT_TIMESTAMP
SELECT 'Adding an index', @Start, @End, DATEDIFF(ms, @Start, @End) AS
Elapsed
go
- DROP INDEX Orders.Orders_ProdID
- Flush buffers
CHECKPOINT
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
go
- Perform a typical query
DECLARE @Start datetime, @End datetime
SET @Start = CURRENT_TIMESTAMP
SELECT p.ProdID,
COUNT(CASE WHEN LEN(c.ClientName) < 20 THEN 1 END) AS
ShortNameOrders,
COUNT(CASE WHEN LEN(c.ClientName) > 30 THEN 1 END) AS
LongNameOrders,
COUNT(c.ClientName) AS TotalOrders
INTO #Results
FROM Products AS p
INNER JOIN Orders AS o
ON o.ProdID = p.ProdID
INNER JOIN Clients AS c
ON c.ClientID = o.ClientID
WHERE p.ProdID LIKE 'D%'
GROUP BY p.ProdID
SET @End = CURRENT_TIMESTAMP
SELECT 'Query with join', @Start, @End, DATEDIFF(ms, @Start, @End) AS
Elapsed
SELECT * FROM #Results
DROP TABLE #Results
go
- Clean up the mess
DROP TABLE Orders
DROP TABLE Products
DROP TABLE Clients
go
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)
Received on Fri Aug 19 2005 - 17:32:28 CDT