Re: SQL Humor
Date: Sat, 20 Aug 2005 00:32:28 +0200
Message-ID: <eclcg19u8755p8o054c77j81jqn424mv9k_at_4ax.com>
On 18 Aug 2005 15:02:56 -0700, Mikito Harakiri wrote:
>Hugo Kornelis wrote:
Hi Mikito,
Below is the script I used to compare performance. I compared three
tasks: filling the Orders table, creating an extra index on
Orders(ProdID), and generating a report that joins all three tables.
You'll find the full code further below. But first the results:
>> Many companies use a short mnemonic code for their customers. The use of
>> a short mnemonic code for products is not uncommon either. Imagine a
>> customers table with 10,000 customers, a products table with 5,000
>> products and an orders table with a few million rows. Now would you
>> prefer char(6) or varchar(6) for CustomerCode? And char(5) or varchar(5)
>> for ProductCode?
>
>But if lookup table cardinality goes up, then, the storage factor
>char(n) vs varchar(n) goes down! Perhaps, you can convince me that the
>effect of the two trailing bytes is not miniscule, as I previously
>thought, but it just can't be significant. Any benchmark demonstrating
>that the performance degradation is not in single percentage digits is
>welcome.
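Before the timings, the "two trailing bytes" can be made concrete with a back-of-the-envelope row-size estimate. The Python sketch below assumes the engine charges roughly 2 bytes of offset bookkeeping per variable-length column (true in broad strokes for SQL Server's row format, though the exact per-row overhead varies by engine and version); it only illustrates the trend, not an exact on-disk layout, and `row_size` is my name, not part of the benchmark.

```python
# Back-of-the-envelope row-size comparison for the Orders key columns.
# Assumption: each variable-length column adds ~2 bytes of offset
# bookkeeping to the row; fixed row header ignored on both sides.

def row_size(client_len, prod_len, variable):
    """Approximate bytes of key data per Orders row."""
    data = client_len + prod_len
    if variable:
        data += 2 * 2   # assumed 2-byte offset entry per varchar column
    return data

char_row = row_size(6, 5, variable=False)   # char(6) + char(5)  -> 11
var_row  = row_size(6, 5, variable=True)    # varchar, fully used -> 15

print(char_row, var_row)
print(f"{(var_row - char_row) / char_row:.0%}")   # prints 36%
```

Wider rows mean fewer index entries per page, which is where the measurable difference below comes from.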
3a. Generating the report with the index created in step 2 took 224 ms
on average for char; 234 ms on average for varchar (I used three
consecutive executions for both tests, each time starting with a clean
cache). Performance degradation: 4.5 %.
3b. Generating the same report after dropping the supporting index
showed an even bigger difference: avg 2281 for char; avg 3250 for
varchar. Degradation: over 40 %.
1a. To top it off, I repeated the insertion test, but this time AFTER creating the extra index. For char, the time taken was 227936 ms; for varchar 313110 ms. Performance degradation: 37 %.
So here's your choice: either you keep the index, degrading the queries by "only" 4.5% but at the cost of slowing inserts down by 37%, or you discard the index and reverse these figures (inserts slowed down 5%; queries slowed down over 40%).
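The percentages above follow directly from the raw timings; as a quick arithmetic check (Python):

```python
# Recompute the degradation figures quoted above from the raw timings (ms).
def degradation(char_ms, varchar_ms):
    return (varchar_ms - char_ms) / char_ms

print(f"report, with index:    {degradation(224, 234):.1%}")        # prints 4.5%
print(f"report, without index: {degradation(2281, 3250):.1%}")      # prints 42.5%
print(f"inserts, with index:   {degradation(227936, 313110):.1%}")  # prints 37.4%
```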
Or you choose char instead of varchar when you know that the length is more or less fixed. :-)
Here's the code I used for this benchmark:
-- Set up the tables
CREATE TABLE Clients
    (ClientID char(6) NOT NULL,
     ClientName varchar(36) NOT NULL,
     PRIMARY KEY (ClientID)
    )
CREATE TABLE Products
    (ProdID char(5) NOT NULL,
     ProdName varchar(30) NOT NULL,
     PRIMARY KEY (ProdID)
    )
CREATE TABLE Orders
    (ClientID char(6) NOT NULL,
     ProdID char(5) NOT NULL,
     PRIMARY KEY (ClientID, ProdID),
     FOREIGN KEY (ClientID) REFERENCES Clients,
     FOREIGN KEY (ProdID) REFERENCES Products
    )
go
-- Put some rows in the Clients table
DECLARE @i int, @r int, @NumClients int
SET @NumClients = 10000
SET @i = 0
WHILE @i < @NumClients
BEGIN
   SET @r = CAST(@i AS bigint) * 308915776 / @NumClients
   INSERT Clients (ClientID, ClientName)
   VALUES(CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26))
        + CHAR(65 + ((@r / 676) % 26)) + CHAR(65 + ((@r / 17576) % 26))
        + CHAR(65 + ((@r / 456976) % 26)) + CHAR(65 + ((@r / 11881376) % 26)),
          LEFT(CAST(NewID() as VARCHAR(36)), 10 + RAND() * 26))
   SET @i = @i + 1
END
select count(*) from Clients
go
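The stacked CHAR(65 + ...) expressions implement a base-26 encoding: client number @i is spread evenly over the 26^6 = 308,915,776 possible six-letter codes, least significant letter first. A Python sketch of the same arithmetic (`letter_code` is my name for it, not part of the script):

```python
# Mirror of the T-SQL ID generation: map integer i evenly onto
# n-letter A-Z codes, least significant letter first, as the T-SQL does.
def letter_code(i, num_rows, n_letters=6):
    r = i * 26**n_letters // num_rows            # spread i over the code space
    return ''.join(chr(65 + (r // 26**k) % 26) for k in range(n_letters))

print(letter_code(0, 10000))      # prints AAAAAA
print(letter_code(9999, 10000))   # last client, near the top of the range
```

Because 26^6 is much larger than 10,000, consecutive clients land on distinct codes, so the script never generates a duplicate primary key.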
-- Put some rows in the Products table
DECLARE @i int, @r int, @NumProds int
SET @NumProds = 5000
SET @i = 0
WHILE @i < @NumProds
BEGIN
   SET @r = CAST(@i AS bigint) * 11881376 / @NumProds
   INSERT Products (ProdID, ProdName)
   VALUES(CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26))
        + CHAR(65 + ((@r / 676) % 26)) + CHAR(65 + ((@r / 17576) % 26))
        + CHAR(65 + ((@r / 456976) % 26)),
          LEFT(CAST(NewID() as VARCHAR(36)), 5 + RAND() * 25))
   SET @i = @i + 1
END
select count(*) from Products
go
-- Have each client order a few dozen random products (duplicates skipped)
DECLARE @Start datetime, @End datetime
SET @Start = CURRENT_TIMESTAMP
DECLARE @c int, @i int, @p int, @r int,
        @NumClients int, @NumProds int,
        @ClientID char(6), @ProdID char(5), @Amt tinyint
SET @NumProds = 5000
SET @NumClients = 10000
SET @c = 0
WHILE @c < @NumClients
BEGIN
   SET @r = CAST(@c AS bigint) * 308915776 / @NumClients
   SET @ClientID = CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26))
                 + CHAR(65 + ((@r / 676) % 26)) + CHAR(65 + ((@r / 17576) % 26))
                 + CHAR(65 + ((@r / 456976) % 26)) + CHAR(65 + ((@r / 11881376) % 26))
   SET @Amt = RAND() * 20 + 20
   SET @i = 0
   WHILE @i < @Amt
   BEGIN
      SET @r = CAST(RAND() * @NumProds AS bigint) * 11881376 / @NumProds
      SET @ProdID = CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26))
                  + CHAR(65 + ((@r / 676) % 26)) + CHAR(65 + ((@r / 17576) % 26))
                  + CHAR(65 + ((@r / 456976) % 26))
      INSERT Orders (ClientID, ProdID)
      SELECT @ClientID, @ProdID
      WHERE NOT EXISTS
            (SELECT * FROM Orders
             WHERE ClientID = @ClientID AND ProdID = @ProdID)
      SET @i = @i + 1
   END
   SET @c = @c + 1
   IF @c % 100 = 0 PRINT @c
END
SET @End = CURRENT_TIMESTAMP
SELECT 'Generating orders', @Start, @End, DATEDIFF(ms, @Start, @End) AS Elapsed
select count(*) from Orders
go
-- Add an index
DECLARE @Start datetime, @End datetime
SET @Start = CURRENT_TIMESTAMP
CREATE INDEX Orders_ProdID ON Orders(ProdID)
SET @End = CURRENT_TIMESTAMP
SELECT 'Adding an index', @Start, @End, DATEDIFF(ms, @Start, @End) AS Elapsed
go
-- DROP INDEX Orders.Orders_ProdID
-- Flush buffers
CHECKPOINT
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
go
-- Perform a typical query
DECLARE @Start datetime, @End datetime
SET @Start = CURRENT_TIMESTAMP
SELECT p.ProdID,
       COUNT(CASE WHEN LEN(c.ClientName) < 20 THEN 1 END) AS ShortNameOrders,
       COUNT(CASE WHEN LEN(c.ClientName) > 30 THEN 1 END) AS LongNameOrders,
       COUNT(c.ClientName) AS TotalOrders
INTO   #Results
FROM   Products AS p
INNER JOIN Orders  AS o ON o.ProdID = p.ProdID
INNER JOIN Clients AS c ON c.ClientID = o.ClientID
WHERE  p.ProdID LIKE 'D%'
GROUP BY p.ProdID
SET @End = CURRENT_TIMESTAMP
SELECT 'Query with join', @Start, @End, DATEDIFF(ms, @Start, @End) AS Elapsed
SELECT * FROM #Results
DROP TABLE #Results
go
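For readers without a SQL Server at hand, the logic of this report can be reproduced with Python's built-in sqlite3 on a toy data set. SQLite treats char and varchar identically, so this shows only what the query computes, not the performance difference; the sample rows are invented for illustration.

```python
import sqlite3

# Miniature versions of the benchmark tables with hand-picked rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Clients (ClientID TEXT PRIMARY KEY, ClientName TEXT NOT NULL);
    CREATE TABLE Products (ProdID TEXT PRIMARY KEY, ProdName TEXT NOT NULL);
    CREATE TABLE Orders (ClientID TEXT, ProdID TEXT,
                         PRIMARY KEY (ClientID, ProdID));
    INSERT INTO Clients VALUES ('AAAAAA', 'Tiny Co'),
                               ('BAAAAA', 'A client with a very long name');
    INSERT INTO Products VALUES ('DAAAA', 'Widget'), ('EAAAA', 'Gadget');
    INSERT INTO Orders VALUES ('AAAAAA', 'DAAAA'), ('BAAAAA', 'DAAAA'),
                              ('AAAAAA', 'EAAAA');
""")

# Same shape as the T-SQL report; SQLite spells LEN as LENGTH.
rows = con.execute("""
    SELECT p.ProdID,
           COUNT(CASE WHEN LENGTH(c.ClientName) < 20 THEN 1 END) AS ShortNameOrders,
           COUNT(CASE WHEN LENGTH(c.ClientName) > 30 THEN 1 END) AS LongNameOrders,
           COUNT(c.ClientName) AS TotalOrders
    FROM Products AS p
    INNER JOIN Orders  AS o ON o.ProdID = p.ProdID
    INNER JOIN Clients AS c ON c.ClientID = o.ClientID
    WHERE p.ProdID LIKE 'D%'
    GROUP BY p.ProdID
""").fetchall()
print(rows)   # prints [('DAAAA', 1, 0, 2)]
```

Only 'DAAAA' matches LIKE 'D%'; it has two orders, one from a short-named client, none from a client whose name exceeds 30 characters.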
-- Clean up the mess
DROP TABLE Orders
DROP TABLE Products
DROP TABLE Clients
go
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)