Re: SQL Humor

From: Hugo Kornelis <hugo_at_pe_NO_rFact.in_SPAM_fo>
Date: Sat, 20 Aug 2005 00:32:28 +0200
Message-ID: <eclcg19u8755p8o054c77j81jqn424mv9k_at_4ax.com>


On 18 Aug 2005 15:02:56 -0700, Mikito Harakiri wrote:

>Hugo Kornelis wrote:
>> Many companies use a short mnemonic code for their customers. The use of
>> a short mnemonic code for products is not uncommon either. Imagine a
>> customers table with 10,000 customers, a products table with 5,000
>> products and an orders table with a few million rows. Now would you
>> prefer char(6) or varchar(6) for CustomerCode? And char(5) or varchar(5)
>> for ProductCode?
>
>But if lookup table cardinality goes up, then, the storage factor
>char(n) vs varchar(n) goes down! Perhaps, you can convince me that the
>effect of the two trailing bytes is not miniscule, as I previously
>thought, but it just can't be significant. Any benchmark demonstrating
>that the performance degradation is not in single percentage digits is
>welcome.

Hi Mikito,

Below is the script I used to compare performance. I compared three tasks:

  1. Performing lots of single inserts (speed of inserts matters in databases that have to process thousands of new rows per second)
  2. Adding an index (not an everyday task, but there are situations where a process can be sped up by adding an index, doing the process, then dropping the index again)
  3. Generating a typical report that involves some joins (the CASE expressions I used may look silly, but imagine that I am using a payment status instead - also note that I store the results in a temporary table first, to make sure that network speed won't influence the results)

You'll find the full code further below. But first the results:

  1. For the inserts, the elapsed time on my machine was 118156 ms with char, 124406 ms with varchar. A degradation of over 5%.
  2. Adding the index took 2840 ms with char, 3236 ms with varchar. A performance degradation of almost 14%.
 3a. Generating the report with the index created in step 2 took 224 ms on average for char; 234 ms on average for varchar (I used three consecutive executions for both tests, each time starting with a clean cache). Performance degradation: 4.5%.
 3b. Generating the same report after dropping the supporting index showed an even bigger difference: avg 2281 ms for char; avg 3250 ms for varchar. Degradation: over 40%.
 1a. To top it off, I repeated the insertion test, but this time AFTER creating the extra index. For char, the time taken was 227936 ms; for varchar, 313110 ms. Performance degradation: 37%.

So here's your choice: either you use the index, degrade the queries by "only" 4.5% but at the cost of slowing inserts down by 37%, or you discard the index to reverse these figures (inserts slowed down 5%; queries slowed down 40%).

Or you choose char instead of varchar when you know that the length is more or less fixed. :-)
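Mikito's "two trailing bytes" are easy to put a number on. Here is a rough back-of-envelope sketch (mine, in Python rather than T-SQL; the row count is an assumption standing in for "a few million"): in SQL Server, each variable-length column costs 2 extra bytes of offset information per row, and when the codes always fill the declared length, varchar saves nothing on the data itself to offset that cost.

```python
# Back-of-envelope sketch (not part of the benchmark): extra storage paid
# for varchar key columns when the values always fill the declared length.
# SQL Server stores 2 extra bytes of offset data per variable-length
# column per row.

ROWS = 3_000_000          # assumed stand-in for "a few million" order rows
VARCHAR_OVERHEAD = 2      # bytes of offset info per variable-length column

def extra_bytes(rows, varchar_columns):
    """Extra storage vs. fully populated char columns of the same length."""
    return rows * varchar_columns * VARCHAR_OVERHEAD

# Orders carries two such key columns (ClientID, ProdID); the secondary
# index on ProdID repeats that column once more.
overhead = extra_bytes(ROWS, 2) + extra_bytes(ROWS, 1)
print(f"{overhead / 2**20:.1f} MiB of pure overhead")
```

More rows per page means fewer page reads, which is where the measured differences above come from.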

Here's the code I used for this benchmark:

-- Set up the tables
CREATE TABLE Clients
   (ClientID char(6) NOT NULL,
    ClientName varchar(36) NOT NULL,
    PRIMARY KEY (ClientID)
   )
CREATE TABLE Products
   (ProdID char(5) NOT NULL,
    ProdName varchar(30) NOT NULL,
    PRIMARY KEY (ProdID)
   )
CREATE TABLE Orders
   (ClientID char(6) NOT NULL,
    ProdID char(5) NOT NULL,
    PRIMARY KEY (ClientID, ProdID),
    FOREIGN KEY (ClientID) REFERENCES Clients,
    FOREIGN KEY (ProdID) REFERENCES Products
   )
go
-- Put some rows in the Clients table
DECLARE @i int, @r int, @NumClients int
SET @NumClients = 10000
SET @i = 0
WHILE @i < @NumClients
BEGIN
   SET @r = CAST(@i AS bigint) * 308915776 / @NumClients
   INSERT Clients (ClientID, ClientName)
   VALUES (CHAR(65 + (@r % 26))
         + CHAR(65 + ((@r / 26) % 26))
         + CHAR(65 + ((@r / 676) % 26))
         + CHAR(65 + ((@r / 17576) % 26))
         + CHAR(65 + ((@r / 456976) % 26))
         + CHAR(65 + ((@r / 11881376) % 26)),
           LEFT(CAST(NewID() AS varchar(36)), 10 + RAND() * 26))
   SET @i = @i + 1
END
select count(*) from Clients
go
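The nest of CHAR() calls above is just a base-26 encoding: client number i is spread over the full six-letter code space (308915776 = 26^6), one letter per base-26 digit, least significant letter first. A Python sketch of the same mapping (mine, not part of the original script) may make the arithmetic easier to follow:

```python
# Sketch of the base-26 encoding used in the T-SQL loop above (my Python
# rendering, not part of the benchmark): the i-th client is mapped to an
# evenly spread six-letter code, least significant letter first.

def client_code(i, num_clients=10000, width=6):
    # Spread i over the full 26**width code space, mirroring
    # CAST(@i AS bigint) * 308915776 / @NumClients  (308915776 = 26**6)
    r = i * 26**width // num_clients
    # One letter per base-26 digit: CHAR(65 + (@r / 26**k) % 26)
    return ''.join(chr(65 + (r // 26**k) % 26) for k in range(width))

print(client_code(0))     # first client
print(client_code(5000))  # a client halfway through the range
```

Because 26^6 / 10000 is well above 1, consecutive clients always land on distinct codes, so the 10,000 generated IDs are guaranteed unique.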
-- Put some rows in the Products table
DECLARE @i int, @r int, @NumProds int
SET @NumProds = 5000
SET @i = 0
WHILE @i < @NumProds
BEGIN
   SET @r = CAST(@i AS bigint) * 11881376 / @NumProds
   INSERT Products (ProdID, ProdName)
   VALUES (CHAR(65 + (@r % 26))
         + CHAR(65 + ((@r / 26) % 26))
         + CHAR(65 + ((@r / 676) % 26))
         + CHAR(65 + ((@r / 17576) % 26))
         + CHAR(65 + ((@r / 456976) % 26)),
           LEFT(CAST(NewID() AS varchar(36)), 5 + RAND() * 25))
   SET @i = @i + 1
END
select count(*) from Products
go
-- Have each client order up to 10 products
DECLARE @Start datetime, @End datetime
SET @Start = CURRENT_TIMESTAMP
DECLARE @c int, @i int, @p int, @r int,
        @NumClients int, @NumProds int,
        @ClientID char(6), @ProdID char(5), @Amt tinyint
SET @NumProds = 5000
SET @NumClients = 10000
SET @c = 0
WHILE @c < @NumClients
BEGIN
   SET @r = CAST(@c AS bigint) * 308915776 / @NumClients
   SET @ClientID = CHAR(65 + (@r % 26))
                 + CHAR(65 + ((@r / 26) % 26))
                 + CHAR(65 + ((@r / 676) % 26))
                 + CHAR(65 + ((@r / 17576) % 26))
                 + CHAR(65 + ((@r / 456976) % 26))
                 + CHAR(65 + ((@r / 11881376) % 26))
   SET @Amt = RAND() * 20 + 20
   SET @i = 0
   WHILE @i < @Amt
   BEGIN
      SET @r = CAST(RAND() * @NumProds AS bigint) * 11881376 / @NumProds
      SET @ProdID = CHAR(65 + (@r % 26))
                  + CHAR(65 + ((@r / 26) % 26))
                  + CHAR(65 + ((@r / 676) % 26))
                  + CHAR(65 + ((@r / 17576) % 26))
                  + CHAR(65 + ((@r / 456976) % 26))
      INSERT Orders (ClientID, ProdID)
      SELECT @ClientID, @ProdID
      WHERE NOT EXISTS
            (SELECT *
             FROM   Orders
             WHERE  ClientID = @ClientID
             AND    ProdID = @ProdID)
      SET @i = @i + 1
   END
   SET @c = @c + 1
   IF @c % 100 = 0 PRINT @c
END
SET @End = CURRENT_TIMESTAMP
SELECT 'Generating orders', @Start, @End, DATEDIFF(ms, @Start, @End) AS Elapsed
select count(*) from Orders
go
-- Add an index
DECLARE @Start datetime, @End datetime
SET @Start = CURRENT_TIMESTAMP
CREATE INDEX Orders_ProdID ON Orders(ProdID)
SET @End = CURRENT_TIMESTAMP
SELECT 'Adding an index', @Start, @End, DATEDIFF(ms, @Start, @End) AS Elapsed
go
-- Drop the index again
DROP INDEX Orders.Orders_ProdID
go
-- Flush buffers
CHECKPOINT
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
go
-- Perform a typical query
DECLARE @Start datetime, @End datetime
SET @Start = CURRENT_TIMESTAMP
SELECT p.ProdID,
       COUNT(CASE WHEN LEN(c.ClientName) < 20 THEN 1 END) AS ShortNameOrders,
       COUNT(CASE WHEN LEN(c.ClientName) > 30 THEN 1 END) AS LongNameOrders,
       COUNT(c.ClientName) AS TotalOrders
INTO   #Results
FROM   Products AS p
INNER JOIN Orders AS o
      ON o.ProdID = p.ProdID
INNER JOIN Clients AS c
      ON c.ClientID = o.ClientID
WHERE  p.ProdID LIKE 'D%'
GROUP BY p.ProdID
SET @End = CURRENT_TIMESTAMP
SELECT 'Query with join', @Start, @End, DATEDIFF(ms, @Start, @End) AS Elapsed
SELECT * FROM #Results
DROP TABLE #Results
go
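For readers unfamiliar with the COUNT(CASE ...) idiom: CASE returns 1 only for rows matching the condition and NULL otherwise, and COUNT ignores NULLs, so each column counts a different subset of the joined rows in a single pass. A small Python sketch of the same logic (mine, with made-up sample rows; the real ProdIDs and names come from the generated data):

```python
# Sketch (not part of the benchmark) of what the COUNT(CASE ...) report
# computes: per product, how many orders came from clients with short
# (< 20 chars) or long (> 30 chars) names, plus the total order count.

from collections import defaultdict

# (ProdID, ClientName) pairs standing in for the three-way join result;
# the IDs and names here are invented for illustration.
joined = [
    ("DAAAA", "Acme"),                                     # short name
    ("DAAAA", "A client with a very long name"),           # 30 chars: neither
    ("DAAAA", "An even longer client name, really long"),  # long name
    ("DABCD", "Initech"),                                  # short name
]

report = defaultdict(lambda: {"short": 0, "long": 0, "total": 0})
for prod, name in joined:
    row = report[prod]
    row["total"] += 1        # COUNT(c.ClientName)
    if len(name) < 20:       # COUNT(CASE WHEN LEN(...) < 20 THEN 1 END)
        row["short"] += 1
    if len(name) > 30:       # COUNT(CASE WHEN LEN(...) > 30 THEN 1 END)
        row["long"] += 1

for prod, row in sorted(report.items()):
    print(prod, row)
```

Without the index on ProdID, the WHERE p.ProdID LIKE 'D%' filter forces a scan of the wider varchar rows, which is why the gap in test 3b is so much larger.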
-- Clean up the mess
DROP TABLE Orders
DROP TABLE Products
DROP TABLE Clients
go

Best, Hugo

-- 

(Remove _NO_ and _SPAM_ to get my e-mail address)
