Path: dp-news.maxwell.syr.edu!spool.maxwell.syr.edu!drn.maxwell.syr.edu!news.maxwell.syr.edu!postnews.google.com!g47g2000cwa.googlegroups.com!not-for-mail
From: "Mikito Harakiri" <mikharakiri_nospaum@yahoo.com>
Newsgroups: microsoft.public.sqlserver.programming,comp.databases.theory
Subject: Re: SQL Humor
Date: 19 Aug 2005 16:18:03 -0700
Organization: http://groups.google.com
Lines: 246
Message-ID: <1124493483.455266.82160@g47g2000cwa.googlegroups.com>
References: <1124385453.899643.54220@f14g2000cwb.googlegroups.com>
   <1124389414.375636.99790@g49g2000cwa.googlegroups.com>
   <1124390672.631946.49660@z14g2000cwz.googlegroups.com>
   <1124391355.281387.67280@g44g2000cwa.googlegroups.com>
   <1124396257.455079.19740@g43g2000cwa.googlegroups.com>
   <e7s9g15sq4c1p45kgdvooh4d0ckgbdpemo@4ax.com>
   <1124399874.893606.276650@g14g2000cwa.googlegroups.com>
   <mmv9g11api0dnc9l7falp1ipqtad0q6b80@4ax.com>
   <1124402576.470238.211960@o13g2000cwo.googlegroups.com>
   <eclcg19u8755p8o054c77j81jqn424mv9k@4ax.com>
NNTP-Posting-Host: 148.87.1.171
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
X-Trace: posting.google.com 1124493488 27804 127.0.0.1 (19 Aug 2005 23:18:08 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Fri, 19 Aug 2005 23:18:08 +0000 (UTC)
In-Reply-To: <eclcg19u8755p8o054c77j81jqn424mv9k@4ax.com>
User-Agent: G2/0.2
Complaints-To: groups-abuse@google.com
Injection-Info: g47g2000cwa.googlegroups.com; posting-host=148.87.1.171;
   posting-account=jduQMA0AAABKdEh0L1SEqrR3hnlHUlAG
Xref: dp-news.maxwell.syr.edu microsoft.public.sqlserver.programming:499434 comp.databases.theory:33124


Hugo Kornelis wrote:
> On 18 Aug 2005 15:02:56 -0700, Mikito Harakiri wrote:
>
> >Hugo Kornelis wrote:
> >> Many companies use a short mnemonic code for their customers. The use of
> >> a short mnemonic code for products is not uncommon either. Imagine a
> >> customers table with 10,000 customers, a products table with 5,000
> >> products and an orders table with a few million rows. Now would you
> >> prefer char(6) or varchar(6) for CustomerCode? And char(5) or varchar(5)
> >> for ProductCode?
> >
> >But if lookup table cardinality goes up, then, the storage factor
> >char(n) vs varchar(n) goes down! Perhaps, you can convince me that the
> >effect of the two trailing bytes is not miniscule, as I previously
> >thought, but it just can't be significant. Any benchmark demonstrating
> >that the performance degradation is not in single percentage digits is
> >welcome.
>
> Hi Mikito,
>
> Below is the script I used to compare performance. I compared three
> tasks:
>
> 1. Performing lots of single inserts (speed of inserts matters in
> databases that have to process thousands of new rows per second)
> 2. Adding an index (not an everyday task, but there are situations where
> a process can be sped up by adding an index, doing the process, then
> dropping the index again)
> 3. Generating a typical report that involves some joins (the CASE
> expressions I used may look silly, but imagine that I am using a payment
> status instead - also note that I store the results in a temporary table
> first, to make sure that network speed won't influence the results)
>
> You'll find the full code further below. But first the results:
>
> 1. For the inserts, the elapsed time on my machine was 118156 ms with
> char, 124406 ms with varchar. A degradation of over 5 %.
>
> 2. Adding the index took 2840 ms with char, 3236 ms with varchar. A
> performance degradation of almost 14 %.
>
> 3a. Generating the report with the index created in step 2 took 224 ms
> on average for char; 234 ms on average for varchar (I used three
> consecutive executions for both tests, each time starting with a clean
> cache). Performance degradation: 4.5 %.
> 3b. Generating the same report after dropping the supporting index
> showed an even bigger difference: avg 2281 for char; avg 3250 for
> varchar. Degradation: over 40 %.
>
> 1a. To top it off, I repeated the insertion test, but this time AFTER
> creating the extra index. For char, the time taken was 227936 ms; for
> varchar 313110 ms. Performance degradation: 37 %.
>
> So here's your choice: either you use the index, degrade the queries by
> "only" 4.5% but at the cost of slowing inserts down by 37%, or you
> discard the index to reverse these figures (inserts slowed down 5%;
> queries slowed down 40%).
>
> Or you choose char instead of varchar when you know that the length is
> more or less fixed. :-)
>
>
> Here's the code I used for this benchmark:
>
> -- Set up the tables
> CREATE TABLE Clients
>         (ClientID char(6) NOT NULL,
>          ClientName varchar(36) NOT NULL,
>          PRIMARY KEY (ClientID)
>         )
> CREATE TABLE Products
>         (ProdID char(5) NOT NULL,
>          ProdName varchar(30) NOT NULL,
>          PRIMARY KEY (ProdID)
>         )
> CREATE TABLE Orders
>         (ClientID char(6) NOT NULL,
>          ProdID char(5) NOT NULL,
>          PRIMARY KEY (ClientID, ProdID),
>          FOREIGN KEY (ClientID) REFERENCES Clients,
>          FOREIGN KEY (ProdID) REFERENCES Products
>         )
> go
> -- Put some rows in the Clients table
> DECLARE @i int, @r int, @NumClients int
> SET @NumClients = 10000
> SET @i = 0
> WHILE @i < @NumClients
>   BEGIN
>     SET @r = CAST(@i AS bigint) * 308915776 / @NumClients
>     INSERT Clients (ClientID, ClientName)
>     VALUES(CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + CHAR(65
> + ((@r / 676) % 26))
>          + CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) %
> 26))
>          + CHAR(65 + ((@r / 11881376) % 26)),
>            LEFT(CAST(NewID() as VARCHAR(36)), 10 + RAND() * 26))
>     SET @i = @i + 1
>   END
> select count(*) from Clients
> go
> -- Put some rows in the Products table
> DECLARE @i int, @r int, @NumProds int
> SET @NumProds = 5000
> SET @i = 0
> WHILE @i < @NumProds
>   BEGIN
>     SET @r = CAST(@i AS bigint) * 11881376 / @NumProds
>     INSERT Products (ProdID, ProdName)
>     VALUES(CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + CHAR(65
> + ((@r / 676) % 26))
>          + CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) %
> 26)),
>            LEFT(CAST(NewID() as VARCHAR(36)), 5 + RAND() * 25))
>     SET @i = @i + 1
>   END
> select count(*) from Products
> go
> -- Have each client order up to 10 products
> DECLARE @Start datetime, @End datetime
> SET @Start = CURRENT_TIMESTAMP
> DECLARE @c int, @i int, @p int, @r int, @NumClients int, @NumProds int,
> @ClientID char(6), @ProdID char(5), @Amt tinyint
> SET @NumProds = 5000
> SET @NumClients = 10000
> SET @c = 0
> WHILE @c < @NumClients
>   BEGIN
>     SET @r = CAST(@c AS bigint) * 308915776 / @NumClients
>     SET @ClientID =
>            CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + CHAR(65
> + ((@r / 676) % 26))
>          + CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) %
> 26))
>          + CHAR(65 + ((@r / 11881376) % 26))
>     SET @Amt = RAND() * 20 + 20
>     SET @i = 0
>     WHILE @i < @Amt
>       BEGIN
>         SET @r = CAST(RAND() * @NumProds AS bigint) * 11881376 /
> @NumProds
>         SET @ProdID =
>                CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) +
> CHAR(65 + ((@r / 676) % 26))
>              + CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976)
> % 26))
>         INSERT Orders (ClientID, ProdID)
>         SELECT @ClientID, @ProdID
>         WHERE  NOT EXISTS (SELECT *
>                            FROM   Orders
>                            WHERE  ClientID = @ClientID
>                            AND    ProdID   = @ProdID)
>         SET @i = @i + 1
>       END
>     SET @c = @c + 1
>     IF @c % 100 = 0
>       PRINT @c
>   END
> SET @End = CURRENT_TIMESTAMP
> SELECT 'Generating orders', @Start, @End, DATEDIFF(ms, @Start, @End) AS
> Elapsed
> select count(*) from Orders
> go
> -- Add an index
> DECLARE @Start datetime, @End datetime
> SET @Start = CURRENT_TIMESTAMP
> CREATE INDEX Orders_ProdID ON Orders(ProdID)
> SET @End = CURRENT_TIMESTAMP
> SELECT 'Adding an index', @Start, @End, DATEDIFF(ms, @Start, @End) AS
> Elapsed
> go
> -- DROP INDEX Orders.Orders_ProdID
> -- Flush buffers
> CHECKPOINT
> DBCC DROPCLEANBUFFERS
> DBCC FREEPROCCACHE
> go
> -- Perform a typical query
> DECLARE @Start datetime, @End datetime
> SET @Start = CURRENT_TIMESTAMP
> SELECT     p.ProdID,
>            COUNT(CASE WHEN LEN(c.ClientName) < 20 THEN 1 END) AS
> ShortNameOrders,
>            COUNT(CASE WHEN LEN(c.ClientName) > 30 THEN 1 END) AS
> LongNameOrders,
>            COUNT(c.ClientName) AS TotalOrders
> INTO       #Results
> FROM       Products AS p
> INNER JOIN Orders   AS o
>       ON   o.ProdID = p.ProdID
> INNER JOIN Clients  AS c
>       ON   c.ClientID = o.ClientID
> WHERE      p.ProdID LIKE 'D%'
> GROUP BY   p.ProdID
> SET @End = CURRENT_TIMESTAMP
> SELECT 'Query with join', @Start, @End, DATEDIFF(ms, @Start, @End) AS
> Elapsed
> SELECT * FROM #Results
> DROP TABLE #Results
> go
> -- Clean up the mess
> DROP TABLE Orders
> DROP TABLE Products
> DROP TABLE Clients
> go

Hugo,

I was going to perform this test in oracle, but then I came across the
following passage:

Tom...
I was told once that if you only need a single char
use CHAR(1) since using VARCHAR2(1) has overhead
due to the Oracle having to maintain how long
the value is in a VARCHAR2 field

Is this a load of rubbish or is there some truth
to it?

Followup:

that is rubbish.

A char(n) is a varchar2(n) that is blank padded to it's maximum length.
 The
implementation of a char physically on disk is identical to a varchar2
-- there is a length byte for a char(1) just as there is for a
varchar2(1).

http://www.jlcomp.demon.co.uk/faq/char_vs_varchar2.html
3. As opposed to a wide-spread misunderstanding, char (1) does *not*
use less space in the database than varchar2 (1) when the value 'a' is
stored because also char (1) has a length field. This can be proved by
the dump () function.

I guess this storage quirk doesn't really affect TPC-C benchmarks:)


It might be interesting, however, to dig down into the insert
performance difference. As I mentioned, 2 extra bytes shouldn't really
affect the speed of index unique scan.

BTW, I always use INTEGERs for id columns.