Path: news.netfront.net!paganini.linuxd.org!eternal-september.org!feeder.eternal-september.org!mx02.eternal-september.org!.POSTED!not-for-mail
From: Axel Schwenke <axel.schwenke@gmx.de>
Newsgroups: comp.databases.mysql
Subject: Re: mysql lenght() and char_length not working for longer texts
Date: Tue, 7 Jul 2015 10:00:15 +0200
Organization: xl@home
Lines: 54
Message-ID: <fhds6c-cjj.ln1@xl.homelinux.org>
References: <6dd880fe-c726-4913-b05e-d06bc0e42d4c@googlegroups.com>
    <mn3qj0$fse$1@dont-email.me>
    <varchar-length-20150705231630@ram.dialup.fu-berlin.de>
    <mnes4u$u7l$1@dont-email.me> <4939682.h9gfqvCFYZ@PointedEars.de>
Reply-To: axel.schwenke@gmx.de
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Injection-Info: mx02.eternal-september.org; posting-host="e6f7d978b7c9398287b8d263d7684ab7";
 logging-data="4485"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+wmTuprcri13WD63eshU3Zv9TNbGriki8="
X-Newsreader: knews 1.0b.1
Cancel-Lock: sha1:1OYq8e7ONCKzBdjrgT8AZtc/pz0=
Xref: news.netfront.net comp.databases.mysql:2458

Thomas 'PointedEars' Lahn <PointedEars@web.de> wrote:
> Lennart Jonsson wrote:
>
>> If you store unicode characters in a column you better use a character
>> set that supports it, like utf-8
>>
>> create table t (x varchar(3) not null)
>> engine = innodb character set utf8;
>
> _UTF-8_ is _not_ a character set.  It is a character _encoding_ for a
> character set.

Smartass!

MySQL itself uses the term "CHARACTER SET" in the parser and in the
manual. Your point is completely useless in the context "MySQL".
If you are not willing to help with THE PROBLEM but would rather BE
RIGHT then I ask you to be silent. You are not helpful.

>> Assuming this is true, there are characters that are represented with
>> more than two bytes so just doubling the space wont suffice,
>
> It will not.  In UTF-8, a Unicode character can be encoded with up to 6
> 8-bit bytes.

Again: the context is MySQL. Here a column with CHARACTER SET utf8 can
use at most 3 bytes per character because MySQL supports only
characters from the BMP (Unicode U+0 ... U+FFFF) and those can all be
encoded in utf8 sequences with at most 3 bytes.

MySQL 5.5.3 adds support for Unicode characters beyond U+FFFF. The
respective CHARACTER SET is called utf8mb4 (plus others). See:

https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-sets.html

As for the size constraints: the length of CHAR and VARCHAR columns is
specified in characters. So it depends on the CHARCTER SET (which can
be set separately for each column) how big the maximal lenght of each
column in bytes will be. MySQL is conservative when it calculates
those lengths and checks limits. So even while it is most unlikely that
you will ever have data in a utf8 column that uses 3 bytes for each
character, MySQL still acts as if that is the case. Hence the limit
of 64K bytes per row will restrict you to utf8 [VAR]CHAR columns with
at most 21845 characters. In fact it will be less than that because
there is additional overhead.

The 64K limit is *not* dependent on the storage engine. It is a limit
of the storage engine interface. Storage engines might impose
additional restrictions on top of that. In fact InnoDB has an internal
limit of 8K per row. But then InnoDB can store VARCHAR columns outside
the row, too (like BLOB and TEXT).


XL
