Re: mysql lenght() and char_length not working for longer texts

From: Axel Schwenke <axel.schwenke_at_gmx.de>
Date: Tue, 7 Jul 2015 10:00:15 +0200
Message-ID: <fhds6c-cjj.ln1_at_xl.homelinux.org>


Thomas 'PointedEars' Lahn <PointedEars_at_web.de> wrote:
> Lennart Jonsson wrote:
>
>> If you store unicode characters in a column you better use a character
>> set that supports it, like utf-8
>>
>> create table t (x varchar(3) not null)
>> engine = innodb character set utf8;
>
> _UTF-8_ is _not_ a character set. It is a character _encoding_ for a
> character set.

Smartass!

MySQL itself uses the term "CHARACTER SET" in the parser and in the manual. Your point is completely useless in the context "MySQL". If you are not willing to help with THE PROBLEM but would rather BE RIGHT then I ask you to be silent. You are not helpful.

>> Assuming this is true, there are characters that are represented with
>> more than two bytes so just doubling the space wont suffice,
>
> It will not. In UTF-8, a Unicode character can be encoded with up to 6
> 8-bit bytes.

Again: the context is MySQL. Here a column with CHARACTER SET utf8 can use at most 3 bytes per character because MySQL supports only characters from the BMP (Unicode U+0 ... U+FFFF) and those can all be encoded in utf8 sequences with at most 3 bytes.

MySQL 5.5.3 adds support for Unicode characters beyond U+FFFF. The respective CHARACTER SET is called utf8mb4 (plus others). See:

https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-sets.html

As for the size constraints: the length of CHAR and VARCHAR columns is specified in characters. So it depends on the CHARCTER SET (which can be set separately for each column) how big the maximal lenght of each column in bytes will be. MySQL is conservative when it calculates those lengths and checks limits. So even while it is most unlikely that you will ever have data in a utf8 column that uses 3 bytes for each character, MySQL still acts as if that is the case. Hence the limit of 64K bytes per row will restrict you to utf8 [VAR]CHAR columns with at most 21845 characters. In fact it will be less than that because there is additional overhead.

The 64K limit is *not* dependent on the storage engine. It is a limit of the storage engine interface. Storage engines might impose additional restrictions on top of that. In fact InnoDB has an internal limit of 8K per row. But then InnoDB can store VARCHAR columns outside the row, too (like BLOB and TEXT).

XL Received on Tue Jul 07 2015 - 10:00:15 CEST

Original text of this message