Re: mysql lenght() and char_length not working for longer texts
Date: Tue, 07 Jul 2015 08:42:01 +0200
Message-ID: <4939682.h9gfqvCFYZ_at_PointedEars.de>
Lennart Jonsson wrote:
> If you store unicode characters in a column you better use a character
_UTF-8_ is _not_ a character set. It is a character _encoding_ for a
> set that supports it, like utf-8
>
> create table t (x varchar(3) not null)
> engine = innodb character set utf8;
The keyword “character set” in MySQL is for historic reasons.
> å,ä, and ö are two byte characters in utf-8
No, they are encoded with two 8-bit byte long code sequences each in UTF-8, the 8-bit Unicode Transformation Format.
>> The following might work: >> >> CREATE TABLE T ( N VARCHAR( 65533 ) NOT NULL ) >> ENGINE = MyISAM CHARACTER SET latin1; >>
>
> Assuming this is true, there are characters that are represented with
> more than two bytes so just doubling the space wont suffice,
It will not. In UTF-8, a Unicode character can be encoded with up to 6 8-bit bytes.
> see for example g-clef which is 4 bytes:
The code sequence for the Unicode character U+1D11E MUSICAL SYMBOL G CLEF is 4 8-bit bytes long *in UTF-8*: 0xF0 0x9D 0x84 0x9E.
> http://www.fileformat.info/info/unicode/char/1d11e/index.htm
<https://r12a.github.io/apps/conversion/>
> But beside that, the whole point with the length restriction would be
> gone since you could store twice the amount of one byte characters.
There is no such thing as "one byte characters". How many bytes encoding a character takes depends on the encoding.
-- PointedEars Twitter: _at_PointedEars2 Please do not cc me. / Bitte keine Kopien per E-Mail.Received on Tue Jul 07 2015 - 08:42:01 CEST