Re: mysql lenght() and char_length not working for longer texts

From: Thomas 'PointedEars' Lahn <PointedEars_at_web.de>
Date: Tue, 07 Jul 2015 08:42:01 +0200
Message-ID: <4939682.h9gfqvCFYZ_at_PointedEars.de>


Lennart Jonsson wrote:

> If you store unicode characters in a column you better use a character
> set that supports it, like utf-8
>
> create table t (x varchar(3) not null)
> engine = innodb character set utf8;

_UTF-8_ is _not_ a character set. It is a character _encoding_ for a character set. That character set is the Unicode character set, Unicode for short. (Not to be confused with UCS, the Universal Character Set.)

The keyword “character set” in MySQL is for historic reasons.

> å,ä, and ö are two byte characters in utf-8

No, they are encoded with two 8-bit byte long code sequences each in UTF-8, the 8-bit Unicode Transformation Format.

>>    The following might work:
>>
>> CREATE TABLE T ( N VARCHAR( 65533 ) NOT NULL )
>> ENGINE = MyISAM CHARACTER SET latin1;
>>

>
> Assuming this is true, there are characters that are represented with
> more than two bytes so just doubling the space wont suffice,

It will not. In UTF-8, a Unicode character can be encoded with up to 6 8-bit bytes.

> see for example g-clef which is 4 bytes:

The code sequence for the Unicode character U+1D11E MUSICAL SYMBOL G CLEF is 4 8-bit bytes long *in UTF-8*: 0xF0 0x9D 0x84 0x9E.

> http://www.fileformat.info/info/unicode/char/1d11e/index.htm

<https://r12a.github.io/apps/conversion/>  

> But beside that, the whole point with the length restriction would be
> gone since you could store twice the amount of one byte characters.

There is no such thing as "one byte characters". How many bytes encoding a character takes depends on the encoding.

<http://www.unicode.org/faq/>

-- 
PointedEars

Twitter: _at_PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
Received on Tue Jul 07 2015 - 08:42:01 CEST

Original text of this message