AW: [EXT] Re: varchar2 DATA_LENGTH vs. NLS_CHARACTERSET vs. Oracle version

From: Pausch Manfred <Manfred.Pausch_at_tgw-group.com>
Date: Thu, 17 Dec 2020 11:13:37 +0000
Message-ID: <AM7P191MB112103F0F9E56F836F754EC9C6C40_at_AM7P191MB1121.EURP191.PROD.OUTLOOK.COM>


Hi Sayan,

Thanks for your hints. It helped me to find where I was thinking the wrong way...

"UTF8 character set [...] encodes characters in one, two, or three bytes. [...] Supplementary characters inserted into a UTF8 database [...] represented by two three-byte codes and hence occupies six bytes of memory in total."

"AL32UTF8 character set [...] encodes characters in one, two, three, or four bytes. Supplementary characters require four bytes"

So the maximum 6000bytes for 2 "varchar2(1000 CHAR)" columns in the index are small enough to fit into the maximum key length of 6398bytes in a 8k block in the UTF8 database. But the maximum 8000bytes for 2 "varchar2(1000 CHAR)" columns are too large for the max key length of 6398bytes in a 8k block in the AL32UTF8 database.

Best regards.
Manfred

Von: Sayan Malakshinov <xt.and.r_at_gmail.com> Gesendet: Mittwoch, 16. Dezember 2020 16:12 An: Pausch Manfred <Manfred.Pausch_at_tgw-group.com> Cc: Oracle Mailing List <oracle-l_at_freelists.org> Betreff: [EXT] Re: varchar2 DATA_LENGTH vs. NLS_CHARACTERSET vs. Oracle version

Hi Manfred,

There is a good doc for that in MOS:  AL32UTF8 / UTF8 (Unicode) Database Character Set Implications (Doc ID 788156.1)

You need B.5) Do I need to convert from UTF8 to AL32UTF8 ? Oracle UTF8 is Unicode revision 3.0 in 8.1.7 and up. AL32UTF8 is updated with newer Unicode versions in each major release, in Oracle RDBMS 18.1 it is updated to Unicode 9.0. See also https://support.oracle.com/epmos/faces/DocumentDisplay?parent=DOCUMENT&sourceId=788156.1&id=260893.1 Unicode character sets in the Oracle database.  

On Wed, Dec 16, 2020 at 5:58 PM Pausch Manfred <mailto:Manfred.Pausch_at_tgw-group.com> wrote: Hello all,

have 2 databases. One is 11.2.0.4, the other is 19c. (yes, it's time for upgrading to 19c *g*)

11.2 has NLS_CHARACTERSET = UTF8
19c has NLS_CHARACTERSET = AL32UTF8

On both DBs I have a table
CREATE TABLE mytable

   ( "PRESETNUMBERCONFIGPRJID" NUMBER(12,0), 
     "PRESETNUMBER" NUMBER(12,0), 
     "SHELLSIZE" VARCHAR2(1000 CHAR), 
     "FILMTYPE" VARCHAR2(1000 CHAR),
     <some other insignificant columns>
  )



On 11.2 I have an index (because of a unique constraint): CREATE UNIQUE INDEX myindex ON mytable ("SHELLSIZE", "FILMTYPE") ;

On 19c creation of the same index fails with: ORA-01450: maximum key length (6398) exceeded

Tablespace blocksize on both databases is 8k.

Querying USER_TAB_COLS with
SELECT data_type, data_length, char_col_decl_length, char_length, char_used FROM   user_tab_cols
WHERE  table_name = mytable
AND    column_name in ('SHELLSIZE', 'FILMTYPE');

Returns
On 11.2:
DATA_TYPE  DATA_LENGTH CHAR_COL_DECL_LENGTH CHAR_LENGTH C

---------- ----------- -------------------- ----------- -
VARCHAR2          3000                 3000        1000 C VARCHAR2          3000                 3000        1000 C And on 19c:
DATA_TYPE  DATA_LENGTH CHAR_COL_DECL_LENGTH CHAR_LENGTH C
---------- ----------- -------------------- ----------- -
VARCHAR2          4000                 4000        1000 C VARCHAR2          4000                 4000        1000 C As I understand index creation on 19c (NLS_CHARACTERSET = AL32UTF8) fails because
data_length of both columns (4000+4000) is more than the maximum key length (6398).

But why the data_length for a "varchar2(xxx CHAR)" column is different on both databases? Has it to do with the different NLS_CHARACTERSET settings? Or with the different database versions? Or something totally different?

As far as I know UTF8 encodings are variable-width, which use up to 4bytes (depending on the character you want to store...)

So maybe someone could shed some light, please :-)

Thanks in advance
Manfred

--
Best regards,
Sayan Malakshinov
Oracle performance tuning engineer
Oracle ACE Associate
http://orasql.org
†Ûiÿü0ÁúÞzX¬¶Ê+ƒün– {ú+iÉ^ Received on Thu Dec 17 2020 - 12:13:37 CET

Original text of this message