
Re: How to identify unicode characters in record

From: Martin T. <bilbothebagginsbab5_at_freenet.de>
Date: 7 Sep 2006 14:00:12 -0700
Message-ID: <1157662812.224100.18650@h48g2000cwc.googlegroups.com>


Ana C. Dent wrote:
> If I am having a good day, I can barely spell unicode.
> We are in the process of upgrading our application to support unicode
> characters.
> CREATE TABLE LOOKUP
> (ID NUMBER,
> DESCRIPTION VARCHAR2(320));
> This table exists in a 10gR2 database that uses the UTF-8 character set.
>
> How do I query the database to return all the IDs where DESCRIPTION contains
> 1 or more unicode (non-ASCII) characters?
>
> I am more than willing to RTFM, if you point me at which FM has the answer.
>
> Free clues would be much appreciated.
>

Ana - I think both of the tips from Michael and Charles will work (byte value >= 128, or byte count vs. character count).
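
For instance, the byte-count-vs-character-count tip can be turned into a query against your LOOKUP table more or less like this (a sketch, untested here):

-- in a UTF-8 database any non-ASCII character takes more than one byte,
-- so LENGTHB exceeds LENGTH exactly for the rows you are after
SELECT id
  FROM lookup
 WHERE LENGTHB(description) > LENGTH(description);

The byte-value >= 128 tip can be written in a similar spirit, but the query above is the simplest way I know of to express it.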

I want to make you aware of an issue with UTF-8 columns that we recently stumbled over.
It is entirely possible to insert invalid UTF-8 strings into a UTF-8 VARCHAR2 column if the client has its character set configured incorrectly. If the client tells the server that its character set matches the database character set, no conversion of the bytes the client sends takes place, and whatever it sends is stored in the column as-is.
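
To illustrate what I mean (a made-up example, not from our system): say the database character set is AL32UTF8, the client terminal actually sends ISO-8859-1 bytes, but NLS_LANG on the client claims AMERICAN_AMERICA.AL32UTF8. Because client and database character sets "match", no conversion takes place, and the single 0xE9 byte the terminal sends for an e-acute lands in the column verbatim even though it is not a valid UTF-8 sequence. You can simulate the resulting bad row like this:

-- CHR(233) produces the raw byte 0xE9, which is not a valid UTF-8
-- sequence on its own; Oracle stores it anyway, since VARCHAR2 data
-- is not validated against the database character set on insert
INSERT INTO lookup (id, description) VALUES (999, 'Entr' || CHR(233) || 'e');

Such a row still shows up in the byte-count check above, but a correctly configured UTF-8 client will get a replacement character (or garbage) back instead of the intended letter.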

best,
Martin

Received on Thu Sep 07 2006 - 16:00:12 CDT
