Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: Insert Statement put invalid character data into UTF8 database/column

Re: Insert Statement put invalid character data into UTF8 database/column

From: joel garry <joel-garry_at_home.com>
Date: 3 Jul 2006 15:48:02 -0700
Message-ID: <1151966882.877653.74940@h44g2000cwa.googlegroups.com>

Martin T. wrote:
> Hi all. [Oracle 9i - 9.2.0.1.0]
>
> A while back we had an incident where a insert statement in some script
> file (executed via sqlplus) managed to get us invalid data into a
> VARCHAR2 column in the database.
>
> All character data in the DB is UTF8 and this script with insert
> statements happend to contain a character from the upper ascii range
> ('▀' - 0xDF in ISO-8859-1) The script file itself is encoded in this
> character set (8859).
>
> No what happened is, that in some cases the bytes of the UTF8 column
> did not contain the correct UTF8 encoding of this character, but the
> byte-value 0xDF was just inserted into the column. That is after
> running the insert statement
> INSERT into my_table (id, abbreviation, name) values (62, 'F1', '20 mm
> Falzboden m. Verschlu▀l.');
> the UTF8-column did contain an _invalid_ UTF8 string, namely a
> byte-string that contained 0xDF.
>
> Naturally queries on this column-rows failed afterwards (e.g. in JAVA
> you get an invalid-encoding-exception)
>
> Can someone explain to me why a simple insert statements with no
> encoding convertions manages to put an invalid string into a database
> column?

Because a simple insert statement has no encoding conversions. This would be a feature. You can even insert 8 bit characters into a 7 bit character set this way. exp/imp will convert for you if you tell the NLS environment you are changing character sets. This would also be a feature. The results may sometimes be considered a misfeature.

So what do you want to wind up with? If you want to update the bad data, use sqlplus. If you don't want to insert bad data with sqlplus, scrub your data. sqlplus doesn't care. If java can't handle your data, don't use java. Is '▀' considered bad data by UTF8? http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96524/c13datyp.htm#14968

You might consider patching your database, too.

jg

--
@home.com is bogus.
What's in your database?
http://www.signonsandiego.com/uniontrib/20060703/news_mz1b3porn.html
Received on Mon Jul 03 2006 - 17:48:02 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US