
Re: storing byte values

From: Vladimir M. Zakharychev <>
Date: Tue, 07 Aug 2007 08:50:29 -0700
Message-ID: <>

On Aug 7, 6:44 pm, wrote:
> Thanks for your answers, but they are not very helpful... ;-)
> I can't explain the data model or the business plan, because it's not
> my model or plan. I need to migrate any existing database (within
> certain rules, of course) and it needs to work with an existing
> application.
> So... is there a way to store byte values and to define a unique
> constraint which includes the column with the byte values?
> It would be sufficient to use a CHAR or VARCHAR column, if all
> possible bytes would be returned unchanged. But even with NCHAR there
> seems to be data loss, because some bytes seem to be unmapped in the
> character set used. For example: when writing the byte 0x80 into
> an NCHAR column and reading it again, the returned value is 0xBF. The
> same goes for 26 other bytes. They all get converted to 0xBF.
> Do I have to pay attention to something special when using NCHAR?
> By the way, I'm using JDBC to connect to the database. I insert the
> values as string parameters of a PreparedStatement, so they should
> arrive as unicode strings in the database.

If you want to store binary data that's not subject to character set conversion, in Oracle you use RAW or BLOB. You can have more than one RAW column in a table (don't confuse RAW with LONG RAW; the latter has a one-column-per-table restriction. LONGs have been obsolete since 8i and are replaced with LOBs; they are still there for compatibility with previous releases and the massive inherited code base that still uses them, but Oracle expressly warns in the docs that LONGs may disappear in any future release. Maybe they already did in 11g..?)

Any character type, be it VARCHAR2 or NVARCHAR2, is subject to charset conversion, so you can't store arbitrary binary data in such columns unless you somehow encode it into a charset-neutral form (like Base64) or cast it to RAW and back using UTL_RAW package functions. Since you can't place constraints on BLOBs, your only obvious choice is RAW, which is limited to 4000 bytes (that is, RAW(4000) is the longest possible RAW column.)
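For illustration, here's a charset-neutral round trip through UTL_RAW (just a sketch; the variable names are mine):

   DECLARE
     v_text VARCHAR2(100) := 'hello';
     v_raw  RAW(100);
   BEGIN
     v_raw  := UTL_RAW.CAST_TO_RAW(v_text);      -- bytes, no charset conversion
     v_text := UTL_RAW.CAST_TO_VARCHAR2(v_raw);  -- back to VARCHAR2, unchanged
   END;
   /

CAST_TO_RAW reinterprets the bytes as-is, which is exactly what you can't get from a plain character column crossing a charset boundary.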

Anyway, I urge you to reconsider your approach, which is flawed. A PRIMARY KEY constraint on a long column probably won't work for enforcing uniqueness of binary data up to 10kb in length; you would need a large block size and a composite primary key over two or three RAW columns into which the data is broken down on 4k boundaries, and even then it would be a highly inefficient way to ensure uniqueness of your binary data. What you could do instead is this:


   CREATE TABLE docs  -- table name and DATA type are illustrative
   (
     ID    NUMBER NOT NULL PRIMARY KEY
    ,DATA  BLOB
    ,HASH$ RAW(40)
    ,CONSTRAINT docs_hash_uq UNIQUE (HASH$)
   )

and then hash the data using SHA-1 (see the DBMS_CRYPTO package, Hash() function) and put the hash into the HASH$ column. You can do this in a BEFORE INSERT FOR EACH ROW trigger on the table. You can get away with 32 bytes per key if you use MD5 instead of SHA-1, but the chance of a hash collision is slightly higher with MD5 (still negligible unless you're going to store gazillions of unique documents.) Note that since HASH$ depends on DATA, it can't be the primary key, so I simply create a UNIQUE constraint on HASH$ to enforce uniqueness of DATA through its hash, and introduce a surrogate primary key ID, which uniquely identifies all rows in the table and can be used for referential integrity.
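The trigger could look something like this (a sketch; the table name DOCS and trigger name are made up, and you need EXECUTE on DBMS_CRYPTO, which is granted by SYS):

   CREATE OR REPLACE TRIGGER docs_hash_trg
     BEFORE INSERT OR UPDATE OF data ON docs
     FOR EACH ROW
   BEGIN
     -- SHA-1 of the incoming binary data; the UNIQUE constraint on
     -- HASH$ then rejects duplicate DATA values automatically
     :NEW.hash$ := DBMS_CRYPTO.HASH(:NEW.data, DBMS_CRYPTO.HASH_SH1);
   END;
   /

With this in place, an attempt to insert a duplicate document simply raises ORA-00001 on the unique constraint.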


   Vladimir M. Zakharychev
   N-Networks, makers of Dynamic PSP(tm)
