Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: storing byte values

Re: storing byte values

From: Vladimir M. Zakharychev <vladimir.zakharychev_at_gmail.com>
Date: Tue, 07 Aug 2007 10:52:39 -0700
Message-ID: <1186509159.560762.49130@d55g2000hsg.googlegroups.com>


On Aug 7, 7:50 pm, "Vladimir M. Zakharychev" <vladimir.zakharyc..._at_gmail.com> wrote:
> On Aug 7, 6:44 pm, pascal.dull..._at_veda.net wrote:
>
>
>
> > Thanks for your answers, but they are not very helpful... ;-)
>
> > I can't explain the data model or the business plan, because it's not
> > my model or plan. I need to migrate any existing database (within
> > certain rules, of course) and it needs to work with an existing
> > application.
>
> > So... is there a way to store byte values and to define a unique
> > constraint which includes the column with the byte values?
> > It would be sufficient to use a CHAR or VARCHAR column, if all
> > possible bytes would be returned unchanged. But even with NCHAR there
> > seems to be data loss, because some bytes seem to be unbound in the
> > used character set. For example: when writing the character 0x80 into
> > an NCHAR column and reading it again, the returned value is 0xBF. The
> > same goes for 26 other bytes. They all get converted to 0xBF.
> > Do I have to pay attention to something special when using NCHAR?
>
> > By the way, I'm using JDBC to connect to the database. I insert the
> > values as string parameters of an PreparedStatement, so they should
> > arrive as unicode strings in the database.
>
> If you want to store binary data that's not subject to charset
> conversions, in Oracle you use RAW or BLOB. You can have more than one
> RAW column in a table (don't confuse RAW with LONG RAW, the latter has
> one column per table restriction, but LONGs are obsolete since 8i and
> are replaced with LOBs. They are still there for compatibility with
> previous releases and massive inherited code base that still uses
> them, but Oracle expressly warns in the docs that LONGs may disappear
> in any future release. Maybe they already did in 11g..?)
>
> Any character type, be it VARCHAR2 or NVARCHAR2, is subject to charset
> conversions, so you can't store arbitrary binary data in such columns
> unless you somehow encode it into a charset-neutral form (like Base64)
> or cast it to RAW and back using UTL_RAW package procedures. Since you
> can't place constraints on BLOBs, your only obvious choice is RAW,
> which is limited to 4000 bytes (that is, RAW(4000) is the longest
> possible RAW column.)
>
> Anyway, I urge you to reconsider your approach, which is flawed: using
> PRIMARY KEY constraint on a long column to enforce uniqueness of
> binary data of up to 10kb in length stored in that column probably
> won't work unless you use large block size and a composite primary key
> over two or three RAW columns into which the data is broken down on 4k
> boundaries, and even in this case it will be highly inefficient way to
> ensure uniqueness of your binary data. What you could do instead is
> this:
>
> CREATE TABLE MY_TRICKY_TABLE (
> ID NUMBER(20) PRIMARY KEY
> ,HASH$ RAW(40)
> ,DATA BLOB
> ,CONSTRAINT UQ$MY_TRICKY_TABLE#HASH UNIQUE(HASH$)
> )
>
> and then hash the data using SHA-1 (see DBMS_CRYPTO package, Hash()
> function,) and put the hash into the HASH$ column. You can do this in
> a BEFORE INSERT FOR EACH ROW trigger on the table. You can get away
> with 32 bytes per key if you use MD5 instead of SHA-1, but chance of
> hash collision is slightly bigger with MD5 (still negligible unless
> you're going to store gazillions of unique documents.) Note that since
> HASH$ depends on DATA, it can't be the primary key, so I simply create
> a UNIQUE constraint on HASH$ to enforce uniqueness of DATA through its
> hash and introduce a surrogate primary key ID, which will uniquely
> identify all rows in the table and can be used for referential
> integrity.
>
> Hth,
> Vladimir M. Zakharychev
> N-Networks, makers of Dynamic PSP(tm)
> http://www.dynamicpsp.com

Correction to the above: obviously, I mixed hash lengths in bytes and lengths in characters of their corresponding hexadecimal representations. SHA-1 raw hash size is 20 bytes and MD5 is 16 bytes, so declare the HASH$ as RAW(20) or RAW(16). 40 and 32 would be appropriate for VARCHAR2, respectively.

Regards,

   Vladimir M. Zakharychev
   N-Networks, makers of Dynamic PSP(tm)    http://www.dynamicpsp.com Received on Tue Aug 07 2007 - 12:52:39 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US