Re: Remove dupes with CLOB columns involved

From: Vladimir M. Zakharychev <vladimir.zakharychev_at_gmail.com>
Date: Tue, 23 Jun 2009 01:37:00 -0700 (PDT)
Message-ID: <d078cfdc-ce60-45b9-805b-efd5a05c6969_at_e21g2000yqb.googlegroups.com>



On Jun 23, 11:45 am, "Álvaro G. Vicario" <alvaro.NOSPAMTH..._at_demogracia.com> wrote:
> John Hurley escribió:
>
>
>
> > On Jun 22, 11:25 am, "Álvaro G. Vicario"
> > <alvaro.NOSPAMTH..._at_demogracia.com> wrote:
> >> I've created a GLOBAL TEMPORARY TABLE where one of the columns stores
> >> the result of concatenating a variable (and unknown) number of
> >> VARCHAR2(100) cells. I didn't want to hit the 4,000 character limit when
> >> the source table has more than 40 rows so I created such temp column as
> >> CLOB. But when I joined my temporary table with some other regular
> >> tables I got an "ORA-00932: inconsistent datatypes: expected - got CLOB"
> >> error message. The problem seems to be that it's a "SELECT DISTINCT"
> >> query and I've read in docs that it's a known limit. Replacing DISTINCT
> >> with GROUP BY doesn't change this.
>
> >> Looking at the documentation, I appeared to me that TO_CHAR() could be a
> >> solution but it isn't: it fixes ORA-00932 but I get "ORA-22835: Buffer
> >> too small for CLOB to CHAR or BLOB to RAW conversion (actual: 4001,
> >> maximum: 4000)" when the CLOB contains more than 4000 bytes. Some forums
> >> suggest DBMS_LOB.SUBSTR() but it also fails for 4000+ (ORA-06502:
> >> PL/SQL: error : buffer de cadenas de caracteres demasiado pequeño
> >> numérico o de valor ORA-06512: en línea 1).
>
> >> I just need to run a query where some of the columns are CLOB and get
> >> rid of duplicated rows. Is there an easy approach I could use?
>
> >> It's a PHP/5.2+OCI8 app running on top of Oracle 10g Express Edition in
> >> a Windows XP box. Thank you in advance,
> > Could you give an overview of what exactly you are trying to
> > accomplish?
>
> > Perhaps a redesign of the database tables involved might be a better
> > idea.  While there's nothing exactly wrong with BLOBs or CLOBs ...
> > they can get unwieldy and quite complicated unless planned and tested
> > from the beginnings of design phases.
>
> It's part of a module that prints custom reports (defined by users).
> Among (manu) other things, each reportable object has a list of
> attributes. A simplified example:
>
> object:
>         object_id
>         object_name
>
> attribute:
>         attribute_id
>         object_id
>         attribute_name
>
> I'm emulating MySQL's GROUP_CONCAT() function so I get one row per
> object with all the attributes in one cell:
>
> object_name, attribute_list
> 'object_1', 'attr_1_1; attr_1_2; attr_1_2'
> 'object_2', 'attr_2_1; attr_2_2'
>
> It happened that the code I'm using [1][2] implies some complex
> operations (CONNECT BY and the like) and was crashing Oracle Express
> Edition if used as subquery in the main query. So I decided to run these
> queries first and store the result in a temporary table. Since the
> attribute table can have lots of rows I created attribute_list as CLOB
> to prevent truncation errors.
>
> So far I'm considering two possibilities:
>
> - Get over with it and just truncate large data (after all, 4,000 bytes
> of attributes might not be a real life possibility).
> - Fetch all rows and remove dupes in the application side (the PHP code
> might not be optimal but it's easy to write).
>
> [1]http://halisway.blogspot.com/2006/08/oracle-groupconcat-updated-again...
> [2]http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2...
>
> --
> --http://alvaro.es- Álvaro G. Vicario - Burgos, Spain
> -- Mi sitio sobre programación web:http://borrame.com
> -- Mi web de humor satinado:http://www.demogracia.com
> --

Another way to solve this: just compute a cryptographic hash on attribute list (MD5 or SHA1, whichever you prefer.) Regardless how long your attribute lists are, their hashes are fixed length and are perfectly comparable. Something similar to this might do:

select distinct dbms_crypto.hash(clob_col, 2 / *dbms_crypto.hash_md5*/)
  from my_gtt_with_clob

Hth,

   Vladimir M. Zakharychev
   N-Networks, makers of Dynamic PSP(tm)    http://www.dynamicpsp.com Received on Tue Jun 23 2009 - 03:37:00 CDT

Original text of this message