Re: Remove dupes with CLOB columns involved

From: Mark D Powell <Mark.Powell_at_eds.com>
Date: Tue, 23 Jun 2009 06:40:19 -0700 (PDT)
Message-ID: <f177be9b-3a2e-4e77-9daa-2cca38e50faa_at_l8g2000vbp.googlegroups.com>



On Jun 23, 4:37 am, "Vladimir M. Zakharychev" <vladimir.zakharyc..._at_gmail.com> wrote:
> On Jun 23, 11:45 am, "Álvaro G. Vicario"
>
>
>
>
>
> <alvaro.NOSPAMTH..._at_demogracia.com> wrote:
> > John Hurley escribió:
>
> > > On Jun 22, 11:25 am, "Álvaro G. Vicario"
> > > <alvaro.NOSPAMTH..._at_demogracia.com> wrote:
> > >> I've created a GLOBAL TEMPORARY TABLE where one of the columns stores
> > >> the result of concatenating a variable (and unknown) number of
> > >> VARCHAR2(100) cells. I didn't want to hit the 4,000 character limit when
> > >> the source table has more than 40 rows so I created such temp column as
> > >> CLOB. But when I joined my temporary table with some other regular
> > >> tables I got an "ORA-00932: inconsistent datatypes: expected - got CLOB"
> > >> error message. The problem seems to be that it's a "SELECT DISTINCT"
> > >> query and I've read in docs that it's a known limit. Replacing DISTINCT
> > >> with GROUP BY doesn't change this.
>
> > >> Looking at the documentation, I appeared to me that TO_CHAR() could be a
> > >> solution but it isn't: it fixes ORA-00932 but I get "ORA-22835: Buffer
> > >> too small for CLOB to CHAR or BLOB to RAW conversion (actual: 4001,
> > >> maximum: 4000)" when the CLOB contains more than 4000 bytes. Some forums
> > >> suggest DBMS_LOB.SUBSTR() but it also fails for 4000+ (ORA-06502:
> > >> PL/SQL: error : buffer de cadenas de caracteres demasiado pequeño
> > >> numérico o de valor ORA-06512: en línea 1).
>
> > >> I just need to run a query where some of the columns are CLOB and get
> > >> rid of duplicated rows. Is there an easy approach I could use?
>
> > >> It's a PHP/5.2+OCI8 app running on top of Oracle 10g Express Edition in
> > >> a Windows XP box. Thank you in advance,
> > > Could you give an overview of what exactly you are trying to
> > > accomplish?
>
> > > Perhaps a redesign of the database tables involved might be a better
> > > idea.  While there's nothing exactly wrong with BLOBs or CLOBs ...
> > > they can get unwieldy and quite complicated unless planned and tested
> > > from the beginnings of design phases.
>
> > It's part of a module that prints custom reports (defined by users).
> > Among (manu) other things, each reportable object has a list of
> > attributes. A simplified example:
>
> > object:
> >         object_id
> >         object_name
>
> > attribute:
> >         attribute_id
> >         object_id
> >         attribute_name
>
> > I'm emulating MySQL's GROUP_CONCAT() function so I get one row per
> > object with all the attributes in one cell:
>
> > object_name, attribute_list
> > 'object_1', 'attr_1_1; attr_1_2; attr_1_2'
> > 'object_2', 'attr_2_1; attr_2_2'
>
> > It happened that the code I'm using [1][2] implies some complex
> > operations (CONNECT BY and the like) and was crashing Oracle Express
> > Edition if used as subquery in the main query. So I decided to run these
> > queries first and store the result in a temporary table. Since the
> > attribute table can have lots of rows I created attribute_list as CLOB
> > to prevent truncation errors.
>
> > So far I'm considering two possibilities:
>
> > - Get over with it and just truncate large data (after all, 4,000 bytes
> > of attributes might not be a real life possibility).
> > - Fetch all rows and remove dupes in the application side (the PHP code
> > might not be optimal but it's easy to write).
>
> > [1]http://halisway.blogspot.com/2006/08/oracle-groupconcat-updated-again...
> > [2]http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2...
>
> > --
> > --http://alvaro.es-Álvaro G. Vicario - Burgos, Spain
> > -- Mi sitio sobre programación web:http://borrame.com
> > -- Mi web de humor satinado:http://www.demogracia.com
> > --
>
> Another way to solve this: just compute a cryptographic hash on
> attribute list (MD5 or SHA1, whichever you prefer.) Regardless how
> long your attribute lists are, their hashes are fixed length and are
> perfectly comparable. Something similar to this might do:
>
> select distinct dbms_crypto.hash(clob_col, 2 /
> *dbms_crypto.hash_md5*/)
>   from my_gtt_with_clob
>
> Hth,
>    Vladimir M. Zakharychev
>    N-Networks, makers of Dynamic PSP(tm)
>    http://www.dynamicpsp.com- Hide quoted text -
>
> - Show quoted text -

Computing a checksum/hash for comparison purposes is one method that I think has merit but you might just want to consider using the dbms_lob.compare procedure to compare the lob values for being equal.

HTH -- Mark D Powell -- Received on Tue Jun 23 2009 - 08:40:19 CDT

Original text of this message