Re: Determining which rows have characters outside of the standard ASCII (0-127)

From: Guillermo Alan Bort <cicciuxdba_at_gmail.com>
Date: Tue, 9 Nov 2010 11:00:03 -0300
Message-ID: <AANLkTikEr6JgH4TL0Vta6wwCYsrr5DRwRdFX0pJsfdco_at_mail.gmail.com>



can you describe the table authors for us? Alan.-

On Tue, Nov 9, 2010 at 10:43 AM, Bobak, Mark <Mark.Bobak_at_proquest.com>wrote:

> Hi Bill,
>
>
>
> I tried your suggestion, but, it doesn’t seem to work as expected:
>
>
>
> pqrac101:[pqprd1]:(/home/oracle):$csscan
>
>
>
>
>
> Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Tue Nov 9
> 08:16:45 2010
>
>
>
> Copyright (c) 1982, 2005, Oracle. All rights reserved.
>
>
>
>
>
> Username: adds_at_prd1
>
>
>
> Password:
>
>
>
> Connected to:
>
> Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit
> Production
>
> With the Partitioning, Real Application Clusters and Data Mining options
>
>
>
> (1)Full database, (2)User, (3)Table, (4)Column: 1 > 4
>
>
>
> Current database character set is WE8ISO8859P1.
>
>
>
> Enter new database character set name: > US7ASCII
>
>
>
> Enter array fetch buffer size: 1024000 >
>
>
>
> Enter number of scan processes to utilize(1..32): 1 >
>
>
>
> Enter column name to scan: > DOCUMENTS.DOC_AUTHORS
>
>
>
> Enter column name to scan: >
>
>
>
> Enumerating tables to scan...
>
>
>
> table(s) contain no character type columns
>
>
>
> Scanner terminated successfully.
>
>
>
> Not sure what “contain no character type columns” actually means…..
>
>
>
> -Mark
>
>
>
>
>
> *From:* Johnson, William L (TEIS) [mailto:WLJohnson_at_tycoelectronics.com]
> *Sent:* Tuesday, November 09, 2010 8:04 AM
> *To:* Bobak, Mark; oracle-l_at_freelists.org
> *Subject:* RE: Determining which rows have characters outside of the
> standard ASCII (0-127)
>
>
>
> Could you try using Oracle’s csscan utility and pick a US7ASCII database as
> the target for the migration? This should identify any rows with data
> outside the normal US7ASCII character set…
>
>
>
> Bill
>
>
> ------------------------------
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:
> oracle-l-bounce_at_freelists.org] *On Behalf Of *Bobak, Mark
> *Sent:* Tuesday, November 09, 2010 7:56 AM
> *To:* oracle-l_at_freelists.org
> *Subject:* Determining which rows have characters outside of the standard
> ASCII (0-127)
>
>
>
> Hi all,
>
>
>
> I’m trying to (efficiently) determine which rows have column values with
> characters outside of the range of 0-127.
>
>
>
> My first attempt was something like this:
>
>
>
> select doc_id,doc_authors from documents where
> regexp_instr(doc_authors,’[0x80-0xFF]’) > 0;
>
>
>
> But, that seems to select every row in the documents table, not just the
> ones containing characters with values in the range 128-255.
>
>
>
> Looking at one of the rows returned from the query above, with
> dump(doc_authors) confirms that rows being returned don’t have characters
> in the range 128-255.
>
>
>
> This is a Unicode database, so, I also tried:
>
>
>
> select doc_id,doc_authors from documents where
> regexp_instr(doc_authors,’[0c0080-0cFFFF]’) > 0;
>
>
>
> but, again, this seems to return every row.
>
>
>
> So, can someone offer me a clue here?
>
>
>
> Honestly, this is the first time I’ve tried using any of Oracle’s REGEXP
> functions.
>
>
>
> I’m sure I’m just doing something stupid, but I don’t have a clue what it
> is, and the examples I’ve run across in the manuals and on the web, don’t
> have anything similar to what I’m trying to do.
>
>
>
>
>
> AdvThanksance,
>
>
>
> -Mark
>
>
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Nov 09 2010 - 08:00:03 CST

Original text of this message