RE: Determining which rows have characters outside of the standard ASCII (0-127)

From: Bobak, Mark <Mark.Bobak_at_proquest.com>
Date: Tue, 9 Nov 2010 09:12:58 -0500
Message-ID: <6AFC12B9BFCDEA45B7274C534738067F5E9E4E59_at_AAPQMAILBX02V.proque.st>



Nope.

-----Original Message-----

From: Johnson, William L (TEIS) [mailto:WLJohnson_at_tycoelectronics.com] Sent: Tuesday, November 09, 2010 9:12 AM To: Herring Dave - dherri; Bobak, Mark; oracle-l_at_freelists.org Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

Is this a partitioned table?
Bug 9488049: CSSCAN DOES NOT SCAN PARTITIONED IOT TABLE.

-----Original Message-----

From: Herring Dave - dherri [mailto:Dave.Herring_at_acxiom.com] Sent: Tuesday, November 09, 2010 9:10 AM To: Mark.Bobak_at_proquest.com; Johnson, William L (TEIS); oracle-l_at_freelists.org Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

Mark,

Could something like the following work for you?

select doc_id, doc_authors
  from documents
 where doc_authors != convert(doc_authors, 'us7ascii');

Dave Herring  | DBA
Acxiom Global Technology Solutions   

630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax 1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com Service Desk: 888-243-4566, https://servicedesk.acxiom.com, GSCA_at_DNB.com

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Bobak, Mark Sent: Tuesday, November 09, 2010 7:44 AM To: Johnson, William L (TEIS); oracle-l_at_freelists.org Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

Hi Bill,

I tried your suggestion, but, it doesn't seem to work as expected:

pqrac101:[pqprd1]:(/home/oracle):$csscan

Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Tue Nov 9 08:16:45 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Username: adds_at_prd1

Password:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production With the Partitioning, Real Application Clusters and Data Mining options

(1)Full database, (2)User, (3)Table, (4)Column: 1 > 4

Current database character set is WE8ISO8859P1.

Enter new database character set name: > US7ASCII

Enter array fetch buffer size: 1024000 >

Enter number of scan processes to utilize(1..32): 1 >

Enter column name to scan: > DOCUMENTS.DOC_AUTHORS

Enter column name to scan: >

Enumerating tables to scan...

table(s) contain no character type columns

Scanner terminated successfully.

Not sure what "contain no character type columns" actually means.....

-Mark

From: Johnson, William L (TEIS) [mailto:WLJohnson_at_tycoelectronics.com] Sent: Tuesday, November 09, 2010 8:04 AM To: Bobak, Mark; oracle-l_at_freelists.org Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

Could you try using Oracle's csscan utility and pick a US7ASCII database as the target for the migration?  This should identify any rows with data outside the normal US7ASCII character set...

Bill



From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Bobak, Mark Sent: Tuesday, November 09, 2010 7:56 AM To: oracle-l_at_freelists.org
Subject: Determining which rows have characters outside of the standard ASCII (0-127)

Hi all,

I'm trying to (efficiently) determine which rows have column values with characters outside of the range of 0-127.

My first attempt was something like this:

select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0x80-0xFF]') > 0;

But, that seems to select every row in the documents table, not just the ones containing characters with values in the range 128-255.

Looking at one of the rows returned from the query above, with dump(doc_authors) confirms that rows being returned don't have characters in the range 128-255.

This is a Unicode database, so, I also tried:

select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0c0080-0cFFFF]') > 0;

but, again, this seems to return every row.

So, can someone offer me a clue here?

Honestly, this is the first time I've tried using any of Oracle's REGEXP functions.

I'm sure I'm just doing something stupid, but I don't have a clue what it is, and the examples I've run across in the manuals and on the web, don't have anything similar to what I'm trying to do.

AdvThanksance,

-Mark



The information contained in this communication is confidential, is intended only for the use of the recipient named above, and may be legally privileged.

If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited.

If you have received this communication in error, please resend this communication to the sender and delete the original message or any copy of it from your computer system.

Thank You.


--

http://www.freelists.org/webpage/oracle-l Received on Tue Nov 09 2010 - 08:12:58 CST

Original text of this message