RE: Determining which rows have characters outside of the standard ASCII (0-127)

From: Bobak, Mark <Mark.Bobak_at_proquest.com>
Date: Tue, 9 Nov 2010 08:43:35 -0500
Message-ID: <6AFC12B9BFCDEA45B7274C534738067F5E9E4CB8_at_AAPQMAILBX02V.proque.st>



Hi Bill,

I tried your suggestion, but, it doesn't seem to work as expected:

pqrac101:[pqprd1]:(/home/oracle):$csscan

Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Tue Nov 9 08:16:45 2010

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Username: adds_at_prd1

Password:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production With the Partitioning, Real Application Clusters and Data Mining options

(1)Full database, (2)User, (3)Table, (4)Column: 1 > 4

Current database character set is WE8ISO8859P1.

Enter new database character set name: > US7ASCII

Enter array fetch buffer size: 1024000 >

Enter number of scan processes to utilize(1..32): 1 >

Enter column name to scan: > DOCUMENTS.DOC_AUTHORS

Enter column name to scan: >

Enumerating tables to scan...

table(s) contain no character type columns

Scanner terminated successfully.

Not sure what "contain no character type columns" actually means.....

-Mark

From: Johnson, William L (TEIS) [mailto:WLJohnson_at_tycoelectronics.com] Sent: Tuesday, November 09, 2010 8:04 AM To: Bobak, Mark; oracle-l_at_freelists.org Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

Could you try using Oracle's csscan utility and pick a US7ASCII database as the target for the migration? This should identify any rows with data outside the normal US7ASCII character set...

Bill



From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Bobak, Mark Sent: Tuesday, November 09, 2010 7:56 AM To: oracle-l_at_freelists.org
Subject: Determining which rows have characters outside of the standard ASCII (0-127)

Hi all,

I'm trying to (efficiently) determine which rows have column values with characters outside of the range of 0-127.

My first attempt was something like this:

select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0x80-0xFF]') > 0;

But, that seems to select every row in the documents table, not just the ones containing characters with values in the range 128-255.

Looking at one of the rows returned from the query above, with dump(doc_authors) confirms that rows being returned don't have characters in the range 128-255.

This is a Unicode database, so, I also tried:

select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0c0080-0cFFFF]') > 0;

but, again, this seems to return every row.

So, can someone offer me a clue here?

Honestly, this is the first time I've tried using any of Oracle's REGEXP functions.

I'm sure I'm just doing something stupid, but I don't have a clue what it is, and the examples I've run across in the manuals and on the web, don't have anything similar to what I'm trying to do.

AdvThanksance,

-Mark

--

http://www.freelists.org/webpage/oracle-l Received on Tue Nov 09 2010 - 07:43:35 CST

Original text of this message