Re: Determining which rows have characters outside of the standard ASCII (0-127)

From: Guillermo Alan Bort <cicciuxdba_at_gmail.com>
Date: Tue, 9 Nov 2010 11:00:27 -0300
Message-ID: <AANLkTik3cCMQ==rB6_py+E15kxuyUgNvR7sg1btu18kg_at_mail.gmail.com>



that was the table documents, sorry.
Alan.-

On Tue, Nov 9, 2010 at 11:00 AM, Guillermo Alan Bort <cicciuxdba_at_gmail.com>wrote:

> can you describe the table authors for us?
> Alan.-
>
>
>
> On Tue, Nov 9, 2010 at 10:43 AM, Bobak, Mark <Mark.Bobak_at_proquest.com>wrote:
>
>> Hi Bill,
>>
>>
>>
>> I tried your suggestion, but, it doesn’t seem to work as expected:
>>
>>
>>
>> pqrac101:[pqprd1]:(/home/oracle):$csscan
>>
>>
>>
>>
>>
>> Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Tue Nov 9
>> 08:16:45 2010
>>
>>
>>
>> Copyright (c) 1982, 2005, Oracle. All rights reserved.
>>
>>
>>
>>
>>
>> Username: adds_at_prd1
>>
>>
>>
>> Password:
>>
>>
>>
>> Connected to:
>>
>> Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit
>> Production
>>
>> With the Partitioning, Real Application Clusters and Data Mining options
>>
>>
>>
>> (1)Full database, (2)User, (3)Table, (4)Column: 1 > 4
>>
>>
>>
>> Current database character set is WE8ISO8859P1.
>>
>>
>>
>> Enter new database character set name: > US7ASCII
>>
>>
>>
>> Enter array fetch buffer size: 1024000 >
>>
>>
>>
>> Enter number of scan processes to utilize(1..32): 1 >
>>
>>
>>
>> Enter column name to scan: > DOCUMENTS.DOC_AUTHORS
>>
>>
>>
>> Enter column name to scan: >
>>
>>
>>
>> Enumerating tables to scan...
>>
>>
>>
>> table(s) contain no character type columns
>>
>>
>>
>> Scanner terminated successfully.
>>
>>
>>
>> Not sure what “contain no character type columns” actually means…..
>>
>>
>>
>> -Mark
>>
>>
>>
>>
>>
>> *From:* Johnson, William L (TEIS) [mailto:WLJohnson_at_tycoelectronics.com]
>> *Sent:* Tuesday, November 09, 2010 8:04 AM
>> *To:* Bobak, Mark; oracle-l_at_freelists.org
>> *Subject:* RE: Determining which rows have characters outside of the
>> standard ASCII (0-127)
>>
>>
>>
>> Could you try using Oracle’s csscan utility and pick a US7ASCII database
>> as the target for the migration? This should identify any rows with data
>> outside the normal US7ASCII character set…
>>
>>
>>
>> Bill
>>
>>
>> ------------------------------
>>
>> *From:* oracle-l-bounce_at_freelists.org [mailto:
>> oracle-l-bounce_at_freelists.org] *On Behalf Of *Bobak, Mark
>> *Sent:* Tuesday, November 09, 2010 7:56 AM
>> *To:* oracle-l_at_freelists.org
>> *Subject:* Determining which rows have characters outside of the standard
>> ASCII (0-127)
>>
>>
>>
>> Hi all,
>>
>>
>>
>> I’m trying to (efficiently) determine which rows have column values with
>> characters outside of the range of 0-127.
>>
>>
>>
>> My first attempt was something like this:
>>
>>
>>
>> select doc_id,doc_authors from documents where
>> regexp_instr(doc_authors,’[0x80-0xFF]’) > 0;
>>
>>
>>
>> But, that seems to select every row in the documents table, not just the
>> ones containing characters with values in the range 128-255.
>>
>>
>>
>> Looking at one of the rows returned from the query above, with
>> dump(doc_authors) confirms that rows being returned don’t have characters
>> in the range 128-255.
>>
>>
>>
>> This is a Unicode database, so, I also tried:
>>
>>
>>
>> select doc_id,doc_authors from documents where
>> regexp_instr(doc_authors,’[0c0080-0cFFFF]’) > 0;
>>
>>
>>
>> but, again, this seems to return every row.
>>
>>
>>
>> So, can someone offer me a clue here?
>>
>>
>>
>> Honestly, this is the first time I’ve tried using any of Oracle’s REGEXP
>> functions.
>>
>>
>>
>> I’m sure I’m just doing something stupid, but I don’t have a clue what it
>> is, and the examples I’ve run across in the manuals and on the web, don’t
>> have anything similar to what I’m trying to do.
>>
>>
>>
>>
>>
>> AdvThanksance,
>>
>>
>>
>> -Mark
>>
>>
>>
>>
>>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Nov 09 2010 - 08:00:27 CST

Original text of this message