RE: Determining which rows have characters outside of the standard ASCII (0-127)
Date: Tue, 9 Nov 2010 08:03:53 -0500
Message-ID: <2F161F8A09B99B4ABF8AE832D546E7890C0DBC2200_at_us194mx002.tycoelectronics.net>
Could you try using Oracle's csscan utility and pick a US7ASCII database as the target for the migration? This should identify any rows with data outside the normal US7ASCII character set...
Bill
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Bobak, Mark Sent: Tuesday, November 09, 2010 7:56 AM To: oracle-l_at_freelists.org
Subject: Determining which rows have characters outside of the standard ASCII (0-127)
Hi all,
I'm trying to (efficiently) determine which rows have column values with characters outside of the range of 0-127.
My first attempt was something like this:
select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0x80-0xFF]') > 0;
But, that seems to select every row in the documents table, not just the ones containing characters with values in the range 128-255.
Looking at one of the rows returned from the query above, with dump(doc_authors) confirms that rows being returned don't have characters in the range 128-255.
This is a Unicode database, so, I also tried:
select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0c0080-0cFFFF]') > 0;
but, again, this seems to return every row.
So, can someone offer me a clue here?
Honestly, this is the first time I've tried using any of Oracle's REGEXP functions.
I'm sure I'm just doing something stupid, but I don't have a clue what it is, and the examples I've run across in the manuals and on the web, don't have anything similar to what I'm trying to do.
AdvThanksance,
-Mark
--
http://www.freelists.org/webpage/oracle-l
Received on Tue Nov 09 2010 - 07:03:53 CST