Restricting characters in a UTF8 database [message #603981] Sat, 21 December 2013 20:42
Messages: 13
Registered: April 2013
Location: us
Junior Member
Version Information
Oracle Database 11g Enterprise Edition Release - 64bit Production
PL/SQL Release - Production
CORE Production
TNS for 64-bit Windows: Version - Production
NLSRTL Version - Production

We are building an application that will store data about companies. It will receive feeds from 100+ sources. The database character set is AL32UTF8.

The user requirement is that the database should allow storing any 'Latin' and 'Arabic' characters. According to the Unicode code charts (http://www.unicode.org/charts/), the Latin and Arabic characters fall in the following ranges:

- Basic Latin (ASCII) [0000-007F]
- Latin-1 Supplement [0080-00FF]
- Latin Extended-A [0100-017F]
- Latin Extended-B [0180-024F]
- Latin Extended-C [2C60-2C7F]
- Latin Extended-D [A720-A7FF]

- Arabic [0600-06FF]
- Arabic Supplement [0750-077F]
- Arabic Extended-A [08A0-08FF]
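
To make the list above concrete, here is a minimal sketch of the range check in Python, outside the database. The names `ALLOWED_RANGES`, `is_allowed` and `disallowed_chars` are made up for illustration; only the code point ranges come from the list.

```python
# Allowed Unicode ranges copied from the list above (inclusive code points).
ALLOWED_RANGES = [
    (0x0000, 0x007F),  # Basic Latin (ASCII)
    (0x0080, 0x00FF),  # Latin-1 Supplement
    (0x0100, 0x017F),  # Latin Extended-A
    (0x0180, 0x024F),  # Latin Extended-B
    (0x2C60, 0x2C7F),  # Latin Extended-C
    (0xA720, 0xA7FF),  # Latin Extended-D
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A (08A0-08FF)
]


def is_allowed(ch):
    """Return True if the single character ch lies in one of the allowed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ALLOWED_RANGES)


def disallowed_chars(text):
    """Return the distinct characters of text that fall outside every allowed
    range, in order of first appearance."""
    seen = []
    for ch in text:
        if not is_allowed(ch) and ch not in seen:
            seen.append(ch)
    return seen
```

Note that the check works on individual code points, so a decomposed accent (a base letter followed by a combining mark from the 0300-036F block) would be flagged even though the precomposed form of the same letter passes.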

Questions I have:

1. Once we get data from a source, it is first loaded into a temporary staging table. Is there an easy way to query the staging table to find out whether a specific
column (e.g. Company Name) has any data that is not covered by the acceptable character ranges above (so that it can be rejected and not loaded into the
master tables)?

2. Since we will be getting large volumes of such data, the check should ideally run in a reasonable amount of time.

3. We need to create a specification document for our data providers. I am wondering what we need to specify in that document. Will it suffice to say
that the files should be encoded in UTF8 and that the characters should be in the code ranges that our application accepts (specified above)?
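
On the "encoded in UTF8" part of question 3, a provider-side or pre-load check could look roughly like the sketch below. It is written in Python and only tests whether a byte string is well-formed UTF-8; the function name `decode_utf8_strict` is hypothetical and this is not an Oracle feature.

```python
def decode_utf8_strict(raw):
    """Return the decoded text, or None if raw is not well-formed UTF-8."""
    try:
        # errors="strict" raises UnicodeDecodeError on any invalid byte sequence.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
```

One detail this brings up for the specification: a file that starts with a byte order mark decodes successfully, but the mark comes through as the character U+FEFF, which is outside all of the accepted ranges. The document may therefore want to say explicitly whether a byte order mark is permitted.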

Thanks in advance for your help.

Re: Restricting characters in a UTF8 database [message #603986 is a reply to message #603981] Sun, 22 December 2013 01:11
John Watson
Messages: 6178
Registered: January 2010
Location: Global Village
Senior Member
The character set scanner?
Re: Restricting characters in a UTF8 database [message #603987 is a reply to message #603981] Sun, 22 December 2013 01:12
Michel Cadot
Messages: 63370
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Have a look at REGEXP_LIKE
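
The idea behind a regular-expression check is a negated character class: a row is suspect if the column contains at least one character outside the accepted ranges. The sketch below shows that idea with Python's `re` module rather than Oracle's REGEXP_LIKE, since how to write code point ranges in an Oracle pattern is not covered in this thread. The names `DISALLOWED` and `has_disallowed` are made up for illustration.

```python
import re

# Negated character class: matches any one character outside the accepted
# ranges. The four blocks from Basic Latin to Latin Extended-B are contiguous,
# so they collapse into the single range 0000-024F.
DISALLOWED = re.compile(
    r"[^\u0000-\u024F"   # Basic Latin through Latin Extended-B
    r"\u2C60-\u2C7F"     # Latin Extended-C
    r"\uA720-\uA7FF"     # Latin Extended-D
    r"\u0600-\u06FF"     # Arabic
    r"\u0750-\u077F"     # Arabic Supplement
    r"\u08A0-\u08FF]"    # Arabic Extended-A (08A0-08FF)
)


def has_disallowed(text):
    """Return True if text contains at least one character outside the ranges."""
    return DISALLOWED.search(text) is not None
```

Calling `DISALLOWED.findall(text)` instead returns the offending characters themselves, which may be more useful in a rejection report than a plain yes/no.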
