
Re: what characterset to use?

From: Ben <>
Date: Tue, 28 Aug 2007 10:54:57 -0700
Message-ID: <>

On Aug 27, 8:18 am, Ben <> wrote:
> On Aug 24, 6:40 am, "Martin T." <> wrote:
> > Ben wrote:
> > > On Aug 23, 2:48 pm, wrote:
> > >> On Thu, 23 Aug 2007 07:02:10 -0700, Ben <> wrote:
> > >>> I'm not saying it is feasible to have a database set to use US7ASCII
> > >>> as its character set. I'm simply saying that in the scenario that
> > >>> Sybrand listed, 1 database and 1 client both being set to us7ascii, I
> > >>> don't see the issue. UNLESS you introduce a client using a different
> > >>> character set.
> > >> Ok, again
> > >> Client set to US7ASCII
> > >> Database set to US7ASCII
> > >> You send an eight bit character.
> > >> Oracle sees 7 bit client character set, 7 bit server character set
> > >> --->
> > >> What will happen if all of a sudden someone decides to export using 7
> > >> bit NLS_LANG and import into 8 bit database.
> > >> Please don't imply I'm making up fairy tales, I'm talking stories for
> > >> grown ups!!!!
> > >> REAL WORLD HORROR STORIES with customers getting GROSS!!!
> > >> And yes: this explanation is on Metalink!!!
> > >> --
> > > I'm not implying anything. I'm trying to understand.
> > > How do you insert an 8 bit character with a 7 bit client into a db
> > > with a 7 bit character set? Wouldn't that be a square peg round hole
> > > kind of thing? You of course wouldn't get the 8 bit character back out
> > > of the 7 bit db.
> > > (...)
> > What you really need to understand is that the character sets (NLS_LANG)
> > you tell Oracle about are ONLY there for conversion.
> > Oracle doesn't give a damn if you have a database which is told it's
> > UTF8 and the values residing in the VARCHAR2 columns are actually
> > WE8ISO8859P2. Oracle doesn't care. Only if you try to retrieve this data
> > with e.g. a JDBC driver, you will get a Java Exception telling you it
> > could not convert the UTF8 data.
> > So the character sets of the clients and the DB are only there to tell
> > oracle if it has to convert or not. And if they are the same, no
> > conversion will take place. (And if the client "lies" you end up with
> > crap in your DB.)
> > br,
> > Martin
> Thank you Martin, I wasn't really understanding that Oracle would
> actually allow the insertion of the data. Good explanation. Thank you
> again.

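Just to make sure I'm checking the right thing on each side after Martin's explanation, this is roughly what I run (only a sketch; the table and column names below are placeholders I made up):

-- Database side: the character set the database was created with.
SELECT value
  FROM nls_database_parameters
 WHERE parameter = 'NLS_CHARACTERSET';

-- Client side: NLS_LANG is an environment variable (registry key on
-- Windows), e.g. NLS_LANG=AMERICAN_AMERICA.US7ASCII. Oracle compares its
-- character set portion with the database character set and converts only
-- when the two differ; if they match, the bytes pass through untouched.

-- To see the bytes that are actually stored, whatever any client claims:
SELECT DUMP(some_column, 1016)
  FROM some_table
 WHERE ROWNUM = 1;
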
All of this is exactly why I have such trouble with this subject. I'm re-reading the Globalization Guide and it's still confusing. The documentation on this subject is just too contradictory to really get a good grasp of it unless you've been dealing with it for years. Here is a section taken directly from the Globalization Manual, page 2-12:

"If you choose a database character set that is different from the character set on the client operating system, then the Oracle database can convert the operating system character set to the database character set. Character set conversion has the following disadvantages:

  Potential data loss
  Increased overhead

Character set conversions can sometimes cause data loss. For example, if you are
converting from character set A to character set B, then the destination character set B must have the same character set repertoire as A. Any characters that are not available in character set B are converted to a replacement character. The replacement character is often specified as a question mark or as a linguistically related character. For example, ä (a with an umlaut) may be converted to a. If you have distributed environments, then consider using character sets with similar character repertoires to avoid loss of data.

Character set conversion may require copying strings between buffers several times
before the data reaches the client. The database character set should always be a
superset or equivalent of the native character set of the client's operating system. The character sets used by client applications that access the database usually determine which superset is the best choice.

If all client applications use the same character set, then that character set is usually the best choice for the database character set. When client applications use different character sets, the database character set should be a superset of all the client character sets. This ensures that every character is represented when converting from a client character set to the database character set.

For best performance, choose a character set that avoids character set conversion and uses the most efficient encoding for the languages desired. Single-byte character sets result in better performance than multibyte character sets, and they also are the most efficient in terms of space requirements. However, single-byte character sets limit how many languages you can support."

After reading this, it sounds like Oracle is saying that conversion is a bad thing and that the database character set should be the same as the client's. This is why there is so much confusion, at least on my part, about character sets. I know you can't take this passage in isolation from everything else, such as needing multibyte characters and supporting clients with different character sets, but the wording in the documentation is just bad, or at the least confusing.
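For what it's worth, the replacement-character behavior the manual describes is easy to reproduce with the CONVERT function (again only a sketch; picking WE8ISO8859P1 as the 8-bit source character set is just my example):

-- US7ASCII has no u-umlaut in its repertoire, so the character gets
-- replaced rather than preserved; the original data is lost.
SELECT CONVERT('Jürgen', 'US7ASCII', 'WE8ISO8859P1') AS converted
  FROM dual;
-- Comes back as something like 'J?rgen' or 'Jurgen', depending on the
-- character sets involved, which is exactly the data loss described above.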
