Re: Help, my developers are killing me with varchar2(4000)

From: <rgdavi_at_monsanto.com>
Date: Mon, 28 Jul 2008 05:40:35 -0700 (PDT)
Message-ID: <42dd862d-f16c-4925-8c0c-54cb0c035918@m73g2000hsh.googlegroups.com>


From Tom Kyte's blog http://tkyte.blogspot.com/search?q=varchar2%284000%29

See, you just cannot make this stuff up... Real email, received today from another Oracle person - asking me a question:

A developer wants to represent all data types as Varchar2(2000) so that they won't have to change sizes in the future. This includes Numbers and Dates. The DBA wants to know if there are any management, performance, or indexing implications.

Ouch (wonder why they stopped at 2000 with the varchar2?) That hurts doesn't it...

Here was my response:

They are suggesting one of the worst ideas known to human data processing people. I cannot over state how BAD AN IDEA THIS IS ENTIRELY. Funny thing - dates are 7 bytes, you cannot "under" or "over" size them (you never CHANGE their size), timestamps are fixed sizes as well - 7, 11, or 13 bytes depending on fractional seconds and timezone needs.

Funny thing part two: if someone defines a Number(2), we can "alter table t modify X number(3)" to increase the size. But, if someone was to MAX OUT all numbers - we CANNOT shrink them!!!

Funny thing part three: same with varchar2 - we can make them bigger anytime we need - immediately, absolutely. We cannot however fix the oversized varchar2 in the future when the column has data - we can GROW, we cannot SHRINK

Funny thing part four: I spend 20 minutes on this topic - using the right datatype - in all of my seminars, over and over and over again. Here is a cut and paste from my book Effective Oracle by Design.

Use the Correct Datatype

Using the correct datatype seems like common sense, but virtually every system I look at does one of the following:

• Uses a string to store dates or times • Uses a string to store numbers
• Uses VARCHAR2(4000) to store all strings. • Uses CHAR(2000) to store all strings, wasting tons of space and forcing

the use of a lot of trim function calls

• Puts text in a BLOB (raw) type

I have a very simple rule: Put dates in dates, numbers in numbers, and strings in strings. Never use a datatype to store something other than what it was designed for, and use the most specific type possible. Furthermore, only compare dates to dates, strings to strings, and numbers to numbers. When dates and numbers are stored in strings, or stored using inappropriate lengths, your system suffers:

• You lose the edit upon insertion to the database, verifying that your dates are actual dates and numbers are valid numbers. • You lose performance.
• You potentially increase storage needs. • You definitely decrease data integrity.

How many of you know what ORA-01722 or ORA-01858 errors are off the top of your head? I bet many of you do, because they are so prevalent in systems where numbers are stored in strings (ORA-01722: invalid number) and dates in strings (ORA-01858: a non-numeric character was found where a numeric was expected).

How Data Integrity Decreases

Using an incorrect datatype is wrong for many reasons, but the first and foremost is data integrity. Systems that use strings for dates or numbers will have some records with dates that are not valid and numbers that are not numbers. It is just the nature of the game here. If you permit any string in your date field, at some point, you will get dirty data in there.

Without data-integrity rules in place, the integrity of your data is questionable. I’ve needed to write the functions to convert strings to dates but return NULL when the date won’t convert. I’ve also needed to try one of five date formats to see if I can get the date to convert. Can you look at 01/02/03 and tell what date that is? Is that yy/mm/dd, dd/mm/yy, or something else?

How Performance Suffers

Beyond the obvious data-integrity issues associated with incorrect datatypes, there are other subtle issues. To demonstrate, we’ll use an example of a table with two date columns. One will be stored in a string using YYYYMMDD and the other as a DATE type. We will index these values and analyze the tables completely. Received on Mon Jul 28 2008 - 07:40:35 CDT

Original text of this message