Re: Help, my developers are killing me with varchar2(4000)

From: Mark D Powell <Mark.Powell_at_eds.com>
Date: Mon, 28 Jul 2008 09:56:41 -0700 (PDT)
Message-ID: <4ee2210b-8dd3-4604-b04d-5cefeecef203@w7g2000hsa.googlegroups.com>


On Jul 28, 8:40 am, rgd..._at_monsanto.com wrote:
> From Tom Kyte's blog  http://tkyte.blogspot.com/search?q=varchar2%284000%29
>
> See, you just cannot make this stuff up...
> Real email, received today from another Oracle person - asking me a
> question:
>
> A developer wants to represent all data types as Varchar2(2000) so
> that they won't have to change sizes in the future. This includes
> Numbers and Dates. The DBA wants to know if there are any management,
> performance, or indexing implications.
>
> Ouch (wonder why they stopped at 2000 with the varchar2?) That hurts
> doesn't it...
>
> Here was my response:
>
> They are suggesting one of the worst ideas known to human data
> processing people.  I cannot over state how BAD AN IDEA THIS IS
> ENTIRELY.
>
> Funny thing - dates are 7 bytes, you cannot "under" or "over" size
> them (you never CHANGE their size), timestamps are fixed sizes as well
> - 7, 11, or 13 bytes depending on fractional seconds and timezone
> needs.
>
> Funny thing part two: if someone defines a Number(2), we can "alter
> table t modify X number(3)" to increase the size.  But, if someone was
> to MAX OUT all numbers - we CANNOT shrink them!!!
>
> Funny thing part three: same with varchar2 - we can make them bigger
> anytime we need - immediately, absolutely.  We cannot however fix the
> oversized varchar2 in the future when the column has data - we can
> GROW, we cannot SHRINK
>
> Funny thing part four: I spend 20 minutes on this topic - using the
> right datatype - in all of my seminars, over and over and over again.
> Here is a cut and paste from my book Effective Oracle by Design.
>
> Use the Correct Datatype
>
> Using the correct datatype seems like common sense, but virtually
> every system I look at does one of the following:
>
> •    Uses a string to store dates or times
> •    Uses a string to store numbers
> •    Uses VARCHAR2(4000) to store all strings.
> •    Uses CHAR(2000) to store all strings, wasting tons of space and
> forcing
>
> the use of a lot of trim function calls
>
> •    Puts text in a BLOB (raw) type
>
> I have a very simple rule: Put dates in dates, numbers in numbers, and
> strings in strings. Never use a datatype to store something other than
> what it was designed for, and use the most specific type possible.
> Furthermore, only compare dates to dates, strings to strings, and
> numbers to numbers. When dates and numbers are stored in strings, or
> stored using inappropriate lengths, your system suffers:
>
> •    You lose the edit upon insertion to the database, verifying that
> your dates are actual dates and numbers are valid numbers.
> •    You lose performance.
> •    You potentially increase storage needs.
> •    You definitely decrease data integrity.
>
> How many of you know what ORA-01722 or ORA-01858 errors are off the
> top of your head? I bet many of you do, because they are so prevalent
> in systems where numbers are stored in strings (ORA-01722: invalid
> number) and dates in strings (ORA-01858: a non-numeric character was
> found where a numeric was expected).
>
> How Data Integrity Decreases
>
> Using an incorrect datatype is wrong for many reasons, but the first
> and foremost is data integrity. Systems that use strings for dates or
> numbers will have some records with dates that are not valid and
> numbers that are not numbers. It is just the nature of the game here.
> If you permit any string in your date field, at some point, you will
> get dirty data in there.
>
> Without data-integrity rules in place, the integrity of your data is
> questionable. I’ve needed to write the functions to convert strings to
> dates but return NULL when the date won’t convert. I’ve also needed to
> try one of five date formats to see if I can get the date to convert.
> Can you look at 01/02/03 and tell what date that is? Is that yy/mm/dd,
> dd/mm/yy, or something else?
>
> How Performance Suffers
>
> Beyond the obvious data-integrity issues associated with incorrect
> datatypes, there are other subtle issues. To demonstrate, we’ll use an
> example of a table with two date columns. One will be stored in a
> string using YYYYMMDD and the other as a DATE type. We will index
> these values and analyze the tables completely.

The arguments you have detailed are pretty good. Namely if a data value should have only 5 characters then the database should only accept 5 characters because any value longer than 5 characters is obviously an error and bad data should not be accepted.

Altering varchar2 columns to be longer and number columns to allow larger values is pretty much a painless rdbms base table change (as you noted).

HTH -- Mark D Powell -- Received on Mon Jul 28 2008 - 11:56:41 CDT

Original text of this message