Re: oracle disk space

From: Jon Rosen <jfr_at_locus.com>
Date: Fri, 06 Mar 1992 23:06:06 GMT
Message-ID: <1992Mar06.230606.01016390_at_locus.com>


In article <1992Mar4.204557.8668_at_qiclab.scn.rain.com> tcox_at_qiclab.scn.rain.com (Thomas B. Cox) writes:
>In article <19924.972.6724_at_dosgate> "howard kaplan" <howard.kaplan_at_canrem.com> writes:
>>I have heard an unconfirmed *rumour* about disk space usage in
>>future versions of Oracle. According to this rumour,
 

>>1) Character fields will have all of their trailing blanks stored on
>> the database, requiring much more disk space than at present.
 [munch]
>>2) This change is required for conformity to an SQL standard.
>
>The ANSI committee decided that the CHAR datatype should pad. (I
>forget which ANSI standard this was, maybe ANSI SQL-2, maybe SQL89, I
>dunno. If you really care I'll go find out.)
>
>>This is especially confusing, because I thought that SQL only
>>specified logical representations, not physical ones.
>
>Beats me.
 

  Then you should do some research. The ANSI definition of CHAR, which   "requires" padding with blanks to the exact length defined for the   column has nothing to do with how an actual implemented database   engine would or should physically store the data. It is solely   concerned with the logical representation of the CHAR field and   that representation's impact on other SQL issues.

  Example:

       A CHAR(8) column with a value of 'SALLY' when concatenated
       to a CHAR(8) column with a value of 'SMITH' will result in
       a CHAR(16) column with a value of 'SALLY   SMITH   ' whereas
       the same columns if defined as VARCHAR(8) or VARCHAR would
       result in a VARCHAR column with a value of 'SALLYSMITH'.

  This is purely a semantical issue. Whether the CHAR(8)   version of 'SALLY' is stored in 8 bytes or 5 bytes or   compressed using a Hamming code or whatever is totally   irrelevent from the standpoint of the standards definition.

  There are other situations where this also becomes important,   particularly when comparing character fields. SQL has generally   assumed characters fields of any type to be padded with blanks   for comparison purposes (although some implementations do NOT   do this with varying length fields). Since there are potential   semantic conflicts that arise depending on whether VARCHAR fields   are assumed to be padded or not and to what length, there are   still ongoing debates about how this should be handled.

>>Can anyone clarify the truth behind this rumour?
 

  Truth regarding the semantics. No truth regarding the requirement   that actual physical implementations MUST store the data in some   specific format. I.e., for compression purposes, a database   implementation may choose to chop off all padding on character   fields as long as it is restored prior to use in an actual query   or when returning the data to a program.

>Two final points.
>
>One, the padding behavior can perhaps be useful if you're looking for
>certain kinds of performance gains, because after the RDBMS finds the
>row it wants, it can skip in a fixed distance (the width of your
>padded CHAR field(s), in fact) to get the column it wants, rather than
>reading in the row looking for the begin-column marker for the desired
>column.
 

  Of course, this is pretty trivial if you use length bytes to store   varying length character strings (which is actually essential since   using null-delimited strings eliminates the possibility of storing   an X'00' inside the string). Typical database engines that support   compression on the actual physical DASD uncompress as they read   the data anyway.

>Two, I'm giving you this info from memory, from an Alpha-level v7
>class I took some months ago. I don't have ANY reference material
>here, like the v7 DBA Guide. If anybody really cares about the
>details of this, like how exactly the word VARCHAR will be treated in
>v7 (is it really a reserved word? what if i try to define a column as
>having that datatype?), or how one turns on or off the v6
>compatibility mode, then let me know.
 

  VARCHAR is certainly a reserved word in the SQL2 standard. If I was   a database vendor, I'd certainly put it on my reserved word list   right now if for no other reason than to reduce the amount of code   I would break when I am forced to implement the next SQL standard   completely.

  Jon Rosen Received on Sat Mar 07 1992 - 00:06:06 CET

Original text of this message