Re: oracle disk space
Date: Fri, 06 Mar 1992 23:06:06 GMT
Message-ID: <1992Mar06.230606.01016390_at_locus.com>
In article <1992Mar4.204557.8668_at_qiclab.scn.rain.com> tcox_at_qiclab.scn.rain.com (Thomas B. Cox) writes:
>In article <19924.972.6724_at_dosgate> "howard kaplan" <howard.kaplan_at_canrem.com> writes:
>>I have heard an unconfirmed *rumour* about disk space usage in
>>future versions of Oracle. According to this rumour,
>>1) Character fields will have all of their trailing blanks stored on
>> the database, requiring much more disk space than at present.
[munch]
>>2) This change is required for conformity to an SQL standard.
>
>The ANSI committee decided that the CHAR datatype should pad. (I
>forget which ANSI standard this was, maybe ANSI SQL-2, maybe SQL89, I
>dunno. If you really care I'll go find out.)
>
>>This is especially confusing, because I thought that SQL only
>>specified logical representations, not physical ones.
>
>Beats me.
Then you should do some research. The ANSI definition of CHAR, which "requires" padding with blanks to the exact length defined for the column has nothing to do with how an actual implemented database engine would or should physically store the data. It is solely concerned with the logical representation of the CHAR field and that representation's impact on other SQL issues.
Example:
A CHAR(8) column with a value of 'SALLY' when concatenated to a CHAR(8) column with a value of 'SMITH' will result in a CHAR(16) column with a value of 'SALLY SMITH ' whereas the same columns if defined as VARCHAR(8) or VARCHAR would result in a VARCHAR column with a value of 'SALLYSMITH'.
This is purely a semantical issue. Whether the CHAR(8) version of 'SALLY' is stored in 8 bytes or 5 bytes or compressed using a Hamming code or whatever is totally irrelevent from the standpoint of the standards definition.
There are other situations where this also becomes important, particularly when comparing character fields. SQL has generally assumed characters fields of any type to be padded with blanks for comparison purposes (although some implementations do NOT do this with varying length fields). Since there are potential semantic conflicts that arise depending on whether VARCHAR fields are assumed to be padded or not and to what length, there are still ongoing debates about how this should be handled.
>>Can anyone clarify the truth behind this rumour?
Truth regarding the semantics. No truth regarding the requirement that actual physical implementations MUST store the data in some specific format. I.e., for compression purposes, a database implementation may choose to chop off all padding on character fields as long as it is restored prior to use in an actual query or when returning the data to a program.
>Two final points.
>
>One, the padding behavior can perhaps be useful if you're looking for
>certain kinds of performance gains, because after the RDBMS finds the
>row it wants, it can skip in a fixed distance (the width of your
>padded CHAR field(s), in fact) to get the column it wants, rather than
>reading in the row looking for the begin-column marker for the desired
>column.
Of course, this is pretty trivial if you use length bytes to store varying length character strings (which is actually essential since using null-delimited strings eliminates the possibility of storing an X'00' inside the string). Typical database engines that support compression on the actual physical DASD uncompress as they read the data anyway.
>Two, I'm giving you this info from memory, from an Alpha-level v7
>class I took some months ago. I don't have ANY reference material
>here, like the v7 DBA Guide. If anybody really cares about the
>details of this, like how exactly the word VARCHAR will be treated in
>v7 (is it really a reserved word? what if i try to define a column as
>having that datatype?), or how one turns on or off the v6
>compatibility mode, then let me know.
VARCHAR is certainly a reserved word in the SQL2 standard. If I was a database vendor, I'd certainly put it on my reserved word list right now if for no other reason than to reduce the amount of code I would break when I am forced to implement the next SQL standard completely.
Jon Rosen Received on Sat Mar 07 1992 - 00:06:06 CET