Home » SQL & PL/SQL » SQL & PL/SQL » Most disk space-efficient number type? (Oracle RAC 11gR1 on Oracle EL5)
Most disk space-efficient number type? [message #313059] |
Thu, 10 April 2008 09:23  |
Mopper
Messages: 11 Registered: April 2008 Location: Belgium
|
Junior Member |
|
|
Hi,
I'm currently estimating the required disk space for a table that will contain the results of a data mining algorithm. This table will contain something along the lines of 500,000,000,000 rows, and will have 3 columns: one with a foreign key to a generated primary key in the source data table, one with a foreign key to a table that contains the models, and one BINARY_DOUBLE column that contains the predicted result.
I need to keep the disk space as limited as possible, but the column that stores the result (BINARY_DOUBLE) can not be touched. So I was wondering what the best format would be for the foreign keys (and the primary keys in tables they refer to). As of now, both are declared as SMALLINT, but I am not sure if this is the most efficient way.
The value for the foreign key to the source data table should never exceed 100,000,000 and the value for the foreign key to the table with the models should never exceed 2'000'000; so any datatype that can represent all these values should do, as long as it's as small as possible on disk.
I would love to hear your suggestions on this matter, as we only recently started working with Oracle, and how things are stored at low level isn't exactly the first thing you learn 
Thanks in advance.
[Edit: typo.]
[Updated on: Thu, 10 April 2008 09:24] Report message to a moderator
|
|
|
Re: Most disk space-efficient number type? [message #313061 is a reply to message #313059] |
Thu, 10 April 2008 09:29   |
ThomasG
Messages: 3212 Registered: April 2005 Location: Heilbronn, Germany
|
Senior Member |
|
|
The data type doesn't seem to matter.
According to the Concepts Manual
Quote: |
Internal Numeric Format
Oracle stores numeric data in variable-length format. Each value is stored in scientific notation, with 1 byte used to store the exponent and up to 20 bytes to store the mantissa. The resulting value is limited to 38 digits of precision. Oracle does not store leading and trailing zeros. For example, the number 412 is stored in a format similar to 4.12 x 102, with 1 byte used to store the exponent(2) and 2 bytes used to store the three significant digits of the mantissa(4,1,2). Negative numbers include the sign in their length.
Taking this into account, the column size in bytes for a particular numeric data value NUMBER(p), where p is the precision of a given value, can be calculated using the following formula:
ROUND((length(p)+s)/2))+1
|
|
|
|
Re: Most disk space-efficient number type? [message #313064 is a reply to message #313061] |
Thu, 10 April 2008 09:36   |
ThomasG
Messages: 3212 Registered: April 2005 Location: Heilbronn, Germany
|
Senior Member |
|
|
Ah, found the additional directly below that :
Quote: |
Floating-Point Numbers
....
BINARY_FLOAT is a 32-bit, single-precision floating-point number datatype. Each BINARY_FLOAT value requires 5 bytes, including a length byte.
BINARY_DOUBLE Datatype
BINARY_DOUBLE is a 64-bit, double-precision floating-point number datatype. Each BINARY_DOUBLE value requires 9 bytes, including a length byte.
|
|
|
|
|
Re: Most disk space-efficient number type? [message #313068 is a reply to message #313059] |
Thu, 10 April 2008 09:45   |
S.Rajaram
Messages: 1027 Registered: October 2006 Location: United Kingdom
|
Senior Member |
|
|
Search on compression in oracle reference manual. It is useful on tables which are only inserted and selected. If you do updates then compress could prove expensive. For more information check the oracle reference manual or asktom.oracle.com.
Hope that helps
Regards
Raj
|
|
|
|
|
Re: Most disk space-efficient number type? [message #313238 is a reply to message #313059] |
Fri, 11 April 2008 03:49   |
Mopper
Messages: 11 Registered: April 2008 Location: Belgium
|
Junior Member |
|
|
I've been playing around with the number formats and compression this morning. This is the table I've been experimenting with:
CREATE TABLE "DATAMINER"."RESULT_TEST"
( "FK_DATA" NUMBER(38,0),
"FK_BETA" NUMBER(38,0),
"PREDICTION" BINARY_DOUBLE
)
I put 581,012 records in it. So, to get the size of the table, I did:
SELECT BYTES FROM USER_SEGMENTS WHERE SEGMENT_NAME ='RESULT_TEST';
BYTES
----------
15728640
That's about 15MB, and comes down to just over 27 byte per record. So, given that a BINARY_DOUBLE always takes 9 byte, the other 2 must take about 9 byte each as well, which I found rather a lot. (FK_DATA is between 2467 and 583'478, FK_BETA is 1 for every record).
I had declared both the FK_DATA as the FK_BETA columns as SMALLINT, but this still gave a scale of 38, which is a lot more than I need. So, I dropped the table and re-created it with both FK_ columns declared as NUMBER(10,0) (which is about the upper limit of what I'll ever need), and inserted all the data again. Now, when I checked the size of the table again, it was exactly the same as before. This could be explained by the fact that Oracle only takes as much space as it needs for a NUMBER, and since none of the values are bigger than NUMBER(10), none of them were affected. But. That would mean that Oracle needs 9 byte to store values between 2467 and 583'478, not to mention 9 byte to store a simple '1' value? I find this really strange.
A more successful story is the compression though. Since this table's data is most likely to get written once, never updated, but read a lot I took compression out for a spin. After a
ALTER TABLE RESULT_TEST MOVE COMPRESS; , (the table's data was ordered by the least discriminating columns first) I got the following size:
SELECT BYTES FROM USER_SEGMENTS WHERE SEGMENT_NAME ='RESULT_TEST';
BYTES
----------
8388608
That's about 8MB, or 14.4 byte/record. That's almost half of what it was! This will save us gigabytes on the final result. But, even though this table is suited very well for compression, it still leads me to believe that Oracle is wasting space when not compressing the table. Anyhow, I'm happy with the result we've got so far, especially with the compression 
Greetings
|
|
|
|
Goto Forum:
Current Time: Fri Feb 14 10:04:55 CST 2025
|