Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: oracle internal storage of numbers

Re: oracle internal storage of numbers

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Fri, 30 Oct 1998 12:38:00 +0100
Message-ID: <71c8fr$23i$1@minus.oleane.net>


Oracle uses Packed Decimal encoding for numbers (like COBOL). This = ensures an exact precision without roundings when using decimal numbers, = even if it's not the most compact encoding. It stores one nibble (4 bits) per digit plus one nibble for the sign. It = uses binary encoding for decimal point position. This means that a NUMBER(38) uses 38 nibbles for digits, plus one nibble = for the sign. This gives 39 nibbles plus one nibble unused filled by 0. = Oracle knows the number precision used by the datatype definition but = still uses a base-10 floating point notation using a base-10 integer = exponent for the decimal point position.

However, nibbles are not very fast for computings and not compact = enough. So Decimal Packing can occur on Byte boundaries, encoding 2 = digits per byte using their binary equivalent. This does not add any =

compaction on data but this preserves the exact precision requirements =

(while still allowing one free bit for each byte only used for now for =
values in range 0..99). Note that Byte-Level Decimal Packing implies a = base-100 exponent.

To see the Decimal Packing in action, do not use "od -c" or "od -o", but = use hex dump which gets things clearer: # od -x -j 077000 test2.dbf
# result rearranged, see below

<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00> <................>
*
<00 00 00 00 00 00 00 00 00 00 00>2c 01 02 04 c3  <...........>,....
 07 3d 1c<04 74 69 65 6c>2c 00 02 04 c3 11 08 41  .=.<.tiel>,......A
<03 6a 6a 6a>2c 00 02 02 c1 1f<03 69 69 69>2c 00  <.jjj>,.....<.iii>,.
 02 02 c1 15<03 68 68 68>2c 00 02 02 c1 05<03 67> ....<.hhh>,.....<.g>
<67 67 2c 00 02 02 c1 04<03 66 66 66>2c 00 02 02  <gg>,.....<.fff>,...
 c1 03<04 65 65 65 65>2c 00 02 02 c1 02<04 64 64> ..<.eeee>,.....<.dd>
<64 64>2c 00 02 02 c2 16<05 71 72 73 74 75>2c 00  <dd>,.....<.qrstu>,.
 02 03 c2 02 33<03 67 68 69>2c 00 02 02 c1 0b<03> ....3<.ghi>,.....<.>
<64 65 66>2c 00 02 01 80<03 61 62 63>00 01 00 0f <def>,....<.abc>....
<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00> <................>
*
You can immediately see that each string is preceded by a length byte, = and that records are packed with their actual data length, not with = their declared maximum lengths. This reduces the count of unknown bytes, = as you can see in the above dump where decoded areas are shown within = <angle brackets>. It also shows that records are dumped in the reverse = order in which they were stored, starting at the end of a 2KB page. It also shows that each record starts with a hex 0x2c byte, and that the = number of bytes to encode the numbers are varying, the shortest sequence = being used to encode the integer 0 (in record 'abc'). So this suggests the following encodings for numbers (using format: = "(string,integer) -> hex" dump below), and other field delimiters =
(without completely reordering):

('tiel', 66027) -> 2c? 01? 02?, 04:( c3 07 3d 1c ), 04:'tiel'

('jjj', 160764) -> 2c? 00? 02?, 04:( c3 11 08 41 ), 03:'jjj'
('iii', 30) -> 2c? 00? 02?, 02:( c1 1f ), 03:'iii'
('hhh', 20) -> 2c? 00? 02?, 02:( c1 15 ), 03:'hhh'
('gggg', 4) -> 2c? 00? 02?, 02:( c1 05 ), 04:'gggg'
('ffff', 3) -> 2c? 00? 02?, 02:( c1 04 ), 04:'ffff'
('eeee', 2) -> 2c? 00? 02?, 02:( c1 03 ), 04:'eeee'
('dddd', 1) -> 2c? 00? 02?, 02:( c1 02 ), 04:'dddd'
('qrstu', 2100) -> 2c? 00? 02?, 02:( c2 16 ), 04:'qrstu'
('ghi', 150) -> 2c? 00? 02?, c2:( 02 33 ), 03:'ghi'
('def', 10) -> 2c? 00? 02?, 02:( c1 0b ), 03:'def'
('abc', 0) -> 2c? 00? 02?, 01:( 80 ), 03:'abc'

                -> 00? 01? 00? 0f?

Now this clearly indicates that bytes 0x2c and 0x0f are used as record = terminators so we will ignore them to decode numbers. And 0x00 bytes = are used as a prefix, and is probably reserved as packed null fields = indicators.

What does it show also ? That Byte-Level Packing effectively occurs on a = base-100 level. And that all couples decimal digits are grouped in the = same byte using their binary representation but with an offset of 1. =
(digits "01" become a byte with value 2, digits "30" become a byte with =
value 31 (0x1f in hex). An no null bytes are used to represent number = mantissa. There's an exception with number 0 which cannot not comply = with exponent/mantissa normalization, so it uses a special byte value = 128 (0x80 in hex) which is not a valid representation for any other = 2-digits. It also show that there's a byte prefix to store the base-100 = exponent, plus a leading byte to store the varying length in bytes of = the whole number that follows (the exponent byte and the mantissa = bytes).

I also show here a special meaning for the varying length indicator of = numbers (look at representation of integer 150 in record 'ghi', where = this byte is 0xc2 instead of 0x02). I think that because numbers have a = limited precition of 38 digits, i.e. 1 byte for the base-100 exponent = plus 19 bytes of mantissa, this only covers a total length in range 1 to = 20. So the two highmost significant bits can be used for another use and = must be ignored in the representation. In fact this can be used to = represent a sign bit and an exponent sign in the case of fractional = numbers. So I think that the number was not 150 but -150 ... May be you = can verify it yourself and decode all your numbers.

Finally the base-100 exponent has an offset of 0xC1, where the mantissa = is normalized with the smallest integer in base 100 (e.g. both numbers = 12 and 1200 will have the same mantissa bytes 0x0d, but will have = different base-100 exponents: for 12 the exponent is 0 so the exponent = byte will be 0xC1, and for 1200 the exponent byte will be 1 so the = exponent byte will be 0xC2).

But if we look at the representation of 150 in the 'ghi' record, the = exponent byte is 0x02 instead of 0xC2. This means that the two highmost = bits of the exponent are not significant here. Then because number 0 = uses the 0x80 exponent byte, this drives us with a null exponent value = which is reserved for numbers without mantissa. So base-100 exponents = will have a value in range 1..31, this is enough to represent numbers of = maximum magnitude 100^31, plus 38 digits of mantissa, i.e. integer = numbers of 62+38 = 100 digits with a precision of 38 digits.

For fractional numbers, negative values of the exponent may occur in = range -32 to 0 with a minimum mantissa of 1, so the minimum not null = number will be 100^-32 i.e. 10^-64. It's up to you to verify that the = length indicator does not combine with the exponent byte to give smaller = numbers with greater precision.

I let you discover the role of other bytes (marked above with a '?' = sign) within data pages.

Sandy Jeffers wrote:
My company had a crash and we very much need to recover some data. To do = this, I need to know how to decode number formats? How I tried to decode 'em



First thing that I did was to create a simple table with two fields. = Field 1 was a number and Field 2 was a string. I then preceded to stuff = various number,string combos in the table. I'd then od the file = underlying the tablespace underlying the table in question.

create table delme2(foo1 number(38), foo2 varchar2(256)) tablespace test_crash;

insert into delme2 values (0, 'abc');
insert into delme2 values (10, 'def');
insert into delme2 values (150, 'ghi');
insert into delme2 values (2100, 'qrstu');
insert into delme2 values (1, 'dddd');
insert into delme2 values (2, 'eeee');
insert into delme2 values (3, 'fff');
insert into delme2 values (4, 'ggg');
insert into delme2 values (20, 'hhh');
insert into delme2 values (30, 'iii');
insert into delme2 values (160764, 'jjj');
insert into delme2 values (66027, 'tiel');

The goal was to insert numbers in such a way that a pattern could be = discerned. I couldn't. But any help would be appreciated. # od -c -j 077000 test2.dbf
0000000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 *
0000560 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 , 001 002 004 303 0000600 007 = 034 004 t i e l , \0 002 004 303 021 \b = A
0000620 003 j j j , \0 002 002 301 037 003 i i i , \0 0000640 002 002 301 025 003 h h h , \0 002 002 301 005 003 g 0000660 g g , \0 002 002 301 004 003 f f f , \0 002 002 0000700 301 003 004 e e e e , \0 002 002 301 002 004 d d 0000720 d d , \0 002 002 302 026 005 q r s t u , \0 0000740 002 003 302 002 3 003 g h i , \0 002 002 301 013 003 0000760 d e f , \0 002 001 200 003 a b c \0 001 \0 017 0001000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 *
# od -b -j 077000 test2.dbf
0000000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 *
0000560 000 000 000 000 000 000 000 000 000 000 000 054 001 002 004 303 0000600 007 075 034 004 164 151 145 154 054 000 002 004 303 021 010 101 0000620 003 152 152 152 054 000 002 002 301 037 003 151 151 151 054 000 0000640 002 002 301 025 003 150 150 150 054 000 002 002 301 005 003 147 0000660 147 147 054 000 002 002 301 004 003 146 146 146 054 000 002 002 0000700 301 003 004 145 145 145 145 054 000 002 002 301 002 004 144 144 0000720 144 144 054 000 002 002 302 026 005 161 162 163 164 165 054 000 0000740 002 003 302 002 063 003 147 150 151 054 000 002 002 301 013 003 0000760 144 145 146 054 000 002 001 200 003 141 142 143 000 001 000 017 0001000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 * Received on Fri Oct 30 1998 - 05:38:00 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US