Re: Representing data on disk in an MV database - was Re: foundations of relational theory? - some references for the truly starving

From: Mike Preece <michael_at_preece.net>
Date: 23 Oct 2003 15:43:22 -0700
Message-ID: <1b0b566c.0310231443.4ce7517f_at_posting.google.com>


Jonathan Leffler <jleffler_at_earthlink.net> wrote in message news:<bUTlb.2937$wc3.186_at_newsread3.news.pas.earthlink.net>...
> Mike Preece wrote:
> > [...]
> > It might be worth mentioning at this stage that the amount of data you
> > can store in a Pick item will require far less physical disk space
> > than on any other (uncompressed) database. There are two reasons for
> > this:
> > 1) because every field is of variable length - every datum is just as
> > long as it needs to be and no more and empty fields occupy no space at
> > all (other than single character system delimiters) - and
> > 2) [...]
>
> Does that mean that all field values are stored as printable character
> strings?

Yes they are.

> I've seen similar comments before, and always been puzzled
> by them. It seems to be the only way to understand the use of a fixed
> byte code as the value separator (sorry - I've forgotten the correct
> MV terminology).

Apology not necessary. Thanks anyway - it's like a breath of fresh air.

Char(255) = End of item mark
Char(254) = Attribute mark
Char(253) = Value mark
Char(252) = SubValue mark

>
> If I'm correct, and if an MV database is storing integer values in the
> range 0..999 (or maybe 0..9999),

There used to be an integer limit. I'm honestly not sure if there is anymore. I'll look it up if you'd like.

> then the space used is no worse than
> in a system which uses a binary data representation, but as soon as
> you're dealing with numbers in the millions, instead of using a fixed
> length (4 bytes for a 32-bit integer), this system seems to be using a
> lot more space. (Yes, I quietly ignored negative numbers - I'm
> curious about how they're represented on disk too.)
>

OK. I'm prepared to concede this point - to a degree. Packed decimal isn't allowed in Pick - out of the box, you'd have to 'roll-your-own' functions to allow it. This would potentially mess with system delimiters (x'7F' would be a problem for example). The number 12345 takes 5 bytes, as does 123.45 (numbers are generally stored in internal format with input and output conversions applied). Minus 123.45 takes 6 bytes. Obviously packed decimals take up less space. I suppose then, if your data comprises mostly packed decimals in a database that caters for variable length data (I wonder how fields would be delimited though), you would be able to get more data into less physical disk space. As a matter of academic interest I'd like to know which database can do this and how it does it.  

> How significant that is depends on many factors. Informix databases,
> for example, store a TIMESTAMP(5) - actually, DATETIME YEAR TO
> FRACTION(5) in Informix's rather weird notation, but equivalent to the
> SQL standard TIMESTAMP type - in just 11 bytes of data, but the text
> version requires 25 bytes. Further, that's a fixed 25 bytes (or, at
> best, the variable part is just 0-5 digits after the decimal point
> separating seconds from fractional seconds, and possibly the decimal
> point itself - 19-25 bytes).
>

I'm sorry. I've read that a couple of times. Are you interested in the number of bytes required to store the current date & time? And you also want to store fractions of seconds? Milliseconds?  

> What about other complex data types? Indeed, does an MV database
> support typed data?
>

Everything's a string. You can, and most (if not all) Pick applications out there do, require that certain fields conform to various criteria (type, pattern, whatever) on input. This is a major difference between Pick and SQL relational. We write applications. It's the applications that make the rules. The database is a powerful and *flexible* tool - it doesn't make the rules in the same way as SQL relational does. Bad news for DBAs if your company switched to Pick - they'll have to retrain. There simply isn't any need for them.

Cheers
Mike.

PS. Nice post - shot me down and did it without a detectable trace of arrogance. Received on Fri Oct 24 2003 - 00:43:22 CEST

Original text of this message