Re: When and when not to put information in a database.
Date: Mon, 3 Mar 2003 03:15:27 -0600
"Steve Kass" <skass_at_drew.edu> wrote in message
> I don't quite see how you figure 10,000 times as many reads with
> normalization. The same number of prices need to be read in either
> case. The amount of data stored does go up (but the non-normalized
> table exceeds SQL Server requirements on the length of a row).
> Not normalized: 10,000 floats + 3-character tag per instrument
> Normalized: 10,000*(1 float + 1 simulation_id (2 bytes?) + one 3-byte tag)
> So each instrument goes from 80,003 bytes to 130,000 bytes of raw data.
> Yes, many more rows need to be accessed, but I don't think accessing all
> 10,000 prices from one row (even if rows were allowed to be so long)
> would be as fast as accessing the information from one very short row of
> the normalized table.
I haven't mentioned this, but a single-precision float is 4 bytes, and this type isn't supported (on SQL Server, AFAIK) The 3 byte tag you mention looks right from my sample data, but really, we we can't identify each instrument with a 3-byte string like "ibm" or "aol". If you're saying 3 bytes meaning 200K instruments fits with 3-byte identifiers, then yes. binary(3) would do. 2 bytes per indexer works as well. So we've got 130,000 versus 40,000 bytes. Additionally, there is likely some unneeded overhead in any pre-fab client objects, which frightens me.
Another issue I have is that there's less control over what data is in memory and what's on disk. When you throw disk accessing into the mix, with the fact that this data will be hit purely at random, it nearly kills the database possibility.
Regardless, I'll do a small bit of testing on this and post my results when I get a chance.
> Indexes can change this analysis. But there may actually be a benefit
> here. The
> short rows may provide more opportunities for caching or parallelism,
> and more
> possible locking strategies. There may also be some creative approaches
> to query writing, that allow
> It may turn out not to work at all, but before you decide not to
> normalize, you may want to do either a more thorough analysis
> of required i/o as well as some testing. I'm not sure a normalized
> table is better, but the query optimizer and the reliability of the
> theory constantly surprise me.
I agree in theory and I like it MUCH more on paper. (The SQL Server query optimizer amazes me frequently as well) I'm just not sold to the point that all data belongs in a general purpose DBMS. Received on Mon Mar 03 2003 - 10:15:27 CET