When and when not to put information in a database.
Date: 27 Feb 2003 12:29:49 -0800
Message-ID: <3db71c8a.0302271229.77b100b2_at_posting.google.com>
I'm working on a financial risk mgmt. system, and there's a piece of the data that, IMO, doesn't really belong in a DBMS. However, I'm having a difficult time convincing our designers of this. (I am, after all, just a programmer--what do I know.)
What I'm looking for is a little disinterested, unbiased advice on whether some intermediate data should be in a database or otherwise. Also, if there is any generally accepted rule of thumb regarding this. (Better yet, a reference to a book or publication.) Let me explain the situation...
The Data
Various historical statistics are calculated for ~200,000 financial instruments. From these statistics, a monte carlo simulation is run, producing 10,000 simulated market moves. Theoretical instrument prices are calculated for each move. (prices are currently single precision floats--4 bytes) So we have ~200K records with 40K bytes of raw data apiece. ~8 gigs of raw data.
Primary Use Case
An application processes hundreds of portfolios, each having positions in various instruments, and calculates some statistical info for each portfolio. Total number of positions processed ~= 13 million. (Note: this is essentially random access of the data.) This must be designed to run within a fairly agressive timeframe.
Proposed Designs (Database)
(1) One school of thought in the department is to normalize
completely. (!) A table that looks like this:
TheoreticalPrice
(
InstrumentID (FKey, integer) ScenarioSequence (FKey, integer) // 1-10000 Price (float)
)
I have a big "I told ya so" ready if and when this is attempted.
(2) Binary or blob-type storage. IMHO, if we _must_ use a database,
this seems the most reasonable for such a data type, but it seems like
forcing a square peg into a round hole. Also, the performance seems
to be nearly unacceptable.
(3) Breaking the 10,000 scenarios into chunks. E.g., 20 or 50 price
columns to a record, with a sequencing field, like above) This seems
like a square peg in the round hole to me. Doesn't it?
Hey, I throw any data into a database that makes sense to me. In fact, I have a personal web site in which a bunch of jpgs stored in the database--so I'm not just anti-database.
However, I think not all data fits well in a database. But my colleagues that are disagreeing haven't dealt with such data before, and so I'm having no luck convincing them.
I can find no good discussions about this on the internet groups or web searches, can't find any books about it, so I've decided to try here.
Can anyone give me some good advice?
Jesse
me _at_ jesseconnell dot com Received on Thu Feb 27 2003 - 21:29:49 CET