When and when not to put information in a database.

From: Jesse Connell <me_at_jesseconnell.com>
Date: 27 Feb 2003 12:29:49 -0800
Message-ID: <3db71c8a.0302271229.77b100b2_at_posting.google.com>

I'm working on a financial risk mgmt. system, and there's a piece of the data that, IMO, doesn't really belong in a DBMS. However, I'm having a difficult time convincing our designers of this. (I am, after all, just a programmer--what do I know.)

What I'm looking for is a little disinterested, unbiased advice on whether some intermediate data should be in a database or otherwise. Also, if there is any generally accepted rule of thumb regarding this.  (Better yet, a reference to a book or publication.) Let me explain the situation...

The Data

Various historical statistics are calculated for ~200,000 financial instruments. From these statistics, a monte carlo simulation is run, producing 10,000 simulated market moves. Theoretical instrument prices are calculated for each move. (prices are currently single precision floats--4 bytes) So we have ~200K records with 40K bytes of raw data apiece. ~8 gigs of raw data.

Primary Use Case

An application processes hundreds of portfolios, each having positions in various instruments, and calculates some statistical info for each portfolio. Total number of positions processed ~= 13 million. (Note: this is essentially random access of the data.) This must be designed to run within a fairly agressive timeframe.

Proposed Designs (Database)

(1) One school of thought in the department is to normalize
completely. (!) A table that looks like this:


    InstrumentID        (FKey, integer)
    ScenarioSequence    (FKey, integer)   // 1-10000
    Price               (float)


I have a big "I told ya so" ready if and when this is attempted.

(2) Binary or blob-type storage. IMHO, if we _must_ use a database,
this seems the most reasonable for such a data type, but it seems like forcing a square peg into a round hole. Also, the performance seems to be nearly unacceptable.

(3) Breaking the 10,000 scenarios into chunks. E.g., 20 or 50 price
columns to a record, with a sequencing field, like above) This seems like a square peg in the round hole to me. Doesn't it?

My proposed solution:

I've developed (on my own time) a custom data storage application (COM singleton, hosted in a Windows Service, if you're curious) which serves as a central host for this data. It's simple, it's clean, and it's FAST. It has all the flexibility needed, the code is not complex or large. It has all the locking necessary to make reading/writing/other operations safe. Reading, inserting, modifiying data is simple and flexible. It uses the LAN/domain security policy, so some users/processes can read/write, some can read, some can backup/etc. and some can't touch it. This took me about two weeks.

Finally, here's the catch. This methodology is frowned upon by the powers that be in my company. It's information that's not being housed in a database.

Hey, I throw any data into a database that makes sense to me. In fact, I have a personal web site in which a bunch of jpgs stored in the database--so I'm not just anti-database.

However, I think not all data fits well in a database. But my colleagues that are disagreeing haven't dealt with such data before, and so I'm having no luck convincing them.

I can find no good discussions about this on the internet groups or web searches, can't find any books about it, so I've decided to try here.

Can anyone give me some good advice?

me _at_ jesseconnell dot com Received on Thu Feb 27 2003 - 21:29:49 CET

Original text of this message