Re: When and when not to put information in a database.

From: Steve Kass <skass_at_drew.edu>
Date: Fri, 28 Feb 2003 03:19:06 -0500
Message-ID: <b3n5ur$969$1_at_slb3.atl.mindspring.net>


Jesse,

  If I understand this right, while you have a large volume of data, you need relatively little I/O and arithmetic. You don't say whether you check every simulated move for each portfolio, but even if you don't, 13M positions processed, even assuming no overlap, means you access less than 1% of the data.

  What am I missing or what haven't you mentioned? - this doesn't sound hard for a normalized table that's keyed on InstrumentID and ScenarioSequence. It sounds like there's a good chance it will parallelize well, too, since the simulated moves sound independent of one another.

  It sounds like you are basically doing some (large) matrix multiplication, and it might be faster in a traditional array-type data structure, but I'm guessing there are other issues - updates, the frequency with which the scenarios are re-enacted, etc.

SK

Jesse Connell wrote:

>I'm working on a financial risk mgmt. system, and there's a piece of
>the data that, IMO, doesn't really belong in a DBMS. However, I'm
>having a difficult time convincing our designers of this. (I am,
>after all, just a programmer--what do I know.)
>
>What I'm looking for is a little disinterested, unbiased advice on
>whether some intermediate data should be in a database or otherwise.
>Also, if there is any generally accepted rule of thumb regarding this.
> (Better yet, a reference to a book or publication.) Let me explain
>the situation...
>
>The Data
>
>Various historical statistics are calculated for ~200,000 financial
>instruments. From these statistics, a monte carlo simulation is run,
>producing 10,000 simulated market moves. Theoretical instrument
>prices are calculated for each move. (prices are currently single
>precision floats--4 bytes) So we have ~200K records with 40K bytes of
>raw data apiece. ~8 gigs of raw data.
>
>Primary Use Case
>
>An application processes hundreds of portfolios, each having positions
>in various instruments, and calculates some statistical info for each
>portfolio. Total number of positions processed ~= 13 million. (Note:
>this is essentially random access of the data.) This must be designed
>to run within a fairly agressive timeframe.
>
>
>Proposed Designs (Database)
>
>(1) One school of thought in the department is to normalize
>completely. (!) A table that looks like this:
>
>TheoreticalPrice
>(
> InstrumentID (FKey, integer)
> ScenarioSequence (FKey, integer) // 1-10000
> Price (float)
>)
>
>I have a big "I told ya so" ready if and when this is attempted.
>
>
>(2) Binary or blob-type storage. IMHO, if we _must_ use a database,
>this seems the most reasonable for such a data type, but it seems like
>forcing a square peg into a round hole. Also, the performance seems
>to be nearly unacceptable.
>
>(3) Breaking the 10,000 scenarios into chunks. E.g., 20 or 50 price
>columns to a record, with a sequencing field, like above) This seems
>like a square peg in the round hole to me. Doesn't it?
>
>My proposed solution:
>
>I've developed (on my own time) a custom data storage application (COM
>singleton, hosted in a Windows Service, if you're curious) which
>serves as a central host for this data. It's simple, it's clean, and
>it's FAST. It has all the flexibility needed, the code is not complex
>or large. It has all the locking necessary to make
>reading/writing/other operations safe. Reading, inserting, modifiying
>data is simple and flexible. It uses the LAN/domain security policy,
>so some users/processes can read/write, some can read, some can
>backup/etc. and some can't touch it. This took me about two weeks.
>
>Finally, here's the catch. This methodology is frowned upon by the
>powers that be in my company. It's information that's not being
>housed in a database.
>
>Hey, I throw any data into a database that makes sense to me. In
>fact, I have a personal web site in which a bunch of jpgs stored in
>the database--so I'm not just anti-database.
>
>However, I think not all data fits well in a database. But my
>colleagues that are disagreeing haven't dealt with such data before,
>and so I'm having no luck convincing them.
>
>I can find no good discussions about this on the internet groups or
>web searches, can't find any books about it, so I've decided to try
>here.
>
>Can anyone give me some good advice?
>Thanks,
>Jesse
>
>me _at_ jesseconnell dot com
>
>
Received on Fri Feb 28 2003 - 09:19:06 CET

Original text of this message