Re: When and when not to put information in a database.
Date: Sun, 2 Mar 2003 17:34:48 -0600
Message-ID: <b3u4a8$mr8$1_at_bob.news.rcn.net>
Steve,
I'll try to explain the data and its use cases a little better.

200K instruments, 10,000 simulated prices each, but let me illustrate with 5
instruments and 10 simulated prices:

Instrument: price1, price2, ..., price10
ABC: 50.234, 54.223, ..., 49.841
DEF: 23.401, 22.198, ..., 28.598
GHI: 34.422, 34.199, ..., 34.841
JKL: 15.208, 13.781, ..., 14.234
MNO: 86.942, 81.295, ..., 78.384

Now we have some portfolios, with positions (Instrument Quantity):

Portfolio #1:
ABC 150
JKL 200
Portfolio #2:
GHI 150
JKL 1000
MNO 5000
Portfolio #3:
ABC 8000
GHI 1000
JKL 4000
Etc.

For each portfolio, we grab each instrument with all 10 simulated prices and
then do some statistical stuff. (This involves all 10 prices for each
included instrument.)

But in my case, I've got ~200,000 instruments, not 5, and 10,000 simulated
prices, not 10, plus several hundred portfolios with anywhere from 1 or 2
positions up to over a hundred thousand each. (Some of these portfolios are
large firms like Merrill Lynch, Goldman Sachs, etc.) That's around
13,000,000 positions in total, summed over all portfolios.

On a given day, about 150K of the 200K instruments will end up being
accessed, and all 10,000 prices are needed for each position. (Which
instruments will be needed isn't known in advance, so we calculate for all
instruments.)
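Roughly, in Python, the per-portfolio computation looks like this (toy
numbers, and the real statistics are more involved than the simple min I
use here):

```python
# Toy version: 2 instruments x 3 simulated prices, one portfolio.
# In production it's ~200K instruments x 10,000 prices and hundreds of
# portfolios; the actual statistics are fancier than a worst-case min.
prices = {
    "ABC": [50.234, 54.223, 49.841],
    "JKL": [15.208, 13.781, 14.234],
}
portfolio = [("ABC", 150), ("JKL", 200)]  # (instrument, quantity) positions

def scenario_values(positions, prices):
    """One simulated portfolio value per scenario: sum of qty * price."""
    n_scenarios = len(next(iter(prices.values())))
    return [sum(qty * prices[inst][s] for inst, qty in positions)
            for s in range(n_scenarios)]

# Note that every position pulls in ALL simulated prices for its instrument.
vals = scenario_values(portfolio, prices)
worst_case = min(vals)
```

The point of the sketch is the access pattern: the statistic is computed
over the whole distribution of simulated portfolio values, so each position
needs its instrument's entire price vector.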
The data storage should be able to serve ad-hoc queries as well as daily
batch jobs. But when you access an instrument, the 4249th simulated price
has no useful meaning out of the context of the other 9,999 prices. On a
completely normalized database, the SQL would be:

SELECT Price FROM TheoreticalPrices WHERE InstrumentID = ? AND Sequence = 4249

So any ad-hoc query will be accessing an instrument with ALL its prices, not
just one or a few. (I could explain the why of this, but that would take a
while--our quants explained it to me and it makes perfect sense.)
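If it helps, here's a minimal self-contained sqlite3 sketch of that
normalized layout (table and column names are the ones from the SELECT
above; the toy data and the ORDER BY query are just mine for illustration).
The natural access is the whole simulated distribution per instrument, not
a single row:

```python
import sqlite3

# One row per (instrument, scenario) pair -- the fully normalized layout.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE TheoreticalPrices (
                   InstrumentID TEXT,
                   Sequence     INTEGER,
                   Price        REAL,
                   PRIMARY KEY (InstrumentID, Sequence))""")

toy_prices = {"ABC": [50.234, 54.223, 49.841],
              "JKL": [15.208, 13.781, 14.234]}
con.executemany(
    "INSERT INTO TheoreticalPrices VALUES (?, ?, ?)",
    [(inst, seq, p)
     for inst, plist in toy_prices.items()
     for seq, p in enumerate(plist, start=1)])

# A single price in isolation is meaningless, so the useful query grabs
# an instrument's entire simulated distribution in one shot:
abc = [row[0] for row in con.execute(
    "SELECT Price FROM TheoreticalPrices "
    "WHERE InstrumentID = ? ORDER BY Sequence", ("ABC",))]
```

With 200K instruments x 10,000 scenarios, that table is 2 billion rows, and
every meaningful read pulls 10,000 of them at once -- which is the crux of
the storage question.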
Does this make it a little more clear?
"Steve Kass" <skass_at_drew.edu> wrote in message
news:b3n5ur$969$1_at_slb3.atl.mindspring.net...
> Jesse,
>
> If I understand this right, while you have a large volume of data,
> you need relatively little I/O and arithmetic. You don't say whether
> you check every simulated move for each portfolio, but even if you
> don't, 13M positions processed, even assuming no overlap, means you
> access less than 1% of the data.
>
> What am I missing or what haven't you mentioned? - this doesn't
> sound hard for a normalized table that's keyed on InstrumentID
> and ScenarioSequence. It sounds like there's a good chance it
> will parallelize well, too, since the simulated moves sound independent
> of one another.
>
> It sounds like you are basically doing some (large) matrix
> multiplication, and it might be faster in a traditional array-type
> data structure, but I'm guessing there are other issues - updates,
> the frequency with which the scenarios are re-enacted, etc.
>
> SK
>
>
Received on Mon Mar 03 2003 - 00:34:48 CET