Re: Large dataset performance

From: jma <junkmailavoid_at_yahoo.com>
Date: 21 Mar 2007 17:07:45 -0700
Message-ID: <1174522065.068812.265680_at_e1g2000hsg.googlegroups.com>


Hello Paul,

The story is like this: the server is given a set of engineering simulations that need to be performed, and the clients (software clients, solvers actually) perform them. The solvers, by the way, are legacy software, so there's little if any touching them.

My idea is that instead of waiting for each solver to spit its output to disk and then collecting the output, which has many problems, especially with control of the files and what (human or software) users might do with them, I provide the clients with a stream to a single repository and they write there whatever has to be written. This also saves me from having the server collect, copy, paste and bundle files, where even if the files are all there, sound and safe, each action is a problem of its own.

Using a database saves me the trouble of developing everything on my own and gets me a dedicated, high-quality, fit-for-purpose application. Further, once all those gigabytes are in one place, the next thing, as you might guess, is to start digging. Digging means creating all kinds of views of the data for postprocessing, as well as bundling parts and results for visualization. Oh, and don't forget, I also need to store geometry models, materials, metadata and all kinds of stuff in the same place, so that the server can use the script and the repository to nicely start the clients.
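To make the streaming idea a bit more concrete, here is a minimal sketch of what one solver client could do, assuming a PostgreSQL repository reached through psycopg2 and an illustrative results table (run_id, row_idx, col_idx, value). None of these names come from my actual setup, they are just placeholders; the point is only that a bulk COPY into ordinary rows avoids per-row INSERT statements while keeping the data queryable:

    # Sketch: a solver client streaming one result matrix into a shared
    # repository with COPY instead of per-row INSERTs. Table, column and
    # DSN names are illustrative assumptions.
    import io
    import psycopg2

    def store_matrix(dsn, run_id, matrix):
        """Write one dense result matrix (list of rows) in a single transaction."""
        buf = io.StringIO()
        for i, row in enumerate(matrix):
            for j, value in enumerate(row):
                buf.write(f"{run_id}\t{i}\t{j}\t{value}\n")
        buf.seek(0)
        conn = psycopg2.connect(dsn)
        try:
            # 'with conn' commits on success, rolls back on error
            with conn, conn.cursor() as cur:
                cur.copy_from(buf, "results",
                              columns=("run_id", "row_idx", "col_idx", "value"))
        finally:
            conn.close()

    # Usage: each client calls this once per finished simulation, e.g.
    # store_matrix("dbname=repo host=server user=solver", 42, computed_matrix)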

Hope it's clear now :-/

paul c wrote:
> jma wrote:
> >>This is not an SQL problem and there is not enough information in the
> >>question to answer it properly, eg., is there some application
> >>requirement that the 3.4 M rows be written atomically (all or nothing),
> >>are 100 users going to do this 100 times per day each, etc., etc..
> >>There was another comment about fewer commits which would make no sense
> >>if some transaction notion was involved, in fact it would be dangerous.
> >>
> >
> >
> > Hello Paul,
> >
> > the situation is like this: I have to handle the case where a set of
> > remote clients (between 4-16) need to connect to a system and store
> > the result of their analysis. The result is typically a 100-200MB
> > matrix, but can be more. The number of such matrices would be between
> > 100-200. The clients can write it in a local file and I can have a
> > server parsing that file into a database. I think this is not as
> > elegant as providing the clients with direct connection to the
> > database and have them write their data there. So I am trying to
> > figure a way that the clients can (as usual ASAP) store their data.
> > Going through text based queries is a killer. Even setting up sets of
> > SQL commands takes a lot of time. So I am looking for alternatives,
> > such as writing blobs. But with blobs I have to read them back to
> > memory to find what I am looking for and I am losing the whole
> > functionality of a relational database. So my question is what are the
> > alternatives (if any)?
> >
> > BR
> >
> > jma
> >
> >
>
> I'll take a quick flyer and say that you've given me an opening when you
> worry about losing "functionality of a relational database". Ie., now
> we are getting down to brass tacks, namely the application. What is it
> that you want to do?
>
> (If I were the CEO of a typical public corporation, I might consider
> 200MB a useful result for shareholders because I could be certain that
> most of them couldn't assimilate that much correctly. Most of the time,
> I'd call such volume an intermediate result, except maybe if it was
> really really good analysis and the purpose was to print it in stone for
> posterity, auditors or historians.)
>
> p
Received on Thu Mar 22 2007 - 01:07:45 CET