Re: Why bother with a database if ...

From: Michael J. Moore <mNiOcShPaAeMl.j.moore_at_wcom.com>
Date: Tue, 19 Sep 2000 16:45:55 GMT
Message-ID: <7LMx5.18$6f.954@pm02news.wcom.com>

Thanks Richard,

If I might be so bold as to impose upon you a second time .......

scenario 1) I have a million session records in a database table and must read each record sequentially. I am using Parallel Server and the machine has, say, 8 CPUs.

scenario 2) I have a million session records in a flat file and I read each one of them sequentially.

Note: the order in which the session records are processed is irrelevant.
Note: as each session is read, there will be multiple look-ups against other database tables, but I feel that this is irrelevant, as I am only trying to determine whether there could be any benefit to storing the "session records" in the database.

Should I expect scenario 1 to run faster because, while CPU 1 is engaged in a read operation, CPU 2 can be engaged in a seek operation, so that in scenario 1 the program requesting the data is served more quickly?
OR....
Would scenario 1 only be single-threading the I/O requests and therefore not benefit from the Parallel Server feature?
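
For comparison, the flat-file alternative mentioned in the quoted reply below (dividing the sequential file into multiple files and reading them from several threads) could be sketched roughly as follows. This is only an illustration under assumptions, not anyone's actual program: Python is an arbitrary choice, and the names process_record, process_file, process_all and make_demo_files, the file names, and the record layout are all hypothetical. The per-record work is a stand-in for the real look-ups.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def process_record(line):
    # Hypothetical stand-in for the per-session work (e.g. the look-ups).
    # Here it just returns the record length so the result can be checked.
    return len(line.rstrip("\n"))


def process_file(path):
    # Each worker reads its own file sequentially, start to finish.
    with open(path, "r", encoding="utf-8") as f:
        return sum(process_record(line) for line in f)


def process_all(paths, workers=8):
    # Processing order is irrelevant, so per-file results are simply summed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_file, paths))


def make_demo_files(directory, n_files=4, records_per_file=1000):
    # Writes small, made-up session files for demonstration only.
    paths = []
    for i in range(n_files):
        path = os.path.join(directory, f"sessions_{i}.txt")
        with open(path, "w", encoding="utf-8") as f:
            for j in range(records_per_file):
                f.write(f"session-{i}-{j}\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        files = make_demo_files(d)
        print(process_all(files))
```

In this sketch the threads only help while some of them are waiting on I/O; whether that gives any real speed-up depends on the files actually sitting on separate disk mechanisms, and CPU-heavy per-record work in Python would need processes rather than threads.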

I will check the newsgroup for your answer. Thanks again,
Mike

"Richard Senior" <richard_at_r-senior.demon.co.uk> wrote in message news:8q77ch$gtl$1_at_gate.local...
> In article <_7Bx5.219701$i5.3017214_at_news1.frmt1.sfba.home.com>,
> "Michael J. Moore" <hicamel_at_home.com> writes:
>
> > if for example you have a large number of records that you just need to
> > store once and then read them sequentially once. We collect sessions from a
> > network, approx. a million per day. These session records are the input to a
> > process which looks up additional information based on the content of the
> > session record. There does not seem to be any reason to dump these session
> > records into a database table just so that we can sequentially process
> > them through the system. Or is there?
> >
> > The only possible reason I could come up with is maybe, if we had multiple
> > CPUs and the table was spread over many disks, that a parallel query might
> > actually be faster than a sequential read of a flat file.
>
> [snip]
>
> The benefits of using multiple CPUs and spreading data across disk
> mechanisms are different. Multiple CPUs will only deliver significant
> performance benefits if there is more than one reasonably CPU-intensive
> process or thread to make use of their cycles. Multiple disk mechanisms
> can allow processes running in parallel (whether time-sliced or on
> multiple CPUs) to make use of CPU cycles where other processes are waiting
> for the disk mechanism.
>
> Reasonably active multi-user DBMSs obviously benefit from both but would
> your program?
>
> Spreading data across disks is not in itself a good reason for choosing a
> database; you can spread data using an appropriate level of RAID or at the
> application level by dividing the sequential file into multiple files and
> spreading these across disks. You would need a multi-threaded application
> so that some threads work while other threads are waiting. The feasibility
> of this depends on whether you can access records out of order.
>
> You say you look up additional information based on the session record? If
> this 'additional information' is held in a database, that would be a
> better reason for using a database, which is likely to be more efficient
> at joining to that data than a simple program.
>
> --
> Regards,
>
> Richard
Received on Tue Sep 19 2000 - 11:45:55 CDT