Re: Performance & large database

From: Steve Long <steven.long_at_erols.com>
Date: Fri, 20 Apr 2001 19:35:30 -0400
Message-ID: <9bqh91$566$1_at_bob.news.rcn.net>


given that one has already considered the hardware aspects such as controller-to-disk ratio, controller cache, disk drive cache, processors, bus speed, transfer rates, etc. (or you simply have to use the hardware you have), i will address the software issues.

solutions generally come in three types: low tech, high tech, and something in between. low tech is easy to implement but perhaps suboptimal. high tech is optimal but usually complex, non-trivial, and expensive. you can imagine what falls in between. keep in mind that, more often than IT staff wish to admit, low tech is the best solution.

for the problem you have presented, a fixed-format, flat-file structure with some kind of indexing may be the best approach. given your assertion that inserts, updates, and deletes are never performed (or that the file is rebuilt in batch when such operations are required), your only challenge is choosing an appropriate storage and retrieval algorithm.

a balanced binary search tree (or its disk-oriented cousin, the b-tree) searches in O[ log2 (n) ] over the number of records and should be sufficiently fast for your purposes. the key would be the ID of the records. since you say the number of records is "very large" (a VERY relative term), consider the following table, which shows, for a given number of records, the maximum number of keys that must be compared to find a given record, if it exists.

  # records       # comparisons
    1 million          20
   10 million          24
  100 million          27
    1 billion          30

as you can see, the number of comparisons increases only slightly (by 10) while the number of records increases dramatically (by a factor of 1000).
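if you want to verify those numbers yourself, here is a quick C sketch (nothing beyond standard C and libm; the record counts are just the ones from the table above):

/* worst-case comparisons for a binary search over n sorted keys
 * is ceil(log2(n)).  compile with:  cc check.c -lm  */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double n[] = { 1e6, 1e7, 1e8, 1e9 };
    int i;

    for (i = 0; i < 4; i++)
        printf("%11.0f records -> %2.0f comparisons\n",
               n[i], ceil(log(n[i]) / log(2.0)));
    return 0;
}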

there are other algorithms which provide more efficient searching but also require proportionately more storage (the old "space vs time" trade-off) and more complex programming. besides, when you can find any one record out of one billion with only 30 comparisons, why do the extra work on more complex algorithms and programs when only a few comparisons are saved? a b-tree algorithm can be found on the net at a number of sites, in various languages, and in most any textbook on data structures or algorithms.

if you use fixed-length records (each field is of fixed length), retrieval is even faster, since the location of any record in the file can be computed directly from its position: record number times record size (again, you give up some space in exchange for speed). if the file is also kept sorted by ID, you can binary-search the file itself with no separate index at all, as sketched below. using variable-length records saves space but takes more time to index and locate.
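to make that concrete, here is a minimal sketch in C of a binary search straight over a file of fixed-length records sorted by ID. the record layout (rec_t), the field sizes, and the file name addresses.dat are illustrative assumptions on my part, not anything from your post; a real implementation would also need the on-disk format to match the struct exactly (padding, byte order).

#include <stdio.h>

/* assumed record layout: a fixed-length key plus a fixed-length payload.
 * the address is assumed to be NUL-terminated within its field. */
typedef struct {
    unsigned long id;       /* search key           */
    char address[60];       /* fixed-length payload */
} rec_t;

/* returns 1 and fills *out if id is found, 0 otherwise */
int find(FILE *fp, unsigned long id, rec_t *out)
{
    long lo, hi, mid;
    rec_t r;

    /* record count follows directly from the file size */
    fseek(fp, 0L, SEEK_END);
    lo = 0;
    hi = ftell(fp) / (long)sizeof(rec_t) - 1;

    while (lo <= hi) {
        mid = lo + (hi - lo) / 2;
        /* the record's byte offset is computed directly */
        fseek(fp, mid * (long)sizeof(rec_t), SEEK_SET);
        if (fread(&r, sizeof r, 1, fp) != 1)
            return 0;
        if (r.id == id) { *out = r; return 1; }
        if (r.id < id)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return 0;
}

int main(void)
{
    FILE *fp = fopen("addresses.dat", "rb");  /* hypothetical data file */
    rec_t r;

    if (fp != NULL && find(fp, 123456UL, &r))
        printf("id %lu -> %s\n", r.id, r.address);
    if (fp != NULL)
        fclose(fp);
    return 0;
}

note that even on a billion-record file this does at most about 30 seeks, and after the first few lookups the top levels of the search will sit in the OS file cache anyway.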

i hope this helps.

<quered_at_esiee.fr> wrote in message news:3AE05841.26FF56E6_at_esiee.fr...
> Hi,
>
> I have to build a 'high' performance database knowing that the requests
> will always be of the same type : find an address given an ID. This is
> the ONLY request performed but on a very large database.
> Since I'm working under linux, I've thought of creating my own structure
> to store the data, such as using a full disk and accessing only a part
> of it from an address found in a hash table. Since the disk can be seen
> as a file, it should work well (?).
> The problem is that I fear this system might not perform as well as a
> standard (well optimised) database system . . .
>
> Can someone advise me about which database would be the best to answer
> always the same kind of very simple request ???
> Or is the system I've thought of too optimistic ???
>
> Thanks
>
> David Quere
>
Received on Sat Apr 21 2001 - 01:35:30 CEST
