Re: Performance & large database

From: Jonathan Leffler <jleffler_at_informix.com>
Date: Wed, 25 Apr 2001 17:47:53 GMT
Message-ID: <3AE70DB7.B6D8BFFD_at_informix.com>


Steve Long wrote:
> [...snip...]
>
> a binary tree structure (b-tree) is O(log2 n) in the number of records
> and should be sufficiently fast for your purposes. the key would be the ID
> of the records. since you say the number of records is "very large" (a VERY
> relative term), consider the following table, which shows, for a given number
> of records, the maximum number of keys that must be searched to find a given
> record, if it exists.
>
# # records        # comparisons
> 1 million             20
> 10 million            24
> 100 million           27
> 1 billion             30
>
> as you can see, the number of comparisons increases only slightly (by 10)
> while the number of records increases dramatically (by a factor of 1000).

20-30 disk reads would be horribly slow. If the nodes are all in memory, then it won't be such a big problem. B-Trees would probably be better for this; you'd fit many keys per page (rather than just one key per node with a binary tree), and the number of disk accesses plummets (3 or 4 would be plausible).
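To make the difference concrete, here is a small sketch (not from the original thread) comparing worst-case lookup depths; the 200-way fanout is an assumption about how many keys fit on a disk page, not a figure from either poster:

```python
import math

def lookup_depth(n_records, fanout):
    """Worst-case number of node/page visits to find a key in a
    balanced search tree with the given fanout (2 = binary tree)."""
    return math.ceil(math.log(n_records, fanout))

# Binary tree: one key per node, so depth == number of comparisons.
print(lookup_depth(10**9, 2))    # -> 30, matching the quoted table

# B-Tree with ~200 keys per page (plausible for 4 KB pages and
# short keys): only a handful of page reads per lookup.
print(lookup_depth(10**9, 200))  # -> 4
```

The same formula reproduces the whole quoted table (20, 24, 27, 30 for 10^6 through 10^9 records at fanout 2), which is why a large fanout, not a better binary tree, is what cuts disk I/O.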

> there are other algorithms which provide more efficient searching but also
> require proportionately more storage (the old "space vs. time" trade-off) and
> more complex programming. besides, when you can find any one record out of
> one billion with only 30 comparisons, why do the extra work on more complex
> algorithms and programs when only a few comparisons are saved? a b-tree
> algorithm can be found on the net at a number of sites, in various languages,
> and in most any textbook on data structures or algorithms.
>
> if you use fixed length records (each field is of fixed length), retrieval
> is even faster since the location of any record in a file can be computed
> directly (again, you will give up some space in exchange for speed). using
> variable length records saves space, but takes more time to index and
> locate.
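The fixed-length scheme described above amounts to computed addressing: record number times record size gives the byte offset, so no search is needed at all. A minimal sketch of the idea (the file name and the 64-byte record width are illustrative assumptions, not from the thread):

```python
import os
import tempfile

RECORD_SIZE = 64  # assumed fixed width; every record is padded to this

def write_records(path, addresses):
    """Store addresses so that record i lives at byte offset i * RECORD_SIZE."""
    with open(path, "wb") as f:
        for addr in addresses:
            f.write(addr.encode("ascii").ljust(RECORD_SIZE, b" "))

def read_record(path, record_id):
    """Fetch one record with a single seek + read -- no comparisons at all."""
    with open(path, "rb") as f:
        f.seek(record_id * RECORD_SIZE)
        return f.read(RECORD_SIZE).rstrip(b" ").decode("ascii")

path = os.path.join(tempfile.mkdtemp(), "records.dat")
write_records(path, ["10 Downing St", "1600 Pennsylvania Ave", "4 Privet Drive"])
print(read_record(path, 1))  # -> 1600 Pennsylvania Ave
```

The space cost the quoted poster mentions is visible here: every record pays for the full width, however short the address actually is.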
>
> i hope this helps.
>
> <quered_at_esiee.fr> wrote in message news:3AE05841.26FF56E6_at_esiee.fr...
> > Hi,
> >
> > I have to build a 'high' performance database, knowing that the requests
> > will always be of the same type: find an address given an ID. This is
> > the ONLY request performed, but on a very large database.
> > Since I'm working under linux, I've thought of creating my own structure
> > to store the data, such as using a full disk and accessing only a part
> > of it from an address found in a hash table. Since the disk can be seen
> > as a file, it should work well (?).
> > The problem is that I fear this system might not perform as well as a
> > standard (well-optimised) database system . . .
> >
> > Can someone advise me on which database would be best for answering
> > the same kind of very simple request over and over ???
> > Or is the system I've thought of too optimistic ???
> >
> > Thanks
> >
> > David Quere
> >
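The raw-storage idea quoted above -- a hash table mapping each ID to a byte address in one big file -- is essentially a hand-rolled hash index, which suits a single exact-match query type well. A minimal sketch of that idea, using an in-memory dict as the hash table (all names and the record layout are illustrative assumptions):

```python
import os
import tempfile

def build_store(path, records):
    """Append variable-length records to one file; return a hash table
    (a Python dict) mapping each ID to its (offset, length) in the file."""
    index = {}
    with open(path, "wb") as f:
        for rec_id, address in records.items():
            data = address.encode("utf-8")
            index[rec_id] = (f.tell(), len(data))
            f.write(data)
    return index

def find_address(path, index, rec_id):
    """The one query the poster needs: ID -> address, one seek + one read."""
    offset, length = index[rec_id]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length).decode("utf-8")

path = os.path.join(tempfile.mkdtemp(), "store.dat")
index = build_store(path, {42: "12 rue de la Paix", 7: "221B Baker St"})
print(find_address(path, index, 7))  # -> 221B Baker St
```

For a billion records the index itself stops fitting comfortably in memory, which is where an on-disk hash file or a B-Tree index comes back into the picture.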
 

-- 
Yours,
Jonathan Leffler (Jonathan.Leffler_at_Informix.com) #include <disclaimer.h>
Guardian of DBD::Informix v1.00.PC1 -- http://www.perl.com/CPAN
     "I don't suffer from insanity; I enjoy every minute of it!"
Received on Wed Apr 25 2001 - 19:47:53 CEST
