Distributed Database question

From: Matt Green <mgreen_at_cs.oberlin.edu>
Date: 2000/06/07
Message-ID: <Pine.OSF.3.96.1000607180231.11857A-100000_at_occs.cs.oberlin.edu>#1/1


I am doing some basic research into a large-scale distributed database system for cataloging the locations (URLs) of web documents. In this system, each server regularly updates the database with the contents of its filesystem. I can easily implement this catalog as a single-server flat database, where all updates and queries go to the same place. However, this doesn't scale well, since every update and every query lands on the same machine.

I can also implement the database by partitioning the records across several database servers according to some key (for instance, all files of a given type are recorded in one database). This reduces the load on each of the servers, but I have no way of knowing whether I have chosen the right key, and I'm not really sure there is any one particular key I can choose.
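
To make the concern about choosing a key concrete, here is a minimal sketch of the static scheme described above, partitioning by file type. Everything in it is assumed for illustration: the sample URLs, the three file types, and the server names db0, db1 and db2 are made up, and a real catalog would of course hold far more records.

```python
from collections import Counter

# Hypothetical catalog records as (url, file_type) pairs; the data is made up.
RECORDS = [
    ("http://a.example/index.html", "html"),
    ("http://a.example/about.html", "html"),
    ("http://b.example/paper.ps", "ps"),
    ("http://b.example/home.html", "html"),
    ("http://c.example/logo.gif", "gif"),
    ("http://c.example/news.html", "html"),
]

# Static assignment of each key value (file type) to one database server.
# The server names are assumptions, not part of any real system.
SERVER_FOR_TYPE = {"html": "db0", "ps": "db1", "gif": "db2"}


def server_for(record):
    """Return the server that stores this record, chosen by file type alone."""
    _url, file_type = record
    return SERVER_FOR_TYPE[file_type]


def load_per_server(records):
    """Count how many records each server ends up holding."""
    return Counter(server_for(r) for r in records)


load = load_per_server(RECORDS)
for server in sorted(load):
    print(server, load[server])
```

With this sample data, four of the six records land on db0 because most documents are HTML. That is the kind of skew I cannot predict in advance when picking a fixed key.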

Are there systems that can dynamically distribute data amongst a network of databases, in such a way that new data can enter at any node and be directed to the correct place? There would have to be some sort of dynamic balancing, to ensure that similar data stays together but does not concentrate so much that portions of the network are overloaded by traffic.
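
As a rough illustration of the behavior I have in mind (not a claim about how any existing system works), the toy model below keeps records sorted by URL so that similar URLs stay together, routes each new record to the partition that owns its key range, and splits a partition in two when it grows past a limit. The class name Cluster, the limit of four records, and the sample URLs are all hypothetical, and the whole thing runs in one process rather than across a network.

```python
import bisect


class Cluster:
    """Toy in-process model: range partitioning with splits on overload."""

    def __init__(self, max_records_per_node=4):
        self.max_records = max_records_per_node
        # boundaries[i] is the smallest key owned by partition i ("" sorts first).
        self.boundaries = [""]
        # partitions[i] is the sorted list of keys held by partition i.
        self.partitions = [[]]

    def owner_of(self, key):
        """Index of the partition whose key range contains `key`."""
        return bisect.bisect_right(self.boundaries, key) - 1

    def insert(self, key):
        """Accept a key from any entry point, store it with its owner, rebalance."""
        i = self.owner_of(key)
        part = self.partitions[i]
        pos = bisect.bisect_left(part, key)
        if pos < len(part) and part[pos] == key:
            return  # already cataloged; nothing to do
        part.insert(pos, key)
        if len(part) > self.max_records:
            self._split(i)

    def _split(self, i):
        """Move the upper half of an overloaded partition into a new partition."""
        keys = self.partitions[i]
        mid = len(keys) // 2
        upper = keys[mid:]
        self.partitions[i] = keys[:mid]
        self.boundaries.insert(i + 1, upper[0])
        self.partitions.insert(i + 1, upper)


cluster = Cluster(max_records_per_node=4)
for url in [
    "http://a.example/1", "http://a.example/2",
    "http://b.example/1", "http://b.example/2",
    "http://c.example/1", "http://c.example/2",
]:
    cluster.insert(url)

for i, part in enumerate(cluster.partitions):
    print(i, part)
```

In this run the fifth insert pushes the single partition over the limit, so it splits and the URLs from a.example end up apart from the rest. What the sketch leaves out is exactly what I am asking about: how many independent servers would agree on the boundaries and move data between themselves.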

Please pardon my ignorance; I don't have much knowledge in this area. I'm not sure if I'm asking for something that already exists, or for something that has proven to be a totally insoluble problem. Any advice or references you could give me would be greatly appreciated. Thanks,

Matt Green

---
 mgreen_at_cs.oberlin.edu
Received on Wed Jun 07 2000 - 00:00:00 CEST
