Re: Horizontally scaling a database

From: Nigel Thomas <nigel_cl_thomas_at_yahoo.com>
Date: Thu, 29 Mar 2007 02:04:16 -0700 (PDT)
Message-ID: <23126.54091.qm@web58702.mail.re1.yahoo.com>

Sadeep

Have you looked at Hibernate Shards (http://www.hibernate.org/414.html)? That gives some of the facilities that John Darrah mentions below. However, the key word is "some".

The idea is that you can
- partition data across multiple databases (into "shards") - using application supplied strategies, which can be changed over time

configure sequential and/or parallel selection strategies to retrieve data from those multiple shards
use the normal Hibernate model to build your application

It looks like a good start, but there are some pretty serious limitations as you run through the documentation. Perhaps most serious is in Chapter 6 "Limitations" http://www.hibernate.org/hib_docs/shards/reference/en/html/shards-limitations.html: "Hibernate Shards does not currently support cross-shard object graphs": That means you can't have inter-shard (as opposed to intra-shard) relationships handled automatically "Hibernate Shards does not provide support for distributed transactions ..." - ie you have to plug in your own transaction manager HQL support is very limited - they say "You're probably better off staying clear of HQL in this release if you can help it." Replicated data is not (well) supported (see 6.6) However, even given all that, you may still get some mileage out of it for your project - even if only as a learning exercise.

Regards Nigel

On 3/17/07, John Darrah <darrah.john_at_gmail.com> wrote: Seems like you would need to logically partition your data onto separate servers using whatever on back end (mysql, BerkleyDB, etc). You would then need application servers sitting in front that have an index into where the data is physically stored based on the partition key. These application servers would act as routers and aggregaters sending the actual queries to appropriate servers and aggregating the results back to the requester. I don't know of anything that would do this out of the box but it can be done. If there wasn't a good key that could consistently reduce the number of back end servers to a small subset of the total then you would forget about the index routing strategy all together and just broadcast to all back end servers every time. How long would you need to retain the data? A much simpler strategy to implement would be to keep the data on one database and partition it you can put a LOT of rows into a partitioned table in oracle, several years worth, even at your requirements. I realize you are looking at shared nothing but it'll cost you to implement the solution I describe above. You will also need exceptional developers and architects that have implemented systems like this before.

--
http://www.freelists.org/webpage/oracle-l

Received on Thu Mar 29 2007 - 04:04:16 CDT