Re: DB Design of GMail: Multiple tables vs. one table?

From: Yiorgos Adamopoulos <adamo+news_at_dblab.ece.ntua.gr>
Date: Sat, 2 Apr 2005 19:24:48 +0000 (UTC)
Message-ID: <slrnd4tsc0.1cht.adamo+news_at_ithaca.dbnet.ece.ntua.gr>


On 2005-04-01, joeserel_at_gmail.com <joeserel_at_gmail.com> wrote:
> I want to start an interesting topic about the DB design behind gmail:
>
> Since the mail box is 1G, there must be a cluster of servers provides
> gmail service. One server, suppose it has 500G hard drive, may support

Although it may not have been tested with that volume of users, mailboxes and emails, DBmail <URL:http://www.dbmail.org/> should give you a starting point on how such a project is designed (I have not used it, and have only skimmed through its documentation).

FWIW, I am now implementing an email solution for about 70K mailboxes, and I am not using any database to store the mailboxes themselves. I leave that to the filesystem (which knows how to do that well enough). OTOH, all data related to the user, including mail and mailbox routing, is stored in an Oracle database (nothing religious here, the MIS is implemented in Oracle already).
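
A minimal sketch of that split, under loud assumptions: sqlite3 stands in for Oracle so the example is self-contained, and the table `mailbox_routing`, its columns, the functions `init_db`, `add_user` and `deliver`, and the one-file-per-message naming are all hypothetical illustrations, not the actual system described above. The database only answers "where does mail for this address go?"; the message bytes go straight to the filesystem.

```python
import sqlite3
import uuid
from pathlib import Path


def init_db(conn):
    # Routing data only; no message content lives in the database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mailbox_routing ("
        " address TEXT PRIMARY KEY,"
        " maildir TEXT NOT NULL)"
    )


def add_user(conn, address, maildir):
    # maildir is a path relative to the mail spool root (hypothetical layout).
    conn.execute(
        "INSERT INTO mailbox_routing (address, maildir) VALUES (?, ?)",
        (address, maildir),
    )


def deliver(conn, root, address, raw_message):
    # Ask the database where the mailbox lives, then let the
    # filesystem store the message itself as one file.
    row = conn.execute(
        "SELECT maildir FROM mailbox_routing WHERE address = ?", (address,)
    ).fetchone()
    if row is None:
        raise KeyError(address)
    box = Path(root) / row[0]
    box.mkdir(parents=True, exist_ok=True)
    path = box / (uuid.uuid4().hex + ".eml")
    path.write_bytes(raw_message)
    return path
```

With this shape, moving a user to another disk or server is a one-row update in the routing table plus a file copy, and the database never has to hold message bodies.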

> 500 persons. How do you think google design DB? 500 tables for each
> user? (mail1, mail2, ...mail500) or one big table mail hold them all?

I do not know what Google does (although I would love to), but your approach is very simplistic. What if the user has many mailboxes (say, to be accessible via IMAP)? Does each mailbox get to be a table? What about the mail itself? Is it stored separately from its headers or not? How are the extra (X-*) headers dealt with? How does one search over the mailbox (for IMAP operations)? Is it possible that it is not a relational database at all? Are they converting each email into some XML format of their own and then storing it in an XML database? And there are many more details for a system that must scale in both TBs and users...
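
To make some of these questions concrete, here is one possible relational layout, sketched with sqlite3. Everything in it is an assumption for illustration (the table names, the columns, and the `store` and `search_header` helpers); it is not what Google or DBmail does. It takes the "one table for everyone" side of the question, makes the mailbox a row rather than a table, and keeps headers (including X-* ones) as name/value rows separate from the body.

```python
import sqlite3

# Hypothetical schema: one shared set of tables, not one table per user.
SCHEMA = """
CREATE TABLE mailboxes (
    id INTEGER PRIMARY KEY,
    owner TEXT NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (owner, name)
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    mailbox_id INTEGER NOT NULL REFERENCES mailboxes(id),
    body BLOB NOT NULL
);
CREATE TABLE headers (
    message_id INTEGER NOT NULL REFERENCES messages(id),
    name TEXT NOT NULL,
    value TEXT NOT NULL
);
CREATE INDEX messages_by_mailbox ON messages (mailbox_id);
CREATE INDEX headers_by_name ON headers (name, message_id);
"""


def store(conn, owner, mailbox, headers, body):
    # Create the mailbox row on first use, then file the message under it.
    conn.execute(
        "INSERT OR IGNORE INTO mailboxes (owner, name) VALUES (?, ?)",
        (owner, mailbox),
    )
    (mailbox_id,) = conn.execute(
        "SELECT id FROM mailboxes WHERE owner = ? AND name = ?",
        (owner, mailbox),
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO messages (mailbox_id, body) VALUES (?, ?)",
        (mailbox_id, body),
    )
    message_id = cur.lastrowid
    # Header names are lowercased so lookups are case-insensitive.
    conn.executemany(
        "INSERT INTO headers (message_id, name, value) VALUES (?, ?, ?)",
        [(message_id, n.lower(), v) for n, v in headers],
    )
    return message_id


def search_header(conn, owner, mailbox, name, needle):
    # Substring search on one header within one mailbox of one user.
    rows = conn.execute(
        "SELECT DISTINCT m.id FROM messages m"
        " JOIN mailboxes b ON b.id = m.mailbox_id"
        " JOIN headers h ON h.message_id = m.id"
        " WHERE b.owner = ? AND b.name = ? AND h.name = ? AND h.value LIKE ?"
        " ORDER BY m.id",
        (owner, mailbox, name.lower(), "%" + needle + "%"),
    )
    return [r[0] for r in rows]
```

Even this toy version has to answer several of the questions above (mailboxes per user, header storage, search), and it says nothing yet about partitioning across servers, which is where the real difficulty lies.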

> What is the performance difference between these two choice?

So you see, this cannot be answered until those design questions are settled.

-- 
#include <std/disclaimer.h>
Received on Sat Apr 02 2005 - 21:24:48 CEST
