Re: Designing a database system for multiple autonomous clients.

From: Marshall <marshall.spight_at_gmail.com>
Date: 16 Apr 2007 13:42:58 -0700
Message-ID: <1176756178.354778.295750_at_b75g2000hsg.googlegroups.com>


On Apr 16, 12:54 pm, anthony.gal..._at_gmail.com wrote:
> Currently we have a subscription service website which manages time
> and attendance for companies using a MySQL database. All
> companies and users are in the same database. Every company has a
> companyid and every user has a userid and is joined to a company by
> the companyid. We do all standard select, insert, update, delete
> queries you would expect using the companyid and userid as keys for
> joining and selecting records across tables. The companyid and userid
> are looked up and stored in the session when a user logs in.
>
> This has worked well until recently where there have been a few
> concerns. One is the potential security risks of running all queries
> on the companyid condition. While we take every precaution not to
> pass the companyid in any forms or URLs to guard against SQL
> injection, if someone were able to alter their session companyid in
> some freak way they could get to the sensitive data of any other
> customer. Also, since some of our queries are quite complex, coding
> errors have the potential to alter other users' data through careless
> bugs (and there have been a
> few). More importantly though is the issue of locking data and
> preventing race conditions. Currently there is a single admin per
> company who can run a report for a record set, review those records
> and mark the ones to be updated, and then submit it for processing. We
> need to add the ability to have more than one admin with this ability.
> Obviously, if two users then try to do the same thing there is a
> potential race condition. On top of that, employees have the ability
> to modify their records as well. If an admin runs a report and is
> reviewing it and an employee updates a record, the admin will not see
> the change and someone's data will be overwritten.
> Obviously this is undesirable. We considered using read locks on the
> table so that while an admin is reviewing records no one else could
> update until they submit their action, obtain a write lock on the
> table, update their data, and release the lock. The problem is that
> locking the table will lock ALL users from the table not just the
> users of the specific company. I looked into creating a view on the
> specific data and locking the view but as far as I can tell from the
> documentation, a lock on a view requires a lock on all the base tables
> involved, and now we're back to square one.
>
> On top of all this, recently we had a corruption in a MySQL table
> which affected ALL users' records. It was fixed easily enough with
> MySQL's repair utilities but it affected service for all users for
> almost 20 minutes.
>
> So, we thought about changing our approach to create a new database
> with every new company. This would ensure that anyone can only ever
> access one company's data and prevent any possibility of cross-data
> contamination or security breaches. This would also allow table locks
> to be employed as an admin could perform their actions with the
> knowledge that they are essentially locking the system from their
> employees until they are done, which is an acceptable compromise. Any
> table corruption would affect only one company as well without
> affecting the entire system. Also, as we grow, tables are growing
> exponentially. We're in the tens of MB for some tables, so it's not an
> issue at the moment, but MySQL can only handle up to 4GB per table.
> This would ensure plenty of growth per company before it ever became
> an issue. And lastly, a single company's data can easily be backed up
> and restored in its entirety without affecting the rest of the system.
> We currently run hourly backups and in the event a database or table
> needs to be restored a single company will lose at most one hour of
> data instead of all companies in the system. In one case we are
> already doing something like this. We have a customer on a dedicated machine
> just for them in which we customized the software to fit their
> specific needs with their own MySQL server and database.
>
> The downsides include having to keep a master database to manage a
> single sign-on solution, which I think is where the real problem with
> this plan lies. All new users and companies would have to be inserted
> into a master database table to generate unique ids and to store what
> datasource to use when authenticating the user. Then there is the
> overhead of creating an entirely new database every time a user signs up
> with our system and updating the database structure across hundreds of
> companies, potentially thousands, as we add features. It would mean
> adding a new datasource for every new company. It would also make it
> harder for us to track and monitor system statistics, like the total
> number of users, since that would be the aggregate of one table across
> hundreds of databases.
>
> Maybe this is a completely ridiculous solution but it seems like it
> might have its benefits (as well as trade-offs). We're gearing up for
> a potential ground up rewrite of our system and this would be the
> place to start.
> Any comments or suggestions for or against would be greatly
> appreciated.
>
> Anthony

Some random thoughts:

*Don't* have one database or table per company. That's just insane. Ridiculously simple queries, like "how many customers do I have?" become hideous. Further, it won't accomplish what you want it to. You mentioned the possibility that the company id could get corrupted in the session; all you've done is change the datum of concern from the company id to the database spec. Also: many databases = management nightmare.
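
To make that concrete, here's a rough sketch (the table and database names below are made up, not from your schema): with one shared schema a system-wide question is a single query, while with one database per company you have to repeat it against every company database and add the results up yourself.

    -- shared schema: one query answers it
    SELECT COUNT(*) AS total_users FROM users;

    -- one database per company: one term per company database,
    -- and the list has to be kept current as companies come and go
    SELECT (SELECT COUNT(*) FROM company_0001.users)
         + (SELECT COUNT(*) FROM company_0002.users)
           AS total_users;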

It seems to me that you are using a poorly set up MySQL instance.

http://dev.mysql.com/doc/refman/5.0/en/full-table.html

Notice that even antique versions of MySQL can handle tables larger than 4GB; however, some filesystems cannot. Use NTFS if on Windows or, I dunno, ext3 on Linux. Also, from your post I get the idea that you are using the MyISAM table handler. Bad bad bad. This is the cause of both the corruption and the lock contention. Basically the MyISAM table handler is suitable only for data that you actively hope to soon be rid of. There is really no reason to use anything other than the InnoDB table handler, which supports transactions, row-level locking, and MVCC, instead of the antique table-level locking of MyISAM.
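
As a rough sketch of what InnoDB buys you here (the table and column names are invented, not taken from your schema): once the table is on InnoDB, an admin can lock just their own company's rows inside a transaction instead of locking the whole table out from under everyone else.

    -- switch an existing table to InnoDB (hypothetical table name)
    ALTER TABLE time_records ENGINE = InnoDB;

    -- an admin reviewing company 42's records locks only those rows
    -- (assuming an index on companyid), not the whole table
    START TRANSACTION;

    SELECT *
      FROM time_records
     WHERE companyid = 42
       FOR UPDATE;

    UPDATE time_records
       SET approved = 1
     WHERE companyid = 42
       AND recordid IN (1001, 1002);

    COMMIT;    -- releases the row locks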

I have administered databases of many terabytes that use InnoDB. Without corruption. With MyISAM, expect to have corruption as a regular companion. I used to administer a tiny MyISAM database that corrupted itself with every unscheduled reboot. Seriously. Every system crash or power interruption corrupted the database. It was hideous.

Some applications will save all of a row's values and, before updating, check that every value is still the same, meaning the row has not been modified since it was first queried. Or you can use timestamps or transaction stamps. None of these mechanisms are very satisfying, though.
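
For what it's worth, the version/timestamp variant usually looks something like this (the table and column names here are invented): read the row's version when it is first queried, make the UPDATE conditional on that version still being current, and treat zero affected rows as "somebody else changed it, re-read and retry or report a conflict".

    -- remember the version when the record is first read
    SELECT recordid, hours, version
      FROM time_records
     WHERE recordid = 1001;

    -- later, apply the change only if nobody has touched the row;
    -- version = 7 is the value read above
    UPDATE time_records
       SET hours = 8.5,
           version = version + 1
     WHERE recordid = 1001
       AND version = 7;

    -- 0 affected rows means someone else updated the record first
    SELECT ROW_COUNT();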

Marshall
