Designing a database system for multiple autonomous clients.

From: <anthony.galano_at_gmail.com>
Date: 16 Apr 2007 12:54:39 -0700
Message-ID: <1176753279.098978.238400_at_e65g2000hsc.googlegroups.com>



Currently we have a subscription service website which manages time and attendance for companies using a MySQL database. Currently all companies and users are in the same database. Every company has a companyid and every user has a userid and is joined to a company by the companyid. We do all standard select, insert, update, delete queries you would expect using the companyid and userid as keys for joining and selecting records across tables. The companyid and userid is looked up and stored in the session when a user logs in.

This has worked well until recently where there have been a few concerns. One is the potential security risks of running all queries on the companyid condition. While we take every precaution to not pass the companyid in any forms or urls for sql injection if someone was able to alter their session companyid in some freak way they could get to the sensitive data of any other customer. Also, any coding errors as some of our quries are quite complex have the potential for altering other users data with careless bugs (and there have been a few). More importantly though is the issue of locking data and preventing race conditions. Currently there is a single admin per company who can run a report for a record set, review those records and mark the ones to be updated, and then submit it for processing. We need to add the ability to have more than one admin with this ability. Obviously, if two users then try to do this same this there a potential race condition which may occur. On top of that, employees have the ability to modify their records as well. If an admin runs a report and is reviewing it and an employee updates a record obviously the admin will not see it and someones data will be overwritten. Obviously this is undesirable. We considered using read locks on the table so that while an admin is reviewing records no one else could update until they submit their action, obtain a write lock on the table, update their data, and release the lock. The problem is that locking the table will lock ALL users from the table not just the users of the specific company. I looked into creating a view on the specific data and locking the view but as far as I can tell from the documentation, a lock on a view requires a lock on all the base tables involved and now were back to square one.

On top of all this, recently we had a corruption in a MySQL table which affected ALL users records. It was fixed easily enough with MySQL's repair utilities but it affected service for all users for almost 20 minutes.

So, we thought about changing our approach to create a new database with every new company. This would ensure that anyone can ever access only one company's data and prevent and possibility of cross data contamination or security breaches. This would also allow table locks to be employed as an admin could perform their actions with the knowledge that they are essentially locking the system from their employees until they are done, which is an acceptable compromise. Any table corruption would affect only one company as well without affecting the entire system. Also, as we grow tables, are growing exponentially. We're in the 10's of mb for some tables so it's not an issue at the moment but MySQL can only handle up to 4GB per table. This would ensure plenty of growth per company before it ever became an issue. And lastly, a single company's data can easily be backed up and restored in it's entiretly with affecting the rest of the system. We currently run hourly backups and in the event a database or table needs to be restored a single company will lose at most one hour of data instead of all companies in the system. In one case we are already doing this of sorts. We have a customer on a dedicated machine just for them in which we customized the software to fit their specific needs with their own MySQL server and database.

The downsides include having to keep a master database to manage a single signon solution which I think is there the real problem with this plan is. All new users and companies would have to be inserted into a master database table to generate unique ids and to store what datasource to use when authenticating the user. Then there is the overhead of creating an entire new database everytime a user signs up with our system and updating the database structure across hundreds of companies, potentially thousands, as we add features. It would mean adding a new datasource for every new company. It would also prevent the ability for us to track and monitor system statistics easily, like the total number of users since it would be the the aggregate value of one table in hundreds of databases.

Maybe this is a completely ridiculous solution but it seems like it might have it's benefits (as well as trade offs). We're gearing up for a potential ground up rewrite of our system and this would be the place to start.
Any comments or suggestions for or against would be greatly appreciated.

Anthony Received on Mon Apr 16 2007 - 21:54:39 CEST

Original text of this message