Re: Generating fake databases

From: Derek Asirvadem <derek.asirvadem_at_gmail.com>
Date: Thu, 13 Oct 2011 01:09:26 -0700 (PDT)
Message-ID: <30ebc867-8644-4175-8c63-2963f293972e_at_l8g2000pro.googlegroups.com>


On Oct 11, 10:54 pm, Roy Hann <specia..._at_processed.almost.meat> wrote:

> If it is relevant, assume I want to create databases for an SQL DBMS.

Well, if you have an SQL database, life is very simple. Most of my clients are banks, and quite often we need to create a copy of the PRODUCTION database for UAT purposes. It appears we have the same need as you, in that we need really good, plausible/realistic values, with FK data distributions matching PK values. Just write a few scripts to *obfuscate* the Prod data, and thus eliminate customer confidentiality issues, security issues, etc, so that the UAT team and development team (who have no access to Prod, and no security clearance) can view and use the obfuscated data without breaching any legal or policy requirements.

The need for scripts arises because (a) whether a column needs to be obfuscated and (b) what type of obfuscation is required both have to be decided on a table-column basis, and (c) you can set them to run in batches, in parallel, for speed. Further, as your testing progresses and you identify issues, you (d) need to progress your scripts. Such scripts are simple enough to write; no "product" is necessary. Our scripts on the last project take about one hour to execute on a 1.2TB database.
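
To make the table-column idea concrete, here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for a real SQL DBMS. It is not the scripts referred to above. The table names, column names, the RULES list, the SALT value and the mask() function are all hypothetical, invented for illustration. The point it shows is that a deterministic mapping applied to both the PK column and the FK column keeps the FK distribution intact, and that the declared constraints still hold afterwards.

```python
import hashlib
import sqlite3

SALT = b"uat-salt"  # hypothetical secret; keep it away from the UAT team


def mask(value, prefix):
    """Deterministic pseudonym: the same input always yields the same output."""
    digest = hashlib.sha256(SALT + str(value).encode("utf-8")).hexdigest()[:10]
    return f"{prefix}{digest}"


# Hypothetical per-column decisions: (table, column, pseudonym prefix).
# The PK column and the FK column that references it share a prefix,
# so a key and its references map to the same obfuscated value.
RULES = [
    ("customer", "customer_no", "C"),
    ("customer", "name", "N"),
    ("account", "customer_no", "C"),
]


def obfuscate(conn):
    """Apply every rule in a single transaction."""
    conn.create_function("mask", 2, mask)
    with conn:  # commits on success, rolls back on error
        for table, column, prefix in RULES:
            # Identifiers come from the trusted RULES list, not from user input.
            conn.execute(
                f"UPDATE {table} SET {column} = mask({column}, ?)", (prefix,)
            )


def demo():
    """Build a tiny 'Prod' copy in memory and obfuscate it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_no TEXT PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE account (
            account_no  TEXT PRIMARY KEY,
            customer_no TEXT NOT NULL
                REFERENCES customer (customer_no)
                DEFERRABLE INITIALLY DEFERRED  -- checked at COMMIT, not per row
        );
        INSERT INTO customer VALUES ('1001', 'Alice Example'), ('1002', 'Bob Example');
        INSERT INTO account VALUES ('A1', '1001'), ('A2', '1001'), ('A3', '1002');
        """
    )
    obfuscate(conn)
    return conn
```

Calling demo() returns a connection in which every account row still joins to its customer, with the original keys and names gone. The FK is declared deferred only because the sketch rewrites key values in place; whether you obfuscate keys at all is one of the table-column decisions mentioned above.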

> and the "data" must satisfy my database constraints

Well, assuming you mean database when you use the word "database", the data in the database will *always* satisfy the database constraints, regardless of whether it is obfuscated or transformed or hammered (maybe you mean something else by that statement?). You *have* implemented the C in ACID, haven't you? If you have not, then it is not a database, it is a bucket of fish. In that case, you will have to expend some effort transforming it into a database before you can expect database capabilities from it.
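
As a small illustration of that point, the sketch below (again Python with the standard-library sqlite3 module as a stand-in DBMS, and with hypothetical table and column names) declares the constraints in the schema and then tries to load bad "fake" rows. The DBMS rejects them, so whatever ends up in the tables satisfies the constraints by construction. The try_insert() helper is an assumption made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(
    """
    CREATE TABLE customer (
        customer_no TEXT PRIMARY KEY
    );
    CREATE TABLE account (
        account_no  TEXT PRIMARY KEY,
        customer_no TEXT NOT NULL REFERENCES customer (customer_no),
        balance     NUMERIC NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO customer VALUES ('C1');
    """
)


def try_insert(sql):
    """Run one INSERT in its own transaction and report what the DBMS decided."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# A row that satisfies every declared constraint.
ok = try_insert("INSERT INTO account VALUES ('A1', 'C1', 100)")
# An orphan row: no such customer exists, so the FK constraint fails.
orphan = try_insert("INSERT INTO account VALUES ('A2', 'C9', 100)")
# A negative balance violates the CHECK constraint.
negative = try_insert("INSERT INTO account VALUES ('A3', 'C1', -5)")
```

Only the first row is stored. A generator or obfuscation script that produces rows violating the constraints simply fails at load time, which is the sense in which stored data "always" satisfies them.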

Regards
Derek

Received on Thu Oct 13 2011 - 10:09:26 CEST
