Re: Generating fake databases

From: Roy Hann <specially_at_processed.almost.meat>
Date: Wed, 12 Oct 2011 07:57:20 +0000 (UTC)
Message-ID: <j73h90$833$1_at_speranza.aioe.org>


Rob wrote:

> On Oct 11, 5:54 am, Roy Hann <specia..._at_processed.almost.meat> wrote:
>> Can anyone point me towards any papers, articles or web pages that
>> discuss efficient techniques for generating large volumes of completely
>> synthetic database content having specified characteristics? I'd
>> settle for pointers to any software tools that might exist.
>>
>> I am not interested in generating mere random values; I want to
>> efficiently generate plausible/realistic values for multiple tables
>> and the "data" must satisfy my database constraints and have specified
>> distributions of key values.
>>
>> If it is relevant, assume I want to create databases for an SQL DBMS.
>>
>> Needless to say I've made a stab at Googling for what I want but I
>> haven't been able to guess effective search terms.
>>
>> --
>> Roy
>
> I needed to run some performance tests and used the Wisconsin
> Database as a starting point. If you go to ACM and search on
> David DeWitt (UWisconsin CompSci prof), you'll find references.
> If that isn't possible and Google can't help, let me know and
> I'll get you references or a paper.

Thanks Rob. I found a nice summary paper by David DeWitt that describes his database. Unfortunately it is pretty much exactly what I don't want. :-(

According to DeWitt, his "data" is generated using cyclic and random functions for each attribute, and it makes no attempt to recreate the characteristics of a real database: skewed distributions of key values, correlation and autocorrelation, and gaps.
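
To make those characteristics concrete, here is a minimal sketch (not taken from DeWitt's paper or from any existing tool) of a generator that produces gapped primary keys, a Zipf-skewed foreign key distribution, and a child attribute correlated with its parent row. The table shapes, the function names (gapped_ids, zipf_weights, generate) and every parameter value are assumptions chosen purely for illustration; autocorrelation is not covered.

```python
import random
from itertools import accumulate

def gapped_ids(n, rng, gap_prob=0.2, max_gap=5):
    """Return n strictly increasing ids with occasional gaps, as if rows had been deleted."""
    ids, current = [], 0
    for _ in range(n):
        current += 1
        if rng.random() < gap_prob:
            current += rng.randint(1, max_gap)  # skip a few key values
        ids.append(current)
    return ids

def zipf_weights(n, s=1.2):
    """Unnormalised Zipf-like weights: rank 1 is most popular, with a long tail."""
    return [1.0 / (rank ** s) for rank in range(1, n + 1)]

def generate(n_customers=100, n_orders=2000, seed=42):
    """Build a hypothetical customers/orders pair that satisfies the FK constraint."""
    rng = random.Random(seed)
    customer_ids = gapped_ids(n_customers, rng)

    # Per-customer baseline, so order amounts correlate with the parent row.
    base_spend = {cid: rng.lognormvariate(3.0, 0.8) for cid in customer_ids}

    # Skewed foreign keys: a few customers own most of the orders.
    cum = list(accumulate(zipf_weights(n_customers)))
    fks = rng.choices(customer_ids, cum_weights=cum, k=n_orders)

    orders = []
    for order_id, cid in zip(gapped_ids(n_orders, rng), fks):
        amount = round(base_spend[cid] * rng.uniform(0.5, 1.5), 2)
        orders.append((order_id, cid, amount))
    return customer_ids, orders

if __name__ == "__main__":
    customers, orders = generate()
    print(len(customers), "customers,", len(orders), "orders")
```

Every foreign key is drawn from the generated parent keys, so referential integrity holds by construction. In this sketch popularity follows key order; shuffling the parent keys before weighting them would decouple the two.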

-- 
Roy
Received on Wed Oct 12 2011 - 09:57:20 CEST
