Re: Generating fake databases

From: Rob <rmpsfdbs_at_gmail.com>
Date: Tue, 11 Oct 2011 08:51:26 -0700 (PDT)
Message-ID: <1799eafc-ce90-449a-80c8-311f7c523b92_at_l39g2000pro.googlegroups.com>


On Oct 11, 5:54 am, Roy Hann <specia..._at_processed.almost.meat> wrote:
> Can anyone point me towards any papers, articles or web pages that
> discuss efficient techniques for generating large volumes of completely
> synthetic database content having specified characteristics? I'd
> settle for pointers to any software tools that might exist.
>
> I am not interested in generating mere random values; I want to
> efficiently generate plausible/realistic values for multiple tables
> and the "data" must satisfy my database constraints and have specified
> distributions of key values.
>
> If it is relevant, assume I want to create databases for an SQL DBMS.
>
> Needless to say I've made a stab at Googling for what I want but I
> haven't been able to guess effective search terms.
>
> --
> Roy

I needed to run some performance tests and used the Wisconsin Database as a starting point. If you search the ACM Digital Library for David DeWitt (a University of Wisconsin computer science professor), you'll find references. If that isn't possible and Google can't help, let me know and I'll get you references or a paper.

I found that the Wisconsin database wasn't large enough for my purposes, so I increased its size and padded out the records until I was able to produce discernible differences in the benchmark-like tests I was running.
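
For illustration only, here is a minimal Python sketch of the general idea: rows loosely modeled on the Wisconsin benchmark relation (a randomly permuted unique key, a sequential unique key, and low-cardinality columns derived from the key by modulo), with a filler column to widen each record. The function name `wisconsin_like_rows`, the `pad_bytes` and `seed` parameters, the `padding` column, and the choice of columns are all assumptions made for this example; they are not the original benchmark scripts or the setup used in the tests described above.

```python
import random


def wisconsin_like_rows(n, pad_bytes=0, seed=0):
    """Yield n rows loosely modeled on the Wisconsin benchmark relation.

    Hypothetical helper: a simplified column subset, not the full schema.
    """
    rng = random.Random(seed)        # seeded so runs are repeatable
    unique1 = list(range(n))
    rng.shuffle(unique1)             # random permutation of 0..n-1
    filler = "x" * pad_bytes         # extra width to inflate record size
    for unique2, u1 in enumerate(unique1):   # unique2 is sequential
        yield {
            "unique1": u1,
            "unique2": unique2,
            "two": u1 % 2,           # 2 distinct values
            "ten": u1 % 10,          # 10 distinct values
            "onePercent": u1 % 100,  # 100 distinct values
            "padding": filler,       # assumed padding column
        }


# Scale by raising n; widen records by raising pad_bytes.
rows = list(wisconsin_like_rows(1000, pad_bytes=200))
```

Because the low-cardinality columns are derived from the key, their distributions are known in advance, which makes query selectivities predictable as the row count and record width are increased.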

Rob

Received on Tue Oct 11 2011 - 17:51:26 CEST
