Re: 2 million row database from hell! help a newbie in over his head.

From: David Cressey <david_at_dcressey.com>
Date: Tue, 21 Aug 2001 19:35:15 GMT
Message-ID: <TJyg7.278$Iw2.18373_at_petpeeve.ziplink.net>


As a seasoned database guy, leaning more towards practice than theory, here's my two cents worth
about your inquiry.

I worry a lot more about 450 tables than I do about 2 million rows.

I had more than 2 million rows in the fact table of a star schema, and the DB engine just barely slowed down on the queries I put it through.

Don't ignore normalization issues, but don't treat normalization as the holy grail of database design. Knowing when to normalize is as important as knowing how to normalize. In particular, as I read your post, this database is not an upstream collection point for data from users. Rather it's a downstream DB loaded en masse and queried for publishing. If that's true, normalization may not be as important to you as some other considerations.

Are the queries batch and preplanned or are they more interactive and ad hoc?

--
Regards,
    David Cressey
    www.dcressey.com
"dave rau" <dave_at_projectfx.com> wrote in message
news:11c09d6a.0108210954.347c9b16_at_posting.google.com...

> I must preempt my message with the fact that I am a total newbie to
> database design both conceptually and in practice. I have been
> learning some mySQL and PHP recently and have recently taken on a very
> large database project at my cushy guuv-ment job to publish data to
> the web.
>
> I have encountered some problems, mostly dealing with how to
> restructure this database for optimization in mysql and for
> publishing/searching on the web.
>
> The database I am working with contains a rainfall site number, a
> collection date, a collection value and the status of the data,
> whether the data was incomplete, or if the power went out on a
> station, etc.
>
> I have paired importent columns down to the following: ID (*primary
> key), Site_number, Collection_date, value, and status. I am not even
> sure if this is the correct way to pair the data down however.
>
> The biggest problem is that the database currently lists values for
> every day, for 50 years, for 450 rainfall stations!!! Currently the
> database has over 2 million rows!! Newbie or not, this seems pretty
> excessive and clunky for publishing to the web.
>
> With that said, would I be better off by eliminating the Site_number
> column and making a table for EACH rainfall station (all 450 of them)?
> Should I make a table for each year? Keep in mind that all 450+
> stations have a value for every day for 50+ years, some go back to
> 1910 (90 years). Since zero is an acceptable value for rainfall it is
> possible that much of the value columns are zero.
>
> *also, I have been looking into some books on conceptual database
> design, is database design for mere mortals a recommended book on
> anyones list? I would be interested in some others. thanks in adv
> guys!
>
> dave
Received on Tue Aug 21 2001 - 21:35:15 CEST

Original text of this message