Re: 500'000 records - who does best?

From: Mark Parsons <mparsons_at_fleming.csc.bcm.tmc.edu>
Date: 21 Dec 1992 20:45:05 GMT
Message-ID: <1h5achINN2i2_at_gazette.bcm.tmc.edu>


In article <1992Dec21.182948.16832_at_bernina.ethz.ch>, flog_at_open.ch (Florian Gutzwiller) writes:
> If you had to build a client/server database with Suns as servers
> and NeXTs as clients: single records would average 30KB in size, and
> there would be around 20 concurrent users.

Ummmmmmmm, let's see . . . a 30KB record size? Sounds like you just might want to rethink your table definition. Or is a large chunk of this text or image (pic) data?

In Sybase, data pages are limited to 2K (a little less for the actual data, since there's some page overhead). Records do not and will not span more than one page, soooooooo a 30KB record length is not possible in Sybase . . . unless you're talking about text (and image) data. If you have a column in your record that is of type text, what gets stored in the record itself is just a pointer to a separate data page where the text entry resides. And yes, text entries CAN span data pages.
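
For instance, something like this (a made-up table, just to illustrate . . . all the names here are invented):

    create table documents
    (
        doc_id  int          not null,  /* 4 bytes, stored in the row */
        title   varchar(80)  not null,  /* up to 80 bytes, in the row */
        body    text         null       /* only a pointer lives in the
                                           row; the text itself goes on
                                           its own chain of 2K pages  */
    )

With a layout like that, the fixed part of the row stays small, and only the text pointer counts against the 2K page limit.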

(We're using the 4.2 version of the server and I don't think the 4.8/4.9.1 versions are different in this respect . . . any comments?)

>
> Would you choose Sybase or Oracle ?
> Who has experience with similar solutions/environments ?

As for having 500,000 records . . . and what kind of performance to expect . . . that's gonna depend on a slew of 'options':

  1. What is the actual transaction/selection rate: one per user per hour? Five per user per minute? Is this table used primarily for selects, or for updates/inserts/deletes? How many indexes do you have defined, and how are they defined? The more indexes you have, the longer update/insert/delete takes, since each index has to be updated too. The larger the index entries, the fewer fit on a data page, so just reading and writing the indexes takes longer. The more users doing more transactions, the slower the system . . .
  2. How many devices do you have, and how is the table (in question) spread across them? One disk is gonna be a bottleneck at read/write time; more disks, on separate controllers, can relieve some of that (see the sketch after this list). By the way, 500K x 30KB = 15GB, so you'll need separate disks anyway!!! ;-) And that doesn't count the GB's worth of disk space you'll need for indexes . . . unless you were planning on going without indexes???
  3. Is all data equally likely to be selected? Would it make sense to break the table up into time frames, i.e., the most recent (most selected) data in one table and older (less wanted) data in 'history' tables? (Sketched further below.)
  4. In any case (with 15GB!!) you're gonna need to look at breaking the table, and the records, up. At this size of database, a table scan (ouch!), an update of multiple indexes, or a multi-table join select is gonna take mucho time . . .
  5. More . . .
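
Along the lines of points 1 and 2, here's roughly what spreading a table and its indexes across devices looks like in Sybase. This is a sketch from memory (4.2-ish syntax), and every device name, size, and table here is invented . . . check your manuals before trusting me:

    /* make two database devices on separate physical disks;
       pick vdevno's that aren't already in use */
    disk init
        name     = "data_dev1",
        physname = "/dev/rsd1a",
        vdevno   = 3,
        size     = 512000        /* size is in 2K pages, so ~1GB */

    disk init
        name     = "index_dev1",
        physname = "/dev/rsd2a",
        vdevno   = 4,
        size     = 256000        /* ~500MB */

    /* put the database on both devices (sizes in MB here) */
    alter database mydb on data_dev1 = 1000, index_dev1 = 500

    /* from inside mydb: one segment per device */
    sp_addsegment data_seg, data_dev1
    sp_addsegment index_seg, index_dev1

    /* the table goes on one disk . . . */
    create table big_table
    (
        rec_id  int           not null,
        keyval  char(10)      not null,
        entry   varchar(255)  null
    )
    on data_seg

    /* . . . and a nonclustered index on the other, so index reads
       and data reads don't fight over the same spindle */
    create nonclustered index big_table_nc1
        on big_table (keyval)
        on index_seg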

If you are, in fact, working with a database this size . . . I don't envy you. I would give some very serious thought to downsizing it: separate tables, optimizing field types (tinyint vs. int), maybe even separate databases . . . on separate machines . . .
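
Something like this, for the history-table and field-type ideas (again just a sketch, all names invented):

    create table current_data
    (
        rec_id      int           not null,
        status      tinyint       not null,  /* 1 byte instead of 4 */
        entry_date  datetime      not null,
        descr       varchar(255)  null
    )

    create table history_data    /* identical layout */
    (
        rec_id      int           not null,
        status      tinyint       not null,
        entry_date  datetime      not null,
        descr       varchar(255)  null
    )

    /* periodically roll old rows into the history table; in real
       life you'd do this in batches so the log doesn't fill up */
    begin transaction
        insert history_data
            select * from current_data
            where entry_date < "19920101"
        delete current_data
            where entry_date < "19920101"
    commit transaction

Selects against current_data then only have to wade through the recent stuff, and the big history table can live on another device, in another database, or even on another machine.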

20 people hammering on 15GB++ worth of data on one machine (is this even possible . . . technically possible, sure . . . but realistically??) is not going to work.

I know: "Mark, Mark, it's a hypothetical question" . . . but it's a hypothetical question that needs more than just a cursory answer about which database engine is best . . .

 . . . . . $0.88 left!! ;-)

Mark
