Re: Newbie question on table design.

From: Alvin Ryder <alvin321_at_telstra.com>
Date: 3 May 2007 21:05:09 -0700
Message-ID: <1178251509.255355.72770_at_n76g2000hsh.googlegroups.com>


On Apr 26, 3:48 pm, somedevelo..._at_gmail.com wrote:
> Hi,
>
> I have a table T with -- let's say -- a million+ records. The
> application allows logical deletion of a small fraction of these
> records. (Un-deletion of deleted records is allowed if you have admin
> privs via a Priveleges table.)
>
> I can easily think of 2 ways to support record un-/deletion.
>
> 1. Add an 'is_deleted' column that can take a binary 1/0 value (1 =
> deleted, 0 = not-deleted).
>
> This approach will result in...
> + simplest possible SQL;
> - more space overhead (for at least some if not all DBMSes, now or
> in future) due
> to addition of a full column for a million+ rows most of which
> will
> anyway hold a single value (namely, 0 = not-deleted).
>
> 2. Alternatively, create another Deleted_T table in which the records
> deleted from T would be inserted.
>
> This approach will result in...
> - slightly more involved SQL;
> + far less space overhead as only very rare, exception information
> ends up getting recorded.
>
> Presently, I tend to like #2 because...
> * I can live with the 'slightly more involved SQL';
> * I can live with the small speed overhead of consulting an
> additional table (Deleted_T);
> * I welcome every bit of free disk space on by server.
>
> I cannot foresee any other de/merits entailing these approaches. If
> you can, would you please point them out and/or recommend me your
> favorite.
>
> Thanks.
> A DBMS newbie.

Hmm, I would not let those physical economics govern the decision. Do what is logically correct.

Besides you may be surprised by the physics after all. For instance you can most probably sneak in an that extra bit or so many byte at no additional cost. Most databases keep tuples (rows, records, whatever) in blocks. The block size might be 8196 bytes, blocks contain some overhead and then an *integral number of tuples*. There will also be other spare space so not every single bit of the block will be occupied. True your straw can be the bit that breaks the camel's back, you will then incur a quantum cost of another block but chances are you will probably sneak it in without extra cost.

Secondly if you only need to support this undelete feature for one table then that's one story but what if you need to do it across all or most of your tables. Your schema size will virtually double. More tables, more indexes, more constraints, more queries, buffers ... Now I would see that as a more serious implication for performance, memory, CPU, network and disk usage.

Physically you'll win some and loose some, so you may as well do the right thing from a logical perspective. Besides even if a logically correct approach costs a little more I would still go with it, it'll be resources well spent.

Cheers. Received on Fri May 04 2007 - 06:05:09 CEST

Original text of this message