RE: null or value
Date: Wed, 16 Sep 2009 19:18:06 -0400
well, first of all, Toon obviously missed that only 1000 or so of the records will be INactive. So if you want to usually find all the ACTIVE records, as Mr. Powell said, no index is going to help you.
Second, let's just say on occasion you wanted to find ONLY the inactive records (perhaps cleaning up a query set result via a MINUS against just the INactive records.) Then you would have the archetypal example of using the trick in Oracle that completely null entries are omitted from the index. So in that case you would document in the comments on the table in the data dictionary that NULL means active, Y means inactive, and name the column something like INACTIVE. And of course you could constrain the values to Y and NULL. And while no index will help you find the huge majority efficiently, there might well be many cases where the cheapest cost will be to ignore the fact some of the records are inactive and clean them up at the end. Depending on the relationship of the rows including the inactive rows in the original query could also be invalid, and someone who knows whether that is true would have to decide whether the "post production" MINUS trick is valid for a given query.
Using the database to tell you what values mean and defining a domain are legitimate. If you are not going to document your database, then I agree you should not use NULL for anything but NULL, and only then if you can't figure out a way not to use NULL at all.
Finally, if the quantities of inactive and active were reversed, then what Toon wrote would have been spot on, except maybe it's a little misleading to write "Put NULLs in all the others", since you can add the nullable column with a default of NULL and just leave them NULL to do the minimum update.
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
On Behalf Of Julio Aguilar-Chang
Sent: Wednesday, September 16, 2009 3:14 PM To: chet.justice_at_gmail.com
Cc: toon.koppelaars_at_rulegen.com; Randy.Steiner_at_nyct.com; oracle-l_at_freelists.org
Subject: Re: null or value
I must agree with Chet. If the two possible values are active or inactive, then populate the column with one of those two values. If you have a null, then you don't know whether that was done on purpose, or did the app not populate the column correctly - you just don't know why there is a null.
chet justice wrote:
As I'm not a real DBA (I only play one on occasion at work), I won't (can't) get into which type of index would be better.
However, I do object from a data perspective, to defaulting the inactive rows to NULL. NULL = UNKNOWN correct? In this case, it's not unknown, it's an inactive record. Use 'N' or whatever identifier you choose.
Or am I just being a tad obsessive?
On Wed, Sep 16, 2009 at 2:56 PM, Toon Koppelaars <toon.koppelaars_at_rulegen.com> wrote:
Assuming that by far the majority of those rows fall into the "Inactive" category, I think there are a few options:
- Add a nullable column to that table, and put a 'Y' (or whatever) into it to indicate the row is an Active one. Put NULL's in all others. Then create a regular b-tree index on that column for quick access of the Active rows. By not putting 'N' into all the inactive rows, you'll ensure this index is going to remain small (compared to other indexes on that table).
- (assuming there's an PK on that index), create a new table, and only store the PK-values in that table to indicate the ones that are active. So PK of this new-table = PK of the big table. New table has small set of PK's that can be used to drive nested-loops to the big one.
- Variation on 1: *do* put 'N'-s in for the inactive rows, but create a function based index that has no-entries for these rows. Eg. index 'case [column] when 'Y' then 'Y' else null end
On Wed, Sep 16, 2009 at 8:43 PM, Steiner, Randy <Randy.Steiner_at_nyct.com> wrote:
I have a table with 1 - 2 million records. I need to add a column to indicate if the records is active or not. I would guess that only 1,000 of the records would not be active. Should I make one of the values null? So I could put a Y or Null? Or put Y or N?
Would a b-tree or bitmap index do any good?
Generally I would want to see all the records that are active
(co)Author: "Applied Mathematics for Database Professionals" www.RuleGen.com/pls/apex/f?p=14265:13