Re: Tables Larger When using DBMS_REDEFINITION

Date: Tue, 24 Feb 2015 01:47:20 -0500
Message-ID: <>

A valid point. But the existence of a primary key in the absence of an (enforced) constraint still leads to the problem that Mladen mentions -- errors in the ETL could lead to duplicated data, and those errors could go undetected, most likely forever.

That said, unenforced PKs can make sense, for example, if you have something like a materialized view which you refresh only by complete refreshes from a source that *does* have an enforced PK, you can still have a high degree of confidence in the integrity of the data.
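To illustrate, here is a minimal Oracle sketch of a declared-but-unenforced PK; the table and column names (sales_mv, sale_id) are hypothetical. RELY DISABLE NOVALIDATE declares the key without building a unique index or checking existing rows, while still letting the optimizer (and query rewrite) trust it:

```sql
-- Hypothetical MV-style table refreshed completely from a source
-- that *does* enforce its PK; names here are illustrative only.
ALTER TABLE sales_mv
  ADD CONSTRAINT sales_mv_pk
  PRIMARY KEY (sale_id)
  RELY DISABLE NOVALIDATE;
-- DISABLE:    no unique index is built, no enforcement on DML
-- NOVALIDATE: existing rows are not checked at declaration time
-- RELY:       the optimizer may still trust the key for rewrites
```

The obvious caveat is that RELY shifts the burden of correctness onto you: if the data ever does contain duplicates, query rewrite can silently return wrong results.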

Data Warehouses, though, are kind of odd cats in this sense, aren't they? Mainly by chance, I don't happen to work with them a lot (and have never had to design one from the ground up), but I understand that it is not an uncommon practice to denormalize data. (This is why DWs get so big, right?)

When the data is denormalized, it's not going to be in 3NF, or even 2NF. I would like to hope that if you do it right, you can still keep it in 1NF (every table has a primary key), but if you go mad with the denormalization (and maybe drop some "unneeded" columns along the way), I could see the data becoming so blurred that the only possible PK would be a synthetic one.

Of course, therein probably lies the trick -- denormalizing the data, but preserving all of the PKs from all of the datasources, such that your denormalized table still has a (natural) PK. So long as you do this, the data is valid (and you can -- with some work -- TEST its validity), and the decision of whether or not to declare/implement a PK constraint becomes (mostly) a design choice. Even so, if you choose not to declare it, then the database cannot enforce it, so you have no way of being certain you have introduced no errors (without scanning the table, at least).
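As a sketch of that validity test: assuming (hypothetically) that the preserved source keys are customer_id, product_id, branch_id, and sale_date, a full-scan duplicate check looks like this:

```sql
-- Any rows returned violate the intended natural key.
-- Table and column names are illustrative.
SELECT customer_id, product_id, branch_id, sale_date,
       COUNT(*) AS dup_count
  FROM sales_fact
 GROUP BY customer_id, product_id, branch_id, sale_date
HAVING COUNT(*) > 1;
```

It works, but it is exactly the "scanning the table" cost mentioned above, which a declared, enforced constraint would have paid incrementally at load time instead.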

I think I would like to live in a world where all tables at least have a PK (although it may be okay if sometimes it is undeclared/unenforced). Of course, that might not apply in the world of "big data", but that's another sort of animal entirely.

In the example discussed in this thread, it sounds like somebody decided to throw away some of the columns that should constitute the primary key following denormalization of the data. If that's the case, it may someday prove to be a mistake. Time will (usually) tell.

On Mon, Feb 23, 2015 at 10:55 PM, Tim Gorman <> wrote:

> Joining this thread late, but vocabulary is important. Don't confuse
> "primary keys" with "primary key constraints". The former is a combination
> of column values that uniquely identifies a row; the latter is a mechanism
> used to enforce the former.
> Just because a table doesn't employ a primary key constraint doesn't mean
> that it doesn't have a primary key, or that the key is not enforced.
> On 2/23/15 18:26, Mladen Gogala (Redacted sender for
> DMARC) wrote:
> On 02/23/2015 12:35 PM, wrote:
> Not really the subject of the thread… but it is quite common practise to
> have no primary key on a datawarehouse fact table. Sometimes it’s logically
> not possible to uniquely identify a row eg same customer buys same product
> from same branch on same day – business is not interested in the time of
> day, or the sales order number, but they are interested in the number of
> individual sales. Unfortunately the elegance of Primary Keys and Relational
> Database Design has been scuppered by those darned realities of life.
> You could include the time, but nobody’s interested. You could include a
> generated surrogate key and primary key that, but why would you invent work
> when there’s enough already. If you need to get close to a row to maintain
> or delete it then you use the foreign keys (Bitmap Indexed) from the
> dimensions.
> So however crazy a design offence this may be, we live without a Primary
> Key on the Fact and it hasn't caused us a moment's problem in 5 years of
> live running.
> There is only one question: your database has been running for quite some
> time. How do you know it's logically correct? How do you know that your ETL
> procedure hasn't missed a beat here and there, or that it hasn't loaded the
> same record twice? The simple truth is that this is very hard to verify:
> you would have to check the ETL logs and compare them against the primary
> source after each and every load. The second question is whether your
> primary tables have a primary key. Hopefully, they do.
> My second disagreement is that this has nothing to do with the initial
> thread. The whole problem arose because the table being redefined by
> DBMS_REDEFINITION does not have a primary key.
> Last, if your business is only interested in the number of daily
> purchases, then why give them all rows in de-normalized form? Loading
> just the requested data, with the GROUP BY done on the primary side,
> looks like a much more economical solution than loading all rows with
> no means of distinguishing among them. This is really a philosophical
> question,
> something for the late Lex de Haan. Relational databases are based on the
> set theory. Set theory is based on the notion that it is possible to tell
> the elements of a set apart. Without the possibility of telling them apart,
> it is not possible to answer the question of how many elements there are
> in the set.
> The fact that we call sets "tables" is just a convenience. This is what
> happens when you have a big table, full of rows that are impossible to
> tell apart.
> I have seen such "designs" many times. If it is not possible to have a
> clear uniquely identifying property of each row, maybe the table itself is
> not worth creating? Such questions do save storage space, you know.
> --
> Mladen Gogala
> Oracle DBA

Received on Tue Feb 24 2015 - 07:47:20 CET
