Re: INSTEAD OF INSERT trigger and SELECT

From: Galen Boyer <galenboyer_at_hotpop.com>
Date: 16 Mar 2003 13:09:41 -0600
Message-ID: <ullzflztj.fsf@hotpop.com>

On Sun, 16 Mar 2003, gq437x_at_yahoo.de wrote:
> Galen Boyer <galenboyer_at_hotpop.com> writes:
>

>> On Fri, 14 Mar 2003, gq437x_at_yahoo.de wrote:
>>> 
>>> We insert into one table from many clients via mass inserts.
>>
>> How does the data get there?  Do the clients insert it?  Do you guys
>> get file feeds and load into a staging table?  (ie, what's the steps
>> to performing a mass insert?)

>
> Hello Galen,
>
> Thank you for asking. No, we are not importing files.
>
> The data is produced by the clients. It is a result of some
> calculation. The client inserts the rows at the end of each
> calculation cycle into the database using the OCI C-API using an
> executemany. The rows produced by one client can partly overlapp with
> rows produced by other clients on a similar, but slightly different
> calculation. Each client ensures it will send each row only once to
> the database.

Hm... Since the data source, ie client, isn't part of the key, the whole solution of how to handle duplicate rows should handle the fact that clients might send duplicate rows?

> The natural key data is so large, that we can't afford to include
> them into referring tables. So, rows in other tables referre to rows
> in this table through the artificial ID value for space reasons.
> Therefore the artificial primary key (ID) for each row is calculated
> from the natural key data on the client (kind of compression). This
> ID calculation is guaranteed to produce unique results.

This looks like it might be the culprit. I would bet that two different rows are "hashing" to the same key (especially since the error is a PK violation?). Why not just use a sequence id as the PK and then use a unique index on the natural key? Then, check for unique index violations when inserting? (ie, you rolled your own "hashed natural key")

> The 'select count(*)' part in my trigger is meant to tell me the
> difference between acceptable identical rows and invalid ID
> calculations on the client.
>
> We thought about preloading the existing rows from the database,
> but the table with trigger will be larger than the local client
> memory. And preloading rows will not prevent collisions of
> rows produced by different clients in parallel
> after the calculation has started. So, the duplicate detection has toe
> be done by the database. Thats where it belongs anyway, IMHO.

To isolate this, you could add to the existing trigger so, on insert, it inserts all inserted rows to a new unprotected table and you could do some analysis on that table? I bet you'll find duplicate PKs with different attributes on that row.

-- 
Galen deForest Boyer
Sweet dreams and flying machines in pieces on the ground.

Received on Sun Mar 16 2003 - 13:09:41 CST