Re: Inserting rows into master table from multiple threads

From: Robert Klemme <shortcutter_at_googlemail.com>
Date: Thu, 20 Dec 2007 22:06:34 +0100
Message-ID: <5t03qrF1abn1gU2@mid.individual.net>

On 20.12.2007 22:05, Robert Klemme wrote:

> On 20.12.2007 15:05, DA Morgan wrote:
>> sameergn_at_gmail.com wrote:
>>> On Dec 19, 9:30 pm, DA Morgan <damor..._at_psoug.org> wrote:
>>>> samee..._at_gmail.com wrote:
>>>>> On Dec 19, 8:45 am, DA Morgan <damor..._at_psoug.org> wrote:
>>>>>> samee..._at_gmail.com wrote:

>>>>>>> Hi,
>>>>>>> We have a master table, say with two columns name (varchar) and id
>>>>>>> (number) and a stored procedure insert_name_details(name, other
>>>>>>> details).
>>>>>>> SP will first check if name is present in master table or not. If
>>>>>>> present, it will get the id otherwise it will insert a row in master
>>>>>>> table with a new sequence number and use that id in other tables as
>>>>>>> foreign key. So its basically an atomic "get or create-if-not found"
>>>>>>> operation.
>>>>>>> There are two ways of handling concurrency here.
>>>>>>> Option 1
>>>>>>> -------------
>>>>>>> select id from master_table;
>>>>>>> if (not_found)
>>>>>>> {
>>>>>>> lock master_table;
>>>>>>> (again) select id from master_table;
>>>>>>> if (still_not_found)
>>>>>>> {
>>>>>>> insert record in master table;
>>>>>>> }
>>>>>>> }
>>>>>>> Option 2
>>>>>>> ------------
>>>>>>> select id from master_table;
>>>>>>> if (not_found)
>>>>>>> {
>>>>>>> try
>>>>>>> {
>>>>>>> insert record in master table;
>>>>>>> }
>>>>>>> catch (PrimaryKeyVIolationException)
>>>>>>> {
>>>>>>> select id from master_table;
>>>>>>> }
>>>>>>> }
>>>>>>> Both these options are for stored procedure implementation although
>>>>>>> code is written is Java style.
>>>>>>> Which method is preferred? Locking table or catching exceptions? I
>>>>>>> know that using exceptions should not be used to drive logical
>>>>>>> flow,
>>>>>>> but it makes life simpler for a programmer vs. locking table. Also
>>>>>>> locking tables means holding resources from other threads, but table
>>>>>>> will be locked for a brief amount of time i.e. till master row is
>>>>>>> inserted.
>>>>>>> Any comments, suggestions or alternatives?
>>>>>>> Thanks,
>>>>>>> Sameer

>>>>>> Neither of these is a good idea.
>>>>>> First of all this is Oracle not SQL Server so the data type is not
>>>>>> VARCHAR. A strong clue that you are not dealing with Oracle as 
>>>>>> someone
>>>>>> who understands the way it works.
>>>>>> Second what is the form of your primary key? If it is a surrogate key
>>>>>> it should be generated by a sequence object making a collision 
>>>>>> impossible.
>>>>>> Third, assuming a natural key and a possible collision which is more
>>>>>> likely? A duplicate or a non-duplicate? Assuming some competence in
>>>>>> system design a duplicate is highly unlikely so you should just to 
>>>>>> the
>>>>>> insert and trap the rare exception.
>>>>>> Again, as at OTN where you posted the exact same question, it appears
>>>>>> you think Oracle is SQL Server ... it is not.
>>>>>> -- 
>>>>>> Daniel A. Morgan
>>>>>> Oracle Ace Director & Instructor
>>>>>> University of Washington
>>>>>> damor..._at_x.washington.edu (replace x with u to respond)
>>>>>> Puget Sound Oracle Users Groupwww.psoug.org
>>>>> Daniel,
>>>>> This code is for Oracle 10g database. I meant varchar2, but wrote
>>>>> varchar just to keep it generic, pseudo-code like, to indicate a
>>>>> character columns. Anyway, that was just an example and not real
>>>>> columns anyway.
>>>>> I have seen "select ... for update" construct when you want to
>>>>> atomically get and update a row, but was not sure what is the
>>>>> recommended approach for inserting a new master row in a multi-
>>>>> threaded environment.
>>>>> The key will be generated by a sequence object.
>>>>> Thanks,
>>>>> Sameer
>>>> SELECT FOR UPDATE makes perfect sense if you are going to update but
>>>> that appears to not be what you are doing. If you are inserting then
>>>> just perform the insert like this:
>>>>
>>>> BEGIN
>>>>    INSERT INTO
>>>>    (....,....)
>>>>    VALUES
>>>>    (...,...);
>>>> EXCEPTION
>>>>    WHEN DUP_VAL_ON_INDEX THEN
>>>>      ... do something else ...
>>>> END;
>>>>
>>>> This is, again, assuming a good design and collisions are rare.
>>>>
>>>> What you did not address is why you think a collision is possible.
>>>> In most cases that possibility is a red flag to it being a bad design.
>>>> -- 
>>>> Daniel A. Morgan
>>>> Oracle Ace Director & Instructor
>>>> University of Washington
>>>> damor..._at_x.washington.edu (replace x with u to respond)
>>>> Puget Sound Oracle Users Groupwww.psoug.org
>>>
>>> We are trying to log HTTP requests from tomcat to a database table.
>>> This would include URL, referrer, user agent (browser), HTTP method
>>> (GET/POST) etc. It would require lot of space if we log all the
>>> character data as it is, so we decided to normalize it, but all
>>> possible values of URL, referrer, UA are not known in advance. So a
>>> master table for them will be built on the fly as and when new data
>>> arrives. It may then be possible for two threads to create an entry in
>>> master table at the same time.
>>>
>>> With this approach, contention would arise only during initial phase,
>>> when master table is being built. After some time most of the values
>>> would go into master table and their ID will always be found while
>>> inserting access log record.
>>>
>>> ( I am aware of the AccessLogValve mechanism provided by Tomcat but we
>>> are logging some application specific data in each row and making
>>> these inserts using a separate thread to minimize impact on service
>>> processing thread)
>>
>> It seems you are indicating collisions will be rare or perhaps never
>> happen so just use an optimistic methodology and trap for the rare
>> occasion when an issue arises as I have suggested.
> 
> I believe the situation is different: while there might be few 
> collisions for URLs there are likely only few collisions for HTTP 
> methods (this table could even be prefilled as the data set is known 
> beforehand) and more but still fewer collisions for browser.
> 
> Having said that it seems different strategies for different columns are 
> in order.
> 
> URL: option 3: direct insert without select, catch unique violation 
> error and return selected id
> 
> Browser: allow multiple entries, option 1 but without the locking.
> 
> HTTP method: prefill, for the rest use the same approach as for browser
> 
> etc.

PS: A nice example for the importance of knowing the data.

robert Received on Thu Dec 20 2007 - 15:06:34 CST