RE: Philosophical question on primary keys

From: Powell, Mark D <mark.powell_at_eds.com>
Date: Fri, 7 Aug 2009 13:55:37 -0400
Message-ID: <D1DC33E67722D54A93F05F702C99E2A90439B8CF_at_usahm208.amer.corp.eds.com>



The columns in this table appear to be static data values by their nature so hopefully this will not be a problem. It is also the only index on the table if I remember the initial post correctly.  

Based on the issue being reported I am not sure there is a good alternative, other than using both columns as part of the PK, since Richard Foote has written about the performance penalty of using a non-unique index to support the PK. Using both columns was not desired by the OP,

  • Mark D Powell -- Phone (313) 592-5148
	From: Tim Gorman [mailto:tim_at_evdbt.com] 
	Sent: Friday, August 07, 2009 12:59 PM
	To: Powell, Mark D
	Cc: oracle-l_at_freelists.org
	Subject: Re: Philosophical question on primary keys
	
	
	Jay and Mark,
	
	A recent "situation" involving massive performance problems
resulting from sessions contending for TX enqueues probably resulted from a decision similar to this.         

        An extremely active OLTP table (call it "order activity") had two unique indexes in addition to several other non-unique indexes:         

  • ID
  • ID and STATUS

        As you can tell, the second index on ID and STATUS was redundant, but it was probably created based on a similar chain of reasoning described here. The fatal flaw was that unique indexes become essentially "in doubt" when a transaction modifies them but is yet uncommitted. Other transactions wishing to modify that unique index to add a new unique data value cannot do so until the first transaction commits. Thus, unique indexes become a point of serialization.         

        Having the unique index on ID was necessary and manageable, since it was modified only on INSERT and DELETE operations, which were relatively well-controlled. The unique index on the ID and STATUS columns was very problematic however, because the application modified the STATUS column almost constantly. While this was causing serious performance problems all along, it was regarded as acceptable, or at least not intolerable. The application developers, wielding the unassailable argument of "its been like that for more than four years" did not want to discuss either dropping the ID/STATUS index or making it non-unique to reduce the contention and queueing. If it ain't broke, why fix it?         

        This was true (of course) until volume increased dramatically. The argument of "its worked for years and nothing has changed" always focuses on code and fails to notice changes in usage and volume. The straw that broke the camel's back was an "archiving" batch job which deleted orders and inserted them into an "archive" schema. This consisted of about 30 insert statements followed by 30 delete statements followed by a single commit. This batch job usually ran during an overnight window, but with the increase in data volumes it was now spilling over into prime time. The long-running uncommitted transaction for the archival job meant the two unique indexes were essentially "in doubt" for hours, and any modifications could not proceed until the thing committed. This was bad enough for the programs performing inserts of new orders, but for the apps which wanted to update status it was like hitting a brick wall.         

        Moral of the story: always question the use of unique indexes and always seek to minimize their quantity as well as their scope, if possible, because they are a source of serialization. With partitioned tables, unique indexes often lead to global indexes, which presents another set of impossibilities and difficulties on top of the original difficulties. I'm not saying that unique indexes are evil, but I've noticed them spew pea soup occasionally...         

        Hope this helps...         

	Tim Gorman
	consultant - Evergreen Database Technologies, Inc.
	P.O. Box 630791, Highlands Ranch CO  80163-0791
	website   = http://www.EvDBT.com/
	email     = Tim_at_EvDBT.com
	mobile    = +1-303-885-4526
	fax       = +1-303-484-3608
	Yahoo IM  = tim_evdbt


	Powell, Mark D wrote: 

		 Have you considered?
		
		Dropping the PK, creating a unique index on message_id,
folder_id, then
		adding the PK constraint.  Oracle should use the
existing index to
		enforce the PK.
		
		
		-- Mark D Powell --
		Phone (313) 592-5148
		
		
		-----Original Message-----
		From: oracle-l-bounce_at_freelists.org
		[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of
		Jay.Miller_at_tdameritrade.com
		Sent: Friday, August 07, 2009 12:09 PM
		To: oracle-l_at_freelists.org
		Subject: Philosophical question on primary keys
		
		We have a table with a primary key of message_id.  This
table is
		occasionally queried on that column alone but most often
on both the id
		and a folder_id column (also numeric) with no other data
being returned.                 

                There are no foreign keys pointing to this primary key.                 

                I'm trying to squeeze every last bit of performance out of one piece of

                SQL that accounts 92% of all database i/o.                 

                Using an index on message_id,folder_id reduces LIOs from 36 to 32 for a

                typical query as opposed to using the primary key index on message_id.                 

                The question is whether to create a new index or change the primary key

                to include both columns.                 

		Arguments against modifying primary key: 
		The primary key is just message_id, adding folder_id
doesn't make it any
		more unique.  Also folder_id currently only exists in
this table so if
		for some unforeseen reason we someday need to point a
foreign key to
		this table this might cause a problem.  No one thinks
this will ever be
		necessary but who knows what might happen in the future.
		
		Arguments for modifying primary key: 
		One less index on the table means less overhead for
inserts/updates.
		One less index is less storage used.  Also, I'm having
trouble getting
		the optimizer to use the second index in our test
environment without
		resorting to an index hint which I prefer to avoid.  
		
		
		
		Comments welcome.  Thanks!
		
		
		Here's the SQL in case anyone wants to take a look at
it:
		SELECT i.message_priority_cd, COUNT(*) AS count_label 
		FROM 	CLIENTMSG_ADMIN.message_instance i,
		CLIENTMSG_ADMIN.message_transmission m, 
			CLIENTMSG_ADMIN.secure_inbox_message
secure_inbox WHERE
		i.message_id = secure_inbox.message_id AND m.message_id
= i.message_id
		AND m.channel_cd = 'SECURE_INBOX' 
		AND i.account_nbr = :1
		AND (m.delivery_status_cd = 'PENDING' OR
m.delivery_status_cd
		='DELIVERED')
		AND m.message_read_ts IS NULL
		AND m.delete_ts IS NULL
		AND (i.expiration_ts > current_timestamp OR
i.expiration_ts IS NULL) AND
		secure_inbox.folder_id <> 3 AND secure_inbox.folder_id
<> 2 GROUP BY
		i.message_priority_cd; 
		
		--
		http://www.freelists.org/webpage/oracle-l
		
		
		--
		http://www.freelists.org/webpage/oracle-l
		
		
		
		
		  


--
http://www.freelists.org/webpage/oracle-l
Received on Fri Aug 07 2009 - 12:55:37 CDT

Original text of this message