Re: Looking for ideas on blocked sessions updating partitioned table with CLOB (solved)

From: Tim Gorman <tim.evdbt_at_gmail.com>
Date: Tue, 18 Sep 2018 08:41:21 -0600
Message-ID: <65fa3c6a-8021-0856-3f45-8df869eaec50_at_gmail.com>



Chris,

You have a gift for telling the story.  This could make an amazing newsletter article or a fantastic presentation that you'll enjoy creating and presenting.  The fun part of such an article or presentation would be explaining the underlying technology, so that the reader/audience enjoys the punchline as much as we have.

For an article in a newsletter, please consider RMOUG SQL>Update

<http://www.rmoug.org/newsletter/>, the NoCOUG Journal 
<http://nocoug.org/newsletter.html>, IOUG Select 
<http://select.ioug.org/>, or UKOUG Oracle Scene 
<http://www.ukoug.org/membership-new/member-activities/oracle-scene/>?  
There might be others I have overlooked, for which I apologize.

Hope this helps!

-Tim

On 9/18/18 08:22, Chris Taylor wrote:
> So, I need to own up to something.
>
> My whole approach to this problem was wrong.  I had been told that the
> problem with this table was the CLOB and the number of sessions
> updating the table at the same time.  So, my whole approach was based
> on that information being TRUE.
>
> I was blinded to the 'real' problem and wasn't actively looking for an
> alternative answer until I posted this thread.  Due to the discussion
> here, I took a few steps back and actually asked myself "why would
> this be row locks/blocks?" and WHY don't I see internal locking on the
> clob if the clob is the problem?
>
> At that point I was in 'question everything' mode and started really
> looking at the Event that the blocking session was sitting in (instead
> of just assuming it was a problem with design).
>
> The blocking sessions were ALWAYS sitting in EVENT: SQL*Net MORE DATA
> FROM CLIENT and WAIT_CLASS: NETWORK and the wait times were reaching
> upward of 2 minutes before the wait_time_micro restarted counting.
>
> When I saw that (and really looked at it), I was like that makes no
> sense. Why would a session be getting latency from the PHP server?
> (Why is it taking so long to send an object from the PHP server to the
> database?)
>
> Running strace on the sessions on both the db server and the php
> server showed the sessions just sitting in either read() or write() to
> the network socket and never completing.
>
> Large file transfers INTO the main db server would stall.  (Transfers
> OUT of the db server were fine) - this affected SQLNet, SCP, ncat
> etc.  Outgoing was fine, Inbound was terrible.
>
> Ultimately we discovered these 2 settings had been DISABLED on this
> box at some point (as its been around a while):
>
> /proc/sys/net/ipv4/tcp_sack
>
> /proc/sys/net/ipv4/tcp_timestamps
>
>
> Both of those were "0" whereas all our other db servers those were "1"
> and weren't experiencing any problem.
>
> After setting both of those to "1" , our sessions now complete and no
> longer block.
>
> I know some of you may be thinking "How could he miss all the network
> waits?"  - And that's a good question.  All I can say is sometimes you
> get "blinded" by assumptions and aren't really seeing the problem for
> what it is.
>
> I've known this a long, long time.  So, remember kiddos you may have
> to question 'what you know versus what you think you know' ;)
>
> Also, the very fact of sharing this problem with the you on the list
> gave me the ability to step back and look at it with a new mindset.
>
> Anyway, I wanted to share that as a testament (and a huge thank you)
> to the community we have here.
>
> Thanks,
> Chris
>
>
>
>
>
>
>
>
>
>
>
>
> On Fri, Aug 31, 2018 at 11:48 AM Jonathan Lewis
> <jonathan_at_jlcomp.demon.co.uk <mailto:jonathan_at_jlcomp.demon.co.uk>> wrote:
>
>
> If the event is "enq TX: row lock wait" the p1/p2/p3 values will
> be about the TX lock in v$lock.  p2 = id1, p3 = id2, and p1 will
> encode "TX" and the lock mode.
> Select p1raw (if its available) or to_char(p1,'XXXXXXXXXXXXXXXX')
> and you'll get 00000000054580006 - or possibly a 4 on the end. If
> it's a mode 4 lock then that's a clue that it's not really about a
> table row and more likely to be about an index/IOT or some
> internal anomaly.
>
> If you've got the ASH data then current_obj# may give you the
> object_id of the object being accessed, but it's not entirely
> reliable.
>
> Regards
> Jonathan Lewis
>
> ________________________________________
> From: Stefan Knecht <knecht.stefan_at_gmail.com
> <mailto:knecht.stefan_at_gmail.com>>
> Sent: 31 August 2018 16:04:20
> To: Jonathan Lewis
> Cc: christopherdtaylor1994_at_gmail.com
> <mailto:christopherdtaylor1994_at_gmail.com>; ORACLE-L
> Subject: Re: Looking for ideas on blocked sessions updating
> partitioned table with CLOB
>
> You can also look at the p1/p2/p3 values of the sessions that are
> blocked by the TX contention - and see which object the contention
> is on. If it's the lobindex, that may lead to further clues about
> some of the possibilities raised by Jonathan
>
> Stefan
>
>
>
> On Fri, Aug 31, 2018 at 8:55 PM, Jonathan Lewis
> <jonathan_at_jlcomp.demon.co.uk
> <mailto:jonathan_at_jlcomp.demon.co.uk><mailto:jonathan_at_jlcomp.demon.co.uk
> <mailto:jonathan_at_jlcomp.demon.co.uk>>> wrote:
>
> There is a major problem with basicfile LOBs (with similar, but
> not so drastic symptoms appearing for securefile LOBs), so as a
> first strategic step you should probably be planning to find a
> painless way to move to securefile LOBs.
>
> I've written an entire system about a problem with basicfile LOBs
> when you do inserts and deletes - but the problem is the same if
> you do lots of updates because for a LOB an update is a delete
> followed by an insert (with the old LOB left in the LOB segment
> rather than being copied to the UNDO segment).  The way LOBs
> handle deletes is that the LOB index is a two-part index of which
> the first part indexes in time order the chunks of LOBs that have
> been deleted and the second part indexes by lobid the current LOB
> chunks.
>
> A problem appears when you try to insert a LOB and there isn't any
> free space in the segment, but there is plenty of freeable space.
> Your session will free up all the freeable space (deleting LOB
> index entries) as it goes.  This can take a very long time and
> while it's happening any other session that want to insert a LOB
> will wait for your session to finish freeing up the space.
>
> Unfortunately I've only see HW enqueues appearing as a result of
> this action, I don't think I've seen TX enqueues (and I wouldn't
> really expect to see the "enq TX - row lock contention" enqueue). 
> THe series start at this URL:
> http://jonathanlewis.wordpress.com/2017/01/26/basicfile-lobs/
>
> Different possibility - is the LOB declared with multiple
> freepools ? This is the default and it means the LOBindex has only
> the two parts I describe above.  If you have N freepools then the
> index consistents of N pairs of parts. This doesn't avoid the
> problem I've described above (your session will try to free ALL
> the free space from ALL the pools if it needs space), but having a
> single freepool may explain your observations.  If I insert a very
> large LOB PERHAPS you can't insert one until I finish insert
> LOBIndex entries into my bit of index, in which case maybe you'd
> show a row wait of some sort - though, again I'd expect a
> different wait to appear (maybe buffer busy).
>
> Another thought - your LOBs enable in-row storage:  how many of
> them are short enough to fit in the row (a few, lots, most) and do
> they get updated many times in situ, and are many of them likely
> to start short and then grow  a few times before getting too long
> ? How long are the rest of the columns in the row ? I'm thinking
> about the possibility that your smaller LOBs spend some time
> growing in-row and causing row migration before they get long
> enough to move into the LOB segment - and row migraion does funny
> things with ITLs. Maybe something odd happens as a row that's
> migrated moves its LOB to the LOB segment and the row migrates
> back to it's original block.
>
> ---
>
> In the absence of answers - a plan for the future might be to
> range partition by sess_id and hash subpartiiton by sess_id:  I'm
> assuming that sess_id is a value that's going to increase with
> time when I say that.  If that is the case then the benefit of
> doing this composite partitioning is that on a regular basis you
> get a new partition and all the action moves into it and any
> garbage that's happened in the earlier partition can be cleared up
> (or dropped).
>
> If you stick with basicfiles then look at freepools just in case
> it's relevant - securefile lobs automatically have a better
> strategy for concurrency but there is a parameter you can set of
> increase concurrency. (Can't remember which it is at present).
>
> Since I've mentioned the series - one feature of the mechanism is
> that the LOB segment can become much larger than it needs to be,
> so if you can sum() the sizes of the lobs that exceed the
> in-linesize and it's much smaller than segment size you'll know
> that your update mechanism is introducing some sort of problem.
>
> Regards
> Jonathan Lewis
>
>
>
>
>
>
>
>
>
>
>
> Regards
> Jonathan Lewis
>
>
>
>
> ________________________________________
> From: oracle-l-bounce_at_freelists.org
> <mailto:oracle-l-bounce_at_freelists.org><mailto:oracle-l-bounce_at_freelists.org
> <mailto:oracle-l-bounce_at_freelists.org>>
> <oracle-l-bounce_at_freelists.org
> <mailto:oracle-l-bounce_at_freelists.org><mailto:oracle-l-bounce_at_freelists.org
> <mailto:oracle-l-bounce_at_freelists.org>>> on behalf of Chris Taylor
> <christopherdtaylor1994_at_gmail.com
> <mailto:christopherdtaylor1994_at_gmail.com><mailto:christopherdtaylor1994_at_gmail.com
> <mailto:christopherdtaylor1994_at_gmail.com>>>
> Sent: 31 August 2018 14:23:46
> To: knecht.stefan_at_gmail.com
> <mailto:knecht.stefan_at_gmail.com><mailto:knecht.stefan_at_gmail.com
> <mailto:knecht.stefan_at_gmail.com>>
> Cc: ORACLE-L
> Subject: Re: Looking for ideas on blocked sessions updating
> partitioned table with CLOB
>
> Ah, very good questions.
>
> For the blocking issue:
> 1. Blocking Type = enq: TX - row lock contention, yet the SESS_IDs
> (PK) is different.
> 2. Basicfile LOBs (I assume these were carried over from an
> upgrade from a prev version)
>
> For the PHP object transfer being slow issue:
> 1. TNS settings:
>   db server:
>      sqlnet.ora
>      ---------------------------------
>      send_buf_size=2097152
>      recv_buf_size=2097152
>      default_sdu_size=32768
>      default_tdu_size=32768
>
>      tnsnames.ora
>      -----------------------------
>      Nothing on the tnsnames.ora
>
>    web/app server:
>       tnsnames.ora
>       -------------------------------
>       Nothing specified here either for SDU etc
>
> 2. OS = Red hat Linux 6.8 64-bit , kernel 2.6.32-642.el6.x86_64
>
> 3. Block Size = 8192
>
> Thanks,
> Chris
>
>
> On Thu, Aug 30, 2018 at 10:02 PM Stefan Knecht
> <knecht.stefan_at_gmail.com
> <mailto:knecht.stefan_at_gmail.com><mailto:knecht.stefan_at_gmail.com
> <mailto:knecht.stefan_at_gmail.com>><mailto:knecht.stefan_at_gmail.com
> <mailto:knecht.stefan_at_gmail.com><mailto:knecht.stefan_at_gmail.com
> <mailto:knecht.stefan_at_gmail.com>>>> wrote:
> When you say "blocking" - what event are the blocked sessions
> waiting on?
>
> Also, what's your TNS config - particularly SDU sizes between the
> mid-tier and the database (TNS connection string and the receiving
> listener)?
>
> Are you using securefile or basicfile LOBs?
>
> What OS is the database on and what block size are you using in
> the tablespace where the lobs are stored?
>
> On Fri, Aug 31, 2018 at 6:21 AM, Chris Taylor
> <christopherdtaylor1994_at_gmail.com
> <mailto:christopherdtaylor1994_at_gmail.com><mailto:christopherdtaylor1994_at_gmail.com
> <mailto:christopherdtaylor1994_at_gmail.com>><mailto:christopherdtaylor1994_at_gmail.com
> <mailto:christopherdtaylor1994_at_gmail.com><mailto:christopherdtaylor1994_at_gmail.com
> <mailto:christopherdtaylor1994_at_gmail.com>>>> wrote:
> Env: 12.1.0.2
>
> We have a table that stores session data (base64 encoded) from web
> sessions.
> The table has 32 HASH partitions and contains a CLOB.
>
> The table is hash partitioned by SESSION_ID which is a 64-char string.
>
> We have this code that executed in PHP into the Oracle database:
> UPDATE sessions_table
> SET session_data = :session_data  // (CLOB)
> WHERE session_id = :session_id;
>
> For 95% of the sessions this runs very,very fast.
>
> For 5% of the sessions that LOB being passed from PHP is 15MB and
> for some reason PHP is really slow about transmitting those
> session OBJECTS over to the database. (Verified through strace etc).
>
> While those sessions are waiting to complete the update to the
> CLOB, they end up BLOCKING other sessions that aren't trying to
> update the same session id but are in the same partition.
>
> I believe they're blocked on the CLOB on the same partition that
> the long running session has open as I've confirmed that the
> SESS_ID being updated is different in most cases.
>
> So, here's my question, what performance strategy should I be
> investigating for CLOBS.  Should I just add more partitions and
> spread out the likelihood that a session will end up in the same
> partition?  Seems logical but I'm not sure how CLOBS play into this.
>
> The CLOBS are stored like this:
> Segment Name = SYS_LOB0051640773C00002$$
> Index Name = SYS_LOB0051640773C00002$$
> Chunk = 8192
> PCTVERSION = 10
> CACHE = YES
> LOGGING = None
> Encrypt = None
> Compression = None
> IN_ROW = Yes
> Partition = Yes
> Retention = Yes
>
> Looking for any ideas on how to prevent a session from blocking
> other non-related sessions that are updating the CLOB.
>
> I'm trying to mitigate the blocking at the db layer while we stand
> up a product such as Redis to handle the session caching at the
> server level (and remove it from the db).
>
> Chris
>
>
>
>
>
>
>
> --
> //
> zztat - The Next-Gen Oracle Performance Monitoring and Reaction
> Framework!
> Visit us at zztat.net
> <http://zztat.net><http://zztat.net><http://zztat.net/> |
> _at_zztat_oracle | fb.me/zztat
> <http://fb.me/zztat><http://fb.me/zztat><http://fb.me/zztat> |
> zztat.net/blog/
> <http://zztat.net/blog/><http://zztat.net/blog/><http://zztat.net/blog/>
>
>
>
> --
> //
> zztat - The Next-Gen Oracle Performance Monitoring and Reaction
> Framework!
> Visit us at zztat.net <http://zztat.net><http://zztat.net/> |
> _at_zztat_oracle | fb.me/zztat
> <http://fb.me/zztat><http://fb.me/zztat> | zztat.net/blog/
> <http://zztat.net/blog/><http://zztat.net/blog/>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Sep 18 2018 - 16:41:21 CEST

Original text of this message