RE: Full table scan -- uniform extent allocation - extent allocation map issue?

From: Clay Jackson (cjackson) <"Clay>
Date: Tue, 26 Nov 2019 17:19:05 +0000
Message-ID: <MWHPR19MB0141D61A9E649B92A269C4909B450_at_MWHPR19MB0141.namprd19.prod.outlook.com>





Some random thoughts –

As other folks have already pointed out, there are several things that seem to make this a bit of an “edge case” - Uneven segment sizes and large volume of allocation and deallocation. At a recent User Group, one of the Oracle PMs offered the thought that one of Oracle’s greatest challenges these days was “making then best of really bad situations”. I suspect that’s true in this case.

Consider – doing a full table scan is already a “bad situation”; except in OTHER edge cases, like the classic case of a very small, frequently accessed table that can be kept entirely in memory. As the original writer suggests, the “name of the game” is minimizing “work”. So, given that, I think Oracle has “done the right thing” by simply reading blocks in allocation order (close to a sequential scan). Doing anything else, like trying to read in “L1” order, is ALWAYs going to imply EXTRA work.

So, IMHO, Oracle’s got it right here.

Clay Jackson
Database Solutions Sales Engineer
clay.jackson_at_quest.com<mailto:clay.jackson_at_quest.com> office 949-754-1203 mobile 425-802-9603 [cid:image001.png_at_01D5A43A.8C19FEF0]

From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> On Behalf Of Vishnu Potukanuma Sent: Tuesday, November 26, 2019 8:30 AM To: jonathan_at_jlcomp.demon.co.uk
Cc: Oracle L <oracle-l_at_freelists.org> Subject: Re: Full table scan -- uniform extent allocation - extent allocation map issue?

CAUTION: This email originated from outside of the organization. Do not follow guidance, click links, or open attachments unless you recognize the sender and know the content is safe.

Hi Jonathan,

The trace file indicate for autoallocate - yes.. i must have pasted the old one when i was testing auto allocate (in auto allocate this issue diminishes as the extent sizes become large)

Oracle allocates extents to a segment at any allocation in the data file (uniform size) when the demand for space arises which is logical thing but as oracle reads extents in the order they are allocated during the full table scans rather than where they are stored, the IO becomes random rather than sequential given time when extent sizes are low and the workload mainly contains frequent segment allocation and deallocation and that if other sessions doing index range scans saturate IOPS at the storage layer especially HDDs as these are random in nature, reduces the overall throughput of the full table scan...

I mean when the full table scans are slow, usually we end up looking at various things, but could be a factor in very rare cases (OLTP) and could be common in warehouses where segments are frequently dropped or created, and this random extent allocation can grow right under our nose and we never detect it. There are various other scenarios, edge cases i am looking at, but i am really interested as to why a session blindly reads the extents in order they are allocated rather than in the order they are stored based on L1DBA.

Please give me sometime (have to get dinner before its too late), i will send the complete test case with which this thing can be replicated.

Thanks,
Vishnu

On Tue, Nov 26, 2019 at 8:58 PM Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk<mailto:jonathan_at_jlcomp.demon.co.uk>> wrote: insert into table10 select * From table1; drop table1;

The extent information you've supplied shows us that you've created the final table in a tablespace using autoallocate - the early extents are 64KB, the next 1MB, the last few you've shown are 8MB.

If you want to supply a test case that has a chance of reproducing then you need to tell us WHAT size uniform size, preferably giving us a script that allows us change a couple of names (tablespace / file) and create the tablespace.

  • create 10 tables with 30 mb in size.

Needs one sample Create table script - did you create it with "initial 30M" (obviously not); did you create it with "create as select ...", did you do anything odd with pctfree; how did you populate it. Do you need the tables to allocate 30MB, or should the highwater mark be pushed to 30MB as well.

  • insert into table10 select * From table1;

That's probably going to work - because we'd probably assume you wanted all 10 tables to look alike.

  • drop table1;

That should work, but do you have your recyclebin enabled - that may affect where the next insert puts the data (especially if some doesn't follow your instruction that the tablespace should be 350MB).

  • similarly perform the same randomly

If my results don't match your results, might that be because your results are highlighting a flaw in your system, or because I didn't choose the same random order as you did ? Don't supply a "random" test if you want to demonstrate a reproducible anomaly. Tell us the exact order you used to insert and drop.

  • interestingly the fix is to read the extents one after the other based on the L1 DBA rather than the order in which they are allocated to the segment

Are you saying you think this is what Oracle does, or what you think Oracle should do ?

Regards
Jonathan Lewis



From: oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org> <oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>> on behalf of Vishnu Potukanuma <vishnupotukanuma_at_gmail.com<mailto:vishnupotukanuma_at_gmail.com>> Sent: 26 November 2019 14:12
To: Oracle L
Subject: Full table scan -- uniform extent allocation - extent allocation map issue?

Hi,

I have observed an interesting behaviour today, while working on other aspects and my hunch appears to be true ,,, I will start by saying that there are too many variables and scenarios where things can be optimal but i am talking about the worst case scenario where the segments are frequently created and dropped in a tablespace.

This works by exploiting certain aspects of the database such as how oracle maintains extent map in the segment header and how the extent allocation is done or how the full table scan reads the segment. Inherently in this case there is nothing we can do as there are too many variables. The impact may be small in terms of the overall response times but this typically reduces the lifespan of the disks as the disk spindles ends up moving from higher to lower tracks and vice versa over and over again... instead of just in a simple arc from higher tracks to lower tracks. in worst case this could add delay when the segment is very large and if the extent sizes are small (this probability of this occuring is very less but there is still a chance).

when it comes to oracle while performing full table scans, the process starts by reading the extent map and starts reading extents one by one until the last extent or high water mark is reached... but what if the extents allocated to the segment are stored in random areas just like clustering_factor? interestingly the fix is to read the extents one after the other based on the L1 DBA rather than the order in which they are allocated to the segment, in most cases, oracle does a pretty good job allocating extents close to each other to the segment, but things can get messy.

 this is the simplest way to explain and this goes gradually from here and we can make however we like it. create a tablespace with 350mb in size and uniform extent allocation, create 10 tables with 30 mb in size.
insert into table10 select * From table1; drop table1;
similarly perform the same randomly so that the final end result is only one table. here we are just simulating a case where the subsequent extents allocated to the segment are on different regions randomly in a datafile, thus their locations are on the disk as well (which could be or could not be based on several factors such as how fragmented the disks are etc, i wont go in that space as it adds too much complexity).. here we perform a full table scan, oracle reads the segment header to read the extent map and reads the extents one by one...

Notice the block numbers in the traces.

WAIT #139711646963136: nam='db file sequential read' ela= 612 file#=5 block#=71810 blocks=1 obj#=75252 tim=248445129880
WAIT #139711646963136: nam='db file scattered read' ela= 763 file#=5 block#=71811 blocks=5 obj#=75252 tim=248445130757
WAIT #139711646963136: nam='db file scattered read' ela= 771 file#=5 block#=71816 blocks=8 obj#=75252 tim=248445131732
....
....
WAIT #139711646963136: nam='db file scattered read' ela= 1941 file#=5 block#=44736 blocks=64 obj#=75252 tim=248445270874
WAIT #139711646963136: nam='db file scattered read' ela= 1788 file#=5 block#=44802 blocks=62 obj#=75252 tim=248445273696
WAIT #139711646963136: nam='db file scattered read' ela= 1902 file#=5 block#=44864 blocks=64 obj#=75252 tim=248445276573
WAIT #139711646963136: nam='db file scattered read' ela= 1756 file#=5 block#=44930 blocks=62 obj#=75252 tim=248445279335
....
WAIT #139711646963136: nam='db file scattered read' ela= 3078 file#=5 block#=132 blocks=128 obj#=75252 tim=248445887737
WAIT #139711646963136: nam='db file scattered read' ela= 3045 file#=5 block#=260 blocks=124 obj#=75252 tim=248445893065
WAIT #139711646963136: nam='db file scattered read' ela= 3399 file#=5 block#=384 blocks=128 obj#=75252 tim=248445898641
....
...
WAIT #139711646963136: nam='db file scattered read' ela= 12269 file#=5 block#=97540 blocks=124 obj#=75252 tim=248446182677
WAIT #139711646963136: nam='db file scattered read' ela= 17440 file#=5 block#=97664 blocks=128 obj#=75252 tim=248446202405
WAIT #139711646963136: nam='db file scattered read' ela= 3282 file#=5 block#=97792 blocks=128 obj#=75252 tim=248446208027
...
...
WAIT #139711646963136: nam='db file scattered read' ela= 3279 file#=5 block#=61058 blocks=128 obj#=75252 tim=248446281696
WAIT #139711646963136: nam='db file scattered read' ela= 3115 file#=5 block#=61186 blocks=126 obj#=75252 tim=248446287141
WAIT #139711646963136: nam='db file scattered read' ela= 2990 file#=5 block#=61312 blocks=128 obj#=75252 tim=248446292359
WAIT #139711646963136: nam='db file scattered read' ela= 3109 file#=5 block#=61440 blocks=128 obj#=75252 tim=248446297711
....
...
WAIT #139711646963136: nam='db file scattered read' ela= 3125 file#=5 block#=7812 blocks=128 obj#=75252 tim=248446303109
WAIT #139711646963136: nam='db file scattered read' ela= 3246 file#=5 block#=7940 blocks=124 obj#=75252 tim=248446309728
WAIT #139711646963136: nam='db file scattered read' ela= 3180 file#=5 block#=8064 blocks=128 obj#=75252 tim=248446315169

the L1 DBA extent map:
  Auxillary Map


   Extent 0     :  L1 dba:  0x01411880 Data dba:  0x01411883
   Extent 1     :  L1 dba:  0x01411880 Data dba:  0x01411888
   ....
   Extent 35    :  L1 dba:  0x0140ae80 Data dba:  0x0140ae82
   Extent 36    :  L1 dba:  0x0140af00 Data dba:  0x0140af02
   Extent 37    :  L1 dba:  0x0140af80 Data dba:  0x0140af82
   ...
   Extent 38    :  L1 dba:  0x0140b000 Data dba:  0x0140b002
   Extent 39    :  L1 dba:  0x0140b080 Data dba:  0x0140b082
   Extent 40    :  L1 dba:  0x0140b100 Data dba:  0x0140b102
   Extent 41    :  L1 dba:  0x0140b180 Data dba:  0x0140b182
   Extent 42    :  L1 dba:  0x0140b200 Data dba:  0x0140b202
   Extent 43    :  L1 dba:  0x0140b280 Data dba:  0x0140b282
   ...
   ...
   Extent 92 : L1 dba: 0x01417c80 Data dba: 0x01417c84    Extent 93 : L1 dba: 0x01418080 Data dba: 0x01418084    ...
   Extent 94    :  L1 dba:  0x0140ee80 Data dba:  0x0140ee82
   Extent 95    :  L1 dba:  0x01401e80 Data dba:  0x01401e84
   Extent 96    :  L1 dba:  0x01402280 Data dba:  0x01402284
   Extent 97    :  L1 dba:  0x01402680 Data dba:  0x01402684
   Extent 98    :  L1 dba:  0x01402a80 Data dba:  0x01402a84

When it comes to index range scan things are as expected, as the rowids are sorted it reads all the blocks in one arc (i am not talking about prefetching, or batched (db file parallel read which exploits IOPS at a storage layer but a very simplistic case with all prefetching etc turned off) but for full table scans things appear to be different as it simply following the extent map and not in an order of their L1 DBA,

maybe this can be explained by something, but I am still not be able to figure out any other valid reason as to why we should not perform a full table scan based on the storage order extent map with the exception of last extent. Please let me know if I have missed anything?

Thanks,
Vishnu
--
http://www.freelists.org/webpage/oracle-l<https://nam05.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.freelists.org%2Fwebpage%2Foracle-l&data=02%7C01%7Cclay.jackson%40quest.com%7C5f05abd5c0894eb59b0108d7728e1318%7C91c369b51c9e439c989c1867ec606603%7C0%7C0%7C637103826849761206&sdata=gFpqKs1cKbH7KPEQPvX41RzD%2BtvxBcqpDXhH6wy%2BK3w%3D&reserved=0>



--
http://www.freelists.org/webpage/oracle-l


Received on Tue Nov 26 2019 - 18:19:05 CET

Original text of this message