MIME-Version: 1.0
References: <CAHSa0M2jh6f+2PFDcOQy++asC5VD-Nd9iAbPHNAribaB2OoS8Q@mail.gmail.com>
In-Reply-To: <CAHSa0M2jh6f+2PFDcOQy++asC5VD-Nd9iAbPHNAribaB2OoS8Q@mail.gmail.com>
From: Jonathan Lewis <jlewisoracle@gmail.com>
Date: Sun, 2 Aug 2020 16:26:19 +0100
Message-ID: <CAGtsp8kxs3UYcwO+y-SVxLxi3kab7dCn0wS45mUAva5bd3vwag@mail.gmail.com>
Subject: Re: Hash to range partition
To: ORACLE-L <oracle-l@freelists.org>
Content-Type: multipart/alternative; boundary="0000000000007ed60805abe6a5b1"
Sender: oracle-l-bounce@freelists.org
Errors-to: oracle-l-bounce@freelists.org
Precedence: normal
Reply-To: jlewisoracle@gmail.com
--0000000000007ed60805abe6a5b1
Content-Type: text/plain; charset="UTF-8"

 You said all your indexes were global (including the unique index on the
partition key).
You've also said that data access is mostly by ID from recent months, but
haven't given any indication of what the queries might look like - whether
it's "id between 10000000 and 11000000" and whether it's only on equality.

Note - the "hash partition to avoid index contention on sequences"
guideline is about globally hash partitioning the index to avoid contention
on highly concurrent inserts - you've got the "opposite" construction a
global index with hash partitioned tables !

If Oracle is going to visit a row by ID it will do an index probe and visit
the table - it's not going to visit all the partitions. The only difference
you would get by range partitioning on ID is that a local index on PK might
have blevel 3  instead of blevel 4, which would give
one less buffer visit and (MAYBE) one less physical read to get to the
row.  But you could hash partition the index to the same effect.

If you are doing large index range scans you will be visiting all 32
partitions on a single segment index range scan, but you'd still do the
same number of table blocks visits and (except for very small range scans)
reads, and you'd still be caching the same fraction of the table (the
"recent months".   The most significant effect of range partitioning with
local indexing is that the (default) clustering factor of the ID index
would drop because rows with consecutive IDs would tend to be in adjacent
blocks rather than rotating around 32 blocks from hash partitions - which
means seeing tablescans prematurely when you're expecting to see index
range scans.  You could avoid this with the global index simply by setting
the table preference "table cache blocks" to a value greater than 32 (I'd
go for the maximum of 255 in your cast) when gathering index stats. See:
https://jonathanlewis.wordpress.com/2013/05/09/clustering_factor-2/ and
follow the links to Richard Foote's  blog on the topic.

(I've assumed your overnight load is in big batches, low level of
concurrency, with a batch having a set of sequential IDs - variations from
that pattern will have an impact on the analysis).

Regards
Jonathan Lewis


On Thu, Jul 30, 2020 at 11:17 PM Ram Raman <veeeraman@gmail.com> wrote:

> All,
>
> We have a 1.4B row table that is hash partitioned by a key (ID). The key
> is increasing monotonically and seems to be coming from a sequence. The
> loads happen nightly and those are the only time the table is inserted to;
> during the day time only SELECTs run against the table.
>
> The table has 4 indexes on it, including one on the ID column. None of the
> indexes are partitioned (!)
>
> table t:
> --------
>
> ID NUMBER(16)
> ...
> EFFDT DATE
> ..
>
> Unique Index on ID column. Two other single column indexes on 2 number
> columns and one single column index on a date column.
>
> Queries that run in the day time typically access past several months of
> data from the table by the ID key mostly. It is accessed via other columns
> as well, but that is less than 20% of the time. There are about 10M rows
> per month and we have 32 hash partitions. No INSERTS in the day time.
>
> Since the load is happening nightly and there are only queries in the day
> time, I am considering testing out range partitioning the table rather than
> hash partitioning. It seems hash partitioning is recommended for keys based
> on sequences to avoid index contention during inserts. However, with the
> hash based approach the queries are scanning all partitions having to go
> through all 1.4B rows for a few months of records. If I go with range
> partitioning (one per month) on the ID key and if the queries access past
> few months of data only, I feel we can see a substantial performance
> improvement with queries only having to visit a fraction of the partitions
> and rows.
>
> I am also planning on creating a locally partitioned index on the ID
> column and also partitioned indexes on the other 3 columns as well. Not
> sure if it will increase the load time. One option for the indexes is that
> the locally partitioned index will have the new partition created as needed
> and the remaining 3 indexes dropped and rebuilt nightly. Will this be
> faster than the index getting updated during the daily loads?
>
> Can the listers share their thoughts on moving to range partition and
> indexes during load.
>
> Ram.
> --
>
>
>

--0000000000007ed60805abe6a5b1
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">
<div>You said all your indexes were global (including the unique index on t=
he partition key).</div><div>You&#39;ve
 also said that data access is mostly by ID from recent months, but=20
haven&#39;t given any indication of what the queries might look like -=20
whether it&#39;s &quot;id between 10000000 and 11000000&quot; and whether i=
t&#39;s only on
 equality.</div><div><br></div><div>Note - the &quot;hash partition to avoi=
d=20
index contention on sequences&quot; guideline is about globally hash=20
partitioning the index to avoid contention on highly concurrent inserts -
 you&#39;ve got the &quot;opposite&quot; construction a global index with h=
ash=20
partitioned tables !</div><div><br></div><div>If Oracle is going to=20
visit a row by ID it will do an index probe and visit the table - it&#39;s=
=20
not going to visit all the partitions. The only difference you would get
 by range partitioning on ID is that a local index on PK might have=20
blevel 3=C2=A0 instead of blevel 4, which would give <br></div><div>one les=
s=20
buffer visit and (MAYBE) one less physical read to get to the row.=C2=A0 Bu=
t=20
you could hash partition the index to the same effect.</div><div><br></div>=
<div>If
 you are doing large index range scans you will be visiting all 32=20
partitions on a single segment index range scan, but you&#39;d still do the=
=20
same number of table blocks visits and (except for very small range=20
scans) reads, and you&#39;d still be caching the same fraction of the table=
=20
(the &quot;recent months&quot;.=C2=A0=C2=A0 The most significant effect of =
range=20
partitioning with local indexing is that the (default) clustering factor
 of the ID index would drop because rows with consecutive IDs would tend
 to be in adjacent blocks rather than rotating around 32 blocks from=20
hash partitions - which means seeing tablescans prematurely when you&#39;re=
=20
expecting to see index range scans.=C2=A0 You could avoid this with the=20
global index simply by setting the table preference &quot;table cache block=
s&quot;
 to a value greater than 32 (I&#39;d go for the maximum of 255 in your cast=
)
 when gathering index stats. See: <a href=3D"https://jonathanlewis.wordpres=
s.com/2013/05/09/clustering_factor-2/" target=3D"_blank">https://jonathanle=
wis.wordpress.com/2013/05/09/clustering_factor-2/</a> and follow the links =
to Richard Foote&#39;s=C2=A0 blog on the topic.</div><div><br></div><div>(I=
&#39;ve
 assumed your overnight load is in big batches, low level of=20
concurrency, with a batch having a set of sequential IDs - variations=20
from that pattern will have an impact on the analysis).</div><div><br></div=
><div>Regards</div><font color=3D"#888888"><div>Jonathan Lewis</div><div><b=
r></div></font>

</div><br><div class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr">=
On Thu, Jul 30, 2020 at 11:17 PM Ram Raman &lt;<a href=3D"mailto:veeeraman@=
gmail.com">veeeraman@gmail.com</a>&gt; wrote:<br></div><blockquote class=3D=
"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(2=
04,204,204);padding-left:1ex"><div dir=3D"ltr"><div>All,</div><div><br></di=
v>We have a 1.4B row table that is hash partitioned by a key (ID). The key =
is increasing monotonically and seems to be coming from a sequence. The loa=
ds happen nightly and those are the only time the table is inserted to; dur=
ing the day time only SELECTs=C2=A0run against the table. <br><br>The table=
 has 4 indexes on it, including one on the ID column. None of the indexes a=
re partitioned (!)<br><br>table t:<br>--------<br><br>ID 	NUMBER(16)<br>...=
<br>EFFDT	DATE<br>..<br>=C2=A0=C2=A0<br>Unique Index on ID column. Two othe=
r single column indexes on 2 number columns and one single column index on =
a date column.<br>=C2=A0=C2=A0<br>Queries that run in the day time typicall=
y access past several months of data from the table by the ID key mostly. I=
t is accessed via other columns as well, but that is less than 20% of the t=
ime. There are about 10M rows per month and we have 32 hash partitions. No =
INSERTS in the day time.<br>=C2=A0 <br>Since the load is happening nightly =
and there are only queries in the day time, I am considering=C2=A0testing o=
ut range partitioning the table rather than hash partitioning. It seems has=
h partitioning is recommended for keys based on sequences to avoid index co=
ntention during inserts. However, with the hash based approach the queries =
are scanning all partitions having to go through all 1.4B rows=C2=A0for a f=
ew months of records. If I go with range partitioning (one per month) on th=
e ID key and if the queries access past few months of data only, I feel we =
can see a substantial performance improvement with queries only having to v=
isit a fraction of the partitions and rows.=C2=A0<div>=C2=A0=C2=A0</div><di=
v>I am also planning on creating a locally partitioned index on the ID colu=
mn and also partitioned indexes on the other 3 columns as well. Not sure if=
 it will increase the load time. One option for the indexes is that the loc=
ally partitioned index will have the new partition created as needed and th=
e remaining 3 indexes dropped and rebuilt nightly. Will this be faster than=
 the index getting updated during the daily loads?<div>=C2=A0=C2=A0<br>Can =
the listers share their thoughts on moving to range partition=C2=A0and inde=
xes during load.=C2=A0<br clear=3D"all"><div>=C2=A0=C2=A0</div><div>Ram.</d=
iv>-- <br><div dir=3D"ltr"><div dir=3D"ltr"><div><div dir=3D"ltr"><div><div=
 dir=3D"ltr"><div><div dir=3D"ltr"><div>=C2=A0</div><div><br></div></div></=
div></div></div></div></div></div></div></div></div></div>
</blockquote></div>

--0000000000007ed60805abe6a5b1--
--
http://www.freelists.org/webpage/oracle-l