Re: Curiosity: single-column index on sparse data cannot be built in parallel

From: Charles Schultz <sacrophyte_at_gmail.com>
Date: Wed, 15 Jul 2015 12:27:37 -0500
Message-ID: <CAPZQniXBz8imSbqSwV6iY774QbuQZM1m4xChJyvpMrmW3rFNsQ_at_mail.gmail.com>



Jonathan, your blog post looks quite lucid and helpful - I'll start to utilize some of your observations in my ongoing testss. Thanks for wrapping it up in a way for the rest of us to see.

Just an update on the little mystery. It seems when I create the index during "off hours", the kernel allocates plenty of parallel slaves, but when I run after 9:00 am and before 4:00 pm local time, I see "Parallel operations downgraded 75 to 99 pct" (according to the AWR report). Correlating this to v$pq_tqstat, I see 302 rows during "off hours" (runs in 5 minutes) but only 8 rows during working hours (4 Producers, 2 Consumers, 1 Ranger and 1 Final QC Consumer), taking 90 minutes this morning.

It is possible this is pure coincedence. I will be doing more tests to see if this pattern holds up. One other factor is that this database is currently on ZFS (solaris) with a snapshot (copy on write), and the back-end SAN is shared across our enterprise for all developement work. I just find it exceptionally odd that this one index seems to be the sticking point.

Thanks for helping me think this out loud. :)

On Wed, Jul 15, 2015 at 6:29 AM, Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk
> wrote:

>
>
> Charles
>
> 302 rows in v$pq_tqstat is right for your parallel 100. If you check the
> blog that Mark referenced (
> https://jonathanlewis.wordpress.com/2015/07/15/pq-index-anomaly/) you'll
> see I ran parallel 4 but got 14 lines in the activity report: That's (2 x
> 4) + (1 * 4) + 1 (ranger) + 1 (final qc step).
>
> It's the v$pq_slave where I saw too many rows - but that may not be
> seeable in your case because you could have had a history of parallel
> execution which left an arbitrary number of slaves from other activities
> still available for use.
>
> I wonder if your comment about sparsity of data is on point. Perhaps
> there's a special case where the RANGER (which is the query coordinator)
> finds no data (or such a small amount of data) that it decides the query
> doesn't need to run parallel. (This seems unlikely, since the size of the
> table should still encourage it to do a parallel tablescan, but it's
> another possibility to bear in mind.)
>
>
> Regards
> Jonathan Lewis
> http://jonathanlewis.wordpress.com
> _at_jloracle
> ------------------------------
> *From:* Charles Schultz [sacrophyte_at_gmail.com]
> *Sent:* 15 July 2015 12:03
>
> *To:* Jonathan Lewis
> *Cc:* ORACLE-L
> *Subject:* Re: Curiosity: single-column index on sparse data cannot be
> built in parallel
>
> Jonathan,
>
> Thanks for the continued feedback, this is excellent information and
> insight. I will be running more tests today, since my test late last night
> proved that sometimes the index can indeed be built using parallel slaves.
> As you pointed out, sometimes my query against v$pq_tqstat returns way "too
> many" rows - even though I only ever request parallel 100, sometimes I get
> 302 rows returned. I have not yet analyzed the rows (but I did save them in
> an output file).
>
> And yes, we are licensed for AWR, and I have started to take snapshots
> between each test to better isolate the stats. Once I reproduce the problem
> (that's always fun, trying to get a bad run), I will take a closer look at
> those "parallel downgraded" messages. Your observation about most slaves
> having nothing to do does make sense, but I have a hard time explaining why
> it runs in 4 with parallel slaves and 50+ minutes with no apparent parallel
> slaves running.
>
> More to follow later in the day. Thanks again.
>
> On Tue, Jul 14, 2015 at 11:48 PM, Jonathan Lewis <
> jonathan_at_jlcomp.demon.co.uk> wrote:
>
>>
>> Charles,
>>
>> Apologies for the delay - I had to shut down for the day before your
>> reply came through.
>>
>> I've written a blog note about the anomaly - planned to re-read it and
>> polish it a little this morning.
>>
>> Your non-null data is 3% of your index, and a parallel build would mean
>> RANGE distribution of the data, so if your build ran anything less than
>> parallel 33 (approx) the point would come where only one slave (in the
>> second set) would be doing anything because it would be the one that got
>> all the real data and had to sort and build. That's why you (I believe)
>> you might see a point where only one process seemed to be doing any work.
>> There is still benefit in running parallel, of course, since you have to
>> scan and discard the 97% of the data that is NULL and can't go into the
>> index.
>>
>> The anomaly is that when I examined v$pq_tqstat after creating my 310,000
>> row index from my 10,000,000 row table (running parallel 4) the first set
>> of slaves had passed 10 million rows to the second set of slaves whiich
>> means they must have been passing all the (null, rowid) entries rather than
>> discarding them.
>>
>>
>> Are you licensed to dig back into the ASH history ? Can you find out if
>> the one running process was a Pnnn process or a normal shadow process ? If
>> it was a normal shadow the problem might simply be that at the moment the
>> create index started all the PX slaves were involved in building other
>> indexes. If you can run an AWR report for the time around the start of
>> build you might get a clue from the statistics about "Parallel operations
>> downgraded to ..."
>>
>>
>>
>>
>>
>> Regards
>> Jonathan Lewis
>> http://jonathanlewis.wordpress.com
>> _at_jloracle
>> ------------------------------
>> *From:* Charles Schultz [sacrophyte_at_gmail.com]
>> *Sent:* 15 July 2015 01:05
>> *To:* Jonathan Lewis
>> *Cc:* ORACLE-L
>> *Subject:* Re: Curiosity: single-column index on sparse data cannot be
>> built in parallel
>>
>> I didn't check $pq_tqstat, but I show results when I next do the
>> test. I say it was not running in parallel because only one session was
>> active via Enterprise Manager (yeah, I was being lazy, I know) and the
>> creation took over 1 hour the first test, 51 minutes the next test, whereas
>> all my other indexes (mix of local and non-partitioned) on the same table
>> were taking less than 20 minutes (some less than 10).
>>
>> Jonathan, can you elaborate a little more on the "interesting anomaly"?
>> :) It is entirely possible I am hitting that, but would like to learn more.
>>
>> As far as parallel 100, I was just trying to open the doors to see
>> where my bottlenecks are. The extent sizes are included in the index ddl
>> (5m); again, I don't have issues with the other indexes, but there is
>> something about this being on a single column, so you may be on the right
>> track with large extents and small volumes of data per slave. You are
>> making me curious now. :) I love the anthromorphic angle on "refusing to
>> cooperate". *grin*
>>
>> PS - in case it is not obvious, this is a vendor supplied table and
>> index.
>> PS2 - tests currently executing on Sun T5440, for those that care about
>> such things.
>> PS3 - as to "why"; I got a lot of indexes to build in a short time, so I
>> am looking for the bounds. :)
>>
>>
>> On Tue, Jul 14, 2015 at 4:12 PM, Jonathan Lewis <
>> jonathan_at_jlcomp.demon.co.uk> wrote:
>>
>>>
>>>
>>> I'm not going to try for 500M rows, but I just built a table with 10M
>>> rows with your pattern of data and ran a parallel create index.
>>>
>>> There is an interesting anomaly to the create index (11.2.0.4) that
>>> might make the build look like a serial build - how are you showing that
>>> the build isn't parallel ?
>>> I queried v$pq_tqstat after the build.
>>>
>>> Parallel 100 seems a little optimistic for such a "small" index - 17M
>>> rows at about 13 bytes per row gives about 220MB or 2MB per slave to build.
>>> What's your default extent size - perhaps there's something about large
>>> extents and small volumes of data per slave that makes Oracle refuse to
>>> co-operate.
>>>
>>> Regards
>>> Jonathan Lewis
>>> http://jonathanlewis.wordpress.com
>>> _at_jloracle
>>> ------------------------------
>>> *From:* oracle-l-bounce_at_freelists.org [oracle-l-bounce_at_freelists.org]
>>> on behalf of Charles Schultz [sacrophyte_at_gmail.com]
>>> *Sent:* 14 July 2015 21:59
>>> *To:* ORACLE-L
>>> *Subject:* Curiosity: single-column index on sparse data cannot be
>>> built in parallel
>>>
>>> Good day,
>>>
>>> I am trying to find the technical reason for why Oracle cannot use
>>> parallel slaves to build a single-column index on a sparse column with few
>>> distinct values:
>>>
>>>
>>> F COUNT(*)
>>> - ----------
>>> 538937561
>>> Y 51464
>>> N 17486819
>>>
>>>
>>> Just by playing around, I discovered that if I put this column as the
>>> leading edge on an index with many columns, it can be built in parallel.
>>>
>>> According to the relevant documentation
>>> <http://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel007.htm#i1009131>,
>>> I would expect that the base table is sampled, but perhaps the random
>>> sampling returns 0 keys due to the sparse nature of the column?
>>>
>>>
>>> Facts:
>>> Oracle Enterprise Edition 11.2.0.4
>>> table is partitioned, but not on this key (lol)
>>> parallel_max_servers=3600
>>>
>>> ddl extracted via datapump:
>>> CREATE INDEX "FIMSMGR"."FGBTRND_ENCD_INDEX" ON "FIMSMGR"."FGBTRND"
>>> ("FGBTRND_DEFER_GRANT_IND")
>>> PCTFREE 10 INITRANS 2 MAXTRANS 255
>>> STORAGE(INITIAL 5242880 NEXT 5242880 MINEXTENTS 1 MAXEXTENTS
>>> 2147483645
>>> PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
>>> BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
>>> TABLESPACE "FIN_LARGE_INDX" PARALLEL 100 ;
>>>
>>> --
>>> Charles Schultz
>>>
>>
>>
>>
>> --
>> Charles Schultz
>>
>
>
>
> --
> Charles Schultz
>

-- 
Charles Schultz

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jul 15 2015 - 19:27:37 CEST

Original text of this message