Skip navigation.

Doug Burns

Syndicate content
Updated: 8 hours 53 min ago

Recurring Conversations – Incremental Statistics (Part 3)

Wed, 2014-03-05 10:26
As Connor pointed out in his comment on the last post, Incremental Statistics might still turn out to be a bit trickier than you thought once you start using them on many or very large tables, even once you've got the basics right. 

Which is why there's still a place for the detailed technical blog posts, once you've got the basics clear. The problem with not having the basics clear is that I've had people tell me things about Incremental Stats based on the blog posts and, when I read them, they make sense to me but I can also see why they might confuse others. Anyway, for completeness, here are some terrific posts about Incremental Stats.

Randolf Geist always has excellent stuff, as does John Hallas and, of course, when Maria Colgan was the Optimizer PM, she used to write about this stuff all the time.

Read blogs, by all means, but maybe start off with White Papers and the documentation and full presentations?

Recurring Conversations – Incremental Statistics (Part 2)

Mon, 2014-03-03 11:30

In the first part, I explained that Incremental Statistics are designed to allow a partitioned tables Global Statistics to be updated based on a combination of

1) Statistics gathered by analysing the contents of one or more partitions that have just been loaded or have been updated (and see this blog post for more depth on what 'updated' means!)

2) The statistics of existing partitions which are represented by synopses that are already stored in the SYSAUX tablespace.

Using this combination, we can avoid scanning every partition in the table every time we want to update the Global Stats which is an expensive operation that is likely to be unfeasible every time we load some data into a large table.

For me, the key word here is Incremental. Global Statistic updates are an incremental process, building on previous statistics (represented by the synopses) and only updating the Global Statistics based on the changes introduced by loading new partitions.

Understanding this might clear up another area of confusion I keep coming across. After upgrading their database to 11g, people often want to try out Incremental Global Stats on one of their existing large tables because they've always struggled to keep their Global Stats consistent and up to date. Maybe it's just the sites I work at but I'd say this is the most popular use case. Incrementals for a planned large partitioned table in your new systems might be a sensible idea, but there are a lot more existing systems out there with Global Stats collection problems that people have struggled with for years.

Most people I've spoken to initially had the impression that they simply flick the INCREMENTAL switch and perhaps modify some of the parameters to their existing DBMS_STATS calls so that GRANULARITY is AUTO and they use AUTO sampling sizes. All of which is discussed in the various white papers and blog posts out there.

Then they get a hell of a surprise when the very first gather runs for ages! How long is ages? I don't know in your particular case but I've seen this running for hours and hour and hours and people are crying in to their keyboards wondering why something that was supposed to make things run more quickly is so much slower than their usual stats calls.

The best way I've found to explain this phenomenon is to concentrate on the synopses that describe the existing partitions. Where do you think they come from? How are they calculated and populated if you don't ask Oracle to look at the existing data in your enormous table? That's what needs to happen. In order to make future updates to your Global Stats much more efficient, we first need to establish the baseline describing your existing data that Oracle will use as the foundation for the later incremental updates.

Generating the synopses as the baseline for future improvements will be a relatively painful for the largest tables (if it wasn't, you probably wouldn't be so interested in Incrementals ;-)), but it does only have to happen once. You just need to understand that it does have to happen and plan for it as part of your migration.

My personal suggestion is usually to just delete all of the existing stats and start from scratch with modern default parameter values and tidy up any stats-related junk that might be lingering around large, critical tables. Painful but probably worth it!

Recurring Conversations – Incremental Statistics (Part 1)

Wed, 2014-01-29 12:00
When I first started blogging, most of the material came from issues that I'd run into and how they were solved but, as I've spent most of the last 7 months not logging in to anything much ;-), it occurred to me that another area which I haven't used for a long time is what I'll call recurring conversations.

Even though my blogging has slowed to a crawl, I still spend a lot of time having similar conversations with multiple people and on work chat channels on the same topics which indicates to me that those topics are not well understood. Because I work with a lot of smart people, it's typically not the details that they struggle with. They've read detailed technical blog posts and have performed multiple web searches so have read the most detailed material available and yet somehow they're missing the *point*. Maybe that's the problem with learning everything via blogs and white papers? That it's no substitute for someone explaining the fundamental concepts and design of features? I could probably rephrase that as, maybe there's no substitute for actually reading a book or attending a course occasionally? I appreciate how out-of-date that view might be though.

I'm sure some will have already realised that Recurring Conversations could probably be called Frequently Asked Questions, so let me begin this first post with

'Why are my Global Statistics taking longer to gather when I use Oracle's snazzy 11g Incremental Global Statistics feature than when I don't?'

This has baffled a lot of people I know because I'm not sure they understand fully why the feature was introduced. They want to convert one of their existing partitioned tables to use Incremental Global Statistics and so they test the performance by doing something like this.

1) Delete all of the stats on a large partitioned table.
2) Set INCREMENTAL to FALSE and then gather table stats using GRANULARITY =>'GLOBAL '
3) Set INCREMENTAL to TRUE and then gather table stats using GRANULARITY =>'GLOBAL '

When they time this they find that 3 takes just as long as 2 and, in fact, it takes a little longer! This is useless? What is the point of this new feature if it doesn't speed up the gathering of Global stats?

First I want to look at what we asked Oracle to do in steps 2) and 3) above.

2) Visited all of the partitions of the table to gather information and then update the Global stats on the table.
3) Visited all of the partitions of the table to gather information, update the Global stats on the table and generate synopses for future use.

On that basis, why *wouldn't* option 3 take longer than option 2? They do more or less the same thing but 3) has to do a little additional work.

So if it isn't quicker to gather Global Stats using Incremental Global Statistics, why would you use it?

The benefits don't come from the initial gathering of Global Stats but when you gather stats on new Partitions and *don't* need to gather Global Stats any more. Instead Oracle uses those handy synopses to update them which is a much quicker operation! The Real World cycle of use then looks like this.

1) Delete all of your existing table stats.
2) Set INCREMENTAL to TRUE.
3) Gather table stats using GRANULARITY =>'GLOBAL  AND PARTITION' without supplying  a PARTNAME. This will re-gather all of your Global and Partition stats across the table and build the initial synopses. Note that at this stage you have achieved no reductions in stats collection times.
4) As you load partitions with new data or the data changes and you need to update your stats, use one of a number of options but the one I tend to use is to gather the stats on each specific partition using GRANULARITY=>' GLOBAL AND PARTITION'  with PARTNAME set to the name of the partition we've just loaded. Oracle will now gather Partition stats on just the one Partition and update the Global Stats and the synopses based on the new data that's been introduced.

Bingo – you've just maintained accurate Global Stats without having to trawl through the entire table again!

That's the point.

It's about *not* gathering Global Stats but also not letting them drift hopelessly out of whack with the contents of the table. Measuring the performance of a full Global Stats gathering operation doesn't illustrate the performance benefits.