Re: Large pages

From: Noons <wizofoz2k_at_gmail.com>
Date: Tue, 13 Nov 2012 18:38:31 -0800 (PST)
Message-ID: <2b35f5db-4533-4d76-99e8-4867b865fcf2_at_u4g2000pbo.googlegroups.com>



On Nov 13, 9:53 pm, "Jonathan Lewis" <jonat..._at_jlcomp.demon.co.uk> wrote:

> I can't work out from your statement exactly which bit of what I've said is
> not correct. Could you please clarify.

You said:
>In Unix (unless you use intimate shared memory) you get one map per
>process - so an Oracle system with 1,000 processes would end up using as
>much memory for maps of the SGA as it would on the SGA itself if it were
>using standard pages.

That is incorrect. In Unix there is no such thing as a memory map per process.
The largepages feature has nothing to do with the memory of each process.
"Memory map" is an incorrect term for the translation of virtual addresses to physical addresses in the context of a process or group of processes.

> The "memory map" certainly maps logical addresses to physical memory
> addresses - easy enough to see (if you know the answer) when you look at
> some of the x$ objects and see wildly different addresses for objects - for
> example, Oracle maps the PGA for a process into an entirely different set
> of (logical) addresses from the SGA - to the extent that the (logical) gap
> between memory addresses can be far larger than the physical memory
> available on the machine.

What memory Oracle allocates for each required chunk does not constitute a memory map. There is no such thing as a process "memory map". Memory allocation is not the same as memory mapping.

> But (depending on O/S and choice of O/S parameters) the page size used in
> the mapping may vary, each process may have its own map, or all processes
> that attach to the same physical memory may share a single map of the
> memory (e.g. Solaris Intimate Shared Memory).

Not quite correct. Page size is not an attribute of the process. It's an attribute of the virtual memory management that is used to map a given virtual address space to actual memory, whatever the process.

> I wrote a note about this a
> couple of years ago highlighting a surprise side effect
> (http://jonathanlewis.wordpress.com/2010/06/23/memory/); more
> significantly Christo Kutrovsky (of Pythian) posted a video of a
> presentation demonstrating the various effects (and how to measure them)
> when configuring Linux.
> (http://www.pythian.com/news/741/pythian-goodies-free-memory-swap-orac...
>  )

Yes, but what might happen in Linux - according to Christo - does not define what Unix does: last time I looked they were NOT the same OS!

As I said: in Unix, largepages are not part of a process "memory map". They are used to define a portion (NOT an address range, simply a quantity) of physical memory that will be managed by virtual memory pages of a certain size. It is perfectly possible - in fact almost mandatory - to have multiple regions of physical memory dedicated to different types of paging. In Aix, for example, there are 3 possible page sizes, which can be used in various combinations by various processes, singly or concurrently: 4K, 64K and 16M. In Aix 7.1 and P7 hardware there is a 4th size, but I won't go into that.

What one can do in Unix is allocate a certain amount of physical memory to a certain type of paging. Once that is done, processes are free to use that memory as native address space for execution, or as attached segments of "shared" memory. As an example, it is perfectly possible for a process using 4KB pages to attach a shared memory segment that is managed - from the point of view of virtual-to-physical addressing - as 16MB pages. That has nothing to do with the process. All processes attaching to that shared segment will use its pagesize for those addresses, regardless of what they started with.

Here is the breakdown of such page sizes and how much physical memory is managed by each pagesize in my Aix test system:

aubdc00-ora01t:sandy$vmstat -P ALL
System configuration: mem=16384MB

pgsz            memory                           page
----- -------------------------- ------------------------------------
           siz      avm      fre    re    pi    po    fr     sr    cy
   4K  1222496   341519   615634     0     0     0     0      0     0
  64K    32138    31995      143     0     0     0     0      0     0
  16M      600      376      224     0     0     0     0      0     0

Or in round numbers:
9GB of 16MB pages,
2GB of 64K pages and
5GB of 4K pages,
for a grand total, as indicated above, of around 16GB of physical memory.
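
The round numbers above can be checked with a little arithmetic. The sketch below is not part of the original test session. It simply multiplies the "siz" column of the vmstat -P ALL output by each page size, assuming "siz" is a count of pages of that size; the variable and function names are made up for illustration.

```python
# Page counts copied from the "siz" column of `vmstat -P ALL` above.
PAGE_COUNTS = {"4K": 1222496, "64K": 32138, "16M": 600}

# Bytes per page for each page size label.
PAGE_BYTES = {"4K": 4 * 1024, "64K": 64 * 1024, "16M": 16 * 1024 * 1024}

MB = 1024 * 1024


def memory_mb(label):
    """Physical memory (in MB) managed by pages of the given size."""
    return PAGE_COUNTS[label] * PAGE_BYTES[label] / MB


total_mb = sum(memory_mb(label) for label in PAGE_COUNTS)

for label in PAGE_COUNTS:
    print(f"{label:>4}: {memory_mb(label):10.3f} MB")
print(f"total: {total_mb:.0f} MB")
```

The three regions add up to the mem=16384MB that vmstat reports in its "System configuration" line.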

Note that there are 224 free 16MB pages above (fre column): this will become relevant below.

Where each pagesize region of physical memory starts is entirely up to the OS to manage. It will be mapped to virtual addresses anyway, not relevant here. What is important to note is that the memory managed by each type of pagesize is contiguous, whatever its quantity may be.

A program can map virtual memory to physical memory with various page sizes, or it can use its entire address space in one type of page size. For example, I can run sqlplus in 4K pagespace, using a SGA entirely in 16MB pages.

See above for the pages active in 16MB: they are the SGA, and there are 376 of them, with 224 spare not being used at the moment.

If I now set the environment variables that cause Aix to execute sqlplus ENTIRELY in 16MB space, I'll get the following:

aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70

kthr    memory              page              faults        cpu           large-page
----- ----------- ------------------------ ------------ ----------------------- -----------
 r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec alp flp
 2 1 2390433 604576 0 0 0 0 0 0 10 4194 588 6 3 91 0 0.06 8.9 376 224
aubdc00-ora01t:sandy$export LDR_CNTRL=LARGE_PAGE_TEXT=Y@LARGE_PAGE_DATA=M
(from now on ALL programs in my session will be executing - BY DEFAULT - entirely in 16MB pagesize - the column "flp" will measure how many are free)
aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory              page              faults        cpu           large-page
----- ----------- ------------------------ ------------ ----------------------- -----------
 r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec alp flp
 2 1 2403604 620137 0 0 0 0 0 0 10 4194 588 6 3 91 0 0.06 8.9 396 204
(notice how the free largepages - flp column - has dropped to 204? That's vmstat itself running in largepages)
(Now for sqlplus:)
aubdc00-ora01t:sandy$sqlplus
SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 14 12:23:00 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Enter user-name: / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> !vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory              page              faults        cpu           large-page
----- ----------- ------------------------ ------------ ----------------------- -----------
 r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec alp flp
 2 1 2480102 600964 0 0 0 0 0 0 10 4194 588 6 3 91 0 0.06 8.9 430 170
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory              page              faults        cpu           large-page
----- ----------- ------------------------ ------------ ----------------------- -----------
 r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec alp flp
 2 1 2403607 603731 0 0 0 0 0 0 10 4194 588 6 3 91 0 0.06 8.9 396 204
(Notice how flp dropped to 170 when sqlplus was running. And it's back to 204 flp after I exit sqlplus and run vmstat in 16MB pages)
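
For anyone who wants to repeat the comparison without reading the columns by eye, here is a minimal sketch that pulls alp and flp out of two of the vmstat -l data lines above and computes the difference. It assumes alp and flp are the last two columns, as they are in this transcript; the helper name large_pages is hypothetical.

```python
LARGE_PAGE_MB = 16  # size of one large page on this system


def large_pages(data_line):
    """Return (alp, flp) from one data line of `vmstat -l` output.

    Assumption: alp and flp are the last two columns, as in the
    transcript above.
    """
    fields = data_line.split()
    return int(fields[-2]), int(fields[-1])


# Data lines copied from the transcript: just before starting sqlplus,
# and from inside sqlplus, both with LDR_CNTRL set.
before_sqlplus = "2 1 2403604 620137 0 0 0 0 0 0 10 4194 588 6 3 91 0 0.06 8.9 396 204"
inside_sqlplus = "2 1 2480102 600964 0 0 0 0 0 0 10 4194 588 6 3 91 0 0.06 8.9 430 170"

alp_before, flp_before = large_pages(before_sqlplus)
alp_inside, flp_inside = large_pages(inside_sqlplus)

pages_taken = flp_before - flp_inside
print(f"free large pages dropped by {pages_taken} "
      f"({pages_taken * LARGE_PAGE_MB} MB)")
print(f"active large pages rose by {alp_inside - alp_before}")
```

Both columns move by the same amount, which is what you would expect if the pages leaving the free pool are the ones becoming active.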

Now to prove my point, I'll turn off the largepages for the whole process and show that sqlplus is NOT using up the largepages at all, although of course it is attaching to an Oracle instance where the SGA DOES use 16MB pages:
aubdc00-ora01t:sandy$unset LDR_CNTRL
aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70

kthr    memory              page              faults        cpu           large-page
----- ----------- ------------------------ ------------ ----------------------- -----------
 r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec alp flp
 2 1 2390890 604159 0 0 0 0 0 0 10 4194 588 6 3 91 0 0.06 8.9 376 224
(Notice how the flp has now popped back to 224, as initially - now vmstat is running in the default 4K pages)
aubdc00-ora01t:sandy$sqlplus
SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 14 12:30:00 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Enter user-name: / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> !vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory              page              faults        cpu           large-page
----- ----------- ------------------------ ------------ ----------------------- -----------
 r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec alp flp
 2 1 2394755 600292 0 0 0 0 0 0 10 4194 588 6 3 91 0 0.06 8.9 376 224
(and of course from inside sqlplus we can check vmstat and there is the constant 224 flp again, as sqlplus is NOT using largepages itself. Although it is attaching to a SGA that has been locked and is using largepages)
SQL> sho parameter lock_sga
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     TRUE

Now to prove that the SGA is indeed using the largepages, let's shutdown and startup and check flp column while we do that:

aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70

kthr    memory              page              faults        cpu           large-page
----- ----------- ------------------------ ------------ ----------------------- -----------
 r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec alp flp
 2 1 2390933 604112 0 0 0 0 0 0 10 4194 588 6 3 91 0 0.06 8.9 376 224
aubdc00-ora01t:sandy$sqlplus
SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 14 12:39:08 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Enter user-name: / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> !vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory              page              faults        cpu           large-page
----- ----------- ------------------------ ------------ ----------------------- -----------
 r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec alp flp
 2 1 782537 672407 0 0 0 0 0 0 10 4194 588 6 3 91 0 0.06 8.9 0 600
SQL> startup
ORACLE instance started.
Total System Global Area     6263357440 bytes
Fixed Size                      2233112 bytes
Variable Size                1073745128 bytes
Database Buffers             5167382528 bytes
Redo Buffers                   19996672 bytes
Database mounted.
Database opened.
SQL> !vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory              page              faults        cpu           large-page
----- ----------- ------------------------ ------------ ----------------------- -----------
 r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec alp flp
 2 1 2386771 608244 0 0 0 0 0 0 10 4194 588 6 3 91 0 0.06 8.9 376 224

See how the number of free large pages (flp) changes as I shut down and restart? That's the SGA in 16MB pages, while the rest of the programs, libraries and OS are using 4K pages.
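
As a rough cross-check, the SGA size reported at startup can be converted into a page count for each of the three page sizes. This is only a sketch using the "Total System Global Area" figure from the transcript and a ceiling division; the names are illustrative, and it says nothing about how Aix or Oracle actually round or pad the allocation.

```python
# "Total System Global Area" as reported by the startup above.
SGA_BYTES = 6263357440

PAGE_SIZES = {"4K": 4 * 1024, "64K": 64 * 1024, "16M": 16 * 1024 * 1024}


def pages_needed(nbytes, page_size):
    """Whole pages of page_size needed to cover nbytes (rounded up)."""
    return (nbytes + page_size - 1) // page_size


pages = {label: pages_needed(SGA_BYTES, size)
         for label, size in PAGE_SIZES.items()}

for label, count in pages.items():
    print(f"{label:>4}: {count:>9} pages")
```

At 16MB this comes to 374 pages, close to the 376 shown in the alp column once the instance is up. The transcript does not say what accounts for the other two pages, so I won't guess. The same SGA in 4K pages would need more than 1.5 million of them.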

What Christo might have found in Linux is SPECIFIC to Linux only, and I strongly suspect to the specific version and flavour he was running. Unix manages virtual memory in a different manner and there are again small differences between Solaris, Aix, and others.

But the bottom line is: largepages (hugepages, or whatever we decide to call them) have nothing to do with a per process count and ALL to do with how much physical memory is addressed by each type of paging.

Processes simply use largepages or not, either for execution space or as attached shared memory. Mixed page size spaces in a single program are perfectly possible, precisely because the pagesize and how it maps to physical memory is EXTERNAL to the entire per process memory management.

All a process knows is that it uses (virtual) memory between numeric addresses set by the OS, for example 0A000(hex) - 0FFFFF(hex). Those two limits are set by the virtual memory management mechanism of the OS, once, for each program. The CPU then maps those to physical addresses using the TLB tables set up and managed by the OS, depending on memory configuration.
There can be boundaries between those two limits, which are mapped to pages of various sizes. Remember the old Oracle shared memory setups of release 5 (and 6), where we had to re-link Oracle to use shared memory in Unix? One of the parameters was the size of shared memory available (SHMALL) and another (optional) was the virtual address where it started. All to do with that.
Thank the Gods that's gone now!
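
To make the idea of one virtual address being translated through different page sizes a bit more concrete, here is a deliberately simplified sketch. It treats a virtual address as a page number plus an offset within the page, which is the textbook model and not a description of how Aix or POWER hardware really lays out its translation tables. The address 0x0ABCDE is an arbitrary value inside the example range above, and split_address is a made-up name.

```python
def split_address(addr, page_size):
    """Split a virtual address into (page number, offset within page).

    Simplified textbook model: page_size must be a power of two, so the
    low bits of the address are the offset and the rest is the page number.
    """
    offset_bits = page_size.bit_length() - 1
    return addr >> offset_bits, addr & (page_size - 1)


ADDR = 0x0ABCDE  # arbitrary address inside the 0A000 - 0FFFFF example range

for label, size in (("4K", 4 * 1024), ("64K", 64 * 1024), ("16M", 16 * 1024 * 1024)):
    page, offset = split_address(ADDR, size)
    print(f"{label:>4}: page {page:#x}, offset {offset:#x}")
```

The process sees the same address in all three cases. What changes is how many bits select the page and how many are left as an offset inside it, and that choice belongs to the memory management layer, not to the program.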

The whole thing can get very complex very quickly and there are many, many nuances, so I won't go into more detail here, although I could. In fact, I've already put too much into it and it will be difficult to follow for a lot of folks: my apologies!

The whole field of virtual memory, paging translation and physical-to-virtual mapping is a separate universe that has nothing to do with individual processes, although of course it affects their execution. This was explained in detail in some of the old McGraw-Hill OS books from the 70s and 80s, which are unfortunately out of print nowadays - but no less relevant. There is also at least one Dijkstra book that describes the whole thing in detail, and I do believe, from a faint memory, that Knuth talks about it as well in one of his tomes. IBM's online technical library still has some excellent books on the subject, going into much more detail.

Kevin and I have discussed this largepage thing quite a few times on his blog; there are many entries there that might be worth reading, as he does a much better job of addressing this than my feeble attempts to explain it.

Again, just to make my point very clear: Linux does NOT manage virtual memory the same way as Unix, nor is it valid to extrapolate from Linux to Unix - or Windows, for that matter. If I may quote you, it all must be tested for validity rather than assumed. As such, I'd suggest that anyone looking at using largepages in Windows give it a try and check results before assuming anything. Worth doing? You bet: I get an immediate CPU usage improvement of between 10 and 15% in Aix by just switching the SGA into 16MB pages: there is a MUCH smaller TLB for the virtual memory translation to traverse when I do that, and that traversing costs CPU cycles!

Received on Wed Nov 14 2012 - 03:38:31 CET
