Skip navigation.

Yann Neuhaus

Syndicate content
All blog entries from
Updated: 9 hours 31 min ago

SQL Saturday in Paris on 12 -13 September

Mon, 2014-07-28 22:35

As you certainly know SQL Saturday events are very popular in SQL Server world community. This is the second time the event takes place in Paris (France), but this time, we have a new format with pre-conferences on Wednesday and classic sessions on Saturday. During pre-conferences, we will talk about a particular subject for a whole day.

This time, I have the opportunity to participate twice by giving two sessions (in French) with the following program:

  • Friday: Inside the SQL Server storage and backups

If you are interested in how the SQL Server storage works and how to deal with corruption as well as backups, this session might be interesting for you.

Be careful: the pre-conferences on Wednesday are fee-paying sessions (but not that expensive). You can still register at this address.

  • Saturday: SQL Server AlwaysOn deep dive

SQL Server AlwaysOn is a new great high-availability and disaster recovery feature provided by Microsoft. You can come take a look at this session if you are concerned by questions like:

  • How to configure my Windows failover cluster and quorum in my situation?
  • What exactly is a read-only secondary replica?
  • What are the built-in tools provided by Microsoft to monitor and troubleshoot this infrastructure?

Good news: the sessions on Saturday are free!

Take a look at the agenda if you want to attend to other interesting sessions. I hope there will be many attendees! Smile

SQL monitoring shows adaptive plans

Fri, 2014-07-25 14:23

In a previous post, I have described Adaptive Plans. Even if I prefer to show plans with the SQL Monitor active html format, I had to stick with the dbms_xplan for that because SQL Monitoring did not show all information about adaptive plans.

This has been fixed in the Patchset 1 and I have run the same query to show the new feature.

First, an adaptive plan can be in two states: 'resolving' where all alternatives are possible and 'resolved' then the final plan has been choosen. It is resolved once the first execution statistics collector has made the decision about the inflection point. We can see the state in the SQL Monitor header:




Here my plan is resolved because the first execution is finished.

The plan with rowsource statistics show only the current plan, but the 'Plan Note' shows that it is an adaptive plan:




Now we have to go to the 'Plan' tab which show the equivalent of dbms_xplan.display_cursor:




Here the format is equivalent to format=>'adaptive'. It's the 'Full' plan where all branches are shown but inactive part is grayed. We have here the Statistics Collector after reading DEPARTMENTS, and we have the inactive full table scan hash join of EMPLOYEES.

Just choose the 'Final' Plan (or 'Current' if it is not yet resolved) to get only the active part:




I often prefer the tabular format to the graphical one:




We have all information: the 7 rows from DEPARTMENTS have gone through STATISTICS COLLECTOR and NESTED LOOP with index access has been choosen. Note that it is different from the previous post where HASH JOIN with full table scan was choosen because the 7 rows were higher than the inflection point.

In my current example, because I have system statistics that costs full table scan higher:


DP: Found point of inflection for NLJ vs. HJ: card = 8.35


This is higher than ny 7 rows from DEPARTMENTS.

Here is the whole sqlmon report: and how I got it:


alter session set current_schema=HR;
select /*+ monitor */ distinct DEPARTMENT_NAME from DEPARTMENTS
 where DEPARTMENT_NAME like '%ing' and SALARY>20000;

alter session set events='emx_control compress_xml=none';set pagesize 0 linesize 10000 trimspool on serveroutput off long 100000000 longc 100000000 echo off feedback off
spool sqlmon.htm
select dbms_sqltune.report_sql_monitor(report_level=>'all',type=>'html') from dual;
spool off

Note that I used the script exposed here and I used the emx_event to get the uncompressed xml, which I got from Tyler Muth:

@FranckPachot well, I asked @DBAKevlar, she asked the developer that owns it ;)

— tmuth (@tmuth) July 25, 2014

Beyond In-Memory, what's new in ?

Tue, 2014-07-22 09:53

It's just a patchset. The delivery that is there to stabilize a release with all the bug fixes. But it comes with a lot of new features as well. And not only the one that has been advertised as the future of the database. It's a huge release.

Let's have a look at what's new.

First, it seems that it will be the only patchest for 12.1

Then, there is that In-Memory option awaited for a while. There has been some demo done by Larry Ellison on Exadata or even on the Oracle SPARC M6. Of course, if you have 32 TB of memory, we can understand the need for an In-Memory optimized storage. For a more real-life usage of that option, stay tune on our blog. We investigate the features in the context of our customer concerns, to fit their needs. For example, In-Memory addresses cases where some customers use Active Data Guard to offload reporting/real-time analytics to another server. But unfortunately In-Memory is not populated on a physical standby. We probably have to wait 12.2 for that.

In-Memory is an option, so available only in Enterprise Edition.

There are other new features related with large memory. There is a part of buffer cache dedicated to big tables (you just set the percentage) to be cached for In-Memory Parallel Query. And there is also a mode where all the database is in buffer cache. About performance and Parallel Query, a new transformation has been introduced to optimize the group by operation when joining a fact table to dimensions.

Second new feature is the range-partitioned hash cluster. Oracle CLUSTER segments is a very old feature but not widely used. Hash cluster is the fastest way to access to a row because the key can be directly transformed to a rowid. Unfortunately maintenance is not easy, especially when the volume increases. And we have partitioning which is the way to ease maintenance with growing tables but, until today, we can't partition a hash cluster. I mean, not in a supported way because Oracle uses it on SPARC for the TPC benchmarks - applying a specific patch (10374168) for it.

Well, the good news is that we can finally partition hash clusters with the simple syntax:

create cluster democ1 (sample_time timestamp,sample_id number)
hashkeys 3600 hash is sample_id size 8192
partition by range (sample_time) (
partition P12 values less than( timestamp'2014-04-26 12:00:00' )

Another nice feature is Attribute Clustering. Lot of other RDBMS has the ability to arrange rows but Oracle puts any insert anywhere in a heap table, depending only on where some free space is left. The alternative is IOT of course. But it can be good to try to cluster rows on one or several columns. It's better for index access, it's better for cache efficiency, it's better for storage indexes (or in-memory min/max), for ILM compression, etc. We can finally do it and I'll blog soon about that. 

Attribute Clustering is not an option, but available only in Enterprise Edition.


I think those two features are my favorite ones. Because the best optimization we can do, without refactoring the application design, is to place data in the way it will be retreived.


The trend today is to store unstructured data as JSON. XML was nice, but it's verbose. JSON is easier to read and even PostgreSQL can store JSON in its latest version. So Oracle has it in you can store and index it. Once again stay tuned on this blog to see how it works.

Something important was missing in Oracle SQL. How do you grant a read only user? You grant only select privilege? But that's too much because with a select privilege we can lock a table (with LOCK or SELECT FOR UPDATE). So we have now a READ privilege to prevent that. That's my favorite new feature for developers.

Then there are a few improvements on multitenant, such as the possibility to save the state of a pluggable database so that it can be automatically opened when the CDB startup. We already addressed that in in our Database Management Kit. An undocumented parameter, _multiple_char_set_cdb, let us imagine that we will be able to have different characterset for the PDB - probably in the future. Currently it's set to false.

And once again as beta testing partners we have put the pressure to have a fix for what we consider as serious availability bug. The behaviour in beta was even worse about CDB availability and I finally had a bug opened (Bug 19001390 - PDB SYSTEM TABLESPACE MEDIA FAILURE CAUSES THE WHOLE CDB TO CRASH) that should be fixed in 12.1

About fixes, some restrictions are now gone: we can finally use ILM with multitenant and we can have supplemental logging while using a move partition online. And you can have Flashback Data Archive in multitenant as well.

All that is good news, but remember, even if it's only the 4th digit that is increased in the version number, it's a brand new version with lot of new features. So, when do you plan to upgrade ? 11g is supported until January 2015. Extended support is free until January 2016 given that you are in the terminal patchset ( So either you don't want to be in the latestet release and you will have to upgrade to before the end of the year, waiting for 12.2 maybe in 2016. Or you want those new features and will probably go to for 2015.

Talking about upgrade, there's a bad news. We thought that multitenancy can accelarate upgrade time. Because the data dictionary is shared, you just have to plug a PDB into a newer version CDB and it's upgraded. And we show that in our 12c new features workshop by applying a PSU. But we have tested the upgrade to in the same way, and it's not that simple. Plugging is quick when you have only new patches that did not change the dictionary. It's still true for PSU when the dictionary changes are limited to the root container. But when you upgrade to you have to synchronize all the PDB dictionaries (all that magic behind object links and metadata links) and that takes time. It takes the same time as upgrading a non-CDB. Conclusion: you don't save time when you do it by plug/unplug.

But I have good news as well for that because I've tested a 1 minute downtime migration from to Dbvisit replicate, the affordable replication solution, supports multitenant in it's latest version, both as source and target. If your application is compatible (which is easy to check with the 30 days trial) then it's a good way to migrate without stress and with minimal downtime. It's available for Standard Edition as well, but currently the download can install only an Enterprise Edition.

Backup an SQL Server database from On-Premise to Azure

Sun, 2014-07-20 23:46

SQL Server database backup & restore from On-Premise to Azure is a feature introduced with SQL Server 2012 SP1 CU2. In the past, it could be used with these three tools:

  • Transact-SQL (T-SQL)
  • PowerShell
  • SQL Server Management Objects (SMO)

With SQL Server 2014, backup & restore can also be enabled via SQL Server Management Studio (SSMS).

Oracle EM agent 12c thread leak on RAC

Thu, 2014-07-17 22:24

In a previous post about nproc limit, I wrote that I had to investigate the nproc limit with the number of threads because my Oracle 12c EM agent was having thousands of threads. This post is a short feedback about this issue and the way I have found the root cause. It concerns the enterprise manager agent 12c on Grid Infrasctructure >=



The issue was:


ps -o nlwp,pid,lwp,args -u oracle | sort -n
   1  8444  8444 oracleOPRODP3 (LOCAL=NO)
   1  9397  9397 oracleOPRODP3 (LOCAL=NO)
   1  9542  9542 oracleOPRODP3 (LOCAL=NO)
   1  9803  9803 /u00/app/oracle/product/agent12c/core/ /u00/app/oracle/product/agent12c/core/ agent /u00/app/oracle/product/agent12c/agent_inst/sysman/log/emagent.nohup
  19 11966 11966 /u00/app/11.2.0/grid/bin/oraagent.bin
1114  9963  9963 /u00/app/oracle/product/agent12c/core/ ... emagentSDK.jar oracle.sysman.gcagent.tmmain.TMMain


By default ps has only one entry per process, but each processes can have several threads - implemented on linux as light-weight process (LWP). Here, the NLWP column shows that I have 1114 threads for my EM 12c agent - and it was increasing every day until it reached the limit and the node failed ('Resource temporarily unavailable').

The first thing to do is to know what those threads are. The ps entries do not have a lot of information, but I discovered jstack which every java developer should know, I presume. You probably know that java has very verbose (lengthy) stack traces. Jstack was able to show me thousands of them in only one command:


$ jstack 9963
2014-06-03 13:29:04
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.14-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f3368002000 nid=0x4c9b waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CRSeOns" prio=10 tid=0x00007f32c80b6800 nid=0x3863 in Object.wait() [0x00007f31fe11f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at oracle.eons.impl.NotificationQueue.internalDequeue(
	- locked  (a java.lang.Object)
	at oracle.eons.impl.NotificationQueue.dequeue(
	at oracle.eons.proxy.impl.client.base.SubscriberImpl.receive(
	at oracle.eons.proxy.impl.client.base.SubscriberImpl.receive(
	at oracle.eons.proxy.impl.client.ProxySubscriber.receive(
	at oracle.sysman.db.receivelet.eons.EonsMetric.beginSubscription(
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
	at java.util.concurrent.ThreadPoolExecutor$
	at oracle.sysman.gcagent.util.system.GCAThread$

I don't paste all of them here. We have the 'main', we have a few GCs and 'Gang workers' which are present in all JVMs and we have a few enterprise manager threads. And what was interesting was that I had thousands of "CRSeOns" that seemed to be increasing.

Some guesses: I'm on RAC, and I have a 'ons' resource and the EM agent tries to subscribe to it. Goggle search returned nothing, and that's the reason I put that in a blog post now. Then I searched MOS, and bingo, there is a note: Doc ID 1486626.1. It has nothing to do with my issue, but has an interesting comment in it:

In cluster version and higher, the ora.eons resource functionality has been moved to EVM. Because of this the ora.eons resource no longer exists or is controlled by crsctl.

It also explains how to disable EM agent subscription:

emctl setproperty agent -name disableEonsRcvlet -value true

I'm in and I have thousands of threads related to a functionality that doesn't exist anymore. And that leads to some failures in my 4 nodes cluster.

The solution was simple: disable it.

For a long time I have seen a lot of memory leaks or CPU usage leaks related to the enterprise manager agent. With this new issue, I discovered a thread leak and I also faced a SR leak when trying to get support for the 'Resource temporarily unavailable' error, going back and forth between OS, Database, Cluster and EM support teams...

SQL Server Perfmon does not start automatically

Mon, 2014-07-14 19:28

I have recently used perfmon (performance monitor) at a customer site. I created a Data Collector Set to monitor CPU, Memory, Disk, and Network during one day. Then, I ran the monitor and I received a "beautiful" error message…

Partial Join Evaluation in Oracle 12c

Sun, 2014-07-13 21:21

Do you think that it's better to write semi-join SQL statements with IN(), EXISTS(), or to do a JOIN? Usually, the optimizer will evaluate the cost and do the transformation for you. And in this area, one more transformation has been introduced in 12c which is the Partial Join Evaluation (PJE).

First, let's have a look at the 11g behaviour. For that example, I use the SCOTT schema, but I hire a lot more employees in departement 40:


SQL> alter table EMP modify empno number(10);
Table altered.
SQL> insert into EMP(empno,deptno) select rownum+10000,40 from EMP,(select * from dual connect by level


Why department 40? I'll explain it below, but I let you think about it before. In the default SCOTT schema, there is a department 40 in DEPT table, but which has no employees in EMP. And the new transformation is not useful in that case.


11g behaviour

Now, I'm running the following query to check all the departments that have at least one employee:

I can write it with IN:


SQL_ID  6y71msam9w32r, child number 0
select distinct deptno,dname from dept 
 where deptno in ( select deptno from emp)

Plan hash value: 1754319153

| Id  | Operation          | Name | Starts | E-Rows | A-Rows | Buffers |
|   0 | SELECT STATEMENT   |      |      1 |        |      4 |      15 |
|*  1 |  HASH JOIN SEMI    |      |      1 |      4 |      4 |      15 |
|   2 |   TABLE ACCESS FULL| DEPT |      1 |      4 |      4 |       7 |
|   3 |   TABLE ACCESS FULL| EMP  |      1 |  15068 |    388 |       8 |

Predicate Information (identified by operation id):

   1 - access("DEPTNO"="DEPTNO")


or with EXISTS:


SQL_ID  cbpa3zjtzfzrn, child number 0
select distinct deptno,dname from dept 
 where exists ( select 1 from emp where emp.deptno=dept.deptno)

Plan hash value: 1754319153

| Id  | Operation          | Name | Starts | E-Rows | A-Rows | Buffers |
|   0 | SELECT STATEMENT   |      |      1 |        |      4 |      15 |
|*  1 |  HASH JOIN SEMI    |      |      1 |      4 |      4 |      15 |
|   2 |   TABLE ACCESS FULL| DEPT |      1 |      4 |      4 |       7 |
|   3 |   TABLE ACCESS FULL| EMP  |      1 |  15068 |    388 |       8 |
Predicate Information (identified by operation id):

   1 - access("DEPTNO"="DEPTNO")


Both are good. We didn't have to read the whole EMP table. I have 15000 rows in my table, I do a full scan on it, but look at the A-Rows: only 388 rows were actually read.

The HASH JOIN first read the DEPT table in order to build the hash table. So it already knows that we cannot have more than 4 distinct departments.

Then we do the join to EMP just to check which of those departments have an employee. But we can stop as soon as we find the 4 departments. This is the reason why we have read only 388 rows here. And this is exactly what a Semi Join is: we don't need all the matching rows, we return at most one row per matching pair.

Ok. What if we write the join ourselves?


SQL_ID  2xjj9jybqja87, child number 1
select distinct deptno,dname from dept join emp using(deptno)

Plan hash value: 2962452962

| Id  | Operation           | Name | Starts | E-Rows | A-Rows | Buffers |
|   0 | SELECT STATEMENT    |      |      1 |        |      4 |     129 |
|   1 |  HASH UNIQUE        |      |      1 |  15068 |      4 |     129 |
|*  2 |   HASH JOIN         |      |      1 |  15068 |  14014 |     129 |
|   3 |    TABLE ACCESS FULL| DEPT |      1 |      4 |      4 |       7 |
|   4 |    TABLE ACCESS FULL| EMP  |      1 |  15068 |  14014 |     122 |

Predicate Information (identified by operation id):

   2 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")


Bad luck. We have to read all the rows. More rows and more buffers.


12c behaviour

Let's do the same in


SQL_ID  2xjj9jybqja87, child number 0
select distinct deptno,dname from dept join emp using(deptno)

Plan hash value: 1629510749

| Id  | Operation           | Name | Starts | E-Rows | A-Rows | Buffers |
|   0 | SELECT STATEMENT    |      |      1 |        |      4 |      14 |
|   1 |  HASH UNIQUE        |      |      1 |      4 |      4 |      14 |
|*  2 |   HASH JOIN SEMI    |      |      1 |      4 |      4 |      14 |
|   3 |    TABLE ACCESS FULL| DEPT |      1 |      4 |      4 |       7 |
|   4 |    TABLE ACCESS FULL| EMP  |      1 |  15068 |    388 |       7 |

Predicate Information (identified by operation id):

   2 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")


Same plan but less rows have been read. If we check the outlines, we see the new feature:


      PARTIAL_JOIN(@"SEL$58A6D7F6" "EMP"@"SEL$1")


And here is what we see in the optimizer trace:


PJE: Checking validity of partial join eval on query block SEL$58A6D7F6 (#1)
PJE: Passed validity of partial join eval by query block SEL$58A6D7F6 (#1)
PJE: Partial join eval conversion for query block SEL$58A6D7F6 (#1).
PJE: Table marked for partial join eval: EMP[EMP]#1


The hints that controls the feature are PARTIAL_JOIN and NO_PARTIAL_JOIN and there are enabled by _optimizer_partial_join_eval which appeared in 12c.

But of course, the optimization is useful only when we have all the values at the beginning of the table. This is why I added at least one employee in department 40. If there are some rows in DEPT that have no matching row in EMP, then Oracle cannot know the result before reaching the end of the table.

Master Data Services installation for SQL Server 2012

Sun, 2014-07-13 20:32

This posting is a tutorial for installing Master Data Services on your Windows Server 2012. Microsoft SQL Server Master Data Services (MDS) is a Master Management product from Microsoft, code-named Bulldog. It is the rebranding of the Stratature MDM product, titled +EDM and acquired in June 2007 by Microsoft. Initially, it was integrated for the first time in Microsoft SQL Server 2008 as an additional installer. But since SQL Server 2012, Master Data Services is integrated as a feature within the SQL Server installer.



Master Data Services is part of the Enterprise Information Management (EMI) technologies, provided by Microsoft, for managing information in an enterprise.

EMI technologies include:

  • Integration Services
  • Master Data Services
  • Data Quality Services



Master Data Services covers five main components:

  • MDS Configuration Manager tool: used to configure Master Data Services
  • MDS Data Manager Web Application: used essentially to perform administrative tasks
  • MDS Web Service: used to extend or develop custom solutions
  • MDS Add-in for Excel: used to manage data, create new entities or attributes …


SQL Server Editions & Versions

Master Data Services can be installed only with the following SQL Server Editions & Versions:

  • SQL Server 2008 R2 edition: Datacenter or Enterprise versions
  • SQL Server 2012 or SQL Server 2014 editions: Enterprise or BI versions


Master Data Services prerequisites in SQL Server 2012

First, Master Data Services is based on an application web named Master Data Manager Web Application, in order to perform administrative task, for example. This web application is hosted by Internet Information Services (IIS), so it is a necessary prerequisite.

Furthermore, to be able to display the content from the web application, you need Internet Explorer 7 or later (Internet Explorer 6 is not supported) with Silverlight 5.

Moreover, if you planned to use Excel with Master Data Services, you also need to install Visual Studio 2010 Tools for Office Runtime, plus the Master Data Services Add-in for Microsoft Excel.

Finally, often forgotten, but PowerShell 2.0 is required for Master Data Services.

 Let’s resume the requirements for Master Data Services:

  • Internet Information Services (IIS)
  • Internet Explorer 7 or later
  • Silverlight 5
  •  PowerShell 2.0
  • Visual Studio 2010 Tools for Office Runtime and Excel Add-in for Microsoft Excel (only if you plan to use Excel with Master Data Services).


Configuration at the Windows Server level

In the Server Manager, you have to activate the Web Server (IIS) Server Roles to be able to host the Master Data Web Application, as well as the .Net 3.5 feature.

For the Server Roles, you have to select:

  • Web Server (IIS)

For the Server Features, you have to select:

- .NET Framework 3.5 Features
  - .NET Framework 3.5
  - HTTP Activation
- .NET Framework 4.5 features
  - .NET Framework 4.5
  - ASP.NET 4.5
  - WCF Services
    - HTTP Activation
    - TCP Port Sharing




For the IIS features selection, you have to select:

- Web Server
  - Common HTTP Features
    - Default Document
    - Directory Browsing
    - HTTP Errors
    - Static Content
  - Health and Diagnostics
    - HTTP Logging
    - Request Monitor
  - Performance
    - Static Content Compression
  - Security
    - Request Filtering
    - Windows Authentication
  - Application Development
    - .NET Extensibility
    - .NET Extensibility 4.5
    - ASP.NET 3.5
    - ASP.NET 4.5
    - ISAPI Extensions
    - ISAPI Filters
  - Management Tools
    - IIS Management Console





Installation of SQL Server 2012

Master Data Services stores its data on a SQL Server database, so you need a SQL Server Engine installed.

Of course, SQL Server Engine can be installed on a different Windows Server. So the Windows Server with Master Data Services installed is used as a Front Server.

Then, in order to personalize the roles of your Master Data Services, you also need to install Management Tools.

At the features installation step, you have to select:

  • Database Engine Services
  • Management Tools
  • Master Data Services



At this point, Master Data Services should be installed with all the needed prerequisites.



However, Master Data Services cannot be used without configuring it. Three main steps need to be performed through the MDS Configuration Manager:

  • First, you have to create a MDS database
  • Then, you have to create a MDS web application hosted in IIS
  • Finally, you have to link the MDS database with the MDS web application

SQL Server 2014: Are DENY 'SELECT ALL USERS SECURABLES' permissions sufficient for DBAs?

Wed, 2014-07-02 20:09

SQL Server 2014 improves the segregation of duties by implementing new server permissions. The most important is the SELECT ALL USERS SECURABLES permission that will help to restrict database administrators from viewing data in all databases.

My article is a complement to David Barbarin's article 'SQL Server 2014: SELECT ALL USERS SECURABLES & DB admins'.

Java Mission Control 5.2 (7u40) deserves your attention

Mon, 2014-06-30 20:38

Recently, some new versions of java were made available. Most people think Java updates are boring and only security-oriented. But one of the last updates (7u40) includes a feature which deserves attention. I mean Java Mission Control 5.2.


Hotspot incoming

If you know the Oracle JRockit JVM a little bit, you might have heard about JMC, which was called JRockit Mission Control in the past. In fact, it’s a tool suite embedded with the Hotspot JDK since this so called 7u40 release which allows to monitor and profile your JVM.

Previously, it was only available for JRockit JVMs and it has now been ported to the Hotspot. JMC is a way more accurate and complete than JConsole or other embedded tools. It does not affect JVM performances more than 1%.




Mission Completed

JMC gathers low level information thanks to its Flight Recorder tool which listens and waits for internal events. It can then monitor, manage, profile, and eliminate memory leaks from the JVM.

The new version of JMC now has a new browser with subnodes for available server side services with their states. It is supported by Eclipse 3.8.2/4.2.2 and later, it allows deeper management of MBeans, especially setting values directly in the attribute tree. It converges with JRockit event management: All information provided by JRockit is now available in the Hotspot as well.




To enable JMC, you will have to add the following arguments to the JVM:





Understanding JVM Java memory leaks

Thu, 2014-06-26 21:38

Lots of people think that Java is free of memory management as the JVM uses a Garbage Collector. The goal of this collector is to free objects that are no longer used in the program without the developer being forced to declare that the object can be collected. Everything is automatic.

It’s really helpful and it avoids wasting time in managing memory. But, as it is an automatic process, it can produce some issues. They are better known as memory leaks.


What is a memory leak?

A leak appears when an object is no longer used in the program but is still referenced somewhere at a location that is not reachable. Thus, the garbage collector cannot delete it. The memory space used for this object will not be released and the total memory used for the program will grow. This will degrade performances over time and the JVM may run out of memory.


Identifying memory leaks

Memory leaks are difficult to identify, they require a good knowledge of the application. Usually, they are related to an Out Of Memory Error exception (also called OOM). But note that not all Out Of Memory Errors imply memory leaks, and not all memory leaks are due to Out Of Memory Errors.

Having an Out Of Memory Error is a first sign but you must make a difference between a “normal” OOM and a memory leak.

For example, if the program loads data from external files and one time a file is bigger than it has been expected it could result in an OOM. But this one is “normal” as the design of the application was not able to handle such a big file.

In the other hand, if the program is used to process data with the same size or which are similar to each other and you get an OOM it may be a memory leak. In addition a memory leak generally eats free space gradually, if your memory has been fulfilled suddenly it might be a “normal” OOM and you should look at the stack trace to find out the origin.

Another symptom of a memory leak would be an allocation issue. In fact when too much space is taken in the heap and not freed allocations of new objects may be complicated. Mainly if objects are huge. When an object is first created in memory it goes to a part of the heap called Eden, then if the object survives several garbage collections it goes to the tenured space. Usually there are only old objects in the tenured space; objects which are long life based. But when the Eden is fulfilled by leaks the objects directly go to the tenured space and you can see an abnormal stack of objects in the tenured space as it is generally not fulfilled. So if you have an OOM of a tenured space it might be an allocation issue (or simply your heap configuration is too low).

The heap usage can be checked with tools such as VisualVM, jconsole, Java Mission Control (only for 7u40+ JRE) and so on. I will describe the use of this kind of tools in future blogs.


How to prevent memory leaks

As said before you cannot really prevent memory leaks as this is related to the design of the application. If you are not aware of the type of application you are running or the internal design it uses, you cannot prevent leaks.

Many applications, either desktop built or web-based, will use many threads to run. Some threads such as ThreadLocal can store references to objects which reference their classes which reference their class loader. These threads can keep objects references in order to use objects later in the thread as in methods without passing it as argument. This way, when a web app is redeployed (e.g. in tomcat) a new class loader is created to load the new application but LocalThreads might not be dealocated as they are loaded or they use things from the permgen. As you may know the permgen is a part of the JVM heap which is usually not collected by the GC. And this produces memory leaks.

What you have to remember is to reduce the use of such a threads and be careful of the implementation and design of the application you are using. Of course, memory leaks can result from other reasons but the principle is the same: Objects allocated are no longer reachable and they are not released by the garbage collector.

Oracle Parallel Query: Did you use MapReduce for years without knowing it?

Thu, 2014-06-26 06:42

I've read this morning that MapReduce is dead. The first time I heard about MapReduce was when a software architect proposed to stop writing SQL on Oracle Database and replace it with MapReduce processing. Because the project had to deal with a huge amount of data in a small time and they had enough budget to buy as many cores as they need, they wanted the scalability of parallel distributed processing.

The architect explained how you can code filters and aggregations in Map & Reduce functions and then distribute the work over hundreds of CPU cores. Of course, it's very interesting, but it was not actually new. I was doing this for years on Oracle with Parallel Query. And not only filters and aggregations, but joins as well - and without having to rewrite the SQL statements.

I don't know if MapReduce is dead, but for 20 years we are able to just flip a switch (ALTER TABLE ... PARALLEL ...) and bring scalability with parallel processing. Given that we understand how it works.

Reading a parallel query execution plan is not easy. In this post, I'll just show the basics. If you need to go further, you should have a look at some Randolf Geist presentations and read his Understanding Parallel Execution article. My goal is not to go very deep, but only to show that it is not that complex.

I'll explain how Parallel query works by showing an execution plan for a simple join between DEPT and EMP tables where I want to read EMP in parallel - and distribute the join operation as well.

For the fun of it, and maybe because it's easier to read at the first time, I've done the execution plan on an Oracle 7.3.3 database (1997):




Let's start by the end. I want to read the EMP table by several processes (4 processes because I've set the parallel degree to 4 on table EMP). The table is not partitioned. It is a heap table where rows are scattered into the segment without any specific clustering. So each process will process an arbitrary range of blocks and this is why you see an internal query filtering on ROWID between :1 and :2. My session process, which is known as the 'coordinator', and which will be represented in green below, has divided the range of rowid (it's a full table scan, that reads all blocks from start to high water mark) and has mandated 4 'producer' processes to do the full scan on their part. Those producers are represented in dark blue below.

But then there is a join to do. The coordinator could collect all the rows from the 'producer' processes and do the join, but that is expensive and not scalable. We want the join to be distributed as well. Each producer process can read the DEPT table and do the join, which is fine if it is a small table only. But anyway, we don't want the DEPT table to be read in parallel because we have not set a parallel degree on it. So the EMP table will be read by only one process: my session process, which does all the no-parallel (aka the serial) things in addition to its 'coordinator' role.

Then we have a new set of 4 processes that will do the Hash Join. They need some rows from DEPT and they need some rows from EMP. They are the 'consumer' processes that will consume rows from 'producers', and are represented in pink below. And they don't need them randomly. Because it is a join, each 'consumer' process must have the pairs of rows that match the join columns. In the plan above, you see an internal query on internal 'table queue' names. The parallel full scan on EMP distributes its rows: it's a PARALLEL_TO_PARALLEL distribution, the parallel producers sending their rows to parallel consumers. The serial full scan on DEPT distributes its rows as well: it's a PARALLEL_FROM_SERIAL distribution, the parallel consumers receiving their rows from the serial coordinator process. The key for both distributions are given by a hash function on the join column DEPTNO, so that rows are distributed to the 4 consumer processes, but keeping same DEPTNO into the same process.

We have a group by operation that will be done in parallel as well. But the processes that do the join on DEPTNO cannot do the group by which is on others columns (DNAME,JOB). So we have to distribute the rows again, but this time the distribution key is on DNAME and JOB columns. So the join consumer processes are also producers for the group by operation. And we will have a new set of consumer processes that will do the join, in light blue below. That distribution is a PARALLEL_TO_PARALLEL as it distributes from 4 producers arranged by (DEPTNO) to 4 consumers arranged by (DNAME,JOB).

At the end only one process receives the result and sends it to the client. It's the coordinator which is 'serial'. So it's a PARALLEL_TO_SERIAL distribution.

Now let's finish with my Oracle 7.3.3 PLAN_TABLE and upgrade to 12c which can show more detailed and more colorful execution plans. See here on how to get it.

I've added some color boxes to show the four parallel distributions that I've detailed above:

  • :TQ10001 Parallel full scan of EMP distributing its rows to the consumer processes doing the join.
  • :TQ10000 Serial full scan of DEPT distributing its rows to the same processes, with the same hash function on the join column.
  • :TQ10002 The join consumer receiving both, and then becoming the producer to send rows to the consumer processes doing the group by
  • :TQ10003 Those consumer processes doing the group by and sending the rows to the coordinator for the final result.



So what is different here?

First we are in 12c and the optimizer may choose to broadcast all the rows from DEPT instead of the hash distribution. It's the new HYBRID HASH distribution. That decision is done when there are very few rows and this is why they are counted by the STATISTICS COLLECTOR.

We don't see the predicate on rowid ranges, but the BLOCK ITERATOR is there to show that each process reads its range of blocks.

And an important point is illustrated here.

Intra-operation parallelism can have a high degree (here I've set it to 4 meaning that each parallel operation can be distributed among 4 processes). But Inter-operation parallelism is limited to one set of producer sending rows to one set of consumers. We cannot have two consumer operations at the same time. This is why the :TQ0001 and the :TQ10003 have the same color: it's the same processes that act as the EMP producer, and then when finished, then are reused as the GROUP BY consumer.

And there are additional limitations when the coordinator is also involved in a serial operation. For those reasons, in a parallel query plan, some non-blocking operations (those that can send rows above on the fly as they receive rows from below) have to buffer the rows before continuing. Here you see the BUFFER SORT (which buffers but doesn't sort - the name is misleading) which will keep all the rows from DEPT in memory (or tempfiles when it's big).

Besides the plan, SQL Monitoring show the activity from ASH and the time spent in each parallel process:




My parallel degree was 4 so I had 9 processes working on my query: 1 coordinator, two sets of 4 processes. The coordinator started to distribute the work plan to the other processes, then had to read DEPT and distribute its rows, and when completed it started to receive the result and send it to the client. The blue set of processes started to read EMP and distribute its rows, and when completed was able to process the group by. The red set of processes has done the join. The goal is to have the DB time distributed on all the processes running in parallel, so that the response time is equal to the longest one instead of the total. Here, it's the coordinator which has taken 18 milliseconds. The query duration was 15 milliseconds:




This is the point of parallel processing: we can do a 32 ms workload in only 15 ms. Because we had several cpu running at the same time. Of course we need enough resources (CPU, I/O and temp space). It's not new. We don't have to define complex MapReduce functions. Just use plain old SQL and set a parallel degree. You can use all the cores in your server. You can use all the servers in your cluster. If you're I/O bound on the parallel full scans, you can even use your Exadata storage cells to offload some work. And in the near future the CPU processing will be even more efficient, thanks to in-memory columnar storage.

Linux: how to monitor the nofile limit

Wed, 2014-06-18 01:47

In a previous post I explained how to measure the number of processes that are generated when a fork() or clone() call checks the nproc limit. There is another limit in /etc/limits.conf - or in /etc/limits.d - that is displayed by 'ulimit -n'. It's the number of open files - 'nofile' - and here again we need to know what kind of files are counted.



'nofile' is another limit that may not be easy to monitor, because if you just count the 'lsof' output you will include a lot of lines which are not file descriptors. So how can we count the number of files descriptors in a process?



'lsof' is a utility that show all the open files. Let's take an example:

I get the pid of my pmon process:

[oracle@VM211 ulimit]$ ps -edf | grep pmon
oracle   10586     1  0 19:21 ?        00:00:02 ora_pmon_DEMO
oracle   15494 15290  0 22:12 pts/1    00:00:00 grep pmon


And I list the open files for that process

[oracle@VM211 ulimit]$ lsof -p 10586
ora_pmon_ 10586 oracle  cwd  DIR  252,0      4096 /app/oracle/product/12.1/dbs
ora_pmon_ 10586 oracle  rtd  DIR  252,0      4096 /
ora_pmon_ 10586 oracle  txt  REG  252,0 322308753 /app/oracle/product/12.1/bin/oracle
ora_pmon_ 10586 oracle  mem  REG   0,17   4194304 /dev/shm/ora_DEMO_150175744_0
ora_pmon_ 10586 oracle  mem  REG   0,17   4194304 /dev/shm/ora_DEMO_150208513_0
ora_pmon_ 10586 oracle  mem  REG   0,17   4194304 /dev/shm/ora_DEMO_150208513_1
ora_pmon_ 10586 oracle  mem  REG   0,17   4194304 /dev/shm/ora_DEMO_150208513_2
ora_pmon_ 10586 oracle  mem  REG   0,17   4194304 /dev/shm/ora_DEMO_150208513_3
ora_pmon_ 10586 oracle  mem  REG   0,17   4194304 /dev/shm/ora_DEMO_150208513_4
ora_pmon_ 10586 oracle  mem  REG   0,17   4194304 /dev/shm/ora_DEMO_150208513_5
ora_pmon_ 10586 oracle  mem  REG  252,0   1135194 /app/oracle/product/12.1/lib/
ora_pmon_ 10586 oracle  mem  REG  252,0   6776936 /app/oracle/product/12.1/lib/
ora_pmon_ 10586 oracle  mem  REG  252,0     14597 /app/oracle/product/12.1/lib/
ora_pmon_ 10586 oracle    0r CHR    1,3       0t0 /dev/null
ora_pmon_ 10586 oracle    1w CHR    1,3       0t0 /dev/null
ora_pmon_ 10586 oracle    2w CHR    1,3       0t0 /dev/null
ora_pmon_ 10586 oracle    3r CHR    1,3       0t0 /dev/null
ora_pmon_ 10586 oracle    4r REG  252,0   1233408 /app/oracle/product/12.1/rdbms/mesg/oraus.msb
ora_pmon_ 10586 oracle    5r DIR    0,3         0 /proc/10586/fd
ora_pmon_ 10586 oracle    6u REG  252,0      1544 /app/oracle/product/12.1/dbs/hc_DEMO.dat
ora_pmon_ 10586 oracle    7u REG  252,0        24 /app/oracle/product/12.1/dbs/lkDEMO_SITE1
ora_pmon_ 10586 oracle    8r REG  252,0   1233408 /app/oracle/product/12.1/rdbms/mesg/oraus.msb

I've removed hundreds of lines with FD=mem and size=4M. I'm in AMM with memory_target=800M and SGA is implemented in /dev/shm granules. With lsof, we see all of them. And with a large memory_target we can have thousands of them (even if granule becomes 16M when memory_target is larger than 1GB). But don't worry, they don't count in the 'nofile' limit. Only 'real' file descriptors are counted - those with a numeric FD.

So, if you want to know the processes that are near the limit, you can use the following:

[oracle@VM211 ulimit]$ lsof | awk '$4 ~ /[0-9]+[rwu -].*/{p[$1"\t"$2"\t"$3]=p[$1"\t"$2"\t"$3]+1}END{for (i in p) print p[i],i}' | sort -n | tail
15 ora_dmon_    10634   oracle
16 ora_dbw0_    10608   oracle
16 ora_mmon_    10626   oracle
16 ora_rsm0_    10722   oracle
16 tnslsnr      9785    oracle
17 automount    1482    root
17 dbus-daem    1363    dbus
20 rpc.mount    1525    root
21 ora_lgwr_    10610   oracle
89 master       1811    root


The idea is to filter the output of lsof and use awk to keep only the numeric file descriptors, and aggregate per process. Then, we sort them and show the highest counts. Here the Postfix master process has 89 files open. Then log writer follows.

You can get the same information from /proc filesystem where files handles are in /proc//fd:

for p in /proc/[0-9]* ; do echo $(ls $p/fd | wc -l) $(cat $p/cmdline) ; done | sort -n | tail
15 ora_dmon_DEMO
16 ora_dbw0_DEMO
16 ora_mmon_DEMO
16 ora_rsm0_DEMO
16 /app/oracle/product/12.1/bin/tnslsnrLISTENER-inherit
17 automount--pid-file/var/run/
17 dbus-daemon--system
20 rpc.mountd
21 ora_lgwr_DEMO
89 /usr/libexec/postfix/master


Same result, much quicker and more information about the process. This is the way I prefer, but remember that if you want to see all processes, you should be logged as root.


The proof

As I did for nproc, I have written a small C program that open files (passed as arguments) for a few seconds, so that I'm sure I'm counting the right things.

And I encourage to do the same on a test system and let me know if your result differs. Here is the source:

First, I set my nofile limit to only 10

ulimit -n 10


Then, let's open 7 files. In addition with stdin, stdout and stderr we will have 10 file handles:

[oracle@VM211 ulimit]$ ./openfiles myfile1.tmp myfile2.tmp myfile3.tmp myfile4.tmp myfile5.tmp myfile6.tmp myfile7.tmp &
open file 1 of 7 getrlimit nofile: soft=10 hard=10 myfile1.tmp
open file 2 of 7 getrlimit nofile: soft=10 hard=10 myfile2.tmp
open file 3 of 7 getrlimit nofile: soft=10 hard=10 myfile3.tmp
open file 4 of 7 getrlimit nofile: soft=10 hard=10 myfile4.tmp
open file 5 of 7 getrlimit nofile: soft=10 hard=10 myfile5.tmp
open file 6 of 7 getrlimit nofile: soft=10 hard=10 myfile6.tmp
open file 7 of 7 getrlimit nofile: soft=10 hard=10 myfile7.tmp


I was able to open those 7 files. Then I check lsof:

[oracle@VM211 ulimit]$ lsof | grep openfiles
openfiles 21853    oracle  cwd       DIR  0,24    380928    9320 /tmp/ulimit
openfiles 21853    oracle  rtd       DIR 252,0      4096       2 /
openfiles 21853    oracle  txt       REG  0,24      7630    9494 /tmp/ulimit/openfiles
openfiles 21853    oracle  mem       REG 252,0    156928 1579400 /lib64/
openfiles 21853    oracle  mem       REG 252,0   1926800 1579401 /lib64/
openfiles 21853    oracle    0u      CHR 136,1       0t0       4 /dev/pts/1
openfiles 21853    oracle    1u      CHR 136,1       0t0       4 /dev/pts/1
openfiles 21853    oracle    2u      CHR 136,1       0t0       4 /dev/pts/1
openfiles 21853    oracle    3r      REG  0,24         0    9487 /tmp/myfile1.tmp
openfiles 21853    oracle    4r      REG  0,24         0    9488 /tmp/myfile2.tmp
openfiles 21853    oracle    5r      REG  0,24         0    9489 /tmp/myfile3.tmp
openfiles 21853    oracle    6r      REG  0,24         0    9490 /tmp/myfile4.tmp
openfiles 21853    oracle    7r      REG  0,24         0    9491 /tmp/myfile5.tmp
openfiles 21853    oracle    8r      REG  0,24         0    9492 /tmp/myfile6.tmp
openfiles 21853    oracle    9r      REG  0,24         0    9493 /tmp/myfile7.tmp


We see our 10 file handles and this proves that only numeric FD are counted when checking the nofile limit of 10. You see stdin, stdout, stderr as FD 0,1,2 and then my 7 files opened in read only.

Let's try to open one more file:

[oracle@VM211 ulimit]$ ./openfiles myfile1.tmp myfile2.tmp myfile3.tmp myfile4.tmp myfile5.tmp myfile6.tmp myfile7.tmp myfile8.tmp
open file 1 of 8 getrlimit nofile: soft=10 hard=10 myfile1.tmp
open file 2 of 8 getrlimit nofile: soft=10 hard=10 myfile2.tmp
open file 3 of 8 getrlimit nofile: soft=10 hard=10 myfile3.tmp
open file 4 of 8 getrlimit nofile: soft=10 hard=10 myfile4.tmp
open file 5 of 8 getrlimit nofile: soft=10 hard=10 myfile5.tmp
open file 6 of 8 getrlimit nofile: soft=10 hard=10 myfile6.tmp
open file 7 of 8 getrlimit nofile: soft=10 hard=10 myfile7.tmp
open file 8 of 8 getrlimit nofile: soft=10 hard=10 myfile8.tmp
fopen() number 8 failed with errno=24


Here the limit is reached and the open() call returns error 24 (ENFILE) because we reached the nofile=10.



When counting the processes for the nproc limit, we have seen that threads must be counted as processes. For the nofile limit we don't need to detail the threads because all threads share the file descriptor table.


Recommended values

Currently this is what is set on Oracle linux 6 for 11gR2 (in /etc/security/limits.conf):

oracle   soft   nofile    1024
oracle   hard   nofile    65536


For 12c, these are set in /etc/security/limits.d/oracle-rdbms-server-12cR1-preinstall.conf which overrides /etc/security/limits.conf:

oracle soft nofile 1024
oracle hard nofile 65536


Do you think it's a bit low? Just for information, here is what is set in the ODA X4-2:

oracle soft nofile 131072


In any case, it is a good idea to check if you are reaching the limit and the above scripts on lsof or /proc should help for that.

SQL Server: How to find the default data path?

Sun, 2014-06-15 18:16

I have read a lot of SQL Server blog postings and articles in order to find the default data path. This post covers different SQL Server versions (SQL Server 2012, SQL Server 2014, SQL Server 2008, SQL Server 2005) and provides a generic script with different methods.

SQL Server 2014: sys.dm_exec_query_profiles, a new promising feature

Fri, 2014-06-13 03:52

Among the hidden features provided by SQL Server 2014 there is a very interesting dynamic management view named sys.dm_exec_query_profiles. This feature will help the database administrators and experienced developers to troubleshoot long running queries in real-time. I'm pretty sure you had to deal with the following questions: When will this query finish? What percentage of total workload is performed by this request? Which steps are the longest? Before SQL Server 2014 it was impossible to answer the questions above!

But first, I have to admit that this new DMV has raised of lot of questions during my tests. I will try to share my thoughts and findings with you in this post.

My test consisted of running this long query:


select        YEAR(TransactionDate) AS year_tran,        MONTH(TransactionDate) AS month_tran,        FIRST_VALUE(p.ProductNumber) OVER (PARTITION BY YEAR(TransactionDate), MONTH(TransactionDate) ORDER BY TransactionDate) AS first_product_nb,        LAST_VALUE(p.ProductNumber) OVER (PARTITION BY YEAR(TransactionDate), MONTH(TransactionDate) ORDER BY TransactionDate) AS last_product_nb from AdventureWorks2012.dbo.bigTransactionHistory as a        join AdventureWorks2012.dbo.bigProduct as p              on a.ProductID = p.ProductID


On my computer this query takes 05’’47’ to run. The parallelism is enabled. My SQL Server instance can use up to 8 processors. The first time I was disappointed by viewing the output of the sys.dm_exec_query_profiles DMV during my query execution:


select * from sys.dm_exec_query_profiles




… No output! Ok what are we supposed to do to see something with sys.dm_exec_query_profiles ? Laughing  Keep good habits and go back to the SQL Server BOL that says:

To serialize the requests of sys.dm_exec_query_profiles and return the final results to SHOWPLAN XML, use SET STATISTICS PROFILE ON; or SET STATISTICS XML ON;

In other words, to have a chance to see something with this DMV we must use some session options like STATISTICS PROFILE, STATISTICS XML or force SQL Server to display the execution plan after execution query with SQL Server Management Studio. My first though was: why do we have to use some additional options to produce some output for this new DMV? These constraints severely limit the scenarios where we can use this DMV … After some reflexions and discussions with others French MVPs we can think that is normal because tracking the execution plan information is expensive and we could potentially bring a production server to its knees by enabling this feature for all the workload statements. But using additional session options can be impractical in production environments because it requires to execute itself the concerned query and it is not always possible. Fortunately Microsoft provides the query_post_execution_showplan event which can be used into a SQL Server trace or an extended event session. But the implementation design of this event has a significant impact to the performance of a SQL Server instance. Indeed, even with a short-circuit predicate this event will be triggered each time a SQL statement will be executed because the query duration is not known ahead of time (please refer to this Microsoft connect item). To summarize, using this event in OLTP production environment should be used in a short period for troubleshooting purposes. In OLAP environment the story is not the same because we don’t have to deal with a lot of short queries but only with long running queries issued by cube processing or ETL processes for example.

After discussing the pros and cons of this new feature let’s start with my precedent T-SQL statement and the use of the session option: SET STATISTICS PROFILE ON


SET STATISTICS PROFILE ON;   select        YEAR(TransactionDate) AS year_tran,        MONTH(TransactionDate) AS month_tran,        FIRST_VALUE(p.ProductNumber) OVER (PARTITION BY YEAR(TransactionDate), MONTH(TransactionDate) ORDER BY TransactionDate) AS first_product_nb,        LAST_VALUE(p.ProductNumber) OVER (PARTITION BY YEAR(TransactionDate), MONTH(TransactionDate) ORDER BY TransactionDate) AS last_product_nb from AdventureWorks2012.dbo.bigTransactionHistory as a        join AdventureWorks2012.dbo.bigProduct as p              on a.ProductID = p.ProductID


In other session:


select * from sys.dm_exec_query_profiles




This DMV provides a lot of useful information. These information are more granular than SET STATISTICS IO because  the counters returned are per operator per thread (node_id / physical_operator_name and thread_id columns).  In my case the SQL Server instance can use up to 8 processors and we can notice that for some operators (node_id) we have several threads using in parallel (thread_id). Furthermore two others columns are interesting like estimate_row_count and row_count. The former is the number of estimated rows an operator should address and the latter is the current number of rows addressed by the same operator (remember these counters are per operator per thread …). We can compare these two columns to estimate a completion percentage per operator (or per operator per thread). Finally we can categorize some information provided by this DMV in two parts: information related either to the query execution engine (row_count, estimated_row_count) or the storage execution engine (logical_read_count, write_page_count, lob_logical_read_count etc.)

As you can see, this DMV provide a lot of information to us and you will certainly customize the final output yourself depending on what you want to focus on.

Here is an example of a custom script we can create that uses the sys.dm_exec_query_profiles, sys.objects, sys.indexes, sys.dm_exec_requets, sys.dm_exec_sessions, and sys.dm_os_waiting_tasks to cover both the real-time execution information and the related waits occurring in the same time.

USE AdventureWorks2012; GO   SELECT        qp.node_id,        qp.session_id, AS [object_name], AS index_name,        qp.physical_operator_name + QUOTENAME(CAST(COUNT(qp.thread_id) AS VARCHAR(4))) AS physical_operator_name,        SUM(qp.estimate_row_count) AS estimate_row_count,        SUM(qp.row_count) AS row_count,        CASE              WHEN SUM(qp.row_count) * 1. / SUM(qp.estimate_row_count + .00001) THEN CAST(CAST(SUM(qp.row_count) * 100. / SUM(qp.estimate_row_count + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %'              ELSE '100 % + (' + CAST(CAST((SUM(qp.row_count) * 100. / SUM(qp.estimate_row_count + .00001)) - 100 AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %)'        END AS [completed_%],        -- Query execution engine        CAST(CAST(SUM(qp.elapsed_time_ms) * 100. /(SUM(SUM(qp.elapsed_time_ms)) OVER() + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [total_elapsed_time_%],        CAST(CAST(SUM(qp.cpu_time_ms) * 100. /(SUM(SUM(qp.cpu_time_ms)) OVER() + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [total_cpu_%],        -- Storage engine        CAST(CAST(SUM(qp.logical_read_count) * 100. / SUM(SUM(qp.logical_read_count)) OVER() + .00001 AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [total_logical_read_%],        CAST(CAST(SUM(qp.physical_read_count) * 100. /(SUM(SUM(qp.physical_read_count)) OVER() + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [total_physical_read_%],        CAST(CAST(SUM(qp.lob_logical_read_count) * 100. /(SUM(SUM(qp.lob_logical_read_count)) OVER() + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [lob_logical_read_%],        CAST(CAST(SUM(qp.lob_physical_read_count) * 100. /(SUM(SUM(qp.lob_physical_read_count)) OVER() + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [lob_physical_read_%],        CAST(CAST(SUM(qp.write_page_count) * 100. /(SUM(SUM(qp.write_page_count)) OVER() + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [total_write_%],        SUM(qp.logical_read_count) AS total_logical_read_count,        SUM(qp.lob_logical_read_count) AS total_lob_logical_read_count,        SUM(qp.lob_physical_read_count) AS total_lob_physical_read_count,        SUM(qp.lob_read_ahead_count) AS total_lob_read_ahead_count,        SUM(qp.physical_read_count) AS total_physical_read_count,        SUM(qp.read_ahead_count) AS total_read_ahead_count,        SUM(qp.write_page_count) AS total_write_page_count,        -- Both        SUM(qp.cpu_time_ms) AS total_cpu_time_ms,        SUM(qp.elapsed_time_ms) AS total_elapsed_time_ms,        -- wait info information        COALESCE(wait.wait_info, '') AS wait_info FROM sys.dm_exec_query_profiles AS qp(NOLOCK)        LEFT JOIN sys.objects AS o(NOLOCK)              ON qp.object_id = o.object_id        LEFT JOIN sys.indexes AS i(NOLOCK)              ON qp.index_id = i.index_id                     AND qp.object_id = i.object_id OUTER APPLY (        SELECT        STUFF(                     (SELECT                                   ',' + ws.wait_type + ' ' + QUOTENAME(CAST(SUM(COALESCE(ws.wait_duration_ms, 0)) AS VARCHAR(20)) + ' ms')                            FROM sys.dm_exec_requests AS r(NOLOCK)                                   JOIN sys.dm_exec_sessions AS s(NOLOCK)                                         ON r.session_id = s.session_id                                   JOIN sys.dm_os_waiting_tasks AS ws(NOLOCK)                                         ON ws.session_id = s.session_id                            WHERE s.session_id = qp.session_id                            GROUP BY ws.wait_type                            FOR XML PATH (''), TYPE).value('.', 'varchar(max)') , 1, 1, ''              ) ) AS wait(wait_info) GROUP BY qp.node_id, session_id, qp.physical_operator_name,,, COALESCE(wait.wait_info, '') ORDER BY node_id OPTION(RECOMPILE);


The first section of the above script concerns the general information:




1: This above script groups the information per operator. The information in brackets is the number of thread used for the concerned operator.

2: Completion is provided as a ratio between rows addressed by the corresponding operator so far and the total estimated rows. As we can expect, the estimated rows and the real rows addressed by an operator can differ. The above script highlights this gap by showing the difference in percentage between brackets if row_count is greater than estimated_row_count. This information can be useful to “estimate” the general completion of the query.

However Total_elasped_time_% and total_cpu_% columns are the relative percentage of the elapsed time and cpu time consumed by all the query execution operators.

3: For some physical operators like Clustered Index Scan, Table Scan etc … we will show the related table and index objects


The second section concerns the storage engine statistic for the query execution:




The above information can be useful to highlight the top operators depending on the resource or the operation type. For example, in my case I can notice that the cluster index scan of the bigTransactionHistory table is the main consuming physical operator for logical reads. Furthermore, we can observe that the sort operator is the root cause of spilling. Finally as expected we can notice that by design hash match operator consumes a lot of CPU and it’s the top operator in this case.


The next section shows the absolute values per operator per type of resources and operation:




Finally the last section shows the wait types and aggregated duration by all related threads in brackets during the query execution in a real time:




In my case we retrieve the wait type related to a query execution in parallel  (CXPACKET) and certainly one of the root cause of the CXPACKET wait type (IO_COMPLETION) but here we don't have sufficient information to verify it. We can just notice that the query execution generates CXPACKET and IO_COMPLETION wait types in a real time.

At this point we can ask why we don’t have a remaining time information. In fact, computing an accurate remaining time is very difficult because a lot of factors must be correlated together. We can have a naive approach by computing the remaining time with the row_count, estimated_row_count and elapsed time columns for each operator, but you will be disappointed when testing this… According to my own tests, I can claim that the result is never as accurate as you would expect. Please feel free to share your results or thoughts about computing an accurate remaining time with us!

The sys.dm_exec_query_profiles DMV can be correlated by external tools like XPERF or Perfmon. I will prepare a next blog post (as soon as I have some time) to share my experience with you.

Happy query profiling!

Linux: how to monitor the nproc limit

Tue, 2014-06-10 01:14

You probably know about 'nproc' limits in Linux which are set in /etc/limits.conf and checked with 'ulimit -u'. But do you know how to handle the monitoring and be alerted when you're close the fixed limit?

Nproc and ps

Nproc is defined at OS level to limit the number of processes per user. Oracle documentation recommends the following:

oracle soft nproc 2047
oracle hard nproc 16384

But that is often too low, especially when you have the Enterprise Manager agent or other java programs running.

Do you want to check that you are far from the limit? then you can use 'ps'. But beware, 'ps' by default does not show all processes.
In Linux, when doing multithreading, each thread is implemented as a light-weight process (LWP). And you must use the '-L' to see all of them.

Let's take an example. I have a system where 'ps -u oracle' returns 243 lines. But including LWPs shows a lot more processes which is near the limit:

$ ps h -Led -o user | sort | uniq -c | sort -n
      1 dbus
      1 ntp
      1 rpc
      1 rpcuser
      2 avahi
      2 haldaemon
      2 postfix
    166 grid
    400 root
   1370 oracle

So the 'oracle' user has 1370 processes. That's high. And this is the actual number where the nproc limit applies.

'ps -Lf' can show the detail. And even without '-L' we can display the NLWP which is the number of threads per process:

ps -o nlwp,pid,lwp,args -u oracle | sort -n
   1  8444  8444 oracleOPRODP3 (LOCAL=NO)
   1  9397  9397 oracleOPRODP3 (LOCAL=NO)
   1  9542  9542 oracleOPRODP3 (LOCAL=NO)
   1  9803  9803 /u00/app/oracle/product/agent12c/core/ /u00/app/oracle/product/agent12c/core/ agent /u00/app/oracle/product/agent12c/agent_inst/sysman/log/emagent.nohup
  19 11966 11966 /u00/app/11.2.0/grid/bin/oraagent.bin
1114  9963  9963 /u00/app/oracle/product/agent12c/core/ ... emagentSDK.jar oracle.sysman.gcagent.tmmain.TMMain

The Oracle 12c EM agent has started 1114 threads and the grid infrastructure 'oraagent.bin' has 19 threads. In addition to that I've a lot of other monothreaded processes. This is how we reach 1370 which is the exact value to compare to the nproc limit.

So what are the good values to set? About the high number of threads for EM agent 12c, there are a few bugs. And I suspect that 1000 threads is too much, especially when checking them with 'jstack' I see that they are "CRSeOns" threads that should not be used in and higher. But that's another problem which I'm currently investigating. When you reach the nproc limit, the user will not be able to create new processes. clone() calls will return EAGAIN and that is reported by Oracle as:

ORA-27300: OS system dependent operation:fork failed with status: 11 
ORA-27301: OS failure message: Resource temporarily unavailable 

And that is clearly bad when it concerns an +ASM instance or archiver processes.

The goal of the nproc limit is only to prevent 'fork bombs' where a process forks forever and exhausts all resources. So there is no problem to increase this limit. However if you set it high for some users ('oracle' and 'grid' usually), it can be a good idea to monitor the number of processes with the ps h -L above. Because having too many processes is suspect and increasing the limit just hides a process leak and defer the failure.

In 'ps h -L -o' The argument 'h' is there to remove the header line, and '-L' to show all processes including LWP. Then you can count with 'wc -l'.

The proof

In order to be sure that 'ps h -L' gives the exact number, I have tested it. In case you want to check this on your system, here is how to do it. And please report any difference.

First, set your limit to 1024 processes. This is a limit for my user, and the limit is set for my shell and all its child processes:

[oracle@VM211 ocm]$ ulimit -u 1024

Now you can check it:

[oracle@VM211 ocm]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15919
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Then you can run a small C program ( that calls fork() in a loop until it fails with EAGAIN:

[oracle@VM211 ocm]$ ./testnproc
parent says fork number 871 sucessful
 child says fork number 872 pid 1518
parent says fork number 872 sucessful
 child says fork number 873 pid 1519
parent says fork number 873 sucessful
 child says fork number 874 pid 1520
parent says fork number 874 sucessful
parent says fork number 875 failed (nproc: soft=1024 hard=1024) with errno=11

And finally, because the processes sleep for a while, you can check how many processes you have. I do that from another user account for the simple reason that I need to create 2 more processes ('ps' and 'wc') for that:

[root@VM211 ocm]# ps h -Lu oracle | wc -l
Recommended values for Oracle

Currently this is what is set on Oracle linux 6 for 11gR2 by the preinstall package (in /etc/security/limits.conf):

oracle   soft   nproc    16384
oracle   hard   nproc    16384

For 12c, these are set in /etc/security/limits.d/oracle-rdbms-server-12cR1-preinstall.conf which overrides /etc/security/limits.conf:

oracle soft nproc 16384
oracle hard nproc 16384

And just for your information, here is what is set in the ODA X4-2:

oracle soft nproc 131072

So what do you want to set? You probably don't want it too low and experience 'resource temporarily unavailable'. But what you don't want either is 100000 processes on your server. So my recommendation is to set it high but monitor it when the number of processes reaches something that is not sensible. Then you prevent having the system down in case of process leak, but you can detect it and ask for a patch.

SQL Server 2014: Availability groups failover impossible with Failover Cluster Manager

Fri, 2014-06-06 01:27

A couple of weeks ago, I worked for a customer that wanted to implement SQL Server 2012 (and not SQL Server 214) AlwaysOn with availability groups. During our tests we performed a bunch of failover tests and the customer tried to perform a failover of one of the installed availability group by using the failover cluster manager (FCM). Of course, I told him this is not best practice because the failover cluster manager is not aware of the synchronization state of the availability group. But with SQL Server 2014, the story has changed because I noticed a different behavior. I would like to share this information with you in this posting.

But let me first demonstrate the SQL Server 2012 behavior with the following example:

I have an availability group named SQL12 configured with 2 synchronous replicas and automatic failover. However, as you can see, the synchronization state of my availability database is not healthy as shown below:




Now, if I try to failover my availability group using the failover cluster manager ...






.... the next available node is SQL143 ...






... and we can notice the failover did not occur as expected because the SQL12 resource is still on the SQL141 node as shown in the following picture:




Having a look at the cluster error log does not help us in this case because we have a classic 1069 error number without helpful information:





Generating detailed cluster error logs could help us but I prefer to directly look at the SQL Server side for the moment. The AlwaysOn_health extended event is a good start to check for some existing records associated to the problem.




We have indeed some information about the failover attempt. First the SQL141SQL12 replica state changed from PRIMARY_NORMAL to RESOLVING_NORMAL due to the manual failover issued by the FCM.




Then we can see an error message that explains that the dummy database is changing its role from PRIMARY to RESOLVING because there is a problem with a role synchronization. This error is issued by the forced failover of the availability group resource that I used.




Finally, we notice the failover process did not complete succesfully and the dummy database failbacks on the SQL141SQL12 replica (availability_replica_id 1B9007CA-EE3F-486D-A974-838EFED0203D associated to the SQL141SQL12 replica in my case)




On the other side, the SQL143SQL12 secondary replica also features a lot of useful information:






To summarize, this test demonstrates clearly that the FCM is not aware of the availability databases synchronization state inside an availability group. Using FCM may result in unintended outcomes, including unexpected downtime!

Now, it's time to test the same scenario with SQL Server 2014 and a configured availability group. During the failover attempt, I get the following error message:






If we take a look at the possible owners of the corresponding availability group resource, we can see that the current cluster node that hosts the availability group is the only possible owner - unlike SQL Server 2012.




As a reminder, possible and preferred owners are resetted during the availability group creation and failover based on the primary replica and its secondaries. For fun, I decided to compare the two processes issued by the both versions of SQL Server and availability groups (SQL Server 2012 and SQL Server 2014) after having triggered an availability group failover and generating the associated cluster logs:


Get-ClusterLog -UseLocalTime -Span 5

Result with SQL Server 2012

000005d4.00000778::2014/05/26-22:10:55.088 INFO [RCM] rcm::RcmApi::AddPossibleOwner: (sql12, 1) 000005d4.00000778::2014/05/26-22:10:55.088 INFO [GUM] Node 1: executing request locally, gumId:215, my action: /rcm/gum/AddPossibleOwner, # of updates: 1 000005d4.00000778::2014/05/26-22:10:55.088 INFO [RCM] rcm::RcmGum::AddPossibleOwner(sql12,1) 000005d4.00000778::2014/05/26-22:10:55.103 ERR   mscs::GumAgent::ExecuteHandlerLocally: (5010)' because of 'The specified node is already a possible owner.' 000005d4.00000778::2014/05/26-22:10:55.103 WARN [DM] Aborting group transaction 29:29:613+1 000005d4.00000778::2014/05/26-22:10:55.103 ERR   [RCM] rcm::RcmApi::AddPossibleOwner: (5010)' because of 'Gum handler completed as failed' 000005d4.00000778::2014/05/26-22:10:55.103 WARN [RCM] sql12 cannot be hosted on node 3 000005d4.00000778::2014/05/26-22:10:55.103 WARN [RCM] Possible owners: 000005d4.00000778::2014/05/26-22:10:55.103 WARN     2 000005d4.00000778::2014/05/26-22:10:55.103 WARN     1 000005d4.00000778::2014/05/26-22:10:55.103 WARN 000005d4.00000778::2014/05/26-22:10:55.103 INFO [RCM] rcm::RcmApi::RemovePossibleOwner: (sql12, 2) 000005d4.00000778::2014/05/26-22:10:55.103 INFO [GUM] Node 1: executing request locally, gumId:215, my action: /rcm/gum/RemovePossibleOwner, # of updates: 1 000005d4.00000778::2014/05/26-22:10:55.103 INFO [RCM] rcm::RcmGum::RemovePossibleOwner(sql12,2) 000005d4.00000778::2014/05/26-22:10:55.103 INFO [RCM] Removing node 2 from resource 'sql12'. 000005d4.00000778::2014/05/26-22:10:55.103 INFO [GEM] Node 1: Sending 1 messages as a batched GEM message 000005d4.00000778::2014/05/26-22:10:55.103 INFO [RCM] rcm::RcmApi::AddPossibleOwner: (sql12, 2) 000005d4.00000778::2014/05/26-22:10:55.103 INFO [GUM] Node 1: executing request locally, gumId:216, my action: /rcm/gum/AddPossibleOwner, # of updates: 1 000005d4.00000778::2014/05/26-22:10:55.103 INFO [RCM] rcm::RcmGum::AddPossibleOwner(sql12,2) 000005d4.00000778::2014/05/26-22:10:55.103 INFO [RCM] Adding node 2 to resource 'sql12'. 000005d4.00000778::2014/05/26-22:10:55.103 INFO [GEM] Node 1: Sending 1 messages as a batched GEM message 000005d4.00000778::2014/05/26-22:10:55.103 INFO [GUM] Node 1: executing request locally, gumId:217, my action: /rcm/gum/SetGroupPreferredOwners, # of updates: 1 000005d4.00000778::2014/05/26-22:10:55.103 INFO [RCM] rcm::RcmGum::SetGroupPreferredOwners(sql12, 000005d4.00000778::2014/05/26-22:10:55.103 INFO     1 000005d4.00000778::2014/05/26-22:10:55.103 INFO     2 000005d4.00000778::2014/05/26-22:10:55.103 INFO


Result with SQL Server 2014

000005d4.00000bb0::2014/05/26-22:14:54.578 INFO [RCM] rcm::RcmApi::AddPossibleOwner: (AdventureWorksGrp, 1) 000005d4.00000bb0::2014/05/26-22:14:54.578 INFO [GUM] Node 1: executing request locally, gumId:230, my action: /rcm/gum/AddPossibleOwner, # of updates: 1 000005d4.00000bb0::2014/05/26-22:14:54.578 INFO [RCM] rcm::RcmGum::AddPossibleOwner(AdventureWorksGrp,1) 000005d4.00000bb0::2014/05/26-22:14:54.578 ERR   mscs::GumAgent::ExecuteHandlerLocally: (5010)' because of 'The specified node is already a possible owner.' 000005d4.00000bb0::2014/05/26-22:14:54.578 WARN [DM] Aborting group transaction 29:29:627+1 000005d4.00000bb0::2014/05/26-22:14:54.578 ERR   [RCM] rcm::RcmApi::AddPossibleOwner: (5010)' because of 'Gum handler completed as failed' 000005d4.00000bb0::2014/05/26-22:14:54.578 WARN [RCM] AdventureWorksGrp cannot be hosted on node 3 000005d4.00000bb0::2014/05/26-22:14:54.578 WARN [RCM] Possible owners: 000005d4.00000bb0::2014/05/26-22:14:54.578 WARN     2 000005d4.00000bb0::2014/05/26-22:14:54.578 WARN     1 000005d4.00000bb0::2014/05/26-22:14:54.578 WARN 000005d4.00000c34::2014/05/26-22:14:54.578 INFO [RCM] rcm::RcmApi::RemovePossibleOwner: (AdventureWorksGrp, 2) 000005d4.00000c34::2014/05/26-22:14:54.578 INFO [GUM] Node 1: executing request locally, gumId:230, my action: /rcm/gum/RemovePossibleOwner, # of updates: 1 000005d4.00000c34::2014/05/26-22:14:54.578 INFO [RCM] rcm::RcmGum::RemovePossibleOwner(AdventureWorksGrp,2) 000005d4.00000c34::2014/05/26-22:14:54.578 INFO [RCM] Removing node 2 from resource 'AdventureWorksGrp'. 000005d4.00000c34::2014/05/26-22:14:54.578 INFO [GEM] Node 1: Sending 1 messages as a batched GEM message 000005d4.000011f4::2014/05/26-22:14:54.578 INFO [GUM] Node 1: executing request locally, gumId:231, my action: /rcm/gum/SetGroupPreferredOwners, # of updates: 1 000005d4.000011f4::2014/05/26-22:14:54.578 INFO [RCM] rcm::RcmGum::SetGroupPreferredOwners(AdventureWorksGrp, 000005d4.000011f4::2014/05/26-22:14:54.578 INFO     1 000005d4.000011f4::2014/05/26-22:14:54.578 INFO


As I said earlier, possible and preferred owners properties are managed automatically by SQL Server AlwaysOn. We can see here this is done by the cluster resource control manager and the following functions:

rcm::RcmApi::AddPossibleOwner(), rcm::RcmApi::RemovePossibleOwner() and rcm::RcmApi::SetGroupPreferredOwners () .

You can notice that two nodes are added as possible owners with SQL Server 12. However, with SQL Server 2014 only one node is added as possible owner of the concerned availability group resource. Interesting change isn't it?

How to list all Oracle system schemas

Thu, 2014-06-05 04:51

Do you want to know which users come with the Oracle Database and which are the ones you have created? This is that not easy, especially prior to Oracle 12c. You know SYS and SYSTEM, but there are many others - especially when you have installed all options.

And it is important to know them all, e. g. when you import or apply a password policy, etc.

Let's see the options we have.


12c shows Oracle maintained users

Oracle 12c has introduced the ORACLE_MAINTAINED in the DBA_USERS dictionary view.

It is YES for the users created by oracle scripts (create database, calalog.sql, catproc.sql). So it's the right place to check when you are in 12c. 


12c Common users

Oracle 12c has another column to distinguish the user commonality. This is for CDB: the oracle dictionary is stored in the root container, and each PDB have links to it. And what is nice is that even when in a non-cdb the column is at 'YES' for Oracle maintained users and at 'NO' for created users. (except if you create C## common users in a cdb).


Default passwords

The table sys.default_pwd$ stores the default password hash value so that we can check which users can log with trivial password. Even if it includes all the Oracle maintained users, it has also some well known users which are not provided by database creation at all.


Sysaux occupants

Another idea that I took from Martin Bach is to crosscheck check with v$sysaux_occupants. Some Oracle provided users are missing, but the nice thing is that there is a description for the schemas that are listed.


Not exported by DataPump

DataPump maintains its own list of users for which the tables are not exported, and you can query sys.ku_noexp_tab for that. The old exp has also its list stored in sys.exu8usr. But both are not exhaustive.

Another exclusion list is the one used by logical standby: system.logstdby$skip which may be used as well to crosscheck (only records with 'action=0' are relevant for schema exclusion).

Let's check all that on a 12c non-cdb database. Here is the query that crosscheck all than information with outer joins:

select created,username,oracle_maintained,common,no_exp,no_expdp,no_sby,default_password,sysaux,occupant_desc
from dba_users 
left outer join 
 (select distinct name username,'Y' no_expdp from sys.ku_noexp_tab where obj_type='SCHEMA') 
left outer join (select distinct name username,'Y' no_exp from sys.exu8usr)
left outer join (select distinct name username,'Y' no_sby from system.logstdby$skip_support where action in (0,-1))
left outer join (select distinct user_name username,'Y' default_password from sys.default_pwd$)
left outer join (select schema_name username,'Y' sysaux,decode(count(*),1,min(occupant_desc)) occupant_desc from v$sysaux_occupants group by schema_name)
order by created,username;

and the result, ordered by creation date so that you can check which users were created by the database creation:

AUX 28-FEB-14 08:25:02 AUDSYS Y YES Y Y Y Y Y 28-FEB-14 08:25:02 SYS Y YES Y Y Y Y Y 28-FEB-14 08:25:02 SYSBACKUP Y YES   Y Y Y   28-FEB-14 08:25:02 SYSDG Y YES   Y Y Y   28-FEB-14 08:25:02 SYSKM Y YES   Y Y Y   28-FEB-14 08:25:02 SYSTEM Y YES Y   Y Y Y 28-FEB-14 08:25:05 OUTLN Y YES Y   Y Y   28-FEB-14 08:29:33 GSMADMIN_INTERNAL Y YES   Y Y Y   28-FEB-14 08:29:33 GSMUSER Y YES   Y Y Y   28-FEB-14 08:29:43 DIP Y YES   Y Y Y   28-FEB-14 08:30:39 XS$NULL Y YES   Y Y Y   28-FEB-14 08:31:12 ORACLE_OCM Y YES   Y Y Y   28-FEB-14 08:39:55 DBSNMP Y YES   Y Y Y Y 28-FEB-14 08:39:56 APPQOSSYS Y YES   Y Y Y   28-FEB-14 08:40:03 ANONYMOUS Y YES   Y Y Y   28-FEB-14 08:40:03 XDB Y YES   Y Y Y Y 28-FEB-14 08:44:29 GSMCATUSER Y YES   Y Y Y   28-FEB-14 08:53:38 WMSYS Y YES   Y Y Y Y 28-FEB-14 08:55:31 OJVMSYS Y YES   Y Y Y   28-FEB-14 08:59:58 CTXSYS Y YES   Y Y Y Y 28-FEB-14 09:00:44 MDSYS Y YES   Y Y Y Y 28-FEB-14 09:00:44 ORDDATA Y YES   Y Y Y Y 28-FEB-14 09:00:44 ORDPLUGINS Y YES   Y Y Y Y 28-FEB-14 09:00:44 ORDSYS Y YES   Y Y Y Y 28-FEB-14 09:00:44 SI_INFORMTN_SCHEMA Y YES   Y Y Y Y 28-FEB-14 09:09:48 OLAPSYS Y YES   Y Y Y Y 28-FEB-14 09:10:17 MDDATA Y YES Y   Y Y   28-FEB-14 09:15:09 SPATIAL_WFS_ADMIN_USR Y YES Y   Y Y   28-FEB-14 09:15:13 SPATIAL_CSW_ADMIN_USR Y YES Y   Y Y   28-FEB-14 09:20:00 LBACSYS Y YES   Y Y Y   28-FEB-14 09:20:16 APEX_040200 Y YES       Y   28-FEB-14 09:20:16 APEX_PUBLIC_USER Y YES Y     Y   28-FEB-14 09:20:16 FLOWS_FILES Y YES       Y   28-FEB-14 09:45:17 DVF Y YES   Y Y Y   28-FEB-14 09:45:17 DVSYS Y YES   Y Y Y   21-MAR-14 18:15:19 FRANCK N NO Y         09-APR-14 20:33:34 RMAN N NO Y     Y   11-APR-14 09:04:55 DEMO N NO Y     Y   23-APR-14 20:53:09 HR N NO Y   Y Y   23-APR-14 20:53:09 IX N NO Y   Y Y   23-APR-14 20:53:09 OE N NO Y   Y Y   23-APR-14 20:53:09 PM N NO Y   Y Y   23-APR-14 20:53:09 SH N NO Y   Y Y   25-APR-14 21:26:33 SQLTXADMIN N NO Y         25-APR-14 21:26:33 SQLTXPLAIN N NO Y         27-MAY-14 21:48:24 SCOTT N NO Y   Y Y  


So you see that the 12c columns are highly reliable. They include all users that have been created during the database creatio, but do not include the users that were created afterwards.

When not in 12c, the exclusion list coming from DataPump is quite good, but not exhaustive.

Sometimes the most useful features are not those that change the future of database, but rather those little things that make life easier.

In DBA_USERS, we have two very useful new columns: ORACLE_MAINTAINED and LAST_LOGIN. That is a great improvement.

How to avoid Failover Cluster with PowerShell scripting

Wed, 2014-06-04 20:15

One of my colleague asked me some days ago if I could create a PowerShell script to get rid from Windows Server Failover Cluster. The reason is that it has, in some specific cases, the bad habit to failover a resource or a service even when we don't want it. To be honest, I use a lot the Failover Cluster feature of Windows Server without any problems as a SQL Server DBA, but for this specific case, I did not find a successful configuration.

Here, we are using Failover Cluster to provide a Virtual IP address for a database using Dbvisit Standby. The database is open only on one node and the switchover or failover operation is performed by hand so we want to prevent failover of the Virtual IP resource to avoid redirecting client connection to a database that is not open.

At one of our clients, we have a Windows Server 2008 R2 cluster with two nodes and a FileShare. A Client Access Point has been provided as a service for client connections and this service must not failover automatically in any cases. For some reasons, this service failovers and I cannot avoid it even with special configuration settings.

For a solution to this problem, I propose to create a PowerShell script which will create a new IP address attached to the Network IPv4 interface. This script will read a configuration file to collect informations about the IP address like network address mask, network interface name, IP address, nodes where to start and stop the IP address, etc.

Here is the format of our configuration file:

# Configuration file for VIP management of databases
# Configuration format :
# SID            : Oracle SID for which the VIP is configured - mandatory
# VIP_ADR     : Address of the VIP - mandatory
# VIP_MASK   : VIP Network address Mask - mandatory
# VIP_IF         : Network interface to attach the VIP (i.e Local Area Connection) - mandatory
# VIP_LSNR    : VIP Listener - optional
# VIP_Node1   : Name of Node 1 - mandatory
# VIP_Node2   : Name of Node 2 - mandatory
# Note :
# The usage of parameters delimiters allows to have potentially empty (not
# defined parameters)
# How to call the script :
# .CreateIPAddress.ps1 'Start' 'DBITEST' [-F]
# Param 1: "Start" or "Stop" - mandatory
# Param 2: VIP Address name - mandatory
# Param 3: -F to force creation of the VIP Address locally even if it is not possible to check if the VIP Address exits on the second node - optional
# Example :
# DBITEST: Area Connection::test1:test2

In this blog, I will share how the script works step by step and what are the components/commands I had to learn about to create an IP on a network interface.

First of all, this PowerShell script uses the cmdlet Invoke-Command which uses Windows Remote Management (WinRM). If WinRM is not configured you can enable the WS-Management protocol on the local computer and set up the default configuration for remote management with the following command: Winrm quickconfig.

You need also to change the execution policy in your PowerShell screen with Set-ExecutionPolicy Unrestricted to allow script execution. You should also open your PowerShell screen with the Run As Administrator option.


The first thing I do is to clear my screen:


Now, I have to check if I receive the parameter I need for my script:


I have initialized my two mandatory parameters and perhaps a third one which is optional. It is time to read where my configuration file is located, test this location and initialize the variables.


I am able now to read my configuration file sequentially to find my VIP Address. I will test first character of each line to avoid comment line and when I will find a non-comment line, check the first parameter of the line to test if it is equal to my VIP Address enter into the command line.


I have found my VIP Address in my configuration file, I will now split my configuration line to collect all informations I need and connect to my second node to get back the different IP Addresses attached to Network Adapters.

If the second node doesn't respond we stop the script to avoid to create a local IP Address which already exists on the second node. But if the optional parameter "-F" has been entered we force the creation.


If we run the command "Get-WmiObject -Class Win32_NetworkAdapterConfiguration -Filter IPEnabled=TRUE -ComputerName $node2 -erroraction stop", we have a list of each Network Adapter with its IP Addresses associated.


We have now to search in the different lines if our VIP Address exists. If we find it, we have to delete this IP Address and if it doesn't exist, we should print a message.


For the moment, we have to check remotely if our IP Address existed on the remote server and if it was the case we have to delete this IP Address.

Now, based on our parameter, we will stop or start this IP Address locally.


Our VIP address is now started or stopped locally depending on the defined parameters. Finally, if we have not found the VIP Address in the configuration file, an error will be displayed and the script ends.


At present, we have a local VIP Address where the client can come to connect. If our node goes down for whatever reason when it comes back, our VIP address will by always in the same node. If we want to change for node 1 to node 2, we just have to run our script on node 2 and go for the parameter start. This will stop the IP Address on node 1 and start the IP Address on node 2.

I hope this script and the different functions I have shown here were able to help you.

Configure a Kerberos Client and Alfresco as a Kerberized application

Wed, 2014-05-28 03:57

In the previous blog, I described How to install and manage a Kerberos Server but that's useless if there are no clients and if no application have been kerberized! That's why in this blog I will explain in the first part how to install a kerberos client in linux. The second part will be dedicated to the configuration of a browser to use Kerberos tickets and the last part will explain how to configure a popular application to use the newly created Kerberos MIT KDC. This application is Alfresco (leader in Open Source solutions for Electronic Content Management).

For this blog, let's define the following properties/variables:

  • = the DNS Domain
  • = the FQDN of the KDC
  • document.write(['mpatou','EXAMPLE.COM'].join('@')) = the principal of a test user
  • = the FQDN of the Alfresco host server

All configurations below have been tested on our infrastructure.

1. Install MIT Kerberos Client

On this part, I will only present how to install a Linux Client. I think the Mac release is available as part of the Mac OS X since version 10.3 (the current release is Mavericks: 10.9) and so there is nothing to do to install a Kerberos client but this installation isn't configured. The Windows installation is quite easy if you use the installer but this isn't a perfect solution. I think the best solution for windows would be to use the Windows implementation of Kerberos to configure the connection to a Linux KDC but this is a little bit more complex and I personally have troubles to configure that...

So, all steps to install a Linux client are quite the same as those to install a Kerberos Server. Indeed, there is no specific source code for the client and so the basic installation is the same but it's not the case for the configuration.

Obviously, the first thing to do is to download the current release of the MIT Kerberos distribution for the target operating system. This could be done at the following URL: The current Linux release is krb5-1.12.1-signed.tar:

# wget
    --2014-04-01 14:00:28--
    Connecting to||:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 11950080 (11M) [application/x-tar]
    Saving to: “krb5-1.12.1-signed.tar”
    100%[===============================================>] 11,950,080  1.52M/s   in 7.3s

    2014-04-01 14:00:38 (1.56 MB/s) - “krb5-1.12.1-signed.tar” saved [11950080/11950080]

# tar  -xvf krb5-1.12.1-signed.tar

As you can see, this file is signed and you could (should) verify the integrity and identity of the software. This can be done, for example, using GNU Privacy Guard:

# gpg  --verify  krb5-1.12.1.tar.gz.asc

After that, just extract the MIT Kerberos source code:

# tar  -zxf  krb5-1.12.1.tar.gz
# cd  krb5-1.12.1/src/
# ./configure
# yum  install  *yacc*
# make
# make  install

At this step, Kerberos should be installed properly and the binaries, libraries and the documentation should be under /usr/local. The default location is sufficient for a client installation:

# krb5-config  --all
    Version:     Kerberos 5 release 1.12.1
    Vendor:      Massachusetts Institute of Technology
    Prefix:      /usr/local
    Exec_prefix: /usr/local

To set up a linux client, there is only one step remaining: tell Kerberos where to find a KDC. This is done through a configuration file named krb5.conf. In the following configuration, I've included two lines (forwardable and proxiable) that are important for Alfresco SSO using Kerberos to work properly. These two lines aren't mandatory for other kerberized applications:

# vi  /etc/krb5.conf
        default_realm = EXAMPLE.COM
        forwardable = true                            {Line only important for Alfresco}
        proxiable = true                              {Line only important for Alfresco}

        EXAMPLE.COM = {
            kdc =
            admin_server =
            default_domain =

    [domain_realm] = EXAMPLE.COM = EXAMPLE.COM

That should be enough to obtain a ticket for the test user ("kinit mpatou" OR "kinit document.write(['mpatou','EXAMPLE.COM'].join('@'))") and delete this ticket:

# klist
    klist: Credentials cache file '/tmp/krb5cc_0' not found
# kinit  mpatou
    Password for document.write(['mpatou','EXAMPLE.COM'].join('@')):
# klist
    Ticket cache: FILE:/tmp/krb5cc_0
    Default principal: document.write(['mpatou','EXAMPLE.COM'].join('@'))

    Valid starting               Expires               Service principal
    05/20/2014 10:54:48     05/20/2014 20:54:48     krbtgt/document.write(['mpatou','EXAMPLE.COM'].join('@'))
    renew until 05/21/2014 10:54:47
# kdestroy
# klist
    klist: Credentials cache file '/tmp/krb5cc_0' not found

2. Configure the Browser

Now the client should be able to acquire the first ticket (TGT) but that's not enough! The next step is to configure the Browser to use tickets. Indeed, if the client tries to access to a kerberized application, the browser has to present a ticket for that application to be logged in automatically.

a. Mozilla Firefox (Window/Linux/Mac)

Open a new window/tab in Mozilla Firefox:

  • Enter in the URL field "about:config"
  • All properties of Firefox should be displayed
  • Search for "network.negotiate" and set: network.negotiate-auth.delegation-uris = .EXAMPLE.COM
  • Set: network.negotiate-auth.trusted-uris = .EXAMPLE.COM

b. Google Chrome

i. Mac

Modify the Google Chrome application with:

open 'Google' --args --auth-schemes="basic,digest,ntlm,negotiate" --auth-server-whitelist="*EXAMPLE.COM" --auth-negotiate-delegate-whitelist="*EXAMPLE.COM"

ii. Linux

Modify Google Chrome with:

google-chrome --enable-plugins --args --auth-server-whitelist="*EXAMPLE.COM" --auth-negotiate-delegate-whitelist="*EXAMPLE.COM" --auth-schemes="basic,digest,ntlm,negotiate"

iii. Windows

Modify the Google Chrome shortcut with:

chrome.exe --auth-schemes="basic,digest,ntlm,negotiate" --auth-server-whitelist="*EXAMPLE.COM" --auth-negotiate-delegate-whitelist="*EXAMPLE.COM"

c. Internet Explorer

Open a new window/tab in Internet Explorer and:

  • Open the "tools" (ALT+X)
  • Click on "internet options" (o)
  • Choose the "Advanced" tab
  • In the security section, select the "Enable Integrated Windows Authentication*"
  • Choose the "Security" tab
  • Select "Local intranet"
  • Click on "Sites"
  • Click on "Advanced"
  • Add "http://*" or/and "https://*"
  • Click on "Close"
  • Click on "OK"
  • Click on "OK"
  • Restart the computer

If everything works fine, then the client should be able to access to kerberized applications. If it's not the case, a good start to debug the kerberos installation is to use a network analyzer like WireShark.

3. Configure Alfresco to use Kerberos SSO

Actual test configuration:

  • OS: Oracle Enterprise Linux 6
  • Alfresco: Alfresco Community Enterprise 4.2.c installed on /opt/alfresco
  • Application Server: Tomcat 7

So let's begin the configuration of a Kerberized application with Alfresco. The first thing to know about Alfresco is that Alfresco uses two main Web Clients: Alfresco Explorer and Alfresco Share. The first one is the core of Alfresco and the second one is the new interface (mainly oriented on collaboration) that uses a proxy to ask Explorer to do the job. The configuration of Alfresco Explorer is quite easy but to get Alfresco Share working it's not the same as Share add a new layer above Explorer.

If nothing is specified, all modifications below should be done on, the Alfresco host server.

For Kerberos to work properly, the maxHttpHeaderSize must be increased:

# vi  /opt/alfresco/alfresco-4.2.c/tomcat/conf/server.xml
   Connector port="8080" URIEncoding="UTF-8" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" maxHttpHeaderSize="32768"

 Then the authentication chain must allow Kerberos tickets to be used to authenticate a user:

# vi  /opt/alfresco/alfresco-4.2.c/tomcat/shared/classes/

After that, the Kerberos subsystem must be configured:

# cd  /opt/alfresco/alfresco-4.2.c/tomcat/shared/classes/alfresco/extension/subsystems
# vi  Authentication/kerberos/kerberos1/

Once this is done, the KDC must be configured to recognize Alfresco as a Kerberized application. For that purpose, enter in the KDC kadmin interface. Caution, this requires the installation of a Kerberos Server on as described in the previous blog (the important part is that kadmin must be available from a remote computer) and the installation of a Kerberos Client on If your kadmin doesn't work from a remote location, you will have to use the kadmin.local interface on the KDC host server.

# /usr/local/bin/kadmin
    addprinc  cifs/document.write(['','EXAMPLE.COM'].join('@'))
    addprinc  HTTP/document.write(['','EXAMPLE.COM'].join('@'))
    ktadd  -k  /etc/krb5cifs.keytab  cifs/document.write(['','EXAMPLE.COM'].join('@'))
    ktadd  -k  /etc/krb5http.keytab  HTTP/document.write(['','EXAMPLE.COM'].join('@'))


This will create 2 files named 'krb5cifs.keytab' and 'krb5http.keytab' on the host where the kadmin was run. That means that if you used the kadmin from, then there is nothing more to do but if you used the kadmin or kadmin.local from, then those 2 files must be moved to

# cd  /etc
# scp  krb5cifs.keytab  document.write(['root',''].join('@')):/etc
enter password:
# scp  krb5http.keytab  document.write(['root',''].join('@')):/etc
enter password:

Create or update the config file for the Java Security. Be careful that this is the Java used by Alfresco:

# vi  /opt/alfresco/alfresco-4.2.c/java/jre/lib/security/java.login.config:
    Alfresco { sufficient;

    AlfrescoCIFS { required

    AlfrescoHTTP { required
    ShareHTTP { required
    }; { sufficient;

    other { sufficient;

Configure the default Java Security to use our custom configuration by adding a line at the end of the file:

# vi  /opt/alfresco/alfresco-4.2.c/java/jre/lib/security/

 Finally, update the share-config-custom.xml file to contain the Kerberos configuration (the three images are in a unique file attached in this blog ->

# vi /opt/alfresco/alfresco-4.2.c/tomcat/shared/classes/alfresco/web-extension/share-config-custom.xml



That could be enough to get the Kerberos SSO working for Alfresco Explorer and Alfresco Share. I think there are some additional steps to get the Alfresco CIFS working too but I'm not sure. Moreover, it's possible that you get some strange exceptions that prevent Alfresco to authenticate your Kerberos Ticket. That probably comes from the Java Cryptography Extension that is missing on your Alfresco server. For that purpose, download the JCE corresponding to the Java version used by Alfresco (JCE6 or JCE7) and deploy it (I assume below that the JCE7 is under /opt):

# unzip  /opt/
# cd  /opt/alfresco/alfresco-4.2.c/java/jre/lib/security
# cp  local_policy.jar  local_policy.jar.orig
# cp  US_export_policy.jar  US_export_policy.jar.orig
# cd  /opt/UnlimitedJCEPolicy
# cp  local_policy.jar  /opt/alfresco/alfresco-4.2.c/java/jre/lib/security/
# cp  US_export_policy.jar  /opt/alfresco/alfresco-4.2.c/java/jre/lib/security/

I hope I was clear enough in my explanations and I hope I was able to share (a little bit?) my passion for open source solutions!

Good luck with Kerberos and Alfresco!