Pythian Group

Subscribe to Pythian Group feed
Love Your Data
Updated: 2 hours 55 min ago

No Arguments Here – Use xargs for File Management Performance

Wed, 2016-04-20 10:30


Database Administrators and System Administrators have this in common: managing a large number of log files is just part of the job on Linux systems.

Tools such as logrotate significantly simplify the file management task for routinely created log files. Even so, there are still many ‘opportunities’ to exercise your command line fu to manage thousands or millions of files.  These may be files that need to be moved, removed or searched.

When the files span multiple directories the find command is often used. The following command for instance will find all log files of a certain age and size and remove them.

find . -name "*.log" -size +1M -exec rm  {} \;


For a few files this will work just fine, but what happens if the number of files to be processed is several thousands, or even millions?

The xargs Difference

Let’s first create 200k files to use for testing. These files will all be empty, there is no need for any content for these tests.

The script can be used to create the directories and empty files.

As it takes some time to create the files, we will not use the rm command here, but rather just the file command. The command will be timed as well.

#  time find . -type f -name file_\* -exec file {} \; >/dev/null

real    1m24.764s
user    0m4.624s
sys     0m12.581s

Perhaps 1 minute and 24 seconds seems to be a reasonable amount of time to process so many files.

It isn’t.

Let’s use a slightly different method to process these files, this time by adding xargs in a command pipe.

 time find . -type f -name file_\* | xargs file >/dev/null

real    0m0.860s
user    0m0.456s
sys     0m0.432s

Wait, what?!  0.8 seconds? Can that be correct?

Yes, it is correct. Using xargs with find can greatly reduce the resources needed to iterate through files.

How then, is is possible for the command that used xargs to complete so much faster than the command that did not use xargs?

When iterating through a list of files with the -exec  argument to the find command, a new shell is forked for each execution of find.

For a large number of files this requires a lot of resources.

For demonstration purposes I will be using the ‘file’ command rather than ‘rm’.

Could it be that the xargs method may have benefited from the caching effects of running the first find command?

Could be – let’s run find … -exec again and see if it benefits from caching.

# time find . -type f -name file_\* -exec file {} \; >/dev/null

real    1m25.722s
user    0m3.900s
sys     0m11.893s

Clearly any caching didn’t help find … -exec.

Why Is xargs Fast?

Why is the use of xargs so much faster than find? In short it is due to find starting a new process for each file it finds when the -exec option is used.

The command ‘find | xargs’ was wrapped in a shell script to facilitate the use of strace.

The script takes 2 arguments; the number of files to pipe to xargs and the number files that xargs should send to the file command for each invocation of file.

The number of files to process is controlled by piping the output of find to head.

The xargs –max-args argument is used to control how many arguments are sent to each invocation of find.

We can now use strace with the -c option; -c accumulates a count of all calls along with timing information.

Calling the script to run for the first 10000 files, with 1000 files sent to each invocation of find:

# strace -c -f  ./ 10000 1000
MAX_FILES: 10000
MAX_ARGS: 1000
Process 11268 attached
Process 11269 attached
Process 11267 resumed
Process 11269 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.55    0.080017        5001        16         2 wait4
  0.35    0.000280           0     12372           newfstatat
  0.09    0.000074           0       208           getdents
  0.01    0.000006           0     10000           lstat
  0.00    0.000000           0       199           read
  0.00    0.000000           0       276         1 write
  0.00    0.000000           0       384        91 open
  0.00    0.000000           0       313         4 close
  0.00    0.000000           0        68        42 stat
  0.00    0.000000           0       189           fstat
  0.00    0.000000           0         5         1 lseek
  0.00    0.000000           0       209           mmap
  0.00    0.000000           0        71           mprotect
  0.00    0.000000           0        37           munmap
  0.00    0.000000           0        72           brk
  0.00    0.000000           0        41           rt_sigaction
  0.00    0.000000           0        80           rt_sigprocmask
  0.00    0.000000           0         2           rt_sigreturn
  0.00    0.000000           0        13        12 ioctl
  0.00    0.000000           0        77        77 access
  0.00    0.000000           0         2           pipe
  0.00    0.000000           0         6           dup2
  0.00    0.000000           0         1           getpid
  0.00    0.000000           0        14           clone
  0.00    0.000000           0        14           execve
  0.00    0.000000           0         2           uname
  0.00    0.000000           0         4         1 fcntl
  0.00    0.000000           0       206           fchdir
  0.00    0.000000           0         5           getrlimit
  0.00    0.000000           0         1           getuid
  0.00    0.000000           0         1           getgid
  0.00    0.000000           0         1           geteuid
  0.00    0.000000           0         1           getegid
  0.00    0.000000           0         1           getppid
  0.00    0.000000           0         1           getpgrp
  0.00    0.000000           0        14           arch_prctl
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.080377                 24910       232 total

The largest chunk of time was spent in the wait4 system call. These are waits on execve, of which there were 14.

Of the 14 calls to execve, there was 1 each for the use of bash (the script itself), find, head and xargs, leaving 10 calls to be consumed by file.

The following command can be used if you would like to try this yourself:

strace  -f -e trace=execve  ./ 10000 1000  2>&1 | grep execve

What happens when the same type of test is run against find with the -exec argument?

There is no method (that I can find in the man page anyway) by which we can limit the number of files that are sent to the program specified in the -exec argument of find.

We can still learn what is going on, it is just necessary to wait 1.5 minutes for the command to complete.

# strace -c -f find . -type f -name file_\*  -exec file {} \; >/dev/null

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.80    4.101094          21    200000           wait4
  0.69    0.029305           0    200000           clone
  0.46    0.019278           0   2602351   1400007 open
  0.44    0.018833           0    600001           munmap
  0.31    0.013108           0   3200017           mmap
  0.30    0.012715           0   1401173           fstat
  0.16    0.006979           0   1200006   1200006 access
  0.15    0.006543           0   1202345           close
  0.15    0.006288           0   1000004    600003 stat
  0.13    0.005632           0   1000004           read
  0.12    0.004981           0    200000           lstat
  0.09    0.003704           0    600026           brk
  0.07    0.003016           0   1000009           mprotect
  0.07    0.002776           0    200001    200000 ioctl
  0.03    0.001079           0    201169           newfstatat
  0.02    0.000806           0      2347           getdents
  0.01    0.000600           0    200000           write
  0.00    0.000003           0    200001           arch_prctl
  0.00    0.000002           0    202341           fchdir
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0    400001    200000 execve
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         1           fcntl
  0.00    0.000000           0         2           getrlimit
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    4.236742              15811808   3600017 total


You may have noticed there are twice as many calls to execve than there were files to process.

This is due to something referenced in the comments of Unless a full path name is specified when running a command, the PATH variable is searched for that command. If the command is not found by the first invocation of execve, then another attempt is made the next directory in PATH.

The following example shows the difference between using the command name only, and then using the fully pathed name of the file command.

# strace -e trace=execve -f find -maxdepth 1 -type f -name \*.sh  -exec file {} \;  2>&1 | grep execve
execve("/usr/bin/find", ["find", "-maxdepth", "1", "-type", "f", "-name", "*.sh", "-exec", "file", "{}", ";"], [/* 83 vars */]) = 0
[pid  9267] execve("/usr/local/bin/file", ["file", "./"], [/* 83 vars */]) = -1 ENOENT (No such file or directory)
[pid  9267] execve("/usr/bin/file", ["file", "./"], [/* 83 vars */]) = 0
[pid  9268] execve("/usr/local/bin/file", ["file", "./"], [/* 83 vars */]) = -1 ENOENT (No such file or directory)
[pid  9268] execve("/usr/bin/file", ["file", "./"], [/* 83 vars */]) = 0
[pid  9269] execve("/usr/local/bin/file", ["file", "./"], [/* 83 vars */]) = -1 ENOENT (No such file or directory)
[pid  9269] execve("/usr/bin/file", ["file", "./"], [/* 83 vars */]) = 0

# strace -e trace=execve -f find -maxdepth 1 -type f -name \*.sh  -exec /usr/bin/file {} \;  2>&1 | grep execve
execve("/usr/bin/find", ["find", "-maxdepth", "1", "-type", "f", "-name", "*.sh", "-exec", "/usr/bin/file", "{}", ";"], [/* 83 vars */]) = 0
[pid  9273] execve("/usr/bin/file", ["/usr/bin/file", "./"], [/* 83 vars */]) = 0
[pid  9274] execve("/usr/bin/file", ["/usr/bin/file", "./"], [/* 83 vars */]) = 0
[pid  9275] execve("/usr/bin/file", ["/usr/bin/file", "./"], [/* 83 vars */]) = 0
Too Much Space

Regardless of how bad a practice it may be, there will be times that file and directory names may contain space characters. Literal spaces, newlines and tabs can all play havoc with file name processing;  xargs has you covered.

Two files are created to demonstrate:


# touch 'this filename has spaces' this-filename-has-no-spaces

# ls -l
total 0
-rw-r--r-- 1 jkstill dba 0 Apr 15 09:28 this filename has spaces
-rw-r--r-- 1 jkstill dba 0 Apr 15 09:28 this-filename-has-no-spaces

What happens when the output of find it piped to xargs?


 find . -type f | xargs file
./this-filename-has-no-spaces: empty
./this:                        ERROR: cannot open `./this' (No such file or directory)
filename:                      ERROR: cannot open `filename' (No such file or directory)
has:                           ERROR: cannot open `has' (No such file or directory)
spaces:                        ERROR: cannot open `spaces' (No such file or directory)

The spaces in one of the filenames causes xargs to treat each word in the filename as a separate file.

Because of this it is a good idea to use the -print0 and -0 args as seen in the following example. These arguments change the output terminator of find to the null character, as well as changing the input terminator of xargs to the null character to deal with space characters in file and directory names.


 find . -type f -print0  | xargs -0 file
./this-filename-has-no-spaces: empty
./this filename has spaces:    empty

There is quite a bit more to xargs than this, I would encourage you to read the man page and experiment with the options to better learn how to make use of it.

Hope For find

For many versions of GNU find there is an easy modification that can be made to the command line that will cause the -exec option to emulate the method xargs uses pass input to a command.

Simply by changing -exec command {} \; to  -exec command {} +, the find command will execute much faster than previously.

Here the find command has matched the performance of xargs when processing 200k files:


# time find . -type f -name file_\*  -exec file {} +  | wc
 200000  400000 8069198

real    0m0.801s
user    0m0.436s
sys     0m0.404s

This may mean a quick and simple change to maintenance scripts can yield a very large increase in performance.

Does this mean there is no longer a need for xargs?  Not really, as xargs offers levels of control over the input to piped commands that simply are not available in the find command.

If you’ve never used xargs, you should consider doing so, as it can reduce the resource usages on your systems and decrease the runtime for maintenance tasks.

Categories: DBA Blogs

Data is Everything, and Everything is Data

Tue, 2016-04-19 10:33
Exploring the phenomenon of “datafication”

In the year 2000, only a quarter of the world’s stored information was digital; the rest was on paper, film, and other analog media. Today, less than two percent of all stored information is nondigital. (1)

This is largely the result of “datafication”, a process that turns all aspects of life—preferences, opinions, telephone calls and sensor-driven information—into data.

Datafication is the driving force behind Big Data. It’s also causing a threefold shift in how we look for meaning in the information available to us: away from traditional sampling approaches, toward greater tolerance of messy, unstructured data, and into the search for correlations rather than absolute, singular causes to explain trends and events. These changes are already having major impacts in every area of our lives—from scientific research to business and finance, to healthcare, education and transportation.

Watch this video of Pythian President and CEO Paul Vallée, as he talks about datification and generating revenue.

From sampling to knowing

Representative sampling is based on the idea that, within a certain margin of error, we can make inferences about a total population from a small, randomized subset. This works well for simple questions like, “Which of our customers generate the most revenue?” but lacks the detail to effectively answer queries like, “Which customers are most profitable?” or, “Which customers are considering to leave us for another vendor?”

Inexpensive computer memory, powerful processors and sophisticated algorithms now allow us to analyze vast amounts of data rather than small samples. Using Big Data in this way has the considerable advantage of predictive capability—it can identify patterns and trends that aren’t detectable in a small sample, giving an unprecedented view of future behavior.

From clean to messy

What’s new about Big Data isn’t just that there’s lots of it. Because it comes from many different sources in many different formats, it’s not tidy like traditional datasets. Tolerating some inaccuracy may require data analysts to shift their outlooks a little, but when you’re trying to answer big, complex questions, the gain in data scope is a good trade-off against using smaller amounts of very exact data. Here’s an example.

In 2009, Google showed it’s possible to predict locations of seasonal outbreaks of the flu using nothing more than archived records of Google searches. The sheer size of the data set (think a billion searches a day in the U.S. alone) more than compensated for its messiness. After running nearly half a billion calculations against the data, Google identified 45 terms—words such as “headache” and “runny nose”—that had a strong correlation with the CDC’s data on flu outbreaks.

From cause to correlation

The Google example points to a third change brought about by datafication and Big Data: abandoning the search for certainty. Instead of looking for causes, innovative data users are looking for correlations. For example, automotive and aviation engineers are collecting and analyzing massive quantities of information on engines that have failed, looking for patterns that will help them predict when other engines might be at risk of failing in the future. They’re not seeking a single cause for a single event; they’re mapping correlations between huge numbers of events to recognize patterns that can be put to practical, preventative use.

The correlation approach has been used to spot infections in premature babies before overt symptoms appear and to predict everything from manhole cover failures to consumer purchasing habits.

Big Data insights require Big Thinking

Harnessing the powerful, often unpredictable, insights available from Big Data requires three things: as complete a dataset as possible, people with the skills required to collect, manage and analyze that data, and people who know how to ask unexpected, even visionary questions. It’s not just a matter of the right technologies—it’s about a fundamental shift in how we relate to data and what can be done with it.

Categories: DBA Blogs

SQL Server Dates, Dates and More Dates

Tue, 2016-04-19 09:44


Working with SQL Server date functions can be frustrating. This purpose of this blog is to share some date statements I use regularly, especially when doing business Intelligence and DataWarehouse solutions.

I hope you find them useful and if you have any questions or any more useful statements in relation to dates in SQL Server, please feel free to leave them in the comments below

SELECT DATEADD(d , -1 , GETDATE()) 'Yesterday'
----First Day of Current Week
SELECT DATEADD(wk , DATEDIFF(wk , 0 , GETDATE()) , 0) 'First Day of Current Week'
----Last Day of Current Week
SELECT DATEADD(wk , DATEDIFF(wk , 0 , GETDATE()) , 6) 'Last Day of Current Week'
----First Day of Last Week
SELECT DATEADD(wk , DATEDIFF(wk , 7 , GETDATE()) , 0) 'First Day of Last Week'
----Last Day of Last Week
SELECT DATEADD(wk , DATEDIFF(wk , 7 , GETDATE()) , 6) 'Last Day of Last Week'
----First Day of Current Month
SELECT DATEADD(mm , DATEDIFF(mm , 0 , GETDATE()) , 0) 'First Day of Current Month'
----Last Day of Current Month
SELECT DATEADD(ms , -3 , DATEADD(mm , 0 , DATEADD(mm , DATEDIFF(mm , 0 , GETDATE()) + 1 , 0))) 'Last Day of Current Month'
----First Day of Last Month
SELECT DATEADD(mm , -1 , DATEADD(mm , DATEDIFF(mm , 0 , GETDATE()) , 0)) 'First Day of Last Month'
----Last Day of Last Month
SELECT DATEADD(ms , -3 , DATEADD(mm , 0 , DATEADD(mm , DATEDIFF(mm , 0 , GETDATE()) , 0))) 'Last Day of Last Month'
----First Day of Current Year
SELECT DATEADD(yy , DATEDIFF(yy , 0 , GETDATE()) , 0) 'First Day of Current Year'
----Last Day of Current Year
SELECT DATEADD(ms , -3 , DATEADD(yy , 0 , DATEADD(yy , DATEDIFF(yy , 0 , GETDATE()) + 1 , 0))) 'Last Day of Current Year'
----First Day of Last Year
SELECT DATEADD(yy , -1 , DATEADD(yy , DATEDIFF(yy , 0 , GETDATE()) , 0)) 'First Day of Last Year'
----Last Day of Last Year
SELECT DATEADD(ms , -3 , DATEADD(yy , 0 , DATEADD(yy , DATEDIFF(yy , 0 , GETDATE()) , 0))) 'Last Day of Last Year'
Categories: DBA Blogs

Internals of Querying the Concurrent Requests’ Queue – Revisited for R12.2

Tue, 2016-04-19 09:01

Once upon a time I wrote about the Internal Workflow of an E-Business Suite Concurrent Manager Process. Many things have changed since that blog post, the most obvious change being the release of Oracle e-Business Suite R12.2. I decided to check if the way the concurrent manager queues were processed by concurrent manager processes were still the same. My main goal was to see if the manager processes still don’t attempt any way of coordination to distribute the requests among them.

This is how I did the testing:

  • I used the VM templates provided by Oracle to build my R12.2.4 test environment. By the way, I didn’t expect that the process of getting the environment up would be so simple! Downloading the media files from was the most time-consuming step, once done – it took me just 1 hour to un-compress everything, import the Virtual Assembly file and bring up the R12.2.4 environment on my laptop.
  • 3 Standard managers are defined by default
  • Sleep seconds were left as is = 30 seconds
  • Cache size was increased from 1 to 5.
  • Identified the 3 DB processes that belong to the Standard managers:
    select sid, serial# from v$session where module='e:FND:cp:STANDARD'
  • I enabled tracing with binds and waits for each of them like this:
    exec dbms_monitor.session_trace_enable(sid,serial#,true,true);
  • Once that was done I submitted one concurrent program – “Active users” and waited for it to complete.
  • I disabled the tracing and collected the trace files.
    exec dbms_monitor.session_trace_disable(sid,serial#);
  • Collected the trace files

I found 2 of the trace files to be very interesting. To make things more simple, the manager process “A” will be the one that executed the concurrent request, and process “B” will be the one that didn’t.

Before the “Active Users” Request Was Submitted

No other requests were running at the time I did the testing, so I clearly observed how both Managers A and B queried the FND_CONCURRENT_REQUESTS table BOTH of the trace files displayed the same method of how requests are picked up from the queue. Note, I’m showing only the lines relevant to display the main query only, and I have formatted the query text to make it more readable:

PARSING IN CURSOR #139643743645920 len=1149 dep=0 uid=100 oct=3 lid=100 tim=1460211399835915 hv=3722997734 ad='d275f750' sqlid='cd23u4zfyhvz6'
FROM Fnd_Concurrent_Requests R
WHERE R.Hold_Flag                             = 'N'
AND R.Status_Code                             = 'I'
AND R.Requested_Start_Date                   <= Sysdate
AND (R.Node_Name1                            IS NULL
OR (R.Node_Name1                             IS NOT NULL
AND FND_DCP.target_node_mgr_chk(R.request_id) = 1))
AND (R.Edition_Name                          IS NULL
OR R.Edition_Name                            <= sys_context('userenv', 'current_edition_name'))
  FROM Fnd_Concurrent_Programs P
  WHERE P.Enabled_Flag         = 'Y'
  AND R.Program_Application_Id = P.Application_Id
  AND R.Concurrent_Program_Id  = P.Concurrent_Program_Id
    FROM Fnd_Oracle_Userid O
    WHERE R.Oracle_Id = O.Oracle_Id
      FROM Fnd_Conflicts_Domain C
      WHERE P.Run_Alone_Flag = C.RunAlone_Flag
      AND R.CD_Id            = C.CD_Id
  AND (P.Execution_Method_Code                          != 'S'
  OR (R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) IN ((0,98),(0,100),(0,31721),(0,31722),(0,31757)))
AND ((R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) NOT IN ((510,40032),(510,40033),(510,42156),(510,42157),(530,43793),(530,43794),(535,42626),(535,42627),(535,42628)))
ORDER BY NVL(R.priority, 999999999),
EXEC #139643743645920:c=0,e=33,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211399835910
FETCH #139643743645920:c=0,e=546,p=0,cr=106,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211399836507
WAIT #139643743645920: nam='SQL*Net message to client' ela= 3 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211399836572

*** 2016-04-09 10:17:09.837
WAIT #139643743645920: nam='SQL*Net message from client' ela= 30000367 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211429836965
EXEC #139643743645920:c=0,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211429838767
FETCH #139643743645920:c=0,e=689,p=0,cr=106,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211429839587
WAIT #139643743645920: nam='SQL*Net message to client' ela= 4 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211429839652

*** 2016-04-09 10:17:39.840
WAIT #139643743645920: nam='SQL*Net message from client' ela= 30000325 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211459840003

It’s important to observe that:

  • All manager’s processes still compete for the same requests. If the query is executed at the same time, the same list of concurrent requests will be retrieved by all processes.
  • The constants literals used in lines 30-32 mean that the query for checking the queue is still built when the concurrent manager process starts up. These constants are mainly used to implement the specializations rules in the query.
  • Only rowid for the pending requests’ rows in FND_CONCURRENT_REQUESTS are fetched.
  • The sleep time is clearly visible on lines 41,42 and 48,49
After the “Active Users” Request Was Submitted – Starting the Concurrent Request

The manager process A was the first to pick up the submitted requests and it could be observed by the “r=1” (1 row fetched) in the FETCH call for the query we just reviewed:

FETCH #139643743645920:c=0,e=437,p=0,cr=113,cu=0,mis=0,r=1,dep=0,og=1,plh=3984653669,tim=1460211519844640

Immediately after this, the manager process A locked the row in FND_CONCURRENT_REQUESTS table, this way, the request got assigned to this process. Notice the similar where predicates used in this query, these are actually required to make sure that the request is still not picked up by another manager process. However the main thing here is the fact that the request row is accessed by the “rowid” retrieved earlier (row 45, the value of the bind variable “:reqname” is “AAAjnSAA/AAAyn1AAH” in this case). Locking of the row is done by the “FOR UPDATE OF R.status_code NoWait” clause on line 49:

PARSING IN CURSOR #139643743640368 len=4530 dep=0 uid=100 oct=3 lid=100 tim=1460211519864113 hv=4239777398 ad='cde86338' sqlid='6ya6bzgybbrmq'
SELECT R.Conc_Login_Id,
  ... excluded other 156 columns for brevity...
FROM fnd_concurrent_requests R,
  fnd_concurrent_programs P,
  fnd_application A,
  fnd_user U,
  fnd_oracle_userid O,
  fnd_conflicts_domain C,
  fnd_concurrent_queues Q,
  fnd_application A2,
  fnd_executables E,
  fnd_conc_request_arguments X
WHERE R.Status_code             = 'I'
AND (R.Edition_Name            IS NULL
OR R.Edition_Name              <= sys_context('userenv', 'current_edition_name'))
AND R.Request_ID                = X.Request_ID(+)
AND R.Program_Application_Id    = P.Application_Id(+)
AND R.Concurrent_Program_Id     = P.Concurrent_Program_Id(+)
AND R.Program_Application_Id    = A.Application_Id(+)
AND P.Executable_Application_Id = E.Application_Id(+)
AND P.Executable_Id             = E.Executable_Id(+)
AND P.Executable_Application_Id = A2.Application_Id(+)
AND R.Requested_By              = U.User_Id(+)
AND R.Cd_Id                     = C.Cd_Id(+)
AND R.Oracle_Id                 = O.Oracle_Id(+)
AND Q.Application_Id            = :q_applid
AND Q.Concurrent_Queue_Id       = :queue_id
AND (P.Enabled_Flag            IS NULL
OR P.Enabled_Flag               = 'Y')
AND R.Hold_Flag                 = 'N'
AND R.Requested_Start_Date     <= Sysdate
AND ( R.Enforce_Seriality_Flag  = 'N'
OR ( C.RunAlone_Flag            = P.Run_Alone_Flag
AND (P.Run_Alone_Flag           = 'N'
  FROM Fnd_Concurrent_Requests Sr
  WHERE Sr.Status_Code         IN ('R', 'T')
  AND Sr.Enforce_Seriality_Flag = 'Y'
  AND Sr.CD_id                  = C.CD_Id
AND Q.Running_Processes                                     <= Q.Max_Processes
AND R.Rowid                                                  = :reqname
AND ((P.Execution_Method_Code                               != 'S'
OR (R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID)       IN ((0,98),(0,100),(0,31721),(0,31722),(0,31757))))
AND ((R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) NOT IN ((510,40032),(510,40033),(510,42156),(510,42157),(530,43793),(530,43794),(535,42626),(535,42627),(535,42628))) 
FOR UPDATE OF R.status_code NoWait

The behavior of the manager process B was a little bit more interesting. It too managed to fetch the same rowid from FND_CONCURRENT_PROCESSES table belonging to the submitted “Active Users” processes. However, when it tried to lock the row in FND_CONCURRENT_REQUESTS (By using exactly the same query), this happened:

PARSING IN CURSOR #139690311998256 len=4530 dep=0 uid=100 oct=3 lid=100 tim=1460211519900924 hv=4239777398 ad='cde86338' sqlid='6ya6bzgybbrmq'
BINDS #139690311998256:
  oacdty=01 mxl=32(18) mxlc=00 mal=00 scl=00 pre=00
  oacflg=20 fl2=1000001 frm=01 csi=873 siz=0 off=64
  kxsbbbfp=7f0c2f713f20  bln=32  avl=18  flg=01
EXEC #139690311998256:c=1000,e=1525,p=0,cr=25,cu=1,mis=0,r=0,dep=0,og=1,plh=4044729389,tim=1460211519902727
ERROR #139690311998256:err=54 tim=1460211519902750

The query failed with “ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired”.
This is how the access to pending concurrent requests is serialized to make sure only one of the manager processes can run it. And, I think, relying on the well-tuned and highly efficient locking mechanism of Oracle Database is a very very smart idea.

  • The coordination between manager processes is still not happening to distribute the requests, but the managers all query the queue the same way and then compete between themselves to lock the requests’ entries on the table 1st. The process that gets the lock also gets to execute the concurrent request.
  • The cache size variable couldn’t be observed in the trace files, but as far as I remember from my previous research the process would only fetch “cache size”-number of rowids using the 1st query in this post. This could be tested by submitting larger volume of requests simultaneously.
  • The “sleep seconds” kicks in only when the manager process didn’t fetch any rowids from the queue. After all the cached requests are attempted/executed by the manager process, the queue is checked again immediately without waiting for the “sleep seconds” (Not explained in detail in this post, but it’s revealed in the trace files)
  • The DMLs used to query the FND_CONCURRENT_REQUESTS and to lock the row are very very similar to Pre-R12.2 releases of e-Business Suite (Another sign that the process hasn’t changed, though one change that I see is the addition of where clause predicates for Checking the Editions).
Categories: DBA Blogs

More Effective Anti-Entropy Repair in Cassandra

Mon, 2016-04-18 13:12
1. Introduction

Cassandra offers three different repair mechanisms to make sure data from different replicas are consistent: Hinted Hand-off, Read-Repair, and Anti-Entropy Repair.

The first two mechanisms are kind of “automatic” mechanisms that will be triggered internally within Cassandra, depending on the configuration parameter values that are set for them either in cassandra.yaml file (for Hinted Hand-off) or in table definition (for Read-Repair). The last mechanism (Anti-Entropy Repair), however, needs to be triggered with manual intervention, by running “nodetool repair” command (possibly with various options associated with it).

Despite the “manual” nature of Anti-Entropy repair, it is nevertheless necessary, because the first two repair mechanisms cannot guarantee fixing all data consistency scenarios for Cassandra. For example, 1) what if a node is down longer than “max_hint_window_in_ms” (which defaults to 3 hours)? and  2) For deletes, Anti-Entropy repair has to be executed before “gc_grace_seconds” (defaults to 10 days) in order to avoid tombstone resurrection.

At the same time, however, due to how Anti-Entropy repair is designed and implemented (Merkle-Tree building and massive data streaming across nodes), it is also a very expensive operation and can cause a big burden on all hardware resources (CPU, memory, hard drive, and network) within the cluster. In this case, how to run Anti-Entropy repair effectively becomes a constant topic within Cassandra community.

In this post, I’d like to explore and summarize some of the techniques that can help achieve a more effective Anti-Entropy repair for Cassandra.

2. What is the issue of an Anti-Entropy Repair

Before Cassandra version 2.2, a sequential full Anti-Entropy repair is the default behavior when “nodetool repair” command (with no specific options) is executed. When this happens,

1) The node that initiates the operation, called coordinator node, scans the partition ranges that it owns (either as primary or replica) one by on. For each partition range, it sends the request to each of the peer/replica nodes to build a Merkle tree.

2) The peer/replica node scans all SSTables and a major, or validation, compaction is triggered, which reads every row in the SSTables, generates a hash for it, and then adds the result to a Merkle tree.

3) Once the peer node finishes building the Merkle tree, it sends the result back to the coordinator. The coordinator node compares every Merkle tree with all other trees. If difference is detected, data is exchanged between differing nodes.

Looking at this highly summarized procedure, there are a few things that immediately caught my attention:

First, because building Merkletree requires hashing every row of all SSTables,  it is a very expensive operation, stressing CPU, memory, and disk I/O.

Second, when Merkletree difference is detected, network bandwidth can be overwhelmed to stream the large amount of data to remote nodes.

Lastly, although not obvious, it is a worse problem in my opinion, which is that this operation can cause computation repetition and therefore waste resources unnecessarily. Please see this post for an example of how this can happen.

Based on the high cost related with it, a default (pre-Cassandra 2.2) sequential full Anti-Entropy repair is, in practice, rarely considered to be a routine task to run in production environment. Actually, a common trick that many people do with Cassandra repair is simply touching all data and let read-repair do the rest of work.

However, there do have situations when an Anti-Entropy repair is required, for example, to recover from data loss. What can we do in these cases?

3. What can help to achieve a more effective Anti-Entropy repair?

Over the time, different options for “nodetool repair” command have been introduced in different Cassandra versions. These options represent different techniques that can help achieve more effective Anti-Entropy repair, which I’ll go through in more details in this section. Meanwhile, I will also try to clarify some of the confusion that new Cassandra users might have around the different nodetool repair command options (and related jargon’s) of different Cassandra versions.

3.1 Primary range repair

In Cassandra, each node manages several (token) ranges of data. One of them is the primary range which is the token range that is assigned to the node according to the token setup within the ring. Other ranges of data are actually replica of primary ranges from other nodes. Running “nodetool repair” with option “-pr” (or “–partition-range”)  on a node means that the node only repairs the data of the primary range, but not other ranges managed on this node. The default behavior of a repair (without this option) is to repair all ranges of data managed by a node.

When using this option, it avoids the cost of doing Merkle tree calculation on non-primary range data. It also helps reduce excessive data streaming across the network. One caveat of using this option is that since each node only repairs one range of data that is managed by a node, this option needs to run on ALL nodes in the ring in order to repair all the data.

This option is available in very early release (0.x) of Cassandra.

3.2 Sequential repair

“nodetool repair” option “-snapshot” (or “–with-snapshot“) means a sequential repair (also called snapshot repair) and is a feature introduced in Cassandra version 1.2 and made as default in Cassandra version 2.0. Actually, in DataStax document for Cassandra 2.0 and later, you won’t find this option. In Cassandra 2.0 and beyond, you can specify option “-par” (or “–parallel”) to tell Cassandra to run in parallel. What does this really mean?

As we mentioned in section 3.1, each node manages several different ranges of data, either as primary or replica. When “-par” option is used, it instructs Cassandra to run repair on all these ranges at the same, which means the expensive repair operation (especially the Merkletree building part) happens on multiple nodes concurrently. This could be problematic and may slow down the entire cluster.

But when using “-snapshot” option (or default in Cassandra 2.0 and beyond), for each range of data, a snapshot is first taken and the repair operation is executed on the snapshot sequentially. This means that at any time for a given replica set of data, only one replica is being repaired, allowing other replica to satisfy the application requests in a more performant way.

3.3 Incremental repair

Since Cassandra version 2.1, a new option “-ic” (or “–incremental “) is introduced for incremental repair. Starting from Cassandra version 2.2, this option becomes the default option and at the same time, option “-full” (or “–full”) is used to specify a full repair.

The option of incremental repair is a great feature that will help overcome all 3 issues as listed in Section 2. Instead of building a Merkle tree out of all SSTables (repaired or not), this option only builds the tree out of un-repaired SSTables. Therefore,

  • The size of the Merkle tree to be built is way smaller and therefore requires much less computing resources
  • When doing Merkle tree comparison, less data will be compared and potentially streamed over the network
  • Already repaired data don’t need to be computed and compared again, thus avoiding a lot of unnecessary repetition.

There are a few things that need to pay attention to when using this option, please check my previous post  Apache Cassandra 2.1 Incremental Repair for more details.

Please also be noted that when running incremental repair in Cassandra 2.1 with Leveled Compaction Strategy (LCS), it may fail with RuntimeException (see CASSANDRA-9935 for more detail).

3.4 Sub-range repair

So far in this post when talking about repairing a range of data, it means the entire range (either primary or non-primary), which is sometimes also called as endpoint range. Beginning with Cassandra version 1.1.11, “nodetool repair” has options of “-st” (or “–start-token”) and “-et” (or “–end-token”) to specify a particular sub-range of data to repair.

Conceptually, sub-range repair is much like primary range repair, except that each sub-range repair operation focuses even smaller subset of data. So in general sub-range repair shares much of the pros and cons as primary range repair. One key benefit of using sub-range repair is the freedom of specifying the repair granularity through command line, which gives the DevOps team much more flexibility regarding how to set up a repair schedule to best match the cluster’s workload status. It is also doable to consider parallel running of multiple sub-range repairs on different sets of data at the same time, either on the same node, or on different nodes. This option has actually been deemed as one of the advanced repair techniques, as per post: Advanced repair techniques

Despite the flexibility of this technique, it has to be emphasized that no matter how the repair schedule is set for sub-range repairs, all ranges of data in the ring has to be repaired at least once before gc_grace_seconds limit is reached.

3.4.1 How to calculate valid sub-range token values

When invalid sub-range values are provided to the “-st” or “-et” option of “nodetool repair” command, most likely the command will fail and throw errors about “invalid tokens”. So in order to make sub-range repair work effectively, we need a systematic method that can generate valid sub-range token values.

We know that in Cassandra, data spans multiple nodes using a token ring. Each node is responsible for one or more slices of that ring. Each slice is a token range that has the start and end point. By this understanding, a natural way to generate valid sub-range token values would be: 1) find out all token ranges associated with one node in the entire ring; 2) for each token range associated with the node, divide the range into smaller chunks of sub-ranges. The start and end point of these sub-ranges would be valid values to be fed into the “-st” or “-et” option of “nodetool repair” command. The actual method and granularity to divide a node’s token range into smaller sub-ranges is where the flexibility comes from and can be adjusted accordingly to best suit the needs of the system.

There are different ways to find token ranges that are associated with a Cassandra node. Some of them are summarized below:

  1. Run “nodetool ring” command. “nodetool status” command will also do for single-token set-up.
  2. Run CQL query against “local” or “peers” tables in “system” keyspace to get the token values associated with a host.
  3. Cassandra client drivers provides APIs to get such information as well. For example, the “Metadata” class (in package com.datastax.driver.core) of the Java client API  provides the following two methods:
    * public Set<TokenRange> getTokenRanges() :
      Returns the token ranges that define data distribution in the ring.
    * public Set<TokenRange> getTokenRanges(String keyspace,Host host) :
      Returns the token ranges that are replicated on the given host, for the given keyspace.
  4. Based on the thrift Java API describe_splits call, an open-source utility tool called “cassandra-list-subranges” has been developed to list valid sub-range values for a specific keyspace and table on a particular Cassandra node. For details of this tool, please check the GitHub repository page at

Sub-range repair is available since Cassandra version 1.1.11.

3.4.2 DataStax Enterprise (DSE) repair service

If the Cassandra cluster is running under DataStax Enterprise version, the OpsCenter provides a “repair service” starting from version 4.0. The way that this service works is to continuously repairing small chunks of data (using sub-range repair) in a cluster in the background until the entire cluster is repaired and then it starts the next cycle of processing.

From the OpsCenter Window, there is only one parameter to specify the maximum amount of time it takes to repair the entire cluster once, which is typically less than gc_grace_seconds setting. The default is 9 days, compared with default 10 days of gc_grace_seconds setting.

More advanced setting, such as the parallelism of concurrent repairs, the maximum pending repairs allowed to be running on a node at one time, and etc. can be set either in opscenterd.conf file (for all clusters) or in cluster_name.conf file (for a specific cluster).

More detailed information about DSE repair service and its configuration can be found in the following documents:

Please be noted that DSE repair service, although convenient, is only available for DSE version and not for open source Cassandra version. It also lacks the capability to specify what keyspaces or tables  to repair, such as in “nodetool repair” tool.

3.5 Data-Center vs. Cluster wide repair

Since Cassandra version 1.2, Cassandra starts to provide options for “nodetool repair” to offer the following repair capabilities regarding repair locality/scope:

  • Cassandra 1.2: “-local” (or “–in-local-dc”), only repair nodes in the same data center as the node on which the “nodetool repair” command is executed.
  • Cassandra 2.0: “-dc <dc_name>” (or “–in-dc <dc_name>”), repairs nodes in the named data center
  • Cassandra 2.1:
    • “-dcpar” (or “–dc-parallel”), repairs data center in parallel;
    • “-hosts” (or “–in-hosts”), repairs specified hosts

When none of these options is specified, all data replica across the entire cluster is repaired.

4. Conclusion

In this post, we discussed in details some of the techniques that can help with a more effective Anti-Entropy repair in Cassandra. To summarize:

  • Primary range repair is recommended for maintenance purpose and when it is used, it has to be executed on all nodes in the ring one by one to make sure the whole data range in the ring is covered.
  • When available (Cassandra version 2.1 and beyond), incremental repair will bring the biggest benefit compared with other techniques. But please be aware of the caveats related with it. Please also be noted that with the bug as specified in CASSANDRA-9935, incremental repair may have issues with Level Compaction Strategy (LCS).
  • Sub-range repair can be very useful, especially when incremental repair is not an option. But please make sure that whatever schedule that is set up using this technique has to make sure all range of data to be finished within gc_grace_seconds limit.
  • If a repair is needed to recover from data loss, a sequential, full repair is needed. Neither primary range repair nor incremental repair technique works properly for this purpose. But due to the high cost associated with this operation, sometimes it might be faster to simply wipe out the data directory on a node and let it do bootstrapping.
Categories: DBA Blogs

What’s Holding You Back From Giving Back?

Fri, 2016-04-15 10:19

This week in honour of National Volunteer Week, I’m reflecting on the importance that volunteering has had in my life.

I’ve learned from each and every one of my experiences. From wrapping holiday presents at the mall, to helping source articles for an industry magazine, to wish granting, recruiting committee and board members, and providing HR advice and counsel. These experiences eventually led me to become a member of the board of directors for a number of organizations, and I was even named Board Chair.

Ironically, the rewards and benefits that I have received from the experiences far outweigh any amount of time I have given the many organizations I have supported over the years. Volunteering has provided me the opportunity to expand my skills and experience, and take on leadership roles long before I had the credentials to be hired for them. I initially started volunteering when I moved to Ottawa, and quickly learned that there is no better way to get to know your community, meet new people and expand your network. Once I started, I never looked back. I caught the “volunteer bug.” It is an important part of my life.

I am often asked how I find the time to volunteer. I always respond with, “like anything, if it’s important to you, you can and will find the time.” As I have expanded my family and career, I seek opportunities where I can continue to share my knowledge, skills and experience in ways that do not impede on either. A perfect example of this would be career mentoring. I have been a mentor for a number of organizations including the HRPA, OCISO, and the WCT. I have been fortunate to have great mentors in the past and now pay it forward. I remain connected with many of them.

In my role as VP of HR at Pythian I was thrilled to champion our Love Your Community Programs. These programs provide our employees in over 36 countries with a volunteer day and opportunities for sponsorship – i.e. raising money for causes that are meaningful to them. The programs have allowed Pythian the opportunity to positively impact the communities where our employees live.

Volunteer Week highlights the importance of volunteering in our communities, and showcases the impact it has on the lives of both the volunteer, and the communities they support. What’s holding you back from giving back?

And because it couldn’t be said any better: “We make a living by what we get, but we make a life by what we give.”

Winston Churchill

Categories: DBA Blogs

Optimizer Stew – Parsing the Ingredients to Control Execution Plans

Fri, 2016-04-15 09:01

No matter how many times I have worked with Outlines, Baselines and Profiles, I keep having to look up reminders as to the differences between these three.

There is seemingly no end to articles to the number of articles and blog that tell you what needs to be licensed, how to use them, and which version of Oracle where each made its debut appearance.

This blog will discuss none of that.  This brief article simply shows the definitions of each from the Glossary for the most current version of the Oracle databases. As of this writing that version is

And here they are.

Stored Outline

A stored outline is simply a set of hints for a SQL statement. The hints in stored outlines direct the optimizer to choose a specific plan for the statement.

Link to Stored Outline in the Oracle Glossary

SQL plan baseline

A SQL baselines is a set of one or more accepted plans for a repeatable SQL statement. Each accepted plan contains a set of hints, a plan hash value, and other plan-related information. SQL plan management uses SQL plan baselines to record and evaluate the execution plans of SQL statements over time.

Link to SQL Plan Baseline in the Oracle Glossary

SQL profile

A SQL profile is a set of auxiliary information built during automatic tuning of a SQL statement. A SQL profile is to a SQL statement what statistics are to a table. The optimizer can use SQL profiles to improve cardinality and selectivity estimates, which in turn leads the optimizer to select better plans.

Link to SQL Profile in the Oracle Glossary

Categories: DBA Blogs

Log Buffer #469: A Carnival of the Vanities for DBAs

Wed, 2016-04-13 09:40

This Log Buffer Edition digs deep into the realms of Oracle, SQL Server and MySQL and brings together a few of the top blog posts.


We’ve all encountered a situation when you want to check a simple query or syntax for your SQL and don’t have a database around. Of course, most of us have at least a virtual machine for that, but it takes time to fire it up, and if you work from battery, it can leave you without power pretty quickly.

View Criteria is set to execute in Database mode by default. There is option to change execution mode to Both. This would execute query and fetch results from database and from memory.  Such query execution is useful, when we want to include newly created (but not committed yet) row into View Criteria result. Newly created row will be included into View Criteria resultset.

Upgrading database hardware in an organization is always a cumbersome process. The most time consuming step is, planing for the upgrade, which mainly includes choosing right hardware for your Oracle databases. After deciding on the hardware type for your databases, rest will be taken care by technical teams involved.

Gluent New World #02: SQL-on-Hadoop with Mark Rittman

The pre-12c implementation of DCD used TNS packages to “ping” the client and relied on the underlying TCP stack which sometimes may take longer. Now in 12c this has changed and DCD probes are implemented by TCP Stack. The DCD probes will now use the TCP KEEPALIVE socket.


SQL Server

Snippets will allow you to code faster by inserting chunks of code with few key strokes.

One of more common concerns among database administrators who consider migrating their estate to Azure SQL Database is their ability to efficiently manage the migrated workloads.

A SQL Server Patching Shortcut

Move an Existing Log Shipping Database to a New Monitor Server

Knee-Jerk Performance Tuning : Incorrect Use of Temporary Tables



MySQL 5.7 sysbench OLTP read-only results: is MySQL 5.7 really faster?

7 Galera Cluster presentations in Percona Live Santa Clara 18-21.4. Meet us there!

Generate JSON Data with dbForge Data Generator for MySQL v1.6!

Extending the SYS schema to show metadata locks


Categories: DBA Blogs

Installing SQL Server 2016 – Standalone Instance – New Features

Tue, 2016-04-12 10:48

In this article I am going to go through a typical install of SQL Server 2016, and explain some of the best practices to follow when setting up a production server. I will also take a look at the new features when installing SQL Server 2016 compared to older versions.

The version of SQL Server I am using in SQL Server 2016 RC2. That means that some of the features may change between the time of writing and the retail release of SQL Server 2016.


Let’s Get Started!

The first screen you come to after opening the installation media:

Initial Screen

  1. Click the Installation button highlighted, and then the first option in the list New SQL Server Standalone Installation.


New for SQL Server 2016
For the Seasoned SQL Server DBA you will notice a few additions to this screen.

  • SQL Server Management Tools can now be installed from this screen. The files and binaries will be downloaded when you click this link, as they are no longer bundled with the installation media.
  • SQL Server Data Tools can also be installed from this screen.
  • Additionally, a standalone instance of R can be installed. R is a statistical programming language embedded into SQL server 2016, making it easy for data scientists and BI professionals to get a good level of analysis without leaving the SQL Server environment.


  1. Select the edition of SQL SERVER you would like you would like to install, I chose developer as it’s a full featured installation of SQL Server that can be used for development only, and not in a production environment. You could also enter a key in here instead of selecting a version.


  1. Read and accept the licence.
  2. SQL Server will then check a few rules to make sure you can install SQL server on your hardware.
  3. Select whether or not to allow Microsoft to check for updates and click next.
  4. SQL Server Installation will then install some setup files needed for installation and perform a few more checks. As you can see below, a warning has appeared asking me to make sure the correct ports are open on the firewall. Click Ok and then Next to proceed with the installation.

Rule Check

  1. The next screen is where we want to select all the features we want to install. As my machine is being set up as a dev machine to test SQL 2016 features, I am installing all the features. In a production environment only install the features that are needed. You can also select where to install the binaries and the root instance from this screen. Click next once you have selected the settings needed.


New for SQL Server 2016

  • The ability to install a standalone instance of R is now available in the Shared Features Section.
  • The ability to install R services in the database engine is now available.
  • The ability to install Polybase Query Service for external sources is now available to install. Polybase query service allows users to query big data sources such as Hadoop using common T-SQL statements. If you are planning on installing the Polybase feature, then the Java Runtime SRE needs to be installed first.


  1. The next screen is where you will need to name the instance, if this is not the only SQL server installation on this hardware. It also confirms the installation directory as per the previous screens. As this is the only installation of SQL Server on this machine, I am going to leave these settings as default. Click next when ready to proceed.

Default Instance

  1. The Next Screen is the Server Configuration screen. You should run each service under a domain account with a strong password. Whether you use managed accounts or do not enforce password expiration is up to you. However, these accounts are going to be running your services and should always be available. As this is a test machine not connected to a domain, I will leave the defaults. You can also select your default server collation from this window by clicking on the tab at the top highlighted in yellow. It is important also to set the start-up type parameters to allow services to start automatically on reboot if needed.

The ability to allow Perform Volume Maintenance tasks to be checked from this screen is a new feature. This is a best practice among SQL server DBA to allow instant file initialization. Previously this had to be done outside of the installation window.

Server Config


New for SQL Server 2016

  • The ability to allow Perform Volume Maintenance tasks to be checked from the Server Configuration Screen.


  1. Next up is the database configuration screen, and we are going to step through the tables.

In the first tab you will want to add all the users that require sysadmin access. To administer the server, click on the Add Current user and Add buttons. I always use mixed mode authentication so I can still get to the server if Active Directory plays up, and I add a complex password for security. When your done, click on the Data Directories tab at the top.

Database Config1

This is the screen where you set up all of your default directories for your databases.

Best practice states that we should put Log files on separate disks to Data Files, as there are two different access patterns for these, so they would be more performant on separate disks. Backups should also be on their own disks where possible. Additionally, the OS should also have its own drive separate from all SQL server files. As this is a test server and I only have a c drive, I will leave them as default. Click TempDB tab when ready.

Database Config 2


New For SQL 2016

  • This is new in SQL 2016, and previously had to be configured after install. This screen allows us to create files for temp db. Best practice stated there should be 1 file per logical core up to a maximum of 8 as I have 4 cores in my machine I have created 4 files. You can also spread the files over more than one disk if needed. Once you’re happy with your selections click next.

Database Config 3


  1. The next screen is to configure Analysis Services. I have configured mine in Multidimensional mode adding myself as sysadmin and setting directories using same best practice as database engine. Look out for a further blog article on SQL server Analysis Services.



  1. Leave the default options for Reporting Services. Again keep an eye out for another article on Reporting Services.
  2. In the next screen you have to configure the users capable of using the Distributed Replay Controller.
  3. Give a name to the DRC.


  1. On the next Screen you will need to accept the terms of the features being installed by clicking accept, and then next.
  2. Finally, you can click install and that’s it! SQL Server 2016 and all its new features are installed.



Here is a great link on some of the new features available in SQL Server 2016

Categories: DBA Blogs

SQL Server 2016 : A New Security Feature – Always Encrypted

Tue, 2016-04-12 09:15

Security. This word is so important when it comes to data, and there is a reason why. Every business has it’s vital data, and this data has to be accessed only by those who are authorized. Back in 2009, I wrote an article on what measures to take when it comes to securing SQL Server.

There were days when we used to have a third party tool to encrypt the data inside SQL Server. Later, Microsoft introduced Transparent Data Encryption (TDE) bundled with the release of SQL Server 2008. You may be wondering why it is so important to encrypt the data. Inside our database, there may be a case that the customer/application has to enter and store the sensitive information such as Social Security Number (SSN) or Financial/Payment Data, which should not be read in plain text, even by a DBA. With this requirement, a strong encryption and/or data masking comes into the picture.

With the launch of SQL Server 2016 Release Candidate 0 (RC0) , Microsoft has introduced two new features that are my personal favorite – 1)  Always Encrypted and 2) Dynamic Data Masking. Today I am going to walk you through the Always Encrypted feature.

First and foremost, we need to have SQL Server 2016 RC0 installed so that we can test this feature. You can download the RC0 here.  Once you are ready with RC0, create a test database and a table with an Encrypted column to store the sensitive data. There are few prerequisites that I will list for you here. If you want, you can use the sample schema from MS.

  1. Create a sample database
  2. Create Column Master Key
  3. Generate a self-signed certificate (well, you will need to install this certificate on the machine where the application will run)
  4. Configure Column Encryption Key
  5. Create a test table with Always Encrypted column
  6. Create an application to Insert data into the sample table we created in previous step

I have created a sample app and a demo script for the reference which you can download here. Basically, what we have to remember is that we can not insert the value inside the Always Encrypted table directly, we will need to use the tool/app, and the data will always be encrypted when it goes inside the database. This will ensure that the intruder can not get the data as it travels to the database in a cipher text form.

Here is some further reading on this topic. Enjoy reading and testing an excellent feature of SQL Server 2016 RC0.

Categories: DBA Blogs

SQL On The Edge #9 – Azure SQL Database Threat Detection

Mon, 2016-04-11 15:30

Despite being well documented for several years now, every now and then we still run into clients that have bad experiences because of SQL injection attacks. If you’re not familiar, a SQL injection attack happens when an attacker exploits an application vulnerability in how they pass queries and data into the database and insert their own malicious SQL code to be executed. If you want to see different examples and get the full details, the Wikipedia page is very comprehensive.

Depending on how the application is configured, this kind of attack can go all the way from enabling attackers to see data they shouldn’t, to dropping an entire database if your application is allowed to do so. The fact that it’s an application based vulnerability also means that it really depends on proper coding and testing of all inputs in the application to prevent it. In other words, it can be very time-consuming to go back and plug all the holes if the application wasn’t securely built from the ground up.

Built-in Threat Detection

To attack this issue, and as part of the ongoing security story of SQL Server, Microsoft has now invested in the feature called Database Threat Detection. When enabled, the service will automatically scan the audit records generated from the database and will flag any anomalies that it detects. There are many patterns of injections so it makes sense to have a machine be the one reading all the SQL and flagging them. MS is not disclosing the patterns or the algorithms in an effort to make working around the detection more difficult.

What about on-premises?

This feature right now is only available on Azure SQL Db. However, we all know that Azure SQL Db is basically the testing grounds for all major new features coming to the box product. I would not be surprised if the threat detection eventually makes it to the on-premises product as well (my speculation though, nothing announced about this).

For this new feature you will need Azure SQL Db, you will also need to have auditing enabled on the database. The current way this works is by analyzing the audit records so it’s 100% reactive, nothing proactive. You will need a storage account as well since that’s where the audit logs get stored. The portal will walk you through this whole process, we’ll see that in the demo video.

Current State
As I mentioned, right now the tool is more of a reactive tool as it only lets you know after it has detected the anomaly. In the future, I would love to see a preventive configuration where one can specify a policy to completely prevent suspicious SQL from running. Sure, there can always be false alarms, however, if all the application query patterns are known, this number should be very low. If the database is open to ad-hoc querying then a policy could allow to only prevent the queries or even shut down the database after several different alerts have been generated. The more flexible the configuration, the better, but in the end what I want to see is a move from alerting me to preventing the injection to begin with.

In the demo, I’m going to go through enabling Azure SQL threat detection, some basic injection patterns and what the alerts look like. Let’s check it out!


Categories: DBA Blogs

Enter the Exadata X6

Mon, 2016-04-11 15:20

Data is exploding and Exadata is catching up. With the proliferation of cloud technology and in-memory databases; Oracle Exadata X6-2 and X6-8 has it all. It seems to be an ideal platform for hyper-convergence for any data center running Oracle products.

Following are some of the salient features of Oracle X6:

  • The compute nodes have twenty two-core Intel Xeon E5-2699 v4 processors
  • The memory is DDR4 and of size 256Gb and it can be expanded to 768Gb.
  • The local storage can now be upgraded to 8 drives from default of 4.
  • The cell servers have ten-core Intel Xeon E5-2630 v4 processors.
  • Flashcache has become massive here reaching up to 12.8TB. Full rack has 179TB of flash.
  • There will be up to 1.7 PB of disk capacity (raw) per rack.
  • The infiniband network will have 40Gb/second. There is no change to Infiniband. In X5 it became active/active – but that’s the only difference.
  • On the software side, there are many improvement but one thing which caught my eye is the feature that enables Storage Indexes to be moved along the data when a disk hits predictive failure or true failure. This surely will improve performance by a long way.

With X6, Exadata has surely come a long way forward.

Categories: DBA Blogs

Installing Sample Databases To Get Started In Microsoft SQL Server

Fri, 2016-04-08 09:20

Anyone interested in getting started in SQL server will need some databases to work with/on. This article hopes to help the new and future DBA/Developer get started with a few databases.

There are several places to get sample databases, but one starting out in SQL server should go to the Microsoft sample databases. The reason for this is that there are thousands of Blogs/Tutorials on the internet that use these databases as the basis for the tutorial.

The below steps will detail how to get the sample databases and how to attach them to SQL server to start working with them.

This blog assumes you have a version of SQL server installed if not you can click here for a great tutorial

These 2 Databases are a great start to learning SQL Server from both a Transactional and Data Warehousing point of view.

  • Now that we have downloaded these 2 files we will need to attach them one at a time. First, open SQL Server and connect to your instance.
  • Expand the object explorer tree until you can right click on the folder called databases and then left click on Attach…


  • Click the Add Button and navigate to and select the .mdf file (This is the database file you downloaded).


  • There is one step a lot of people getting started in SQL server often miss. As we have just attached a data file in order for SQL Server to bring the database online, it needs a log file which we don’t have. The trick to this is, if we remove the log file from the Attach Database window, SQL Server will automatically create a log file for us. To do this, simply select the log file and click remove.


  • Finally, when you window looks like below simply click ok to attach the database.


  • Repeat steps 3 to 6 for the second database file and any others you wish to attach.
  • The Databases are now online in SQL server and ready to be used.


And that’s it! You now have an OLTP database and a DW database for BI.

Below are links to some good starting tutorials and some additional databases.



Stack Overflow




Categories: DBA Blogs

Oracle Live SQL: Explain Plan

Thu, 2016-04-07 13:14

We’ve all encountered a situation when you want to check a simple query or syntax for your SQL and don’t have a database around. Of course, most of us have at least a virtual machine for that, but it takes time to fire it up, and if you work from battery, it can leave you without power pretty quickly. Some time ago, Oracle began to offer a new service called “Oracle Live SQL” . It provides you with the ability to test a sql query, procedure or function, and have a code library containing a lot of examples and scripts. Additionally, you can store your own private scripts to re-execute them later. It’s a really great online tool, but it lacks some features. I’ve tried to check the  execution plan for my query but, unfortunately, it didn’t work:

explain plan for 
select * from test_tab_1 where pk_id<10;

ORA-02402: PLAN_TABLE not found

So, what could we do to make it work? The workaround is not perfect, but it works and can be used in some cases. We need to create our own plan table using script from an installed Oracle database home $ORACLE_HOME/rdbms/admin/utlxplan.sql. We can open the file and copy the statement to create plan table to SQL worksheet in the Live SQL. And you can save the script in Live SQL code library, and make it private to reuse it later because you will need to recreate the table every time when you login to your environment again. So far so good. Is it enough? Let’s check.

explain plan for 
select * from test_tab_1 where pk_id<10;

Statement processed.

select * from table(dbms_xplan.display);

ERROR: an uncaught error in function display has happened; please contact Oracle support
       Please provide also a DMP file of the used plan table PLAN_TABLE
       ORA-00904: DBMS_XPLAN_TYPE_TABLE: invalid identifier

Ok, the package doesn’t work. I tried to create the types in my schema but it didn’t work. So far the dbms_xplan is not going to work for us and we have to request the information directly from our plan table. It is maybe not so convenient, but it give us enough and, don’t forget, you can save your script and just reuse it later. You don’t need to memorize the queries. Here is a simple example of how to get information about your last executed query from the plan table:

SELECT parent_id,id, operation,plan_id,operation,options,object_name,object_type,cardinality,cost from plan_table where plan_id in (select max(plan_id) from plan_table) order by 2;

 - 	0	SELECT STATEMENT	268	SELECT STATEMENT	 - 	 - 	 - 	9	49

I tried a hierarchical query but didn't find it too useful in the Live SQL environment. Also you may want to put unique identifier for your query to more easily find it in the plan_table. 

explain plan set statement_id='123qwerty' into plan_table for
select * from test_tab_1 where pk_id<10;

SELECT parent_id,id, operation,plan_id,operation,options,object_name,object_type,cardinality,cost from plan_table where statement_id='123qwerty' order by id;


Now I have my plan_table script and query saved in the Live SQL and reuse them when I want to check the plan for my query. I posted the feedback about the ability to use dbms_xplan and Oracle representative replied to me promptly and assured they are already working implementing dbms_xplan feature and many others including ability to run only selected SQL statement in the SQL worksheet (like we do it in SQLdeveloper). It sounds really good and promising and is going to make the service even better. Stay tuned.

Categories: DBA Blogs

Which Cassandra version should you use for production?

Thu, 2016-04-07 12:47
What version for a Production Cassandra Cluster?

tl;dr; Latest Cassandra 2.1.x

Long version:

A while ago, Eventbrite wrote:
“You should not deploy a Cassandra version X.Y.Z to production where Z <= 5.” (Full post).

And, in general, it is still valid up until today! Why “in general“? That post is old, and Cassandra has moved a lot since them. So we can get a different set of sentences:

Just for the ones that don’t want follow the links, and still pick 3.x for production use, read this:

“Under normal conditions, we will NOT release 3.x.y stability releases for x > 0.  That is, we will have a traditional 3.0.y stability series, but the odd-numbered bugfix-only releases will fill that role for the tick-tock series — recognizing that occasionally we will need to be flexible enough to release an emergency fix in the case of a critical bug or security vulnerability.

We do recognize that it will take some time for tick-tock releases to deliver production-level stability, which is why we will continue to deliver 2.2.y and 3.0.y bugfix releases.  (But if we do demonstrate that tick-tock can deliver the stability we want, there will be no need for a 4.0.y bugfix series, only 4.x tick-tock.)”

What about end of life?

Well, it is about stability, there are still a lot of clusters out there running 1.x and 2.0.x. And since it is an open source software, you can always search in the community or even contribute.

If you still have doubts about which version, you can always contact us!

Categories: DBA Blogs

Log Buffer #468: A Carnival of the Vanities for DBAs

Wed, 2016-04-06 15:38

This Log Buffer Edition rounds up Oracle, SQL Server, and MySQL blog posts of the week.


When using strings such as “FREQ=DAILY; BYDAY=MON,TUE,WED,THU,FRI; BYHOUR=9,10? within the scheduler, sometimes its not readily apparent how this will translate to actual dates and times of the day that the scheduled activity will run. To help you understand, a nice little utility is to use EVALUATE_CALENDAR_STRING”.

Most developers have struggled with wires in SOA composites. You may find yourself in a situation where a wire has been deleted. Some missing wires are restored by JDeveloper. Other missing wires have to be added manually, by simply re-connecting the involved adapters and components. Simple.

In-Memory Parallel Query and how it works in 12c.

Oracle recently launched a new family of offerings designed to enable organizations to easily move to the cloud and remove some of the biggest obstacles to cloud adoption. These first-of-a-kind services provide CIOs with new choices in where they deploy their enterprise software and a natural path to easily move business critical applications from on premises to the cloud.

Two Oracle Server X6-2 systems, using the Intel Xeon E5-2699 v4 processor, produced a world record x86 two-chip single application server SPECjEnterprise2010 benchmark result of 27,509.59 SPECjEnterprise2010 EjOPS. One Oracle Server X6-2 system ran the application tier and the second Oracle Server X6-2 system ran the database tier.

SQL Server:

To be able to make full use of the system catalog to find out more about a database, you need to be familiar with the metadata functions.

Powershell To Get Active Directory Users And Groups into SQL!

A code review is a serious business; an essential part of development. Whoever signs off on a code review agrees, essentially, that they would be able to support it in the future, should the original author of the code be unavailable to do it.

Change SQL Server Service Accounts with Powershell

Learn how to validate integer, string, file path, etc. input parameters in PowerShell as well as see how to test for invalid parameters.


The MySQL Utilities has announced a new beta release of MySQL Utilities. This release includes a number of improvements for usability, stability, and a few enhancements.

In this webinar, we will discuss the practical aspects of migrating a database setup based on traditional asynchronous replication to multi-master Galera Cluster.

Docker has gained widespread popularity in recent years as a lightweight alternative to virtualization. It is ideal for building virtual development and testing environments. The solution is flexible and seamlessly integrates with popular CI tools.

How ProxySQL adds Failover and Query Control to your MySQL Replication Setup

Read-write split routing in MaxScale

Categories: DBA Blogs

Six things I learned about privacy from some of the world’s top IT pros

Wed, 2016-04-06 08:32


Everyone has their own experiences with data privacy. Ask almost anyone, and they’ll be able to tell you how a retail company found some innovative way of collecting their personal data, or how Facebook displayed a suggested post or ad that was uncannily targeted at them based on personal information they did not purposely share.


The discussion about privacy can get pretty interesting when you get a group of CIOs and IT leaders in the same room to talk about it.  And that’s just what we did at Pythian’s Velocity of Innovation (Velocity) events in New York, San Francisco and Sydney. With their ears constantly to the ground for the latest trends and unparalleled insider IT knowledge, there’s no better group with whom to talk security and privacy than our Velocity panelists and attendees.


At Pythian we manage the revenue-generating systems for some of the world’s leading enterprises, and work with our clients on some of the most difficult security and privacy issues. But staying on top of the evolving threats to data privacy is a constant challenge, even for the most knowledgeable security experts. One of our biggest challenges is helping our clients use data responsibly to improve customer experiences, while protecting individual privacy and safeguarding personally identifiable information from malicious threats. So there’s always more we can learn. This is one of the reasons we keep listening to our customers and peers through various channels and events like our Velocity discussions.


Privacy is a high stakes game for companies that want to maximize their use of client data while adhering to legislation designed to protect personal information. In many countries, new, more rigorous laws are emerging to enable individuals to maintain more control over their personal information, and to help them understand when they are being monitored or when their personal information is recorded, shared or sold. These laws include the recently  reformed  EU Data protection rules and PIPEDA in Canada. But companies are responding by finding clever ways of making it attractive for consumers to trade their data for services, or worse, by making it impossible to get service at all without giving up certain information.

With these issues in mind, we had some interesting discussions at each of our Velocity sessions. Here are six main privacy-related themes that emerged at these recent events:


  1. Privacy and security are not the same

Although are related, they refer to different ideas. Privacy is a major goal of security, and relates to a consumer’s right to safeguard their personal information. It can involve vulnerable data such as social media data, customer response data, demographic data or other personal information. In general, privacy is the individual’s right to keep his or her data to himself or herself. By contrast, security refers to the protection of enterprise or government systems. Security may incorporate customer privacy as part of its agenda, but the two are not synonymous. According to Tim Sheedy, principal analyst with Forrester Research, a panelist at our Sydney event, security and privacy are different things. “But if you have a security breach, privacy becomes an issue,” he said. “So they’re closely linked.”


  1. Balance is key

Our panelists agreed that you have to balance protecting customer data with moving the business forward. But if you’re in the insurance business, like San Francisco Velocity panelist and CTO at RMS Cory Isaacson, that’s a tough balance to strike. He has to worry about how his company’s data privacy measures up against industry standards and the regulations they have meet. At the same time he has to ensure that data security doesn’t paralyze innovation in the business. He says you have to be innovative with moving the business forward, as well as keeping up with standards and regulations.

“If you listen to your ISO auditor you’re never going to get out of bed in the morning,” Isaacson said. “If you’re faced with securing highly sensitive customer data, you have to incorporate it into the way you work every day instead of letting it slow you down.”


  1. Don’t be creepy

Companies have to walk a fine line between using personal data to enhance the customer experience and being intrusive for no good reason. Forrester’s Sheedy warns that perception is everything. “Don’t be creepy,” he said. “When you’re creepy that that’s when privacy becomes an issue. Don’t underestimate the ability of people to give away information for a discount or for a benefit. But the key is that there has to be a benefit. When there isn’t an upside, it becomes creepy for customers, and you have to wonder if you’ve  gone too far.”


  1. People will share data in exchange for perks.

Our Sydney panelists and attendees exchanged ideas about new ways that companies are obtaining information about customers, behaviours and preferences. They talked about social media companies learning your behaviours by using more than just the data you share on your profile, including using tactics like tracking your location through GPS and triangulation between cell phone towers.


“When I visit Pythian’s head office in Ottawa, the nearby coffee shop has an app that asks if I want them to prepare my usual drink when it detects I’m nearby. Is that intrusive? No, because I opted in,” said Francisco Alvarez, vice president, APAC at Pythian.


A Marketing Magazine article reported that in a recent survey, a majority of consumers will share data for certain benefits: 80 percent said they would share data for rewards from a company, 79 percent would share data for cash back, and 77 percent would share data for coupons. The majority also said they would share personal data for location-based discounts (69 percent).


  1. Millennials have a higher tolerance for sharing data.

The Australian panelists talked about this extensively, citing personal experience. They expressed the same concern many of us feel  about how much data their children’s generation is a sharing online and with companies. One of our attendees had a child who was denied a prestigious scholarship because his digital footprint wasn’t favourable. But, Alvarez said, this type of consequence doesn’t seem to deter millennials from freely sharing very personal information. And when it comes to sharing information with businesses in exchange for perceived benefits, their tolerance is very high.  


In fact, according to a survey conducted by the USC Annenberg Center for Digital Future and Bovitz Inc., millennials (individuals between the ages of 18 and 35) have a different attitude than Internet users 35 years and older when it comes to sharing their personal data online with businesses.  According to the study results, millennials were more likely than older respondents (35 years and older) to trade some of their personal information in exchange for more relevant advertising.


  1. New  approaches for protecting customer data are on the horizon

In our San Francisco discussion, Pythian Chief Data Officer Aaron Lee cited Google’s new initiative for protecting customer data.  The new approach involves placing corporate applications on the Internet itself and focusing its security efforts on registered devices and user authentication.
“It’s really simple idea,” Lee said.  “In most companies the internal network is special. If you’re on the internal network you get stuff you can’t get from outside just because you’re sitting there on the network. That’s not the case anymore. With Google’s Beyond Corp initiative it is like there are no more internal networks. If you want to talk to my service, you treat it like you came in from the internet just like everybody else. That way I can apply a consistent approach to authentication, authorization, all that good stuff. It’s actually acting to reduce the surface of complexity. It’s an approach that takes the simplest way that we can secure these things, and treats them the same regardless of where they actually live,” he said.

Categories: DBA Blogs

Redshift Table Maintenance: Vacuuming

Tue, 2016-04-05 13:38

Part of the appeal of AWS’ Redshift is that it’s a managed service, which means lower administration costs. While you don’t have to hire a full time DBA to make sure it runs smoothly (from Pythian’s experience it takes ~10-20 hours/month to manage Redshift), there are still some tasks that should be attended to keep it happy:

  • Vacuuming
  • Analyzing
  • Skew analysis
  • Compression analysis
  • Query monitoring

Let us start with Vacuuming as the first topic of a series of deeper dives into this list.

Vacuuming is an integral part of performance maintenance of Redshift.  Since deletes and updates both flag the old data, but don’t actually remove it, if we’re doing those kinds of actions, vacuuming is needed to reclaim that space. Updates and deletes can be pretty big performance hits (a simple update can easily take 60 secs on a 50 million record table on a small cluster, so we’re looking at 20 minutes for a similar update on a 1 billion record table), so we try to avoid them as much as we can on large tables. The space reclamation portion of the vacuum typically accounts for 10% of the time we see spent on the tables.  We can use the SORT ONLY parameter to skip this phase, but we generally have no compelling reason to.

In addition, if tables have sort keys, and table loads have not been optimized to sort as they insert, then the vacuums are needed to resort the data which can be crucial for performance.  While loads of empty tables automatically sort the data, subsequent loads are not. We have seen query times drop by 80% from the implementation of vacuuming, but of course the impact varies with table usage patterns.

The biggest problem we face with vacuuming is the time it takes. While vacuuming does not block reads or writes, it can slow them considerably as well as take significant resources from the cluster, and you can only vacuum one table at a time. Remember that resource utilization can be constrained through WLM queues. A typical pattern we see among clients is that a nightly ETL load will occur, then we will run vacuum and analyze processes, and finally open the cluster for daily reporting. The faster the vacuum process can finish, the sooner the reports can start flowing, so we generally allocate as many resources as we can.


Let us start with the process itself.  It’s simple enough and you can get syntax documentation from AWS . There’s not too much that’s tricky with the syntax and for most use cases

VACUUM myschema.mytablename;

will suffice.  Note that INTERLEAVED sort keys need the REINDEX parameter added for all re-indexing to occur.  You can discern which tables have this set up by using the query:

select schemaname, tablename
from (SELECT
n.nspname AS schemaname
,c.relname AS tablename
,min(attsortkeyord) min_sort FROM pg_namespace AS n
INNER JOIN&nbsp; pg_class AS c ON n.oid = c.relnamespace
INNER JOIN pg_attribute AS a ON c.oid = a.attrelid
WHERE c.relkind = 'r'
AND abs(a.attsortkeyord) &gt; 0
AND a.attnum &gt; 0
group by 1,2 )
where min_sort&lt;0

In order to give the vacuum process more resources, we preface this command with

SET wlm_query_slot_count TO <N>;

where N is the maximum number of query slots we think we can get away with.  If you’re not sure what that number should be (we’ll discuss WLM queues in another post), usually 5 is a safe number though be warned that if the value of wlm_query_slot_count is larger than the number of available slots for the service class, the vacuum command will fail.

Knowing when to vacuum is reasonably straight forward. Anytime after substantial inserts, updates, or deletes are made is always appropriate, but you can be more exacting by querying two tables:

select * from STL_ALERT_EVENT_LOG where Solution LIKE ‘%VACUUM command%’


select * from SVV_TABLE_INFO where unsorted > 8

The latter check works great for daily loads. We will often set the threshold at 8 (percent) immediately after the loads, then run another vacuum process in the evening with a lower threshold (4 percent) that addresses larger tables that take a fair amount of time to vacuum since we want to avoid that situation in the morning.

Last fall AWS built a nice tool to automate vacuums, Analyze & Vacuum Schema Utility, that incorporated these queries. It works quite well, and we recommend it to our clients as a simple way to set up this maintenance. However, note that it does not automatically add the REINDEX parameter for those tables with INTERLEAVED sortkeys. The code is all available, so it is easy enough to adjust to make more custom filtering of tables (on fact_* and dim_* for instance) within a schema.

AWS has built a very useful view, v_get_vacuum_details, (and a number of others that you should explore if you haven’t already) in their Redshift Utilities repository that you can use to gain some insight into how long the process took and what it did. None of the system tables for vacuuming keep any historical information which would be nice for tracking growing process times, but you can see them for a week in STL_QUERY which gets purged to a history of 7 days. I recommend creating a simple process to track the vacuum data:

First create the table:

create table vacuum_history sortkey (xid) as select * from v_get_vacuum_details where processing_seconds > 0;

Then set up a cron process to populate:

0 18 * * * psql -h myRScluster -U myUser -p5439 -c “INSERT INTO vacuum_history SELECT * FROM v_get_vacuum_details WHERE xid > (SELECT MAX(xid) FROM vacuum_history) where processing_seconds > 0;” &> /var/log/vacuum_history.log

Once you start to see tables taking an inordinate amount of time to vacuum, some additional intervention may be appropriate. Our team recently ran into a sizable table (3 billion records) that had been taking 3 hours to vacuum daily. Some issue occurred where the table needed a partial reload of 2 billion rows. Once that finished, we ran a vacuum which kept going all afternoon. Checking SVV_VACUUM_PROGRESS we could see that it would take almost 30 hours to complete. Note that restarting a stopped vacuum does not mean the process will pick up where it left off. Since this would have impacted the daily load performance, we killed the vacuum with “cancel <pid>” using the pid pulled from

select pid, text from SVV_QUERY_INFLIGHT where text like ‘%Vacuum%’

We then ran a deep copy (created a new version of the table and ran a SELECT INTO) which took about 5 hours. The load into an empty table triggers the correct sorting, so a subsequent vacuum took only a few minutes to complete.

Just a note on killing long running vacuums: it sometimes doesn’t work especially once it’s in the initialize merge phase. We’ve found that continually issuing the cancel command while it’s in the sort phase is effective, but the point it to be wary of vacuuming large tables for their first time. Vacuums on large, unsorted tables write temporary data to disk, so there is also the potential to run out of disk and freeze the cluster, so be sure to always check that up to 3x the table size of disk space is available.

There are a few simple strategies to prevent long running vacuums:

  • Load your data in SORTKEY order: The incoming data doesn’t have to be pre-ordered, just greater than existing data.
  • Vacuum often: A table with a small unsorted region vacuums faster than one with a large unsorted region.
  • If tables become too large to vacuum within a maintenance window, consider breaking them apart: We often see multi-billion record tables where the only data being queried is from the last month or two.
  • Deep copies can be a faster solution than vacuums.
Categories: DBA Blogs

Considering Total Cost of Incident in the Age of “What if?”

Tue, 2016-04-05 12:03


I’m currently in the process of moving houses and planning a wedding – busy, exciting, happy times. There are lots of positives – more yard space, more square footage, time with friends and family, cake…the list goes on. However, mixed in with these big, happy life changes is an underlying anxiety that something could go wrong. There’s always a chance that accidents, inclement weather or illness could ruin the party for everyone. It’s not pleasant to consider and I’d honestly rather sweep it all under the rug and go on with the merry-making (mmm cake!). But the voice at the back of my head won’t stop whispering what if?


It’s this voice that’s prompted me to take steps towards protecting my investments with liability insurance. Now, I’m not keen on laying out funds where I don’t have to, and I’m always the first to try and save a buck (coupons are my best friends!), but where any large investment is being made, it only makes sense for me to part with some of my hard coupon-saved dollars to protect that. I may never need it, true… but what if?


At Pythian, we spend a lot of our time exploring the world of what if? What if there’s an outage? What if there’s a breach? What will we lose? What will it cost? These are uncomfortable questions for a lot of organizations and especially uncomfortable when things are going well. If you haven’t had an outage or a data breach yet, it is easier to assume that this won’t happen. What if we invest in protection and that spend is wasted because nothing ever goes wrong? Naïve? Maybe. Comfortable? Oh yeah.


In a post-Snowden, post-Target, post-Home Depot world, it’s more important than ever for organizations to move out of the comfortable and really consider what the total impact of any one incident could be on their environment and take steps to insure themselves against it. Let’s spend some time being uncomfortable, considering more than just the Total Cost of Operations and think instead about the Total Cost of Incident.

Categories: DBA Blogs

Mind your rdbms/audit on 12c

Tue, 2016-04-05 10:17

Recently we’ve ran into an interesting question from one of our clients. They were seeing messages of the following form in syslog:

"Aug 11 11:56:02 ***** kernel: EXT3-fs warning (device ******): ext3_dx_add_entry: Directory index full!"

I haven’t encountered this before, and did a bit of research. My initial suspicion ended up being correct, and it was due to too many files being created, somewhere in that file system. I had a look around, and eventually checked out the ORACLE_HOME of the ASM / Grid Infrastructure software, which is running version on that host.

I snooped around using du -sh to check which directories or sub-directories might be the culprit, and the disk usage utility came to a halt after the “racg” directory. Next in line would be “rdbms”. The bulb lit up somewhat brighter now. Entering the rdbms/audit directory, I issued the common command you would if you wanted to look at a directories contents: “ls”.

Five minutes later, there was still no output on my screen. Okay, we found the troublemaker. So we’re now being faced with a directory that has potentially millions of files in it. Certainly we all are aware that “rm” isn’t really able to cope with a situation like this. It would probably run for a couple minutes until it’s done parsing the directory index, and then yell “argument list too long” at us. Alternatively, we could use find, combined with -exec (bad idea), -delete, or even pipe into rm using xargs. Looking around a bit on the good ol’ friend google, I came across this very interesting blog post by Sarath Pillai.

I took his PERL one-liner, adjusted it a wee bit since I was curious how many files we actually got in there and ran it on a sandbox system with a directory with 88’000 files in it:

perl -e 'my $i=0;for(<*>){$i++;((stat)[9]<(unlink))} print "Files deleted: $i\n"'

It completed in 4.5 seconds. That’s pretty good. In Sarath’s tests he was able to delete half a million files in roughly a minute. Fair enough.

After getting the OK from the client, we ran it on the big beast. It took 10 minutes.

Files deleted: 9129797

9.1 million files. Now here comes the interesting bit. This system has been actively using ASM since May 6th, 2015. That’s only 3 months. That translates to 3 million files per month. Is this really a desirable feature? Do we need to start running Hadoop just to be able to mine the data in there?

Looking at some of the files, it seems ASM is not only logging user interactions there, but also anything and everything done by any process that connects to ASM.

As I was writing this, I happened to take another peek at the directory.

[oracle@cc1v3 audit]$ ls -1 | wc -l

Remember those numbers from before? Three million a month? Double that.

I suspect this was due to the index being full, and Linux has now re-populated the index with the next batch. Until it ran full again.

A new syslog entry just created at the same time seems to confirm that theory:

Aug 12 00:09:11 ***** kernel: EXT3-fs warning (device ******): ext3_dx_add_entry: Directory index full!

After running the PERL one-liner again, we deleted another vast amount of files:

Files deleted: 9135386

It seems that the root cause is the added time stamp to the file names of the audit files that Oracle writes in 12.1. The file names are much more unique, which gives Oracle the opportunity to generate so many more of them. Where in previous versions, with an adequately sized file system you’d probably be okay for a year or more; on, on an active database (and our big beast is very active) you have to schedule a job to remove them, and ensure it runs frequently (think 18+ million files in 3 months to put “frequently” into perspective).

Categories: DBA Blogs