Skip navigation.

DBA Blogs

Log Buffer #470: A Carnival of the Vanities for DBAs

Pythian Group - Wed, 2016-04-20 11:39

This Log Buffer Edition rounds up blog posts from Oracle, SQL Server and MySQL.

Oracle:

jq is a unix utility that can parse json files and pull out individual elements – think of it as sed/awk for json files.

Some thoughts about analyzing performance problems.

Microsites: Add a Map Component and Publish your Microsite

New Installation Cookbook: Oracle Linux 6.7 with Oracle 11.2.0.4 RAC

Are you worried about what to do now that Discoverer is almost out of support

Automatic Big Table Caching in RAC

 

SQL Server:

sp_msforeachdb: Improving on an Undocumented Stored Procedure

The Practical Problems of Determining Equality and Equivalence in SQL

TEMPDB Enhancements in SQL Server 2016

Performance Surprises and Assumptions : DATEADD()

Knee-Jerk Performance Tuning : Incorrect Use of Temporary Tables

 

MySQL:

Virtual Hosting with vsftpd + TLS encryption and MySQL on Ubuntu 15.10

MySQL 5.7.12 – Part 4: A new MySQL Command Line Shell

Database Firewall Filter in MaxScale 1.4.1

Orchestrator-agent: How to recover a MySQL database

Rosetta Stone: MySQL, Pig and Spark (Basics)

Categories: DBA Blogs

Partition Storage -- 2 : New Rows Inserted in 12.1 Partitioned Table

Hemant K Chitale - Wed, 2016-04-20 09:41
Following up from the previous post,  given the current sizes of the Partitions (The underlying Tablespace is AUTOALLOCATE)  :

SQL> col segment_name format a30
SQL> col partition_name format a12
SQL> l
1 select segment_name, partition_name, segment_type, bytes/1024
2 from user_segments
3 where segment_name like 'MY_PART_%'
4* order by 1,2
SQL> /

SEGMENT_NAME PARTITION_NA SEGMENT_TYPE BYTES/1024
------------------------------ ------------ ------------------ ----------
MY_PART_TBL P_100 TABLE PARTITION 8192
MY_PART_TBL P_200 TABLE PARTITION 8192
MY_PART_TBL P_300 TABLE PARTITION 8192
MY_PART_TBL P_400 TABLE PARTITION 8192
MY_PART_TBL_NDX P_100 INDEX PARTITION 64
MY_PART_TBL_NDX P_200 INDEX PARTITION 64
MY_PART_TBL_NDX P_300 INDEX PARTITION 64
MY_PART_TBL_NDX P_400 INDEX PARTITION 64

8 rows selected.

SQL>


I shall now insert rows so that a Partition has to grow beyond the first extent.

SQL> insert into my_part_tbl                      
2 select 25, 'New Row'
3 from dual
4 connect by level < 100001;

100000 rows created.

SQL> insert into my_part_tbl
2 select 45, 'New Row'
3 from dual
4 connect by level < 500001;

500000 rows created.

SQL> /

500000 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> select segment_name, partition_name, segment_type, bytes/1024
2 from user_segments
3 where segment_name like 'MY_PART_%'
4 order by 1,2;

SEGMENT_NAME PARTITION_NA SEGMENT_TYPE BYTES/1024
------------------------------ ------------ ------------------ ----------
MY_PART_TBL P_100 TABLE PARTITION 24576
MY_PART_TBL P_200 TABLE PARTITION 8192
MY_PART_TBL P_300 TABLE PARTITION 8192
MY_PART_TBL P_400 TABLE PARTITION 8192
MY_PART_TBL_NDX P_100 INDEX PARTITION 28672
MY_PART_TBL_NDX P_200 INDEX PARTITION 64
MY_PART_TBL_NDX P_300 INDEX PARTITION 64
MY_PART_TBL_NDX P_400 INDEX PARTITION 64

8 rows selected.

SQL>


So, now, the first Partition P_100 has grown to exceed 24MB. Let's check the Extent Sizes.

SQL> select extent_id, blocks, blocks*8192/1024 Size_KB
2 from user_extents
3 where segment_name = 'MY_PART_TBL'
4 and segment_type = 'TABLE PARTITION'
5 and partition_name = 'P_100'
6 order by 1;

EXTENT_ID BLOCKS SIZE_KB
---------- ---------- ----------
0 1024 8192
1 1024 8192
2 1024 8192

SQL>
SQL> l
1 select extent_id, blocks, blocks*8192/1024 Size_KB
2 from user_extents
3 where segment_name = 'MY_PART_TBL_NDX'
4 and segment_type = 'INDEX PARTITION'
5 and partition_name = 'P_100'
6* order by 1
SQL> /

EXTENT_ID BLOCKS SIZE_KB
---------- ---------- ----------
0 8 64
1 8 64
2 8 64
3 8 64
4 8 64
5 8 64
6 8 64
7 8 64
8 8 64
9 8 64
10 8 64
11 8 64
12 8 64
13 8 64
14 8 64
15 8 64
16 128 1024
17 128 1024
18 128 1024
19 128 1024
20 128 1024
21 128 1024
22 128 1024
23 128 1024
24 128 1024
25 128 1024
26 128 1024
27 128 1024
28 128 1024
29 128 1024
30 128 1024
31 128 1024
32 128 1024
33 128 1024
34 128 1024
35 128 1024
36 128 1024
37 128 1024
38 128 1024
39 128 1024
40 128 1024
41 128 1024
42 128 1024

43 rows selected.

SQL>


So, the Table Partition grew as 8MB Extents but the Index Partition used the normal AutoAllocate routine of 64KB Extents for the first 1MB (16 Extents) followed by 1MB Extents.

What might have been the expected size of the Table Partition ?   ** see the UPDATE at the end of this post.

SQL> exec dbms_stats.gather_table_stats('','MY_PART_TBL',granularity=>'ALL');

PL/SQL procedure successfully completed.

SQL> select partition_name, trunc(1.2*avg_row_len*num_rows/1024) Expected_KB
2 from user_tab_partitions
3 where table_name = 'MY_PART_TBL'
4 order by 1
5 /

PARTITION_NA EXPECTED_KB
------------ -----------
P_100 14179
P_200 0
P_300 0
P_400 0
P_MAX 0

SQL>


I use a 20% factor to account for PCTFREE and block/row overheads.  **  see the UPDATE at the end of  this post.

Next, what if I use Parallel Insert ?

SQL> alter table my_part_tbl parallel 4;

Table altered.

SQL> alter session enable parallel dml ;

Session altered.

SQL> insert /*+ PARALLEL (MANUAL) */
2 into my_part_tbl
3 select 125, 'New Row'
4 from dual
5 connect by level < 100001;

100000 rows created.

SQL> commit;

Commit complete.

SQL> insert /*+ PARALLEL (MANUAL) */
2 into my_part_tbl
3 select 125, 'New Row'
4 from dual
5 connect by level < 500001;

500000 rows created.

SQL> commit;

Commit complete.

SQL> insert /*+ PARALLEL (MANUAL) */
2 into my_part_tbl
3 select 125, 'New Row'
4 from dual
5 connect by level < 500001;

500000 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> l
1 select segment_name, partition_name, segment_type, extents, bytes/1024
2 from user_segments
3 where segment_name like 'MY_PART_%'
4* order by 1,2
SQL> /

SEGMENT_NAME PARTITION_NA SEGMENT_TYPE EXTENTS BYTES/1024
------------------------- ------------ ------------------ ---------- ----------
MY_PART_TBL P_100 TABLE PARTITION 3 24576
MY_PART_TBL P_200 TABLE PARTITION 4 32768
MY_PART_TBL P_300 TABLE PARTITION 1 8192
MY_PART_TBL P_400 TABLE PARTITION 1 8192
MY_PART_TBL_NDX P_100 INDEX PARTITION 43 28672
MY_PART_TBL_NDX P_200 INDEX PARTITION 48 33792
MY_PART_TBL_NDX P_300 INDEX PARTITION 1 64
MY_PART_TBL_NDX P_400 INDEX PARTITION 1 64

8 rows selected.

SQL>


Note the size of P_200.  Parallel Insert actually allocated more extents because each PX server process created at least 1 extent for the rows it inserted (Note : For a smaller row-count it is not necessary that each PX server would have inserted rows, some may not have). So, Parallel Insert can result in more allocated space.

** UPDATE :  What about the "expected size" computation based on AVG_ROW_LEN x NUM_ROWS ?

SQL> select partition_name, avg_row_len, num_rows, blocks, trunc(1.2*avg_row_len*num_rows/8192) Expected_Blocks
2 from user_tab_partitions
3 where table_name = 'MY_PART_TBL'
4 order by 1;

PARTITION_NA AVG_ROW_LEN NUM_ROWS BLOCKS EXPECTED_BLOCKS
------------ ----------- ---------- ---------- ---------------
P_100 11 1100001 3022 1772
P_200 12 1100001 3668 1933
P_300 26 1 1006 0
P_400 28 1 1006 0
P_MAX 0 0 0 0

SQL>


This shows how any"formula" based on AVG_ROW_LEN and NUM_ROWS can be inaccurate in forecasting the expected size (blocks used).  Also note that P_100 was populated with a Serial Insert while P_200 was populated with a Parallel (Degree=4) Insert.
.
.
.



Categories: DBA Blogs

No Arguments Here – Use xargs for File Management Performance

Pythian Group - Wed, 2016-04-20 09:30

 

Database Administrators and System Administrators have this in common: managing a large number of log files is just part of the job on Linux systems.

Tools such as logrotate significantly simplify the file management task for routinely created log files. Even so, there are still many ‘opportunities’ to exercise your command line fu to manage thousands or millions of files.  These may be files that need to be moved, removed or searched.

When the files span multiple directories the find command is often used. The following command for instance will find all log files of a certain age and size and remove them.

find . -name "*.log" -size +1M -exec rm  {} \;

 

For a few files this will work just fine, but what happens if the number of files to be processed is several thousands, or even millions?

The xargs Difference

Let’s first create 200k files to use for testing. These files will all be empty, there is no need for any content for these tests.

The script create.sh can be used to create the directories and empty files.

As it takes some time to create the files, we will not use the rm command here, but rather just the file command. The command will be timed as well.

#  time find . -type f -name file_\* -exec file {} \; >/dev/null

real    1m24.764s
user    0m4.624s
sys     0m12.581s

Perhaps 1 minute and 24 seconds seems to be a reasonable amount of time to process so many files.

It isn’t.

Let’s use a slightly different method to process these files, this time by adding xargs in a command pipe.

 time find . -type f -name file_\* | xargs file >/dev/null

real    0m0.860s
user    0m0.456s
sys     0m0.432s

Wait, what?!  0.8 seconds? Can that be correct?

Yes, it is correct. Using xargs with find can greatly reduce the resources needed to iterate through files.

How then, is is possible for the command that used xargs to complete so much faster than the command that did not use xargs?

When iterating through a list of files with the -exec  argument to the find command, a new shell is forked for each execution of find.

For a large number of files this requires a lot of resources.

For demonstration purposes I will be using the ‘file’ command rather than ‘rm’.

Could it be that the xargs method may have benefited from the caching effects of running the first find command?

Could be – let’s run find … -exec again and see if it benefits from caching.

# time find . -type f -name file_\* -exec file {} \; >/dev/null

real    1m25.722s
user    0m3.900s
sys     0m11.893s


Clearly any caching didn’t help find … -exec.

Why Is xargs Fast?

Why is the use of xargs so much faster than find? In short it is due to find starting a new process for each file it finds when the -exec option is used.

The command ‘find | xargs’ was wrapped in a shell script find-xargs.sh to facilitate the use of strace.

The find-xargs.sh script takes 2 arguments; the number of files to pipe to xargs and the number files that xargs should send to the file command for each invocation of file.

The number of files to process is controlled by piping the output of find to head.

The xargs –max-args argument is used to control how many arguments are sent to each invocation of find.

We can now use strace with the -c option; -c accumulates a count of all calls along with timing information.

Calling the script to run for the first 10000 files, with 1000 files sent to each invocation of find:

# strace -c -f  ./find-xargs.sh 10000 1000
MAX_FILES: 10000
MAX_ARGS: 1000
Process 11268 attached
Process 11269 attached
...
Process 11267 resumed
Process 11269 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.55    0.080017        5001        16         2 wait4
  0.35    0.000280           0     12372           newfstatat
  0.09    0.000074           0       208           getdents
  0.01    0.000006           0     10000           lstat
  0.00    0.000000           0       199           read
  0.00    0.000000           0       276         1 write
  0.00    0.000000           0       384        91 open
  0.00    0.000000           0       313         4 close
  0.00    0.000000           0        68        42 stat
  0.00    0.000000           0       189           fstat
  0.00    0.000000           0         5         1 lseek
  0.00    0.000000           0       209           mmap
  0.00    0.000000           0        71           mprotect
  0.00    0.000000           0        37           munmap
  0.00    0.000000           0        72           brk
  0.00    0.000000           0        41           rt_sigaction
  0.00    0.000000           0        80           rt_sigprocmask
  0.00    0.000000           0         2           rt_sigreturn
  0.00    0.000000           0        13        12 ioctl
  0.00    0.000000           0        77        77 access
  0.00    0.000000           0         2           pipe
  0.00    0.000000           0         6           dup2
  0.00    0.000000           0         1           getpid
  0.00    0.000000           0        14           clone
  0.00    0.000000           0        14           execve
  0.00    0.000000           0         2           uname
  0.00    0.000000           0         4         1 fcntl
  0.00    0.000000           0       206           fchdir
  0.00    0.000000           0         5           getrlimit
  0.00    0.000000           0         1           getuid
  0.00    0.000000           0         1           getgid
  0.00    0.000000           0         1           geteuid
  0.00    0.000000           0         1           getegid
  0.00    0.000000           0         1           getppid
  0.00    0.000000           0         1           getpgrp
  0.00    0.000000           0        14           arch_prctl
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.080377                 24910       232 total

The largest chunk of time was spent in the wait4 system call. These are waits on execve, of which there were 14.

Of the 14 calls to execve, there was 1 each for the use of bash (the script itself), find, head and xargs, leaving 10 calls to be consumed by file.

The following command can be used if you would like to try this yourself:

strace  -f -e trace=execve  ./find-xargs.sh 10000 1000  2>&1 | grep execve

What happens when the same type of test is run against find with the -exec argument?

There is no method (that I can find in the man page anyway) by which we can limit the number of files that are sent to the program specified in the -exec argument of find.

We can still learn what is going on, it is just necessary to wait 1.5 minutes for the command to complete.

# strace -c -f find . -type f -name file_\*  -exec file {} \; >/dev/null

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.80    4.101094          21    200000           wait4
  0.69    0.029305           0    200000           clone
  0.46    0.019278           0   2602351   1400007 open
  0.44    0.018833           0    600001           munmap
  0.31    0.013108           0   3200017           mmap
  0.30    0.012715           0   1401173           fstat
  0.16    0.006979           0   1200006   1200006 access
  0.15    0.006543           0   1202345           close
  0.15    0.006288           0   1000004    600003 stat
  0.13    0.005632           0   1000004           read
  0.12    0.004981           0    200000           lstat
  0.09    0.003704           0    600026           brk
  0.07    0.003016           0   1000009           mprotect
  0.07    0.002776           0    200001    200000 ioctl
  0.03    0.001079           0    201169           newfstatat
  0.02    0.000806           0      2347           getdents
  0.01    0.000600           0    200000           write
  0.00    0.000003           0    200001           arch_prctl
  0.00    0.000002           0    202341           fchdir
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0    400001    200000 execve
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         1           fcntl
  0.00    0.000000           0         2           getrlimit
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    4.236742              15811808   3600017 total

 

You may have noticed there are twice as many calls to execve than there were files to process.

This is due to something referenced in the comments of find-xarg.sh. Unless a full path name is specified when running a command, the PATH variable is searched for that command. If the command is not found by the first invocation of execve, then another attempt is made the next directory in PATH.

The following example shows the difference between using the command name only, and then using the fully pathed name of the file command.

# strace -e trace=execve -f find -maxdepth 1 -type f -name \*.sh  -exec file {} \;  2>&1 | grep execve
execve("/usr/bin/find", ["find", "-maxdepth", "1", "-type", "f", "-name", "*.sh", "-exec", "file", "{}", ";"], [/* 83 vars */]) = 0
[pid  9267] execve("/usr/local/bin/file", ["file", "./find-xargs.sh"], [/* 83 vars */]) = -1 ENOENT (No such file or directory)
[pid  9267] execve("/usr/bin/file", ["file", "./find-xargs.sh"], [/* 83 vars */]) = 0
[pid  9268] execve("/usr/local/bin/file", ["file", "./create.sh"], [/* 83 vars */]) = -1 ENOENT (No such file or directory)
[pid  9268] execve("/usr/bin/file", ["file", "./create.sh"], [/* 83 vars */]) = 0
[pid  9269] execve("/usr/local/bin/file", ["file", "./distribution.sh"], [/* 83 vars */]) = -1 ENOENT (No such file or directory)
[pid  9269] execve("/usr/bin/file", ["file", "./distribution.sh"], [/* 83 vars */]) = 0

# strace -e trace=execve -f find -maxdepth 1 -type f -name \*.sh  -exec /usr/bin/file {} \;  2>&1 | grep execve
execve("/usr/bin/find", ["find", "-maxdepth", "1", "-type", "f", "-name", "*.sh", "-exec", "/usr/bin/file", "{}", ";"], [/* 83 vars */]) = 0
[pid  9273] execve("/usr/bin/file", ["/usr/bin/file", "./find-xargs.sh"], [/* 83 vars */]) = 0
[pid  9274] execve("/usr/bin/file", ["/usr/bin/file", "./create.sh"], [/* 83 vars */]) = 0
[pid  9275] execve("/usr/bin/file", ["/usr/bin/file", "./distribution.sh"], [/* 83 vars */]) = 0
Too Much Space

Regardless of how bad a practice it may be, there will be times that file and directory names may contain space characters. Literal spaces, newlines and tabs can all play havoc with file name processing;  xargs has you covered.

Two files are created to demonstrate:

 

# touch 'this filename has spaces' this-filename-has-no-spaces

# ls -l
total 0
-rw-r--r-- 1 jkstill dba 0 Apr 15 09:28 this filename has spaces
-rw-r--r-- 1 jkstill dba 0 Apr 15 09:28 this-filename-has-no-spaces

What happens when the output of find it piped to xargs?

 

 find . -type f | xargs file
./this-filename-has-no-spaces: empty
./this:                        ERROR: cannot open `./this' (No such file or directory)
filename:                      ERROR: cannot open `filename' (No such file or directory)
has:                           ERROR: cannot open `has' (No such file or directory)
spaces:                        ERROR: cannot open `spaces' (No such file or directory)

The spaces in one of the filenames causes xargs to treat each word in the filename as a separate file.

Because of this it is a good idea to use the -print0 and -0 args as seen in the following example. These arguments change the output terminator of find to the null character, as well as changing the input terminator of xargs to the null character to deal with space characters in file and directory names.

 

 find . -type f -print0  | xargs -0 file
./this-filename-has-no-spaces: empty
./this filename has spaces:    empty

There is quite a bit more to xargs than this, I would encourage you to read the man page and experiment with the options to better learn how to make use of it.

Hope For find

For many versions of GNU find there is an easy modification that can be made to the command line that will cause the -exec option to emulate the method xargs uses pass input to a command.

Simply by changing -exec command {} \; to  -exec command {} +, the find command will execute much faster than previously.

Here the find command has matched the performance of xargs when processing 200k files:

 

# time find . -type f -name file_\*  -exec file {} +  | wc
 200000  400000 8069198

real    0m0.801s
user    0m0.436s
sys     0m0.404s

This may mean a quick and simple change to maintenance scripts can yield a very large increase in performance.

Does this mean there is no longer a need for xargs?  Not really, as xargs offers levels of control over the input to piped commands that simply are not available in the find command.

If you’ve never used xargs, you should consider doing so, as it can reduce the resource usages on your systems and decrease the runtime for maintenance tasks.

Categories: DBA Blogs

Jonathan Lewis

Bobby Durrett's DBA Blog - Tue, 2016-04-19 17:09

I am finally getting around to finishing my four-part blog series on people who have had the most influence on my Oracle performance tuning work. The previous three people were Craig ShallahamerDon Burleson, and Cary Millsap. The last person is Jonathan Lewis. These four people, listed and blogged about in chronological order, had the most influence on my understanding of how to do Oracle database performance tuning. There are many other great people out there and I am sure that other DBAs would produce their own, different, list of people who influenced them. But this list reflects my journey through my Oracle database career and the issues that I ran into and the experiences that I had. I ran into Jonathan Lewis’ work only after years of struggling with query tuning and getting advice from others. I ran into his material right around the time that I was beginning to learn about how the Oracle optimizer worked and some of its limits. Jonathan was a critical next step in my understanding of how Oracle’s optimizer worked and why it sometimes failed to pick the most efficient way to run a query.

Jonathan has produced many helpful tuning resources including his blog, his participation in online forums, and his talks at user group conferences, but the first and most profound way he taught me about Oracle performance tuning was through his query tuning book Cost-Based Oracle Fundamentals. It’s $30 on Amazon and that is an incredibly small amount of money to pay compared to the value of the material inside the book. I had spent many hours over several years trying to understand why the Oracle optimizer some times choses the wrong way to run a query. In many cases the fast way to run something was clear to me and the optimizer’s choices left me stumped. The book helped me better understand how the Oracle optimizer chooses what it thinks is the best execution plan. Jonathan’s book describes the different parts of a plan – join types, access methods, etc. – and how the optimizer assigns a cost to the different pieces of a plan. The optimizer chooses the plan with the least cost, but if some mistake causes the optimizer to calculate an unrealistic cost then it might choose a poor plan. Understanding why the optimizer would choose a slow plan helped me understand how to resolve performance issues or prevent them from happening, a very valuable skill.

There is a lot more I could say about what I got from Jonathan Lewis’ book including just observing how he operated. Jonathan filled his book with examples which show concepts that he was teaching. I think that I have emulated the kind of building of test scripts that you see throughout his book and on his blog and community forums. I think I have emulated not only Jonathan’s approach but the approaches of all four of the people who I have spotlighted in this series. Each have provided me with profoundly helpful technical information that has helped me in my career. But they have also provided me with a pattern of what an Oracle performance tuning practitioner looks like. What kind of things do they do? To this point in my career I have found the Oracle performance tuning part of my job to be the most challenging and interesting and probably the most valuable to my employers. Jonathan Lewis and the three others in this four-part series have been instrumental in propelling me along this path and I am very appreciative.

Bobby

Categories: DBA Blogs

USING SELECT 'X' in query/subqueries.

Learn DB Concepts with me... - Tue, 2016-04-19 13:26
 
USING SELECT 'X' in query/sub-queries.


--------------------------------------------------------
--  DDL for Table TAB1
--------------------------------------------------------

  CREATE TABLE "ATEST"."TAB1"
   (    "ID" NUMBER,
    "NAME" VARCHAR2(20 BYTE)
   ) ;

Insert into ATEST.TAB1 (ID,NAME) values (1,'AAA');
Insert into ATEST.TAB1 (ID,NAME) values (2,'BBB');
Insert into ATEST.TAB1 (ID,NAME) values (3,'EEE');
Insert into ATEST.TAB1 (ID,NAME) values (4,'FFF');


--------------------------------------------------------
--  DDL for Table TAB2
--------------------------------------------------------

  CREATE TABLE "ATEST"."TAB2"
   (    "ID" NUMBER,
    "NAME" VARCHAR2(20 BYTE)
   ) ;

Insert into ATEST.TAB2 (ID,NAME) values (1,'CCC');
Insert into ATEST.TAB2 (ID,NAME) values (2,'DDD');


Get records that exits in TAB1 and not in TAB2 using select 'X' :


select * from TAB1 f where not exists (select 'X' from TAB2 where id=f.id);
ID    NAME
--    ---- 
4    FFF
3    EEE

IN the above query we get output of all the records from TAB1 that doesnt match with TAB2 ID's.
Hence we do not get the records with ID's 1 & 2 as they only exits in TAB1.
This is just like using "select * from TAB1 f where not exists (select ID from TAB2 where id=f.id);"


Get records that exits in TAB1 and in TAB2 using select 'X' :


select * from TAB1 f where exists (select 'X' from TAB2 where id=f.id);

ID    NAME
--    ---- 
1    AAA
2    BBB

IN the above query we get output of all the records from TAB1 that exist with same ID in TAB2 .
Hence we get only records with ID 1 & 2 as they exists in both TABLES.
This is just like using "select * from TAB1 f where exists (select ID from TAB2 where id=f.id);"
Categories: DBA Blogs

Data is Everything, and Everything is Data

Pythian Group - Tue, 2016-04-19 09:33
Exploring the phenomenon of “datafication”

In the year 2000, only a quarter of the world’s stored information was digital; the rest was on paper, film, and other analog media. Today, less than two percent of all stored information is nondigital. (1)

This is largely the result of “datafication”, a process that turns all aspects of life—preferences, opinions, telephone calls and sensor-driven information—into data.

Datafication is the driving force behind Big Data. It’s also causing a threefold shift in how we look for meaning in the information available to us: away from traditional sampling approaches, toward greater tolerance of messy, unstructured data, and into the search for correlations rather than absolute, singular causes to explain trends and events. These changes are already having major impacts in every area of our lives—from scientific research to business and finance, to healthcare, education and transportation.

Watch this video of Pythian President and CEO Paul Vallée, as he talks about datification and generating revenue.

From sampling to knowing

Representative sampling is based on the idea that, within a certain margin of error, we can make inferences about a total population from a small, randomized subset. This works well for simple questions like, “Which of our customers generate the most revenue?” but lacks the detail to effectively answer queries like, “Which customers are most profitable?” or, “Which customers are considering to leave us for another vendor?”

Inexpensive computer memory, powerful processors and sophisticated algorithms now allow us to analyze vast amounts of data rather than small samples. Using Big Data in this way has the considerable advantage of predictive capability—it can identify patterns and trends that aren’t detectable in a small sample, giving an unprecedented view of future behavior.

From clean to messy

What’s new about Big Data isn’t just that there’s lots of it. Because it comes from many different sources in many different formats, it’s not tidy like traditional datasets. Tolerating some inaccuracy may require data analysts to shift their outlooks a little, but when you’re trying to answer big, complex questions, the gain in data scope is a good trade-off against using smaller amounts of very exact data. Here’s an example.

In 2009, Google showed it’s possible to predict locations of seasonal outbreaks of the flu using nothing more than archived records of Google searches. The sheer size of the data set (think a billion searches a day in the U.S. alone) more than compensated for its messiness. After running nearly half a billion calculations against the data, Google identified 45 terms—words such as “headache” and “runny nose”—that had a strong correlation with the CDC’s data on flu outbreaks.

From cause to correlation

The Google example points to a third change brought about by datafication and Big Data: abandoning the search for certainty. Instead of looking for causes, innovative data users are looking for correlations. For example, automotive and aviation engineers are collecting and analyzing massive quantities of information on engines that have failed, looking for patterns that will help them predict when other engines might be at risk of failing in the future. They’re not seeking a single cause for a single event; they’re mapping correlations between huge numbers of events to recognize patterns that can be put to practical, preventative use.

The correlation approach has been used to spot infections in premature babies before overt symptoms appear and to predict everything from manhole cover failures to consumer purchasing habits.

Big Data insights require Big Thinking

Harnessing the powerful, often unpredictable, insights available from Big Data requires three things: as complete a dataset as possible, people with the skills required to collect, manage and analyze that data, and people who know how to ask unexpected, even visionary questions. It’s not just a matter of the right technologies—it’s about a fundamental shift in how we relate to data and what can be done with it.
Sources
1. https://www.foreignaffairs.com/articles/2013-04-03/rise-big-data

Categories: DBA Blogs

Partition Storage -- 1 : Default Partition Sizes in 12c

Hemant K Chitale - Tue, 2016-04-19 09:17
11g 11.2.0.2 introduced a change whereby the default Initial Extent of a Table Partition was 8MB.  However, this did not apply to Index Partitions which could still start with 64KB extents in an AutoAllocate Tablespace.

12cR1 now introduces a parameter to enable large Initial Extent for Index Partitions as well.

SQL> connect / as sysdba                          
Connected.
SQL> select banner from v$version where banner like 'Oracle Database%';

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

SQL> show parameter deferred

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
deferred_segment_creation boolean TRUE
SQL>
SQL> connect hemant/hemant
Connected.
SQL> create table my_part_tbl (id_column number(6), data_column varchar2(100))
2 partition by range(id_column)
3 (partition p_100 values less than (101),
4 partition p_200 values less than (201),
5 partition p_300 values less than (301),
6 partition p_400 values less than (401),
7 partition p_max values less than (maxvalue))
8 /

Table created.

SQL> create index my_part_tbl_ndx on my_part_tbl(id_column) local;

Index created.

SQL>
SQL> select segment_name, partition_name, segment_type, bytes/1024
2 from user_segments
3 where segment_name like 'MY_PART_%'
4 order by 1,2;

no rows selected

SQL> 
SQL> insert into my_part_tbl values (51,'Fifty One');

1 row created.

SQL> insert into my_part_tbl values (151,'One Hundred Fifty One');

1 row created.

SQL> insert into my_part_tbl values (251, 'Two Hundred Fifty One');

1 row created.

SQL> select segment_name, partition_name, segment_type, bytes/1024
2 from user_segments
3 where segment_name like 'MY_PART_%'
4 order by 1,2;

SEGMENT_NAME PARTITION_NA SEGMENT_TYPE BYTES/1024
------------------------------ ------------ ------------------ ----------
MY_PART_TBL P_100 TABLE PARTITION 8192
MY_PART_TBL P_200 TABLE PARTITION 8192
MY_PART_TBL P_300 TABLE PARTITION 8192
MY_PART_TBL_NDX P_100 INDEX PARTITION 64
MY_PART_TBL_NDX P_200 INDEX PARTITION 64
MY_PART_TBL_NDX P_300 INDEX PARTITION 64

6 rows selected.

SQL>


I can enable large Index Partition Extent with a parameter (which can be set with ALTER SESSION)


SQL> alter session set "_index_partition_large_extents"=TRUE;

Session altered.

SQL> insert into my_part_tbl values (351,'Three Hundred Fifty One');

1 row created.

SQL> select segment_name, partition_name, segment_type, bytes/1024
2 from user_segments
3 where segment_name like 'MY_PART_%'
4 order by 1,2;

SEGMENT_NAME PARTITION_NA SEGMENT_TYPE BYTES/1024
------------------------------ ------------ ------------------ ----------
MY_PART_TBL P_100 TABLE PARTITION 8192
MY_PART_TBL P_200 TABLE PARTITION 8192
MY_PART_TBL P_300 TABLE PARTITION 8192
MY_PART_TBL P_400 TABLE PARTITION 8192
MY_PART_TBL_NDX P_100 INDEX PARTITION 64
MY_PART_TBL_NDX P_200 INDEX PARTITION 64
MY_PART_TBL_NDX P_300 INDEX PARTITION 64
MY_PART_TBL_NDX P_400 INDEX PARTITION 8192

8 rows selected.

SQL>


However, I can rebuild the Index Partition Extent as well :

SQL> alter index my_part_tbl_ndx rebuild partition p_400 storage (initial 64K);

Index altered.

SQL> select segment_name, partition_name, segment_type, bytes/1024
2 from user_segments
3 where segment_name like 'MY_PART_%'
4 order by 1,2;

SEGMENT_NAME PARTITION_NA SEGMENT_TYPE BYTES/1024
------------------------------ ------------ ------------------ ----------
MY_PART_TBL P_100 TABLE PARTITION 8192
MY_PART_TBL P_200 TABLE PARTITION 8192
MY_PART_TBL P_300 TABLE PARTITION 8192
MY_PART_TBL P_400 TABLE PARTITION 8192
MY_PART_TBL_NDX P_100 INDEX PARTITION 64
MY_PART_TBL_NDX P_200 INDEX PARTITION 64
MY_PART_TBL_NDX P_300 INDEX PARTITION 64
MY_PART_TBL_NDX P_400 INDEX PARTITION 64

8 rows selected.

SQL>


In the next post, we'll see more Extents for the Partitions.
,
,
,
Categories: DBA Blogs

SQL Server Dates, Dates and More Dates

Pythian Group - Tue, 2016-04-19 08:44

 

Working with SQL Server date functions can be frustrating. This purpose of this blog is to share some date statements I use regularly, especially when doing business Intelligence and DataWarehouse solutions.

I hope you find them useful and if you have any questions or any more useful statements in relation to dates in SQL Server, please feel free to leave them in the comments below

----Today
SELECT GETDATE() 'Today'
----Yesterday
SELECT DATEADD(d , -1 , GETDATE()) 'Yesterday'
----First Day of Current Week
SELECT DATEADD(wk , DATEDIFF(wk , 0 , GETDATE()) , 0) 'First Day of Current Week'
----Last Day of Current Week
SELECT DATEADD(wk , DATEDIFF(wk , 0 , GETDATE()) , 6) 'Last Day of Current Week'
----First Day of Last Week
SELECT DATEADD(wk , DATEDIFF(wk , 7 , GETDATE()) , 0) 'First Day of Last Week'
----Last Day of Last Week
SELECT DATEADD(wk , DATEDIFF(wk , 7 , GETDATE()) , 6) 'Last Day of Last Week'
----First Day of Current Month
SELECT DATEADD(mm , DATEDIFF(mm , 0 , GETDATE()) , 0) 'First Day of Current Month'
----Last Day of Current Month
SELECT DATEADD(ms , -3 , DATEADD(mm , 0 , DATEADD(mm , DATEDIFF(mm , 0 , GETDATE()) + 1 , 0))) 'Last Day of Current Month'
----First Day of Last Month
SELECT DATEADD(mm , -1 , DATEADD(mm , DATEDIFF(mm , 0 , GETDATE()) , 0)) 'First Day of Last Month'
----Last Day of Last Month
SELECT DATEADD(ms , -3 , DATEADD(mm , 0 , DATEADD(mm , DATEDIFF(mm , 0 , GETDATE()) , 0))) 'Last Day of Last Month'
----First Day of Current Year
SELECT DATEADD(yy , DATEDIFF(yy , 0 , GETDATE()) , 0) 'First Day of Current Year'
----Last Day of Current Year
SELECT DATEADD(ms , -3 , DATEADD(yy , 0 , DATEADD(yy , DATEDIFF(yy , 0 , GETDATE()) + 1 , 0))) 'Last Day of Current Year'
----First Day of Last Year
SELECT DATEADD(yy , -1 , DATEADD(yy , DATEDIFF(yy , 0 , GETDATE()) , 0)) 'First Day of Last Year'
----Last Day of Last Year
SELECT DATEADD(ms , -3 , DATEADD(yy , 0 , DATEADD(yy , DATEDIFF(yy , 0 , GETDATE()) , 0))) 'Last Day of Last Year'
Categories: DBA Blogs

Internals of Querying the Concurrent Requests’ Queue – Revisited for R12.2

Pythian Group - Tue, 2016-04-19 08:01

Once upon a time I wrote about the Internal Workflow of an E-Business Suite Concurrent Manager Process. Many things have changed since that blog post, the most obvious change being the release of Oracle e-Business Suite R12.2. I decided to check if the way the concurrent manager queues were processed by concurrent manager processes were still the same. My main goal was to see if the manager processes still don’t attempt any way of coordination to distribute the requests among them.

This is how I did the testing:

  • I used the VM templates provided by Oracle to build my R12.2.4 test environment. By the way, I didn’t expect that the process of getting the environment up would be so simple! Downloading the media files from edelivery.oracle.com was the most time-consuming step, once done – it took me just 1 hour to un-compress everything, import the Virtual Assembly file and bring up the R12.2.4 environment on my laptop.
  • 3 Standard managers are defined by default
  • Sleep seconds were left as is = 30 seconds
  • Cache size was increased from 1 to 5.
  • Identified the 3 DB processes that belong to the Standard managers:
    select sid, serial# from v$session where module='e:FND:cp:STANDARD'
  • I enabled tracing with binds and waits for each of them like this:
    exec dbms_monitor.session_trace_enable(sid,serial#,true,true);
  • Once that was done I submitted one concurrent program – “Active users” and waited for it to complete.
  • I disabled the tracing and collected the trace files.
    exec dbms_monitor.session_trace_disable(sid,serial#);
  • Collected the trace files

I found 2 of the trace files to be very interesting. To make things more simple, the manager process “A” will be the one that executed the concurrent request, and process “B” will be the one that didn’t.

Before the “Active Users” Request Was Submitted

No other requests were running at the time I did the testing, so I clearly observed how both Managers A and B queried the FND_CONCURRENT_REQUESTS table BOTH of the trace files displayed the same method of how requests are picked up from the queue. Note, I’m showing only the lines relevant to display the main query only, and I have formatted the query text to make it more readable:

PARSING IN CURSOR #139643743645920 len=1149 dep=0 uid=100 oct=3 lid=100 tim=1460211399835915 hv=3722997734 ad='d275f750' sqlid='cd23u4zfyhvz6'
SELECT R.Rowid
FROM Fnd_Concurrent_Requests R
WHERE R.Hold_Flag                             = 'N'
AND R.Status_Code                             = 'I'
AND R.Requested_Start_Date                   &amp;amp;amp;amp;amp;lt;= Sysdate
AND (R.Node_Name1                            IS NULL
OR (R.Node_Name1                             IS NOT NULL
AND FND_DCP.target_node_mgr_chk(R.request_id) = 1))
AND (R.Edition_Name                          IS NULL
OR R.Edition_Name                            &amp;amp;amp;amp;amp;lt;= sys_context('userenv', 'current_edition_name'))
AND EXISTS
  (SELECT NULL
  FROM Fnd_Concurrent_Programs P
  WHERE P.Enabled_Flag         = 'Y'
  AND R.Program_Application_Id = P.Application_Id
  AND R.Concurrent_Program_Id  = P.Concurrent_Program_Id
  AND EXISTS
    (SELECT NULL
    FROM Fnd_Oracle_Userid O
    WHERE R.Oracle_Id = O.Oracle_Id
    AND EXISTS
      (SELECT NULL
      FROM Fnd_Conflicts_Domain C
      WHERE P.Run_Alone_Flag = C.RunAlone_Flag
      AND R.CD_Id            = C.CD_Id
      )
    )
  AND (P.Execution_Method_Code                          != 'S'
  OR (R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) IN ((0,98),(0,100),(0,31721),(0,31722),(0,31757)))
  )
AND ((R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) NOT IN ((510,40032),(510,40033),(510,42156),(510,42157),(530,43793),(530,43794),(535,42626),(535,42627),(535,42628)))
ORDER BY NVL(R.priority, 999999999),
  R.Priority_Request_ID,
  R.Request_ID
END OF STMT
EXEC #139643743645920:c=0,e=33,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211399835910
FETCH #139643743645920:c=0,e=546,p=0,cr=106,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211399836507
WAIT #139643743645920: nam='SQL*Net message to client' ela= 3 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211399836572

*** 2016-04-09 10:17:09.837
WAIT #139643743645920: nam='SQL*Net message from client' ela= 30000367 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211429836965
...
EXEC #139643743645920:c=0,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211429838767
FETCH #139643743645920:c=0,e=689,p=0,cr=106,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211429839587
WAIT #139643743645920: nam='SQL*Net message to client' ela= 4 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211429839652

*** 2016-04-09 10:17:39.840
WAIT #139643743645920: nam='SQL*Net message from client' ela= 30000325 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211459840003
...

It’s important to observe that:

  • All manager’s processes still compete for the same requests. If the query is executed at the same time, the same list of concurrent requests will be retrieved by all processes.
  • The constants literals used in lines 30-32 mean that the query for checking the queue is still built when the concurrent manager process starts up. These constants are mainly used to implement the specializations rules in the query.
  • Only rowid for the pending requests’ rows in FND_CONCURRENT_REQUESTS are fetched.
  • The sleep time is clearly visible on lines 41,42 and 48,49
After the “Active Users” Request Was Submitted – Starting the Concurrent Request

The manager process A was the first to pick up the submitted requests and it could be observed by the “r=1” (1 row fetched) in the FETCH call for the query we just reviewed:

FETCH #139643743645920:c=0,e=437,p=0,cr=113,cu=0,mis=0,r=1,dep=0,og=1,plh=3984653669,tim=1460211519844640

Immediately after this, the manager process A locked the row in FND_CONCURRENT_REQUESTS table, this way, the request got assigned to this process. Notice the similar where predicates used in this query, these are actually required to make sure that the request is still not picked up by another manager process. However the main thing here is the fact that the request row is accessed by the “rowid” retrieved earlier (row 45, the value of the bind variable “:reqname” is “AAAjnSAA/AAAyn1AAH” in this case). Locking of the row is done by the “FOR UPDATE OF R.status_code NoWait” clause on line 49:

PARSING IN CURSOR #139643743640368 len=4530 dep=0 uid=100 oct=3 lid=100 tim=1460211519864113 hv=4239777398 ad='cde86338' sqlid='6ya6bzgybbrmq'
SELECT R.Conc_Login_Id,
  R.Request_Id,
  ... excluded other 156 columns for brevity...
FROM fnd_concurrent_requests R,
  fnd_concurrent_programs P,
  fnd_application A,
  fnd_user U,
  fnd_oracle_userid O,
  fnd_conflicts_domain C,
  fnd_concurrent_queues Q,
  fnd_application A2,
  fnd_executables E,
  fnd_conc_request_arguments X
WHERE R.Status_code             = 'I'
AND (R.Edition_Name            IS NULL
OR R.Edition_Name              &amp;amp;amp;amp;amp;lt;= sys_context('userenv', 'current_edition_name'))
AND R.Request_ID                = X.Request_ID(+)
AND R.Program_Application_Id    = P.Application_Id(+)
AND R.Concurrent_Program_Id     = P.Concurrent_Program_Id(+)
AND R.Program_Application_Id    = A.Application_Id(+)
AND P.Executable_Application_Id = E.Application_Id(+)
AND P.Executable_Id             = E.Executable_Id(+)
AND P.Executable_Application_Id = A2.Application_Id(+)
AND R.Requested_By              = U.User_Id(+)
AND R.Cd_Id                     = C.Cd_Id(+)
AND R.Oracle_Id                 = O.Oracle_Id(+)
AND Q.Application_Id            = :q_applid
AND Q.Concurrent_Queue_Id       = :queue_id
AND (P.Enabled_Flag            IS NULL
OR P.Enabled_Flag               = 'Y')
AND R.Hold_Flag                 = 'N'
AND R.Requested_Start_Date     &amp;amp;amp;amp;amp;lt;= Sysdate
AND ( R.Enforce_Seriality_Flag  = 'N'
OR ( C.RunAlone_Flag            = P.Run_Alone_Flag
AND (P.Run_Alone_Flag           = 'N'
OR NOT EXISTS
  (SELECT NULL
  FROM Fnd_Concurrent_Requests Sr
  WHERE Sr.Status_Code         IN ('R', 'T')
  AND Sr.Enforce_Seriality_Flag = 'Y'
  AND Sr.CD_id                  = C.CD_Id
  ))))
AND Q.Running_Processes                                     &amp;amp;amp;amp;amp;lt;= Q.Max_Processes
AND R.Rowid                                                  = :reqname
AND ((P.Execution_Method_Code                               != 'S'
OR (R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID)       IN ((0,98),(0,100),(0,31721),(0,31722),(0,31757))))
AND ((R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) NOT IN ((510,40032),(510,40033),(510,42156),(510,42157),(530,43793),(530,43794),(535,42626),(535,42627),(535,42628))) 
FOR UPDATE OF R.status_code NoWait

The behavior of the manager process B was a little bit more interesting. It too managed to fetch the same rowid from FND_CONCURRENT_PROCESSES table belonging to the submitted “Active Users” processes. However, when it tried to lock the row in FND_CONCURRENT_REQUESTS (By using exactly the same query), this happened:

PARSING IN CURSOR #139690311998256 len=4530 dep=0 uid=100 oct=3 lid=100 tim=1460211519900924 hv=4239777398 ad='cde86338' sqlid='6ya6bzgybbrmq'
...
BINDS #139690311998256:
...
Bind#2
  oacdty=01 mxl=32(18) mxlc=00 mal=00 scl=00 pre=00
  oacflg=20 fl2=1000001 frm=01 csi=873 siz=0 off=64
  kxsbbbfp=7f0c2f713f20  bln=32  avl=18  flg=01
  value="AAAjnSAA/AAAyn1AAH"
EXEC #139690311998256:c=1000,e=1525,p=0,cr=25,cu=1,mis=0,r=0,dep=0,og=1,plh=4044729389,tim=1460211519902727
ERROR #139690311998256:err=54 tim=1460211519902750

The query failed with “ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired”.
This is how the access to pending concurrent requests is serialized to make sure only one of the manager processes can run it. And, I think, relying on the well-tuned and highly efficient locking mechanism of Oracle Database is a very very smart idea.

Conclusions
  • The coordination between manager processes is still not happening to distribute the requests, but the managers all query the queue the same way and then compete between themselves to lock the requests’ entries on the table 1st. The process that gets the lock also gets to execute the concurrent request.
  • The cache size variable couldn’t be observed in the trace files, but as far as I remember from my previous research the process would only fetch “cache size”-number of rowids using the 1st query in this post. This could be tested by submitting larger volume of requests simultaneously.
  • The “sleep seconds” kicks in only when the manager process didn’t fetch any rowids from the queue. After all the cached requests are attempted/executed by the manager process, the queue is checked again immediately without waiting for the “sleep seconds” (Not explained in detail in this post, but it’s revealed in the trace files)
  • The DMLs used to query the FND_CONCURRENT_REQUESTS and to lock the row are very very similar to Pre-R12.2 releases of e-Business Suite (Another sign that the process hasn’t changed, though one change that I see is the addition of where clause predicates for Checking the Editions).
Categories: DBA Blogs

Links for 2016-04-18 [del.icio.us]

Categories: DBA Blogs

Partner Webcast – Collaboration in a digital world with Oracle Documents Cloud

The modern digital workplace requires a new set of capabilities, one that leverages the best in consumerization, social engagement and business empowerment. The key to this transformation are the...

We share our skills to maximize your revenue!
Categories: DBA Blogs

More Effective Anti-Entropy Repair in Cassandra

Pythian Group - Mon, 2016-04-18 12:12
1. Introduction

Cassandra offers three different repair mechanisms to make sure data from different replicas are consistent: Hinted Hand-off, Read-Repair, and Anti-Entropy Repair.

The first two mechanisms are kind of “automatic” mechanisms that will be triggered internally within Cassandra, depending on the configuration parameter values that are set for them either in cassandra.yaml file (for Hinted Hand-off) or in table definition (for Read-Repair). The last mechanism (Anti-Entropy Repair), however, needs to be triggered with manual intervention, by running “nodetool repair” command (possibly with various options associated with it).

Despite the “manual” nature of Anti-Entropy repair, it is nevertheless necessary, because the first two repair mechanisms cannot guarantee fixing all data consistency scenarios for Cassandra. For example, 1) what if a node is down longer than “max_hint_window_in_ms” (which defaults to 3 hours)? and  2) For deletes, Anti-Entropy repair has to be executed before “gc_grace_seconds” (defaults to 10 days) in order to avoid tombstone resurrection.

At the same time, however, due to how Anti-Entropy repair is designed and implemented (Merkle-Tree building and massive data streaming across nodes), it is also a very expensive operation and can cause a big burden on all hardware resources (CPU, memory, hard drive, and network) within the cluster. In this case, how to run Anti-Entropy repair effectively becomes a constant topic within Cassandra community.

In this post, I’d like to explore and summarize some of the techniques that can help achieve a more effective Anti-Entropy repair for Cassandra.

2. What is the issue of an Anti-Entropy Repair

Before Cassandra version 2.2, a sequential full Anti-Entropy repair is the default behavior when “nodetool repair” command (with no specific options) is executed. When this happens,

1) The node that initiates the operation, called coordinator node, scans the partition ranges that it owns (either as primary or replica) one by on. For each partition range, it sends the request to each of the peer/replica nodes to build a Merkle tree.

2) The peer/replica node scans all SSTables and a major, or validation, compaction is triggered, which reads every row in the SSTables, generates a hash for it, and then adds the result to a Merkle tree.

3) Once the peer node finishes building the Merkle tree, it sends the result back to the coordinator. The coordinator node compares every Merkle tree with all other trees. If difference is detected, data is exchanged between differing nodes.

Looking at this highly summarized procedure, there are a few things that immediately caught my attention:

First, because building Merkletree requires hashing every row of all SSTables,  it is a very expensive operation, stressing CPU, memory, and disk I/O.

Second, when Merkletree difference is detected, network bandwidth can be overwhelmed to stream the large amount of data to remote nodes.

Lastly, although not obvious, it is a worse problem in my opinion, which is that this operation can cause computation repetition and therefore waste resources unnecessarily. Please see this post for an example of how this can happen.

Based on the high cost related with it, a default (pre-Cassandra 2.2) sequential full Anti-Entropy repair is, in practice, rarely considered to be a routine task to run in production environment. Actually, a common trick that many people do with Cassandra repair is simply touching all data and let read-repair do the rest of work.

However, there do have situations when an Anti-Entropy repair is required, for example, to recover from data loss. What can we do in these cases?

3. What can help to achieve a more effective Anti-Entropy repair?

Over the time, different options for “nodetool repair” command have been introduced in different Cassandra versions. These options represent different techniques that can help achieve more effective Anti-Entropy repair, which I’ll go through in more details in this section. Meanwhile, I will also try to clarify some of the confusion that new Cassandra users might have around the different nodetool repair command options (and related jargon’s) of different Cassandra versions.

3.1 Primary range repair

In Cassandra, each node manages several (token) ranges of data. One of them is the primary range which is the token range that is assigned to the node according to the token setup within the ring. Other ranges of data are actually replica of primary ranges from other nodes. Running “nodetool repair” with option “-pr” (or “–partition-range”)  on a node means that the node only repairs the data of the primary range, but not other ranges managed on this node. The default behavior of a repair (without this option) is to repair all ranges of data managed by a node.

When using this option, it avoids the cost of doing Merkle tree calculation on non-primary range data. It also helps reduce excessive data streaming across the network. One caveat of using this option is that since each node only repairs one range of data that is managed by a node, this option needs to run on ALL nodes in the ring in order to repair all the data.

This option is available in very early release (0.x) of Cassandra.

3.2 Sequential repair

“nodetool repair” option “-snapshot” (or “–with-snapshot“) means a sequential repair (also called snapshot repair) and is a feature introduced in Cassandra version 1.2 and made as default in Cassandra version 2.0. Actually, in DataStax document for Cassandra 2.0 and later, you won’t find this option. In Cassandra 2.0 and beyond, you can specify option “-par” (or “–parallel”) to tell Cassandra to run in parallel. What does this really mean?

As we mentioned in section 3.1, each node manages several different ranges of data, either as primary or replica. When “-par” option is used, it instructs Cassandra to run repair on all these ranges at the same, which means the expensive repair operation (especially the Merkletree building part) happens on multiple nodes concurrently. This could be problematic and may slow down the entire cluster.

But when using “-snapshot” option (or default in Cassandra 2.0 and beyond), for each range of data, a snapshot is first taken and the repair operation is executed on the snapshot sequentially. This means that at any time for a given replica set of data, only one replica is being repaired, allowing other replica to satisfy the application requests in a more performant way.

3.3 Incremental repair

Since Cassandra version 2.1, a new option “-ic” (or “–incremental “) is introduced for incremental repair. Starting from Cassandra version 2.2, this option becomes the default option and at the same time, option “-full” (or “–full”) is used to specify a full repair.

The option of incremental repair is a great feature that will help overcome all 3 issues as listed in Section 2. Instead of building a Merkle tree out of all SSTables (repaired or not), this option only builds the tree out of un-repaired SSTables. Therefore,

  • The size of the Merkle tree to be built is way smaller and therefore requires much less computing resources
  • When doing Merkle tree comparison, less data will be compared and potentially streamed over the network
  • Already repaired data don’t need to be computed and compared again, thus avoiding a lot of unnecessary repetition.

There are a few things that need to pay attention to when using this option, please check my previous post  Apache Cassandra 2.1 Incremental Repair for more details.

Please also be noted that when running incremental repair in Cassandra 2.1 with Leveled Compaction Strategy (LCS), it may fail with RuntimeException (see CASSANDRA-9935 for more detail).

3.4 Sub-range repair

So far in this post when talking about repairing a range of data, it means the entire range (either primary or non-primary), which is sometimes also called as endpoint range. Beginning with Cassandra version 1.1.11, “nodetool repair” has options of “-st” (or “–start-token”) and “-et” (or “–end-token”) to specify a particular sub-range of data to repair.

Conceptually, sub-range repair is much like primary range repair, except that each sub-range repair operation focuses even smaller subset of data. So in general sub-range repair shares much of the pros and cons as primary range repair. One key benefit of using sub-range repair is the freedom of specifying the repair granularity through command line, which gives the DevOps team much more flexibility regarding how to set up a repair schedule to best match the cluster’s workload status. It is also doable to consider parallel running of multiple sub-range repairs on different sets of data at the same time, either on the same node, or on different nodes. This option has actually been deemed as one of the advanced repair techniques, as per post: Advanced repair techniques

Despite the flexibility of this technique, it has to be emphasized that no matter how the repair schedule is set for sub-range repairs, all ranges of data in the ring has to be repaired at least once before gc_grace_seconds limit is reached.

3.4.1 How to calculate valid sub-range token values

When invalid sub-range values are provided to the “-st” or “-et” option of “nodetool repair” command, most likely the command will fail and throw errors about “invalid tokens”. So in order to make sub-range repair work effectively, we need a systematic method that can generate valid sub-range token values.

We know that in Cassandra, data spans multiple nodes using a token ring. Each node is responsible for one or more slices of that ring. Each slice is a token range that has the start and end point. By this understanding, a natural way to generate valid sub-range token values would be: 1) find out all token ranges associated with one node in the entire ring; 2) for each token range associated with the node, divide the range into smaller chunks of sub-ranges. The start and end point of these sub-ranges would be valid values to be fed into the “-st” or “-et” option of “nodetool repair” command. The actual method and granularity to divide a node’s token range into smaller sub-ranges is where the flexibility comes from and can be adjusted accordingly to best suit the needs of the system.

There are different ways to find token ranges that are associated with a Cassandra node. Some of them are summarized below:

  1. Run “nodetool ring” command. “nodetool status” command will also do for single-token set-up.
  2. Run CQL query against “local” or “peers” tables in “system” keyspace to get the token values associated with a host.
  3. Cassandra client drivers provides APIs to get such information as well. For example, the “Metadata” class (in package com.datastax.driver.core) of the Java client API  provides the following two methods:
    * public Set<TokenRange> getTokenRanges() :
      Returns the token ranges that define data distribution in the ring.
    * public Set<TokenRange> getTokenRanges(String keyspace,Host host) :
      Returns the token ranges that are replicated on the given host, for the given keyspace.
  4. Based on the thrift Java API describe_splits call, an open-source utility tool called “cassandra-list-subranges” has been developed to list valid sub-range values for a specific keyspace and table on a particular Cassandra node. For details of this tool, please check the GitHub repository page at https://github.com/pauloricardomg/cassandra-list-subranges

Sub-range repair is available since Cassandra version 1.1.11.

3.4.2 DataStax Enterprise (DSE) repair service

If the Cassandra cluster is running under DataStax Enterprise version, the OpsCenter provides a “repair service” starting from version 4.0. The way that this service works is to continuously repairing small chunks of data (using sub-range repair) in a cluster in the background until the entire cluster is repaired and then it starts the next cycle of processing.

From the OpsCenter Window, there is only one parameter to specify the maximum amount of time it takes to repair the entire cluster once, which is typically less than gc_grace_seconds setting. The default is 9 days, compared with default 10 days of gc_grace_seconds setting.

More advanced setting, such as the parallelism of concurrent repairs, the maximum pending repairs allowed to be running on a node at one time, and etc. can be set either in opscenterd.conf file (for all clusters) or in cluster_name.conf file (for a specific cluster).

More detailed information about DSE repair service and its configuration can be found in the following documents:

Please be noted that DSE repair service, although convenient, is only available for DSE version and not for open source Cassandra version. It also lacks the capability to specify what keyspaces or tables  to repair, such as in “nodetool repair” tool.

3.5 Data-Center vs. Cluster wide repair

Since Cassandra version 1.2, Cassandra starts to provide options for “nodetool repair” to offer the following repair capabilities regarding repair locality/scope:

  • Cassandra 1.2: “-local” (or “–in-local-dc”), only repair nodes in the same data center as the node on which the “nodetool repair” command is executed.
  • Cassandra 2.0: “-dc <dc_name>” (or “–in-dc <dc_name>”), repairs nodes in the named data center
  • Cassandra 2.1:
    • “-dcpar” (or “–dc-parallel”), repairs data center in parallel;
    • “-hosts” (or “–in-hosts”), repairs specified hosts

When none of these options is specified, all data replica across the entire cluster is repaired.

4. Conclusion

In this post, we discussed in details some of the techniques that can help with a more effective Anti-Entropy repair in Cassandra. To summarize:

  • Primary range repair is recommended for maintenance purpose and when it is used, it has to be executed on all nodes in the ring one by one to make sure the whole data range in the ring is covered.
  • When available (Cassandra version 2.1 and beyond), incremental repair will bring the biggest benefit compared with other techniques. But please be aware of the caveats related with it. Please also be noted that with the bug as specified in CASSANDRA-9935, incremental repair may have issues with Level Compaction Strategy (LCS).
  • Sub-range repair can be very useful, especially when incremental repair is not an option. But please make sure that whatever schedule that is set up using this technique has to make sure all range of data to be finished within gc_grace_seconds limit.
  • If a repair is needed to recover from data loss, a sequential, full repair is needed. Neither primary range repair nor incremental repair technique works properly for this purpose. But due to the high cost associated with this operation, sometimes it might be faster to simply wipe out the data directory on a node and let it do bootstrapping.
Categories: DBA Blogs

9 Days of Learning About Engineered Advantages

APRIL 2016 ...

We share our skills to maximize your revenue!
Categories: DBA Blogs

java.lang nullpointerexception While Installing 12.1.0.2 RAC Database Software

Oracle in Action - Mon, 2016-04-18 03:41

RSS content

I have an Oracle  12.1.0.2c cluster having following 5 nodes :

  • Hub Nodes : host01, host02, host03
  • Leaf nodes: host04 and host05

I wanted to install RAC database software on the 3 hub nodes i.e. host01, host02 and host03.

I invoked the OUI as oracle user (Owner of Database Home)

[oracle@host01 database_12_1_0_2]$ ./runInstaller

and chose the option to install RAC database software only on the 3 hub nodes.

After all the pre-requisite checks were successful , I clicked the Install button.
I got the error: java.lang nullpointerexception

On clicking OK, the OUI aborted.

To troubleshoot, I ran the OUI in debug mode as :

[oracle@host01 database_12_1_0_2]$ ./runInstaller -debug -J-DTRACING.ENABLED=true -J-DTRACING.LEVEL=2 -J-DSRVM_TRACE_LEVEL=2 -J-DFULLTRACE

The trace file showed that the leaf node host05 was not responding:

[setup.flowWorker] [ 2016-04-18 14:33:22.771 IST ]
[CRSCTLUtil.getClusterNodeConfigRoles:1198] outputs[4] = CRS-4404: The following nodes did not reply within the allotted time:
[setup.flowWorker] [ 2016-04-18 14:33:22.772 IST ]
[CRSCTLUtil.getClusterNodeConfigRoles:1200] values length = 2
[setup.flowWorker] [ 2016-04-18 14:33:22.772 IST ]
[CRSCTLUtil.getClusterNodeConfigRoles:1198] outputs[5] = host05
[setup.flowWorker] [ 2016-04-18 14:33:22.772 IST ]
[CRSCTLUtil.getClusterNodeConfigRoles:1200] values length = 1

I realized that although I wanted to install database software on 3 nodes only, all the nodes
needed to be up and running whereas in my case, the node host05 was down at that time.

When I started the node host05 and reinvoked OUI, the database software was installed
successfully.

References:
http://www.hhutzler.de/blog/oracle-installer-and-rac-installation/



Tags:  

Del.icio.us
Digg

Comments:  3 comments on this item
You might be interested in this:  
Copyright © ORACLE IN ACTION [java.lang nullpointerexception While Installing 12.1.0.2 RAC Database Software], All Right Reserved. 2016.

The post java.lang nullpointerexception While Installing 12.1.0.2 RAC Database Software appeared first on ORACLE IN ACTION.

Categories: DBA Blogs

Online Relocation of Database File : ASM to FileSystem and FileSystem to ASM

Hemant K Chitale - Sun, 2016-04-17 09:44
There have been few published examples of the online datafile relocation feature in 12c.  The examples I've seen are on filesystem.

Here I show online relocation to/from ASM and FileSystem.

SQL> connect system/oracle
Connected.
SQL> create tablespace test_relocate;

Tablespace created.

SQL> create table test_relocate_tbl
2 tablespace test_relocate
3 as select * from dba_objects;

Table created.

SQL> select tablespace_name, bytes/1024
2 from user_segments
3 where segment_name = 'TEST_RELOCATE_TBL';

TABLESPACE_NAME BYTES/1024
------------------------------ ----------
TEST_RELOCATE 13312

SQL> select file_name, bytes/1024
2 from dba_data_files
3 where tablespace_name = 'TEST_RELOCATE';

FILE_NAME
--------------------------------------------------------------------------------
BYTES/1024
----------
+DATA/NONCDB/DATAFILE/test_relocate.260.909444793
102400


SQL>
SQL> alter database move datafile
2 '+DATA/NONCDB/DATAFILE/test_relocate.260.909444793'
3 to '/oradata/NONCDB/test_relocate_01.dbf';

Database altered.

SQL> !ls -l /oradata/NONCDB
total 102408
-rw-r----- 1 oracle asmdba 104865792 Apr 17 23:39 test_relocate_01.dbf

SQL> 
SQL> alter database move datafile  
2 '/oradata/NONCDB/test_relocate_01.dbf'
3 to '+DATA';

Database altered.

SQL> select file_name, bytes/1024
2 from dba_data_files
3 where tablespace_name = 'TEST_RELOCATE';

FILE_NAME
--------------------------------------------------------------------------------
BYTES/1024
----------
+DATA/NONCDB/DATAFILE/test_relocate.260.909445261
102400


SQL>
SQL> !ls -l /oradata/NONCDB
total 0

SQL>


Note that I was courageous enough to not use the KEEP keyword (which is optional !).
.
.
.

Categories: DBA Blogs

Links for 2016-04-16 [del.icio.us]

Categories: DBA Blogs

What’s Holding You Back From Giving Back?

Pythian Group - Fri, 2016-04-15 09:19

This week in honour of National Volunteer Week, I’m reflecting on the importance that volunteering has had in my life.

I’ve learned from each and every one of my experiences. From wrapping holiday presents at the mall, to helping source articles for an industry magazine, to wish granting, recruiting committee and board members, and providing HR advice and counsel. These experiences eventually led me to become a member of the board of directors for a number of organizations, and I was even named Board Chair.

Ironically, the rewards and benefits that I have received from the experiences far outweigh any amount of time I have given the many organizations I have supported over the years. Volunteering has provided me the opportunity to expand my skills and experience, and take on leadership roles long before I had the credentials to be hired for them. I initially started volunteering when I moved to Ottawa, and quickly learned that there is no better way to get to know your community, meet new people and expand your network. Once I started, I never looked back. I caught the “volunteer bug.” It is an important part of my life.

I am often asked how I find the time to volunteer. I always respond with, “like anything, if it’s important to you, you can and will find the time.” As I have expanded my family and career, I seek opportunities where I can continue to share my knowledge, skills and experience in ways that do not impede on either. A perfect example of this would be career mentoring. I have been a mentor for a number of organizations including the HRPA, OCISO, and the WCT. I have been fortunate to have great mentors in the past and now pay it forward. I remain connected with many of them.

In my role as VP of HR at Pythian I was thrilled to champion our Love Your Community Programs. These programs provide our employees in over 36 countries with a volunteer day and opportunities for sponsorship – i.e. raising money for causes that are meaningful to them. The programs have allowed Pythian the opportunity to positively impact the communities where our employees live.

Volunteer Week highlights the importance of volunteering in our communities, and showcases the impact it has on the lives of both the volunteer, and the communities they support. What’s holding you back from giving back?

And because it couldn’t be said any better: “We make a living by what we get, but we make a life by what we give.”

Winston Churchill

Categories: DBA Blogs

Optimizer Stew – Parsing the Ingredients to Control Execution Plans

Pythian Group - Fri, 2016-04-15 08:01

No matter how many times I have worked with Outlines, Baselines and Profiles, I keep having to look up reminders as to the differences between these three.

There is seemingly no end to articles to the number of articles and blog that tell you what needs to be licensed, how to use them, and which version of Oracle where each made its debut appearance.

This blog will discuss none of that.  This brief article simply shows the definitions of each from the Glossary for the most current version of the Oracle databases. As of this writing that version is 12.1.0.2.

And here they are.

Stored Outline

A stored outline is simply a set of hints for a SQL statement. The hints in stored outlines direct the optimizer to choose a specific plan for the statement.

Link to Stored Outline in the Oracle Glossary

SQL plan baseline

A SQL baselines is a set of one or more accepted plans for a repeatable SQL statement. Each accepted plan contains a set of hints, a plan hash value, and other plan-related information. SQL plan management uses SQL plan baselines to record and evaluate the execution plans of SQL statements over time.

Link to SQL Plan Baseline in the Oracle Glossary

SQL profile

A SQL profile is a set of auxiliary information built during automatic tuning of a SQL statement. A SQL profile is to a SQL statement what statistics are to a table. The optimizer can use SQL profiles to improve cardinality and selectivity estimates, which in turn leads the optimizer to select better plans.

Link to SQL Profile in the Oracle Glossary

Categories: DBA Blogs

Partner Webcast – Transition to the New Integration Model with Oracle SOA Cloud Service

Do you want to fully integrate your enterprise, using the same integration tool and skills for both cloud and on premises deployment? Oracle’s hybrid integration platform allows you to extract...

We share our skills to maximize your revenue!
Categories: DBA Blogs