Pythian Group

Love Your Data

Enqueue Bytes – Is that a Pun?

Mon, 2016-11-07 13:53

Sometimes it is necessary to put on your uber-geek hat and start using cryptic bits of code to retrieve information from an Oracle database. Troubleshooting enqueue locking events in Oracle databases is one of the times some advanced SQL may be necessary.

Likely you have used SQL similar to the following when troubleshooting Oracle enqueues, probably in connection with row lock contention.

SQL# l
  1  SELECT
  2     s.username username,
  3     s.sid,
  4     e.event event,
  5     e.p1text,
  6     e.p1,
  7     e.state
  8  FROM v$session s, v$session_wait e
  9  WHERE s.username IS NOT NULL
 10     AND s.sid = e.sid
 11     AND e.event LIKE '%enq:%'
 12* ORDER BY s.username, UPPER(e.event)
 
USERNAME    SID EVENT                          P1TEXT                P1 STATE
---------- ---- ------------------------------ ---------- ------------- ----------
JKSTILL      68 enq: TX - ROW LOCK contention  name|mode     1415053318 WAITING
 
1 ROW selected.

The value for P1 is not very useful as is; Oracle has encoded the type of enqueue and the requested mode into the column. When working with current events such as when selecting from v$session, it is simple to determine the type of lock and the mode requested by querying v$lock, such as in the following example:

  1* SELECT sid, TYPE, request, block FROM v$lock WHERE sid=68 AND request > 0
SQL# /
 
 SID TY    REQUEST      BLOCK
---- -- ---------- ----------
  68 TX          6          0
 
1 ROW selected.

Session 68 is waiting on a TX enqueue with a requested lock mode of 6. Seasoned Oracle DBAs will recognize this as classic row lock contention.

Why bother to find out just which type of enqueue this is? There are many types of locks in Oracle, and they occur for differing reasons. The TX lock is interesting as it can occur not only in Mode 6 but in Mode 4 as well; Mode 4 refers to locks that involve unique keys, such as when two or more sessions try to insert the same value for a primary key. The following example shows just that:

SQL# @s
 
USERNAME    SID EVENT                          P1TEXT                P1 STATE
---------- ---- ------------------------------ ---------- ------------- ----------
JKSTILL      68 enq: TX - ROW LOCK contention  name|mode     1415053316 WAITING
 
 
1 ROW selected.
 
SQL# @l
 
 SID TY    REQUEST      BLOCK
---- -- ---------- ----------
  68 TX          4          0
 
1 ROW selected.
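
For reference, a TX Mode 4 wait of this kind can be reproduced in a sandbox with nothing more than a primary key collision; the table name here is just an example:

-- Session 1: insert a value and leave the transaction open
CREATE TABLE pk_demo (id NUMBER PRIMARY KEY);
INSERT INTO pk_demo VALUES (1);    -- no commit yet

-- Session 2: inserting the same key value now waits on
-- "enq: TX - row lock contention" with request mode 4
INSERT INTO pk_demo VALUES (1);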

Knowing just which lock mode is requested is vital, as the troubleshooting for TX Mode 4 locks will be different from what is used to troubleshoot Mode 6.

Though we can find the lock name and mode information in v$lock, there is still value in being able to decipher that cryptic P1 column.

The ASH and AWR facilities do not include any historical information for the lock name and mode; there is no dba_hist_lock view corresponding to the P1 column found in v$active_session_history and dba_hist_active_sess_history. Any research done after an event has occurred therefore requires decoding this information.

Deciphering v$session.p1

Oracle Support document 34566.1 is the enqueue reference note that provides the information needed to get the lock name and mode from the P1 column. As you will see, this information is a bit puzzling.

The rest of this article will focus on TX Mode 6 locks. The value shown for this lock and mode in the P1 column is always 1415053318. Following is the SQL recommended by Oracle:

 SELECT chr(to_char(bitand(p1,-16777216))/16777215)||
         chr(to_char(bitand(p1, 16711680))/65535) "Lock",
         to_char( bitand(p1, 65535) )    "Mode"
    FROM v$session_wait
   WHERE event = 'enqueue'

As I currently have some planned row lock contention in a test database we can run this query:

  1   SELECT chr(to_char(bitand(p1,-16777216))/16777215)||
  2           chr(to_char(bitand(p1, 16711680))/65535) "Lock",
  3           to_char( bitand(p1, 65535) )    "Mode"
  4      FROM v$session_wait
  5*    WHERE event LIKE '%enq:%'
SQL# /
 
Lo Mode
-- ----------------------------------------
TX 4

Probably it is not very clear why this SQL works. Let’s try and understand it.
(Note that a small change had to be made to the WHERE clause.)

Converting the P1 value to hex may be helpful:

1415053318 = 0x54580006

The two lower-order bytes represent the lock mode that has been requested. This can be seen here to be 0x06, which translates simply to decimal 6 (I can do this one in my head).

The next two bytes are also in hex and represent the two letters of the lock name.

0x54 = 84 = ‘T’
0x58 = 88 = ‘X’
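
A quick sanity check of those values can be done from SQL*Plus; this is just a verification query, not part of the decoding logic:

-- 1415053318 rendered as hex, and the two name bytes converted back to characters
SELECT to_char(1415053318, 'XXXXXXXX') AS hex_value,
       chr(84) || chr(88)              AS enqueue_name
  FROM dual;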

Using string functions it is simple to extract the values from the hex string, convert them to numbers and retrieve the lock name and mode.

SQL# define p1 = 1415053318
 
SQL# l
  1  WITH hex AS (
  2     SELECT TRIM(to_char(&p1,'XXXXXXXXXXXXXXXX')) hexnum FROM dual
  3  ),
  4  hexbreak AS (
  5     SELECT hexnum
  6        , to_number(substr(hexnum,1,2),'XXXXXXXX') enq_name_byte_1
  7        , to_number(substr(hexnum,3,2),'XXXXXXXX') enq_name_byte_2
  8        , to_number(substr(hexnum,5),'XXXXXXXX') enq_mode
  9  FROM hex
 10  )
 11  SELECT
 12     hexnum
 13     , chr(enq_name_byte_1)
 14     || chr(enq_name_byte_2) enqueue_type
 15     , enq_mode
 16* FROM hexbreak
SQL# /
 
HEXNUM            EN   ENQ_MODE
----------------- -- ----------
54580006          TX          6

While that does work, my inner geek wants to investigate those bitmasks and find out why they work. Next are the bitmasks in decimal along with the hex equivalent.

-16777216 = 0xFFFFFFFFFF000000
 16777215 = 0xFFFFFF
 16711680 = 0xFF0000
    65535 = 0xFFFF

The bitand function is used to mask all unwanted bits to 0. The number is then divided by the value needed to remove all of the now-zeroed-out lower-order bytes.

The values being used as bitmasks are -16777216 and 16711680. The use of -16777216 does not seem to make sense. As the intent is to mask all but one byte, I would expect to find an FF surrounded by a number of zeroes. The bit mask of 16711680, however, looks fine.

Now let’s run the Oracle support query again, but modified to show just the integer values rather than converting them to ASCII.

 
  1  SELECT bitand(p1,-16777216)/16777215,
  2           bitand(p1, 16711680)/65535,
  3           bitand(p1, 65535)
  4      FROM v$session_wait
  5*    WHERE event LIKE '%enq:%'
SQL# /
 
BITAND(P1,-16777216)/16777215 BITAND(P1,16711680)/65535 BITAND(P1,65535)
----------------------------- ------------------------- ----------------
                    84.000005                88.0013428                6

Well, that is interesting. An implicit conversion is taking place with to_char() that is removing the decimal portion of these numbers. Is that being done with trunc(), round(), or something else? I don’t know the answer to that. What seems more important is just doing the math correctly.

There are a couple of things here that can be changed to make this work as expected.

A New BitMask

Let’s modify the first bitmask to something that seems more reasonable than -16777216. Let’s use this instead, as it masks only the single byte we need:

4278190080 = 0xFF000000

Let's try it out:

SQL# l
  1  SELECT bitand(p1,4278190080)/16777215,
  2           bitand(p1, 16711680)/65535,
  3           bitand(p1, 65535)
  4      FROM v$session_wait
  5*    WHERE event LIKE '%enq:%'
SQL# /
 
BITAND(P1,4278190080)/16777215 BITAND(P1,16711680)/65535 BITAND(P1,65535)
------------------------------ ------------------------- ----------------
                     84.000005                88.0013428                6

While the new bitmask didn’t break anything, it does not appear to have helped either.

Off By One Error

The solution is to consider the divisors used to remove the lower order zeroes; each of them is off by one. That is easy enough to verify:

SQL# l
  1  SELECT bitand(p1,4278190080)/16777216,
  2           bitand(p1, 16711680)/65536,
  3           bitand(p1, 65535)
  4      FROM v$session_wait
  5*    WHERE event LIKE '%enq:%'
SQL# /
 
BITAND(P1,4278190080)/16777216 BITAND(P1,16711680)/65536 BITAND(P1,65535)
------------------------------ ------------------------- ----------------
                           84                        88                6

Ah, that did it! But what was the problem previously?

Old Divisor Values

The original divisors are off by 1, which does not completely remove the lower order values.

 16777215 = 0xFFFFFF
    65535 = 0xFFFF

Increasing each by one has the desired effect.

New Divisor Values

 16777216 = 0x1000000
    65536 = 0x10000
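
Putting the corrected bitmask and divisors together, decoding historical enqueue waits out of AWR's ASH data might look something like the following sketch (adjust the time range and grouping to taste):

-- Decode the enqueue name and requested mode from the P1 values
-- recorded in dba_hist_active_sess_history, using the corrected math.
SELECT event,
       chr(bitand(p1, 4278190080) / 16777216)
          || chr(bitand(p1, 16711680) / 65536) enqueue_name,
       bitand(p1, 65535) enqueue_mode,
       COUNT(*) samples
FROM dba_hist_active_sess_history
WHERE event LIKE 'enq:%'
GROUP BY event,
       chr(bitand(p1, 4278190080) / 16777216)
          || chr(bitand(p1, 16711680) / 65536),
       bitand(p1, 65535)
ORDER BY samples DESC;
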
Conclusion

Those odd bitmasks have been in the back of my mind for some time, and today it seemed like a good idea to dig in and find out more about them. It isn’t too hard to imagine that in some cases the wrong values might be returned, leading to some long and unproductive troubleshooting sessions.

There is a demo script enqueue-bitand.sql containing much of the SQL found in this article. There is also a script awr-top-sqlid-events.sql that incorporates the enqueue lock decoding. This script could be made better than it is, so please issue a pull request if you have some useful modifications.

Categories: DBA Blogs

Distinguish Real SQL Execution Plans from Fake Ones!

Thu, 2016-11-03 10:59

As an Oracle DBA, one of our daily tasks is to optimize bad SQL statements that are affecting the system and causing performance degradation. First we identify the culprit SQL, then we extract its execution plan, and after that we start the cycle of SQL tuning and optimization as appropriate.

 

There are many methods to extract the execution plan for a specific SQL statement; however, not all of them provide the actual plan the optimizer follows. Some provide only a suggestion or expectation of the execution plan, which is not always accurate and can be misleading.

 

In the following blog, I will demonstrate various execution plan extraction methods and create a scenario to test them, then see which ones provide the actual execution plan that the optimizer really follows to execute a specific SQL, and which ones provide only a suggestion of the execution plan that is not necessarily used by the optimizer.

 

So first things first, what is an “execution plan”?

 

An execution plan is the set of steps that Oracle follows when executing a SQL statement and performing the requested operation.

 

There are many ways to extract the execution plan for a specific SQL, like:

  • Enterprise Manager
  • SQL*Plus AUTOTRACE
  • EXPLAIN PLAN command
  • SQL Trace (event 10046) with tkprof
  • DBMS_XPLAN package to view plans from (a sketch follows after this list):
      • Automatic Workload Repository
      • V$SQL_PLAN
      • SQL Tuning Sets
      • Plan table
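
As a small illustration of the DBMS_XPLAN route against AWR, a call like the following displays the plan captured in the repository; the sql_id here is only a placeholder:

-- Show the plan(s) stored in AWR for a given sql_id (placeholder value).
SELECT * FROM table(dbms_xplan.display_awr('31gbpz4ncsvp3'));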

 

Some of these methods will provide the “actual” execution plan, while other methods will provide a “suggestion” of the execution plan, that is, the steps that Oracle expects the optimizer to follow in order to execute a SQL statement, which may not always be what actually happens.

 

In this test case, I will create a table with two columns, and insert skewed data into one column, then I will build an index and histogram on that column. I will then query that column using bind variables and see if Oracle will use the index or not.

 

It should be mentioned that any method that provides a suggestion of the execution plan is effectively a synonym for the EXPLAIN PLAN command. Because that command does not use bind variable peeking, it will not generate the optimal plan for each different bind value in this test case, and that is the crucial behaviour this test case depends on.

So… without further ado, let’s see that in action:

 

>  Environment preparation for scenario testing:

[oracle@testhost ~]$ sqlplus / as sysdba


SQL> alter system flush shared_pool;

System altered.


SQL> alter system flush buffer_cache;

System altered.


SQL> drop user xp_test cascade;

User dropped.


SQL> grant dba to xp_test identified by xp_test;

Grant succeeded.


SQL> conn xp_test/xp_test


SQL> create table xp_table (emp_id varchar2(10), team varchar2(10)) pctfree 99;

Table created.


SQL> insert into xp_table select * from (SELECT 'EMP' || level , 'TEAM1' from dual connect by level <= 1000); 

1000 rows created. 


SQL> insert into xp_table select * from (SELECT 'EMP' || (level + 1000) , 'TEAM2' from dual connect by level <= 10); 

10 rows created. 


SQL> commit;

Commit complete.


SQL> create index index_test on xp_table (team);

Index created.


SQL> exec dbms_stats.gather_table_stats('XP_TEST','XP_TABLE',method_opt=>'FOR ALL COLUMNS SIZE AUTO FOR COLUMNS SIZE 254 TEAM',cascade=>TRUE);

PL/SQL procedure successfully completed.


SQL> select TABLE_NAME,COLUMN_NAME,HISTOGRAM from dba_tab_columns where table_name = 'XP_TABLE';

TABLE_NAME                   COLUMN_NAME                 HISTOGRAM
--------------------   ------------------------------ ---------------
XP_TABLE                   TEAM                           FREQUENCY
XP_TABLE                   EMP_ID                         NONE


SQL> select count(*),team from xp_table group by team order by 2 asc;

  COUNT(*)            TEAM
   ----------      ----------
    1000             TEAM1
     10              TEAM2

OK, we have our environment ready for testing; let's test each execution plan gathering method. First, let's extract the actual execution plan:

 

> DBMS_XPLAN.DISPLAY_CURSOR:

 

Using this method, we will extract the execution plan directly from the shared pool, which will always be the real execution plan:

-- Define a bind variable:

SQL> variable x varchar2(10);  
SQL> exec :x:='TEAM2';

PL/SQL procedure successfully completed.


SQL> set lines 200    
SQL> col PLAN_TABLE_OUTPUT for a100
SQL> select count(emp_id) from xp_table where team = :x;

COUNT(EMP_ID)
-------------
     10


SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID    31gbpz4ncsvp3, child number 0
-------------------------------------
select count(emp_id) from xp_table where team = :x

Plan hash value: 726202289
-------------------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time      |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      |      |    4 (100)|      |
|   1 |  SORT AGGREGATE          |          |    1 |    13 |           |      |

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
|   2 |   TABLE ACCESS BY INDEX ROWID| XP_TABLE   |    10 |   130 |    4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN         | INDEX_TEST |    10 |      |    1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("TEAM"=:X)

20 rows selected.

We see clearly that the optimizer accessed the table using an index range scan, so now we are 100% sure that the real execution plan used an INDEX RANGE SCAN. Let's compare this result with the other methods.

 

> SQL*Plus AUTOTRACE:

 

Autotrace is a very useful way to get SQL statistics, but will it provide the real execution plan?

SQL> set autotrace on explain;


SQL> select count(emp_id) from xp_table where team = :x;

COUNT(EMP_ID)
-------------
      10

Execution Plan
----------------------------------------------------------
Plan hash value: 3545047802

-------------------------------------------------------------------------------
| Id  | Operation       | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |    13 |   102    (0)| 00:00:02 |
|   1 |  SORT AGGREGATE    |          |     1 |    13 |        |          |
|*  2 |   TABLE ACCESS FULL| XP_TABLE |   505 |  6565 |   102    (0)| 00:00:02 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TEAM"=:X)

SQL> set autotrace off

Oops! AUTOTRACE suggests that the optimizer will use a full table scan to access the table, which is different from the execution plan obtained in the previous step. So AUTOTRACE provides a suggestion of the execution plan, which may not always be true. Now let's continue with the next method.

 

> EXPLAIN PLAN:

 

The EXPLAIN PLAN command will provide the execution plan without even running the SQL, so we can easily deduce that we will get a suggestion of the execution plan:

SQL> explain plan for select count(emp_id) from xp_table where team = :x;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3545047802

-------------------------------------------------------------------------------
| Id  | Operation       | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |    13 |   102    (0)| 00:00:02 |
|   1 |  SORT AGGREGATE    |          |     1 |    13 |        |          |
|*  2 |   TABLE ACCESS FULL| XP_TABLE |   505 |  6565 |   102    (0)| 00:00:02 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

   2 - filter("TEAM"=:X)

14 rows selected.

Confirmed: this method also suggests that the optimizer will do a full table scan, which is not true, as we know from the first method. Let's continue.

 

> SQL Trace (event 10046) with tkprof

 

SQL Trace (event 10046) with tkprof will always provide the real execution plan. Optionally, we can use the "explain" keyword with tkprof to include the suggested plan as well:

SQL> alter session set tracefile_identifier='xp_test';

Session altered.


SQL> alter session set events '10046 trace name context forever, level 20';

Session altered.


SQL> select count(emp_id) from xp_table where team = :x;

COUNT(EMP_ID)
-------------
       10

SQL> alter session set events '10046 trace name context off';

Session altered.


SQL> select value from v$diag_info where name like '%Trace%';

VALUE
------------------------------------------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/orcl/orcl/trace
/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_24595_xp_test.trc

SQL> !


$ tkprof /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_24595_xp_test.trc sys=no explain=xp_test/xp_test output=/tmp/with_explain


$ cat /tmp/with_explain.prf


Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=5 pr=0 pw=0 time=67 us)
        10         10         10   TABLE ACCESS BY INDEX ROWID XP_TABLE (cr=5 pr=0 pw=0 time=60 us cost=4 size=130 card=10)
        10         10         10    INDEX RANGE SCAN INDEX_TEST (cr=2 pr=0 pw=0 time=44 us cost=1 size=0 card=10)(object id 81349)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
      1   SORT (AGGREGATE)
     10    TABLE ACCESS   MODE: ANALYZED (FULL) OF 'XP_TABLE' (TABLE)

As we can see, tkprof output shows both real and suggested plans.

So far, we have tested four execution plan methods: AUTOTRACE and EXPLAIN PLAN, which provide suggestions of the execution plan; DBMS_XPLAN.DISPLAY_CURSOR, which always provides the real execution plan; and SQL Trace, which can provide both the suggested and the real execution plan.

 

> Creating SQL Plan Baseline and repeating the test for AUTOTRACE and EXPLAIN PLAN:

-- Checking for SQL plan baselines:


SQL> select sql_handle, plan_name, sql_text, enabled, accepted, fixed from dba_sql_plan_baselines;

no rows selected


--Let’s create a baseline:

SQL> show parameter baseline 

NAME                                  TYPE         VALUE
------------------------------------ ----------- ------------------------------
optimizer_capture_sql_plan_baselines  boolean       FALSE
optimizer_use_sql_plan_baselines      boolean       TRUE


SQL> alter session set optimizer_capture_sql_plan_baselines = TRUE; 

Session altered.


SQL> select count(emp_id) from xp_table where team = :x;

COUNT(EMP_ID)
-------------
       10


SQL> select sql_handle, plan_name, sql_text, enabled, accepted, fixed from dba_sql_plan_baselines;

SQL_HANDLE                PLAN_NAME                           SQL_TEXT                                              ENA    ACC   FIX
----------------------   ---------------------------------   ----------------------------------------------------  ------ ----- ------
SQL_65dc367505be1804      SQL_PLAN_6br1qfn2vw604bc04bcd8      select count(emp_id) from xp_table where team = :x    YES    YES    NO


SQL> alter session set optimizer_capture_sql_plan_baselines = FALSE;

Session altered.


SQL> alter system flush buffer_cache;

System altered.


SQL> alter system flush shared_pool;

System altered.

 

> SQL*Plus AUTOTRACE

 

Trying AUTOTRACE tool after creating SQL Plan Baseline:

SQL> set autotrace on explain; 


SQL> select count(emp_id) from xp_table where team = :x;

COUNT(EMP_ID)
-------------
      10


Execution Plan
----------------------------------------------------------
Plan hash value: 726202289

-------------------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time      |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |    1 |    13 |   129   (0)| 00:00:02 |
|   1 |  SORT AGGREGATE          |          |    1 |    13 |           |      |
|   2 |   TABLE ACCESS BY INDEX ROWID| XP_TABLE   |   505 |  6565 |   129   (0)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN         | INDEX_TEST |   505 |      |    2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("TEAM"=:X)

Note
-----
   - SQL plan baseline "SQL_PLAN_6br1qfn2vw604bc04bcd8" used for this statement

SQL> set autotrace off

Notice that this time AUTOTRACE is aware of the baseline; hence, it shows that the optimizer will perform an index range scan. Next, let's try the EXPLAIN PLAN command:

 

> EXPLAIN PLAN:

 

Trying EXPLAIN PLAN tool after creating SQL Plan Baseline:

SQL> explain plan for select count(emp_id) from xp_table where team = :x;

Explained.


SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 726202289

-------------------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time      |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |    1 |    13 |   129   (0)| 00:00:02 |
|   1 |  SORT AGGREGATE          |          |    1 |    13 |           |      |
|   2 |   TABLE ACCESS BY INDEX ROWID| XP_TABLE   |   505 |  6565 |   129   (0)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN         | INDEX_TEST |   505 |      |    2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------


PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("TEAM"=:X)

Note
-----
   - SQL plan baseline "SQL_PLAN_6br1qfn2vw604bc04bcd8" used for this statement

19 rows selected.

As we can see, EXPLAIN PLAN is aware of the baseline and shows that the optimizer will perform an index range scan.

 

Summary:

There are many methods to extract the execution plan: some provide the actual execution plan, while others provide just a suggestion that the optimizer may not follow at run time. In this test case we tested four methods of execution plan extraction (DBMS_XPLAN, AUTOTRACE, EXPLAIN PLAN, and SQL Trace). We then created a SQL plan baseline and repeated the test on the suggestion-based methods to confirm that they are aware of the baseline.
Categories: DBA Blogs

AWS DMS Tips for Oracle Migrations

Thu, 2016-10-27 10:54

Some time ago I published a post regarding the Database Migration Service (DMS) on AWS. The service had just appeared at that time and looked very promising. It is still appealing, is genuinely useful, and has the potential to mature and get better. I was able to test and try it in several cases, moving data to AWS and between RDS databases inside AWS. During those exercises I ran into some issues and wanted to make other people aware of things to keep in mind when starting or planning to use DMS. Most of my experience with DMS is related to migrations to and from Oracle databases, so the following tips are about Oracle migrations and replications.

Before planning any kind of logical replication based on transaction logs, check what data types you have and whether you have primary keys on all your replicated tables. In some cases a primary key is not only desirable but required to properly replicate the objects. As an example, if you want to replicate a table with LOB columns using DMS, you must have a primary key on the table; if you don't, the LOB columns will be excluded from replication and you will end up with null values instead of the LOBs on the target database.
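
A quick dictionary check along these lines can flag candidate tables without a primary key before the replication is set up (the schema name is just an example):

-- Tables in the schema to be replicated that have no primary key constraint.
SELECT t.owner, t.table_name
FROM dba_tables t
WHERE t.owner = 'SCOTT'          -- schema to be replicated (example)
  AND NOT EXISTS (SELECT 1
                  FROM dba_constraints c
                  WHERE c.owner = t.owner
                    AND c.table_name = t.table_name
                    AND c.constraint_type = 'P');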

Check the data types used in the tables. All logical replication tools have limitations in terms of supported data types. You may find that some of your tables can be replicated only with limitations, or not replicated at all.
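
A simple inventory of column data types, similar to the sketch below, makes it easy to compare the schema against the replication tool's documented limitations:

-- Count columns per data type in the schema to be replicated (example owner).
SELECT data_type, COUNT(*) column_count
FROM dba_tab_columns
WHERE owner = 'SCOTT'
GROUP BY data_type
ORDER BY column_count DESC;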

Let's speak about DMS itself. When you plan your DMS replication tasks, keep in mind that you can combine the replication of several schemas into one task. This may significantly reduce the load on the source system, since each migration or replication task can place a considerable load on it. In my experience, we hit 100% CPU and maximum IO load running 8 replication tasks against a source m3.large system.

Remember that you cannot change a task once it has been created. You are able to stop, start or resume the task, but you cannot alter any rules or parameters for it. This may change soon, but currently it is not possible.

If you choose the truncate option for your migration and replication task, it may change your metadata. You may find your indexes and constraints renamed, and you can even lose some of the indexes. In my case, the primary key and unique key constraints were renamed and a couple of indexes were lost. Please be careful. After that experience, I chose the "do nothing" mode and prepared the target for data migration and replication myself.

You can use an RDS snapshot to instantiate your data if you plan to set up replication between two RDS instances of the same type. In our case, this was done to minimize downtime when the database was migrated to encrypted storage. When you use an RDS snapshot, you can use the snapshot creation time as the "Custom CDC start time" for your DMS replication task.

If you use one universal user for your endpoints to replicate multiple schemas, you will need transformation rules, because by default the replication will use the schema from the endpoint as the destination. In other words, if you set up a target endpoint using the user "DMS" and try to replicate the schema SCOTT, it will use the schema "DMS" as the destination unless you have transformation rules in place.

You have to enable minimal supplemental logging at the database level and supplemental logging for all columns explicitly for each replicated table. Even if you enable supplemental logging for all columns at the database level using "alter database", you still need to add it at the table level. Without it, the DMS task will abort, complaining about the lack of supplemental logging for all columns of the table.
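
In Oracle terms, that combination typically looks like the following; the table name is just an example:

-- Minimal supplemental logging at the database level...
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

-- ...plus explicit all-column supplemental logging for each replicated table.
ALTER TABLE scott.emp ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;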

If you create a table in one of your replicated schemas, you need to add a supplemental logging clause like "CREATE TABLE … ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;". In that case, the table and its data will be replicated.

It is better to use the AWS CLI to get the most out of the service, since all new features and additions appear in the CLI first. As an example, if you want debug logging for your replication, you have to use the AWS CLI.

If the only unique index on a table is a function-based index, your data migration and replication task can fail. It may also fail if you have more than one unique index on a table.

I hit some issues with monitoring. The idea looks good, but it requires some fixing; it doesn't seem to work correctly in Firefox and Safari. At least for me, it was not working right.

The status of a task may not tell you everything. Sometimes it shows the state "ERROR" but nevertheless keeps working and replicating data behind the scenes, so it can be a bit misleading. I look at the statistics and monitoring pages for the task to get the full picture.

In summary, the service deserves attention and can be considered a valuable option when you plan your migration strategy, and the AWS DMS team works hard to make it better. Happy migrations!

Categories: DBA Blogs

While upgrading to 12.1.0.2 I faced error ORA-01830 / ORA-06512

Thu, 2016-10-20 10:35

The other day I was running an upgrade from 11.2.0.3 to 12.1.0.2 for a client that is using ACLs (Access Control Lists). If you have been doing upgrades to 12c, you know that running catctl.pl -n 4 catupgrd.sql entails 73 steps. This upgrade failed in step 65 with the following error (I have trimmed the output for readability):

Serial   Phase #:65 Files: 1 Calling sqlpatch with LD_LIBRARY_PATH=/u01/dev/product/12.1.0/lib; export LD_LIBRARY_PATH;/u01/dev/product/12.1.0/perl/bin/perl -I /u01/dev/product/12.1.0/rdbms/admin -I /u01/dev/product/12.1.0/rdbms/admin/../../sqlpatch /u01/dev/product/12.1.0/rdbms/admin/../../sqlpatch/sqlpatch.pl -verbose -upgrade_mode_only > catupgrd_datapatch_upgrade.log 2> catupgrd_datapatch_upgrade.err
returned from sqlpatch
    Time: 80s
Serial   Phase #:66 Files: 1     Time: 71s
Serial   Phase #:67 Files: 1     Time: 1s
Serial   Phase #:68 Files: 1     Time: 0s
Serial   Phase #:69 Files: 1     Time: 20s

Grand Total Time: 4946s

*** WARNING: ERRORS FOUND DURING UPGRADE ***
...
REASON:
catuppst.sql unable to run in Database: DEVSTAR Id: 0
        ERRORS FOUND: during upgrade CATCTL ERROR COUNT=5
------------------------------------------------------
Identifier XDB 16-09-25 12:27:05 Script = /u01/dev/product/12.1.0/rdbms/admin/
ERROR = [ORA-01830: date format picture ends before converting entire input string ORA-06512: at "SYS.XS_OBJECT_MIGRATION", line 167
ORA-06512: at line 28
ORA-06512: at line 69
]
------------------------------------------------------
------------------------------------------------------
Identifier XDB 16-09-25 12:27:05 Script = /u01/dev/product/12.1.0/rdbms/admin/
ERROR = [ORA-06512: at "SYS.XS_OBJECT_MIGRATION", line 167 ORA-06512: at line 28
ORA-06512: at line 69
]
STATEMENT = [as above]
------------------------------------------------------
------------------------------------------------------
Identifier XDB 16-09-25 12:27:05 Script = /u01/dev/product/12.1.0/rdbms/admin/
ERROR = [ORA-06512: at line 28 ORA-06512: at line 69
]
STATEMENT = [as above]
------------------------------------------------------
------------------------------------------------------
Identifier XDB 16-09-25 12:27:05 Script = /u01/dev/product/12.1.0/rdbms/admin/
ERROR = [ORA-06512: at line 69]
STATEMENT = [as above]
------------------------------------------------------
------------------------------------------------------
Identifier ORDIM 16-09-25 12:28:53 Script = /u01/dev/product/12.1.0/rdbms/admin/
ERROR = [ORA-20000: Oracle XML Database component not valid. Oracle XML Database must be installed and valid prior to Oracle Multimedia install, upgrade, downgrade, or patch.
ORA-06512: at line 3
]

And the worst part was that the upgrade also corrupted my database. This is a good point to stress: have a good backup before attempting an upgrade.

Sun Sep 25 13:55:52 2016
Checker run found 59 new persistent data failures
Sun Sep 25 14:00:18 2016
Hex dump of (file 5, block 1) in trace file /u01/app/diag/rdbms/dev/dev/trace/de_ora_13476.trc
Corrupt block relative dba: 0x01400001 (file 5, block 1)
Bad header found during kcvxfh v8
Data in bad block:
 type: 0 format: 2 rdba: 0x01400001
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0xa641
 computed block checksum: 0x0
Reading datafile '/u01/dev/oradata/dev/system_01.dbf' for corruption at rdba: 0x01400001 (file 5, block 1)
Reread (file 1, block 1) found same corrupt data (no logical check)

So what I had to do was restore my database to its state before the upgrade, as I couldn't even use flashback due to the corrupt block.

But to fix this error, I had to apply patch 20369415 to the 12c binaries before running catupgrd.sql:

[oracle@dev 20369415]$ opatch lsinventory | grep 20369415
Patch  20369415     : applied on Sun Sep 25 14:49:59 CDT 2016
     20369415

Once the patch was applied, I reran the upgrade, and this time it finished successfully:

Serial   Phase #:65      Files: 1     Time: 133s
Serial   Phase #:66      Files: 1     Time: 78s
Serial   Phase #:68      Files: 1     Time: 0s
Serial   Phase #:69      Files: 1     Time: 275s
Serial   Phase #:70      Files: 1     Time: 171s
Serial   Phase #:71      Files: 1     Time: 0s
Serial   Phase #:72      Files: 1     Time: 0s
Serial   Phase #:73      Files: 1     Time: 20s

------------------------------------------------------
Phases [0-73]         End Time:[2016_09_26 17:42:54]
------------------------------------------------------

Grand Total Time: 5352s

LOG FILES: (catupgrd*.log)
...
COMP_ID              COMP_NAME                                VERSION  STATUS
-------------------- ---------------------------------------- -------- ---------------
APEX                 Oracle Application Express               4.2.5.00 VALID
                                                              .08

OWB                  OWB                                      11.2.0.3 VALID
                                                              .0

AMD                  OLAP Catalog                             11.2.0.3 OPTION OFF
                                                              .0

SDO                  Spatial                                  12.1.0.2 VALID
                                                              .0

ORDIM                Oracle Multimedia                        12.1.0.2 VALID
                                                              .0

XDB                  Oracle XML Database                      12.1.0.2 VALID
                                                              .0

CONTEXT              Oracle Text                              12.1.0.2 VALID
                                                              .0

OWM                  Oracle Workspace Manager                 12.1.0.2 VALID
                                                              .0

CATALOG              Oracle Database Catalog Views            12.1.0.2 VALID
                                                              .0

CATPROC              Oracle Database Packages and Types       12.1.0.2 VALID
                                                              .0

JAVAVM               JServer JAVA Virtual Machine             12.1.0.2 VALID
                                                              .0

XML                  Oracle XDK                               12.1.0.2 VALID
                                                              .0

CATJAVA              Oracle Database Java Packages            12.1.0.2 VALID
                                                              .0

APS                  OLAP Analytic Workspace                  12.1.0.2 VALID
                                                              .0

XOQ                  Oracle OLAP API                          12.1.0.2 VALID
                                                              .0

Conclusion

This was a small post to make you aware that if you are using ACLs, you need to apply patch 20369415 to the 12c binaries so that you don't face possible database corruption and a harder time upgrading your database.

Note: This post was originally posted in rene-ace.com

Categories: DBA Blogs

Oracle Service Secrets: Migrate Transparently

Mon, 2016-09-12 15:02

Databases or schemas tend to get moved around between different servers or even datacenters for hardware upgrades, consolidations or other migrations. And while the work that needs to be done is pretty straightforward for DBAs, I find the most annoying aspect is updating all the client connect strings and TNS entries with the new IP addresses and, if not using services, also the SID, as the instance name might have changed.

That process can be simplified a lot by following a simple good practice: create an extra service for each application or schema, and along with that service a DNS name for the IP it runs on. With that in place, a database can be migrated without touching client connection parameters or TNS aliases. All that is needed is to migrate the database or schema, create the service name on the new instance and update the DNS record to point to the new machine.

Demo

Here is an example. I am migrating a schema from an 11g single instance on my laptop to a RAC database in the Oracle Public Cloud. I am connecting to that database with blog_demo.pythian.com both as the hostname (faked through /etc/hosts instead of proper DNS for this demo) and as the service name. As the application, I am connecting to the database with sqlcl and a static connection string. Just remember that the whole and only point of this demo is to migrate the schema without having to change that connect string.
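
On the 11g single-instance source, a dedicated service like this can be created with the DBMS_SERVICE package; this is a sketch of the setup the demo assumes, using the same names that appear below:

-- Create and start the blog_demo service on the single-instance source.
BEGIN
  DBMS_SERVICE.CREATE_SERVICE(service_name => 'blog_demo',
                              network_name => 'blog_demo.pythian.com');
  DBMS_SERVICE.START_SERVICE('blog_demo');
END;
/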

brost$ ping -c 1 blog_demo.pythian.com
PING blog_demo.pythian.com (192.168.78.101): 56 data bytes
64 bytes from 192.168.78.101: icmp_seq=0 ttl=64 time=0.790 ms

brost$ ./sqlcl/bin/sql brost/******@blog_demo.pythian.com/blog_demo.pythian.com

SQLcl: Release 4.2.0.16.175.1027 RC on Mon Sep 05 17:50:11 2016

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options


SQL> select instance_name from v$instance;

INSTANCE_NAME   
----------------
ORCL        

Next I migrated my schema with datapump and imported to a PDB running on a 12c RAC database.

Then added the service name BLOG_DEMO to PDB1 on the database ORCL42.

$ srvctl add service -db orcl42 -pdb pdb1 -service blog_demo -preferred orcl421,orcl422
$ srvctl start service -db orcl42 -service blog_demo

I then updated the DNS (or, in this simplified demo, my /etc/hosts), and now I can connect with the same connection string. Note that the IP, the instance_name and the version have all changed without the need to modify the connection string.

brost$ ping -c 1 blog_demo.pythian.com
PING blog_demo.pythian.com (140.86.42.42): 56 data bytes

brost$ ./sqlcl/bin/sql brost/******@blog_demo.pythian.com/blog_demo.pythian.com

SQLcl: Release 4.2.0.16.175.1027 RC on Mon Sep 05 18:05:11 2016

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Last Successful login time: Mon Sep 05 2016 18:04:50 +02:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options

SQL> select instance_name from v$instance;
INSTANCE_NAME   
----------------
orcl421

Note that with a proper DNS and a RAC target you would want to create A-records for the 3 SCAN IPs.

Other posts in this series

You can watch me talk briefly about this and other things that you can do with properly configured services in the video below or follow the links to other parts in this series.

tl;dr

When creating services for your applications to connect to a schema, also create a DNS entry for them, and use this DNS name and the service for all client and application connections instead of the hostname and SID. This might initially look like overhead, but it allows for flexibility when migrating schemas or databases to other systems. Updating DNS and creating a new service on the target machine are changes made in a central place, and they save you from updating potentially hundreds of client connect strings or tnsnames entries across the enterprise.

Categories: DBA Blogs

Oracle Service Secrets: quiesce tactically

Fri, 2016-09-02 10:18

In the last post of this series about Oracle net services, I talked about how services can help you identify performance issues faster and easier by tagging connections with service names. Today I am introducing you to the idea of temporarily disabling connections during maintenance with the help of services.

During deployments, testing or reorganizations it might be necessary to prevent clients from connecting to the database while still allowing access for DBAs to do their work. Some methods to do this include temporarily locking application user accounts or putting the database in quiesce mode. But with services, you now also have a more tactical approach to this issue.

My example assumes a single instance with two services DEMO_BATCH and DEMO_OLTP. And let’s assume that we need to temporarily disable batch services, maybe just to reduce system load due to those activities or maybe because we are reorganizing the objects used by the batch processes.

To disable a service in a single instance we can either remove it from the SERVICE_NAMES instance parameter or use the DBMS_SERVICE package.

SELECT NAME FROM V$ACTIVE_SERVICES;

NAME
----------------------------------------------------------------
DEMO_BATCH
DEMO_OLTP
ORCLXDB
ORCL.PYTHIAN.COM
SYS$BACKGROUND
SYS$USERS

exec DBMS_SERVICE.STOP_SERVICE('DEMO_BATCH');

PL/SQL procedure successfully completed.

New sessions using the service name will receive an ORA-12514 error when trying to connect:

brbook:~ brost$ ./sqlcl/bin/sql brost/******@192.168.78.101:1521/DEMO_BATCH.PYTHIAN.COM

SQLcl: Release 4.2.0.16.175.1027 RC on Thu Aug 18 13:12:27 2016

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

  USER          = brost
  URL           = jdbc:oracle:thin:@192.168.78.101:1521/DEMO_BATCH.PYTHIAN.COM
  Error Message = Listener refused the connection with the following error:
ORA-12514, TNS:listener does not currently know of service requested in connect descriptor

Existing sessions are allowed to continue

Note that stopping will only affect new connections. Existing sessions that used the DEMO_BATCH service are allowed to continue until they disconnect or you kill them. This gives you the flexibility of a grace period where you just wait for existing sessions to finish their work and disconnect by themselves.

SELECT NAME FROM V$ACTIVE_SERVICES WHERE NAME = 'DEMO_BATCH';
no rows selected

SELECT SERVICE_NAME, USERNAME FROM V$SESSION WHERE SERVICE_NAME='DEMO_BATCH';

SERVICE_NAME         USERNAME
-------------------- ------------------------------
DEMO_BATCH           BROST
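
Once the grace period is over, any leftover sessions on the stopped service can be terminated individually; the sid and serial# below are placeholders taken from a v$session lookup like the one above:

-- Kill a remaining DEMO_BATCH session by its sid,serial# (example values).
ALTER SYSTEM KILL SESSION '68,1234' IMMEDIATE;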

Grid Infrastructure has an option to force disconnects

If you are using Grid Infrastructure and manage services through srvctl, the behaviour is basically the same, but you get an extra "force" switch to also disconnect existing sessions while stopping a service.

[oracle@ractrial1 ~]$ srvctl stop service -db orcl42 -service racdemo_batch [-force]

[oracle@ractrial1 ~]$ srvctl stop service -h

Stops the service.

Usage: srvctl stop service -db <db_unique_name> [-service  "<service_name_list>"] [-serverpool <pool_name>] [-node <node_name> | -instance <inst_name>] [-pq] [-global_override] [-force [-noreplay]] [-eval] [-verbose]
    -db <db_unique_name>           Unique name for the database
    -service "<serv,...>"          Comma separated service names
    -serverpool <pool_name>        Server pool name
    -node <node_name>              Node name
    -instance <inst_name>          Instance name
    -pq                            To perform the action on parallel query service
    -global_override               Override value to operate on a global service.Ignored for a non-global service
    -force                         Disconnect all sessions during stop or relocate service operations
    -noreplay                      Disable session replay during disconnection
    -eval                          Evaluates the effects of event without making any changes to the system
    -verbose                       Verbose output
    -help                          Print usage

Conclusion

Creating extra services on a database allows you to stop and start them for maintenance, which is a convenient way to lock out only certain parts of an application while leaving user accounts unlocked to connect via different services.

Categories: DBA Blogs

The rlwrap utility for DBA.

Thu, 2016-08-18 08:36

I spend most of my time as a DBA in a Linux terminal and sqlplus. Everybody who works with Oracle sqlplus knows about its power, but also about its limitations. For many years I have used the rlwrap utility developed by Hans Lub. It gives me command history, the ability to edit my SQL*Plus commands, and auto-completion if I set it up. In this post I will share some tips about installation and basic usage.

First we need to install the utility, and there are several options for that. We will check a few of them below. Please keep in mind that all of my examples are tested on Oracle Linux 6.

For the first one we need the git, yum and automake packages. We will use the latest and greatest source code from the project site on GitHub (https://github.com/hanslub42/rlwrap) and your standard Yum repository, assuming you have a connection to GitHub and your Yum repo.
Let's run through it step by step:

[root@sandbox ~]# yum install readline-devel
....
[root@sandbox ~]# yum install automake
....
[root@sandbox ~]# yum install git
....
[root@ovmcloud01 ~]# git clone https://github.com/hanslub42/rlwrap.git
Initialized empty Git repository in /root/rlwrap/.git/
remote: Counting objects: 1250, done.
remote: Total 1250 (delta 0), reused 0 (delta 0), pack-reused 1250
Receiving objects: 100% (1250/1250), 565.53 KiB, done.
Resolving deltas: 100% (867/867), done.
[root@ovmcloud01 ~]# cd rlwrap
[root@ovmcloud01 rlwrap]# autoreconf --install
configure.ac:32: installing `tools/config.guess'
configure.ac:32: installing `tools/config.sub'
configure.ac:34: installing `tools/install-sh'
configure.ac:34: installing `tools/missing'
src/Makefile.am: installing `tools/depcomp'
[root@ovmcloud01 rlwrap]# automake  --add-missing
[root@ovmcloud01 rlwrap]# ./configure
....
[root@ovmcloud01 rlwrap]# make install
....

That’s it. You have it installed in your system.

The second way is to compile it from the source you have downloaded from http://utopia.knoware.nl/~hlub/uck/rlwrap/. This may be useful if you don't have a connection to Yum and GitHub.
Keep in mind you will need the GNU readline and ncurses libraries and headers installed on your system. So, we download the source tarball, unpack it, compile and install:

[root@sandbox]$wget http://utopia.knoware.nl/~hlub/uck/rlwrap/rlwrap-0.42.tar.gz
....
[root@sandbox]$tar xfz rlwrap-0.42.tar.gz
[root@sandbox]$cd rlwrap-0.42
[root@sandbox]$./configure
[root@sandbox]$make install

The third way is to copy a previously compiled rlwrap executable to a new system, adding it to a */bin directory on the standard path.
This works if you have several similar, binary-compatible systems and don't want to spend time compiling the same binary on each one.

[root@sandbox]$cd rlwrap-0.42
[root@sandbox]$ls -l src/rlwrap
-rwxr-xr-x. 1 root root 225023 Aug 16 12:49 src/rlwrap
[root@sandbox]$cp src/rlwrap /usr/local/bin/
[root@sandbox]$rlwrap --help
Usage: rlwrap [options] command ...

Options:
  -a[password prompt]        --always-readline[=password prompt]
  -A                         --ansi-colour-aware
.....

Of course, you may consider building your own RPM, or using the EPEL (Extra Packages for Enterprise Linux) yum repository and installing it from there. Just keep in mind the version you get from EPEL may be slightly outdated.

[root@sandbox]$yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
[root@sandbox]$yum install rlwrap

Having rlwrap installed, you can put it to use.
Here are some basic examples of how to use the utility.
Create an alias in your .bashrc for sqlplus:

vi ~/.bashrc

and add

alias sqlp='rlwrap sqlplus'

You can do the same for rman:

alias rman='rlwrap rman'

And for the Oracle GoldenGate command line utility:

alias ggsci='rlwrap ./ggsci' 

In rlwrap you can use ! and TAB to list commands, or type a prefix and press CTRL+R to search for a certain command in the command history. You can also create your own dictionary and use it for auto-completion.
Let's try to build a small dictionary for auto-completion.
I created a file "lsql.lst" with the following contents:

[oracle@sandbox ~]$ vi lsql.lst
~
select
from
where
and
update
insert
delete
tablespace
v$database

....
[oracle@sandbox ~]$alias sqlp="rlwrap -f lsql.lst sqlplus / as sysdba"
[oracle@sandbox ~]$ sqlp

SQL*Plus: Release 12.1.0.2.0 Production on Wed Aug 17 15:36:04 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
.............................................
cdb> desc v$t 

Here we press TAB and get a list of suggestions:

cdb> desc v$t 
table       tablespace
cdb> desc v$tablespace
 Name														   Null?    Type
 ----------------------------------------------------------------------------------------------------------------- -------- ----------------------------------------------------------------------------
 TS#															    NUMBER

We can make it even better. Let's load the names of all the V$ fixed views.
In our sqlplus session we run:

cdb> spool lsql.lst append
cdb> select name from V$FIXED_TABLE;
......
cdb>spool off
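
The same trick works for the regular dictionary views if you want those in the completion list too; this is a small addition of my own, not from the original listing:

cdb> spool lsql.lst append
cdb> select view_name from dba_views where view_name like 'DBA%';
......
cdb> spool off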

Log off and log on again to reload the file and try:

cdb> select * from V$PA -- TAB
PARALLEL_DEGREE_LIMIT_MTH  PARAMETER                  PARAMETER2                 PARAMETER_VALID_VALUES     PATCHES
cdb> select * from V$B -- TAB
BACKUP                       BACKUP_CONTROLFILE_SUMMARY   BACKUP_DATAFILE_SUMMARY      BACKUP_SET                   BACKUP_SYNC_IO               BSP                          BUFFERED_PUBLISHERS
BACKUP_ARCHIVELOG_DETAILS    BACKUP_COPY_DETAILS          BACKUP_DEVICE                BACKUP_SET_DETAILS           BGPROCESS                    BTS_STAT                     BUFFERED_QUEUES
BACKUP_ARCHIVELOG_SUMMARY    BACKUP_COPY_SUMMARY          BACKUP_NONLOGGED             BACKUP_SET_SUMMARY           BH                           BT_SCAN_CACHE                BUFFERED_SUBSCRIBERS
BACKUP_ASYNC_IO              BACKUP_CORRUPTION            BACKUP_PIECE                 BACKUP_SPFILE                BLOCKING_QUIESCE             BT_SCAN_OBJ_TEMPS            BUFFER_POOL
BACKUP_COMPRESSION_PROGRESS  BACKUP_DATAFILE              BACKUP_PIECE_DETAILS         BACKUP_SPFILE_DETAILS        BLOCK_CHANGE_TRACKING        BUFFER                       BUFFER_POOL_STATISTICS
BACKUP_CONTROLFILE_DETAILS   BACKUP_DATAFILE_DETAILS      BACKUP_REDOLOG               BACKUP_SPFILE_SUMMARY        BMAPNONDURSUB                BUFFER2
cdb> select * from V$B

You can see we have all the "V$" views, and this can be extremely handy when you don't have time to search for a view name and only vaguely remember that there is a view with the information you need.

rlwrap may not be the most sophisticated program, but it can make your life much easier. There are more advanced tools for sqlplus, like SQLcl, that provide a lot more options. But the beauty of rlwrap is in its "lightness" and its ability to work not only with sqlplus, but with practically any command line tool.

Categories: DBA Blogs

ORA-15418: Appliance Mode Not Supported For Compatible.asm 12.1.0.0.0

Wed, 2016-07-20 09:45

The other day, I was upgrading the compatible.asm attribute to 12.1 on Exadata and I faced this error for the first time:

View the code on Gist.

Indeed, a diskgroup can have this parameter set to TRUE or FALSE :

View the code on Gist.

I then found this note on Metalink: EXADATA : What Is Oracle ASM appliance.mode Attribute (Doc ID 1664261.1), which explains that starting from Exadata 11.2.0.3.3, “The Oracle ASM appliance.mode attribute improves disk rebalance completion time when dropping one or more Oracle ASM disks. This means that redundancy is restored faster after a failure.”

Wow, that looks like a pretty cool feature! But it seems that (sadly) we cannot set a 12.1 compatible if this feature is enabled.

Let’s give it a try and deactivate it to set my compatible.asm to 12.1 :

View the code on Gist.
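
The gist itself is not reproduced here, but a minimal sketch of the commands involved, assuming a diskgroup named DATA, would be along these lines:

-- Disable appliance mode on the diskgroup, then raise compatible.asm to 12.1.
ALTER DISKGROUP data SET ATTRIBUTE 'appliance.mode' = 'FALSE';
ALTER DISKGROUP data SET ATTRIBUTE 'compatible.asm' = '12.1.0.0.0';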

It works! Now that compatible.asm is set to 12.1, can we enable the appliance.mode feature again?

View the code on Gist.

Sadly, no. I hit one of the restrictions; indeed, “The Oracle ASM disk group attribute compatible.asm is set to release 11.2.0.4, 12.1.0.2 and later. (Appliance mode is not valid with 12.1.0.1)”. So I guess that even if the documentation does not say so, appliance mode is not valid with compatible.asm set to 12.1.0.0.0 either.
Even if it is very weird that Oracle development "forgot" (?) this cool feature when they released 12c on Exadata, they thankfully "released" it again with 12.1.0.2. I'll follow up on that when moving compatible.asm to 12.1.0.2!

Categories: DBA Blogs

Truncates and Sequences Replication in Oracle GoldenGate

Fri, 2016-07-15 11:29

We use the terms DDL and DML in our DBA life with Oracle all the time. The former stands for Data Definition Language and covers data definition changes. The latter stands for Data Manipulation Language and covers manipulation of the data itself. When we speak about replicating our data between Oracle databases with replication tools, we generally either enable DDL replication, replicate only DML, or do both together. In general, I would recommend replicating both DML and DDL just in case, to prevent the replication from breaking on unexpected structural changes in the replicated schemas. But in some cases you do not want to replicate all DDL, or any DDL at all, for certain reasons. I will discuss a couple of operations which are handled slightly differently from pure DDL/DML changes in GoldenGate.

The first of them is the truncate operation. In Oracle it is definitely DDL, and you can see that:

orcl> select object_id,data_object_id,last_ddl_time from dba_objects where object_name='EMP_TEST';

       OBJECT_ID   DATA_OBJECT_ID LAST_DDL_TIME
---------------- ---------------- -----------------
	  148769	   148769 06/24/16 16:07:04

orcl> truncate table emp_test;

Table truncated.

orcl> select object_id,data_object_id,last_ddl_time from dba_objects where object_name='EMP_TEST';

       OBJECT_ID   DATA_OBJECT_ID LAST_DDL_TIME
---------------- ---------------- -----------------
	  148769	   148770 06/24/16 16:15:52

orcl>

It is clear that the object gets a new data_object_id, and last_ddl_time shows the new time.

There is a general assumption that you need to enable DDL support to replicate truncates. But is this true? In reality you can replicate truncates (with some limitations) without full DDL support, and I want to show you how it can be done. What you need to do is set the GETTRUNCATES parameter.
Let's set it up on the extract and see how it works.

Here is my extract parameter file:

[oracle@bigdatalite oggora]$ cat dirprm/trext.prm
extract trext
userid ogg, password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT
--RMTHOSTOPTIONS
RMTHOST bigdatalite, MGRPORT 7849
RMTTRAIL ./dirdat/tr, format release 11.2
GETTRUNCATES
TABLE trsource.*;

We don't have DDL support, so if we add a column on the source and put a value into that column, the replicat on the other side will abend.

orcl> alter table trsource.emp add col1 varchar2(10) ;

Table altered.

orcl> update trsource.emp set col1='Test1' where empno=7499;

1 row updated.

orcl> commit;

And on the target side:

2016-06-27 13:51:47  INFO    OGG-01021  Oracle GoldenGate Delivery for Oracle, trrep.prm:  Command received from GGSCI: STATS.
2016-06-27 13:57:37  ERROR   OGG-01161  Oracle GoldenGate Delivery for Oracle, trrep.prm:  Bad column index (8) specified for table TRSOURCE.EMP, max columns = 8.
2016-06-27 13:57:37  ERROR   OGG-01668  Oracle GoldenGate Delivery for Oracle, trrep.prm:  PROCESS ABENDING.

You are going to get similar errors for other DDL operations but not for truncates.

orcl> truncate table trsource.emp;

Table truncated.

orcl>
GGSCI (bigdatalite.localdomain) 1> send trext, stats

Sending STATS request to EXTRACT TREXT ...

Start of Statistics at 2016-06-27 14:05:24.

Output to ./dirdat/tr:

Extracting from TRSOURCE.EMP to TRSOURCE.EMP:

*** Total statistics since 2016-06-27 14:05:07 ***
	Total inserts                   	           0.00
	Total updates                   	           0.00
	Total deletes                   	           0.00
	Total truncates                 	           1.00
	Total discards                  	           0.00
	Total operations                	           1.00

You can see that our extract has captured the truncate, even though DDL support is disabled. What we need now is to set the same GETTRUNCATES parameter on the replicat side. Why do we need to set it explicitly? Because the GoldenGate default for all processes is IGNORETRUNCATES. With GETTRUNCATES in place, the truncates will be applied to the target system.

We set the parameter on the replicat side and check the result:

[oracle@bigdatalite ogg11ora]$ cat dirprm/trrep.prm
replicat trrep
--trace
DISCARDFILE ./dirdsc/discard.out, append
userid ogg@test, password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT
assumetargetdefs
--DDL include all
GETTRUNCATES
map trsource.emp, target trdest.emp;
[oracle@bigdatalite ogg11ora]$


GGSCI (bigdatalite.localdomain) 4> send trrep, stats

Sending STATS request to REPLICAT TRREP ...

Start of Statistics at 2016-06-27 14:08:40.

Replicating from TRSOURCE.EMP to TRDEST.EMP:

*** Total statistics since 2016-06-27 14:08:25 ***
	Total inserts                   	           0.00
	Total updates                   	           0.00
	Total deletes                   	           0.00
	Total truncates                 	           1.00
	Total discards                  	           0.00
	Total operations                	           1.00

test> select count(*) from trdest.emp;

        COUNT(*)
----------------
               0

test>

We don't need full DDL support if we want to replicate only truncates. This can help when the workload includes truncates but we don't want to replicate all DDL commands for some reason. Just keep in mind that it works with some limitations: you cannot replicate "truncate partition" for Oracle; that still requires full DDL support.

The second thing I want to discuss in this topic is support for replicating sequence values. Sometimes people assume that this requires DDL support, but it does not. As a matter of fact, replicating sequence values does not require DDL support to be enabled for your replication. Of course, you need full DDL replication support to replicate CREATE, ALTER, DROP and RENAME for sequences, but the values themselves are replicated as DML.

To enable replication of sequences you need to create a special user on the source and target databases, add that user to the GGSCHEMA parameter in your GLOBALS file, and run one script to create all the necessary procedures in the newly created schema.
Let's have a closer look. I have a user OGG that I use for connections, and I plan to use the same user for sequence support.

Here is my GLOBALS file:

[oracle@bigdatalite oggora]$ cat GLOBALS
GGSCHEMA OGG

I have the same on the target side:

[oracle@bigdatalite ogg11ora]$ cat GLOBALS
GGSCHEMA OGG

I ran the script sequence.sql on both sides.

orcl> @sequence.sql
Please enter the name of a schema for the GoldenGate database objects:
ogg
Setting schema name to OGG

UPDATE_SEQUENCE STATUS:

Line/pos				 Error
---------------------------------------- -----------------------------------------------------------------
No errors				 No errors

GETSEQFLUSH

Line/pos				 Error
---------------------------------------- -----------------------------------------------------------------
No errors				 No errors

SEQTRACE

Line/pos				 Error
---------------------------------------- -----------------------------------------------------------------
No errors				 No errors

REPLICATE_SEQUENCE STATUS:

Line/pos				 Error
---------------------------------------- -----------------------------------------------------------------
No errors				 No errors

STATUS OF SEQUENCE SUPPORT
--------------------------------------------------------------
SUCCESSFUL installation of Oracle Sequence Replication support
orcl>

On the source side, add primary key supplemental logging to the sys.seq$ table:

orcl> ALTER TABLE sys.seq$ ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;

Table altered.

orcl>

You may have a look at the procedures created by the script:
SEQTRACE
GETSEQFLUSH
REPLICATESEQUENCE
UPDATESEQUENCE

These procedures provide the interface used to flush, update and replicate the sequences.
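If you want to confirm that the objects were created, a quick query against dba_objects does the job (this check is mine, not part of the original post):

orcl> SELECT object_name, object_type, status
  2     FROM dba_objects
  3    WHERE owner = 'OGG'
  4      AND object_name IN ('SEQTRACE', 'GETSEQFLUSH',
  5                          'REPLICATESEQUENCE', 'UPDATESEQUENCE');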

Now we are creating a sequence on the source and target with the same parameters.

orcl> create sequence trsource.empno_seq start with 8100;

Sequence created.

orcl>

We add the SEQUENCE parameter to our extract parameter file:

[oracle@bigdatalite oggora]$ cat dirprm/trext.prm
extract trext
userid ogg, password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT
--RMTHOSTOPTIONS
RMTHOST bigdatalite, MGRPORT 7849
RMTTRAIL ./dirdat/tr, format release 11.2
--DDL include objname trsource.*
GETTRUNCATES
SEQUENCE trsource.*;
TABLE trsource.*;

[oracle@bigdatalite oggora]$

On the target we are creating the same sequence:

test> create sequence trdest.empno_seq start with 8100;

Sequence created.

test>
[oracle@bigdatalite ogg11ora]$ cat dirprm/trrep.prm
replicat trrep
--trace
DISCARDFILE ./dirdsc/discard.out, append
userid ogg@test, password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT
assumetargetdefs
--DDL include all
GETTRUNCATES
map trsource.*, target trdest.*;

I made our sequences a bit different on purpose: the source sequence has a slightly higher current value than the target:

orcl> select trsource.empno_seq.currval from dual;

         CURRVAL
----------------
            8102

orcl>


test> select trdest.empno_seq.currval from dual;

         CURRVAL
----------------
            8100

test>

What we need is to run the FLUSH SEQUENCE command on the extract side:

GGSCI (bigdatalite.localdomain) 9> dblogin userid ogg@orcl,password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT
Successfully logged into database.

GGSCI (bigdatalite.localdomain as ogg@orcl) 3> flush sequence trsource.empno_seq
Successfully flushed 1 sequence(s) trsource.empno_seq.

GGSCI (bigdatalite.localdomain as ogg@orcl) 4>

And on target we can see:

test> select * from dba_sequences where SEQUENCE_NAME='EMPNO_SEQ';

SEQUENCE_OWNER                 SEQUENCE_NAME                        MIN_VALUE        MAX_VALUE     INCREMENT_BY C O       CACHE_SIZE      LAST_NUMBER
------------------------------ ------------------------------ ---------------- ---------------- ---------------- - - ---------------- ----------------
TRDEST                         EMPNO_SEQ                                     1 9999999999999999                1 N N               20             8143

test> select last_number from dba_sequences where SEQUENCE_NAME='EMPNO_SEQ';

     LAST_NUMBER
----------------
            8143

test>

The last number for the sequence on the target has been increased to 8143, while on the source only the cache was flushed and the last number is 8123:

orcl> select last_number from dba_sequences where SEQUENCE_NAME='EMPNO_SEQ';

     LAST_NUMBER
----------------
            8123

orcl>

Let’s try to get new values for the sequence.

orcl> select trsource.empno_seq.nextval from dual;

         NEXTVAL
----------------
            8104

orcl> select trsource.empno_seq.nextval from dual;

         NEXTVAL
----------------
            8105

We continue to increase the value on the source, and as soon as we cross the source's cached maximum (8123) we get a new value recorded on the target:

orcl> select trsource.empno_seq.nextval from dual;

         NEXTVAL
----------------
            8119

........
         NEXTVAL
----------------
            8124

orcl>


test> select last_number from dba_sequences where SEQUENCE_NAME='EMPNO_SEQ';

     LAST_NUMBER
----------------
            8144

test>

On the target, the replicated sequence operations show up in the statistics as updates:

GGSCI (bigdatalite.localdomain) 1> send trrep, stats

Sending STATS request to REPLICAT TRREP ...

Start of Statistics at 2016-06-29 13:20:36.

Replicating from TRSOURCE.EMPNO_SEQ to TRDEST.EMPNO_SEQ:

*** Total statistics since 2016-06-29 13:10:52 ***
	Total updates                   	           4.00
	Total discards                  	           0.00
	Total operations                	           4.00


We can see that these two operations are handled a bit differently from other standard DDL and DML in Oracle GoldenGate. I hope this small piece of information helps you in your implementation, or helps you support your GoldenGate environment.

Stay tuned and keep your eyes on Pythian blog.

Categories: DBA Blogs

Log Buffer #482: A Carnival of the Vanities for DBAs

Wed, 2016-07-13 13:29

This week's Log Buffer edition covers some useful blog posts from Oracle, SQL Server and MySQL.

Oracle:

ASM disks – lsdg compared with the v$asm_diskgroup view

Can a query on the standby update the primary ?

What should I know about SQL?

Setting Environment Variables in Application Server/Process Scheduler Tuxedo Domains

Oracle HEXTORAW Function with Examples

SQL Server:

Query Store is a new feature in SQL Server 2016 which, once enabled, automatically captures a history of queries, execution plans, and runtime statistics, retaining them for you to review when troubleshooting performance problems caused by query plan changes.

Finding and Eliminating Duplicate or Overlapping Indexes

Changing Linked Server Properties

Windows Containers and Docker

Stretch Database in SQL Server 2016 RTM

MySQL:

Why Adaptive Fault Detection is Powerful and Unique

Develop By Example – Document Store Connections using Node.js

libssl.so.6: cannot open shared object file with MariaDB Galera

How to make sure that ‘password’ is not a valid MySQL password

MySQL 5.7, utf8mb4 and the load data infile

Categories: DBA Blogs

Eight Ways To Ensure Your Applications Are Enterprise-Ready

Tue, 2016-07-12 10:00

When it comes to building database applications and solutions, developers, DBAs, engineers and architects have a lot of new and exciting tools and technologies to play with, especially with the Hadoop and NoSQL environments growing so rapidly.

While it’s easy to geek out about these cool and revolutionary new technologies, at some point in the development cycle you’ll need to stop to consider the real-world business implications of the application you’re proposing. After all, you’re bound to face some tough questions, like:

Why did you choose that particular database for our mission-critical application? Can your team provide 24/7 support for the app? Do you have a plan to train people on this new technology? Do we have the right hardware infrastructure to support the app’s deployment? How are you going to ensure there won’t be any bugs or security vulnerabilities?

If you don’t have a plan for navigating and anticipating these kinds of questions in advance, you’re likely to face difficulty getting approval for and implementing your application.

Any database applications or solutions you build or adopt for your organization must be “enterprise-ready”: secure, stable and scalable with a proven, tested capacity for deployment. They must be easy to administer and easy to support. But how do you make sure that happens?

Here are eight things to consider before declaring your proposed solution enterprise-ready:

  1. Open communications: A close working relationship between the development and operations teams goes a long way toward seamless integration of your database applications. By working together (from start to finish, as early on as possible), you can better anticipate the issues to be solved so your app or solution gets up and running faster.
  2. Platform reliability: Open source databases are great for obvious reasons: they’re free and easily customizable. But if your app is revenue-generating or mission-critical, it’s better to use a tested and proven distribution platform like Datastax Enterprise for Cassandra, Cloudera or HortonWorks for Hadoop, and Oracle or Percona for MySQL.
  3. Continuous quality: No matter which technology you use to build your app, make sure it passes rigorous quality assurance, scale and performance testing — both initially and with every new release of the software. Your vendor also needs to be proactive when it comes to releasing patches and fixing bugs.
  4. Suitable infrastructure: Consider whether the infrastructure you’re running is appropriate for the app you’re developing. If the database is set to run on multiple nodes of commodity hardware — to cover your bases in case one fails — but your operations team likes to store everything on an expensive SAN device, you might want to look into other alternatives.
  5. Experienced hosting: You’ll want to find a hosting company that is reliable, cost-effective and meets your company’s security policies. It should also have experience hosting the database technology you plan on using; that way, it knows how to respond when issues or challenges arise.
  6. Expert talent: Bring in a team of experts that can support your entire environment. While your operations team may want to take care of some elements themselves (everything up to the OS level, for instance), you’ll still want to show them that you have 24/7 support coverage available if needed. This team should be committed to continuous training and have enough people skilled with your technology to provide uninterrupted coverage.
  7. Comprehensive skills: Your team should be able to check your configurations against best practices for security, performance and availability — but don’t forget to ensure that they’re also set up for the more mundane things like systems monitoring, responding to alerts and fault finding.
  8. Ongoing costs: When tallying the running cost of your application, keep in mind that you need to incorporate the cost of the distributed version, its hosting, and 24/7 support and optimization.

With all the elements that go into getting an application enterprise-ready, it might be easier to work with a reputable partner who has the experience and expertise to help you deploy the right solution for your organization and ensure its long-term success.

Find out how Pythian’s solutions can help you succeed.

Categories: DBA Blogs

Time Slider Bug In Solaris 11: Snapshots Created Only For Leaf Datasets

Tue, 2016-06-21 14:06

Time Slider is a feature in Solaris that allows you to open past versions of your files.

It is implemented via a service which creates automatic ZFS snapshots every 15 minutes (frequent), hourly, daily, weekly and monthly. By default it retains only 3 frequent, 23 hourly, 6 daily, 3 weekly and 12 monthly snapshots.

I am using Time Slider on several Solaris 11.x servers and I found the same problem on all of them – it doesn’t create any automatic snapshots for some datasets.
For example, it doesn’t create any snapshots for the Solaris root dataset rpool/ROOT/solaris. However it creates snapshots for the leaf dataset rpool/ROOT/solaris/var. The rpool/ROOT dataset also doesn’t have any automatic snapshots, but rpool itself has snapshots, so it’s not easy to understand what is happening.

I searched for this problem and found that other people have noticed it as well.

There is a thread about it in the Oracle Community:

Solaris 11, 11.1, 11.2 and 11.3: Possible Bug in Time-Slider/ZFS Auto-Snapshot

The last message mentions the following bug in My Oracle Support:

Bug 15775093 : SUNBT7148449 time-slider only taking snapshots of leaf datasets

This bug was created on 24-Feb-2012 but still isn't fixed.

After more searching, I found this issue in the bug tracker of the illumos project:

Bug #1013 : time-slider fails to take snapshots on complex dataset/snapshot configuration

The original poster (OP) encountered a problem with a complex pool configuration with many nested datasets having different values for the com.sun:auto-snapshot* properties.
He dug into the Time Slider Python code and proposed a change, which was accepted without proper testing and ended up in Solaris 11.
Unfortunately, this change introduced a serious bug which destroyed the logic for creating recursive snapshots where they are possible.

Let me quickly explain how this is supposed to work.
If a pool has com.sun:auto-snapshot=true for the main dataset and all child datasets inherit this property, Time Slider can create a recursive snapshot for the main dataset and skip all child datasets, because they should already have the same snapshot.
However, if any child dataset has com.sun:auto-snapshot=false, Time Slider can no longer do this.
In this case the intended logic is to create recursive snapshots for all sub-trees which don’t have any excluded children and then create non-recursive snapshots for the remaining datasets which also have com.sun:auto-snapshot=true.
The algorithm is building separate lists of datasets for recursive snapshots and for single snapshots.

Here is an excerpt from /usr/share/time-slider/lib/time_slider/zfs.py:

        # Now figure out what can be recursively snapshotted and what
        # must be singly snapshotted. Single snapshot restrictions apply
        # to those datasets who have a child in the excluded list.
        # 'included' is sorted in reverse alphabetical order.
        for datasetname in included:
            excludedchild = False
            idx = bisect_right(everything, datasetname)
            children = [name for name in everything[idx:] if \
                        name.find(datasetname) == 0]
            for child in children:
                idx = bisect_left(excluded, child)
                if idx < len(excluded) and excluded[idx] == child:
                    excludedchild = True
                    single.append(datasetname)
                    break
            if excludedchild == False:
                # We want recursive list sorted in alphabetical order
                # so insert instead of append to the list.
                recursive.append (datasetname)

This part is the same in all versions of Solaris 11 (from 11-11 to 11.3, which is currently the latest).
If we look at the comment above the last line, it says that it should do "insert instead of append to the list".
This is because the included list is sorted in reverse alphabetical order when it is built.
And this is the exact line that was modified by the OP: when append is used instead of insert, the recursive list ends up sorted in reverse alphabetical order as well.
The next part of the code traverses the recursive list and tries to skip all child datasets whose parent is already marked for a recursive snapshot:

        for datasetname in recursive:
            parts = datasetname.rsplit('/', 1)
            parent = parts[0]
            if parent == datasetname:
                # Root filesystem of the Zpool, so
                # this can't be inherited and must be
                # set locally.
                finalrecursive.append(datasetname)
                continue
            idx = bisect_right(recursive, parent)
            if len(recursive) > 0 and \
               recursive[idx-1] == parent:
                # Parent already marked for recursive snapshot: so skip
                continue
            else:
                finalrecursive.append(datasetname)

This code relies heavily on the sort order and fails to do its job when the list is sorted in reverse order.
What happens is that all datasets remain in the list, with child datasets placed before their parents.
The code then tries to create a recursive snapshot for each of these datasets.
The operation succeeds for the leaf datasets, but fails for the parent datasets because their children already have a snapshot with the same name.
The snapshots are also successful for the datasets in the single list (the ones that have excluded children).
The rpool/dump and rpool/swap volumes have com.sun:auto-snapshot=false; that is why rpool has snapshots.

Luckily, the original code was posted in the same thread so I just reverted the change:

            if excludedchild == False:
                # We want recursive list sorted in alphabetical order
                # so insert instead of append to the list.
                recursive.insert(0, datasetname)

After doing this, Time Slider immediately started creating snapshots for all datasets that have com.sun:auto-snapshot=true, including rpool/ROOT and rpool/ROOT/solaris.
So far I haven't found any issues, and snapshots work as expected.
There may still be corner cases with a very complex structure like the OP had, but his change completely broke the clever algorithm for doing recursive snapshots where they are possible.

Final Thoughts.

It is very strange that Oracle hasn't paid attention to this bug and has left it hanging for more than 4 years. Maybe they consider Time Slider an unimportant desktop feature. However, I think it's fairly useful for servers as well.

The solution is simple, a single-line change, but it would be much better if this were resolved in a future Solaris 11.3 SRU. Until then, I hope that my blog post will be useful for anyone who is trying to figure out why the automatic snapshots are not working as intended.

Categories: DBA Blogs

ASMCMD: A Better DU, Version 2

Tue, 2016-06-21 13:03

A while ago, I posted a better “du” for asmcmd. Since then, the Oracle 12cR2 beta has been released, but it seems that our poor old “du” will not be improved.

I then wrote a better “better du for asmcmd”, with some cool new features compared to the previous version, which was quite primitive.

In this second version you will find :

  • No need to set up your environment, asmdu will do it for you
  • If no parameter is passed to the script, asmdu will show you a summary of all the diskgroups
  • If a parameter (a diskgroup) is passed to the script, asmdu will show you a summary of the diskgroup size with its fill rate and the list of the directories it contains with their sizes. Note: in the screenshot from the original post, you can quickly see that “DB9” consumes the most space in the FRA diskgroup; it is perhaps worth having a closer look
  • A list of all running instances on the server is now shown at the top of the asmdu output; I find it handy to have that list there
  • I also put some colored thresholds (red, yellow and green) in place to be able to quickly see which diskgroup has a space issue; you can modify them easily in the script:
#
# Colored thresholds (Red, Yellow, Green)
#
 CRITICAL=90
 WARNING=75
  • The only pre-requisite is that oraenv has to work

Here is the code :

#!/bin/bash
# Fred Denis -- denis@pythian.com -- 2016
#
# - If no parameter specified, show a du of each DiskGroup
# - If a parameter, print a du of each subdirectory
#

D=$1

#
# Colored thresholds (Red, Yellow, Green)
#
 CRITICAL=90
 WARNING=75

#
# Set the ASM env
#
ORACLE_SID=`ps -ef | grep pmon | grep asm | awk '{print $NF}' | sed s'/asm_pmon_//' | egrep "^[+]"`
export ORAENV_ASK=NO
. oraenv > /dev/null 2>&1

#
# A quick list of what is running on the server
#
ps -ef | grep pmon | grep -v grep | awk '{print $NF}' | sed s'/.*_pmon_//' | egrep "^([+]|[Aa-Zz])" | sort | awk -v H="`hostname -s`" 'BEGIN {printf("%s", "Databases on " H " : ")} { printf("%s, ", $0)} END{printf("\n")}' | sed s'/, $//'

#
# Manage parameters
#
if [[ -z $D ]]
then # No directory provided, will check all the DG
 DG=`asmcmd lsdg | grep -v State | awk '{print $NF}' | sed s'/\///'`
 SUBDIR="No" # Do not show the subdirectories details if no directory is specified
else
 DG=`echo $D | sed s'/\/.*$//g'`
fi

#
# A header
#
printf "\n%25s%16s%16s%14s" "DiskGroup" "Total_MB" "Free_MB" "% Free"
printf "\n%25s%16s%16s%14s\n" "---------" "--------" "-------" "------"

#
# Show DG info
#
for X in ${DG}
do
 asmcmd lsdg ${X} | tail -1 |\
 awk -v DG="$X" -v W="$WARNING" -v C="$CRITICAL" '\
 BEGIN \
 {COLOR_BEGIN = "\033[1;" ;
 COLOR_END = "\033[m" ;
 RED = COLOR_BEGIN"31m" ;
 GREEN = COLOR_BEGIN"32m" ;
 YELLOW = COLOR_BEGIN"33m" ;
 COLOR = GREEN ;
 }
 { FREE = sprintf("%12d", $8/$7*100) ;
 if ((100-FREE) > W) {COLOR=YELLOW ;}
 if ((100-FREE) > C) {COLOR=RED ;}
 printf("%25s%16s%16s%s\n", DG, $7, $8, COLOR FREE COLOR_END) ; }'
done
printf "\n"

#
# Subdirs info
#
if [ -z ${SUBDIR} ]
then
(for DIR in `asmcmd ls ${D}`
do
 echo ${DIR} `asmcmd du ${D}/${DIR} | tail -1`
done) | awk -v D="$D" ' BEGIN { printf("\n\t\t%40s\n\n", D " subdirectories size") ;
 printf("%25s%16s%16s\n", "Subdir", "Used MB", "Mirror MB") ;
 printf("%25s%16s%16s\n", "------", "-------", "---------") ;}
 {
 printf("%25s%16s%16s\n", $1, $2, $3) ;
 use += $2 ;
 mir += $3 ;
 }
 END { printf("\n\n%25s%16s%16s\n", "------", "-------", "---------") ;
 printf("%25s%16s%16s\n\n", "Total", use, mir) ;} '
fi


#************************************************************************#
#* E N D O F S O U R C E *#
#************************************************************************#

We use it a lot on my team and have found no issues with the script so far. Let me know if you find one, and enjoy!

 

Categories: DBA Blogs

Encrypting sensitive data in puppet using hiera-eyaml

Fri, 2016-05-27 13:36

Puppet manifests can hold a lot of sensitive information. Sensitive information like passwords or certificates is used in the configuration of many applications. Exposing it in a puppet manifest is not ideal and may conflict with an organization's compliance policies. That is why data separation is a very important aspect of secure puppet code.

Hiera is a pluggable hierarchical database. Hiera can help by keeping data out of puppet manifests. Puppet classes can look up data in Hiera, and Hiera searches its hierarchy and returns the first value it finds.

Although Hiera is able to provide data separation, it cannot ensure security of sensitive information. Anyone with access to the Hiera data store will be able to see the data.

Enter Hiera-eyaml. Hiera-eyaml is a backend for Hiera that provides per-value encryption of sensitive data within yaml files to be used by Puppet.

The following puppet module can be used to manage hiera with eyaml support.

https://forge.puppetlabs.com/hunner/hiera puppet module.

The module class can be used as below:

modules/profile/manifests/hieraconf.pp

class profile::hieraconf {
  # hiera configuration
  class { 'hiera':
    hierarchy => [
      '%{environment}/%{calling_class}',
      '%{environment}',
      '%{fqdn}',
      'common',
      'accounts',
      'dev',
    ],
  }
}

The /etc/puppet/hiera.yaml file would look like the following after the puppet run:

/etc/puppet/hiera.yaml

# managed by puppet
:backends:
  - yaml
:logger: console
:hierarchy:
  - "%{environment}/%{calling_class}"
  - "%{environment}"
  - "%{fqdn}"
  - common
  - accounts
  - dev
:yaml:
  :datadir: /etc/puppet/hieradata

Moving data to Hiera
In the following example, the diamond collector for MongoDB holds data such as hosts, user and password. The collector is only enabled for the grafana.pythian.com host.

modules/profile/manifests/diamond_coll.pp

[..]
diamond::collector { 'MongoDBCollector':
  options => {
    enabled => $fqdn ? { /grafana.pythian.com/ => True, default => false },
    hosts   => 'abc.pythian.com,xyz.pythian.com',
    user    => 'grafana',
    passwd  => 'xxxx',
  }
}

To move the data to Hiera, the create_resources function can be used in the manifest.

modules/profile/manifests/diamond_coll.pp

class profile::diamond_coll {
  [..]
  $mycollectors = hiera('diamond::collectors', {})
  create_resources('diamond::collector', $mycollectors)
  [..]
}

Then a new yaml file can be created, and the diamond::collectors code for MongoDBCollector can be abstracted as below:

hieradata/grafana.pythian.com.yaml

diamond::collectors:
  MongoDBCollector:
    options:
      enabled: True
      hosts: abc.pythian.com,xyz.pythian.com
      user: grafana
      passwd: xxxx

Moving data to Hiera-eyaml
The Hiera puppet code can be changed to the following to enable eyaml.

class profile::hieraconf {
  # hiera configuration
  class { 'hiera':
    hierarchy => [
      '%{environment}/%{calling_class}',
      '%{environment}',
      '%{fqdn}',
      'common',
      'accounts',
      'dev',
    ],
    eyaml           => true,
    eyaml_datadir   => '/etc/puppet/hieradata',
    eyaml_extension => 'eyaml',
  }
}

This will add the eyaml backend to Hiera after a puppet run on the puppet server. The puppet module does the following to achieve this:

1. The hiera-eyaml gem will be installed.
2. The following keys will be created for hiera-eyaml using 'eyaml createkeys':

/etc/puppet/keys/private_key.pkcs7.pem
/etc/puppet/keys/public_key.pkcs7.pem

3. /etc/puppet/hiera.yaml will be updated.

The /etc/puppet/hiera.yaml file would look like the following after the puppet run:

/etc/puppet/hiera.yaml

# managed by puppet
:backends:
  - eyaml
  - yaml
:logger: console
:hierarchy:
  - "%{environment}/%{calling_class}"
  - "%{environment}"
  - "%{fqdn}"
  - common
  - accounts
  - dev
:yaml:
  :datadir: /etc/puppet/hieradata
:eyaml:
  :datadir: /etc/puppet/hieradata
  :extension: eyaml
  :pkcs7_private_key: /etc/puppet/keys/private_key.pkcs7.pem
  :pkcs7_public_key: /etc/puppet/keys/public_key.pkcs7.pem

The puppet master needs to be restarted after this, as changes to hiera.yaml require a restart to take effect.

Using eyaml command line

Eyaml commands need to be run from a directory containing the keys directory (in this example, /etc/puppet). The following command can be used to encrypt a password. The command gives us the encrypted value in two formats, string and block.

# eyaml encrypt -p
Enter password: ****
string: ENC[PKCS7,MIIBeQYJKoZIhvcN[..]Fg3jAmdlCLbQ] OR
block: >
ENC[PKCS7,MIIBeQYJKoZIhvcNAQcDoIIBajCCAWYCAQAxggEhMIIBHQIBADAFMAACAQEw
DQYJKoZIhvcNAQEBBQAEggEAQMo0dyWRmBC30TVDVxEOoClgUsxtzDSXmrJL
pz3ydhvG0Ll96L6F2WbclmGtpaqksbpc98p08Ri8seNmSp4sIoZWaZs6B2Jk
BLOIJBZfSIcHH8tAdeV4gBS1067OD7I+ucpCnjUDTjYAp+PdupyeaBzkoWMA
X/TrmBXW39ndAATsgeymwwG/EcvaAMKm4b4qH0sqIKAWj2qeVJTlrBbkuqCj
qjOO/kc47gKlCarJkeJH/rtErpjJ0Et+6SJdbDxeSbJ2QhieXKGAj/ERCoqh
hXJiOofFuAloAAUfWUfPKnSZQEhHCPDkeyhgDHwc8akWjC4l0eeorZgLPcs1
1oQJqTA8BgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBC+JDHdj2M2++mFu+pv
ORXmgBA/Ng596hsGFg3jAmdlCLbQ]

To decrypt following command can be used.

# eyaml decrypt -s 'ENC[PKCS7,MIIBeQYJKoZIhvcN[..]Fg3jAmdlCLbQ]'
test

The encrypted string or block can be used in Hiera. Using our previous example, the hiera file would look like the following. We also have to rename the file from .yaml to .eyaml.

hieradata/grafana.pythian.com.eyaml

diamond::collectors:
  MongoDBCollector:
    options:
      enabled: True
      hosts: abc.pythian.com,xyz.pythian.com
      user: grafana
      passwd: ENC[PKCS7,MIIBeQYJKoZIhvcN[..]Fg3jAmdlCLbQ]

Encrypting certificates
Following is a standard file resource used to copy an SSL certificate:

environments/production/manifests/logstash.pythian.com.pp

file { '/etc/logstash/certs/logstash-forwarder.crt':
  ensure => present,
  mode   => '0644',
  owner  => 'root',
  group  => 'root',
  source => 'puppet:///modules/logstash/logstash-forwarder.crt',
}

The file resource can be moved to hiera using hiera_hash.

environments/production/manifests/logstash.pythian.com.pp

$fileslog = hiera_hash('fileslog')
create_resources ( file, $fileslog )

The data can be added to a yaml file.

hieradata/common.yaml

---
files:
  '/etc/logstash-forwarder/certs/logstash-forwarder.crt':
    ensure: present
    mode: '0644'
    owner: 'root'
    group: 'root'
    source: 'puppet:///modules/logstash/logstash-forwarder.key'

To encrypt a file, the following command can be used:

# eyaml encrypt -f modules/logstash/files/logstash-forwarder.crt

The returned string value can be added using the content parameter of the file resource.

hieradata/common.eyaml

[..]
files:
  '/etc/logstash-forwarder/certs/logstash-forwarder.crt':
    ensure: present
    mode: '0644'
    owner: 'root'
    group: 'root'
    content: 'ENC[PKCS7,MIIB+wYJKoZI[..]C609Oc2QUvxARaw==]'

The above examples cover encrypting strings and files, which constitute most of the sensitive data used in puppet code. Incorporating hiera-eyaml into the puppet workflow will help ensure compliance and the security of sensitive data.

Categories: DBA Blogs

Understanding MySQL Fabric Faulty Server Detection

Thu, 2016-05-26 13:39

A while ago I found myself analyzing a MySQL Fabric installation to understand why a group member was occasionally being marked as FAULTY even when the server was up and running and no failures were observed.

         
                         server_uuid     address  status       mode weight
------------------------------------ ----------- ------- ---------- ------
ab0b0653-6121-11c5-55a0-007543445454 mysql1:3306 PRIMARY READ_WRITE    1.0
f34dd331-2432-11f4-a2d3-006754678533 mysql2:3306 FAULTY  READ_ONLY     1.0

 

Upon reviewing mysqlfabric logs, I found the following warnings were being logged from time to time:

[WARNING] 1442221217.920115 - FailureDetector(xc_grp_1) - Server (f34dd331-2432-11f4-a2d3-006754678533) in group (xc_grp_1) is unreachable

 

Since I was not clear under which circumstances a server is marked as FAULTY, I decided to review MySQL Fabric code (Python) to better understand the process.

The module responsible for printing this message is failure_detection.py, and more specifically the _run method of the FailureDetector class. This method loops through every server in a group and attempts a connection to the MySQL instance running on that node. The MySQLServer.is_alive method (mysql/fabric/server.py) is called for this purpose.

Before reviewing the failure detection process, we first need to know that there are four MySQL fabric parameters that will affect when a server is considered unstable or faulty:

DETECTION_TIMEOUT
DETECTION_INTERVAL
DETECTIONS
FAILOVER_INTERVAL

 

Based on the above variables, the logic followed by FailureDetector._run() to mark a server as FAULTY is the following:

1) Every {DETECTION_INTERVAL/DETECTIONS} seconds, a connection against each server in the group is attempted with a timeout equal to DETECTION_TIMEOUT

2) If DETECTION_TIMEOUT is exceeded, the observed message is logged and a counter incremented

3) When this counter reaches DETECTIONS, the server is marked as “unstable” and if the last time the master changed was greater than FAILOVER_INTERVAL ago, the server is marked as FAULTY

With a better understanding of the logic followed by MySQL fabric to detect faulty nodes, I went to the configuration file to check the existing values for each of the parameters:

DETECTIONS=3
DETECTION_TIMEOUT=1
FAILOVER_INTERVAL=0
DETECTION_INTERVAL=6

From the values above we can notice that each group will be polled every 2 seconds (DETECTION_INTERVAL/DETECTIONS) and that the monitored server should respond within a second for the test to be considered successful.

On high concurrency nodes, or nodes under heavy load, a high polling frequency combined with tight timeouts could cause the servers to be marked as FAULTY just because the connection attempt would not be completed or processed (in the case of high connection rates or saturated network interfaces) before the timeout occurs.

Also, having FAILOVER_INTERVAL reduced to 0 will cause the server to be marked as FAULTY even if a failover has just occurred.

A less aggressive configuration would be more appropriate for a heavily loaded environment:

DETECTIONS=3 -> It's Ok
DETECTION_TIMEOUT -> 5 TO 8
FAILOVER_INTERVAL -> 1200
DETECTION_INTERVAL -> 45 TO 60

Conclusion

As with any other database clustering solution that relies on a database connection to test node status, situations where the database server would take longer to respond should also be considered. The polling frequency should be adjusted so the detection window is within an acceptable range, but the rate of monitoring connections generated is also kept to the minimum. Check timeouts should also be adjusted to avoid false positives caused by the server not being able to respond in a timely manner.

 

Categories: DBA Blogs

Tracefile Automation – Simplify Your Troubleshooting Tasks

Thu, 2016-05-26 10:06

Here’s a common Oracle troubleshooting scenario when a SQL statement needs tuning and/or troubleshooting:

  • log on to dev server
  • connect to database (on a different server)
  • run the SQL statement with 10046 tracing enabled
  • ssh to the database server
  • copy the trace file back to work environment
  • process the trace file.

 

All of this takes time. Granted, not a great deal of time, but tuning is an iterative process and so these steps will be performed multiple times. Not only are these steps a productivity killer, but they are repetitive and annoying. No one wants to keep running the same manual command over and over.

This task is ripe for some simple automation.

If both the client and database servers are some form of Unix, automating these tasks is straightforward.

Please note that these scripts require an 11g or later version of the Oracle database. These scripts are dependent on the v$diag_info view to retrieve the tracefile name. While these scripts could be made to work on 10g databases, that is left as an exercise for the reader.

Step by Step

To simplify the process it can be broken down into steps.

 

1. Reconnect

The first step is to create a new connection each time the SQL is executed. Doing so ensures the database session gets a new tracefile, as we want each execution to be isolated.

-- reconnect.sql

connect jkstill/XXXX@oravm

 

2. Get the Tracefile hostname, owner and filename

Oracle provides all the information needed.

In addition the script will set the 10046 event, run the SQL of interest and then disable the 10046 event.

Following is a snippet from the tracefile_identifier_demo.sql script.

 


-- column variables to capture host, owner and tracefile name
col tracehost new_value tracehost noprint
col traceowner new_value traceowner noprint
col tracefile new_value tracefile noprint

set term off head off feed off

-- get oracle owner
select username traceowner from v$process where pname = 'PMON';

-- get host name
select host_name tracehost from v$instance;

-- set tracefile identifier
alter session set tracefile_identifier = 'MYTRACEFILE';

select value tracefile from v$diag_info where name = 'Default Trace File';

set term on head on feed on

-- do your tracing here
alter session set events '10046 trace name context forever, level 12';

-- run your SQL here
@@sql2trace

alter session set events '10046 trace name context off';

 

In this case sql2trace.sql is a simple SELECT from a test table.  All of the scripts used here appear in Github as mentioned at the end of this article.

 

3. Process the Tracefile

Now that the tracefile has been created, it is time to retrieve it.

The following script scp.sql is called from tracefile_identifier_demo.sql.

 


col scp_src new_value scp_src noprint
col scp_target new_value scp_target noprint

set term off feed off verify off echo off

select '&&1' scp_src from dual;
select '&&2' scp_target from dual;

set feed on term on verify on

--disconnect

host scp &&scp_src &&scp_target

Following is an example putting it all together in tracefile_identifier_demo.sql.

 

SQL> @tracefile_identifier_demo
Connected.

1 row selected.


PRODUCT                        VERSION              STATUS
------------------------------ -------------------- --------------------
NLSRTL                         12.1.0.2.0           Production
Oracle Database 12c Enterprise 12.1.0.2.0           64bit Production
 Edition

PL/SQL                         12.1.0.2.0           Production
TNS for Linux:                 12.1.0.2.0           Production

Data Base
------------------------------
P1.JKS.COM

INSTANCE_NAME        HOST_NAME                      CURRDATE
-------------------- ------------------------------ ----------------------
js122a1              ora12c102rac01.jks.com         2016-05-23 16:38:11

STARTUP
--------------------
04/02/2016 11:22:12


Session altered.

Elapsed: 00:00:00.00

OWNER        OBJECT NAME                     OBJECT_ID OBJECT_TYPE             CREATED
------------ ------------------------------ ---------- ----------------------- -------------------
SYS          OLAP_EXPRESSION                     18200 OPERATOR                2016-01-07 21:46:54
SYS          OLAP_EXPRESSION_BOOL                18206 OPERATOR                2016-01-07 21:46:54
SYS          OLAP_EXPRESSION_DATE                18204 OPERATOR                2016-01-07 21:46:54
SYS          OLAP_EXPRESSION_TEXT                18202 OPERATOR                2016-01-07 21:46:54
SYS          XMLSEQUENCE                          6379 OPERATOR                2016-01-07 21:41:25
SYS          XQSEQUENCE                           6380 OPERATOR                2016-01-07 21:41:25
SYS          XQWINDOWSEQUENCE                     6393 OPERATOR                2016-01-07 21:41:25

7 rows selected.

Elapsed: 00:00:00.00

Session altered.

Elapsed: 00:00:00.00

js122a1_ora_1725_MYTRACEFILE.trc                                                                                                                                                            100% 3014     2.9KB/s   00:00

SQL> host ls -l js122a1_ora_1725_MYTRACEFILE.trc
-rw-r----- 1 jkstill dba 3014 May 23 16:38 js122a1_ora_1725_MYTRACEFILE.trc

But Wait, There’s More!

This demo shows you how to automate the retrieval of the trace file. But why stop there?  The processing of the file can be modified as well.

Really, it isn't even necessary to copy the trace file over, as the contents can be retrieved and piped to your favorite command.  The script mrskew.sql for instance uses ssh to cat the tracefile, and then pipes the contents to the Method R utility, mrskew.  Note: mrskew is a commercial utility, not open source software.

 

-- mrskew.sql

col ssh_target new_value ssh_target noprint
col scp_filename new_value scp_filename noprint

set term off feed off verify off echo off

select '&&1' ssh_target from dual;
select '&&2' scp_filename from dual;

set feed on term on verify on

--disconnect

host ssh &&ssh_target 'cat &&scp_filename' | mrskew

 

Following is another execution of tracefile_identifier_demo.sql, this time piping output to mrskew. Only the final part of the output is shown below.

 

...

Elapsed: 00:00:00.01

CALL-NAME                    DURATION       %  CALLS      MEAN       MIN       MAX
---------------------------  --------  ------  -----  --------  --------  --------
PARSE                        0.002000   33.1%      2  0.001000  0.000000  0.002000
db file sequential read      0.001211   20.0%      5  0.000242  0.000056  0.000342
FETCH                        0.001000   16.5%      1  0.001000  0.001000  0.001000
gc cr grant 2-way            0.000999   16.5%      1  0.000999  0.000999  0.000999
SQL*Net message from client  0.000817   13.5%      2  0.000409  0.000254  0.000563
Disk file operations I/O     0.000018    0.3%      2  0.000009  0.000002  0.000016
SQL*Net message to client    0.000002    0.0%      2  0.000001  0.000001  0.000001
CLOSE                        0.000000    0.0%      2  0.000000  0.000000  0.000000
EXEC                         0.000000    0.0%      2  0.000000  0.000000  0.000000
---------------------------  --------  ------  -----  --------  --------  --------
TOTAL (9)                    0.006047  100.0%     19  0.000318  0.000000  0.002000

Now we can see where all the db time was consumed for this SQL statement, and there was no need to copy the trace file to the current working directory. The same can be done for tkprof and other operations.  Please see the plan.sql and tkprof.sql scripts in the Github repository.

 

Wrapping It Up

A little bit of automation goes a long way. Commands that are repetitive, boring and error prone can easily be automated.  Both your productivity and your mood will soar when little steps like these are used to improve your workflow.

All files for this demo can be found at https://github.com/jkstill/tracefile_identifier

Categories: DBA Blogs

Log Buffer #475: A Carnival of the Vanities for DBAs

Wed, 2016-05-25 14:32

This Log Buffer Edition goes through various blogs, and selects some of the top posts from Oracle, SQL Server and MySQL.

Oracle:

MOS Note:136697.1 – New HCHECK.SQL for Oracle Database 12c

ORAchk / EXAchk questions.

Cloud control won’t start!

ASMLib is an optional utility that can be used on Linux systems to manage Oracle ASM devices.

ORA-56841: Master Diskmon cannot connect to a CELL.

Oracle BITAND Function with Examples.

 

SQL Server:

Natively Compiled Stored Procedures: What they are all about

Considerations around validation errors 41305 and 41325 on memory optimized tables with foreign keys

Taking Azure SQL Data Warehouse for a Test-Drive.

Persistent PowerShell: The PowerShell Profile.

SQL Server Always On Endpoint Encryption Algorithm Compatibility Error.

 

MySQL:

Fixing MySQL scalability problems with ProxySQL or thread pool.

Installing a Web, Email & MySQL Database Cluster on Debian 8.4 Jessie with ISPConfig 3.1

Planets9s – Download the new ClusterControl 1.3 for MySQL, MongoDB & PostgreSQL

AWS Aurora Benchmark – Choose the right tool for the job

Where is the MySQL 5.7 root password?

Categories: DBA Blogs

SharePlex Replication Between Two Instances On The Same Host

Wed, 2016-05-25 10:12

Several days ago I was asked a question about SharePlex and had to verify the behaviour before providing the answer. I had one Linux VM with two databases and needed to set up replication between them. One of them (orcl) was a target 12c EE database, while the second was a source 10g SE. Accordingly, I had to use two different versions of SharePlex to mimic the behaviour in question, which meant having two different SharePlex instances talking to each other on the same host machine. This works fine with, for example, GoldenGate, where you just set up different ports for the manager processes. However, in SharePlex all the instances participating in a configuration should use the same port. The SharePlex documentation states:
“Important! The SharePlex port number must be the same one on all machines in the replication configuration so that they can communicate through TCP/IP connections.”

Of course you cannot use the same port on the same network interface for two independent SharePlex processes working from different homes. In addition, SharePlex does not ask you for a hostname either during installation or when you start it.
In my case I had two interfaces in different subnets on my VM, and I could use them for my replication. But how do I tell SharePlex to use one or the other? The answer is simple: you set the "SP_SYS_HOST_NAME" parameter as an environment variable in your shell. Here is how I did it.
I added two new hostnames for those interfaces to my /etc/hosts file, to be used for my SharePlex instances:

[root@sandbox ~]$ cat /etc/hosts | grep splex
10.177.130.58	splexhost
172.16.128.10	splexstor
[root@sandbox ~]$ 

I unzipped SharePlex 8.6.3 for Oracle 10 and prepared the installation. By default the installer picks up the hostname automatically and uses it during installation.
Here is my hostname and it is not what I want to use for my first SharePlex instance:

[oracle@sandbox ~]$ hostname
sandbox.localdomain
[oracle@sandbox ~]$ ping sandbox.localdomain
PING sandbox.localdomain (127.0.0.1) 56(84) bytes of data.
64 bytes from sandbox.localdomain (127.0.0.1): icmp_seq=1 ttl=64 time=0.058 ms
64 bytes from sandbox.localdomain (127.0.0.1): icmp_seq=2 ttl=64 time=0.061 ms
^C
--- sandbox.localdomain ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1468ms
rtt min/avg/max/mdev = 0.058/0.059/0.061/0.007 ms
[oracle@sandbox ~]$

What you need to do is set the SP_SYS_HOST_NAME environment variable. I wanted the first SharePlex instance listening and working on hostname splexhost using the default port 2100. I also had to define the proper Oracle home and Oracle database SID during installation, and I used the standard Oracle utility oraenv to set up the Oracle variables. Here is how I started the installation for the source:

[oracle@sandbox ~]$ export SP_SYS_HOST_NAME=splexhost
[oracle@sandbox ~]$ . oraenv
ORACLE_SID = [test] ?
The /u01/app/oracle/product/10.2.0/sehome_1/bin/orabase binary does not exist
You can set ORACLE_BASE manually if it is required.
[oracle@sandbox ~]$ cd /u01/distr/SharePlex
[oracle@sandbox SharePlex]$ ll
total 96736
-rwxr-xr-x. 1 oracle oinstall 99056391 Jan 11 21:56 SharePlex-8.6.3-b171-oracle100-rh-40-amd64-m64.tpm
[oracle@sandbox SharePlex]$ ./SharePlex-8.6.3-b171-oracle100-rh-40-amd64-m64.tpm
Unpacking ..................................................................
  ..........................................................................

During the installation I provided the directories for program and variable files, the port number, and the ORACLE_SID and ORACLE_HOME information. Everything went smoothly. I finished the installation by running the "ora_setup" utility, which creates the necessary schema and objects in the source database.
For the target I used the same strategy, setting up the Oracle variables with "oraenv" and explicitly exporting SP_SYS_HOST_NAME=splexstor. The installation worked without any problems and I got two SharePlex instances installed in different directories.
To start a SharePlex home for a certain IP address and interface, you need to explicitly set SP_SYS_HOST_NAME to the appropriate value. Let's see how that's done for the source.

[oracle@sandbox ~]$ export SP_SYS_HOST_NAME=splexhost
[oracle@sandbox ~]$ . oraenv
ORACLE_SID = [test] ?
The /u01/app/oracle/product/10.2.0/sehome_1/bin/orabase binary does not exist
You can set ORACLE_BASE manually if it is required.
[oracle@sandbox ~]$ cd /u01/sp10/bin
[oracle@sandbox bin]$ telnet splexhost 2100
Trying 10.177.130.58...
telnet: connect to address 10.177.130.58: Connection refused

[oracle@sandbox bin]$ nohup /u01/sp10/bin/sp_cop -usp10 &
[1] 2023
[oracle@sandbox bin]$ nohup: ignoring input and appending output to `nohup.out'

[oracle@sandbox bin]$ telnet splexhost 2100
Trying 10.177.130.58...
Connected to splexhost.
Escape character is '^]'.
^]
telnet> q
Connection closed.
[oracle@sandbox bin]$

For target you have to adjust your environment variables accordingly and do the same.

[oracle@sandbox ~]$ export SP_SYS_HOST_NAME=splexstor
[oracle@sandbox ~]$ . oraenv
ORACLE_SID = [test] ? orcl
The Oracle base has been set to /u01/app/oracle
[oracle@sandbox ~]$cd /u01/sp12/bin
[oracle@sandbox bin]$ nohup /u01/sp12/bin/sp_cop -usp12 &
[1] 2066
[oracle@sandbox bin]$ nohup: ignoring input and appending output to `nohup.out'

[oracle@sandbox bin]$ 

As a result we have two SharePlex instances running on the same host and talking to each other. Now we can create a sample replication. In the database schema "splex", created by the "ora_setup" utility, we have the DEMO_SRC and DEMO_DEST tables. SharePlex uses those tables for a demo replication; we can use them too and see how it works in our case. We can either modify the pre-created default sample configuration "ORA_config" or create a new one. I created a new config "sample" on my source SharePlex and activated it:

sp_ctrl (splexhost:2100)> list config

File   Name                                         State       Datasource
--------------------------------------------------  ----------  ---------------
ORA_config                                          Inactive    o.SOURCE_SID
Last Modified At: 17-May-16 11:17    Size: 151

sp_ctrl (splexhost:2100)> create config sample

The command opens the default editor so you can write your configuration.
Here is what I put in my "sample" config.

datasource:o.test

#source tables      target tables           routing map

splex.demo_src      splex.demo_dest             splexstor@o.orcl

Now we can activate config.

sp_ctrl (splexhost:2100)> activate config sample

sp_ctrl (splexhost:2100)> list config

File   Name                                         State       Datasource
--------------------------------------------------  ----------  ---------------
ORA_config                                          Inactive    o.SOURCE_SID
Last Modified At: 17-May-16 11:17    Size: 151

sample                                              Active      o.test
Last Modified At: 17-May-16 11:30    Size: 134     Internal Name: .conf.1

sp_ctrl (splexhost:2100)>

Now we can see all the processes running :

On the source:

sp_ctrl (splexhost:2100)> lstatus

Detailed Status for splexhost
Process          State                             PID     Running   Since
---------------  ------------------------------  --------  --------------------
Cop              Running                             2023  17-May-16 11:24:39
Capture          Running                             2250  17-May-16 11:30:53
  Data/Host:   o.test
Read             Running                             2279  17-May-16 11:30:53
  Data/Host:   o.test
Export           Running                             2304  17-May-16 11:30:56
  Data/Host:   splexstor
  Queue Name:  splexhost
Cmd & Ctrl       Running                             2581  17-May-16 11:40:39
  Data/Host:   splexhost

On the target:

sp_ctrl (splexstor:2100)> lstatus

Detailed Status for splexstor
Process          State                             PID     Running   Since
---------------  ------------------------------  --------  --------------------
Cop              Running                             2066  17-May-16 11:26:23
Import           Running                             2305  17-May-16 11:30:56
  Data/Host:   splexhost
  Queue Name:  splexhost
Post             Running                             2306  17-May-16 11:30:56
  Data/Host:   o.test-o.orcl
  Queue Name:  splexhost
Cmd & Ctrl       Running                             2533  17-May-16 11:38:18
  Data/Host:   splexstor

Let’s insert a row on the source :

test>  insert into splex.demo_src values ('JIM', '8001 Irvine Center Drive', '949-754-8000');

1 row created.

test> commit;

Commit complete.

test>

And we can see the row was successfully replicated to target:

orcl> select * from splex.demo_dest;

NAME							     ADDRESS							  PHONE#
------------------------------------------------------------ ------------------------------------------------------------ ------------
JIM							     8001 Irvine Center Drive					  949-754-8000

orcl>

As you can see, we were able to use one box to replicate data between two different databases, using two different SharePlex installations. The idea was simple and clear from the start, but I couldn't find enough information about the SP_SYS_HOST_NAME parameter in the installation guide beyond the documentation on configuring it for a cluster installation; the parameter appears in the reference section of the documentation, but in practice it is not really explained there.
It would be even better if we could keep the parameter inside SharePlex and not have to think about setting environment variables, but unfortunately setting the parameter in the "paramdb" didn't work for me. Even with it there, you still need to set the environment variable SP_SYS_HOST_NAME=your_host_name for a non-default hostname. I hope this article may help somebody save a bit of time.

Categories: DBA Blogs

Understanding The Database Options – AutoClose

Fri, 2016-05-20 14:03

In this blog post we’ll cover how AutoClose works, and why it’s recommended to disable this property.

When the SQL Server service is started, it locates the master system database and requests an exclusive lock on its data and log files.

After that, SQL Server reads the sys.master_files system view, finds all the data files (.mdf, .ndf) and transaction log files (.ldf) of every database stored on the instance, and also requests an exclusive lock on these files to initialize each of the databases.
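If you want to see the same list SQL Server works from, a simple query against sys.master_files (not part of the original demonstration) shows every data and log file registered on the instance:

USE master
GO
SELECT DB_NAME(database_id) AS database_name,
       type_desc,
       physical_name
FROM sys.master_files
GO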

The first situation in which the AutoClose property can hurt performance is in the acquisition of this exclusive lock on the data and log files.

If the property is off, this lock is held from service startup until shutdown; if the property is enabled, the lock is released as soon as there is no more activity in the database, and the data and log files become available to any other process.

At first this may seem attractive, because we could manipulate the data and log files and perform some administrative tasks on them, such as a backup.

Now imagine that, during such a backup, an application needs to access the database. What would happen?

SQL Server would return an error indicating that it could not acquire an exclusive lock on the files, and the database initialization would fail.

Another big performance problem resulting from the AutoClose property involves two memory areas: the Buffer Cache and the Plan Cache.

Whenever a query is executed, SQL Server generates an execution plan that is kept in an area of memory called the Plan Cache. This area stores generated execution plans so that they can be reused if the query is executed again.

After the execution plan is generated, its operators are executed and the data pages read by the query are stored in an area of memory called the Buffer Cache. This area keeps data pages in memory so that new requests don’t have to go back to the disk subsystem, optimizing subsequent I/O.

When the AutoClose property is enabled and the last connection to the database is closed, all the data pages and execution plans for that database are removed from memory, causing a big performance drop the next time the database is used.

Let’s run a small demonstration to make this behavior clear.

First, enable the AutoClose property on the AdventureWorks2012 database with the script below:

USE master
GO
ALTER DATABASE [AdventureWorks2012] SET AUTO_CLOSE ON WITH NO_WAIT
GO
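To confirm that the setting took effect (a quick check, not part of the original script), the is_auto_close_on flag in sys.databases can be queried:

-- Verify the AUTO_CLOSE setting for the database
SELECT name, is_auto_close_on
FROM sys.databases
WHERE name = 'AdventureWorks2012'
GO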

Then let’s run some queries against the AdventureWorks2012 database, as in the script below:

USE AdventureWorks2012
GO
SELECT * FROM Person.person
GO
SELECT * FROM Sales.salesorderheader


After executing the queries, we can use the DMV sys.dm_os_buffer_descriptors to check how many data pages have been allocated in memory for the AdventureWorks2012 database, as illustrated in Figure 1:

Figure 1 – Data pages that have been allocated in memory

With the DMVs sys.dm_exec_cached_plans and sys.dm_exec_sql_text we can check the execution plans that were stored in memory for the queries we executed, as illustrated in Figure 2.

Figure 2 – Execution plans stored in memory

So when all connections to the AdventureWorks2012 database are closed, both memory areas are cleared, as the queries and Figure 3 below show:

--Amount of data pages in memory per database
SELECT
    COUNT(*) AS TotalPages,
    DB_NAME(database_id) AS DBName
FROM
    sys.dm_os_buffer_descriptors
GROUP BY
    DB_NAME(database_id)
ORDER BY
    1 DESC

--Amount of in-memory ad hoc execution plans for AdventureWorks2012
SELECT
    COUNT(*) AS TotalPlans
FROM
    sys.dm_exec_cached_plans
CROSS APPLY
    sys.dm_exec_sql_text(plan_handle) AS st
WHERE
    st.[dbid] = DB_ID('AdventureWorks2012')
    AND objtype = 'Adhoc'

Figure 3 – Memory usage after closing all connections

With this demonstration it is extremely simple to conclude that the AutoClose property should always be left disabled, due to the performance problems it can bring to a high-performance database.
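To turn it back off, the command mirrors the one used earlier to enable it; for example:

USE master
GO
ALTER DATABASE [AdventureWorks2012] SET AUTO_CLOSE OFF WITH NO_WAIT
GO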

Categories: DBA Blogs

An Effective Approach to Migrate Dynamic Thrift Data to CQL: Part 2

Fri, 2016-05-20 13:22

Note that this post is Part 2 of a 3-part blog series. If you haven’t read Part 1 of this series, please do so before continuing. Part 1 gives some background knowledge of the problem we’re trying to solve, as well as some fundamental concepts used in the following discussion. The chapter number sequencing also follows that from Part 1. (note: Part 3 can be found here)

4. Static and Dynamic Cassandra Table Definition

In Thrift, Cassandra tables can be defined either statically or dynamically. When a table is defined dynamically, an application can add new columns on the fly, and the column definition for each storage row doesn’t necessarily need to be the same. Although this is a bit more flexible, it can also be problematic, because the dynamic column definition information exists only inside the application and is invisible to the outside world.

In CQL, however, tables are always statically defined, which means that every column and its type have to be defined before they can be used. Each row in a CQL table has exactly the same columns and type definitions.

In CQL, the dynamism of a Thrift table definition can be achieved through clustering columns and/or more advanced column types such as collections and user-defined types (UDTs).
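As a rough illustration (a hypothetical sketch, not an example from this migration), a map collection can absorb attribute-style columns that a Thrift application would otherwise add on the fly:

-- Hypothetical table: ad-hoc attributes stored in a map instead of dynamic columns
CREATE TABLE sensor_meta (
    sensor_id int PRIMARY KEY,
    attributes map<text, text>
);

UPDATE sensor_meta SET attributes['location'] = 'warehouse-3' WHERE sensor_id = 1;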

4.1 Static Table Definition

In Thrift, a statically defined table has column_metadata information in its table definition statement, as follows:

create column family avg_grade
    with key_validation_class = Int32Type
     and comparator = UTF8Type
     and column_metadata = [
       {column_name: class_id, validation_class: Int32Type},
       {column_name: grade, validation_class: FloatType}
     ]

A strictly equivalent CQL table definition looks like this (note the “WITH COMPACT STORAGE” table property):

create table avg_grade (
    key int primary key,
    class_id int,
    grade float
) with compact storage

A table statically defined in either the Thrift utility (cassandra-cli) or the CQL utility (cqlsh) can be accessed from the other with no problem. One difference between the Thrift and CQL definitions is that in CQL the row key has a name, while in Thrift it does not. Therefore, when accessing a table defined through Thrift, CQL uses a default name (“key”) for the row key.
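For example (a quick sanity check, not part of the original article, assuming the avg_grade table above), the table can be written and read directly from cqlsh:

INSERT INTO avg_grade (key, class_id, grade) VALUES (1, 101, 3.5);
SELECT * FROM avg_grade WHERE key = 1;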

4.2 Dynamic Table Definition

In Thrift, a dynamically defined table does NOT have column_metadata information in the table definition statement. Typically, time-series applications adopt dynamic table definitions. For example, in a sensor monitoring application, we may use the sensor id as the storage row key and, for each sensor, record the different event values detected by that sensor over a period of time. An example table definition is as follows:

create column family sensor_data
   with key_validation_class = Int32Type
    and comparator = TimeUUIDType
    and default_validation_class = DecimalType;

Suppose that for this table, 2 events are recorded for sensor 1 and 1 event is recorded for sensor 2. The output from the cassandra-cli utility looks like this:

-------------------
RowKey: 1
=> (name=1171d7e0-14d2-11e6-858b-5f3b22f4f11c, value=21.5, timestamp=1462680314974000)
=> (name=23371940-14d2-11e6-858b-5f3b22f4f11c, value=32.1, timestamp=1462680344792000)
-------------------
RowKey: 2
=> (name=7537fcf0-14d2-11e6-858b-5f3b22f4f11c, value=11.0, timestamp=1462680482367000)

The output above shows that the columns for each row are generated dynamically by the application and can differ between rows. In CQL, a strictly equivalent table definition and the corresponding output for the dynamic Thrift table above are as follows:

CREATE TABLE sensor_data (
    key int,
    column1 timeuuid,
    value decimal,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
key  | column1                              | value
-----+--------------------------------------+-------------------------
   1 | 1171d7e0-14d2-11e6-858b-5f3b22f4f11c | 0E-1077248000
   1 | 23371940-14d2-11e6-858b-5f3b22f4f11c | -8.58993459E-1077939396
   2 | 7537fcf0-14d2-11e6-858b-5f3b22f4f11c | 0E-1076232192

Since the columns are generated on the fly by the application, CQL doesn’t know the column names in advance, so it uses the default column name “column1” (and the default key name “key”) in its definition. Functionally, the table can be transformed into the equivalent, more descriptive definition below by using the ALTER TABLE CQL command to rename the columns (e.g. “key” to “sensor_id”, “column1” to “event_time”):

CREATE TABLE sensor_data (
    sensor_id int,
    event_time timeuuid,
    value decimal,
    PRIMARY KEY (sensor_id, event_time)
) WITH COMPACT STORAGE
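The renames themselves can be done with statements along these lines (a sketch; only primary key components such as these can be renamed):

ALTER TABLE sensor_data RENAME key TO sensor_id;
ALTER TABLE sensor_data RENAME column1 TO event_time;
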
4.3 Mixed Table Definition

In Thrift, a table can also be in a mixed mode: when the table is created, part of its column information is defined in column_metadata, just like a static table, but at runtime a Thrift-based application can still add more columns on the fly.

The table below, blogs, is such an example. It has one statically defined column, “author”. There are also 3 more columns (tags:category, tags:rating, and tags:recommend) defined on the fly for RowKey 1.

create column family blogs
    with key_validation_class = Int32Type
     and comparator = UTF8Type
     and column_metadata = [
       {column_name: author, validation_class: UTF8Type}
     ]
-------------------
RowKey: 1
=> (name=author, value=Donald, timestamp=1462720696717000)
=> (name=tags:category, value=music, timestamp=1462720526053000)
=> (name=tags:rating, value=A, timestamp=1462720564590000)
=> (name=tags:recommend, value=Yes, timestamp=1462720640817000)
-------------------
RowKey: 2
=> (name=author, value=John, timestamp=1462720690911000)

When examining the table in CQL, in both the table schema and the data output, we can only see the statically defined columns. The dynamically added columns are NOT displayed.

CREATE TABLE blogs (
    key int PRIMARY KEY,
    author text
) WITH COMPACT STORAGE
key  | author
-----+--------
1    | Donald
2    | John
Categories: DBA Blogs
