Skip navigation.

Pythian Group

Syndicate content
Love Your Data
Updated: 7 hours 34 min ago

Understanding MySQL Fabric Faulty Server Detection

Thu, 2016-05-26 12:39

Awhile ago I found myself analyzing a MySQL fabric installation to understand why a group member was occasionally being marked as FAULTY even when the server was up and running and no failures were observed.  

         
                         server_uuid     address  status       mode weight
------------------------------------ ----------- ------- ---------- ------
ab0b0653-6121-11c5-55a0-007543445454 mysql1:3306 PRIMARY READ_WRITE    1.0
f34dd331-2432-11f4-a2d3-006754678533 mysql2:3306 FAULTY  READ_ONLY     1.0

 

Upon reviewing mysqlfabric logs, I found the following warnings were being logged from time to time:

[WARNING] 1442221217.920115 - FailureDetector(xc_grp_1) - Server (f34dd331-2432-11f4-a2d3-006754678533) in group (xc_grp_1) is unreachable

 

Since I was not clear under which circumstances a server is marked as FAULTY, I decided to review MySQL Fabric code (Python) to better understand the process.

The module responsible for printing this message is failure_detection.py and more specifically, the _run method belonging to FailureDetector class. This method will loop through every server in a group, and attempt a connection to the MySQL instance running on that node. MySQLServer.Is_alive (mysql/fabric/server.py) method is called for this purpose.

Before reviewing the failure detection process, we first need to know that there are four MySQL fabric parameters that will affect when a server is considered unstable or faulty:

DETECTION_TIMEOUT
DETECTION_INTERVAL
DETECTIONS
FAILOVER_INTERVAL

 

Based on the above variables, the logic followed by FailureDetector._run() to mark a server as FAULTY is the following:

1) Every {DETECTION_INTERVAL/DETECTIONS} seconds, a connection against each server in the group is attempted with a timeout equal to DETECTION_TIMEOUT

2) If DETECTION_TIMEOUT is exceeded, the observed message is logged and a counter incremented

3) When this counter reaches DETECTIONS, the server is marked as “unstable” and if the last time the master changed was greater than FAILOVER_INTERVAL ago, the server is marked as FAULTY

With a better understanding of the logic followed by MySQL fabric to detect faulty nodes, I went to the configuration file to check the existing values for each of the parameters:

DETECTIONS=3
DETECTION_TIMEOUT=1
FAILOVER_INTERVAL=0
DETECTION_INTERVAL=6

From the values above we can notice that each group will be polled every 2 seconds (DETECTION_INTERVAL/DETECTIONS) and that the monitored server should respond within a second for the test to be considered successful.

On high concurrency nodes, or nodes under heavy load, a high polling frequency combined with tight timeouts could cause the servers to be marked as FAULTY just because the connection attempt would not be completed or processed (in the case of high connection rates or saturated network interfaces) before the timeout occurs.

Also, having FAILOVER_INTERVAL reduced to 0, will cause the server to be marked as FAULTY even if a failover had just occurred.

A less aggressive configuration would be more appropriated for heavy loaded environment:

DETECTIONS=3 -> It's Ok
DETECTION_TIMEOUT -> 5 TO 8
FAILOVER_INTERVAL -> 1200
DETECTION_INTERVAL -> 45 TO 60

Conclusion

As with any other database clustering solution that relies on a database connection to test node status, situations where the database server would take longer to respond should also be considered. The polling frequency should be adjusted so the detection window is within an acceptable range, but the rate of monitoring connections generated is also kept to the minimum. Check timeouts should also be adjusted to avoid false positives caused by the server not being able to respond in a timely manner.

 

Categories: DBA Blogs

Tracefile Automation – Simplify Your Troubleshooting Tasks

Thu, 2016-05-26 09:06

Here’s a common Oracle troubleshooting scenario when a SQL statement needs tuning and/or troubleshooting:

  • log on to dev server
  • connect to database (on a different server)
  • run the SQL statement with 10046 tracing enabled
  • ssh to the database server
  • copy the trace file back to work environment
  • process the trace file.

 

All of this takes time. Granted, not a great deal of time, but tuning is an iterative process and so these steps will be performed multiple times. Not only are these steps a productivity killer, but they are repetitive and annoying. No one wants to keep running the same manual command over and over.

This task is ripe for some simple automation.

If both the client and database servers are some form of Unix, automating these tasks is straightforward.

Please note that these scripts require an 11g or later version of the Oracle database. These scripts are dependent on the v$diag_info view to retrieve the tracefile name. While these scripts could be made to work on 10g databases, that is left as an exercise for the reader.

Step by Step

To simplify the process it can be broken down into steps.

 

1. Reconnect

The first step is to create a new connection each time the SQL is executed. Doing so ensures the database session gets a new tracefile, as we want each execution to be isolated.

-- reconnect.sql

connect jkstill/XXXX@oravm

 

2. Get the Tracefile hostname, owner and filename

Oracle provides all the information needed.

In addition the script will set the 10046 event, run the SQL of interest and then disable the 10046 event.

Following is a snippet from the tracefile_identifier_demo.sql script.

 


-- column variables to capture host, owner and tracefile name
col tracehost new_value tracehost noprint
col traceowner new_value traceowner noprint
col tracefile new_value tracefile noprint

set term off head off feed off

-- get oracle owner
select username traceowner from v$process where pname = 'PMON';

-- get host name
select host_name tracehost from v$instance;

-- set tracefile identifier
alter session set tracefile_identifier = 'MYTRACEFILE';

select value tracefile from v$diag_info where name = 'Default Trace File';

set term on head on feed on

-- do your tracing here
alter session set events '10046 trace name context forever, level 12';

-- run your SQL here
@@sql2trace

alter session set events '10046 trace name context off';

 

In this case sql2trace.sql is a simple SELECT from a test table.  All of the scripts used here appear in Github as mentioned at the end of this article.

 

3. Process the Tracefile

Now that the tracefile has been created, it is time to retrieve it.

The following script scp.sql is called from tracefile_identifier_demo.sql.

 


col scp_src new_value scp_src noprint
col scp_target new_value scp_target noprint

set term off feed off verify off echo off

select '&&1' scp_src from dual;
select '&&2' scp_target from dual;

set feed on term on verify on

--disconnect

host scp &&scp_src &&scp_target

Following is an example putting it all together in tracefile_identifier_demo.sql.

 

SQL> @tracefile_identifier_demo
Connected.

1 row selected.


PRODUCT                        VERSION              STATUS
------------------------------ -------------------- --------------------
NLSRTL                         12.1.0.2.0           Production
Oracle Database 12c Enterprise 12.1.0.2.0           64bit Production
 Edition

PL/SQL                         12.1.0.2.0           Production
TNS for Linux:                 12.1.0.2.0           Production

Data Base
------------------------------
P1.JKS.COM

INSTANCE_NAME        HOST_NAME                      CURRDATE
-------------------- ------------------------------ ----------------------
js122a1              ora12c102rac01.jks.com         2016-05-23 16:38:11

STARTUP
--------------------
04/02/2016 11:22:12


Session altered.

Elapsed: 00:00:00.00

OWNER        OBJECT NAME                     OBJECT_ID OBJECT_TYPE             CREATED
------------ ------------------------------ ---------- ----------------------- -------------------
SYS          OLAP_EXPRESSION                     18200 OPERATOR                2016-01-07 21:46:54
SYS          OLAP_EXPRESSION_BOOL                18206 OPERATOR                2016-01-07 21:46:54
SYS          OLAP_EXPRESSION_DATE                18204 OPERATOR                2016-01-07 21:46:54
SYS          OLAP_EXPRESSION_TEXT                18202 OPERATOR                2016-01-07 21:46:54
SYS          XMLSEQUENCE                          6379 OPERATOR                2016-01-07 21:41:25
SYS          XQSEQUENCE                           6380 OPERATOR                2016-01-07 21:41:25
SYS          XQWINDOWSEQUENCE                     6393 OPERATOR                2016-01-07 21:41:25

7 rows selected.

Elapsed: 00:00:00.00

Session altered.

Elapsed: 00:00:00.00

js122a1_ora_1725_MYTRACEFILE.trc                                                                                                                                                            100% 3014     2.9KB/s   00:00

SQL> host ls -l js122a1_ora_1725_MYTRACEFILE.trc
-rw-r----- 1 jkstill dba 3014 May 23 16:38 js122a1_ora_1725_MYTRACEFILE.trc

But Wait, There’s More!

This demo shows you how to automate the retrieval of the trace file. But why stop there?  The processing of the file can be modified as well.

Really, it isn’t even necessary to copy the script over, as the content can be retrieved and piped to your favorite command.  The script mrskew.sql for instance uses ssh to cat the tracefile, and then pipe the contents to the Method R utility, mrskew.  Note: mrskew is a commercial utility, not open source software.

 

-- mrskew.sql

col ssh_target new_value ssh_target noprint
col scp_filename new_value scp_filename noprint

set term off feed off verify off echo off

select '&&1' ssh_target from dual;
select '&&2' scp_filename from dual;

set feed on term on verify on

--disconnect

host ssh &&ssh_target 'cat &&scp_filename' | mrskew

 

Following is another execution of tracefile_identifier_demo.sql, this time piping output to mrskew. Only the final part of the output is shown following

 

...

Elapsed: 00:00:00.01

CALL-NAME                    DURATION       %  CALLS      MEAN       MIN       MAX
---------------------------  --------  ------  -----  --------  --------  --------
PARSE                        0.002000   33.1%      2  0.001000  0.000000  0.002000
db file sequential read      0.001211   20.0%      5  0.000242  0.000056  0.000342
FETCH                        0.001000   16.5%      1  0.001000  0.001000  0.001000
gc cr grant 2-way            0.000999   16.5%      1  0.000999  0.000999  0.000999
SQL*Net message from client  0.000817   13.5%      2  0.000409  0.000254  0.000563
Disk file operations I/O     0.000018    0.3%      2  0.000009  0.000002  0.000016
SQL*Net message to client    0.000002    0.0%      2  0.000001  0.000001  0.000001
CLOSE                        0.000000    0.0%      2  0.000000  0.000000  0.000000
EXEC                         0.000000    0.0%      2  0.000000  0.000000  0.000000
---------------------------  --------  ------  -----  --------  --------  --------
TOTAL (9)                    0.006047  100.0%     19  0.000318  0.000000  0.002000

Now we can see where all the db time was consumed for this SQL statement, and there was no need to copy the trace file to the current working directory. The same can be done for tkprof and other operations.  Please see the plan.sql and tkprof.sql scripts in the Github repository.

 

Wrapping It Up

A little bit of automation goes a long way. Commands that are repetitive, boring and error prone can easily be automated.  Both your productivity and your mood will soar when little steps like these are used to improve your workflow.

All files for this demo can be found at https://github.com/jkstill/tracefile_identifier

Categories: DBA Blogs

Log Buffer #475: A Carnival of the Vanities for DBAs

Wed, 2016-05-25 13:32

This Log Buffer Edition goes through various blogs, and selects some of the top posts from Oracle, SQL Server and MySQL.

Oracle:

MOS Note:136697.1 – New HCHECK.SQL for Oracle Database 12c

ORAchk / EXAchk questions.

Cloud control won’t start!

ASMLib is an optional utility that can be used on Linux systems to manage Oracle ASM devices.

ORA-56841: Master Diskmon cannot connect to a CELL.

Oracle BITAND Function with Examples.

 

SQL Server:

Natively Compiled Stored Procedures: What they are all about

Considerations around validation errors 41305 and 41325 on memory optimized tables with foreign keys

Taking Azure SQL Data Warehouse for a Test-Drive.

Persistent PowerShell: The PowerShell Profile.

SQL Server Always On Endpoint Encryption Algorithm Compatibility Error.

 

MySQL:

Fixing MySQL scalability problems with ProxySQL or thread pool.

Installing a Web, Email & MySQL Database Cluster on Debian 8.4 Jessie with ISPConfig 3.1

Planets9s – Download the new ClusterControl 1.3 for MySQL, MongoDB & PostgreSQL

AWS Aurora Benchmark – Choose the right tool for the job

Where is the MySQL 5.7 root password?

Categories: DBA Blogs

SharePlex Replication Between Two Instances On The Same Host

Wed, 2016-05-25 09:12

Several days ago I was asked a question about SharePlex and should verify behaviour before providing the answer. I had one linux VM with two databases and needed to setup replication between them. One of them (orcl) was a target 12c EE database while the second one was a source 10g SE. Accordingly, I should use two different versions of SharePlex to mimic the questionable behaviour, so I should have two different SharePlex instances talking to each other on the same host machine. It worked pretty well for, as an example, GoldenGate, where you just setup different ports for manager processes. However, in SharePlex all the instances participating in a configuration should use the same port. The SharePlex documentation states:
“Important! The SharePlex port number must be the same one on all machines in the replication configuration so that they can communicate through TCP/IP connections.”

Of course you cannot use the same port on the same network interface for two independent SharePlex processes working from different homes. In addition, SharePlex is not asking you about a hostname either during installation, or when you start it.
In my case I had 2 interfaces in different subnets on my VM, and I could use them for my replication. But how should I tell to the SharePlex to use one or the other? The answer was simple. You should use “SP_SYS_HOST_NAME” parameter as the environment variable for your shell. Here is how I’ve done that.
I added two new hostnames for those interfaces to my /etc/hosts file to be used for my SharePlex instances:

[root@sandbox ~]$ cat /etc/hosts | grep splex
10.177.130.58	splexhost
172.16.128.10	splexstor
[root@sandbox ~]$ 

I unzipped SharePlex 8.6.3 for Oracle 10 and prepared installation. By default the installer will pick up the hostname automatically and use it during installation.
Here is my hostname and it is not what I want to use for my first SharePlex instance:

[oracle@sandbox ~]$ hostname
sandbox.localdomain
[oracle@sandbox ~]$ ping sandbox.localdomain
PING sandbox.localdomain (127.0.0.1) 56(84) bytes of data.
64 bytes from sandbox.localdomain (127.0.0.1): icmp_seq=1 ttl=64 time=0.058 ms
64 bytes from sandbox.localdomain (127.0.0.1): icmp_seq=2 ttl=64 time=0.061 ms
^C
--- sandbox.localdomain ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1468ms
rtt min/avg/max/mdev = 0.058/0.059/0.061/0.007 ms
[oracle@sandbox ~]$

What you need to do is to setup an environment variable SP_SYS_HOST_NAME. I wanted the first SharePlex listening and working on hostname splexhost using default port 2100. I should also define proper Oracle home and Oracle database SID during installation. I used standard Oracle utility oraenv to have Oracle variables setup. Here is how I started installation for the source:

[oracle@sandbox ~]$ export SP_SYS_HOST_NAME=splexhost
[oracle@sandbox ~]$ . oraenv
ORACLE_SID = [test] ?
The /u01/app/oracle/product/10.2.0/sehome_1/bin/orabase binary does not exist
You can set ORACLE_BASE manually if it is required.
[oracle@sandbox ~]$ cd /u01/distr/SharePlex
[oracle@sandbox SharePlex]$ ll
total 96736
-rwxr-xr-x. 1 oracle oinstall 99056391 Jan 11 21:56 SharePlex-8.6.3-b171-oracle100-rh-40-amd64-m64.tpm
[oracle@sandbox SharePlex]$ ./SharePlex-8.6.3-b171-oracle100-rh-40-amd64-m64.tpm
Unpacking ..................................................................
  ..........................................................................

During the installation I provided directories for program and variable files, port number and information about ORACLE_SID and ORACLE_HOME. Everything went smoothly. I finished the installation by running the “ora_setup” utility creating necessary schema and objects in the source database.
For target I used the same strategy setting up Oracle variables by “oraenv” and exporting explicitly SP_SYS_HOST_NAME=splexstor. The installation worked out without any problems and I got two SharePlex instances installed to different directories.
To start a SharePlex home for a certain IP address and interface you need to explicitly setup SP_SYS_HOST_NAME to an appropriate value. Let’s see how it’s been done for source.

[oracle@sandbox ~]$ export SP_SYS_HOST_NAME=splexhost
[oracle@sandbox ~]$ . oraenv
ORACLE_SID = [test] ?
The /u01/app/oracle/product/10.2.0/sehome_1/bin/orabase binary does not exist
You can set ORACLE_BASE manually if it is required.
[oracle@sandbox ~]$ cd /u01/sp10/bin
[oracle@sandbox bin]$ telnet splexhost 2100
Trying 10.177.130.58...
telnet: connect to address 10.177.130.58: Connection refused

[oracle@sandbox bin]$ nohup /u01/sp10/bin/sp_cop -usp10 &
[1] 2023
[oracle@sandbox bin]$ nohup: ignoring input and appending output to `nohup.out'

[oracle@sandbox bin]$ telnet splexhost 2100
Trying 10.177.130.58...
Connected to splexhost.
Escape character is '^]'.
^]
telnet> q
Connection closed.
[oracle@sandbox bin]$

For target you have to adjust your environment variables accordingly and do the same.

[oracle@sandbox ~]$ export SP_SYS_HOST_NAME=splexstor
[oracle@sandbox ~]$ . oraenv
ORACLE_SID = [test] ? orcl
The Oracle base has been set to /u01/app/oracle
[oracle@sandbox ~]$cd /u01/sp12/bin
[oracle@sandbox bin]$ nohup /u01/sp12/bin/sp_cop -usp12 &
[1] 2066
[oracle@sandbox bin]$ nohup: ignoring input and appending output to `nohup.out'

[oracle@sandbox bin]$ 

As result we have two SharePlex instances running on the same host and talking to each other. Now we can create a sample replication. In the database schema “splex” created by “ora_setup” utility we have DEMO_SRC and DEMO_DST tables. SharePlex is using those tables for a demo replication. We can use them too and see how it works in our case. We can either modify a default pre-created sample configuration “ORA_config” or we can create a new one. I’ve created a new config “sample” on my source SharePlex and activated it:

sp_ctrl (splexhost:2100)> list config

File   Name                                         State       Datasource
--------------------------------------------------  ----------  ---------------
ORA_config                                          Inactive    o.SOURCE_SID
Last Modified At: 17-May-16 11:17    Size: 151

sp_ctrl (splexhost:2100)> create config sample

The command opens default editor and you can write your configuration.
Here is what I put to my “sample” config.

datasource:o.test

#source tables      target tables           routing map

splex.demo_src      splex.demo_dest             splexstor@o.orcl

Now we can activate config.

sp_ctrl (splexhost:2100)> activate config sample

sp_ctrl (splexhost:2100)> list config

File   Name                                         State       Datasource
--------------------------------------------------  ----------  ---------------
ORA_config                                          Inactive    o.SOURCE_SID
Last Modified At: 17-May-16 11:17    Size: 151

sample                                              Active      o.test
Last Modified At: 17-May-16 11:30    Size: 134     Internal Name: .conf.1

sp_ctrl (splexhost:2100)>

Now we can see all the processes running :

On the source:

sp_ctrl (splexhost:2100)> lstatus

Detailed Status for splexhost
Process          State                             PID     Running   Since
---------------  ------------------------------  --------  --------------------
Cop              Running                             2023  17-May-16 11:24:39
Capture          Running                             2250  17-May-16 11:30:53
  Data/Host:   o.test
Read             Running                             2279  17-May-16 11:30:53
  Data/Host:   o.test
Export           Running                             2304  17-May-16 11:30:56
  Data/Host:   splexstor
  Queue Name:  splexhost
Cmd & Ctrl       Running                             2581  17-May-16 11:40:39
  Data/Host:   splexhost

On the target:

sp_ctrl (splexstor:2100)> lstatus

Detailed Status for splexstor
Process          State                             PID     Running   Since
---------------  ------------------------------  --------  --------------------
Cop              Running                             2066  17-May-16 11:26:23
Import           Running                             2305  17-May-16 11:30:56
  Data/Host:   splexhost
  Queue Name:  splexhost
Post             Running                             2306  17-May-16 11:30:56
  Data/Host:   o.test-o.orcl
  Queue Name:  splexhost
Cmd & Ctrl       Running                             2533  17-May-16 11:38:18
  Data/Host:   splexstor

Let’s insert a row on the source :

test>  insert into splex.demo_src values ('JIM', '8001 Irvine Center Drive', '949-754-8000');

1 row created.

test> commit;

Commit complete.

test>

And we can see the row was successfully replicated to target:

orcl> select * from splex.demo_dest;

NAME							     ADDRESS							  PHONE#
------------------------------------------------------------ ------------------------------------------------------------ ------------
JIM							     8001 Irvine Center Drive					  949-754-8000

orcl>

As you can see, we were able to use one box to replicate data between two different databases, using two different SharePlex installations. The idea was simple and clear from the start, but I couldn’t find enough information in the installation guide for the SP_SYS_HOST_NAME parameter except documentation about configuring it for cluster installation. The parameter was documented in the reference section of documentation, though in reality it was not.
It could be even better if we could place the parameter inside, and not think about setting variables, but unfortunately setting the parameter in the “paramdb” didn’t work for me. Even having the parameter, you still need to setup your environment variable SP_SYS_HOST_NAME=your_host_name for non default hostname. I hope the article may help somebody save a bit of time.

Categories: DBA Blogs

Understanding The Database Options – AutoClose

Fri, 2016-05-20 13:03

In this blog post we’ll cover how AutoClose works, and why it’s recommended to disable this property.

At the time that the SQL Server service is started, the operating system logs to find the location of the master system database, and requests exclusive lock of the data and log files.

After that, SQL Server performs the reading in view of sys.master_files system and finds all the data files (.mdf, .NDF) and transaction log (.ldf) from all databases stored on the instance, and also requests the exclusive lock these files to initialize each of the databases.

The first situation in which the AutoClose property can influence the performance drop is in the acquisition of this exclusive lock on the data and log files.

If the property is off, this lock is held since service startup until you stop, however if the property is enabled, from the time when there is no more activity in the database, this lock is released and the data and log files are available to any other process.

Initially this situation may seem very interesting, because we could manipulate the data and log files and perform some administrative tasks, such as a backup.

Now imagine that during any backup, an application needs to access the database, what would happen?

The SQL Server would return an error alerting the impossibility of acquiring exclusive lock on files and the database initialization would fail.

Another big performance problem resulting from use of the AutoClose property is related to the use of the Memory Cache and Buffer areas, Plan Cache.

Whenever a query is performed, generates an execution plan that is kept in an area of memory called the Plan Cache. This area of memory is to store the generated execution plans so that they can be reused if the query is executed again.

After the execution plan generation, all search operators are executed and the data pages selected by the query are stored in an area of memory called Cache Buffer. This area of memory is to store the pages of data so that you don’t have to perform new accesses to the disk subsystem and thus optimize the next i/o requests.

When the AutoClose property is enabled and there are no more connections to the database, all the data pages and execution plans that are in memory will be deleted, thus creating a big drop of performance.

We ran a small demonstration just to be clear on this behavior.

Initially you will enable the AutoClose property in the AdventureWorks2012 database, as script below:

USE master
GO
ALTER DATABASE [AdventureWorks2012] SET AUTO_CLOSE ON WITH NO_WAIT
GO

Then let’s perform some queries in the database AdventureWorks2012, as script below:

USE AdventureWorks2012
GO
SELECT * FROM Person.person
GO
SELECT * FROM Sales.salesorderheader

 

After the execution of queries, it is possible to analyze, through the DMV sys.dm os_buffer_descriptors, the amount of data pages that have been allocated in memory to the database AdventureWorks2012, as illustrated on Figure 1:

Fava_AutoClose_1
Figure 1 – Data pages that have been allocated in memory

With the DMVs sys.dm_exec_cached_plans and sys.dm_exec_sql_text we can check execution plans that were stored in memory for queries executed, as illustrated on Figure 2.

Fava_AutoClose_2
Figure 2 – Execution plans stored in memory

So when all connections to the database AdventureWorks2012 are finished, all the memory areas will be cleaned, as the image below:

--Amount of data pages in memory
SELECT
Count (*) TotalPages,
DB_NAME (database_id) DBname
FROM
sys.dm_os_buffer_descriptors
GROUP BY
Db_name (database_id)
ORDER BY
1 DESC

--Amount of in-memory execution plans
SELECT
COUNT (*) TotalPlanos
FROM
SYS.dm_sys.dm_exec_cached_plans
CROSS APPLY
sys.dm_exec_sql_text sys.dm (plan_handle)
Where
[dbid] = 7 and objtype = ' Adhoc '

Fava_AutoClose_3
Figure 3 – Memory usage after close all connections

With this demonstration is extremely simple to conclude that the AutoClose property is always disabled due to performance problems that can bring to a high performance database.

Categories: DBA Blogs

An Effective Approach to Migrate Dynamic Thrift Data to CQL: Part 2

Fri, 2016-05-20 12:22

Note that this post is Part 2 of a 3-part blog series. If you haven’t read Part 1 of this series, please do so before continuing. Part 1 gives some background knowledge of the problem we’re trying to solve, as well as some fundamental concepts used in the following discussion. The chapter number sequencing also follows that from Part 1. (note: Part 3 can be found here)

4. Static and Dynamic Cassandra Table Definition

In Thrift, Cassandra tables can be defined either statically, or dynamically. When a table is defined dynamically, an application can add new columns on the fly and the column definition for each storage row doesn’t necessary need to be the same. Although a little bit flexible this way, it can be problematic as well because the dynamic column definition information is merely available in the application and invisible to outside world.

In CQL, however, tables are always statically defined, which means that any column and its type information has to be defined in advance before it can be used. Each row in a CQL table has exactly the same columns and type definitions.

In CQL, the dynamism of a Thrift table definition can be achieved through clustering column and/or more advanced column type definition like collections and user defined types (UDTs).

4.1 Static Table Definition

In Thrift, a statically defined table has column_metadata information in the table definition statement, as following:

create column family avg_grade
    with key_validation_class = Int32Type
     and comparator = UTF8Type
     and column_metadata = [
       {column_name: class_id, validation_class: Int32Type},
       {column_name: grade, validation_class: FloatType}
     ]

A strictly equivalent CQL table definition is like this (note the “WITH COMPACT STORAGE” table property):

create table avg_grade (
    key int primary key,
    class_id int,
    grade float
) with compact storage

A statically table defined in either a Thrift utility (cassandra-cli) or a CQL utility (cqlsh) can be accessed in another one with no problem. One difference between the Thrift and CQL definition is that in CQL definition, the row key definition has a name, but Thrift definition doesn’t. In this case, when accessing a table defined in Thrift, CQL uses a default name (“key”) for the row key.

4.2 Dynamic Table Definition

In Thrift, a dynamically defined table does NOT have column_metadata information in the table definition statement. Typically, time-series application adopts dynamic table definition. For example, for a sensor monitoring application, we may use sensor id as the storage row key and for each sensor, we want to record different event values detected by the sensor within a period of time. An example table definition is as following:

create column family sensor_data
   with key_validation_class = Int32Type
    and comparator = TimeUUIDType
    and default_validation_class = DecimalType;

Suppose for this table, 2 events are recorded for sensor 1 and 1 event is recorded for sensor 2. The output from cassandra-cli utility is like below:

-------------------
RowKey: 1
=> (name=1171d7e0-14d2-11e6-858b-5f3b22f4f11c, value=21.5, timestamp=1462680314974000)
=> (name=23371940-14d2-11e6-858b-5f3b22f4f11c, value=32.1, timestamp=1462680344792000)
-------------------
RowKey: 2
=> (name=7537fcf0-14d2-11e6-858b-5f3b22f4f11c, value=11.0, timestamp=1462680482367000)

The above shows output that the columns for each row are dynamically generated by the application and can be different between rows. In CQL, a strictly equivalent table definition and output for the above dynamic Thrift able is as below:

CREATE TABLE sensor_data (
    key int,
    column1 timeuuid,
    value decimal,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
key  | column1                              | value
-----+--------------------------------------+-------------------------
   1 | 1171d7e0-14d2-11e6-858b-5f3b22f4f11c | 0E-1077248000
   1 | 23371940-14d2-11e6-858b-5f3b22f4f11c | -8.58993459E-1077939396
   2 | 7537fcf0-14d2-11e6-858b-5f3b22f4f11c | 0E-1076232192

Since the columns are generated on the fly by the application, CQL doesn’t know the column names in advance. So it uses the default column name “column1” (and also the default key name “key”) in its definition. Functionally, it can be transformed equally to a more descriptive definition as below by using “ALTER TABLE” CQL command to rename the column names (e.g. “key” to “sensor_id”, “column1” to “event_time”), as below:

CREATE TABLE sensor_data (
    sensor_id int,
    event_time timeuuid,
    value decimal,
    PRIMARY KEY (sensor_id, event_time)
) WITH COMPACT STORAGE
4.3 Mixed Table Definition

In thrift, a table can also be in a mixed mode, which means that when a table is created, it has part of its column information being defined in column_metadata, just like a static table. But during runtime, a Thrift based application can add more columns on the go.

The table below, blogs, is such an example.  This table has one column “author” as statically defined. There are also 3 more columns (tags:category, tags:rating, and tags:recommend) for RowKey 1 defined on the fly.

create column family blogs
    with key_validation_class = Int32Type
     and comparator = UTF8Type
     and column_metadata = [
       {column_name: author, validation_class: UTF8Type}
     ]
-------------------
RowKey: 1
=> (name=author, value=Donald, timestamp=1462720696717000)
=> (name=tags:category, value=music, timestamp=1462720526053000)
=> (name=tags:rating, value=A, timestamp=1462720564590000)
=> (name=tags:recommend, value=Yes, timestamp=1462720640817000)
-------------------
RowKey: 2
=> (name=author, value=John, timestamp=1462720690911000)

When examining in CQL, in both table schema and data output, we can only see the columns as statically defined. The dynamically columns are NOT displayed.

CREATE TABLE blogs (
    key int PRIMARY KEY,
    author text
) WITH COMPACT STORAGE
key  | author
-----+--------
1    | Donald
2    | John
Categories: DBA Blogs

A Test Drive With Azure

Thu, 2016-05-19 11:55

While recently reading the latest Microsoft Azure updates, I found that we could try some VM’s from Azure Marketplace for free, even without any Azure subscription account. The feature is called “Azure test drive”. I found it interesting and decided to give a try, and see what it could offer. Here I am going to share my “driving” experience with the service.

 

I opened a browser where I was not logged to Azure, and went to the Test drives on Microsoft Azure webpage. There you can see a list of different VMs including DataStax, NetApp DataVault, and others. There are dozens of services listed there. I picked up a “SoftNAS Cloud NAS”, clicked and was redirected to the product page. On this page I had two options to create a VM. The main one was the usual “Create Virtual Machine” button, and the second one was located a bit lower and stated “Or, take a free test drive first”. I correctly assumed that was exactly what I needed (not a rocket science), and clicked it. I was was offered either to login or create a new “Orbitera” account for Azure Marketplace test drives.

 

The new service did not required an Azure account, but still asked to create one on another service. I didn’t have that account yet, and opted to create a new one. On the account creation page I was asked standard questions, filled out forms, and at the bottom confirmed that I was not a robot and pushed “sign up”. I got an email with a validation link just thirty seconds after that.
After validating my account I was able to sign up, and got to the test drive the page. I then received a button saying “Launch Test Drive”, an introduction video and couple of guides in pdf to download. One of the guides was a SoftNAS Cloud NAS Demo which helped get most out of the test drive, and another one was about the “SoftNAS Cloud NAS” itself. I pushed the button and got a counter showing how many many minutes were remaining to complete the deployment of my test VM.

 

It took less than 7 minutes to get link to the my Softnas interface and credentials. By the way, you don’t need to sit and wait when it is crating, you are going to get an email when the VM is ready to use and when you time is up. The test drive VM lives only one hour, and you have a counter on the test drive page with time left for you exercise. I got an hour to test SoftNAS. I later tried another test drive for SUSE HPC and got only 30 minutes available for it. I think the test time will depend on the service you want to try out, and how much time you may need to go through all the options.
Interestingly, I got a warning connecting to my newly created SoftNAS. It looked like certificate for the web interface was not good. My Firefox browser complained “The owner of softnas33027publicipdns.eastus.cloudapp.azure.com has configured their website improperly.” Of course you can wave that warning and add the certificate to exceptions but, I think,for SoftNAS it will be better to fix it.

 

So, I played with the test VM, shared couple of volumes, and was able to mount them on another VM and on my Mac. It worked pretty well and allowed  me to make myself friendly with the interface and options. When the my hour had finished, the SoftNAS went down and all my mounts became unavailable. If you need more time you can just fire another trial and use one more hour. Yes, it would be a brand new VM, and no configuration would be saved if you had one, but, it will allow you to explore features you weren’t able to try before time was gone.

 

I liked the service. It provides opportunity to test and play with new products in Marketplace and decide either it suits you or not. And you can just show to somebody how to work with SoftNAS, how to share and mount a new NFS or do other things. I hope the test drive will be growing and we see more and more new brands among available products.

Categories: DBA Blogs

Log Buffer #474: A Carnival of the Vanities for DBAs

Wed, 2016-05-18 14:46

This Log Buffer Edition covers Oracle, SQL Server and MySQL blogs from across the planet.

Oracle:

You might be thinking “Nothing is more frustrating that encountering a string column that is full of dates”. But there is something worse than that.

Unique constraint WWV_FLOW_WORKSHEET_RPTS_UK violated.

Understanding query slowness after platform change

Database Migration and Integration using AWS DMS

Oracle BPM 12c: Browsing the SOAINFRA

SQL Server:

Adding PK Exceptions to SQLCop Tests

Is a RID Lookup faster than a Key Lookup?

Performance Surprises and Assumptions : DATEADD()

Generate INSERT scripts from SQL Server queries and stored procedure output

PowerShell Desired State Configuration: Pull Mode

Continuous Delivery from the 19th Century to TODAY

MySQL:

ProxySQL versus MaxScale for OLTP RO workloads

Properly removing users in MySQL

MySQL/MariaDB cursors and temp tables

Quick start MySQL testing using Docker (on a Mac!)

Query Rewrite plugin can harm performance

Categories: DBA Blogs

Join Pythian at the 2016 Cloud Computing Expo

Wed, 2016-05-18 09:46

 

It won’t be too long before we’re touching down in New York City to attend another Cloud Computing Expo, June 7-9 at the Javits Centre.

If you plan on attending the show, be sure to stop by Pythian’s booth (#424) to chat with our technical experts about your cloud strategy development and to find how Pythian is currently helping companies maximize performance and reduce costs when moving to the cloud.

Our experts will also be presenting on a wide range of insightful cloud topics, which you won’t want to miss.

 

Our Speakers:

Chris Presley (Solutions Architect) will be speaking on the factors that you should consider when moving a database onto a public cloud.

Alex Lovell-Troy (Director, DevOps) will be guiding listeners from Configuration Management to Cloud Orchestration.

Warner Chaves (Principal Consultant) will be providing a tour of Data Platforms as a Service.

 

Interested in meeting up at the show? Drop us a line, we’d love to connect.

We’re looking forward to seeing you there!

Categories: DBA Blogs

An Effective Approach to Migrate Dynamic Thrift Data to CQL, Part 1

Tue, 2016-05-17 15:00
1. Introduction

Recently Pythian’s Cassandra team worked on one customer’s request to copy data of several Cassandra tables between two Cassandra clusters (not Cassandra data centers). The original approach we used to copy data is through Cassandra COPY TO/FROM commands because the size of the data to be copied is not large (several hundred mega-bytes per table). The execution of the commands was successful, but for one of the tables that we did data copy, the application complained about missing data. We examined the data for that table using Cassandra cqlsh utility and found no discrepancy. After a further discussion with the customer, we realized that the source tables were created and manipulated by a Thrift based application and the application can dynamically create different columns for different rows, although each row does share a common set of statically defined columns. It is the data in these dynamic columns that are missing during the data copy process.

We addressed the issue, but in the end we felt that we should write something about Cassandra data migration from Thrift to CQL because this is quite a common problem faced by many existing Cassandra users right now, considering that Cassandra is gradually phasing out Thrift and replacing it with CQL.

This post is the first part of a three-post series. In this post (Part 1), we’re going to dive into the details of Cassandra storage engine (pre-3.0 version) and explore some fundamental concepts that are key to better understanding the discussion in the following posts. In the next post (Part 2), we’re going to explore how Cassandra tables can be defined statically, dynamically, or in a mixed mode in Thrift and what the corresponding table definition in CQL are. In the last post (Part 3), we’ll present an effective approach to migrate dynamically generated data in Thrift into a statically defined CQL table, without suffering any data loss.

1.1. Cassandra Transition from Thrift to CQL API

Apache Thrift is a software framework developed at Facebook for “scalable cross-language services development”. In early days of Cassandra, Thrift base API was the only method to develop Cassandra client applications. But with the maturity of CQL (Cassandra query language), Cassandra is gradually moving away from Thrift API to CQL API. Along with this trend,

  • Thrift based client drivers are not officially supported.
  • Thrift API will not get new Cassandra features; it exists simply for backward compatibility purpose.
  • CQL based “cqlsh” utility is replacing thrift based “cassandra-cli” utility as the main command-line tool to interact with Cassandra.

Please note that the latest version of CQL is 3.x (3.4 as of May, 2016). CQL2 is deprecated and removed for Cassandra 2.2 and later.  In the discussion below, we will simply use CQL to refer to CQL 3.

2. Overview of Internal Cassandra Data Storage Structure

Please note that since Cassandra 3.0, the underlying storage engine for Cassandra has gone through a lot of changes. The discussion in this post series is for pre-3.0 Cassandra (v 2.2 and before).

At very high level, a Cassandra table (or column family by old term) can be seen as a map of sorted map in the following format *:

     Ma<RowKey, SortedMap<ColumnKey, ColumnValue>>

A graphical representation looks like below:

Cassanra_thrift-to-cql-scrnsht1

Please note that although there are more complex structures such as Super Column and Composite Column, the basic idea remains the same and the representation above is good enough for us to describe the problem in this document.

Internally, such a storage structure is where both Thrift and CQL APIs are based. The difference is that Thrift API manipulates the storage structure directly, but CQL API does so through an abstraction layer and expresses the data to user in a tabular form similar to what SQL does for a relational database.

In order to make this clearer, let’s use an example to compare the outputs between cassandra-cli and cqlsh command utilities, which are based on Thrift and CQL protocols separately. The table schema is defined as below (in CQL format):

CREATE TABLE song_tags (
   id uuid,
   tag_name text,
   PRIMARY KEY (id, tag_name)
) WITH COMPACT STORAGE

This simple table is used to maintain the song tags. After inserting several rows in this table, we examined the table content using both Thirft based “cassandra-cli” utility and CQL based “cqlsh” utility. The result is as below:

Cassanra_thrift-to-cql-scrnsht2

From the example above, it can be easily noticed that between Thrift and CQL, the terms “row” and “column” don’t share the same meaning and this causes some confusion when people do Thrift to CQL migration. For example,

  • In Thrift, one row means one data partition that is determined by the partition key definition. Each row has multiple columns, or more precisely “cells”. Each cell contains the time-stamp of when it is created. The name/key of the cell/column is not necessarily the name as defined in the table definition, especially when there are clustering column(s) defined (just as in the example above)
  • In CQL, one row could be one partition, or could be one part of a partition. It really depends on how the partition key and cluster key are designed

Due to such differences, I’m going to use the following terms in this document for clarification:

  • “CQL Row”, or simply “Row”, refers to a row in CQL context
  • “Storage Row”, refers to a row in Thrift context
  • “Column”, refers to a column in both CQL or Thrift contexts
  • “Cell” particularly refers to a column in Thrift context
  • “Table” refers to a table in CQL context, or a column family in Thrift context
  • “Partition” refers to a data partition determined by the hash key. In Thrift context, “Partition” and “Storage Row” has the same meaning. In CQL context, one “Partition” includes multiple “CQL Rows”.
3. Compact Storage

In CQL, a table property, called for COMPACT STRORAGE, is created for backward compatibility.  As the name suggests, tables created with this directive consumes less storage space compared with those created without this directive.

To make this clear, we also uses an example to explain. Basically, two tables are created to keep track of the average student grades for classes in a school. The table definition for them are exactly the same except that one (avg_grade) is defined without COMPACT STORAGE property, but another (avg_grade2) does. The same records of data are also inserted into both tables, as below:

CREATE TABLE avg_grade (
    student_id int,
    class_id int,
    grade double,
    PRIMARY KEY (student_id, class_id)
)

CREATE TABLE avg_grade2 (
    student_id int,
    class_id int,
    grade double,
    PRIMARY KEY (student_id, class_id)
) WITH COMPACT STORAGE

insert into avg_grade(student_id, class_id, grade) values (1, 1, 75.2);
insert into avg_grade(student_id, class_id, grade) values (2, 1, 81.3);

insert into avg_grade2(student_id, class_id, grade) values (1, 1, 75.2);
insert into avg_grade2(student_id, class_id, grade) values (2, 1, 81.3);

The statements are executed in cqlsh utility. The data is then flushed from memory to disk with “nodetool flush” command. After that, sstable2json utility is used to examine the contents of the SSTable data files for both tables. Below is the output:

Cassanra_thrift-to-cql-scrnsht3

From the output above, we can see that tables NOT in compact storage mode has more cells within each Storage Row (e.g. the extra cell with empty value for clustering column “class”) and each Cell stores more metadata (e.g. the name of the “grade” column is added in each of the row). So just from storage perspective, having a table defined in compact storage could save quite some storage space, especially when we’re dealing with many columns and/or with complex column types like collections.

In Thrift, tables are always stored in compact storage mode. In CQL, tables by default are stored in non-compact storage mode, unless the tables are defined with “COMPACT STRORAGE” property. As a result of this, CQL tables without “COMPACT STORAGE” property are not visible in Thrift based utilities like cassandra-cli.

When a CQL table is defined with COMPACT STORAGE property, it gets the benefit of saving some disk space. However, there are also some caveats that need to pay attention to. For example:

  • It cannot have new columns added or existing columns dropped.
  • If it has a compound primary key (multiple columns), then at most one column can be defined as not part of the key.
  • It cannot have a column defined with non-frozen collection types.
    • Note that for people who are not familiar with the concept of “frozen” vs. “non-frozen” collection, a frozen collection serializes all sub-components of the collection into one single value when stored, which is treated as a blob and the whole value must be updated once. On the contrary, A non-frozen collection allows updates on individual fields.
Categories: DBA Blogs

Characterset Conversion Conundrums

Mon, 2016-05-16 10:30

Every now and then a database needs to be migrated from one characterset to another. The driver behind this is ususally to either synchronize all databases to a single standard or to support new characters or symbols like €,

Categories: DBA Blogs

Replication between Tungsten clusters

Mon, 2016-05-16 09:52
Replication between Tungsten clusters

The process I will describe in this post will allow you to configure replication between Tungsten clusters. The most common use case I have seen for this is a dedicated ETL cluster.

The setup will look like this:

 

Replication between Tungsten clusters

So we will have a composite datasource (compositeprod) composed of east and west clusters, and two ETL clusters, one on each side.

The described setup allows failover or switchover within a single datacenter (e.g. db1.east -> db2.east) or to the Disaster Recovery Site, (e.g. db1.east -> db1.west) for the core cluster.

At the time of this article’s publication it is not possible to replicate between two composite clusters, so each ETL cluster needs to be standalone. This feature is expected for Tungsten 5.0 release.

We will install a standalone replicator on the ETL hosts to bring data in from the core cluster. This assumes you have already completed the required prerequisites.

Configuration

Using the .ini files installation method, the config file for the core cluster hosts would be:

[defaults]
application-password=****
application-port=9999
application-user=app
home-directory=/opt/continuent
mysql-connectorj-path=/opt/continuent/connector-java/mysql-connector-java-5.1.24/mysql-connector-java-5.1.24-bin.jar
replication-password=****
replication-port=3306
replication-user=tungsten
start-and-report=true

[east]
master=db1.east
members=db1.east,db2.east,db3.east

[west]
master=db1.west
members=db1.west,db2.west,db3.west

[compositeprod]
dataservice-composite-datasources=east,west

The config file for etl.east hosts would be:

[defaults]
application-password=****
application-port=9999
application-user=etl
home-directory=/opt/continuent
mysql-connectorj-path=/opt/continuent/connector-java/mysql-connector-java-5.1.24/mysql-connector-java-5.1.24-bin.jar
replication-password=****
replication-port=3306
replication-user=tungsten
start-and-report=true

[defaults.replicator]
home-directory=/opt/continuent/replicator
executable-prefix=tr
rmi-port=10002
thl-port=2113
property=replicator.log.slave.updates=false

[etl_east]
topology=clustered
master=etl1.east
members=etl1.east,etl2.east,etl3.east

[compositeprod]
topology=cluster-alias
master=db1.east
members=db1.east,db2.east,db3.east

[core_etl_east]
topology=cluster-slave
master-dataservice=compositeprod
slave-dataservice=etl_east

For etl.west hosts:

[defaults]
application-password=****
application-port=9999
application-user=etl
home-directory=/opt/continuent
mysql-connectorj-path=/opt/continuent/connector-java/mysql-connector-java-5.1.24/mysql-connector-java-5.1.24-bin.jar
replication-password=****
replication-port=3306
replication-user=tungsten
start-and-report=true

[defaults.replicator]
home-directory=/opt/continuent/replicator
executable-prefix=tr
rmi-port=10002
thl-port=2113
property=replicator.log.slave.updates=false

[etl_west]
topology=clustered
master=etl1.west
members=etl1.west,etl2.west,etl3.west

[compositeprod]
topology=cluster-alias
master=db1.west
members=db1.west,db2.west,db3.west

[core_etl_west]
topology=cluster-slave
master-dataservice=compositeprod
slave-dataservice=etl_west
Installation

Now we need to install the cluster by running the Tungsten installer on each core and ETL host:

./install/continuent-tungsten-4.0.2-6072082/tools/tpm install

Also on ETL hosts only, we need to install the standalone replicator:

./install/tungsten-replicator-4.0.0-18/tools/tpm install

Each ETL host will end up with 2 different replicator processes:

  1. The replicator that brings data in from the core cluster
  2. The replicator that reads data from the ETL cluster master to replicate to the ETL slaves

With the ‘executable-prefix‘ option, it is easy to reference each one:

  1. tr_trepctl status will show info from the core -> ETL replicator
  2. trepctl status will show info from the ETL cluster replicator
Tips & tricks

One thing to keep in mind is that for aliases to actually work, we need to add the folllowing line to .bashrc:

. "/opt/continuent/replicator/share/env.sh"

as there is a bug where the installer only adds the aliases from the cluster installation, and ignores the ones from the standalone replicator installation.

The [defaults.replicator] tag is helpful to pass options to the standalone replicator (it is ignored by the cluster installation), as the same tungsten.ini file will be read by both the cluster and standalone replicator products.

In this case I am using it to specify different THL and RMI ports, so the standalone replicator does not collide with the cluster replicator.

We also need to tell the core -> ETL replicator not to log its updates to the binlog, otherwise the changes will be picked up by the ETL cluster replicator as well, and end up being applied two times on the ETL slaves, resulting in consistency errors.

Do you have any questions about replication between Tungsten clusters? Let me know in the comments!

 

Categories: DBA Blogs

Analyze Index Validate Structure – The Dark Side

Fri, 2016-05-13 08:38

Recently a co-worker wanted to discuss a problem he had encountered after upgrading a database.

The upgrade plan included steps to verify object integrity; this was being done with analyze table <tablename> validate structure cascade. All was fine until one particular table was being analyzed.  Suddenly it seemed the process entered a hung state.

The job was killed and separate commands were created to analyze each object individually.  That went well up until one of the last indexes was reached.

Me: How long has it been running?

Coworker: Three days.

Yes, you read that correctly, it had been running for three days.

My friend ran a 10046 trace to see what the process was doing; nearly all the work was ‘db file sequential read’ on the table.

At this time I suspected it was related to the clustering_factor for the index in question.  The analyze process for an index verifies each row in the index.  If the cluster is well ordered then the number of blocks read from the table will be similar to the number of blocks making up the table.

If however the table is not well ordered relative to the columns in the index the number of blocks read from the table can be many times the total number of blocks that are actually in the table.

Consider for a moment that we have rows with an  ID of 1,2,3,4 and 5.  Let’s assume that our index is created on the ID column.

If these rows are stored in order in the table, it is very likely these rows will all be in the same block, and that a single block read will fetch all of these rows.

If however the rows are stored in some random order, it may be that a separate block read is required for each lookup.

IDBlock Number122275316425104

In this case 5 separate blocks must be read to retrieve these rows.

In the course of walking the index, some  minutes later these rows must also be read:

IDBlock Number104857622104857775104857816104857921048580104

The blocks where these rows reside are the same blocks as the earlier example. The problem of course is that quite likely the blocks have been removed from cache by this time, and must be read again from disk.

Now imagine performing this for millions of rows. With a poor clustering factor the analyze command on an index could take quite some time to complete.

This seemed worthy of a test so we could get a better idea of just how bad this issue might be.

The test was run with 1E7 rows. The SQL shown below creates 1E7 rows, but you can simply change the value of level_2 to 1e3 to reduce the total rows to 1E6, or even smaller if you like.

 


-- keep this table small and the rows easily identifiable
-- or not...

-- 1e3 x 1e4 = 1e7
def level_1=1e3
def level_2=1e4

drop table validate_me purge;

create table validate_me
tablespace
   alloctest_a -- EXTENT MANAGEMENT LOCAL AUTOALLOCATE SEGMENT SPACE MANAGEMENT AUTO
   --alloctest_m -- EXTENT MANAGEMENT LOCAL AUTOALLOCATE SEGMENT SPACE MANAGEMENT MANUAL
   --alloctest_u -- EXTENT MANAGEMENT LOCAL UNIFORM SIZE 65536 SEGMENT SPACE MANAGEMENT AUTO
pctfree 0
as
select
   -- for a good clustering factor
   --id
   --
   -- for a bad clustering factor
   floor(dbms_random.value(1,1e6)) id
   , substr('ABCDEFGHIJKLMNOPQRSTUVWXYZ',mod(id,10),15) search_data
   , to_char(id,'99') || '-' || rpad('x',100,'x') padded_data
from (
   select rownum id
   from
   (
      select null
      from dual
      connect by level <= &level_1
   ) a,
   (
      select null
      from dual
      connect by level <= &level_2
   ) b
)
/

create index validate_me_idx1 on validate_me(id,search_data);

exec dbms_stats.gather_table_stats(user,'VALIDATE_ME',method_opt => 'for all columns size 1')


 

Let’s see just what the clustering factor is for this index. The following script cluster-factor.sql will get this information for us.

 


col v_tablename new_value v_tablename noprint
col v_owner new_value v_owner noprint

col table_name format a20 head 'TABLE NAME'
col index_name format a20 head 'INDEX NAME'
col index_rows format 9,999,999,999 head 'INDEX ROWS'
col table_rows format 9,999,999,999 head 'TABLE ROWS'
col clustering_factor format 9,999,999,999 head 'CLUSTERING|FACTOR'
col leaf_blocks format 99,999,999 head 'LEAF|BLOCKS'
col table_blocks format 99,999,999 head 'TABLE|BLOCKS'


prompt
prompt Owner:
prompt

set term off feed off verify off
select upper('&1') v_owner from dual;
set term on feed on

prompt
prompt Table:
prompt

set term off feed off verify off
select upper('&2') v_tablename from dual;
set term on feed on


select
   t.table_name
   , t.num_rows table_rows
   , t.blocks table_blocks
   , i.index_name
   , t.num_rows index_rows
   , i.leaf_blocks
   , clustering_factor
from all_tables t
   join all_indexes i
      on i.table_owner = t.owner
      and i.table_name = t.table_name
where t.owner = '&v_owner'
   and t.table_name = '&v_tablename'

/

undef 1 2

 

Output from the script:

 

SQL> @cluster-factor jkstill validate_me

Owner:

Table:


                                          TABLE                                            LEAF     CLUSTERING
TABLE NAME               TABLE ROWS      BLOCKS INDEX NAME               INDEX ROWS      BLOCKS         FACTOR
-------------------- -------------- ----------- -------------------- -------------- ----------- --------------
VALIDATE_ME              10,000,000     164,587 VALIDATE_ME_IDX1         10,000,000      45,346     10,160,089

1 row selected.

Elapsed: 00:00:00.05

 

On my test system creating the table for 1E7 rows required about 2 minutes and 15 seconds, while creating the index took 28 seconds.

You may be surprised at just how long it takes to analyze that index.

 

SQL> analyze index jkstill.validate_me_idx1 validate structure online;

Index analyzed.

Elapsed: 00:46:06.49

 

Prior to executing this command a 10046 trace had been enabled, so there is a record of how Oracle spent its time on this command.

 

If you are wondering how much of the 46 minutes was consumed by the tracing and writing the trace file, it was about 6 minutes:

 

$>  grep "WAIT #48004569509552: nam='db file sequential read'"; oravm1_ora_2377_VALIDATE.trc  | awk '{ x=x+$8 } END { printf ("%3.2f\n",x/1000000/60) }'
  40.74

A Well Ordered Table

Now lets see how index analyze validate structure performs when the table is well ordered. The table uses the DDL as seen in the previous example, but rather than use dbms_random to generate the ID column, the table is created with the rows loaded in ID order.  This is done by uncommenting id in the DDL and commenting out the call to dbms_random.

 

SQL> analyze index jkstill.validate_me_idx1 validate structure online;

Index analyzed.

Elapsed: 00:01:40.53

That was a lot faster than previous.  1 minute and 40 seconds whereas previously the same command ran for 40 minutes.

 

Using some simple command line tools we can see how many times each block was visited.

 

First find the cursors and verify this cursor used only once in the session

$> grep -B1 '^analyze index' oravm1_ora_19987_VALIDATE.trc
PARSING IN CURSOR #47305432305952 len=64 dep=0 uid=90 oct=63 lid=90 tim=1462922977143796 hv=2128321230 ad='b69cfe10' sqlid='318avy9zdr6qf'
analyze index jkstill.validate_me_idx1 validate structure online


$> grep -nA1 'PARSING IN CURSOR #47305432305952' oravm1_ora_19987_VALIDATE.trc
63:PARSING IN CURSOR #47305432305952 len=64 dep=0 uid=90 oct=63 lid=90 tim=1462922977143796 hv=2128321230 ad='b69cfe10' sqlid='318avy9zdr6qf'
64-analyze index jkstill.validate_me_idx1 validate structure online
--
276105:PARSING IN CURSOR #47305432305952 len=55 dep=0 uid=90 oct=42 lid=90 tim=1462923077576482 hv=2217940283 ad='0' sqlid='06nvwn223659v'
276106-alter session set events '10046 trace name context off'

As this cursor was reused, we need to limit the lines we considered from the trace file.

 

One wait line appears like this:

WAIT #47305432305952: nam=’db file sequential read’ ela= 317 file#=8 block#=632358 blocks=1 obj#=335456 tim=1462923043050233

As it is already known the entire table resides in one file, it is not necessary to check the file.

From the following command it is clear that no block was read more than once during the analyze index validate structure when the table was well ordered in relation to the index.

 

$> tail -n +64 oravm1_ora_19987_VALIDATE.trc| head -n +$((276105-64)) | grep "WAIT #47305432305952: nam='db file sequential read'" | awk '{ print $10 }' | awk -F= '{ print $2 }' | sort | uniq -c | sort -n | tail
      1 742993
      1 742994
      1 742995
      1 742996
      1 742997
      1 742998
      1 742999
      1 743000
      1 743001
      1 743002

 

That command line may look a little daunting, but it is really not difficult when each bit is considered separately.

From the grep command that searched for cursors we know that the cursor we are interested in first appeared at line 64 in the trace file.

tail -n +64 oravm1_ora_19987_VALIDATE.trc

The cursor was reused at line 276105, so tell the tail command to output only the lines up to that point in the file.

head -n +$((276105-64))

The interesting information in this case is for ‘db file sequential read’ on the cursor of interest.

grep “WAIT #47305432305952: nam=’db file sequential read'”

Next awk is used to output the block=N portion of each line.

awk ‘{ print $10 }’

awk is again used, but this time to split the block=N output at the ‘=’ operator, and output only the block number.

awk -F= ‘{ print $2 }’

The cut command could have been used here as well. eg. cut -d= -f2

Sort the block numbers

sort

Use the uniq command to get a count of how many times each value appears in the output.

uniq -c

Use sort -n to sort the output from uniq.  If there are any counts greater than 1, they will appear at the end of the output.

sort -n

And pipe the output through tail. We only care if any block was read more than once.

tail

Now for the same procedure on the trace file generated from the poorly ordered table.

 


$> grep -B1 '^analyze index' oravm1_ora_2377_VALIDATE.trc
PARSING IN CURSOR #48004569509552 len=64 dep=0 uid=90 oct=63 lid=90 tim=1462547433220254 hv=2128321230 ad='aad620f0' sqlid='318avy9zdr6qf'
analyze index jkstill.validate_me_idx1 validate structure online

$> grep -nA1 'PARSING IN CURSOR #48004569509552' oravm1_ora_2377_VALIDATE.trc
51:PARSING IN CURSOR #48004569509552 len=64 dep=0 uid=90 oct=63 lid=90 tim=1462547433220254 hv=2128321230 ad='aad620f0' sqlid='318avy9zdr6qf'
52-analyze index jkstill.validate_me_idx1 validate structure online
--
6076836:PARSING IN CURSOR #48004569509552 len=55 dep=0 uid=90 oct=42 lid=90 tim=1462550199668869 hv=2217940283 ad='0' sqlid='06nvwn223659v'
6076837-alter session set events '10046 trace name context off'

 

The top 30 most active blocks were each read 53 or more times when the table was not well ordered in relation to the index.

$> tail -n +51 oravm1_ora_2377_VALIDATE.trc | head -n +$((6076836-51)) | grep "WAIT #48004569509552: nam='db file sequential read'" | awk '{ print $10 }' | awk -F= '{ print $2 }' | sort | uniq -c | sort -n | tail -30
     53 599927
     53 612399
     53 613340
     53 633506
     53 640409
     53 644099
     53 649054
     53 659198
     53 659620
     53 662600
     53 669176
     53 678119
     53 682177
     53 683409
     54 533294
     54 533624
     54 537977
     54 549041
     54 550178
     54 563206
     54 568045
     54 590132
     54 594809
     54 635330
     55 523616
     55 530064
     55 532693
     55 626066
     55 638284
     55 680250

 

Use RMAN

There is a feature of RMAN that allows checking for logical and physical corruption of an Oracle database via the  command backup check logical validate database.  This command does not actually create a backup, but just reads the database looking for corrupt blocks. Following is an (edited) execution of running this command on the same database where the analyze index commands were run.

A portion of the block corruption report is included.

RMAN> backup check logical validate database;
2>
Starting backup at 06-MAY-16
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=29 instance=oravm1 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00008 name=+DATA/oravm/datafile/alloctest_a.273.789580415
input datafile file number=00009 name=+DATA/oravm/datafile/alloctest_u.272.789582305
input datafile file number=00024 name=+DATA/oravm/datafile/swingbench.375.821472595
input datafile file number=00023 name=+DATA/oravm/datafile/swingbench.374.821472577
input datafile file number=00019 name=+DATA/oravm/datafile/bh08.281.778786819
input datafile file number=00002 name=+DATA/oravm/datafile/sysaux.257.770316147
input datafile file number=00004 name=+DATA/oravm/datafile/users.259.770316149
input datafile file number=00001 name=+DATA/oravm/datafile/system.256.770316143
input datafile file number=00011 name=+DATA/oravm/datafile/alloctest_m.270.801310167
input datafile file number=00021 name=+DATA/oravm/datafile/ggs_data.317.820313833
input datafile file number=00006 name=+DATA/oravm/datafile/undotbs2.265.770316553
input datafile file number=00026 name=+DATA/oravm/datafile/undotbs1a.667.850134899
input datafile file number=00005 name=+DATA/oravm/datafile/example.264.770316313
input datafile file number=00014 name=+DATA/oravm/datafile/bh03.276.778786795
input datafile file number=00003 name=+DATA/oravm/datafile/rcat.258.861110361
input datafile file number=00012 name=+DATA/oravm/datafile/bh01.274.778786785
input datafile file number=00013 name=+DATA/oravm/datafile/bh02.275.778786791
input datafile file number=00022 name=+DATA/oravm/datafile/ccdata.379.821460707
input datafile file number=00007 name=+DATA/oravm/datafile/hdrtest.269.771846069
input datafile file number=00010 name=+DATA/oravm/datafile/users.271.790861829
input datafile file number=00015 name=+DATA/oravm/datafile/bh04.277.778786801
input datafile file number=00016 name=+DATA/oravm/datafile/bh05.278.778786805
input datafile file number=00017 name=+DATA/oravm/datafile/bh06.279.778786809
input datafile file number=00018 name=+DATA/oravm/datafile/bh07.280.778786815
input datafile file number=00020 name=+DATA/oravm/datafile/bh_legacy.282.778787059
input datafile file number=00025 name=+DATA/oravm/datafile/baseline_dat.681.821717827
input datafile file number=00027 name=+DATA/oravm/datafile/sqlt.668.867171675
input datafile file number=00028 name=+DATA/oravm/datafile/bh05.670.878914399

channel ORA_DISK_1: backup set complete, elapsed time: 00:25:27

List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              75632        256074          375655477
  File Name: +DATA/oravm/datafile/system.256.770316143
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              158478
  Index      0              17160
  Other      0              4730

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              36332        394240          375655476
  File Name: +DATA/oravm/datafile/sysaux.257.770316147
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              170007
  Index      0              138603
  Other      0              49298

 

As shown in the report, only 25 minutes were required to check the entire database for physically or logically corrupt blocks, as opposed to the 40 minutes needed to analyze index validate structure.

While the RMAN corruption check is not the same as the check performed by analyze index validate structure, it is a test that can be completed in a much more timely manner, particularly if some indexes are both large and have a high value for the clustering factor.

Rebuild the Index?

If you have strong suspicions that a large index with an unfavorable clustering factor has corrupt blocks, it may be more expedient to just rebuild the index.  If the database is on Oracle Enterprise Edition, the rebuild can also be done with the ONLINE option.

Consider again the index on the test table with 1E7 rows.  Creating the index required 28 seconds, while validating the structure required 40 minutes.

 

 SQL> alter index validate_me_idx1 rebuild online;

Index altered.

Elapsed: 00:00:59.88


 

The conclusion is quite clear; the use of analyze index validate structure needs to be carefully considered when its use it contemplated for large indexes. The use of this command could be very resource intensive and take quite some time to complete. It is worthwhile to consider alternatives that my be much less resource intensive and time consuming.

Categories: DBA Blogs

MySQL encrypted streaming backups directly into AWS S3

Fri, 2016-05-13 08:25
Overview

Cloud storage is becoming more and more popular for offsite storage and DR solutions for many businesses. This post will help with those people that want to perform this process for MySQL backups directly into Amazon S3 Storage. These steps can probably also be adapted for other processes that may not be MySQL oriented.

Steps

In order to perform this task we need to be able to stream the data, encrypt it, and then upload it to S3. There are a number of ways to do each step and I will try and dive into multiple examples so that way you can mix and match the solution to your desired results.  The AWS S3 CLI tools that I will be using to do the upload also allows encryption but to try and get these steps open for customization, I am going to do the encryption in the stream.

  1. Stream MySQL backup
  2. Encrypt the stream
  3. Upload the stream to AWS S3
Step 1 : Stream MySQL Backup

There are a number of ways to stream the MySQL backup. A lot of it depends on your method of backup. We can stream the mysqldump method or we can utilize the file level backup tool Percona Xtrabackup to stream the backup. Here are some examples of how these would be performed.

mysqldump

When using mysqldump it naturally streams the results. This is why we have to add the greater than sign to stream the data into our .sql file. Since mysqldump is already streaming the data we will pipe the results into our next step

[root@node1 ~]# mysqldump --all-databases > employee.sql

becomes

[root@node1 ~]# mysqldump --all-databases |
xtrabackup

xtrabackup will stream the backup but with a little more assistance to tell it to do so. You can reference Precona’s online documentation (https://www.percona.com/doc/percona-xtrabackup/2.4/innobackupex/streaming_backups_innobackupex.html) for all of the different ways to stream and compress the backups using xtrabackup. We will be using the stream to tar method.

innobackupex --stream=tar /root > /root/out.tar

becomes

innobackupex --stream=tar ./ |
Step 2 : Encrypt The Stream

Now that we have the backup process in place, we will then want to make sure that our data is secure. We will want to encrypt the data that we are going to be sending up to AWS S3 as to make sure the data is protected. We can accomplish this a couple of ways. The first tool I am going to look at is GnuPG (https://www.gnupg.org/), which is the open source version of PGP encryption. The second tool I will look at is another very popular tool OpenSSL (https://www.openssl.org/).  Below are examples of how I set them up and tested their execution with streaming.

GnuPG

I will be creating a public and private key pair with a password that will be used to encrypt and decrypt the data. If you are going to do this for your production and sensitive data, please ensure that your private key is safe and secure.  When creating the keypair I was asked to provide a password.  When decrypting the data I was then asked for the password again to complete the process. It was an interactive step and is not shown in the example below. To accept a stream, you don’t provide a file name to encrypt, then to stream the output, you just don’t provide an output parameter.

KEY PAIR CREATION
[root@node1 ~]# gpg --gen-key
gpg (GnuPG) 2.0.14; Copyright (C) 2009 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
Your selection? 1
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048)
Requested keysize is 2048 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0)
Key does not expire at all
Is this correct? (y/N) y

GnuPG needs to construct a user ID to identify your key.

Real name: root
Name must be at least 5 characters long
Real name: root@kmarkwardt
Email address: markwardt@pythian.com
Comment:
You selected this USER-ID:
    "root@kmarkwardt <markwardt@pythian.com>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
You need a Passphrase to protect your secret key.

can't connect to `/root/.gnupg/S.gpg-agent': No such file or directory
gpg-agent[1776]: directory `/root/.gnupg/private-keys-v1.d' created
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.

After typing for what felt like FOREVER, to generate enough entropy

gpg: /root/.gnupg/trustdb.gpg: trustdb created
gpg: key 1EFB61B1 marked as ultimately trusted
public and secret key created and signed.

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
pub   2048R/1EFB61B1 2016-04-29
      Key fingerprint = 8D98 2D23 3C49 F1E7 9CD2  CD0F 7163 EB03 1EFB 61B1
uid                  root@kmarkwardt <markwardt@pythian.com>
sub   2048R/577322A0 2016-04-29

[root@node1 ~]#

 

SAMPLE USAGE
ENCRYPT
[root@node1 openssl]# echo "test" | gpg --output install.log.gpg --encrypt -r root 
[root@node1 openssl]# cat install.log.gpg
?
 ???    Ws"???l?
??g             ?w??g?C}P
   ?5A??f?6?p?
???Qq?m??&?rKE??*}5.?4XTj?????Th????}A???: ^V?/w?$???"?<'?;
?Y?|?W????v?R??a?8o<BG??!?R???f?u?????????e??????/?X?y?S7??H??@???Y?X~x>qoA0??L?????*???I?;I?l??]??Gs?G'?!??
                                                                                                            ??k>?
DECRYPT
[root@node1 ~]# gpg --decrypt -r root --output install.log.decrypted install.log.gpg
install.log.decrypted
You need a passphrase to unlock the secret key for
user: "root@kmarkwardt <markwardt@pythian.com>"
2048-bit RSA key, ID 577322A0, created 2016-04-29 (main key ID 1EFB61B1)

can't connect to `/root/.gnupg/S.gpg-agent': No such file or directory
gpg: encrypted with 2048-bit RSA key, ID 577322A0, created 2016-04-29
     "root@kmarkwardt <markwardt@pythian.com>"
[root@node1 ~]# ls
install.log.decrypted
install.log.gpg

ENCRYPT STREAM

[root@node1 ~]# mysqldump --all-databases | gpg --encrypt -r root 
or
[root@node1 ~]# innobackupex --stream=tar ./ | gpg --encrypt -r root 

 

OpenSSL

As with GPG we will generate a public and private key with a pass phrase.  There are other ways to use openssl to encrypt and decrypt the data such as just using a password with no keys, using just keys with no password, or encrypt with no password or keys.  I am using keys with a password as this is a very secure method.

KEY PAIR CREATION
[root@node1 openssl]# openssl req -newkey rsa:2048 -keyout privkey.pem -out req.pem
Generating a 2048 bit RSA private key
.......................................+++
........+++
writing new private key to 'privkey.pem'
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:
State or Province Name (full name) []:
Locality Name (eg, city) [Default City]:
Organization Name (eg, company) [Default Company Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (eg, your name or your server's hostname) []:
Email Address []:

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:

[root@node1 openssl]# openssl x509 -req -in req.pem -signkey privkey.pem -out cert.pem
Signature ok
subject=/C=XX/L=Default City/O=Default Company Ltd
Getting Private key
Enter pass phrase for privkey.pem:
[root@node1 openssl]# ls -al
total 20
drwxr-xr-x  2 root root 4096 May  5 10:47 .
dr-xr-x---. 9 root root 4096 May  4 04:38 ..
-rw-r--r--  1 root root 1103 May  5 10:47 cert.pem
-rw-r--r--  1 root root 1834 May  5 10:43 privkey.pem
-rw-r--r--  1 root root  952 May  5 10:43 req.pem
[root@node1 openssl]# rm -rf req.pem 
SAMPLE USAGE
ENCRYPT
[root@node1 openssl]# echo "test" | openssl smime -encrypt -aes256 -binary -outform DER cert.pem > test.dat
[root@node1 openssl]# cat test.dat 
???0??1?k0?g0O0B1
                 0    UXX10U

                              Default City10U

?V??p?A$????PO??+???q@t??????\"%:0
??J?????5???0?D/?1z-?xO??&?#?;???E>^?g??#7??#m????lA???'??{)?*xM
P?l????]iz/???H???????[root@node1 openssl]#
DECRYPT
[root@node1 openssl]# openssl smime -decrypt -in test.dat -inform DER -inkey privkey.pem -out test.txt
Enter pass phrase for privkey.pem:
[root@node1 openssl]# cat test.txt 
test

ENCRYPT STREAM

[root@node1 ~]# mysqldump --all-databases | openssl smime -encrypt -aes256 -binary -outform DER cert.pem
or 
[root@node1 ~]# innobackupex --stream=tar ./ | openssl smime -encrypt -aes256 -binary -outform DER cert.pem
Step 3 : Stream to Amazon AWS S3

Now that we have secured the data, we will want to pipe the data into an Amazon AWS S3 bucket.  This will provide an offsite copy of the MySQL backup that you can convert to long term storage, or restore into an EC2 instance.  With this method I will only be looking at one.  The Amazon provided AWS CLI tools incorporates working with S3.  Allowing you to copy your files up into S3 with the ability to stream your input.

AWS CLI

In order to tell the AWS CLI S3 copy command to accept STDIN input you just have to put a dash in the place of the source file.  This will allow the command to accept a stream to copy.  The AWS CLI tools for copying into S3 also allows for encryption.  But I wanted to provide other methods as well to allow you to customize your own solution.   You can also stream the download of the S3 bucket item, which could allow for uncompression as you download the data or any other number of options.

UPLOAD STREAM

echo "test" | aws s3 cp - s3://pythian-test-bucket/incoming.txt 

BACKUP / ENCRYPT / UPLOAD STREAM

-- MySQL Dump -> OpenSSL Encryption -> AWS S3 Upload
[root@node1 ~]# mysqldump --all-databases | openssl smime -encrypt -aes256 -binary -outform DER cert.pem | aws s3 cp - s3://pythian-test-bucket/mysqldump.sql.dat
-- Xtrabackup -> OpenSSL Encryption -> AWS S3 Upload
[root@node1 ~]# innobackupex --stream=tar ./ | openssl smime -encrypt -aes256 -binary -outform DER cert.pem |aws s3 cp - s3://pythian-test-bucket/mysqldump.tar.dat
-- MySQL Dump -> GPG Encryption -> AWS S3 Upload
[root@node1 ~]# mysqldump --all-databases | gpg --encrypt -r root | aws s3 cp - s3://pythian-test-bucket/mysqldump.sql.gpg
-- MySQL Dump -> GPG Encryption -> AWS S3 Upload
[root@node1 ~]# innobackupex --stream=tar ./ | gpg --encrypt -r root | aws s3 cp - s3://pythian-test-bucket/mysqldump.tar.gpg

References

  • https://www.percona.com/doc/percona-xtrabackup/2.4/innobackupex/streaming_backups_innobackupex.html
  • https://linuxconfig.org/using-openssl-to-encrypt-messages-and-files-on-linux
  • https://www.gnupg.org/gph/en/manual/c14.html
  • https://www.gnupg.org/gph/en/manual/x110.html
  • https://linuxconfig.org/using-openssl-to-encrypt-messages-and-files-on-linux
  • https://www.openssl.org/docs/manmaster/apps/openssl.html
  • http://docs.aws.amazon.com/cli/latest/reference/s3/cp.html

 

 

Categories: DBA Blogs

Migrating Your Enterprise Applications To Amazon Web Services (AWS)

Thu, 2016-05-12 14:34

 

Many of the enterprise clients we work with are looking at Amazon Web Services (AWS) to support their cloud strategies and reap the benefits of the public cloud: lower costs, higher scalability, greater availability of computing resources.

 

AWS is well known for its cutting edge service offerings, which are always growing and evolving—making them an easy choice to recommend to our enterprise clients. When used to provide infrastructure as a service (IaaS), it simplifies hardware provisioning and management and makes it easy to allocate compute, memory and storage capacity. In a platform as a service (PaaS) situation, it can streamline operations through automated backups and patching, and can offer the convenience of pay-as-you-go billing.

 

The challenge for many organizations migrating to the cloud comes when they start to move their enterprise applications. That’s because these applications often rely on highly customized infrastructure with tightly coupled components. They also tend to exist in silos. Some careful planning is needed to make sure the new cloud environment delivers the outperforms the legacy on-premises one in terms of scalability, reliability and performance. Whenever desired by the customer, we will often recommend and apply a number of optimizations during the migration process, such as decoupling of components, stability and performance tune-ups as well as improved operational  visibility and platform automation.

 

When we migrated Harvard Business Publishing to AWS, for example, we tailored the public cloud architecture to meet their specific needs. (Harvard Business Publishing is a subsidiary of Harvard University — based in Boston, with offices in New York City, India, Singapore, Qatar and the United Kingdom. They produce and distribute content across multiple platforms.)

 

Harvard Business Publishing was running Oracle applications in a data center on three Oracle Database Appliances. With the data center lease coming up for renewal, they asked us to help migrate their Oracle-based production, quality assurance and disaster recovery systems to an AWS cloud. Obviously, these systems were mission critical.

 

We did thorough capacity planning and developed a reliable, highly-automated and predictable migration approach up front, customized the AWS architecture, and migrated to a large number of virtual machines and Oracle schemas.

 

In another instance, we migrated CityRealty’s core applications to the cloud from on-premise hardware running Oracle databases. CityRealty is the oldest continuously operating real estate website in New York City — the world’s largest real estate market — and remains the city’s leading site today. We proposed moving three production databases to Amazon Elastic Compute Cloud (EC2), which provides highly scalable compute capacity, as well as upgrading the Oracle databases. We also built a highly efficient standby and recovery solution using Amazon Elastic Block Storage (EBS) snapshots.

 

In both of these cases, we did the planning up front to ensure the new cloud environment would be ready to receive the clients’ applications and run them without any compromise in performance. We’ve done similar migrations for clients operating e-commerce businesses with revenues of millions and even billions per year. Every situation is a little different: in some cases clients needed to simplify and standardize their mission-critical systems; in others, boost reliability; in others, increase automation or eliminate transaction-hampering latency.

 

We have the benefit of small, dedicated cloud teams around the world with multiple, advanced AWS certifications to address even the most complex requirements. Our goal always is to ensure the public cloud architecture is ideally suited to the enterprise, and to provide detailed implementation plans and data management solutions for optimal performance. That allows us to implement and manage public, private, and hybrid environments with lower risk and cost for organizations that want to seize the benefits of the cloud.

Find out how Pythian can help you with cloud solutions for AWS  today.

Categories: DBA Blogs

Properly removing users in MySQL

Thu, 2016-05-12 08:58

Removing users in MySQL is an easy task, but there are some drawbacks to check carefully before dropping a user. Not taking these possible issues into consideration can render your application unusable.

First it is important to understand the concept of user in MySQL. A user has three main functions: authentication, privileges and resources. This is different from other databases: in MySQL users don’t own objects. An object belongs to a database and there is no direct relationship between objects and users. This simplifies the process of deleting a user because no objects will be erased.

But users, as I wrote before, have an associated set of privileges that define what database sessions can do, and the privileges applied both in stored programs and view execution.

At the same time, procedures, functions, triggers, and views have two possible security contexts: invoker and definer. If they are created using the invoker security model, then the user calling the routine or view must have enough privileges to execute the code within the routine or view. But if created using the definer security model, the code can be executed if the definer has enough privileges to execute it. And yes, the default security model in MySQL is definer.

This means that, unless declared otherwise, most routines will check the privileges for the user that created the routine. If somebody removes that user, querying the view or executing the code will fail with error. All the procedures, functions, views and triggers created by that user with the default options will become unusable.

How do we fix this? The quick and dirty way is to recreate the user with a different password to avoid user logins. It is not an elegant solution but probably this is the first thing you will do while you solve the problem correctly.  Another alternative, if you are running MySQL 5.7 is account locking, this feature disables login for that account but allows code and views to be executed. In any case, it is a good practice to make a backup of the user creation scripts prior to dropping the user. Percona toolkit has the utility pt-show-grants for that purpose.

The elegant way to avoid the problem is to check that there are not routines or views using the definer security model and configured to run with the user privileges of the user you want to remove. There are several tables in the mysql user catalog that provide you with this information.

Unfortunately there is not an easy way to change this attributes. The best thing you can do is drop and recreate those objects using different security characteristics.

Let see an example:

Connect to the database using an account with enough privileges to create users, databases and procedures and create a new database only for testing purposes.

mysql> create database remove_test;
Query OK, 1 row affected (0,05 sec)

Create a user with privileges on the database created in previous step:

mysql> grant all privileges on remove_test.* to test_user@'%' identified by 'test';
Query OK, 0 rows affected, 1 warning (0,20 sec)

Connect to the database using the user created:

$ mysql -u test_user -ptest remove_test

Create a view, check the attributes and execute it. We are going to use a couple of functions that I will explain a bit later.

mysql> create view test_view as select current_user(),user();
Query OK, 0 rows affected (0,05 sec)
mysql> show create view test_view\G
*************************** 1. row ***************************
                View: test_view
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`test_user`@`%` SQL SECURITY DEFINER VIEW `test_view` AS select current_user() AS `current_user()`,user() AS `user()`
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0,00 sec)

mysql> select * from test_view;
+----------------+---------------------+
| current_user() | user()              |
+----------------+---------------------+
| test_user@%    | test_user@localhost |
+----------------+---------------------+
1 row in set (0,00 sec)

As we can see, the view has been created with SQL SECURITY DEFINER and DEFINER=`test_user`@`%`. The view returns the value from two functions: current_user() returns the value that matches against the mysql.user table and that defines the privileges the current session or current execution context has. The function user() returns the host you are connected from and the username specified as connection identifier.

Now we reconnect to the database using the privileged account from the first step, and requery the view:

$ mysql -u root -p remove_test
mysql> select * from test_view;
+----------------+----------------+
| current_user() | user()         |
+----------------+----------------+
| test_user@%    | root@localhost |
+----------------+----------------+
1 row in set (0,00 sec)

The view is executed by root@localhost but with the privileges of test_user@%. It is important to note that if you monitor the execution of a routine or view, using SHOW PROCESSLIST or query against information_schema.processlist, the contents of user and host are the same that current_user() return, not the ones returned by user().

Now we will drop the user and query once again the view.

mysql> drop user test_user@'%';
Query OK, 0 rows affected (0,07 sec)
mysql> select * from test_view;
ERROR 1449 (HY000): The user specified as a definer ('test_user'@'%') does not exist

In case you don’t see the error and still get the correct results, this is because the contents of the view are stored in the query cache. Clear their contents and repeat the query.

mysql> reset query cache;
Query OK, 0 rows affected (0,00 sec)

How to validate if it is safe to remove a user? You must query the information_schema tables to find objects than can bring problems and recreate them.

mysql> select routine_schema as db,
    ->        routine_type as object_type,
    ->        routine_name as object_name
    -> from routines
    -> where security_type='DEFINER'
    ->   and definer='test_user@%'
    -> union all
    -> select trigger_schema as db,
    ->        'TRIGGER' as object_type,
    ->         trigger_name as object_name
    -> from triggers
    -> where definer='test_user@%'
    -> union all
    -> select table_schema as db,
    ->        'VIEW' as object_type,
    -> table_name as object_name
    -> from views
    -> where security_type='DEFINER'
    ->   and definer='test_user@%';
+-------------+-------------+-------------+
| db          | object_type | object_name |
+-------------+-------------+-------------+
| remove_test | VIEW        | test_view   |
+-------------+-------------+-------------+
1 row in set (0,02 sec)

Replace test_user@% with the value of the user you want to remove.

This way you get a list of the objects you must change to avoid problems. As I said before the best thing you can do to change the definer is recreating the objects. To obtain the creation script you should use SHOW CREATE VIEW, SHOW CREATE TRIGGER, SHOW CREATE PROCEDURE or SHOW CREATE FUNCTION.

If you want to avoid problems in the future, my recommendation is that for every application, create an account for that application, lock it with an unknown password or using account locking feature. Grant that account all the privileges needed to run the application and make sure all the views, procedures, functions and triggers have that account in the definer field.

Categories: DBA Blogs

Log Buffer #473: A Carnival of the Vanities for DBAs

Wed, 2016-05-11 15:32

This Log Buffer Edition covers some of the hottest blog posts of Oracle, SQL Server and MySQL for the previous week.

Oracle:

A recent post on the OTN database forum reminded Jonathan how easy it is to forget to keep repeating a piece of information after the first couple of hundred times you’ve explained it. No matter how “intuitively obvious” it is for one person, it’s new to someone else.

The OMS Patcher is a newer patching mechanism for the OMS specifically.

A SQLT report has all kinds of pertinent information including—to name just a few—optimizer settings, indexes, statistics, plan history, and view definitions.

Joins are fundamental in SQL, and are used in most every production query. There are three types in particular that every developer should fully understand.

Why should Oracle Database professionals care about NoSQL and where to start?

SQL Server:

Understanding OPENJSON and FOR JSON in Azure and SQL 2016

Using Data Compression in Master Data Services 2016

The most frustrating thing with any new system is often just working out how to connect to it.

There are several decisions to be made when designing indexes for Memory-optimized tables in In-Memory OLTP, and to make the best choice, it pays to understand something about the nature and usage of memory-optimised indexes.

Database Hardware and Infrastructure Trends

STRING_SPLIT() in SQL Server 2016 : Follow-Up #2

MySQL:

Why would I run MySQL/MariaDB on POWER anyway?

By default, the write() system call returns after all data has been copied from the user space file descriptor into the kernel space buffers. There is no guarantee that data has actually reached the physical storage.

OpenCPS: Vietnam’s Public Sector goes Open Source

MySQL uses metadata locking to manage concurrent access to database objects, and to ensure data consistency when performing modifications to the schema

Using Docker for development is a great way of ensuring that what you develop will be the same that you deploy in production.

Categories: DBA Blogs

Defining Digital Transformation

Wed, 2016-05-11 14:33

 

Terminology is important—and it’s particularly important for us to define terms that are central to what we do. So when it comes to the subject of digital transformation, what exactly are we talking about?

 

In speaking with clients and industry thought leaders, I’ve come to realize that the term “digital transformation” has a different meaning to different people. For a term that is so widely used — and that, on its surface, seems pretty straightforward — the range of interpretation is remarkable. It’s a bit like when we say “I’ll do it later.”  “Later” to one person means “before the sun goes down today.” “Later” to another person means “sometime in the future”, and it could mean days or weeks in their mind. “Later” to a third person can mean “I have no plans to do it, and this is my way of telling you nicely.”

 

Because the term is so essential to the work we do for our clients, I thought it would be helpful to define what digital transformation means to us here at Pythian. There’s so much we can say on the topic, so I plan to follow up with a series of articles about how I’ve seen it implemented, or worse, not implemented or even embraced as a concept.

 

To start, “digital transformation” is about technology. I know that to some people it isn’t, but I disagree. These days, you can’t transform your business without technology. It’s not about which technology you choose, as much as it’s about how to use it. Even more specifically, we’ve found that the businesses that are achieving positive transformation are using technology to capitalize on data. I have yet to see a single transformation project that didn’t use data as a major component of its success.

 

Let’s look at the term “transformation.” This equates to change, but it doesn’t mean change for its own sake. The change we’re talking about has to benefit the business. However, the factor that can make or break successful change is people. Their attitudes, preconceptions, and ideas almost always have to be aligned with the change for successful transformation to occur. People need to get behind the initiative, people have to fund it, people have to develop it, and people have to support it once it’s developed. And we all know that getting people to change can be more difficult than developing any new technology. In short, the transformative capabilities inherent in technology can only be realized when coupled with the willingness to embrace change.

Why Digital Transformation?

Why is the concept of digital transformation important in the first place? At Pythian, we believe that it’s about using technology and data to change your business for the better. What do we mean when we say “for the better”? Therein lies the controversy.  “For the better” means different things to different people depending on their company’s key objectives.

 

“For the better” can mean:

  • Becoming more efficient to drive costs down so your profitability can improve
  • Reducing mistakes and improving your reputation, or the quality of your product
  • Differentiating your product to get ahead of the competition
  • Doing what you do, only faster than your competitors
  • Creating new revenue streams
  • Improving the customer experience. This is a big one, so I will dedicate an entire blog post to exploring exactly what it means.

 

Digital transformation is the key to achieving any one, or all of these benefits, and knowing your objectives and priorities will help you shape your digital transformation initiative. So to start, focus less on what digital transformation is, and more on what you want the outcome of a transformation to be.

 

Categories: DBA Blogs

GoldenGate 12.2 Big Data Adapters: part 4 – HBASE

Wed, 2016-05-11 10:51

This is the next post in my series about Oracle GoldenGate Big Data adapters. Here is list of all posts in the series:

  1. GoldenGate 12.2 Big Data Adapters: part 1 – HDFS
  2. GoldenGate 12.2 Big Data Adapters: part 2 – Flume
  3. GoldenGate 12.2 Big Data Adapters: part 3 – Kafka
  4. GoldenGate 12.2 Big Data Adapters: part 4 – HBASE

In this post I am going to explore HBASE adapter for GoldenGate. Let’s start by recalling what we know about HBASE. The Apache HBASE is non-relational, distributed database. It has been modelled after the Google’s Bigtable distributed database. It can provide read write access to the data and is based on top of Hadoop or HDFS.

So, what does it tell us? First, we can write and change the data. Second, we need to remember that it is non-relation database and it is a bit of a different approach to data in comparison with traditional relation databases. You can think about HBase as about a key-value store. We are not going deep inside HBASE architecture and internals here, since our main task is to test Oracle GoldenGate adapter and see how it works. Our configuration has an Oracle database as a source with a GoldenGate extract and target system where we have Oracle GoldenGate for BigData.

We have more information about setting up the source and target in the first post in the series about HDFS adapter. The source side replication part has already been configured and started. We have initial trail file for data initialization and trails for the ongoing replication. We capture changes for all tables in the ggtest schema on the oracle database.
Now we need to prepare our target site. Let’s start from HBase. I used a pseudo-distributed mode for my tests where I ran a fully-distributed mode on a single host. It is not acceptable for any production configuration but will suffice for our tests. On the same box I have HDFS to serve as a main storage. Oracle documentation for the adapter states that they support HBase from version 1.0.x . In my first attempt I tried to use HBase version 1.0.0 (Cloudera 5.6) but it didn’t work. I got errors in the GoldenGate and my extract was aborted.
Here is the error :

 
2016-03-29 11:51:31  ERROR   OGG-15051  Oracle GoldenGate Delivery, irhbase.prm:  Java or JNI exception:
java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)Lorg/apache/hadoop/hbase/HTableDescriptor;.
2016-03-29 11:51:31  ERROR   OGG-01668  Oracle GoldenGate Delivery, irhbase.prm:  PROCESS ABENDING.

So, I installed another version HBase and the version 1.1.4 worked just fine. I used simple, standard HBase configuration for pseudo-distributed mode where region server was on the same host as master and hbase.rootdir point to local hdfs.
Here is example of configuration:

<configuration>
<property>
  <name>hbase.cluster.distributed</name>
    <value>true</value>
    </property>
    <property>
      <name>hbase.rootdir</name>
        <value>hdfs://localhost:8020/user/oracle/hbase</value>
        </property>
</configuration>
[root@sandbox conf]# cat regionservers
localhost
[root@sandbox conf]#

As soon as we have HBase setup and running we can switch our attention to GoldenGate instead. We have already a trail file with initial load. Now we need to prepare our configuration files for initial and ongoing replication. Let’s go to our GoldenGate for Big Data home directory and prepare everything. In first, we need a hbase.conf file copied from $OGG_HOME/AdapterExamples/big-data/hbase directory to $OGG_HOME/dirprm. I left everything as it used to be in the original file changing only gg.classpath parameter to point it to my configuration files and libs for HBase.
Here is an example of the configuration files:

[oracle@sandbox oggbd]$ cat dirprm/hbase.props

gg.handlerlist=hbase

gg.handler.hbase.type=hbase
gg.handler.hbase.hBaseColumnFamilyName=cf
gg.handler.hbase.keyValueDelimiter=CDATA[=]
gg.handler.hbase.keyValuePairDelimiter=CDATA[,]
gg.handler.hbase.encoding=UTF-8
gg.handler.hbase.pkUpdateHandling=abend
gg.handler.hbase.nullValueRepresentation=CDATA[NULL]
gg.handler.hbase.authType=none
gg.handler.hbase.includeTokens=false

gg.handler.hbase.mode=tx

goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE

gg.log=log4j
gg.log.level=INFO

gg.report.time=30sec

gg.classpath=/u01/hbase/lib/*:/u01/hbase/conf:/usr/lib/hadoop/client/*

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

In second, we have to prepare a parameter file for our initial load. I used a simple file with minimum parameters.

[oracle@sandbox oggbd]$ cat dirprm/irhbase.prm
-- passive REPLICAT irhbase
-- Trail file for this example is located in "./dirdat/initld" file
-- Command to add REPLICAT
-- run replicat irhbase:
-- ./replicat paramfile dirprm/irhbase.prm reportfile dirrpt/irhbase.rpt
SPECIALRUN
END RUNTIME
EXTFILE /u01/oggbd/dirdat/initld
TARGETDB LIBFILE libggjava.so SET property=dirprm/hbase.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAP GGTEST.*, TARGET BDTEST.*;

Having that configuration file we can run the replicat in passive mode from command line and see the result.
Here is initial status for HBASE:

hbase(main):001:0> version
1.1.4, r14c0e77956f9bb4c6edf0378474264843e4a82c3, Wed Mar 16 21:18:26 PDT 2016

hbase(main):001:0> list
TABLE
0 row(s) in 0.3340 seconds

=> []
hbase(main):002:0>

Running the replicat:

oracle@sandbox oggbd]$ ./replicat paramfile dirprm/irhbase.prm reportfile dirrpt/irhbase.rpt
[oracle@sandbox oggbd]$

Now we have 2 tables in HBASE:

hbase(main):002:0> list
TABLE
BDTEST:TEST_TAB_1
BDTEST:TEST_TAB_2
2 row(s) in 0.3680 seconds

=> ["BDTEST:TEST_TAB_1", "BDTEST:TEST_TAB_2"]
hbase(main):003:0>

Let’s have a look to the tables structure and contains:


hbase(main):004:0> describe 'BDTEST:TEST_TAB_1'
Table BDTEST:TEST_TAB_1 is ENABLED
BDTEST:TEST_TAB_1
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MI
N_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.2090 seconds

hbase(main):005:0> scan 'BDTEST:TEST_TAB_1'
ROW                                            COLUMN+CELL
 1                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-01-22:12:14:30
 1                                             column=cf:PK_ID, timestamp=1459269153102, value=1
 1                                             column=cf:RND_STR, timestamp=1459269153102, value=371O62FX
 1                                             column=cf:RND_STR_1, timestamp=1459269153102, value=RJ68QYM5
 1                                             column=cf:USE_DATE, timestamp=1459269153102, value=2014-01-24:19:09:20
 2                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-05-11:05:23:23
 2                                             column=cf:PK_ID, timestamp=1459269153102, value=2
 2                                             column=cf:RND_STR, timestamp=1459269153102, value=371O62FX
 2                                             column=cf:RND_STR_1, timestamp=1459269153102, value=HW82LI73
 2                                             column=cf:USE_DATE, timestamp=1459269153102, value=2014-01-24:19:09:20
 3                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-01-22:12:14:30
 3                                             column=cf:PK_ID, timestamp=1459269153102, value=3
 3                                             column=cf:RND_STR, timestamp=1459269153102, value=RXZT5VUN
 3                                             column=cf:RND_STR_1, timestamp=1459269153102, value=RJ68QYM5
 3                                             column=cf:USE_DATE, timestamp=1459269153102, value=2013-09-04:23:32:56
 4                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-05-11:05:23:23
 4                                             column=cf:PK_ID, timestamp=1459269153102, value=4
 4                                             column=cf:RND_STR, timestamp=1459269153102, value=RXZT5VUN
 4                                             column=cf:RND_STR_1, timestamp=1459269153102, value=HW82LI73
 4                                             column=cf:USE_DATE, timestamp=1459269153102, value=2013-09-04:23:32:56
4 row(s) in 0.1630 seconds

hbase(main):006:0> scan 'BDTEST:TEST_TAB_2'
ROW                                            COLUMN+CELL
 7|IJWQRO7T|2013-07-07:08:13:52                column=cf:ACC_DATE, timestamp=1459269153132, value=2013-07-07:08:13:52
 7|IJWQRO7T|2013-07-07:08:13:52                column=cf:PK_ID, timestamp=1459269153132, value=7
 7|IJWQRO7T|2013-07-07:08:13:52                column=cf:RND_STR_1, timestamp=1459269153132, value=IJWQRO7T
1 row(s) in 0.0390 seconds

hbase(main):007:0>

Everything looks good for me. We have structure and records as expected. Let’s go forward and setup ongoing replication.
I have created a parameter file for my replicat using the the initial load parameters as a basis:

[oracle@sandbox oggbd]$ cat dirprm/rhbase.prm
REPLICAT rhbase
-- Trail file for this example is located in "dirdat/or" directory
-- Command to add REPLICAT
-- add replicat rhbase, exttrail dirdat/or
TARGETDB LIBFILE libggjava.so SET property=dirprm/hbase.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAP ggtest.*, TARGET bdtest.*;

We are checking our trail files and starting our replicat using the latest trail file. By default, a replicat would be looking for a trail with sequential number 0, but, since I have a purging policy on my GoldenGate it deletes old files and I need tell to replicat where to start exactly.

[oracle@sandbox oggbd]$ ll dirdat/
total 4940
-rw-r-----. 1 oracle oinstall    3028 Feb 16 14:17 initld
-rw-r-----. 1 oracle oinstall 2015199 Mar 24 13:07 or000043
-rw-r-----. 1 oracle oinstall 2015229 Mar 24 13:08 or000044
-rw-r-----. 1 oracle oinstall 1018490 Mar 24 13:09 or000045
[oracle@sandbox oggbd]$ ggsci

Oracle GoldenGate Command Interpreter
Version 12.2.0.1.0 OGGCORE_12.2.0.1.0_PLATFORMS_151101.1925.2
Linux, x64, 64bit (optimized), Generic on Nov 10 2015 16:18:12
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved.



GGSCI (sandbox.localdomain) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING


GGSCI (sandbox.localdomain) 2> add replicat rhbase, exttrail dirdat/or,EXTSEQNO 45
REPLICAT added.


GGSCI (sandbox.localdomain) 3> start replicat rhbase

Sending START request to MANAGER ...
REPLICAT RHBASE starting


GGSCI (sandbox.localdomain) 4> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
REPLICAT    RUNNING     RHBASE      00:00:00      00:00:06


GGSCI (sandbox.localdomain) 5> info rhbase

REPLICAT   RHBASE    Last Started 2016-03-29 12:56   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:08 ago)
Process ID           27277
Log Read Checkpoint  File dirdat/or000045
                     2016-03-24 13:09:02.000274  RBA 1018490


GGSCI (sandbox.localdomain) 6>

I inserted number of rows to test_tab_1 on oracle side and all of them were successfully replicated to HBASE.

hbase(main):015:0> count 'BDTEST:TEST_TAB_1'
Current count: 1000, row: 1005694
Current count: 2000, row: 442
Current count: 3000, row: 6333
3473 row(s) in 1.0810 seconds

=> 3473
hbase(main):016:0>

Let’s have a look bit close to test_tab_1 and test_tab_2:

hbase(main):005:0> scan 'BDTEST:TEST_TAB_1'
ROW                                            COLUMN+CELL
 1                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-01-22:12:14:30
 1                                             column=cf:PK_ID, timestamp=1459269153102, value=1
 1                                             column=cf:RND_STR, timestamp=1459269153102, value=371O62FX
 1                                             column=cf:RND_STR_1, timestamp=1459269153102, value=RJ68QYM5
 1                                             column=cf:USE_DATE, timestamp=1459269153102, value=2014-01-24:19:09:20
 2                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-05-11:05:23:23
 2                                             column=cf:PK_ID, timestamp=1459269153102, value=2
 2                                             column=cf:RND_STR, timestamp=1459269153102, value=371O62FX
 2                                             column=cf:RND_STR_1, timestamp=1459269153102, value=HW82LI73
 2                                             column=cf:USE_DATE, timestamp=1459269153102, value=2014-01-24:19:09:20
..............................................

hbase(main):006:0> scan 'BDTEST:TEST_TAB_2'
ROW                                            COLUMN+CELL
 7|IJWQRO7T|2013-07-07:08:13:52                column=cf:ACC_DATE, timestamp=1459269153132, value=2013-07-07:08:13:52
 7|IJWQRO7T|2013-07-07:08:13:52                column=cf:PK_ID, timestamp=1459269153132, value=7
 7|IJWQRO7T|2013-07-07:08:13:52                column=cf:RND_STR_1, timestamp=1459269153132, value=IJWQRO7T
1 row(s) in 0.0390 seconds

hbase(main):007:0>

You can see that row identifier for test_tab_1 is value for pk_id and for test_tab_2 it is concatenation of all values for all columns. Why is it so? The difference is in constraints for the tables. Since we don’t have a primary key or unique index for test_tab_2 it uses all columns as a key value. We can try to add a constraint and see the result.

select * from dba_constraints where owner='GGTEST' and table_name='TEST_TAB_2';

no rows selected

alter table ggtest.test_tab_2 add constraint pk_test_tab_2 primary key (pk_id);

Table altered.

insert into ggtest.test_tab_2 values(9,'PK_TEST',sysdate,null);

1 row created.

commit;

Commit complete.

orcl>

And let us comare with result on the HBASE:

hbase(main):012:0> scan 'BDTEST:TEST_TAB_2'
ROW                                           COLUMN+CELL
 7|IJWQRO7T|2013-07-07:08:13:52               column=cf:ACC_DATE, timestamp=1459275116849, value=2013-07-07:08:13:52
 7|IJWQRO7T|2013-07-07:08:13:52               column=cf:PK_ID, timestamp=1459275116849, value=7
 7|IJWQRO7T|2013-07-07:08:13:52               column=cf:RND_STR_1, timestamp=1459275116849, value=IJWQRO7T
 8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER   column=cf:ACC_DATE, timestamp=1459278884047, value=2016-03-29:15:14:37
 8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER   column=cf:PK_ID, timestamp=1459278884047, value=8
 8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER   column=cf:RND_STR_1, timestamp=1459278884047, value=TEST_INS1
 8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER   column=cf:TEST_COL, timestamp=1459278884047, value=TEST_ALTER
 9                                            column=cf:ACC_DATE, timestamp=1462473865704, value=2016-05-05:14:44:19
 9                                            column=cf:PK_ID, timestamp=1462473865704, value=9
 9                                            column=cf:RND_STR_1, timestamp=1462473865704, value=PK_TEST
 9                                            column=cf:TEST_COL, timestamp=1462473865704, value=NULL
3 row(s) in 0.0550 seconds

hbase(main):013:0>

It is fully dynamic and changed row id column on the fly. Will it work with unique index? Yes it will :


delete from ggtest.test_tab_2 where pk_id=9;

1 row deleted.

alter table ggtest.test_tab_2 drop constraint pk_test_tab_2;

Table altered.

create unique index ggtest.ux_test_tab_2 on ggtest.test_tab_2 (pk_id);

Index created.

insert into ggtest.test_tab_2 values(10,'UX_TEST',sysdate,null);

1 row created.

commit;

Here is the newly inserted row.

hbase(main):017:0> scan 'BDTEST:TEST_TAB_2'
ROW                                           COLUMN+CELL
 10                                           column=cf:ACC_DATE, timestamp=1462474389145, value=2016-05-05:14:53:03
 10                                           column=cf:PK_ID, timestamp=1462474389145, value=10
 10                                           column=cf:RND_STR_1, timestamp=1462474389145, value=UX_TEST
 10                                           column=cf:TEST_COL, timestamp=1462474389145, value=NULL
 7|IJWQRO7T|2013-07-07:08:13:52               column=cf:ACC_DATE, timestamp=1459275116849, value=2013-07-07:08:13:52
 7|IJWQRO7T|2013-07-07:08:13:52               column=cf:PK_ID, timestamp=1459275116849, value=7

But it will not make any difference if we just create an index on the source. It will not change anything. So, if we need to identify a key for a table we have to have at least unique constraint. Of course it is just default behavior for a schema replication and we may use KEYCOLS to identify keys for some tables.

Interesting that if we change a table structure it will affect all newly inserted rows but will not change existing even if we update some values. It works by this way if you have an unique identifier and it was not changed by your DDL operation.
Here is an example. We have a column “TEST_COL” in the table test_tab_2. Let’s drop the column and update the row. Keep in mind that our primary key is column PK_ID and we are not modifying the key.

alter table ggtest.test_tab_2 drop column TEST_COL;

Table altered.

update ggtest.test_tab_2 set rnd_str_1='TEST_COL' where pk_id=9;

1 row updated.

commit;

In HBASE we can see the same set of columns:

hbase(main):030:0> scan 'BDTEST:TEST_TAB_2'
ROW                                           COLUMN+CELL
 9                                            column=cf:ACC_DATE, timestamp=1462477581440, value=2016-05-05:15:46:13
 9                                            column=cf:PK_ID, timestamp=1462477794597, value=9
 9                                            column=cf:RND_STR_1, timestamp=1462477794597, value=TEST_COL
 9                                            column=cf:TEST_COL, timestamp=1462477581440, value=NULL
1 row(s) in 0.0200 seconds

We still have the deleted column TEST_COL even we’ve updated the row.
But if we insert any new row it will have the new set of columns:

insert into ggtest.test_tab_2 values(10,'TEST_COL',sysdate);

1 row created.

commit;

Commit complete.

And in HBASE:

hbase(main):031:0> scan 'BDTEST:TEST_TAB_2'
ROW                                           COLUMN+CELL
 10                                           column=cf:ACC_DATE, timestamp=1462477860649, value=2016-05-05:15:50:55
 10                                           column=cf:PK_ID, timestamp=1462477860649, value=10
 10                                           column=cf:RND_STR_1, timestamp=1462477860649, value=TEST_COL
 9                                            column=cf:ACC_DATE, timestamp=1462477581440, value=2016-05-05:15:46:13
 9                                            column=cf:PK_ID, timestamp=1462477794597, value=9
 9                                            column=cf:RND_STR_1, timestamp=1462477794597, value=TEST_COL
 9                                            column=cf:TEST_COL, timestamp=1462477581440, value=NULL
2 row(s) in 0.0340 seconds

And, as for all other cases, truncate on source table is not going to be replicated to the target and the operation will be ignored. You have to truncate the table in HBASE by yourself to keep the data in sync. In case you insert data again the data in HBASE will be “updated”. But it will not delete other rows. It will be more like a “merge” operation.
Here is an example:

truncate table ggtest.test_tab_2;

Table truncated.

insert into ggtest.test_tab_2 values(10,'TEST_COL2',sysdate);

1 row created.

commit;

Commit complete.

select * from ggtest.test_tab_2;

	   PK_ID RND_STR_1  ACC_DATE
---------------- ---------- -----------------
	      10 TEST_COL2  05/05/16 16:01:20

orcl>

HBASE:
hbase(main):033:0> scan 'BDTEST:TEST_TAB_2'
ROW                                           COLUMN+CELL
 10                                           column=cf:ACC_DATE, timestamp=1462478485067, value=2016-05-05:16:01:20
 10                                           column=cf:PK_ID, timestamp=1462478485067, value=10
 10                                           column=cf:RND_STR_1, timestamp=1462478485067, value=TEST_COL2
 9                                            column=cf:ACC_DATE, timestamp=1462477581440, value=2016-05-05:15:46:13
 9                                            column=cf:PK_ID, timestamp=1462477794597, value=9
 9                                            column=cf:RND_STR_1, timestamp=1462477794597, value=TEST_COL
 9                                            column=cf:TEST_COL, timestamp=1462477581440, value=NULL
2 row(s) in 0.0300 seconds

hbase(main):034:0>

I spent some time testing performance and found the main bottleneck was my Oracle source rather than GoldenGate and HBASE. I was able to sustain transaction rate up to 60 DML per second and my Oracle DB started to struggle to keep pace because of waiting for a commit. The HBASE and replicat were absolutely fine. I also checked how it handles big transactions and inserted about 2 billion rows by one transaction. It worked fine. Of course it doesn’t prove that any of your production configurations will be without any performance issues. To conduct real performance tests I need to use much bigger environment.
In addition, I noticed one more minor error in Oracle documentation for adapter related to “keyValuePairDelimiter” parameter. In documentation it is replaced by “keyValueDelimiter”. It just small mistype and the “keyValueDelimiter” is repeated twice. First time it is correct and the second time it stands on the place where “keyValuePairDelimiter” is supposed to be. Here is the link.

As a summary I can say that despite some minor issues the adapters and GoldenGate for Big Data showed quite mature status and readiness for real work. I think it is good robust technology and, hopefully, its development will continue improving it with new releases. I am looking forward to use it in a real production environment with significant workload. In following posts I will try to test different DDL operations and maybe some other datatypes. Stay tuned.

Categories: DBA Blogs

Learn how to optimize text searches in SQL Server 2014 by using Full-Text Search – Part 1

Tue, 2016-05-10 08:50

In this article, we’ll cover the functionality available in SQL Server 2014 for textual research known as Full-Text Search, its installation and implementation. Additionally, we will see how to develop textual searches using the predicates CONTAINS, CONTAINSTABLE, FREETEXT and FREETEXTTABLE, and use the FILESTREAM feature to improve the research and storage of binary data in tables.

The research based on words and phrases is one of the main features of the search tools on the web, like Google, and digital document management systems. To perform these searches efficiently, many developers create highly complex applications that do not have the necessary intelligence to find terms and phrases in the columns that store text and digital documents in the database tables.

What the vast majority of these professionals don’t know is that SQL Server has an advanced tool for textual research, the Full-Text Search (FTS).

FTS has been present in SQL Server since version 7, and through use textual searches can be performed both in columns that store characters, and in columns that store documents (for example, Office documents and PDFs), in its native form.

With options like searches for words and phrases, recognition of different languages, derivation of words (for example: play, played and playing), the possibility of developing a thesaurus, the creation of ranked results, and elimination of stopwords for search, FTS becomes a powerful tool for textual searches. As main factors for the use of textual searches we have:

  • The current databases are increasingly used as repositories of digital documents;
  • The cost for storage of information has slowed considerably, enabling the storage of Gigabytes, Terabytes and even Petabytes;
  • New types of digital documents are constantly being created, and the requirements for their storage, and subsequent research, are becoming larger and more complex;
  • Developers need a robust and reliable interface for performing textual research intelligence.

FTS has great advantages over other alternatives for textual research. For example, the command LIKE. The main tasks you can perform with FTS are:

  • Textual research based on linguistics. A linguistic research is based on words or phrases in a particular language, taking into consideration the verb conjugation, derived words, accent, among other features. Unlike the LIKE predicate, FTS uses an efficient indexing structure to perform textual research;
  • Automatic removal of stopwords informed in a textual research. The following are considered stopwords ones that don’t add to the result of the survey, such as from, to, the, the, a, an;
  • Assigning weights to the terms searched, making certain words are more important than others within the same textual research;
  • Generation of prioritization, allowing a better view of the documents that are most relevant according to the research carried out;
  • Indexing and searching in the most diverse types of digital documents. With FTS you can carry out searches in text files, spreadsheets, ZIP files, among others.

In this article will describe the architecture of the FTS, your installation and configuration, the main T-SQL commands used in textual research, the use of FTS in conjunction with the FILESTREAM, and also some techniques to optimize searches through the FTS.

FTS architecture

The architecture of the FTS has several components working in conjunction with the SQL Server query processor to perform textual research efficiently. The Figure 1 illustrates the major components of the architecture of the FTS. Let’s look at some of them:

  • Client Consultation: The client application sends the textual queries to the SQL Server query processor. It is the responsibility of the client application to ensure that the textual queries are written in the right way by following the syntax of FTS;
  • SQL Server Process (sqlservr.exe): The SQL Server process contains the query processor and also the engine of the FTS, which compiles and executes the textual queries. The integration between SQL Server and process the FTS offers a significant performance boost because it allows the query processor lot more efficient execution plans for textual searches;
  • SQL Server Query Processor: The query processor has multiple subcomponents that are responsible for validating the syntax, compile, generate execution plans and execute the SQL queries;
  • Full-Text Engine: When the SQL Server query processor receives a query FTS, it forwards the request to the FTS Engine. The Engine is responsible for validating FTS the FTS query syntax, check the full-text index, and then work together with the SQL Server query processor to return the textual search results;
  • Indexer: The indexer works in conjunction with other components to populate the full-text index;
  • Full-Text Index: The full-text index contains the most relevant words and their respective positions within the columns included in the index;
  • Stoplist: A stoplist is a list of stopwords for textual research. The indexer stoplist query during the indexing process and implementation of textual research to eliminate the words that don’t add value to the survey. SQL Server 2014 stores the stoplists within the database itself, thus facilitating their administration;
  • Thesaurus: The thesaurus is an XML file (stored externally to the database) in which you can define a list of synonyms that can be used for the textual research. The thesaurus must be based on the language that will be used in the search. The full-text engine reads the thesaurus file at the time of execution of research to verify the existence of synonyms that can increase the quality and comprehensiveness of the same;
  • Filter daemon host (fdhost.exe): Is responsible for managing the processes of filtering, word breaker and stemmer;
  • SQL Full-Text Filter Daemon Launcher (fdlauncher.exe): Is the process that starts the Filter daemon host (Fdhost.exe) when the full-text engine needs to use some of the processes managed by the same.

FTS1

Figure 1. Architecture of FTS.

For the better understanding of the process of creation, use and maintenance of the structure of full-text indexes, you must also know the meaning of some important concepts. They are:

  • Term: The word, phrase or character used in textual research;
  • Full-Text Catalog: A group of full-text indexes;
  • Word breaker: The process that is the barrier every word in a sentence, based on the grammar rules of the language selected for the creation of full-text index;
  • Token: A word, phrase or character defined by the word breaker;
  • Stemmer: The process that generates different verb forms for the words, based on the grammar rules of the language selected for the creation of full-text index;
  • Filter: Component responsible for extracting textual information from documents stored with the data type varbinary(max) and send this information to the process word breaker.
Indexing process

The indexing process is responsible for the initial population of a full-text index and update of this index when the data modifications occur on the columns that have been indexed by FTS. This initialization process and update the full-text index named crawl.

When the crawl process is started, the FTS component known as protocol handler accesses the data in the table being indexed and begins the process to load into memory the existing content in this table, also known as streaming. To have access to data which are stored on disk, the protocol handler allows FTS to communicate with the Storage Engine. After the end of streaming the filter daemon host process performs data filtering, and initiates the processes of word breaker and stemmer for the filling in of the full-text index.

During the indexing process the stoplist is queried to remove stopwords, and so fill the structure of the full-text index with words that are meaningful to the textual research. The last step of the indexing process is known as a master merge, in which every word indexed are grouped in a single full-text index.

Despite the indexing process requires a high i/o consumption, it is not necessary to the blocking of the data being indexed. However a query performed using a full-text index during the indexing process can generate a result incomplete.

Full-Text query processing

For the full-text query processing are used the same words and phrases limiters that were defined by the Word breaker during the indexing process. You can also use additional components, as for example, the stemmer and the thesaurus, depending on the full-text predicates (CONTAINS or FREETEXT) used in the textual research. The use of full-text predicates will be discussed later in this article.

The process stemmer generates inflectional forms of the searched words. For example, from the term “play” is searched also the terms “played”, “play”, “play” beyond the term itself “play”.

Through rules created in the thesaurus file you can use synonyms to replace or expand the searched terms. For example, when performing a textual search using the term “Ruby”, the full-text engine can replace it by the synonym “red”, or else expand the research considering the terms automatically “red”, “wine”, “Scarlet” and also “Ruby”.

After processing of the full-text query, the full-text engine provides information to SQL query processor that assist in creating an execution plan optimized for textual research. There is a greater integration between the full-text engine and the query processor of SQL (both are components of the SQL Server process), enabling textual searches are conducted in a more optimized.

In the next post of this 4 part series, we will learn how to install the FTS and how to use it. Stay tuned!

Categories: DBA Blogs