Feed aggregator

Database Migration and Integration using AWS DMS

Kubilay Çilkara - Thu, 2016-05-12 14:12


Amazon Web Services (AWS) recently released a product called AWS Database Migration Service (DMS) to migrate data between databases.

The experiment

I have used AWS DMS to try a migration from a source MySQL database to a target MySQL database, a homogeneous database migration.

The DMS service puts a resource in the middle, the Replication Instance - an automatically created EC2 instance - plus source and target Endpoints, and then moves data from the source database to the target database. Simple as that. DMS is also capable of heterogeneous database migrations, such as MySQL to Oracle, and even ongoing synchronisation. In addition, AWS DMS comes with a client tool called the AWS Schema Conversion Tool, which helps you convert source database objects such as stored procedures to the target database format. All things a cloud data integration project needs!

In my experiment and POC, I was particularly interested in the ability of the tool to move a simple data model with a 1-n relationship between tables t0 (parent) and t1 (child), as shown below.

(Use the DDL below to quickly create the two tables t0 and t1 with a 1-n relationship. Create the tables on both the source and the target database.)

t0 -> t1 Table DDL (Pseudo code)

CREATE TABLE `t0` (
  `id` int(11) NOT NULL,
  `txt` varchar(100) CHARACTER SET ucs2 DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `t1` (
  `id` mediumint(9) NOT NULL AUTO_INCREMENT,
  `t0id` int(9) DEFAULT NULL,
  `txt` char(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `t0id` (`t0id`),
  CONSTRAINT `t1_ibfk_1` FOREIGN KEY (`t0id`) REFERENCES `t0` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
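To exercise the 1-n relationship and the ON DELETE CASCADE during the test, you can seed the source with a couple of rows and then delete the parent. This is a small illustration added for clarity, not part of the original experiment script:

INSERT INTO t0 (id, txt) VALUES (1, 'parent row');
INSERT INTO t1 (t0id, txt) VALUES (1, 'child row');
-- Deleting the parent on the source cascades to t1; the replication has to replay both effects on the target
DELETE FROM t0 WHERE id = 1;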


In this experiment, I didn't want to see just a migration - a copy - of a table from the source database to the target database. I was more interested in how easy it is to migrate a data model - with the primary key and foreign key relationships in place - from the source database to the target database with zero downtime, using the CDC (Change Data Capture), or ongoing-replication, capability of AWS DMS. That is, a zero-downtime database migration.

Here are the results of the experiment.

AWS DMS is easy to get going with: you can quickly set up an agent (the Replication Instance), define source and target endpoints, and start mapping the tables to be migrated from the source database to the target database, all conveniently from the AWS console.

Once you have set up your replication instance and endpoints, create a Migration Task (say Alpha) and do an initial full migration (load) from the source database to the target database. Do this with the foreign keys (FKs) disabled on the target. The AWS DMS guide recommends this, at least for MySQL targets, so the load can run as fast as possible using parallel threads.
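As a rough sketch of what "foreign keys disabled on the target" means for a MySQL target - in DMS itself this is normally configured on the target endpoint (for example through an extra connection attribute) rather than typed by hand, so treat the statements below purely as an illustration:

-- Disable constraint and unique checks for the session performing the bulk load
SET FOREIGN_KEY_CHECKS = 0;
SET UNIQUE_CHECKS = 0;
-- ... Alpha full load runs here ...
-- Re-enable the checks once the load is complete
SET FOREIGN_KEY_CHECKS = 1;
SET UNIQUE_CHECKS = 1;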

Then create a second Migration Task (say Beta) using a different target endpoint, this time with the foreign keys enabled. You can do this even before the full load with Alpha, to avoid waiting. Configure the Beta task to run continuously and let it integrate and sync the delta that occurred during the initial load; you can even start it from a cut-off timestamp. It uses the source MySQL database's binlogs to propagate the changes. If you don't create the Beta task - that is, a task using a different target endpoint with the parameter that enables the FKs - then DELETE statements run on the source during the migration will not propagate to the target correctly, and the CASCADEs to the child tables will not fire on the target. CASCADE is a property of the foreign key.

To reconcile - to find out whether everything had migrated - I counted the rows in each table on the source and target databases. I used Pentaho Spoon CE to quickly create a job that counts the rows on both sides and validates the migration/integration interfaces.
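The reconciliation itself boils down to running the same counts on both sides and comparing the numbers; for the two tables in this experiment the query is as simple as:

SELECT 't0' AS table_name, COUNT(*) AS row_count FROM t0
UNION ALL
SELECT 't1', COUNT(*) FROM t1;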

Overall, I found AWS DMS very easy to use. It quickly helps you wire up an integration interface in the cloud and start pumping and syncing data between source and target databases, whether on premise or in the cloud - a kind of middleware setup, AWS style. No more separate middleware tools for data migration; AWS now has its own. 
Categories: DBA Blogs

Simbec-Orion Group Streamlines its Clinical Trial Management Processes with Oracle’s eClinical Cloud Suite

Oracle Press Releases - Thu, 2016-05-12 12:25
Press Release
Simbec-Orion Group Streamlines its Clinical Trial Management Processes with Oracle’s eClinical Cloud Suite

Redwood Shores, Calif.—May 12, 2016

As more pharmaceutical companies outsource their clinical trials to contract research organizations (CROs), CROs are challenged to complete the clinical research in a more efficient and timely manner. Oracle Health Sciences today announced that Simbec-Orion Group Limited, a full service, international, boutique CRO, has adopted Oracle’s suite of eClinical cloud solutions to simplify the management of its customer’s clinical studies.

Oracle’s life sciences-specific solutions, combined with its broader product and service offering, are used by thousands of pharma, biopharma, and academic research centers worldwide to redesign clinical R&D processes, advance drug therapies, and optimize care delivery to improve the health, well-being, and lives of people globally.

Simbec-Orion Group selected cloud-based Oracle Health Sciences InForm, Oracle's Siebel Clinical Trial Management System, and Oracle Health Sciences IRT to supplement its clinical development processes, streamline data capture, and speed up trial timelines as Simbec-Orion expands its international, full-service business offerings.

“We reviewed a wide range of providers to ensure that, as a world-class CRO, we were working with best-in-class systems,” said Ronald Openshaw, CEO, Simbec-Orion. “We chose to partner with Oracle, the market leader in eClinical solutions, as we wanted to rely upon an excellent, secure, robust, and state-of-the-art service that will meet current and foreseeable market needs. We believe the scalability will drive efficiency in our work practices as we grow, especially in light of increasingly complex clinical trials.”

“Oracle shares Simbec-Orion’s commitment to helping pharma and biotechnology companies deliver high-quality therapies to the patients we serve,” said Steve Rosenberg, senior vice president and general manager, Oracle Health Sciences. “We look forward to working with Simbec-Orion as it further expands and enhances its clinical development services powered by our innovative technology.”

Simbec-Orion Group was created in 2014 as an international boutique CRO through the merger of two independent CROs, Simbec Research and Orion Clinical. Simbec-Orion Group now offers a full spectrum of drug development services, from first-in-human Phase I clinical studies through to pivotal Phase III studies and Phase IV post-marketing studies. Starting in January 2016, unless the client expresses different requirements, new studies across the Simbec-Orion Group will use Oracle’s eClinical suite.

Contact Info
Samantha Thai
Oracle
+1.510.917.5564
samantha.thai@oracle.com
About Oracle

Oracle offers a comprehensive and fully integrated stack of cloud applications and platform services. For more information about Oracle (NYSE:ORCL), visit www.oracle.com.

Trademarks

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Safe Harbor

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle Corporation. 

Talk to a Press Contact

Samantha Thai

  • +1.510.917.5564

China-Based Contract Research Organization Selects Oracle Cloud to Optimize Clinical Trial Data Management

Oracle Press Releases - Thu, 2016-05-12 12:25
Press Release
China-Based Contract Research Organization Selects Oracle Cloud to Optimize Clinical Trial Data Management

Redwood Shores, Calif.—May 12, 2016

Fountain Medical Development (FMD), a full-service Contract Research Organization (CRO) headquartered in Beijing, China, and its US affiliate, FMD K&L, have adopted Oracle’s cloud-based clinical data management solutions, to increase operational efficiencies and speed time to market. The solutions deliver increased productivity to the company’s drug development process, reduce overall costs, improve communications with its sponsors, and advance clinical study quality and speed, worldwide.

FMD was able to retire its aging, antiquated, legacy system and migrate to a more cost-effective cloud-based environment with Oracle Health Sciences InForm and Oracle Siebel Clinical Trial Management System (CTMS). The new, unified, technology platform has enhanced capabilities in data management, study planning and design, trial operations, analytics, and patient engagement.

“We are committed to delivering innovative solutions to our sponsors. This is why FMD decided to work with Oracle,” said Dr. Henry Wu, CEO, Fountain Medical Development. “Oracle’s data management solutions will enable us to improve our data quality, advance our trial data management efficiencies, reduce costs, and gain a deeper understanding of our clinical trial data across FMD’s integrated enterprise globally.”

Fountain Medical Development is a leading CRO offering a full range of International Conference on Harmonization Good Clinical Practice (ICH GCP) compliant clinical research services for multinational clients conducting clinical research in China. As one of the fastest growing clinical CROs in East Asia and China, FMD has established 12 operational offices throughout China, Hong Kong, Taiwan, Japan, Korea, the United States, and Armenia, with a strong focus on data management and biostatistics.

By moving to Oracle Health Sciences InForm and Oracle Siebel CTMS, FMD will gain powerful analytics capabilities that will provide a faster and better understanding of its clinical trial data across its locations, saving the company time and effort.

“Clinical research is becoming more complex and holistic by the day, and contract researchers are challenged to provide more services than ever before. Oracle Health Sciences provides global contract research organizations, like FMD, with an integrated platform and advanced analytics tools to help CROs and their sponsors find the cures of tomorrow,” said Steve Rosenberg, senior vice president and general manager, Oracle Health Sciences.

FMD is a member of the Oracle CRO Advantage Program, an initiative that helps drive CRO success with Oracle Health Sciences products through training, sales and marketing support, and bi-directional communication on Oracle products and strategy.

Contact Info
Samantha Thai
Oracle
+1.510.917.5564
samantha.thai@oracle.com
About Oracle

Oracle offers a comprehensive and fully integrated stack of cloud applications and platform services. For more information about Oracle (NYSE:ORCL), visit www.oracle.com.

Trademarks

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Safe Harbor

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle Corporation. 

Talk to a Press Contact

Samantha Thai

  • +1.510.917.5564

Oracle BPM 12c: Browsing the SOAINFRA

Jan Kettenis - Thu, 2016-05-12 12:17
In this article I discuss some tables from the SOAINFRA schema that might be most interesting to use when trying to find out why you don't see in Enterprise Manager what you expect.

Going from 11g to 12c, some things have significantly changed in the SOAINFRA schema. For example, your normal partners in helping with "what happened with my process?" type of queries, like the component_instance, and bpm_process tables, have become obsolete. On the other hand you have new friends with tables like sca_flow_instance, and sca_entity.

The following discusses some tables that you might want to look into when digging in the dirt of the SOA/BPM engine's intestines.

The tables I would like to discuss in more detail are:
- sca_flow_instance
- cube_instance
- wftask
- sca_entity
- bpm_cube_process
- bpm_cube_activity

Given that there is no official documentation on these tables, this is based on my observations and interpretations. No guarantee that these are flawless, so if you have anything to improve or add, let me know!

To better understand the data in the SOAINFRA in relation to an actual process, I used one composite with the following process, which has two subprocesses (another BPM process and a BPEL process). The BPM subprocess has not been implemented as a reusable process (with a Call activity) but instead as a process-as-a-service.






As a side note: originally I created this process to verify how the different states a process and its children can have are represented in Enterprise Manager, because on one of my projects there were some doubts whether this is always correct, given some issues in the past with 11g. With 12c I could find none. However, as the test case does not cover inter-composite interaction, nor all types of technologies, you could argue that it is too limited to conclude anything from. Also worth mentioning is that the instances were run on a server in development mode, and without in-memory optimization. I have heard rumors that you will observe different behavior when you disable auditing completely. In some next posting I hope to discuss that as well.

I initiated several instances, one for each possible state:


sca_flow_instance
As the name already suggests, this table contains 1 entry for each flow instance. You might be interested in the following columns:
  •   flow_id
  •   title
  •   active_component_instances
  •   recoverable_faults
  •   created_time
  •   updated_time

When queried this looks similar to this:

    The query used is like this:

    select sfi.flow_id
    ,      sfi.title
    ,      sfi.active_component_instances
    ,      sfi.recoverable_faults
    ,      sfi.created_time
    ,      sfi.updated_time
    from  sca_flow_instance sfi
    order by sfi.created_time

    cube_instance
    This table contains 1 entry for each component instance in the flow (e.g. bpmn, bpel). You might be interested in the following columns:
    • flow_id
    • composite_label (*)
    • cpst_inst_created_time (**)
    • composite_name
    • composite_revision
    • component_name
    • componenttype
    • state (of the component)
    • creation_date (incl time)
    • modify_date (incl time)
    • conversation_id

    (*) corresponds with the bpm_cube_process.scalabel
    (**) equals sca_flow_instance.created_time

    When queried this looks similar to this:

    The query used is like this:

    select cis.flow_id
    ,      cis.componenttype
    ,      cis.component_name
    ,      cis.state
    from   cube_instance cis
    order by cis.flow_id


    wftask
    This table contains an entry for each open process activity and open or closed human activity. You might be interested in the following columns:
    • flow_id
    • instanceid
    • processname
    • accesskey (*) (**)
    • createddate
    • updateddate
    • (only in case of human tasks, the flex fields)
    • componentname
    • compositename (not for human tasks)
    • conversationid
    • componenttype (***)
    • activityname
    • activityid (****)
    • component_instance_id (only for human tasks)
    • state (*****)

    (*) : the type of activity, e.g. USER_TASK, INCLUSIVE_GATEWAY, END_EVENT
    (**) not for human tasks
    (***) e.g. Workflow, BPMN
    (****) Corresponds with the activityid of bpm_cube_activity. The user activity and its corresponding human task appear to have the same activityid. After the human task is completed, the user activity disappears but the human task is kept with a null state.
    (*****) e.g. OPEN for running activities, ASSIGNED for running human tasks. Other states are ABORTED, PENDING_MIGRATION_SUSPENDED, ERRORED, etc.

    When queried this looks similar to this:


    The query used is like this:

    select wft.instanceid
    ,      wft.processname
    ,      wft.accesskey
    ,      wft.createddate
    ,      wft.updateddate
    ,      wft.componentname
    ,      wft.compositename
    ,      wft.conversationid
    ,      wft.componenttype
    ,      wft.activityname
    ,      wft.activityid
    ,      wft.component_instance_id
    ,      wft.state
    from   wftask wft
    where  wft.flow_id = 130001
    order by wft.updateddate

    sca_entity
    This table contains an entry for each SCA entity (e.g. service, wire). The following columns might be of use:
    •  id
    •  composite (name)
    •  label (corresponds with the scalabel of bpm_cube_process)

    When queried this looks similar to this:


    The query used is like this:

    select sen.composite
    ,      sen.id
    ,      sen.label
    from   sca_entity sen
    where  sen.composite = 'FlowState'
    order by sen.composite

    bpm_cube_process
    This table contains metadata. For each deployed composite it contains an entry for each BPM process: if there are 2 BPM processes in one composite, there are 2 entries. The following columns might be of use:
    • domainname
    • compositename
    • revision
    • processid
    • processname
    • scalabel
    • compositedn
    • creationdate  (incl time)
    • undeploydate
    • migrationstatus (*)
    (*) Values are LATEST, MIGRATED.

    When queried this looks similar to this:



    The query used is like this:


    select bcp.domainname
    ,      bcp.compositename
    ,      bcp.revision
    ,      bcp.processname
    ,      bcp.processid
    ,      bcp.scalabel
    ,      bcp.compositedn
    ,      bcp.creationdate
    ,      bcp.undeploydate
    ,      bcp.migrationstatus
    from   bpm_cube_process bcp
    where  bcp.compositename = 'FlowState'
    order by bcp.processname
    ,        bcp.creationdate


    bpm_cube_activity
    This table contains metadata. There is an entry for each individual activity, event, and gateway of a BPMN process. The following columns might be of use:
    • processid (corresponds with the bpm_cube_process.processid)
    • activityid
    • activityname (technical, internal name can be found in the .bpmn source)
    • activitytype (e.g. START_EVENT, SCRIPT_TASK, CALL_ACTIVITY, etc.)
    • label (name as in the BPMN diagram)
    The rows in the example below have been queried by a join with the bpm_cube_process table on processid, where undeploydate is not null and migrationstatus is 'LATEST' to get only the activities of the last revision of one particular process:


    The query used is like this:

    select act.processid
    ,      act.activityid
    ,      act.activityname
    ,      act.activitytype
    ,      act.label
    from   bpm_cube_activity act
    join   bpm_cube_process bcp
    on     bcp.processid = act.processid
    where  bcp.compositename = 'FlowState'
    and    bcp.undeploydate is not null
    and    bcp.migrationstatus = 'LATEST'
    order by act.activitytype
    ,        act.label

    Obsolete Tables
    The following tables have become obsolete:
    • bpm_activity
    • bpm_activity_instance
    • bpm_cube_activity_instance
    • bpm_process
    • component_instance
    The composite_instance table is still used, but it is more or less superseded by sca_flow_instance (although the numbers of instances are not the same). I no longer find it useful to query.
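    To tie a few of these tables together, here is a sketch of my own (not taken from any official documentation, so the same disclaimer applies) that joins sca_flow_instance to cube_instance on flow_id, giving one row per component instance within each flow:

    select sfi.flow_id
    ,      sfi.title
    ,      cis.component_name
    ,      cis.componenttype
    ,      cis.state
    ,      cis.creation_date
    from   sca_flow_instance sfi
    join   cube_instance cis
    on     cis.flow_id = sfi.flow_id
    order by sfi.flow_id
    ,        cis.creation_date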

    Properly removing users in MySQL

    Pythian Group - Thu, 2016-05-12 09:58

    Removing users in MySQL is an easy task, but there are some drawbacks to check carefully before dropping a user. Not taking these possible issues into consideration can render your application unusable.

    First it is important to understand the concept of user in MySQL. A user has three main functions: authentication, privileges and resources. This is different from other databases: in MySQL users don’t own objects. An object belongs to a database and there is no direct relationship between objects and users. This simplifies the process of deleting a user because no objects will be erased.

    But users, as I wrote before, have an associated set of privileges that define what database sessions can do, and those privileges also apply to stored program and view execution.

    At the same time, procedures, functions, triggers, and views have two possible security contexts: invoker and definer. If they are created using the invoker security model, then the user calling the routine or view must have enough privileges to execute the code within the routine or view. But if created using the definer security model, the code can be executed if the definer has enough privileges to execute it. And yes, the default security model in MySQL is definer.

    This means that, unless declared otherwise, most routines will check the privileges of the user that created them. If somebody removes that user, querying the view or executing the code will fail with an error. All the procedures, functions, views and triggers created by that user with the default options will become unusable.

    How do we fix this? The quick and dirty way is to recreate the user with a different password to prevent logins. It is not an elegant solution, but it is probably the first thing you will do while you work out the proper fix. Another alternative, if you are running MySQL 5.7, is account locking: this feature disables login for that account but still allows code and views to be executed, as shown in the sketch below. In any case, it is good practice to make a backup of the user creation scripts prior to dropping the user; Percona Toolkit has the utility pt-show-grants for that purpose.
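    A minimal sketch of both workarounds, assuming the definer is test_user@'%' as in the example further down (ACCOUNT LOCK requires MySQL 5.7 or later):

    -- Quick and dirty: recreate the dropped definer with an unusable password
    CREATE USER 'test_user'@'%' IDENTIFIED BY 'some-long-random-password';
    -- MySQL 5.7+: keep the account but block logins; definer-context code keeps working
    ALTER USER 'test_user'@'%' ACCOUNT LOCK;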

    The elegant way to avoid the problem is to check that there are no routines or views that use the definer security model and are configured to run with the privileges of the user you want to remove. There are several tables in the information_schema catalog that provide this information.

    Unfortunately there is no easy way to change these attributes. The best thing you can do is drop and recreate those objects using different security characteristics.

    Let's see an example:

    Connect to the database using an account with enough privileges to create users, databases and procedures and create a new database only for testing purposes.

    mysql> create database remove_test;
    Query OK, 1 row affected (0,05 sec)

    Create a user with privileges on the database created in previous step:

    mysql> grant all privileges on remove_test.* to test_user@'%' identified by 'test';
    Query OK, 0 rows affected, 1 warning (0,20 sec)

    Connect to the database using the user created:

    $ mysql -u test_user -ptest remove_test

    Create a view, check the attributes and execute it. We are going to use a couple of functions that I will explain a bit later.

    mysql> create view test_view as select current_user(),user();
    Query OK, 0 rows affected (0,05 sec)
    mysql> show create view test_view\G
    *************************** 1. row ***************************
                    View: test_view
             Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`test_user`@`%` SQL SECURITY DEFINER VIEW `test_view` AS select current_user() AS `current_user()`,user() AS `user()`
    character_set_client: utf8
    collation_connection: utf8_general_ci
    1 row in set (0,00 sec)
    
    mysql> select * from test_view;
    +----------------+---------------------+
    | current_user() | user()              |
    +----------------+---------------------+
    | test_user@%    | test_user@localhost |
    +----------------+---------------------+
    1 row in set (0,00 sec)

    As we can see, the view has been created with SQL SECURITY DEFINER and DEFINER=`test_user`@`%`. The view returns the value from two functions: current_user() returns the value that matches against the mysql.user table and that defines the privileges the current session or current execution context has. The function user() returns the host you are connected from and the username specified as connection identifier.

    Now we reconnect to the database using the privileged account from the first step, and requery the view:

    $ mysql -u root -p remove_test
    mysql> select * from test_view;
    +----------------+----------------+
    | current_user() | user()         |
    +----------------+----------------+
    | test_user@%    | root@localhost |
    +----------------+----------------+
    1 row in set (0,00 sec)

    The view is executed by root@localhost but with the privileges of test_user@%. It is important to note that if you monitor the execution of a routine or view, using SHOW PROCESSLIST or a query against information_schema.processlist, the contents of user and host are the same as what current_user() returns, not what user() returns.

    Now we will drop the user and query once again the view.

    mysql> drop user test_user@'%';
    Query OK, 0 rows affected (0,07 sec)
    mysql> select * from test_view;
    ERROR 1449 (HY000): The user specified as a definer ('test_user'@'%') does not exist

    If you don't see the error and still get the correct results, it is because the result of the view is stored in the query cache. Clear the cache and repeat the query.

    mysql> reset query cache;
    Query OK, 0 rows affected (0,00 sec)

    How do you validate whether it is safe to remove a user? Query the information_schema tables to find the objects that can cause problems, and recreate them.

    mysql> select routine_schema as db,
        ->        routine_type as object_type,
        ->        routine_name as object_name
        -> from information_schema.routines
        -> where security_type='DEFINER'
        ->   and definer='test_user@%'
        -> union all
        -> select trigger_schema as db,
        ->        'TRIGGER' as object_type,
        ->        trigger_name as object_name
        -> from information_schema.triggers
        -> where definer='test_user@%'
        -> union all
        -> select table_schema as db,
        ->        'VIEW' as object_type,
        ->        table_name as object_name
        -> from information_schema.views
        -> where security_type='DEFINER'
        ->   and definer='test_user@%';
    +-------------+-------------+-------------+
    | db          | object_type | object_name |
    +-------------+-------------+-------------+
    | remove_test | VIEW        | test_view   |
    +-------------+-------------+-------------+
    1 row in set (0,02 sec)

    Replace test_user@% with the value of the user you want to remove.

    This way you get a list of the objects you must change to avoid problems. As I said before, the best way to change the definer is to recreate the objects. To obtain the creation script you should use SHOW CREATE VIEW, SHOW CREATE TRIGGER, SHOW CREATE PROCEDURE or SHOW CREATE FUNCTION, for example:
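    As a hedged illustration using the test_view from the example above, recreating it with a different definer (app_user here is just a placeholder for whatever account you standardize on) would look like this:

    DROP VIEW remove_test.test_view;
    CREATE DEFINER = 'app_user'@'%' SQL SECURITY DEFINER
      VIEW remove_test.test_view AS
      SELECT CURRENT_USER(), USER();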

    If you want to avoid problems in the future, my recommendation is to create a dedicated account for every application and lock it, either with an unknown password or with the account locking feature. Grant that account all the privileges needed to run the application, and make sure all the views, procedures, functions and triggers have that account in the definer field.

    Categories: DBA Blogs

    Oracle AVG Function with Examples

    Complete IT Professional - Thu, 2016-05-12 06:00
    The Oracle AVG function is a common function in Oracle SQL. I’ll explain how to use it in this article and show you some examples. Purpose of the Oracle AVG Function The AVG function calculates the average of a series of values that you provide to it. Most of the time, this will be a […]
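    As a quick illustration of the idea (my own example against the familiar SCOTT schema, not taken from the article itself):

    SELECT deptno, AVG(sal) AS avg_salary
    FROM   scott.emp
    GROUP BY deptno;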
    Categories: Development

    technology behind DBaaS

    Pat Shuff - Thu, 2016-05-12 02:07
    Before we can analyze different use cases we need to first look at a couple of things that enable these use cases. The foundation for most of these use cases is data replication. We need to be able to replicate data from our on-premise database into a cloud database. The first issue is replicating data and the second is access rights to the data and database allowing you to pull the data into your cloud database.

    Let's first look at how data is stored in a database. If you use a Linux operating system, this is typically done by splitting information into four categories: ORACLE_HOME, +DATA, +FRA, and +RECO. The binaries that represent the database and all of the database processes go into the ORACLE_HOME or ORACLE_BASE. In the cloud this is dropped into /u01. If you are using non-RAC, the file system sits on a logical volume manager (LVM) where you stripe multiple disks and mirror or triple mirror data to keep a single disk failure from bringing down your database or data. If you are using a RAC database this goes into ASM, a disk technology that manages replication and performance. There are a variety of books and websites written on these technologies.

    LVM links

    ASM links

    The reason why we go into storage technologies is that we need to know how to manage how and where data is stored in our DBaaS. If we access everything with IaaS and roll out raw compute and storage, we need to know how to scale up storage if we run out of space. With DBaaS this is done with the scale up menu item. We can grow the file system by adding logical units to our instance and grow the space allocated for data storage or data logging.

    The second file system that we should focus on is the +DATA area. This is where data is stored and all of our file extents and tables are located. For our Linux cloud database this is auto-provisioned into /u02. In our test system we create a 25 GB data area and get a 20G file system in the +DATA area.

    If we look at the /u02 file system we notice that there is one major directory /u02/app/oracle/oradata. In the oradata there is one directory associated with the ORACLE_SID. In our example we called it ORCL. In this directory we have the control01.dbf, sysaux01.dbf, system01.dbf, temp01.dbf, undotbs01.dbf, and users01.dbf. These files are the place where data is stored for the ORCL SID. There is also a PDB1 directory in this file structure. This correlates to the pluggable database that we called PDB1. The files in this directory correspond to the tables, system, and user information relating to this pluggable database. If we create a second pluggable database, a new directory is created and all of these files are created in that directory. The users01.dbf file (PDB1_users01.dbf in the PDB1 directory) defines all of the users and their access rights. The system01.dbf file defines the tables and system level structures. In a pluggable database the system01 file defines the structures for PDB1 and not the entire database. The temp01.dbf holds temp data tables and scratch areas. The sysaux01.dbf contains the system information: the control area structures and management information. The undotbs01.dbf is the flashback area so that we can look at information that was stored three days ago in a table. Note that there is no undotbs01.dbf file in the pluggable database because this is handled at the global level and not at the pluggable layer. Backups are done for the SID and not each PDB. Tuning of memory and system tunables is done at the SID layer as well.
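    A quick way to cross-check this layout from inside the database (a sketch, assuming you can query the v$ views on the DBaaS instance) is to list the data files per container; each pluggable database shows up with its own set of files under /u02:

    SELECT con_id, name
    FROM   v$datafile
    ORDER BY con_id, name;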

    Now that we have looked at the files corresponding to tables and table extents, we can talk about data replication. If you follow the methodology of EMC and NetApp you should be able to replicate the dbf files between two file systems. Products like SnapMirror allow you to block copy any changes that happen to the file to another file system in another data center. This is difficult to do between an on-premise server and a cloud instance. The way that EMC and NetApp do this is in the controller layer. They log write changes to the disk, track what blocks get changed, and communicate the changes to the other controller on the target system. The target system takes these block changes, figures out what actual blocks they correspond to on its disk layout, and updates the blocks as needed. This does not work in a cloud storage instance. We deal at the file layer and not at the track-and-sector or block layer. The fundamental problem with this data replication mechanism is that you must restart or ingest the new file into the database. The database server does not do well if files change under it because it tends to cache information in memory, and indexes into data get broken if data is moved to another location. This type of replication is good if you have an hour or more recovery point objective. If you are looking at minutes of replication lag you will need to go with something like DataGuard, GoldenGate, or Active DataGuard.

    DataGuard works similarly to the block change recording but does so at the database layer and not the file system/block layer. When an update or insert command is executed in the database, these changes are written to the /u04 directory. In our example the +REDO area is allocated 9.8 GB of disk. If we look at our /u04 structure we see that /u04/app/oracle/redo contains the redoXX.log files. With DataGuard we take these redo files, compress them, and transfer them to our target system. The target system takes the redo file, uncompresses it, and applies the changes to the database. You can structure the changes either as physical logging or logical logging. Physical logging translates everything in the database and records the block level changes. Logical logging takes the actual SQL statement and replicates it to the target system. The target system either applies the physical changes to the file or executes the statement on the target database. Physical replication is used more than logical replication because logical has limitations on some statements. For example, blob or file operations can not be translated to the target system because you can't guarantee that the file structure is the same between the two systems. There are a variety of books available on DataGuard. It is also important to note that DataGuard is not available for Standard Edition and Enterprise Edition but for High Performance Edition and Extreme Performance Edition only.
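    On a DataGuard primary you can get a rough health check of redo shipping with a standard query like the one below. This is a generic sketch rather than something specific to the cloud setup described here; destination 2 is just the conventional slot for a standby:

    SELECT dest_id, status, error
    FROM   v$archive_dest_status
    WHERE  dest_id <= 2;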

    • Oracle Data Guard 11g Handbook
    • Oracle Dataguard: Standby Database Failover Handbook
    • Creating a Physical Standby Documentation
    • Creating a Logical Standby Documentation

      Golden Gate is a similar process but there is an intermediary agent that takes the redo log, analyzes it, and translates it into the target system. This allows us to take data from an Oracle database and replicate it to SQL Server. It also allows us to go in the other direction. SQL Server, for example, is typically used for SCADA or process control systems. The Oracle database is typically used for analytics and heavy duty number crunching on a much larger scale. If we want to look at how our process control systems are operating in relation to our budget we will want to pull in the data for the process systems and look at how much we spend on each system. We can do this by either selecting data from the SQL Server or replicating the data into a table on the Oracle system. If we are doing complex join statements and pulling data in from multiple tables we would typically want to do this on one system rather than pulling the data across the network multiple times. Golden Gate allows us to pull the data into a local table and perform the complex select statements without having to suffer network latency more than the initial copy. Golden Gate is a separate product that you must pay for either on-premise or in the cloud. If you are replicating between two Oracle databases you could use Active DataGuard to make this work, and this is available as part of the Extreme Performance Edition of the database.

      The /u03 area in our file system is where backups are placed. The file system for our sample system shows /u03/app/oracle/fast_recovery_area/ORCL. ORCL is the ORACLE_SID of our installation. Note that there is no PDB1 area because all of the backup data is handled at the system layer and not at the pluggable layer. The tool used to back up the database is RMAN. There are a variety of books available to help with RMAN, as well as an RMAN online tutorial.

      It is important to note that RMAN requires system level access to the database. Amazon RDS does not allow you to replicate your data using RMAN but uses a volume snapshot and copies this to another zone. The impact of this is that first, you can not get your data out of Amazon with a backup and you can not copy your changes and data from the Amazon RDS to your on-premise system. The second impact is that you can't use Amazon RDS for DataGuard. You don't have sys access into the database which is required to set up DataGuard and you don't have access to a filesystem to copy the redo logs to drop into. To make this available with Amazon you need to deploy the Oracle database into EC2 with S3 storage as the back end. The same is true with Azure. Everything is deployed into raw compute and you have to install the Oracle database on top of the operating system. This is more of an IaaS play and not a PaaS play. You lose patching of the OS and database, automated backups, and automatic restart of the database if something fails. You also need to lay out the file system on your own and select LVM or some other clustering file system to prevent data loss from a single disk corruption. All of this is done for you with PaaS and DBaaS. Oracle does offer a manual process to perform backups without having to dive deep into RMAN technology. If you are making a change to your instance and want a backup copy before you make the change, you can back up your instance manually and not have to wait for the automated backup. You can also change the timing if 2am does not work for your backup and you need to move it to 4am instead.
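      If you do have full access to the instance, a simple way to see what the automated backups have been doing, without diving deep into RMAN itself, is a sketch like this against the backup catalog views:

      SELECT session_key, status, start_time, end_time
      FROM   v$rman_backup_job_details
      ORDER BY start_time DESC;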

      We started this conversation talking about growing a table because we ran out of space. With the Amazon and Azure solutions, this must be done manually. You have to attach a new logical unit, map it into the file system, grow the file system, and potentially reboot the operating system. With the Oracle DBaaS we have the option of growing the file system either as a new logical unit, grow the /u02 file system to handle more table spaces, or grow the /u03 file system to handle more backup space.

      Once we finish our scale up the /u03 file system is no longer 20 GB but 1020 GB in size. The PaaS management console allocates the storage, attaches the storage to the instance, grows the logical volume to fill the additional space, and grows the file system to handle the additional storage. It is important to note that we did not require root privileges to do any of these operations. The DBA or cloud admin can scale up the database and expand table resources. We did not need to involve an operating system administrator. We did not need to request an additional logical unit from the storage admin. We did not need to get a senior DBA to reconfigure the system. All of this can be done either by a junior DBA or an automated script to grow the file system if we run out of space. The only thing missing for the automated script is a monitoring tool to recognize that we are running into a limit. The Oracle Enterprise Manager (OEM) 12c and 13c can do this monitoring and kick off processes if thresholds are crossed. It is important to note that you can not use OEM with Amazon RDS because you don't have root, file system, or system access to the installation which is required to install the OEM agent.
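      The kind of threshold check such a monitoring script (or an OEM metric) would run can be as simple as the sketch below; the 90 percent limit is an arbitrary example:

      SELECT tablespace_name, ROUND(used_percent, 1) AS used_pct
      FROM   dba_tablespace_usage_metrics
      WHERE  used_percent > 90;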

      In summary, we looked at the file system structure that is required to replicate data between two instances. We talked about how many people use third party disk replication technologies to "snap mirror" between two disk installations and talked about how this does not work when replicating from an on-premise to a cloud instance. We talked about DataGuard and GoldenGate replication to allow us to replicate data to the cloud and to our data center. We looked at some of the advantages of using DBaaS rather than database on IaaS to grow the file system and backup the database. Operations like backup, growing the file system, and adding or removing processors temporarily can be done by a cloud admin or junior DBA. These features required multiple people to make this happen in the past. All of these technologies are needed when we start talking about use cases. Most of the use cases assume that the data and data structures that exist in your on-premise database also exist in the cloud and that you can replicate data to the cloud as well as back from the cloud. If you are going to run a disaster recovery instance in the cloud, you need to be able to copy your changes to the cloud, make the cloud a primary instance, and replicate the changes back to your data center once you bring your database back online. The same is true for development and testing. It is important to be able to attach to both your on-premise database and database provisioned in the cloud and look at the differences between the two configurations.

    IBM Bluemix Dedicated/Local Status Page

    Pas Apicella - Wed, 2016-05-11 20:37
    With Bluemix Public you can view the status page, which details all the runtimes and services and their current status in all 3 PUBLIC regions. Customers with Bluemix Dedicated or Local get a status page which includes a column on the status of their Dedicated or Local instance.

    To navigate to it perform the following steps:

    1. Log into your Bluemix dedicated or local instance web console

    2. Click on the Status link which is accessed through the profile icon on the top right hand corner


    3. You will see a table as follows as well as status messages to indicate the current status of your own Bluemix Local or Dedicated Environment.



    More Information

    https://console.ng.bluemix.net/docs/admin/index.html#oc_status
    Categories: Fusion Middleware

    Maven: how to copy files after a build into several distribution directories

    XTended Oracle SQL - Wed, 2016-05-11 17:37

    Sometimes it is convenient to copy jar files automatically after a build into several different directories. For example, if you have different config files for local tests and for the office test server, you may want to copy these files into a local test directory, the internal office test server directory, and a public distribution directory.
    This short part of pom.xml shows 2 different methods to build an UBER-JAR and then copies the result into 3 different directories: localtest, officetest and public

        <build>
            <plugins>
                <plugin>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <version>2.6</version>
                    <configuration>
                        <archive>
                            <manifest>
                                <mainClass>tv.tmd.YourMainClass</mainClass>
                            </manifest>
                            <manifestEntries>
                                <Class-Path>.</Class-Path>
                            </manifestEntries>
                        </archive>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                    </configuration>
                    <executions>
                        <execution>
                            <id>make-assembly</id>
                            <phase>package</phase>
                            <goals>
                                <goal>single</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
    
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-shade-plugin</artifactId>
                    <version>2.4.3</version>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>shade</goal>
                            </goals>
                            <configuration>
                                <transformers>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                        <manifestEntries>
                                            <Main-Class>tv.tmd.YourMainClass</Main-Class>
                                            <Build-Number>2</Build-Number>
                                        </manifestEntries>
                                    </transformer>
                                </transformers>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
    
                <plugin>
                    <artifactId>maven-antrun-plugin</artifactId>
                    <version>1.8</version>
                    <executions>
                        <execution>
                            <id>copy</id>
                            <phase>package</phase>
                            <configuration>
                                <target>
                                    <echo>ANT TASK - copying files....</echo>
                                    <copy todir="${basedir}/distribution/localtest" overwrite="true" flatten="true">
                                        <fileset dir="${basedir}" includes="*.bat" >
                                            <include name="*.bat" />
                                            <include name="ReadME.*" />
                                            <include name="target/*.jar" />
                                        </fileset>
                                    </copy>
    
                                    <copy todir="${basedir}/distribution/officetest" overwrite="true" flatten="true">
                                        <fileset dir="${basedir}" includes="*.bat" >
                                            <include name="*.bat" />
                                            <include name="ReadME.*" />
                                            <include name="target/*.jar" />
                                        </fileset>
                                    </copy>
    
                                    <copy todir="${basedir}/distribution/public" overwrite="true" flatten="true">
                                        <fileset dir="${basedir}" includes="*.bat" >
                                            <include name="*.bat" />
                                            <include name="ReadME.*" />
                                            <include name="target/*.jar" />
                                        </fileset>
                                    </copy>
                                </target>
                            </configuration>
                            <goals>
                                <goal>run</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    
    Categories: Development

    Log Buffer #473: A Carnival of the Vanities for DBAs

    Pythian Group - Wed, 2016-05-11 16:32

    This Log Buffer Edition covers some of the hottest blog posts of Oracle, SQL Server and MySQL for the previous week.

    Oracle:

    A recent post on the OTN database forum reminded Jonathan how easy it is to forget to keep repeating a piece of information after the first couple of hundred times you’ve explained it. No matter how “intuitively obvious” it is for one person, it’s new to someone else.

    The OMS Patcher is a newer patching mechanism for the OMS specifically.

    A SQLT report has all kinds of pertinent information including—to name just a few—optimizer settings, indexes, statistics, plan history, and view definitions.

    Joins are fundamental in SQL, and are used in most every production query. There are three types in particular that every developer should fully understand.

    Why should Oracle Database professionals care about NoSQL and where to start?

    SQL Server:

    Understanding OPENJSON and FOR JSON in Azure and SQL 2016

    Using Data Compression in Master Data Services 2016

    The most frustrating thing with any new system is often just working out how to connect to it.

    There are several decisions to be made when designing indexes for Memory-optimized tables in In-Memory OLTP, and to make the best choice, it pays to understand something about the nature and usage of memory-optimised indexes.

    Database Hardware and Infrastructure Trends

    STRING_SPLIT() in SQL Server 2016 : Follow-Up #2

    MySQL:

    Why would I run MySQL/MariaDB on POWER anyway?

    By default, the write() system call returns after all data has been copied from the user space file descriptor into the kernel space buffers. There is no guarantee that data has actually reached the physical storage.

    OpenCPS: Vietnam’s Public Sector goes Open Source

    MySQL uses metadata locking to manage concurrent access to database objects, and to ensure data consistency when performing modifications to the schema

    Using Docker for development is a great way of ensuring that what you develop will be the same that you deploy in production.

    Categories: DBA Blogs

    Defining Digital Transformation

    Pythian Group - Wed, 2016-05-11 15:33

     

    Terminology is important—and it’s particularly important for us to define terms that are central to what we do. So when it comes to the subject of digital transformation, what exactly are we talking about?

     

    In speaking with clients and industry thought leaders, I’ve come to realize that the term “digital transformation” has a different meaning to different people. For a term that is so widely used — and that, on its surface, seems pretty straightforward — the range of interpretation is remarkable. It’s a bit like when we say “I’ll do it later.”  “Later” to one person means “before the sun goes down today.” “Later” to another person means “sometime in the future”, and it could mean days or weeks in their mind. “Later” to a third person can mean “I have no plans to do it, and this is my way of telling you nicely.”

     

    Because the term is so essential to the work we do for our clients, I thought it would be helpful to define what digital transformation means to us here at Pythian. There’s so much we can say on the topic, so I plan to follow up with a series of articles about how I’ve seen it implemented, or worse, not implemented or even embraced as a concept.

     

    To start, “digital transformation” is about technology. I know that to some people it isn’t, but I disagree. These days, you can’t transform your business without technology. It’s not about which technology you choose, as much as it’s about how to use it. Even more specifically, we’ve found that the businesses that are achieving positive transformation are using technology to capitalize on data. I have yet to see a single transformation project that didn’t use data as a major component of its success.

     

    Let’s look at the term “transformation.” This equates to change, but it doesn’t mean change for its own sake. The change we’re talking about has to benefit the business. However, the factor that can make or break successful change is people. Their attitudes, preconceptions, and ideas almost always have to be aligned with the change for successful transformation to occur. People need to get behind the initiative, people have to fund it, people have to develop it, and people have to support it once it’s developed. And we all know that getting people to change can be more difficult than developing any new technology. In short, the transformative capabilities inherent in technology can only be realized when coupled with the willingness to embrace change.

    Why Digital Transformation?

    Why is the concept of digital transformation important in the first place? At Pythian, we believe that it’s about using technology and data to change your business for the better. What do we mean when we say “for the better”? Therein lies the controversy.  “For the better” means different things to different people depending on their company’s key objectives.

     

    “For the better” can mean:

    • Becoming more efficient to drive costs down so your profitability can improve
    • Reducing mistakes and improving your reputation, or the quality of your product
    • Differentiating your product to get ahead of the competition
    • Doing what you do, only faster than your competitors
    • Creating new revenue streams
    • Improving the customer experience. This is a big one, so I will dedicate an entire blog post to exploring exactly what it means.

     

    Digital transformation is the key to achieving any one, or all of these benefits, and knowing your objectives and priorities will help you shape your digital transformation initiative. So to start, focus less on what digital transformation is, and more on what you want the outcome of a transformation to be.

     

    Categories: DBA Blogs

    select from table with no direct relation or foreign keys

    Learn DB Concepts with me... - Wed, 2016-05-11 12:50
    SELECT
      E.EMPNO,
      E.ENAME,
      E.JOB,
      D.DEPTNO,
      D.LOC,
      E.SAL
    FROM
      scott.emp E
    LEFT JOIN SCOTT.DEPT D
    ON
      E.DEPTNO=D.DEPTNO;





    SELECT
      E.EMPNO,
      E.ENAME,
      E.JOB,
      D.DEPTNO,
      D.LOC,
      E.SAL,
      (
        SELECT      grade
        FROM
          SCOTT.SALGRADE S
        WHERE
          E.SAL BETWEEN S.LOSAL AND S.HISAL
      ) AS SALGRADE
    FROM
      scott.emp E
    LEFT JOIN SCOTT.DEPT D
    ON
      E.DEPTNO=D.DEPTNO;










    Categories: DBA Blogs

    GoldenGate 12.2 Big Data Adapters: part 4 – HBASE

    Pythian Group - Wed, 2016-05-11 11:51

    This is the next post in my series about Oracle GoldenGate Big Data adapters. Here is list of all posts in the series:

    1. GoldenGate 12.2 Big Data Adapters: part 1 – HDFS
    2. GoldenGate 12.2 Big Data Adapters: part 2 – Flume
    3. GoldenGate 12.2 Big Data Adapters: part 3 – Kafka
    4. GoldenGate 12.2 Big Data Adapters: part 4 – HBASE

    In this post I am going to explore the HBASE adapter for GoldenGate. Let's start by recalling what we know about HBASE. Apache HBASE is a non-relational, distributed database modelled after Google's Bigtable. It provides read/write access to the data and is built on top of Hadoop HDFS.

    So, what does that tell us? First, we can write and change the data. Second, we need to remember that it is a non-relational database, which takes a rather different approach to data than traditional relational databases; you can think of HBase as a key-value store. We are not going deep into HBASE architecture and internals here, since our main task is to test the Oracle GoldenGate adapter and see how it works. Our configuration has an Oracle database as a source with a GoldenGate extract, and a target system where we have Oracle GoldenGate for Big Data.

    We have more information about setting up the source and target in the first post in the series, about the HDFS adapter. The source side replication has already been configured and started. We have an initial trail file for data initialization and trails for the ongoing replication. We capture changes for all tables in the ggtest schema on the Oracle database.
    Now we need to prepare our target site. Let’s start from HBase. I used a pseudo-distributed mode for my tests where I ran a fully-distributed mode on a single host. It is not acceptable for any production configuration but will suffice for our tests. On the same box I have HDFS to serve as a main storage. Oracle documentation for the adapter states that they support HBase from version 1.0.x . In my first attempt I tried to use HBase version 1.0.0 (Cloudera 5.6) but it didn’t work. I got errors in the GoldenGate and my extract was aborted.
    Here is the error :

     
    2016-03-29 11:51:31  ERROR   OGG-15051  Oracle GoldenGate Delivery, irhbase.prm:  Java or JNI exception:
    java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)Lorg/apache/hadoop/hbase/HTableDescriptor;.
    2016-03-29 11:51:31  ERROR   OGG-01668  Oracle GoldenGate Delivery, irhbase.prm:  PROCESS ABENDING.
    

So I installed another HBase version, and 1.1.4 worked just fine. I used a simple, standard HBase configuration for pseudo-distributed mode, where the region server runs on the same host as the master and hbase.rootdir points to the local HDFS.
Here is an example of the configuration:

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/user/oracle/hbase</value>
  </property>
</configuration>
    [root@sandbox conf]# cat regionservers
    localhost
    [root@sandbox conf]#
    

As soon as we have HBase set up and running, we can switch our attention to GoldenGate. We already have a trail file with the initial load; now we need to prepare configuration files for the initial and ongoing replication. Let’s go to our GoldenGate for Big Data home directory and prepare everything. First, we need an hbase.props file copied from the $OGG_HOME/AdapterExamples/big-data/hbase directory to $OGG_HOME/dirprm. I left everything as it was in the original file, changing only the gg.classpath parameter to point to my configuration files and libraries for HBase.
Here is the resulting properties file:

    [oracle@sandbox oggbd]$ cat dirprm/hbase.props
    
    gg.handlerlist=hbase
    
    gg.handler.hbase.type=hbase
    gg.handler.hbase.hBaseColumnFamilyName=cf
    gg.handler.hbase.keyValueDelimiter=CDATA[=]
    gg.handler.hbase.keyValuePairDelimiter=CDATA[,]
    gg.handler.hbase.encoding=UTF-8
    gg.handler.hbase.pkUpdateHandling=abend
    gg.handler.hbase.nullValueRepresentation=CDATA[NULL]
    gg.handler.hbase.authType=none
    gg.handler.hbase.includeTokens=false
    
    gg.handler.hbase.mode=tx
    
    goldengate.userexit.timestamp=utc
    goldengate.userexit.writers=javawriter
    javawriter.stats.display=TRUE
    javawriter.stats.full=TRUE
    
    gg.log=log4j
    gg.log.level=INFO
    
    gg.report.time=30sec
    
    gg.classpath=/u01/hbase/lib/*:/u01/hbase/conf:/usr/lib/hadoop/client/*
    
    javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar
    
    

Second, we have to prepare a parameter file for our initial load. I used a simple file with the minimum of parameters.

    [oracle@sandbox oggbd]$ cat dirprm/irhbase.prm
    -- passive REPLICAT irhbase
    -- Trail file for this example is located in "./dirdat/initld" file
    -- Command to add REPLICAT
    -- run replicat irhbase:
    -- ./replicat paramfile dirprm/irhbase.prm reportfile dirrpt/irhbase.rpt
    SPECIALRUN
    END RUNTIME
    EXTFILE /u01/oggbd/dirdat/initld
    TARGETDB LIBFILE libggjava.so SET property=dirprm/hbase.props
    REPORTCOUNT EVERY 1 MINUTES, RATE
    GROUPTRANSOPS 10000
    MAP GGTEST.*, TARGET BDTEST.*;
    

Having that configuration file in place, we can run the replicat in passive mode from the command line and see the result.
Here is the initial state of HBase:

    hbase(main):001:0> version
    1.1.4, r14c0e77956f9bb4c6edf0378474264843e4a82c3, Wed Mar 16 21:18:26 PDT 2016
    
    hbase(main):001:0> list
    TABLE
    0 row(s) in 0.3340 seconds
    
    => []
    hbase(main):002:0>
    

    Running the replicat:

[oracle@sandbox oggbd]$ ./replicat paramfile dirprm/irhbase.prm reportfile dirrpt/irhbase.rpt
    [oracle@sandbox oggbd]$
    

    Now we have 2 tables in HBASE:

    hbase(main):002:0> list
    TABLE
    BDTEST:TEST_TAB_1
    BDTEST:TEST_TAB_2
    2 row(s) in 0.3680 seconds
    
    => ["BDTEST:TEST_TAB_1", "BDTEST:TEST_TAB_2"]
    hbase(main):003:0>
    

Let’s have a look at the structure and contents of the tables:

    
    hbase(main):004:0> describe 'BDTEST:TEST_TAB_1'
    Table BDTEST:TEST_TAB_1 is ENABLED
    BDTEST:TEST_TAB_1
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MI
    N_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
    1 row(s) in 0.2090 seconds
    
    hbase(main):005:0> scan 'BDTEST:TEST_TAB_1'
    ROW                                            COLUMN+CELL
     1                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-01-22:12:14:30
     1                                             column=cf:PK_ID, timestamp=1459269153102, value=1
     1                                             column=cf:RND_STR, timestamp=1459269153102, value=371O62FX
     1                                             column=cf:RND_STR_1, timestamp=1459269153102, value=RJ68QYM5
     1                                             column=cf:USE_DATE, timestamp=1459269153102, value=2014-01-24:19:09:20
     2                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-05-11:05:23:23
     2                                             column=cf:PK_ID, timestamp=1459269153102, value=2
     2                                             column=cf:RND_STR, timestamp=1459269153102, value=371O62FX
     2                                             column=cf:RND_STR_1, timestamp=1459269153102, value=HW82LI73
     2                                             column=cf:USE_DATE, timestamp=1459269153102, value=2014-01-24:19:09:20
     3                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-01-22:12:14:30
     3                                             column=cf:PK_ID, timestamp=1459269153102, value=3
     3                                             column=cf:RND_STR, timestamp=1459269153102, value=RXZT5VUN
     3                                             column=cf:RND_STR_1, timestamp=1459269153102, value=RJ68QYM5
     3                                             column=cf:USE_DATE, timestamp=1459269153102, value=2013-09-04:23:32:56
     4                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-05-11:05:23:23
     4                                             column=cf:PK_ID, timestamp=1459269153102, value=4
     4                                             column=cf:RND_STR, timestamp=1459269153102, value=RXZT5VUN
     4                                             column=cf:RND_STR_1, timestamp=1459269153102, value=HW82LI73
     4                                             column=cf:USE_DATE, timestamp=1459269153102, value=2013-09-04:23:32:56
    4 row(s) in 0.1630 seconds
    
    hbase(main):006:0> scan 'BDTEST:TEST_TAB_2'
    ROW                                            COLUMN+CELL
     7|IJWQRO7T|2013-07-07:08:13:52                column=cf:ACC_DATE, timestamp=1459269153132, value=2013-07-07:08:13:52
     7|IJWQRO7T|2013-07-07:08:13:52                column=cf:PK_ID, timestamp=1459269153132, value=7
     7|IJWQRO7T|2013-07-07:08:13:52                column=cf:RND_STR_1, timestamp=1459269153132, value=IJWQRO7T
    1 row(s) in 0.0390 seconds
    
    hbase(main):007:0>
    
    

Everything looks good to me. We have the structure and records as expected. Let’s move on and set up ongoing replication.
I have created a parameter file for my replicat, using the initial load parameters as a basis:

    [oracle@sandbox oggbd]$ cat dirprm/rhbase.prm
    REPLICAT rhbase
    -- Trail file for this example is located in "dirdat/or" directory
    -- Command to add REPLICAT
    -- add replicat rhbase, exttrail dirdat/or
    TARGETDB LIBFILE libggjava.so SET property=dirprm/hbase.props
    REPORTCOUNT EVERY 1 MINUTES, RATE
    GROUPTRANSOPS 10000
    MAP ggtest.*, TARGET bdtest.*;
    

We check our trail files and start our replicat from the latest trail file. By default, a replicat looks for a trail with sequence number 0, but since I have a purge policy on my GoldenGate installation that deletes old files, I need to tell the replicat exactly where to start.

    [oracle@sandbox oggbd]$ ll dirdat/
    total 4940
    -rw-r-----. 1 oracle oinstall    3028 Feb 16 14:17 initld
    -rw-r-----. 1 oracle oinstall 2015199 Mar 24 13:07 or000043
    -rw-r-----. 1 oracle oinstall 2015229 Mar 24 13:08 or000044
    -rw-r-----. 1 oracle oinstall 1018490 Mar 24 13:09 or000045
    [oracle@sandbox oggbd]$ ggsci
    
    Oracle GoldenGate Command Interpreter
    Version 12.2.0.1.0 OGGCORE_12.2.0.1.0_PLATFORMS_151101.1925.2
    Linux, x64, 64bit (optimized), Generic on Nov 10 2015 16:18:12
    Operating system character set identified as UTF-8.
    
    Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved.
    
    
    
    GGSCI (sandbox.localdomain) 1> info all
    
    Program     Status      Group       Lag at Chkpt  Time Since Chkpt
    
    MANAGER     RUNNING
    
    
    GGSCI (sandbox.localdomain) 2> add replicat rhbase, exttrail dirdat/or,EXTSEQNO 45
    REPLICAT added.
    
    
    GGSCI (sandbox.localdomain) 3> start replicat rhbase
    
    Sending START request to MANAGER ...
    REPLICAT RHBASE starting
    
    
    GGSCI (sandbox.localdomain) 4> info all
    
    Program     Status      Group       Lag at Chkpt  Time Since Chkpt
    
    MANAGER     RUNNING
    REPLICAT    RUNNING     RHBASE      00:00:00      00:00:06
    
    
    GGSCI (sandbox.localdomain) 5> info rhbase
    
    REPLICAT   RHBASE    Last Started 2016-03-29 12:56   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:08 ago)
    Process ID           27277
    Log Read Checkpoint  File dirdat/or000045
                         2016-03-24 13:09:02.000274  RBA 1018490
    
    
    GGSCI (sandbox.localdomain) 6>
    
    

I inserted a number of rows into test_tab_1 on the Oracle side and all of them were successfully replicated to HBase.
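The load itself was nothing special; it was generated with something like the following sketch (hypothetical: the pk_id values, row counts and random data are only an example, not the exact statements I ran):

begin
  -- generate a few thousand rows with random strings and dates
  for i in 1 .. 3000 loop
    insert into ggtest.test_tab_1 (pk_id, rnd_str, use_date, rnd_str_1, acc_date)
    values (1000000 + i,
            dbms_random.string('U', 8),
            sysdate - dbms_random.value(0, 1000),
            dbms_random.string('U', 8),
            sysdate - dbms_random.value(0, 1000));
  end loop;
  commit;
end;
/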

    hbase(main):015:0> count 'BDTEST:TEST_TAB_1'
    Current count: 1000, row: 1005694
    Current count: 2000, row: 442
    Current count: 3000, row: 6333
    3473 row(s) in 1.0810 seconds
    
    => 3473
    hbase(main):016:0>
    

Let’s take a closer look at test_tab_1 and test_tab_2:

    hbase(main):005:0> scan 'BDTEST:TEST_TAB_1'
    ROW                                            COLUMN+CELL
     1                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-01-22:12:14:30
     1                                             column=cf:PK_ID, timestamp=1459269153102, value=1
     1                                             column=cf:RND_STR, timestamp=1459269153102, value=371O62FX
     1                                             column=cf:RND_STR_1, timestamp=1459269153102, value=RJ68QYM5
     1                                             column=cf:USE_DATE, timestamp=1459269153102, value=2014-01-24:19:09:20
     2                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-05-11:05:23:23
     2                                             column=cf:PK_ID, timestamp=1459269153102, value=2
     2                                             column=cf:RND_STR, timestamp=1459269153102, value=371O62FX
     2                                             column=cf:RND_STR_1, timestamp=1459269153102, value=HW82LI73
     2                                             column=cf:USE_DATE, timestamp=1459269153102, value=2014-01-24:19:09:20
    ..............................................
    
    hbase(main):006:0> scan 'BDTEST:TEST_TAB_2'
    ROW                                            COLUMN+CELL
     7|IJWQRO7T|2013-07-07:08:13:52                column=cf:ACC_DATE, timestamp=1459269153132, value=2013-07-07:08:13:52
     7|IJWQRO7T|2013-07-07:08:13:52                column=cf:PK_ID, timestamp=1459269153132, value=7
     7|IJWQRO7T|2013-07-07:08:13:52                column=cf:RND_STR_1, timestamp=1459269153132, value=IJWQRO7T
    1 row(s) in 0.0390 seconds
    
    hbase(main):007:0>
    

You can see that the row identifier for test_tab_1 is the value of pk_id, while for test_tab_2 it is a concatenation of the values of all columns. Why is that? The difference is in the constraints on the tables. Since we don’t have a primary key or unique index on test_tab_2, it uses all the column values as the key. We can try to add a constraint and see the result.

    select * from dba_constraints where owner='GGTEST' and table_name='TEST_TAB_2';
    
    no rows selected
    
    alter table ggtest.test_tab_2 add constraint pk_test_tab_2 primary key (pk_id);
    
    Table altered.
    
    insert into ggtest.test_tab_2 values(9,'PK_TEST',sysdate,null);
    
    1 row created.
    
    commit;
    
    Commit complete.
    
    orcl>
    

And let’s compare with the result in HBase:

    hbase(main):012:0> scan 'BDTEST:TEST_TAB_2'
    ROW                                           COLUMN+CELL
     7|IJWQRO7T|2013-07-07:08:13:52               column=cf:ACC_DATE, timestamp=1459275116849, value=2013-07-07:08:13:52
     7|IJWQRO7T|2013-07-07:08:13:52               column=cf:PK_ID, timestamp=1459275116849, value=7
     7|IJWQRO7T|2013-07-07:08:13:52               column=cf:RND_STR_1, timestamp=1459275116849, value=IJWQRO7T
     8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER   column=cf:ACC_DATE, timestamp=1459278884047, value=2016-03-29:15:14:37
     8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER   column=cf:PK_ID, timestamp=1459278884047, value=8
     8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER   column=cf:RND_STR_1, timestamp=1459278884047, value=TEST_INS1
     8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER   column=cf:TEST_COL, timestamp=1459278884047, value=TEST_ALTER
     9                                            column=cf:ACC_DATE, timestamp=1462473865704, value=2016-05-05:14:44:19
     9                                            column=cf:PK_ID, timestamp=1462473865704, value=9
     9                                            column=cf:RND_STR_1, timestamp=1462473865704, value=PK_TEST
     9                                            column=cf:TEST_COL, timestamp=1462473865704, value=NULL
    3 row(s) in 0.0550 seconds
    
    hbase(main):013:0>
    

It is fully dynamic and changes the row id column on the fly. Will it work with a unique index? Yes, it will:

    
    delete from ggtest.test_tab_2 where pk_id=9;
    
    1 row deleted.
    
    alter table ggtest.test_tab_2 drop constraint pk_test_tab_2;
    
    Table altered.
    
    create unique index ggtest.ux_test_tab_2 on ggtest.test_tab_2 (pk_id);
    
    Index created.
    
    insert into ggtest.test_tab_2 values(10,'UX_TEST',sysdate,null);
    
    1 row created.
    
    commit;
    

    Here is the newly inserted row.

    hbase(main):017:0> scan 'BDTEST:TEST_TAB_2'
    ROW                                           COLUMN+CELL
     10                                           column=cf:ACC_DATE, timestamp=1462474389145, value=2016-05-05:14:53:03
     10                                           column=cf:PK_ID, timestamp=1462474389145, value=10
     10                                           column=cf:RND_STR_1, timestamp=1462474389145, value=UX_TEST
     10                                           column=cf:TEST_COL, timestamp=1462474389145, value=NULL
     7|IJWQRO7T|2013-07-07:08:13:52               column=cf:ACC_DATE, timestamp=1459275116849, value=2013-07-07:08:13:52
     7|IJWQRO7T|2013-07-07:08:13:52               column=cf:PK_ID, timestamp=1459275116849, value=7
    

But just creating a non-unique index on the source will not make any difference; it will not change anything. So, if we need to identify a key for a table, we have to have at least a unique index or constraint. Of course, this is just the default behaviour for schema-level replication, and we can use KEYCOLS to define keys for particular tables, as in the sketch below.
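For example, a MAP clause in the replicat parameter file could look like this (a hypothetical sketch, not part of my test configuration):

-- force the handler to use PK_ID as the key for this table
MAP ggtest.test_tab_2, TARGET bdtest.test_tab_2, KEYCOLS (pk_id);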

Interestingly, if we change the table structure it affects all newly inserted rows but does not change existing ones, even if we update some of their values. It works this way as long as you have a unique identifier and it was not changed by your DDL operation.
Here is an example. We have a column TEST_COL in the table test_tab_2. Let’s drop the column and update a row. Keep in mind that our key is the column PK_ID and we are not modifying it.

    alter table ggtest.test_tab_2 drop column TEST_COL;
    
    Table altered.
    
    update ggtest.test_tab_2 set rnd_str_1='TEST_COL' where pk_id=9;
    
    1 row updated.
    
    commit;
    

    In HBASE we can see the same set of columns:

    hbase(main):030:0> scan 'BDTEST:TEST_TAB_2'
    ROW                                           COLUMN+CELL
     9                                            column=cf:ACC_DATE, timestamp=1462477581440, value=2016-05-05:15:46:13
     9                                            column=cf:PK_ID, timestamp=1462477794597, value=9
     9                                            column=cf:RND_STR_1, timestamp=1462477794597, value=TEST_COL
     9                                            column=cf:TEST_COL, timestamp=1462477581440, value=NULL
    1 row(s) in 0.0200 seconds
    

We still have the dropped column TEST_COL even though we’ve updated the row.
But if we insert any new row it will have the new set of columns:

    insert into ggtest.test_tab_2 values(10,'TEST_COL',sysdate);
    
    1 row created.
    
    commit;
    
    Commit complete.
    

    And in HBASE:

    hbase(main):031:0> scan 'BDTEST:TEST_TAB_2'
    ROW                                           COLUMN+CELL
     10                                           column=cf:ACC_DATE, timestamp=1462477860649, value=2016-05-05:15:50:55
     10                                           column=cf:PK_ID, timestamp=1462477860649, value=10
     10                                           column=cf:RND_STR_1, timestamp=1462477860649, value=TEST_COL
     9                                            column=cf:ACC_DATE, timestamp=1462477581440, value=2016-05-05:15:46:13
     9                                            column=cf:PK_ID, timestamp=1462477794597, value=9
     9                                            column=cf:RND_STR_1, timestamp=1462477794597, value=TEST_COL
     9                                            column=cf:TEST_COL, timestamp=1462477581440, value=NULL
    2 row(s) in 0.0340 seconds
    

And, as in all the other cases, a truncate on the source table is not replicated to the target; the operation is simply ignored. You have to truncate the table in HBase yourself to keep the data in sync. If you insert data again, the data in HBase will be “updated”, but other rows will not be deleted; it behaves more like a “merge” operation.
Here is an example:

    truncate table ggtest.test_tab_2;
    
    Table truncated.
    
    insert into ggtest.test_tab_2 values(10,'TEST_COL2',sysdate);
    
    1 row created.
    
    commit;
    
    Commit complete.
    
    select * from ggtest.test_tab_2;
    
    	   PK_ID RND_STR_1  ACC_DATE
    ---------------- ---------- -----------------
    	      10 TEST_COL2  05/05/16 16:01:20
    
    orcl>
    
    HBASE:
    hbase(main):033:0> scan 'BDTEST:TEST_TAB_2'
    ROW                                           COLUMN+CELL
     10                                           column=cf:ACC_DATE, timestamp=1462478485067, value=2016-05-05:16:01:20
     10                                           column=cf:PK_ID, timestamp=1462478485067, value=10
     10                                           column=cf:RND_STR_1, timestamp=1462478485067, value=TEST_COL2
     9                                            column=cf:ACC_DATE, timestamp=1462477581440, value=2016-05-05:15:46:13
     9                                            column=cf:PK_ID, timestamp=1462477794597, value=9
     9                                            column=cf:RND_STR_1, timestamp=1462477794597, value=TEST_COL
     9                                            column=cf:TEST_COL, timestamp=1462477581440, value=NULL
    2 row(s) in 0.0300 seconds
    
    hbase(main):034:0>
    

I spent some time testing performance and found that the main bottleneck was my Oracle source rather than GoldenGate or HBase. I was able to sustain a transaction rate of up to 60 DML statements per second, at which point my Oracle DB started to struggle to keep pace because of commit waits. HBase and the replicat were absolutely fine. I also checked how the adapter handles big transactions and inserted about 2 billion rows in one transaction; it worked fine. Of course this doesn’t prove that your production configuration will be free of performance issues; to conduct real performance tests I would need a much bigger environment.
In addition, I noticed one more minor error in the Oracle documentation for the adapter, related to the “keyValuePairDelimiter” parameter. In the documentation it is replaced by “keyValueDelimiter”. It is just a small typo: “keyValueDelimiter” is repeated twice, where the first occurrence is correct and the second stands where “keyValuePairDelimiter” is supposed to be. Here is the link.

In summary, despite some minor issues, the adapters and GoldenGate for Big Data look quite mature and ready for real work. I think it is a good, robust technology and, hopefully, its development will continue to improve it with new releases. I am looking forward to using it in a real production environment with a significant workload. In the following posts I will try to test different DDL operations and maybe some other datatypes. Stay tuned.

    Categories: DBA Blogs

    Oracle Management Cloud – Log Analytics

    Marco Gralike - Wed, 2016-05-11 10:18
    In this last overview post I will give you a first glance of the Oracle…

    run sql from windows cmd prompt CLI

    Learn DB Concepts with me... - Wed, 2016-05-11 09:36

    Method 1:

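REM Point the session at the correct Oracle home and instance, then run each script non-interactively (-s suppresses the SQL*Plus banner)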
    set ORACLE_HOME=D:\app\oracle\product\11.2.0\db_1
    set ORACLE_SID=ORCL

    sqlplus -s "USERNAME/PASSWORD" @C:\Shell\Drop_objs.sql

    sqlplus -s "USERNAME/PASSWORD" @C:\Shell\Change_pass.sql



    Method 2:

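REM Echo the statements straight into sqlplus and redirect the query output to a file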
    set ORACLE_HOME=D:\app\oracle\product\11.2.0\db_1
    set ORACLE_SID=ORCL
    (ECHO select username from dba_users where username in ('SCOTT');
    ECHO exit;) | sqlplus -s "USER/PASS" > C:\Shell\test_out.txt



    Categories: DBA Blogs

    Oracle Midlands : Event #15

    Tim Hall - Wed, 2016-05-11 08:42

    Don’t forget Oracle Midlands Event #15 next week!

    om15

    Please show your support and come along. It’s free thanks to the sponsorship by RedStackTech.

    Cheers

    Tim…


    MobaXterm 9.0

    Tim Hall - Wed, 2016-05-11 08:21

    Another recent release that I managed to miss was

    dbms_xplan

    Jonathan Lewis - Wed, 2016-05-11 06:22

    My favourite format options for dbms_xplan.display_cursor().

This is another of those posts where I tell you about something that I’ve frequently mentioned but never documented explicitly as a good (or, at least, convenient) idea. It is also another example of how easy it is to tell half the story most of the time when someone asks a “simple” question.

    You’re probably familiar with the idea of “tuning by cardinality feedback” – comparing the predicted data volumes with the actual data volumes from an execution plan – and I wrote a short note about how to make that comparison last week; and you’re probably familiar with making a call to dbms_xplan.display_cursor() after enabling the capture of rowsource execution statistics (in one of three ways) for the execution of the query, and the format parameter usually suggested for the call is ‘allstats last’ to get the execution stats for the most recent execution of the query. I actually like to see the Cost column of the execution plan as well, so I usually add that to the format, so (with all three strategies shown for an SQL*Plus environment):

    set linesize 180
    set trimspool on
    set pagesize 60
    set serveroutput off
    
    alter session set "_rowsource_execution_statistics"=true;
    alter session set statistics_level=all;
    
    select /*+ gather_plan_statistics */ * from user_tablespaces;
    
    select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));
    
    

    So what do we often forget to mention:

    • For SQL*Plus it is important to ensure that serveroutput is off
    • The /*+ gather_plan_statistics */ option uses sampling, so may be a bit inaccurate
    • The two accurate strategies may add a significant, sometimes catastrophic, amount of CPU overhead
    • This isn’t appropriate if the query runs parallel

    For a parallel query the “last” execution of a query is typically carried out by the query co-ordinator, so the rowsource execution stats of many (or all) of the parallel execution slaves are likely to disappear from the output. If you’re testing with parallel queries you need to add some “tag” text to the query to make it unique and omit the ‘last’ option from the format string.
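A sketch of what that might look like (the extra comment is just an arbitrary tag to make the SQL text, and hence the cursor, unique):

select /*+ parallel(2) */ /* px test run 42 */ object_type, sum(object_id)
from t1
group by object_type
order by object_type;

select * from table(dbms_xplan.display_cursor(null,null,'allstats cost'));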

Now, a common suggestion is that you need to add the ‘all’ format option instead – but this doesn’t mean “all executions”, it means (though doesn’t actually deliver) all the data that’s available about the plan. So here’s an execution plan produced after running a parallel query and using ‘allstats all’ as the format option (t1 is a copy of all_objects, and this demo is running on 12.1.0.2).

    SQL_ID  51u5j42rvnnfg, child number 1
    -------------------------------------
    select  /*+   parallel(2)  */  object_type,  sum(object_id) from t1
    group by object_type order by object_type
    
    Plan hash value: 2919148568
    
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                | Name     | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT         |          |      1 |        |       |   113 (100)|          |        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
    |   1 |  PX COORDINATOR          |          |      1 |        |       |            |          |        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
    |   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,01 | P->S | QC (ORDER) |      0 |00:00:00.01 |       0 |      0 |       |       |          |
    |   3 |    SORT GROUP BY         |          |      2 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,01 | PCWP |            |     30 |00:00:00.01 |       0 |      0 |  2048 |  2048 |     2/0/0|
    |   4 |     PX RECEIVE           |          |      2 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,01 | PCWP |            |     50 |00:00:00.01 |       0 |      0 |       |       |          |
    |   5 |      PX SEND RANGE       | :TQ10000 |      0 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,00 | P->P | RANGE      |      0 |00:00:00.01 |       0 |      0 |       |       |          |
    |   6 |       HASH GROUP BY      |          |      2 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,00 | PCWP |            |     50 |00:00:00.05 |    1492 |   1440 |  1048K|  1048K|     2/0/0|
    |   7 |        PX BLOCK ITERATOR |          |      2 |  85330 |  1166K|   105   (2)| 00:00:01 |  Q1,00 | PCWC |            |  85330 |00:00:00.03 |    1492 |   1440 |       |       |          |
    |*  8 |         TABLE ACCESS FULL| T1       |     26 |  85330 |  1166K|   105   (2)| 00:00:01 |  Q1,00 | PCWP |            |  85330 |00:00:00.01 |    1492 |   1440 |       |       |          |
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    Query Block Name / Object Alias (identified by operation id):
    -------------------------------------------------------------
    
       1 - SEL$1
       8 - SEL$1 / T1@SEL$1
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       8 - access(:Z>=:Z AND :Z<=:Z)
    
    Column Projection Information (identified by operation id):
    -----------------------------------------------------------
    
       1 - "OBJECT_TYPE"[VARCHAR2,23], SUM()[22]
       2 - (#keys=0) "OBJECT_TYPE"[VARCHAR2,23], SUM()[22]
       3 - (#keys=1; rowset=200) "OBJECT_TYPE"[VARCHAR2,23], SUM()[22]
       4 - (rowset=200) "OBJECT_TYPE"[VARCHAR2,23], SYS_OP_MSR()[25]
       5 - (#keys=1) "OBJECT_TYPE"[VARCHAR2,23], SYS_OP_MSR()[25]
       6 - (rowset=200) "OBJECT_TYPE"[VARCHAR2,23], SYS_OP_MSR()[25]
       7 - (rowset=200) "OBJECT_ID"[NUMBER,22], "OBJECT_TYPE"[VARCHAR2,23]
       8 - (rowset=200) "OBJECT_ID"[NUMBER,22], "OBJECT_TYPE"[VARCHAR2,23]
    
    Note
    -----
       - Degree of Parallelism is 2 because of hint
    
    
    48 rows selected.
    
    

    You’ll notice we’ve reported the “alias” and “projection” information – those are two of the format options that you can use with a + or – to include or exclude if you want. We’ve also got E-Bytes and E-time columns in the body of the plan. In other words (at least in my opinion) we’ve got extra information that makes the output longer and wider and therefore harder to read.
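So, for example, you could start from ‘all’ and strip those sections back out, or start from a leaner format and add just the one you want (a sketch):

select * from table(dbms_xplan.display_cursor(null,null,'allstats all -alias -projection'));

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost +alias'));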

    The format string I tend to use for parallel query is ‘allstats parallel cost’ – which (typically) gives something like the following:

    SQL_ID  51u5j42rvnnfg, child number 1
    -------------------------------------
    select  /*+   parallel(2)  */  object_type,  sum(object_id) from t1
    group by object_type order by object_type
    
    Plan hash value: 2919148568
    
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT         |          |      1 |        |   113 (100)|        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
    |   1 |  PX COORDINATOR          |          |      1 |        |            |        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
    |   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |     30 |   113   (9)|  Q1,01 | P->S | QC (ORDER) |      0 |00:00:00.01 |       0 |      0 |       |       |          |
    |   3 |    SORT GROUP BY         |          |      2 |     30 |   113   (9)|  Q1,01 | PCWP |            |     30 |00:00:00.01 |       0 |      0 |  2048 |  2048 |     2/0/0|
    |   4 |     PX RECEIVE           |          |      2 |     30 |   113   (9)|  Q1,01 | PCWP |            |     50 |00:00:00.01 |       0 |      0 |       |       |          |
    |   5 |      PX SEND RANGE       | :TQ10000 |      0 |     30 |   113   (9)|  Q1,00 | P->P | RANGE      |      0 |00:00:00.01 |       0 |      0 |       |       |          |
    |   6 |       HASH GROUP BY      |          |      2 |     30 |   113   (9)|  Q1,00 | PCWP |            |     50 |00:00:00.05 |    1492 |   1440 |  1048K|  1048K|     2/0/0|
    |   7 |        PX BLOCK ITERATOR |          |      2 |  85330 |   105   (2)|  Q1,00 | PCWC |            |  85330 |00:00:00.03 |    1492 |   1440 |       |       |          |
    |*  8 |         TABLE ACCESS FULL| T1       |     26 |  85330 |   105   (2)|  Q1,00 | PCWP |            |  85330 |00:00:00.01 |    1492 |   1440 |       |       |          |
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       8 - access(:Z>=:Z AND :Z<=:Z)
    
    Note
    -----
       - Degree of Parallelism is 2 because of hint
    
    
    30 rows selected.
    
    

    Of course you may prefer ‘allstats all’ – and sometimes I do actually want to see the alias or projection information – but I think there’s so much information available on the execution plan output that anything that makes it a little shorter, cleaner and tidier is a good thing.

You might have noticed, by the way, that the Buffers, Reads, and A-Time columns have still managed to lose information on the way up from operation 6; information that should have been carried up the plan has simply disappeared. Make sure you do a sanity check for disappearing numbers when you’re looking at more complex plans.

     


    DBaaS in Oracle Public Cloud

    Pat Shuff - Wed, 2016-05-11 02:07
    Before we dive deep into database as a service with Oracle we need to define some terms. We have thrown around concepts like Standard Edition, Enterprise Edition, High Performance Edition, and Extreme Performance Edition. We have talked about concepts like DataGuard, Real Application Clustering, Partitioning, and Compression. Today we will dive a little deeper into this so that we can focus on comparing them running in the Oracle Public Cloud as well as other cloud providers.

First, let's tackle Standard Edition (SE) vs Enterprise Edition (EE). Not only is there SE, there are also SE One and SE2. SE2 is new with the 12c release of the database and is essentially SE and SE1 with different processor and socket restrictions. The Oracle 12c documentation details the differences between the versions; we will highlight the main ones here. Note that you can still store the same data: the data types do not change between the versions of the database, and a select statement that works in SE will work in SE2 and will work in EE.
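If you are ever unsure which edition and options a given instance is running, a quick check against the standard dictionary views (nothing specific to the cloud service) is:

-- shows the edition in the banner text
select banner from v$version;

-- lists which options are enabled in this installation
select parameter, value from v$option where value = 'TRUE';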

The first big difference between SE and EE is that SE is licensed on a per-socket basis and EE is licensed on a per-core basis. The base cost of an SE system is $600 per month per processor in the Oracle Public Cloud, and Standard Edition is limited to 8 cores in the cloud. If you are purchasing a perpetual license, the cost is $17,500, and the license can cover two sockets in one system or single sockets on two systems. SE2 comes with a Real Application Clusters (RAC) license so that you can have a single database running across two computers. An SE2 instance will also limit the database to 16 threads, so running on more cores will have no advantage. To learn more about the differences and limitations, I recommend reading Mike Dietrich's blog on SE2.

    The second big difference is that many of the optional features are not available with SE. For example, you can't use diagnostics and tuning to figure out if your sql command is running at top efficiency. You can't use multi-tenant but you can provision a single pluggable database. This means that you can unplug and move the database to another database (and even another version like EE). The multi-tenant option allows you to have multiple pluggable databases and control them with a master SGA. This allows admins to backup and patch a group of databases all at once rather than having to patch each one individually. You can separate security and have different logins to the different databases but use a global system or sys account to manage and control all of the databases. Storage optimization features like compression and partitioning are not available in SE either. Data recovery features like DataGuard and FlashBack are not supported in SE. DataGuard is a feature that copies changes from one system through the change logs and apply them to the second system. FlashBack does something similar and allows you to query a database at a previous time and return the state of the database at that time. It uses the change log to reconstruct the database as it was at the time requested. Tools like RMAN backup and streams don't work in SE. Taking a copy of a database and copying it to another system is not allowed. The single exception to this is RMAN works in the cloud instance but not in the perpetual on-premise version. Security like Transparent Data Encryption, Label Security, Data Vault, and Audit Vault are not supported in SE. The single exception is transparent data encryption to allow for encryption in the public cloud is supported for SE. All of these features are described here.
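As an illustration of the unplug-and-move workflow mentioned above, the DDL is only a couple of statements (a sketch with a hypothetical PDB name and path):

alter pluggable database pdb1 close immediate;
alter pluggable database pdb1 unplug into '/u01/app/oracle/pdb1.xml';
-- the generated XML manifest plus the datafiles can then be plugged into another container database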

With Enterprise Edition in the Oracle Public Cloud, at $3K/OCPU/month or $5.04/OCPU/hour, the only option that we get bundled with the database is Transparent Data Encryption (TDE). This allows us to encrypt all or part of a table. TDE encrypts data on disk when it is written with a SQL insert or update command. Keys are used to encrypt this data and it can only be read by presenting the keys through the Oracle Wallet interface. More information on TDE can be found here. The Security Inside Out blog is also a good place to look for updates and references relating to TDE. This version of the database allows us to scale up to 16 processors and 4.6 TB of storage. If we are looking to back up this database, the largest size that we can have for storage is 2.3 TB. If our table requirements are greater than 2.3 TB or 4.6 TB, we need to go to Exadata as a Service or purchase a perpetual license and run it on-premise. If we are looking to run this database in our data center we will need to purchase a perpetual license at $47.5K per processor license. If you are running on an IBM Power server you need to license each core as a processor. If you are running on x86 or SPARC servers you multiply the number of cores by 0.5 and can run two cores per processor license. TDE is part of the Advanced Security Option, which lists for $15K per processor license. When calculating whether it is cheaper to run on-premise vs the public cloud you need to factor in both license requirements. The same is true if you decide to run EE in AWS EC2 or Azure Compute. Make sure to read the Cloud Licensing Requirements to understand the cost implications of running on EC2 or Azure Compute. Since all cloud providers use x86 processors, the multiplication factor is 0.5 times the number of cores on the service.
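As a simple illustration of column-level TDE (a sketch with made-up names; it assumes a keystore/wallet has already been configured and opened):

create table customers (
  id      number primary key,
  card_no varchar2(19) encrypt using 'AES256'  -- stored encrypted on disk, decrypted transparently for queries
);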

The High Performance Edition contains the EE features and TDE, as well as multi-tenant, partitioning, advanced compression, advanced security, real application testing, OLAP, DataGuard, and all of the database management packs. This is basically everything with the exception of the Real Application Clusters (RAC), Active DataGuard, and In-Memory options. High Performance comes in at $4K/OCPU/month or $6.72/OCPU/hour. If we wanted to bundle all of this together and run it in our data center, we would need to compare against the database at $47.5K per processor license plus roughly $15K per processor per option (there are 12 of them). We can then calculate which is cheaper based on our accounting rules and amortization schedule. The key differentiator is that I can use this version on an hourly or monthly basis for less than a full year. For example, if we do patch testing once a quarter and allocate three weeks a quarter to test whether the patch is good or bad, we only need 12 weeks a year to run the database. This basically costs us $12K/processor/year to test on a single processor and $24K on a dual processor. If we purchased the system it would cost us $47.5K in capital expenditure plus 22% annually for support; paying this amount just to do patch testing does not make sense. With a three-year cost of ownership, running this on-premise will cost us $78,850, while using the metered services in the public cloud will cost us $72K. The $6,850 difference does not seem like a lot, but with the public cloud service we won't need to pay for the hardware, storage, or operating system, and we can provision the cloud service in an hour and replicate our on-site data to the cloud for the testing. If we did this with a computer or virtual image on site it would take hours or days to provision a new computer, storage, and database, and to replicate the data.

It is important to note here that you need to be careful with virtualization. You need to use software that allows for hard partitioning. Products like VMware and Hyper-V are soft-partitioning virtualization software. This means that the number of processors can grow dynamically, so you are required to license the Oracle software for the potential high-water mark, i.e. all of the cores in the cluster. If you are running on something like a Cisco UCS blade server with dual-socket 16-core processors, you must license all 32 cores to run the database even though you might create just a 2-core virtual instance in this VMware installation. It gets even worse if you cluster 8 blades into one cluster: then you must license all 256 cores, which gets a little expensive at $47.5K times 128 processor licenses. Products like Oracle VM, Solaris Containers, and AIX LPARs solve this cost problem with hard partitions.

The third of the Enterprise Edition flavors is the Extreme Performance Edition of the database. This edition is $5K/OCPU/month or $8.401/OCPU/hour. This option comes with RAC, Active DataGuard, and In-Memory. RAC allows you to run across multiple compute instances and restart queries that might fail if one node fails. Active DataGuard allows you to have two databases replicating to each other with both open and active at the same time. Regular or passive DataGuard allows you to replicate the data but not keep the target open and active. In-Memory allows you to store data not only in row format but also in column format. When data is entered into the table it is stored on disk in row format; a copy is also placed in memory but stored in column format. This allows you to search faster, given that the data is already organized in memory and you can skip what does not apply to your search. This is typically done with an index, but we can't always predict what questions the users are going to ask, and adding too many indexes slows down all operations.
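For example, marking a table for the In-Memory column store is a single DDL statement (the table name is hypothetical):

alter table sales inmemory priority high;
-- the row-format copy on disk is unchanged; a column-format copy is populated into memory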

    It is important to reiterate that we can take our perpetual license and run it in IaaS or generic compute. We can also effectively lease these licenses on a monthly or hourly rate. If you are running the database, you are consuming licenses. If you stop the database, you stop consuming the database license but continue to consume the storage and processor services. If you terminate the database you stop consuming the database, processor, and storage services because they are all deleted upon termination.

In summary, there are four flavors of DBaaS: Standard Edition, Enterprise Edition, High Performance Edition, and Extreme Performance Edition. Standard Edition and Enterprise Edition are available from other cloud providers, but some require perpetual licenses and some do not. If you decide to run this service as PaaS or DBaaS in the Oracle Public Cloud you can pay hourly or monthly, and you can start/stop metered services to help save money. All of these services come with part of the management offloaded to Oracle: backups, patches, and restarts of services are done automatically for you. This allows you to focus more on how to apply the database service to provide business benefits, rather than on the care and feeding needed to keep the database operational.

    Up next, we will dive into use cases for database as a service and look at different configurations and pricing models to solve a real business problem.

    Generate Multiple AWR Reports Quickly

    VitalSoftTech - Tue, 2016-05-10 20:40
Occasionally there is a need to generate multiple AWR reports for database analysis. In my case, a storage vendor will use a tool to extract data for all time periods from the AWR reports to find specific IO-related information. Here is how I generated these reports.
    Categories: DBA Blogs
