Feed aggregator

Oracle AVG Function with Examples

Complete IT Professional - Thu, 2016-05-12 06:00
The Oracle AVG function is a common function in Oracle SQL. I’ll explain how to use it in this article and show you some examples. Purpose of the Oracle AVG Function The AVG function calculates the average of a series of values that you provide to it. Most of the time, this will be a […]
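
As a minimal sketch (assuming the classic SCOTT.EMP demo schema, which may not be installed in your database), the function is used like any other aggregate:

-- Average salary per department using the SCOTT.EMP demo table
SELECT deptno, AVG(sal) AS avg_sal
FROM scott.emp
GROUP BY deptno;
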
Categories: Development

technology behind DBaaS

Pat Shuff - Thu, 2016-05-12 02:07
Before we can analyze different use cases we need to first look at a couple of things that enable them. The foundation for most of these use cases is data replication. We need to be able to replicate data from our on-premise database into a cloud database. The first issue is replicating the data; the second is having the access rights to the data and the database so that you can pull the data into your cloud database.

Let's first look at how data is stored in a database. If you use a Linux operating system, storage is typically split into four categories: ORACLE_HOME, +DATA, +FRA, and +RECO. The binaries that represent the database and all of the database processes go into the ORACLE_HOME or ORACLE_BASE. In the cloud this is dropped into /u01. If you are not using RAC, the file system sits on a logical volume manager (LVM), where multiple disks are mirrored or triple mirrored to keep a single disk failure from bringing down your database or losing data. If you are using a RAC database this goes into ASM. ASM is a disk technology that manages replication and performance. There are a variety of books and websites written on these technologies.

LVM links

ASM links
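
As a rough illustration (a sketch that assumes you are connected to an instance using ASM; group names and sizes will differ), you can list the disk groups mentioned above with:

-- List ASM disk groups, their redundancy type and free space
SELECT name, type, total_mb, free_mb
FROM v$asm_diskgroup;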

The reason we go into storage technologies is that we need to know how and where data is stored in our DBaaS. If we access everything with IaaS and roll out raw compute and storage, we need to know how to scale up storage if we run out of space. With DBaaS this is done with the scale up menu item. We can grow the file system by adding logical units to our instance and grow the space allocated for data storage or data logging.

The second file system that we should focus on is the +DATA area. This is where data is stored; all of our file extents and tables live here. For our Linux cloud database this is auto-provisioned into /u02. In our test system we create a 25 GB data area and get a 20 GB file system in the +DATA area.
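
A quick way to see how that space is carved up into tablespaces is a query like the following (a sketch; it requires access to the DBA views, and the numbers will differ on your instance):

-- Space allocated per tablespace in the +DATA area
SELECT tablespace_name,
       ROUND(SUM(bytes)/1024/1024/1024, 2) AS gb_allocated
FROM dba_data_files
GROUP BY tablespace_name
ORDER BY tablespace_name;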

If we look at the /u02 file system we notice that there is one major directory, /u02/app/oracle/oradata. In oradata there is one directory associated with the ORACLE_SID; in our example we called it ORCL. In this directory we have control01.dbf, sysaux01.dbf, system01.dbf, temp01.dbf, undotbs01.dbf, and users01.dbf. These files are where data is stored for the ORCL SID. There is also a PDB1 directory in this file structure. This corresponds to the pluggable database that we called PDB1. The files in this directory hold the tables, system, and user information for this pluggable database. If we create a second pluggable database, a new directory is created and all of these files are created in that directory. The users01.dbf file (PDB1_users01.dbf in the PDB1 directory) defines all of the users and their access rights. The system01.dbf file defines the tables and system level structures. In a pluggable database the system01 file defines the structures for PDB1 and not the entire database. The temp01.dbf holds temp data tables and scratch areas. The sysaux01.dbf contains auxiliary system information such as the control area structures and management data. The undotbs01.dbf is the undo tablespace, which is what lets us use flashback to look at information that was stored in a table three days ago. Note that there is no undotbs01.dbf file in the pluggable directory because undo is managed at the container level and not at the pluggable layer. Backups are done for the SID and not for each PDB. Tuning of memory and system tunables is done at the SID layer as well.
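
You can see the same layout from inside the database. A minimal sketch, assuming the 12c multitenant ORCL/PDB1 example above and a connection to the container root:

-- Data files for the container database and each pluggable database
SELECT con_id, name
FROM v$datafile
ORDER BY con_id, name;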

Now that we have looked at the files corresponding to tables and table extents, we can talk about data replication. If you follow the methodology of EMC and NetApp you should be able to replicate the dbf files between two file systems. Products like SnapMirror allow you to block-copy any changes that happen to a file to another file system in another data center. This is difficult to do between an on-premise server and a cloud instance. The way that EMC and NetApp do this is in the controller layer. They log write changes to the disk, track which blocks get changed, and communicate the changes to the controller on the target system. The target system takes these block changes, figures out which actual blocks they correspond to on its disk layout, and updates the blocks as needed. This does not work in a cloud storage instance. We deal at a file layer and not at a track-and-sector or block layer. The fundamental problem with this data replication mechanism is that you must restart or ingest the new file into the database. The database server does not do well if files change underneath it because it tends to cache information in memory, and indexes into data break if data is moved to another location. This type of replication is fine if you have a recovery point objective of an hour or more. If you need a recovery point measured in minutes you will need to go with something like DataGuard, GoldenGate, or Active DataGuard.

DataGuard works similarly to the block change recording but does so at the database layer and not at the file system/block layer. When an update or insert command is executed in the database, these changes are written to the /u04 directory. In our example the +REDO area is allocated 9.8 GB of disk. If we look at our /u04 structure we see that /u04/app/oracle/redo contains the redoXX.log files. With DataGuard we take these redo files, compress them, and transfer them to our target system. The target system takes the redo file, uncompresses it, and applies the changes to the database. You can structure the changes as either physical logging or logical logging. Physical logging records the block level changes for everything in the database. Logical logging takes the actual SQL statement and replicates it to the target system. The target system either applies the physical changes to its files or executes the SQL statement on the target database. Physical replication is used more than logical replication because logical has limitations on some statements. For example, blob or file operations cannot be translated to the target system because you can't guarantee that the file structure is the same between the two systems. There are a variety of books available on DataGuard, and a minimal configuration sketch follows the reading list below. It is also important to note that in the cloud DataGuard is not available with Standard Edition or Enterprise Edition but only with High Performance Edition and Extreme Performance Edition.

  • Oracle Data Guard 11g Handbook
  • Oracle Dataguard: Standby Database Failover Handbook
  • Creating a Physical Standby Documentation
  • Creating a Logical Standby Documentation
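
Here is the configuration sketch mentioned above: a minimal example of pointing redo shipping at a standby from the primary side (the service name and DB_UNIQUE_NAME dg_standby are assumptions, not values from this environment):

-- Primary side: ship redo asynchronously to the standby database
ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=dg_standby ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=dg_standby' SCOPE=BOTH;

-- Confirm the destination is valid and check for shipping errors
SELECT dest_id, destination, status, error
FROM v$archive_dest
WHERE dest_id = 2;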

Golden Gate is a similar process, but there is an intermediary agent that takes the redo log, analyzes it, and translates it for the target system. This allows us to take data from an Oracle database and replicate it to SQL Server. It also allows us to go in the other direction. SQL Server, for example, is typically used for SCADA or process control systems. The Oracle database is typically used for analytics and heavy duty number crunching on a much larger scale. If we want to look at how our process control systems are operating in relation to our budget, we will want to pull in the data from the process systems and look at how much we spend on each system. We can do this by either selecting data from the SQL Server or replicating the data into a table on the Oracle system. If we are doing complex join statements and pulling data in from multiple tables we would typically want to do this on one system rather than pulling the data across the network multiple times. Golden Gate allows us to pull the data into a local table and perform the complex select statements without suffering network latency beyond the initial copy. Golden Gate is a separate product that you must pay for, either on-premise or in the cloud. If you are replicating between two Oracle databases you could use Active DataGuard instead, which is available as part of the Extreme Performance Edition of the database.
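
As an illustration of why you replicate rather than query remotely, here is a hedged sketch of the kind of local join you would run once GoldenGate has landed the SQL Server data in an Oracle table (budget_lines and scada_readings are hypothetical table names):

-- Join replicated process-control data with local budget data entirely on the Oracle side
SELECT b.cost_center,
       SUM(b.amount) AS budgeted,
       COUNT(s.reading_id) AS readings
FROM budget_lines b
JOIN scada_readings s ON s.cost_center = b.cost_center
GROUP BY b.cost_center;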

The /u03 area in our file system is where backups are placed. The file system for our sample system shows /u03/app/oracle/fast_recovery_area/ORCL. The ORCL is the ORACLE_SID of our installation. Note that there is no PDB1 area because all of the backup data is handled at the system layer and not at the pluggable layer. The tool used to back up the database is RMAN. There are a variety of books available to help with RMAN as well as an RMAN online tutorial.
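
You can check how much of that fast recovery area is in use with a quick query (a sketch; the quota and usage are obviously installation specific):

-- Fast recovery area location, quota and current usage
SELECT name,
       ROUND(space_limit/1024/1024/1024, 1) AS limit_gb,
       ROUND(space_used/1024/1024/1024, 1) AS used_gb,
       number_of_files
FROM v$recovery_file_dest;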

It is important to note that RMAN requires system level access to the database. Amazon RDS does not allow you to replicate your data using RMAN; it uses a volume snapshot and copies this to another zone. The impact of this is that, first, you cannot get your data out of Amazon with a backup and you cannot copy your changes and data from Amazon RDS to your on-premise system. The second impact is that you can't use Amazon RDS for DataGuard. You don't have sys access into the database, which is required to set up DataGuard, and you don't have access to a file system into which to drop the redo logs. To make this available with Amazon you need to deploy the Oracle database into EC2 with S3 storage as the back end. The same is true with Azure. Everything is deployed into raw compute and you have to install the Oracle database on top of the operating system. This is more of an IaaS play and not a PaaS play. You lose patching of the OS and database, automated backups, and automatic restart of the database if something fails. You also need to lay out the file system on your own and configure LVM or some other volume management to prevent data loss from a single disk failure. All of this is done for you with PaaS and DBaaS. Oracle does offer a manual process to perform backups without having to dive deep into RMAN technology. If you are making a change to your instance and want a backup copy before you make the change, you can back up your instance manually and not have to wait for the automated backup. You can also change the timing if 2am does not work for your backup and you need to move it to 4am instead.

We started this conversation talking about growing a table because we ran out of space. With the Amazon and Azure solutions, this must be done manually. You have to attach a new logical unit, map it into the file system, grow the file system, and potentially reboot the operating system. With the Oracle DBaaS we have the option of growing the file system by adding a new logical unit, growing the /u02 file system to handle more table spaces, or growing the /u03 file system to handle more backup space.

Once we finish our scale up, the /u03 file system is no longer 20 GB but 1020 GB in size. The PaaS management console allocates the storage, attaches the storage to the instance, grows the logical volume to fill the additional space, and grows the file system to handle the additional storage. It is important to note that we did not require root privileges to do any of these operations. The DBA or cloud admin can scale up the database and expand table resources. We did not need to involve an operating system administrator. We did not need to request an additional logical unit from the storage admin. We did not need to get a senior DBA to reconfigure the system. All of this can be done either by a junior DBA or by an automated script that grows the file system if we run out of space. The only thing missing for the automated script is a monitoring tool to recognize that we are running into a limit. Oracle Enterprise Manager (OEM) 12c and 13c can do this monitoring and kick off processes if thresholds are crossed. It is important to note that you cannot use OEM with Amazon RDS because you don't have the root, file system, or system access to the installation that is required to install the OEM agent.
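
The kind of check such a script or OEM metric would run is simple. A minimal sketch, assuming a 90 percent usage threshold:

-- Tablespaces that have crossed a 90% usage threshold
SELECT tablespace_name,
       ROUND(used_percent, 1) AS pct_used
FROM dba_tablespace_usage_metrics
WHERE used_percent > 90
ORDER BY used_percent DESC;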

In summary, we looked at the file system structure that is required to replicate data between two instances. We talked about how many people use third party disk replication technologies to "snap mirror" between two disk installations and how this does not work when replicating from an on-premise to a cloud instance. We talked about DataGuard and GoldenGate replication that allow us to replicate data to the cloud and back to our data center. We looked at some of the advantages of using DBaaS rather than a database on IaaS to grow the file system and back up the database. Operations like backing up, growing the file system, and adding or removing processors temporarily can be done by a cloud admin or junior DBA. In the past these operations required multiple people to make them happen. All of these technologies are needed when we start talking about use cases. Most of the use cases assume that the data and data structures that exist in your on-premise database also exist in the cloud and that you can replicate data to the cloud as well as back from the cloud. If you are going to run a disaster recovery instance in the cloud, you need to be able to copy your changes to the cloud, make the cloud a primary instance, and replicate the changes back to your data center once you bring your database back online. The same is true for development and testing. It is important to be able to attach to both your on-premise database and the database provisioned in the cloud and look at the differences between the two configurations.

IBM Bluemix Dedicated/Local Status Page

Pas Apicella - Wed, 2016-05-11 20:37
With Bluemix Public you can view the status page, which details all the runtimes and services and their current status across all 3 PUBLIC regions. Those customers with Bluemix Dedicated or Local get a status page which includes a column on the status of their Dedicated or Local instance.

To navigate to it perform the following steps:

1. Log into your Bluemix dedicated or local instance web console

2. Click on the Status link which is accessed through the profile icon on the top right hand corner


3. You will see a table as follows as well as status messages to indicate the current status of your own Bluemix Local or Dedicated Environment.



More Information

https://console.ng.bluemix.net/docs/admin/index.html#oc_status
Categories: Fusion Middleware

Maven: how to copy files after a build into several distribution directories

XTended Oracle SQL - Wed, 2016-05-11 17:37

Sometimes it is convenient to copy jar files automatically into several different directories after a build. For example, if you have different config files for local tests and for the office test server, you may want to copy these files into a local test directory, the internal office test server, and a public distribution directory.
This short part of pom.xml shows 2 different methods of building an UBER-JAR (maven-assembly-plugin and maven-shade-plugin) and then copies the results into 3 different directories: localtest, officetest and public

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.6</version>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>tv.tmd.YourMainClass</mainClass>
                        </manifest>
                        <manifestEntries>
                            <Class-Path>.</Class-Path>
                        </manifestEntries>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <manifestEntries>
                                        <Main-Class>tv.tmd.YourMainClass</Main-Class>
                                        <Build-Number>2</Build-Number>
                                    </manifestEntries>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <artifactId>maven-antrun-plugin</artifactId>
                <version>1.8</version>
                <executions>
                    <execution>
                        <id>copy</id>
                        <phase>package</phase>
                        <configuration>
                            <target>
                                <echo>ANT TASK - copying files....</echo>
                                <copy todir="${basedir}/distribution/localtest" overwrite="true" flatten="true">
                                    <fileset dir="${basedir}" includes="*.bat" >
                                        <include name="*.bat" />
                                        <include name="ReadME.*" />
                                        <include name="target/*.jar" />
                                    </fileset>
                                </copy>

                                <copy todir="${basedir}/distribution/officetest" overwrite="true" flatten="true">
                                    <fileset dir="${basedir}" includes="*.bat" >
                                        <include name="*.bat" />
                                        <include name="ReadME.*" />
                                        <include name="target/*.jar" />
                                    </fileset>
                                </copy>

                                <copy todir="${basedir}/distribution/public" overwrite="true" flatten="true">
                                    <fileset dir="${basedir}" includes="*.bat" >
                                        <include name="*.bat" />
                                        <include name="ReadME.*" />
                                        <include name="target/*.jar" />
                                    </fileset>
                                </copy>
                            </target>
                        </configuration>
                        <goals>
                            <goal>run</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
Categories: Development

Log Buffer #473: A Carnival of the Vanities for DBAs

Pythian Group - Wed, 2016-05-11 16:32

This Log Buffer Edition covers some of the hottest blog posts of Oracle, SQL Server and MySQL for the previous week.

Oracle:

A recent post on the OTN database forum reminded Jonathan how easy it is to forget to keep repeating a piece of information after the first couple of hundred times you’ve explained it. No matter how “intuitively obvious” it is for one person, it’s new to someone else.

The OMS Patcher is a newer patching mechanism for the OMS specifically.

A SQLT report has all kinds of pertinent information including—to name just a few—optimizer settings, indexes, statistics, plan history, and view definitions.

Joins are fundamental in SQL, and are used in most every production query. There are three types in particular that every developer should fully understand.

Why should Oracle Database professionals care about NoSQL and where to start?

SQL Server:

Understanding OPENJSON and FOR JSON in Azure and SQL 2016

Using Data Compression in Master Data Services 2016

The most frustrating thing with any new system is often just working out how to connect to it.

There are several decisions to be made when designing indexes for Memory-optimized tables in In-Memory OLTP, and to make the best choice, it pays to understand something about the nature and usage of memory-optimised indexes.

Database Hardware and Infrastructure Trends

STRING_SPLIT() in SQL Server 2016 : Follow-Up #2

MySQL:

Why would I run MySQL/MariaDB on POWER anyway?

By default, the write() system call returns after all data has been copied from the user space file descriptor into the kernel space buffers. There is no guarantee that data has actually reached the physical storage.

OpenCPS: Vietnam’s Public Sector goes Open Source

MySQL uses metadata locking to manage concurrent access to database objects, and to ensure data consistency when performing modifications to the schema.

Using Docker for development is a great way of ensuring that what you develop will be the same that you deploy in production.

Categories: DBA Blogs

Defining Digital Transformation

Pythian Group - Wed, 2016-05-11 15:33

 

Terminology is important—and it’s particularly important for us to define terms that are central to what we do. So when it comes to the subject of digital transformation, what exactly are we talking about?

 

In speaking with clients and industry thought leaders, I’ve come to realize that the term “digital transformation” has a different meaning to different people. For a term that is so widely used — and that, on its surface, seems pretty straightforward — the range of interpretation is remarkable. It’s a bit like when we say “I’ll do it later.”  “Later” to one person means “before the sun goes down today.” “Later” to another person means “sometime in the future”, and it could mean days or weeks in their mind. “Later” to a third person can mean “I have no plans to do it, and this is my way of telling you nicely.”

 

Because the term is so essential to the work we do for our clients, I thought it would be helpful to define what digital transformation means to us here at Pythian. There’s so much we can say on the topic, so I plan to follow up with a series of articles about how I’ve seen it implemented, or worse, not implemented or even embraced as a concept.

 

To start, “digital transformation” is about technology. I know that to some people it isn’t, but I disagree. These days, you can’t transform your business without technology. It’s not about which technology you choose, as much as it’s about how to use it. Even more specifically, we’ve found that the businesses that are achieving positive transformation are using technology to capitalize on data. I have yet to see a single transformation project that didn’t use data as a major component of its success.

 

Let’s look at the term “transformation.” This equates to change, but it doesn’t mean change for its own sake. The change we’re talking about has to benefit the business. However, the factor that can make or break successful change is people. Their attitudes, preconceptions, and ideas almost always have to be aligned with the change for successful transformation to occur. People need to get behind the initiative, people have to fund it, people have to develop it, and people have to support it once it’s developed. And we all know that getting people to change can be more difficult than developing any new technology. In short, the transformative capabilities inherent in technology can only be realized when coupled with the willingness to embrace change.

Why Digital Transformation?

Why is the concept of digital transformation important in the first place? At Pythian, we believe that it’s about using technology and data to change your business for the better. What do we mean when we say “for the better”? Therein lies the controversy.  “For the better” means different things to different people depending on their company’s key objectives.

 

“For the better” can mean:

  • Becoming more efficient to drive costs down so your profitability can improve
  • Reducing mistakes and improving your reputation, or the quality of your product
  • Differentiating your product to get ahead of the competition
  • Doing what you do, only faster than your competitors
  • Creating new revenue streams
  • Improving the customer experience. This is a big one, so I will dedicate an entire blog post to exploring exactly what it means.

 

Digital transformation is the key to achieving any one, or all of these benefits, and knowing your objectives and priorities will help you shape your digital transformation initiative. So to start, focus less on what digital transformation is, and more on what you want the outcome of a transformation to be.

 

Categories: DBA Blogs

select from table with no direct relation or foreign keys

Learn DB Concepts with me... - Wed, 2016-05-11 12:50
SELECT
  E.EMPNO,
  E.ENAME,
  E.JOB,
  D.DEPTNO,
  D.LOC,
  E.SAL
FROM
  scott.emp E
LEFT JOIN SCOTT.DEPT D
ON
  E.DEPTNO=D.DEPTNO;

SELECT
  E.EMPNO,
  E.ENAME,
  E.JOB,
  D.DEPTNO,
  D.LOC,
  E.SAL,
  (
    SELECT      grade
    FROM
      SCOTT.SALGRADE S
    WHERE
      E.SAL BETWEEN S.LOSAL AND S.HISAL
  ) AS SALGRADE
FROM
  scott.emp E
LEFT JOIN SCOTT.DEPT D
ON
  E.DEPTNO=D.DEPTNO;

Categories: DBA Blogs

GoldenGate 12.2 Big Data Adapters: part 4 – HBASE

Pythian Group - Wed, 2016-05-11 11:51

This is the next post in my series about Oracle GoldenGate Big Data adapters. Here is a list of all posts in the series:

  1. GoldenGate 12.2 Big Data Adapters: part 1 – HDFS
  2. GoldenGate 12.2 Big Data Adapters: part 2 – Flume
  3. GoldenGate 12.2 Big Data Adapters: part 3 – Kafka
  4. GoldenGate 12.2 Big Data Adapters: part 4 – HBASE

In this post I am going to explore the HBASE adapter for GoldenGate. Let's start by recalling what we know about HBASE. Apache HBASE is a non-relational, distributed database modelled after Google's Bigtable. It provides read and write access to data and is built on top of Hadoop HDFS.

So, what does that tell us? First, we can write and change the data. Second, we need to remember that it is a non-relational database and takes a somewhat different approach to data compared with traditional relational databases. You can think of HBase as a key-value store. We are not going deep into HBASE architecture and internals here, since our main task is to test the Oracle GoldenGate adapter and see how it works. Our configuration has an Oracle database as a source with a GoldenGate extract, and a target system where we have Oracle GoldenGate for Big Data.

We have more information about setting up the source and target in the first post in the series, about the HDFS adapter. The source-side replication has already been configured and started. We have an initial trail file for data initialization and trails for the ongoing replication. We capture changes for all tables in the ggtest schema on the Oracle database.
Now we need to prepare our target site. Let's start with HBase. I used pseudo-distributed mode for my tests, which runs the fully distributed code on a single host. It is not acceptable for any production configuration but will suffice for our tests. On the same box I have HDFS to serve as the main storage. Oracle documentation for the adapter states that it supports HBase from version 1.0.x. In my first attempt I tried to use HBase version 1.0.0 (Cloudera 5.6) but it didn't work: I got errors in GoldenGate and my replicat abended.
Here is the error:

 
2016-03-29 11:51:31  ERROR   OGG-15051  Oracle GoldenGate Delivery, irhbase.prm:  Java or JNI exception:
java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)Lorg/apache/hadoop/hbase/HTableDescriptor;.
2016-03-29 11:51:31  ERROR   OGG-01668  Oracle GoldenGate Delivery, irhbase.prm:  PROCESS ABENDING.

So, I installed another HBase version, and version 1.1.4 worked just fine. I used a simple, standard HBase configuration for pseudo-distributed mode, where the region server runs on the same host as the master and hbase.rootdir points to the local HDFS.
Here is an example of the configuration:

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/user/oracle/hbase</value>
  </property>
</configuration>
[root@sandbox conf]# cat regionservers
localhost
[root@sandbox conf]#

As soon as we have HBase set up and running, we can switch our attention to GoldenGate. We already have a trail file with the initial load. Now we need to prepare our configuration files for the initial and ongoing replication. Let's go to our GoldenGate for Big Data home directory and prepare everything. First, we need the hbase.props file copied from the $OGG_HOME/AdapterExamples/big-data/hbase directory to $OGG_HOME/dirprm. I left everything as it was in the original file, changing only the gg.classpath parameter to point to my configuration files and libraries for HBase.
Here is an example of the configuration file:

[oracle@sandbox oggbd]$ cat dirprm/hbase.props

gg.handlerlist=hbase

gg.handler.hbase.type=hbase
gg.handler.hbase.hBaseColumnFamilyName=cf
gg.handler.hbase.keyValueDelimiter=CDATA[=]
gg.handler.hbase.keyValuePairDelimiter=CDATA[,]
gg.handler.hbase.encoding=UTF-8
gg.handler.hbase.pkUpdateHandling=abend
gg.handler.hbase.nullValueRepresentation=CDATA[NULL]
gg.handler.hbase.authType=none
gg.handler.hbase.includeTokens=false

gg.handler.hbase.mode=tx

goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE

gg.log=log4j
gg.log.level=INFO

gg.report.time=30sec

gg.classpath=/u01/hbase/lib/*:/u01/hbase/conf:/usr/lib/hadoop/client/*

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

Second, we have to prepare a parameter file for our initial load. I used a simple file with the minimum parameters.

[oracle@sandbox oggbd]$ cat dirprm/irhbase.prm
-- passive REPLICAT irhbase
-- Trail file for this example is located in "./dirdat/initld" file
-- Command to add REPLICAT
-- run replicat irhbase:
-- ./replicat paramfile dirprm/irhbase.prm reportfile dirrpt/irhbase.rpt
SPECIALRUN
END RUNTIME
EXTFILE /u01/oggbd/dirdat/initld
TARGETDB LIBFILE libggjava.so SET property=dirprm/hbase.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAP GGTEST.*, TARGET BDTEST.*;

With that configuration file we can run the replicat in passive mode from the command line and see the result.
Here is the initial status of HBASE:

hbase(main):001:0> version
1.1.4, r14c0e77956f9bb4c6edf0378474264843e4a82c3, Wed Mar 16 21:18:26 PDT 2016

hbase(main):001:0> list
TABLE
0 row(s) in 0.3340 seconds

=> []
hbase(main):002:0>

Running the replicat:

[oracle@sandbox oggbd]$ ./replicat paramfile dirprm/irhbase.prm reportfile dirrpt/irhbase.rpt
[oracle@sandbox oggbd]$

Now we have 2 tables in HBASE:

hbase(main):002:0> list
TABLE
BDTEST:TEST_TAB_1
BDTEST:TEST_TAB_2
2 row(s) in 0.3680 seconds

=> ["BDTEST:TEST_TAB_1", "BDTEST:TEST_TAB_2"]
hbase(main):003:0>

Let’s have a look to the tables structure and contains:


hbase(main):004:0> describe 'BDTEST:TEST_TAB_1'
Table BDTEST:TEST_TAB_1 is ENABLED
BDTEST:TEST_TAB_1
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MI
N_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.2090 seconds

hbase(main):005:0> scan 'BDTEST:TEST_TAB_1'
ROW                                            COLUMN+CELL
 1                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-01-22:12:14:30
 1                                             column=cf:PK_ID, timestamp=1459269153102, value=1
 1                                             column=cf:RND_STR, timestamp=1459269153102, value=371O62FX
 1                                             column=cf:RND_STR_1, timestamp=1459269153102, value=RJ68QYM5
 1                                             column=cf:USE_DATE, timestamp=1459269153102, value=2014-01-24:19:09:20
 2                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-05-11:05:23:23
 2                                             column=cf:PK_ID, timestamp=1459269153102, value=2
 2                                             column=cf:RND_STR, timestamp=1459269153102, value=371O62FX
 2                                             column=cf:RND_STR_1, timestamp=1459269153102, value=HW82LI73
 2                                             column=cf:USE_DATE, timestamp=1459269153102, value=2014-01-24:19:09:20
 3                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-01-22:12:14:30
 3                                             column=cf:PK_ID, timestamp=1459269153102, value=3
 3                                             column=cf:RND_STR, timestamp=1459269153102, value=RXZT5VUN
 3                                             column=cf:RND_STR_1, timestamp=1459269153102, value=RJ68QYM5
 3                                             column=cf:USE_DATE, timestamp=1459269153102, value=2013-09-04:23:32:56
 4                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-05-11:05:23:23
 4                                             column=cf:PK_ID, timestamp=1459269153102, value=4
 4                                             column=cf:RND_STR, timestamp=1459269153102, value=RXZT5VUN
 4                                             column=cf:RND_STR_1, timestamp=1459269153102, value=HW82LI73
 4                                             column=cf:USE_DATE, timestamp=1459269153102, value=2013-09-04:23:32:56
4 row(s) in 0.1630 seconds

hbase(main):006:0> scan 'BDTEST:TEST_TAB_2'
ROW                                            COLUMN+CELL
 7|IJWQRO7T|2013-07-07:08:13:52                column=cf:ACC_DATE, timestamp=1459269153132, value=2013-07-07:08:13:52
 7|IJWQRO7T|2013-07-07:08:13:52                column=cf:PK_ID, timestamp=1459269153132, value=7
 7|IJWQRO7T|2013-07-07:08:13:52                column=cf:RND_STR_1, timestamp=1459269153132, value=IJWQRO7T
1 row(s) in 0.0390 seconds

hbase(main):007:0>

Everything looks good to me. We have the structure and records as expected. Let's go forward and set up ongoing replication.
I have created a parameter file for my replicat using the initial load parameters as a basis:

[oracle@sandbox oggbd]$ cat dirprm/rhbase.prm
REPLICAT rhbase
-- Trail file for this example is located in "dirdat/or" directory
-- Command to add REPLICAT
-- add replicat rhbase, exttrail dirdat/or
TARGETDB LIBFILE libggjava.so SET property=dirprm/hbase.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAP ggtest.*, TARGET bdtest.*;

We check our trail files and start our replicat using the latest trail file. By default a replicat would look for a trail with sequence number 0, but since I have a purging policy on my GoldenGate that deletes old files, I need to tell the replicat exactly where to start.

[oracle@sandbox oggbd]$ ll dirdat/
total 4940
-rw-r-----. 1 oracle oinstall    3028 Feb 16 14:17 initld
-rw-r-----. 1 oracle oinstall 2015199 Mar 24 13:07 or000043
-rw-r-----. 1 oracle oinstall 2015229 Mar 24 13:08 or000044
-rw-r-----. 1 oracle oinstall 1018490 Mar 24 13:09 or000045
[oracle@sandbox oggbd]$ ggsci

Oracle GoldenGate Command Interpreter
Version 12.2.0.1.0 OGGCORE_12.2.0.1.0_PLATFORMS_151101.1925.2
Linux, x64, 64bit (optimized), Generic on Nov 10 2015 16:18:12
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved.



GGSCI (sandbox.localdomain) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING


GGSCI (sandbox.localdomain) 2> add replicat rhbase, exttrail dirdat/or,EXTSEQNO 45
REPLICAT added.


GGSCI (sandbox.localdomain) 3> start replicat rhbase

Sending START request to MANAGER ...
REPLICAT RHBASE starting


GGSCI (sandbox.localdomain) 4> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
REPLICAT    RUNNING     RHBASE      00:00:00      00:00:06


GGSCI (sandbox.localdomain) 5> info rhbase

REPLICAT   RHBASE    Last Started 2016-03-29 12:56   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:08 ago)
Process ID           27277
Log Read Checkpoint  File dirdat/or000045
                     2016-03-24 13:09:02.000274  RBA 1018490


GGSCI (sandbox.localdomain) 6>

I inserted a number of rows into test_tab_1 on the Oracle side and all of them were successfully replicated to HBASE.

hbase(main):015:0> count 'BDTEST:TEST_TAB_1'
Current count: 1000, row: 1005694
Current count: 2000, row: 442
Current count: 3000, row: 6333
3473 row(s) in 1.0810 seconds

=> 3473
hbase(main):016:0>

Let’s have a look bit close to test_tab_1 and test_tab_2:

hbase(main):005:0> scan 'BDTEST:TEST_TAB_1'
ROW                                            COLUMN+CELL
 1                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-01-22:12:14:30
 1                                             column=cf:PK_ID, timestamp=1459269153102, value=1
 1                                             column=cf:RND_STR, timestamp=1459269153102, value=371O62FX
 1                                             column=cf:RND_STR_1, timestamp=1459269153102, value=RJ68QYM5
 1                                             column=cf:USE_DATE, timestamp=1459269153102, value=2014-01-24:19:09:20
 2                                             column=cf:ACC_DATE, timestamp=1459269153102, value=2014-05-11:05:23:23
 2                                             column=cf:PK_ID, timestamp=1459269153102, value=2
 2                                             column=cf:RND_STR, timestamp=1459269153102, value=371O62FX
 2                                             column=cf:RND_STR_1, timestamp=1459269153102, value=HW82LI73
 2                                             column=cf:USE_DATE, timestamp=1459269153102, value=2014-01-24:19:09:20
..............................................

hbase(main):006:0> scan 'BDTEST:TEST_TAB_2'
ROW                                            COLUMN+CELL
 7|IJWQRO7T|2013-07-07:08:13:52                column=cf:ACC_DATE, timestamp=1459269153132, value=2013-07-07:08:13:52
 7|IJWQRO7T|2013-07-07:08:13:52                column=cf:PK_ID, timestamp=1459269153132, value=7
 7|IJWQRO7T|2013-07-07:08:13:52                column=cf:RND_STR_1, timestamp=1459269153132, value=IJWQRO7T
1 row(s) in 0.0390 seconds

hbase(main):007:0>

You can see that the row identifier for test_tab_1 is the value of pk_id, while for test_tab_2 it is a concatenation of the values of all columns. Why is that? The difference is in the constraints on the tables. Since we don't have a primary key or unique index on test_tab_2, it uses all columns as the key value. We can try to add a constraint and see the result.

select * from dba_constraints where owner='GGTEST' and table_name='TEST_TAB_2';

no rows selected

alter table ggtest.test_tab_2 add constraint pk_test_tab_2 primary key (pk_id);

Table altered.

insert into ggtest.test_tab_2 values(9,'PK_TEST',sysdate,null);

1 row created.

commit;

Commit complete.

orcl>

And let us compare with the result in HBASE:

hbase(main):012:0> scan 'BDTEST:TEST_TAB_2'
ROW                                           COLUMN+CELL
 7|IJWQRO7T|2013-07-07:08:13:52               column=cf:ACC_DATE, timestamp=1459275116849, value=2013-07-07:08:13:52
 7|IJWQRO7T|2013-07-07:08:13:52               column=cf:PK_ID, timestamp=1459275116849, value=7
 7|IJWQRO7T|2013-07-07:08:13:52               column=cf:RND_STR_1, timestamp=1459275116849, value=IJWQRO7T
 8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER   column=cf:ACC_DATE, timestamp=1459278884047, value=2016-03-29:15:14:37
 8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER   column=cf:PK_ID, timestamp=1459278884047, value=8
 8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER   column=cf:RND_STR_1, timestamp=1459278884047, value=TEST_INS1
 8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER   column=cf:TEST_COL, timestamp=1459278884047, value=TEST_ALTER
 9                                            column=cf:ACC_DATE, timestamp=1462473865704, value=2016-05-05:14:44:19
 9                                            column=cf:PK_ID, timestamp=1462473865704, value=9
 9                                            column=cf:RND_STR_1, timestamp=1462473865704, value=PK_TEST
 9                                            column=cf:TEST_COL, timestamp=1462473865704, value=NULL
3 row(s) in 0.0550 seconds

hbase(main):013:0>

It is fully dynamic and changed the row id column on the fly. Will it work with a unique index? Yes, it will:


delete from ggtest.test_tab_2 where pk_id=9;

1 row deleted.

alter table ggtest.test_tab_2 drop constraint pk_test_tab_2;

Table altered.

create unique index ggtest.ux_test_tab_2 on ggtest.test_tab_2 (pk_id);

Index created.

insert into ggtest.test_tab_2 values(10,'UX_TEST',sysdate,null);

1 row created.

commit;

Here is the newly inserted row.

hbase(main):017:0> scan 'BDTEST:TEST_TAB_2'
ROW                                           COLUMN+CELL
 10                                           column=cf:ACC_DATE, timestamp=1462474389145, value=2016-05-05:14:53:03
 10                                           column=cf:PK_ID, timestamp=1462474389145, value=10
 10                                           column=cf:RND_STR_1, timestamp=1462474389145, value=UX_TEST
 10                                           column=cf:TEST_COL, timestamp=1462474389145, value=NULL
 7|IJWQRO7T|2013-07-07:08:13:52               column=cf:ACC_DATE, timestamp=1459275116849, value=2013-07-07:08:13:52
 7|IJWQRO7T|2013-07-07:08:13:52               column=cf:PK_ID, timestamp=1459275116849, value=7

But it will not make any difference if we just create an ordinary (non-unique) index on the source; that will not change anything. So, if we need to identify a key for a table we have to have at least a unique index or constraint. Of course, this is just the default behavior for schema replication, and we may use KEYCOLS to identify keys for some tables.

Interestingly, if we change a table's structure it will affect all newly inserted rows but will not change existing ones, even if we update some values. It works this way if you have a unique identifier and it was not changed by your DDL operation.
Here is an example. We have a column "TEST_COL" in the table test_tab_2. Let's drop the column and update the row. Keep in mind that our key is the PK_ID column and we are not modifying the key.

alter table ggtest.test_tab_2 drop column TEST_COL;

Table altered.

update ggtest.test_tab_2 set rnd_str_1='TEST_COL' where pk_id=9;

1 row updated.

commit;

In HBASE we can see the same set of columns:

hbase(main):030:0> scan 'BDTEST:TEST_TAB_2'
ROW                                           COLUMN+CELL
 9                                            column=cf:ACC_DATE, timestamp=1462477581440, value=2016-05-05:15:46:13
 9                                            column=cf:PK_ID, timestamp=1462477794597, value=9
 9                                            column=cf:RND_STR_1, timestamp=1462477794597, value=TEST_COL
 9                                            column=cf:TEST_COL, timestamp=1462477581440, value=NULL
1 row(s) in 0.0200 seconds

We still have the deleted column TEST_COL even though we've updated the row.
But if we insert any new row, it will have the new set of columns:

insert into ggtest.test_tab_2 values(10,'TEST_COL',sysdate);

1 row created.

commit;

Commit complete.

And in HBASE:

hbase(main):031:0> scan 'BDTEST:TEST_TAB_2'
ROW                                           COLUMN+CELL
 10                                           column=cf:ACC_DATE, timestamp=1462477860649, value=2016-05-05:15:50:55
 10                                           column=cf:PK_ID, timestamp=1462477860649, value=10
 10                                           column=cf:RND_STR_1, timestamp=1462477860649, value=TEST_COL
 9                                            column=cf:ACC_DATE, timestamp=1462477581440, value=2016-05-05:15:46:13
 9                                            column=cf:PK_ID, timestamp=1462477794597, value=9
 9                                            column=cf:RND_STR_1, timestamp=1462477794597, value=TEST_COL
 9                                            column=cf:TEST_COL, timestamp=1462477581440, value=NULL
2 row(s) in 0.0340 seconds

And, as in all other cases, a truncate on the source table is not replicated to the target; the operation is ignored. You have to truncate the table in HBASE yourself to keep the data in sync. If you insert data again, the data in HBASE will be "updated", but other rows will not be deleted. It is more like a "merge" operation.
Here is an example:

truncate table ggtest.test_tab_2;

Table truncated.

insert into ggtest.test_tab_2 values(10,'TEST_COL2',sysdate);

1 row created.

commit;

Commit complete.

select * from ggtest.test_tab_2;

	   PK_ID RND_STR_1  ACC_DATE
---------------- ---------- -----------------
	      10 TEST_COL2  05/05/16 16:01:20

orcl>

HBASE:
hbase(main):033:0> scan 'BDTEST:TEST_TAB_2'
ROW                                           COLUMN+CELL
 10                                           column=cf:ACC_DATE, timestamp=1462478485067, value=2016-05-05:16:01:20
 10                                           column=cf:PK_ID, timestamp=1462478485067, value=10
 10                                           column=cf:RND_STR_1, timestamp=1462478485067, value=TEST_COL2
 9                                            column=cf:ACC_DATE, timestamp=1462477581440, value=2016-05-05:15:46:13
 9                                            column=cf:PK_ID, timestamp=1462477794597, value=9
 9                                            column=cf:RND_STR_1, timestamp=1462477794597, value=TEST_COL
 9                                            column=cf:TEST_COL, timestamp=1462477581440, value=NULL
2 row(s) in 0.0300 seconds

hbase(main):034:0>

I spent some time testing performance and found that the main bottleneck was my Oracle source rather than GoldenGate and HBASE. I was able to sustain a transaction rate of up to 60 DML operations per second, and my Oracle DB started to struggle to keep pace because of waits on commit. HBASE and the replicat were absolutely fine. I also checked how it handles big transactions and inserted about 2 billion rows in one transaction. It worked fine. Of course, this doesn't prove that your production configuration will be free of performance issues; to conduct real performance tests I would need a much bigger environment.
In addition, I noticed one more minor error in the Oracle documentation for the adapter, related to the "keyValuePairDelimiter" parameter. In the documentation it is replaced by "keyValueDelimiter". It is just a small typo: "keyValueDelimiter" is repeated twice, where the first occurrence is correct and the second stands in the place where "keyValuePairDelimiter" is supposed to be. Here is the link.

In summary, I can say that despite some minor issues the adapters and GoldenGate for Big Data appear quite mature and ready for real work. I think it is a good, robust technology and, hopefully, it will continue to improve with new releases. I am looking forward to using it in a real production environment with a significant workload. In the following posts I will try to test different DDL operations and maybe some other data types. Stay tuned.

Categories: DBA Blogs

Oracle Management Cloud – Log Analytics

Marco Gralike - Wed, 2016-05-11 10:18
In this last overview post I will give you a first glance of the Oracle…

run sql from windows cmd prompt CLI

Learn DB Concepts with me... - Wed, 2016-05-11 09:36

Method 1:

set ORACLE_HOME=D:\app\oracle\product\11.2.0\db_1
set ORACLE_SID=ORCL

sqlplus -s "USERNAME/PASSWORD" @C:\Shell\Drop_objs.sql

sqlplus -s "USERNAME/PASSWORD" @C:\Shell\Change_pass.sql



Method 2:

set ORACLE_HOME=D:\app\oracle\product\11.2.0\db_1
set ORACLE_SID=ORCL
(ECHO select username from dba_users where username in ('SCOTT');
ECHO exit;) | sqlplus -s "USER/PASS" > C:\Shell\test_out.txt



Categories: DBA Blogs

Oracle Midlands : Event #15

Tim Hall - Wed, 2016-05-11 08:42

Don’t forget Oracle Midlands Event #15 next week!

om15

Please show your support and come along. It’s free thanks to the sponsorship by RedStackTech.

Cheers

Tim…


MobaXterm 9.0

Tim Hall - Wed, 2016-05-11 08:21

Another recent release that I managed to miss was

dbms_xplan

Jonathan Lewis - Wed, 2016-05-11 06:22

My favourite format options for dbms_xplan.display_cursor().

This is another of those posts where I tell you about something that I’ve frequently mentioned but never documented explicitly as a good (or, at least, convenient) idea. It is also another example of how easy it is to tell half the story most of the time when someone asks a “simple” question.

You’re probably familiar with the idea of “tuning by cardinality feedback” – comparing the predicted data volumes with the actual data volumes from an execution plan – and I wrote a short note about how to make that comparison last week; and you’re probably familiar with making a call to dbms_xplan.display_cursor() after enabling the capture of rowsource execution statistics (in one of three ways) for the execution of the query, and the format parameter usually suggested for the call is ‘allstats last’ to get the execution stats for the most recent execution of the query. I actually like to see the Cost column of the execution plan as well, so I usually add that to the format, so (with all three strategies shown for an SQL*Plus environment):

set linesize 180
set trimspool on
set pagesize 60
set serveroutput off

alter session set "_rowsource_execution_statistics"=true;
alter session set statistics_level=all;

select /*+ gather_plan_statistics */ * from user_tablespaces;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

So what do we often forget to mention:

  • For SQL*Plus it is important to ensure that serveroutput is off
  • The /*+ gather_plan_statistics */ option uses sampling, so may be a bit inaccurate
  • The two accurate strategies may add a significant, sometimes catastrophic, amount of CPU overhead
  • This isn’t appropriate if the query runs parallel

For a parallel query the “last” execution of a query is typically carried out by the query co-ordinator, so the rowsource execution stats of many (or all) of the parallel execution slaves are likely to disappear from the output. If you’re testing with parallel queries you need to add some “tag” text to the query to make it unique and omit the ‘last’ option from the format string.
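
A minimal sketch of that approach (the tag comment and the v$sql lookup are illustrative additions, not part of the original test):

select /*+ parallel(2) */ /* tag_run_42 */ object_type, sum(object_id)
from t1
group by object_type
order by object_type;

-- find the tagged statement, then report all executions by omitting 'last'
select sql_id from v$sql
where sql_text like '%tag_run_42%'
and   sql_text not like '%v$sql%';

select * from table(dbms_xplan.display_cursor('&sql_id', null, 'allstats parallel cost'));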

Now, a common suggestion is that you need to add the ‘all’ format option instead – but this doesn’t mean “all executions” it means (though doesn’t actually deliver) all the data that’s available about the plan. So here’s an execution plan produced after running a parallel query and using ‘allstats all’ as the format option (t1 is a copy of all_objects, and this demo is running on 12.1.0.2).

SQL_ID  51u5j42rvnnfg, child number 1
-------------------------------------
select  /*+   parallel(2)  */  object_type,  sum(object_id) from t1
group by object_type order by object_type

Plan hash value: 2919148568

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |       |   113 (100)|          |        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |       |            |          |        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,01 | P->S | QC (ORDER) |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,01 | PCWP |            |     30 |00:00:00.01 |       0 |      0 |  2048 |  2048 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,01 | PCWP |            |     50 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,00 | P->P | RANGE      |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY      |          |      2 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,00 | PCWP |            |     50 |00:00:00.05 |    1492 |   1440 |  1048K|  1048K|     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  85330 |  1166K|   105   (2)| 00:00:01 |  Q1,00 | PCWC |            |  85330 |00:00:00.03 |    1492 |   1440 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  85330 |  1166K|   105   (2)| 00:00:01 |  Q1,00 | PCWP |            |  85330 |00:00:00.01 |    1492 |   1440 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   8 - SEL$1 / T1@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access(:Z>=:Z AND :Z<=:Z)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "OBJECT_TYPE"[VARCHAR2,23], SUM()[22]
   2 - (#keys=0) "OBJECT_TYPE"[VARCHAR2,23], SUM()[22]
   3 - (#keys=1; rowset=200) "OBJECT_TYPE"[VARCHAR2,23], SUM()[22]
   4 - (rowset=200) "OBJECT_TYPE"[VARCHAR2,23], SYS_OP_MSR()[25]
   5 - (#keys=1) "OBJECT_TYPE"[VARCHAR2,23], SYS_OP_MSR()[25]
   6 - (rowset=200) "OBJECT_TYPE"[VARCHAR2,23], SYS_OP_MSR()[25]
   7 - (rowset=200) "OBJECT_ID"[NUMBER,22], "OBJECT_TYPE"[VARCHAR2,23]
   8 - (rowset=200) "OBJECT_ID"[NUMBER,22], "OBJECT_TYPE"[VARCHAR2,23]

Note
-----
   - Degree of Parallelism is 2 because of hint


48 rows selected.

You’ll notice we’ve reported the “alias” and “projection” information – those are two of the format options that you can use with a + or – to include or exclude if you want. We’ve also got E-Bytes and E-time columns in the body of the plan. In other words (at least in my opinion) we’ve got extra information that makes the output longer and wider and therefore harder to read.

The format string I tend to use for parallel query is ‘allstats parallel cost’ – which (typically) gives something like the following:

SQL_ID  51u5j42rvnnfg, child number 1
-------------------------------------
select  /*+   parallel(2)  */  object_type,  sum(object_id) from t1
group by object_type order by object_type

Plan hash value: 2919148568

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |   113 (100)|        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |     30 |   113   (9)|  Q1,01 | P->S | QC (ORDER) |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |     30 |   113   (9)|  Q1,01 | PCWP |            |     30 |00:00:00.01 |       0 |      0 |  2048 |  2048 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |     30 |   113   (9)|  Q1,01 | PCWP |            |     50 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |     30 |   113   (9)|  Q1,00 | P->P | RANGE      |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY      |          |      2 |     30 |   113   (9)|  Q1,00 | PCWP |            |     50 |00:00:00.05 |    1492 |   1440 |  1048K|  1048K|     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  85330 |   105   (2)|  Q1,00 | PCWC |            |  85330 |00:00:00.03 |    1492 |   1440 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  85330 |   105   (2)|  Q1,00 | PCWP |            |  85330 |00:00:00.01 |    1492 |   1440 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
   - Degree of Parallelism is 2 because of hint


30 rows selected.

Of course you may prefer ‘allstats all’ – and sometimes I do actually want to see the alias or projection information – but I think there’s so much information available on the execution plan output that anything that makes it a little shorter, cleaner and tidier is a good thing.

You might have noticed, by the way, that the Buffers, Reads, and A-Time columns have still managed to lose information on the way up from operation 6; information that should have been summed up the plan has simply disappeared. Make sure you do a sanity check for disappearing numbers when you’re looking at more complex plans.

 


DBaaS in Oracle Public Cloud

Pat Shuff - Wed, 2016-05-11 02:07
Before we dive deep into database as a service with Oracle we need to define some terms. We have thrown around concepts like Standard Edition, Enterprise Edition, High Performance Edition, and Extreme Performance Edition. We have talked about concepts like DataGuard, Real Application Clusters, Partitioning, and Compression. Today we will dive a little deeper into these so that we can compare them running in the Oracle Public Cloud as well as on other cloud providers.

First, let's tackle Standard Edition (SE) vs Enterprise Edition (EE). Not only is there SE, there are also SE One and SE2. SE2 is new with the 12c release of the database and is the same as SE and SE One but with different processor and socket restrictions. The Oracle 12c documentation details the differences between the different versions; we will highlight the differences here. Note that data storage is the same across editions: the data types do not change between the versions of the database, and a select statement that works in SE will work in SE2 and will work in EE.

The first big difference between SE and EE is that SE is licensed on a per-socket basis and EE is licensed on a per-core basis. The base cost of an SE system is $600 per month per processor in the Oracle Public Cloud, and Standard Edition is limited to 8 cores in the cloud. If you are purchasing a perpetual license the cost is $17,500, and it can run across two sockets or on single sockets in two systems. SE2 comes with a Real Application Clusters (RAC) license so that you can have a single database running on two computers. An SE2 instance will also limit the database to 16 threads, so running on more cores will have no advantage. To learn more about the differences and limitations, I recommend reading Mike Dietrich's blog on SE2.

The second big difference is that many of the optional features are not available with SE. For example, you can't use diagnostics and tuning to figure out if your SQL command is running at top efficiency. You can't use multi-tenant, but you can provision a single pluggable database. This means that you can unplug the pluggable database and move it to another container database (and even another edition like EE). The multi-tenant option allows you to have multiple pluggable databases and control them with a master SGA. This allows admins to back up and patch a group of databases all at once rather than having to patch each one individually. You can separate security and have different logins to the different databases but use a global system or sys account to manage and control all of the databases. Storage optimization features like compression and partitioning are not available in SE either. Data recovery features like DataGuard and FlashBack are not supported in SE. DataGuard is a feature that captures changes on one system through the change logs and applies them to a second system. FlashBack does something similar and allows you to query a database as of a previous time, returning the state of the database at that time; it uses the change logs to reconstruct the database as it was at the time requested. Tools like RMAN backup and Streams don't work in SE, so taking a copy of a database and copying it to another system is not allowed. The single exception to this is that RMAN works in the cloud instance but not in the perpetual on-premise version. Security options like Transparent Data Encryption, Label Security, Data Vault, and Audit Vault are not supported in SE; the single exception is Transparent Data Encryption, which is supported for SE to allow for encryption in the public cloud. All of these features are described here.

With Enterprise Edition in the Oracle Public Cloud at $3K/OCPU/month or $5.04/OCPU/hour, the only option that we get is Transparent Data Encryption (TDE) bundled with the database. This allows us to encrypt all or part of a table. TDE encrypts data on the disk when it is written with a SQL insert or update command. Keys are used to encrypt this data and it can only be read by presenting the keys using the Oracle Wallet interface. More information on TDE can be found here. The Security Inside Out blog is also a good place to look for updates and references relating to TDE. This version of the database allows us to scale up to 16 processors and 4.6 TB of storage. If we are looking to back up this database, the largest size that we can have for storage is 2.3 TB. If our table requirements are greater than 2.3 TB or 4.6 TB you need to go to Exadata as a Service or purchase a perpetual license and run it on-premise. If we are looking to run this database in our data center we will need to purchase a perpetual license at $47.5K per processor license. If you are running on an IBM Power server you need a license for each core. If you are running on x86 or SPARC servers you multiply the number of cores by 0.5 and can run two cores per processor license. TDE is part of the Advanced Security Option, which lists for $15K per processor license. When calculating whether it is cheaper to run on-premise vs the public cloud you need to factor in both license requirements. The same is true if you decide to run EE in AWS EC2 or Azure Compute. Make sure to read Cloud Licensing Requirements to understand the limits of the cost of running on EC2 or Azure Compute. Since all cloud providers use x86 processors the multiplication factor is 0.5 times the number of cores on the service.
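
As a minimal sketch of what that looks like in SQL (the table, column and tablespace names are made up for illustration, and a wallet or keystore must already be configured and open), TDE can be applied at the column or the tablespace level:

-- Column-level encryption: card_no is encrypted on disk as rows are inserted or updated.
CREATE TABLE customers (
  id      NUMBER PRIMARY KEY,
  name    VARCHAR2(100),
  card_no VARCHAR2(19) ENCRYPT USING 'AES256'
);

-- Tablespace-level encryption: every object stored in this tablespace is encrypted.
CREATE TABLESPACE secure_ts
  DATAFILE 'secure_ts01.dbf' SIZE 100M
  ENCRYPTION USING 'AES256'
  DEFAULT STORAGE (ENCRYPT);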

The High Performance Edition contains the EE features and TDE, as well as multi-tenant, partitioning, advanced compression, advanced security, Real Application Testing, OLAP, DataGuard, and all of the database management packs. This is basically everything with the exception of Real Application Clusters (RAC), Active DataGuard, and the In-Memory option. High Performance comes in at $4K/OCPU/month or $6.72/OCPU/hour. If we wanted to bundle all of this together and run it in our data center we need to compare against the database at $47.5K/processor license plus roughly $15K/processor/option (there are 12 of them). We can then calculate which is cheaper based on our accounting rules and amortization schedule. The key differential is that I can use this version on an hourly or monthly basis for less than a full year. For example, if we do patch testing once a quarter and allocate three weeks a quarter to test if the patch is good or bad, we only need 12 weeks a year to run the database. This basically costs us $12K/processor/year to test on a single processor and $24K on a dual processor. If we purchased the system it would cost us $47.5K in capital expenditure plus 22% annually for support. Paying this amount just to do patch testing does not make sense. With the three year cost of ownership, running this on-premise will cost us $78,850 ($47,500 for the license plus three years of support at 22%, or $31,350). If we use the metered services in the public cloud this will cost us $72K (three years at $24K/year for the dual-processor test). The $6,850 difference does not seem like a lot, but with the public cloud service we won't need to pay for the hardware, storage, or operating system. We can provision the cloud service in an hour and replicate our on-site data to the cloud for the testing. If we did this to a computer or virtual image on site it would take hours or days to provision a new computer, storage, and database, and replicate the data.

It is important to note here that you need to be careful with virtualization. You need to use software that allows for hard partitioning. Products like VMware and Hyper-V are soft-partitioning virtualization software. This means that you can grow the number of processors dynamically, so you are required to license the Oracle software for the potential high-water mark, that is, all of the cores in the cluster. If you are running on something like a Cisco UCS blade that has two sockets with 16-core processors, you must license all 32 cores to run the database even though you might just create a 2-core virtual instance in this VMware installation. It gets even worse if you combine 8 blades into one cluster: then you must license all 256 cores. This gets a little expensive at $47.5K times 128 processor licenses. Products like Oracle VM, Solaris Containers, and AIX LPARs solve this cost problem with hard partitions.

The third enterprise edition is the Extreme Performance Edition of the database. This edition is $5K/OCPU/month or $8.40/OCPU/hour. It comes with RAC, Active DataGuard, and In-Memory. RAC allows you to run across multiple compute instances and restart queries that might fail if one node fails. Active DataGuard allows you to have two databases replicating to each other and for both to be open and active at the same time. Regular or passive DataGuard allows you to replicate the data but not keep the target open and active. In-Memory allows you to store data not only in row format but in column format. When data is entered into the table it is stored on disk in row format. A copy is also placed in memory but stored in column format. This allows you to search faster given that the data is already organized by column in memory and you can skip data that does not apply to your search. This is typically done with an index, but we can't always predict what questions the users are going to ask, and adding too many indexes slows down all operations.
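
As a rough sketch (the table name and memory size are illustrative), enabling the In-Memory column store amounts to reserving space in the SGA and flagging the tables that should be kept in columnar format:

-- Reserve part of the SGA for the column store; requires a restart to take effect.
ALTER SYSTEM SET inmemory_size = 4G SCOPE = SPFILE;

-- Keep a columnar copy of this table in memory alongside the normal row storage on disk.
ALTER TABLE sales INMEMORY PRIORITY HIGH;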

It is important to reiterate that we can take our perpetual license and run it in IaaS or generic compute. We can also effectively lease these licenses on a monthly or hourly rate. If you are running the database, you are consuming licenses. If you stop the database, you stop consuming the database license but continue to consume the storage and processor services. If you terminate the database you stop consuming the database, processor, and storage services because they are all deleted upon termination.

In summary, there are four flavors of DBaaS: Standard Edition, Enterprise Edition, High Performance Edition, and Extreme Performance Edition. Standard Edition and Enterprise Edition are available from other cloud providers, but some require perpetual licenses and some do not. If you decide to run this service as PaaS or DBaaS in the Oracle Public Cloud you can pay hourly or monthly and start/stop these services if they are metered to help save money. All of these services come with part of the management offloaded to and done by Oracle. Backups, patches, and restarts of services are done automatically for you. This allows you to focus more on how to apply the database service to provide business benefits rather than on the feeding and maintenance needed to keep the database operational.

Up next, we will dive into use cases for database as a service and look at different configurations and pricing models to solve a real business problem.

Generate Multiple AWR Reports Quickly

VitalSoftTech - Tue, 2016-05-10 20:40
Occasionally there is a need to generate multiple AWR reports for database analysis. In my case, a storage vendor will use a tool to extract data from all time periods in the AWR reports to find specific IO-related information. Here is how I generated these reports.
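
One common approach (not necessarily the author's script) is to pull each report from DBMS_WORKLOAD_REPOSITORY and repeat the call, or wrap it in a PL/SQL loop, for every consecutive snapshot pair listed in DBA_HIST_SNAPSHOT. A minimal sketch, with placeholder dbid, instance number and snapshot ids:

SET pagesize 0 long 10000000 longchunksize 200000 linesize 1000 trimspool on
SPOOL awr_181_182.txt
SELECT output
FROM   TABLE(dbms_workload_repository.awr_report_text(
               1234567890,  -- dbid, from V$DATABASE (placeholder)
               1,           -- instance number, from V$INSTANCE
               181,         -- begin snapshot id (placeholder)
               182));       -- end snapshot id (placeholder)
SPOOL OFF
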
Categories: DBA Blogs

Join Oracle for a Hackathon on 5/18 - New York, NY

OTN TechBlog - Tue, 2016-05-10 14:48

Developing and Deploying Java SE, Node.JS Managed Microservices in the Cloud

Join us Wed 5/18 for a fun, hands-on, informative Hackathon day. Learn lightweight microservices development using Java 8 and JavaScript, along with cloud devops tools. You will work with the latest Java 8 features, JAX-RS via Jersey/Grizzly, Node.js, and a new front-end JavaScript toolkit called JET, and get first-hand experience with Developer Cloud Service and Application Container Cloud Service (Java SE and Node.js) from Oracle.

A foundation reference implementation and source code will be provided: a back-end Java SE microservice that authenticates to Twitter via OAuth using only Jersey/Grizzly (no servlet container or Java EE container) and exposes a real-time tweet stream to a front-end Node.js instance running the JET JavaScript toolkit. You will build something new and exciting from that foundation.

Seats are very limited so register early!

Learn how to optimize text searches in SQL Server 2014 by using Full-Text Search – Part 1

Pythian Group - Tue, 2016-05-10 09:50

In this article, we'll cover the text search functionality available in SQL Server 2014, known as Full-Text Search, including its installation and implementation. Additionally, we will see how to develop text searches using the predicates CONTAINS, CONTAINSTABLE, FREETEXT and FREETEXTTABLE, and how to use the FILESTREAM feature to improve the searching and storage of binary data in tables.

Searching based on words and phrases is one of the main features of web search tools like Google and of digital document management systems. To perform these searches, many developers create highly complex applications that still lack the intelligence needed to find terms and phrases efficiently in the columns that store text and digital documents in database tables.

What the vast majority of these professionals don’t know is that SQL Server has an advanced tool for text search: Full-Text Search (FTS).

FTS has been present in SQL Server since version 7. With it, text searches can be performed both in columns that store character data and in columns that store documents (for example, Office documents and PDFs) in their native form.

With options like word and phrase searches, recognition of different languages, word derivation (for example: play, played and playing), the ability to define a thesaurus, the creation of ranked results, and the elimination of stopwords from a search, FTS is a powerful tool for text search. The main factors driving the use of text search are:

  • Databases are increasingly used as repositories of digital documents;
  • The cost of storing information has dropped considerably, enabling the storage of gigabytes, terabytes and even petabytes;
  • New types of digital documents are constantly being created, and the requirements for their storage and subsequent searching are becoming larger and more complex;
  • Developers need a robust and reliable interface for performing intelligent text searches.

FTS has great advantages over alternatives for text search, such as the LIKE operator. The main tasks you can perform with FTS are:

  • Linguistic text searches, based on words or phrases in a particular language, taking into account verb conjugation, derived words, accents and other features. Unlike the LIKE predicate, FTS uses an efficient indexing structure to perform the search (see the comparison sketched after this list);
  • Automatic removal of stopwords from a text search. Stopwords are words that add nothing to the result of the search, such as from, to, the, a, an;
  • Assigning weights to the searched terms, making certain words more important than others within the same search;
  • Generation of ranked results, giving a better view of the documents that are most relevant to the search carried out;
  • Indexing and searching of the most diverse types of digital documents. With FTS you can search text files, spreadsheets, ZIP files and more.
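
As a quick sketch of that difference (the table and column names below are illustrative, and a full-text index on the column is assumed):

-- LIKE scans the character data and cannot use a full-text index.
SELECT DocumentID, Title
FROM   dbo.Documents
WHERE  Content LIKE '%database%';

-- CONTAINS uses the full-text index and understands word boundaries.
SELECT DocumentID, Title
FROM   dbo.Documents
WHERE  CONTAINS(Content, 'database');

-- FREETEXT matches on meaning (stemming, thesaurus) rather than the exact term.
SELECT DocumentID, Title
FROM   dbo.Documents
WHERE  FREETEXT(Content, 'managing databases');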

In this article we will describe the architecture of FTS, its installation and configuration, the main T-SQL commands used in text searches, the use of FTS in conjunction with FILESTREAM, and some techniques to optimize searches with FTS.

FTS architecture

The FTS architecture has several components working in conjunction with the SQL Server query processor to perform text searches efficiently. Figure 1 illustrates the major components of the FTS architecture. Let’s look at some of them:

  • Client query: The client application sends text queries to the SQL Server query processor. It is the responsibility of the client application to ensure that the queries are written correctly, following the FTS syntax;
  • SQL Server Process (sqlservr.exe): The SQL Server process contains the query processor and also the FTS engine, which compiles and executes text queries. The integration of FTS within the SQL Server process offers a significant performance boost because it allows the query processor to build much more efficient execution plans for text searches;
  • SQL Server Query Processor: The query processor has multiple subcomponents that are responsible for validating syntax, compiling, generating execution plans and executing SQL queries;
  • Full-Text Engine: When the SQL Server query processor receives an FTS query, it forwards the request to the FTS engine. The engine is responsible for validating the FTS query syntax, checking the full-text index, and then working together with the SQL Server query processor to return the text search results;
  • Indexer: The indexer works in conjunction with other components to populate the full-text index;
  • Full-Text Index: The full-text index contains the most relevant words and their respective positions within the columns included in the index;
  • Stoplist: A stoplist is a list of stopwords. The indexer queries the stoplist during the indexing process, and during execution of text searches, to eliminate words that add no value to the search. SQL Server 2014 stores stoplists within the database itself, which simplifies their administration (a short stoplist example follows Figure 1);
  • Thesaurus: The thesaurus is an XML file (stored externally to the database) in which you can define a list of synonyms to be used in text searches. The thesaurus must be based on the language that will be used in the search. The full-text engine reads the thesaurus file at query execution time to check for synonyms that can increase the quality and comprehensiveness of the search;
  • Filter daemon host (fdhost.exe): Responsible for managing the filtering, word breaker and stemmer processes;
  • SQL Full-Text Filter Daemon Launcher (fdlauncher.exe): The process that starts the filter daemon host (fdhost.exe) when the full-text engine needs any of the processes it manages.


Figure 1. Architecture of FTS.
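
As noted in the component list above, SQL Server 2014 keeps stoplists inside the database. A minimal sketch of creating and populating one (the names below are illustrative, not from the article) looks like this:

-- Start from the system stoplist and add a custom stopword to it.
CREATE FULLTEXT STOPLIST ProductStoplist FROM SYSTEM STOPLIST;
ALTER FULLTEXT STOPLIST ProductStoplist ADD 'corp' LANGUAGE 'English';

-- Associate the stoplist with an existing full-text index:
-- ALTER FULLTEXT INDEX ON dbo.Documents SET STOPLIST ProductStoplist;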

For a better understanding of how full-text indexes are created, used and maintained, you must also know the meaning of some important concepts:

  • Term: The word, phrase or character used in a text search;
  • Full-Text Catalog: A group of full-text indexes;
  • Word breaker: The process that delimits each word in a sentence, based on the grammar rules of the language selected when the full-text index was created;
  • Token: A word, phrase or character defined by the word breaker;
  • Stemmer: The process that generates the different verb forms of words, based on the grammar rules of the language selected when the full-text index was created;
  • Filter: The component responsible for extracting textual information from documents stored with the varbinary(max) data type and sending this information to the word breaker process.

Indexing process

The indexing process is responsible for the initial population of a full-text index and for updating the index when data modifications occur on the columns indexed by FTS. This process of initially populating and updating the full-text index is named a crawl.

When the crawl process starts, the FTS component known as the protocol handler accesses the data in the table being indexed and begins loading the table's existing content into memory, a step also known as streaming. To access the data stored on disk, the protocol handler allows FTS to communicate with the storage engine. After streaming ends, the filter daemon host process filters the data and initiates the word breaker and stemmer processes to fill in the full-text index.

During the indexing process the stoplist is queried to remove stopwords, so that the full-text index structure is filled only with words that are meaningful to text searches. The last step of the indexing process is known as a master merge, in which all the indexed words are merged into a single full-text index.

Although the indexing process requires high I/O consumption, it does not need to block the data being indexed. However, a query that uses a full-text index while the indexing process is running can return an incomplete result.
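
For reference, here is a minimal sketch of the DDL that kicks off this indexing process (the object names are illustrative, and the table must already have a single-column, unique, non-nullable key index):

-- Create a catalog to group full-text indexes, then index the Content column.
CREATE FULLTEXT CATALOG DocumentsCatalog AS DEFAULT;

CREATE FULLTEXT INDEX ON dbo.Documents
(
    Content LANGUAGE 'English'   -- column to index and its word breaker language
)
KEY INDEX PK_Documents           -- the table's unique key index
ON DocumentsCatalog
WITH CHANGE_TRACKING AUTO;       -- keep the index updated (crawled) as data changes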

Full-Text query processing

Full-text query processing uses the same word and phrase delimiters that were defined by the word breaker during the indexing process. Additional components, such as the stemmer and the thesaurus, may also be used, depending on the full-text predicates (CONTAINS or FREETEXT) used in the search. The use of full-text predicates will be discussed later in this article.

The stemmer process generates inflectional forms of the searched words. For example, a search for the term “play” also searches for the terms “played”, “playing” and “plays”, in addition to “play” itself.

Through rules created in the thesaurus file you can use synonyms to replace or expand the searched terms. For example, when performing a text search for the term “ruby”, the full-text engine can replace it with the synonym “red”, or expand the search to automatically consider the terms “red”, “wine”, “scarlet” and also “ruby”.
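
Both behaviours are exposed through the FORMSOF keyword of the CONTAINS predicate. A short sketch, again with illustrative table and column names:

-- Inflectional search: matches play, plays, played, playing.
SELECT DocumentID, Title
FROM   dbo.Documents
WHERE  CONTAINS(Content, 'FORMSOF(INFLECTIONAL, play)');

-- Thesaurus expansion: matches the synonyms defined for "ruby" in the thesaurus file.
SELECT DocumentID, Title
FROM   dbo.Documents
WHERE  CONTAINS(Content, 'FORMSOF(THESAURUS, ruby)');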

After processing the full-text query, the full-text engine provides information to the SQL query processor that assists in creating an execution plan optimized for the text search. The tight integration between the full-text engine and the SQL query processor (both are components of the SQL Server process) enables text searches to be executed in a more optimized way.

In the next post of this 4 part series, we will learn how to install the FTS and how to use it. Stay tuned!

Categories: DBA Blogs

Oracle JET 2.0.1 - Upgrade for CRUD Sample

Andrejus Baranovski - Tue, 2016-05-10 09:39
Oracle JET 2.0.1 was released in April and I decided to upgrade my CRUD sample (based on ADF BC REST services) implemented with the previous JET version. There is a migration guide describing the main points to consider while moving to the next JET version. I'm going to add a few more points. Hopefully you will find it useful.

Here is the CRUD sample UI running with Oracle JET 2.0.1 and interacting with ADF BC REST:


This sample comes with advanced validation rule integration, which is enforced in ADF BC and propagated to the JET UI (read my previous posts about this):


As per the migration guide, you must update the reference to Alta CSS (2.0.1):


Next, update the module names in main.js:


The CRUD app renders a paged table out of a collection table datasource. Somehow it worked previously without specifying the ojs/ojcollectiontabledatasource module; now it must be specified, otherwise there is an error about the constructor:


There are changes on the UI side. I have wrapped the table/form into a div with the oj-flex-items-pad class (which adds padding). The table and form are placed into separate divs with the oj-flex-item class:


To enable table pagination, I moved the paging control out of the div; in the previous version it worked inside a separate div:


Form group elements are wrapped into oj-flex/oj-flex-item classes; this creates a better layout for the form items:


The update method was changed to handle the ADF BC validation error in a slightly different way. I show the error first and then reset the collection data - to return the original value to the table.

Download sample application - JETCRUDApp_v6.zip.

Speaker Scores

Jonathan Lewis - Tue, 2016-05-10 07:10

I published a note this morning that I drafted in January 2015, and I didn’t notice that it had gone back in time to publish itself on the date that I first drafted it – and it’s already been tweeted twice so I can’t move it. So this is a temporary link to pop it to the head of the queue while leaving it where it first appeared.


KeePass 2.33

Tim Hall - Tue, 2016-05-10 03:30

I just noticed KeePass 2.33 was released a couple of days ago. You can download it here.

You can read about how I use KeePass and KeePassX2 on my Mac, Windows and Android devices here.

Cheers

Tim…

