DBA Blogs

Pythian Tech Chatter

Pythian Group - Fri, 2016-03-18 13:23

At Pythian, the DevOps, Big Data, and Data Science teams use Slack for our IM system. We’re a diverse group drawn to a wide range of technologies so there’s some interesting and valuable chatter about what folks are reading. Here are some subjects that came across our channels the past couple of weeks:

DevOps

Pythian engineers support Solr and Elasticsearch for a number of clients. Here’s a great summary of the important tunables in elasticsearch:
https://tech.scrunch.com/blog/lessons-learned-from-a-year-of-running-elasticsearch-in-production/

We often are called into companies to help manage deployment infrastructure, and it is quite common to encounter an unwieldy monolithic application that has been cobbled together over the years. Our head of DevOps pointed out this book as a great resource for strategies to decompose those into microservices:
http://www.amazon.com/Microservices-Patterns-Applications-Designing-fine-grained/dp/069242427X

Big Data

As Data Lake architectures mature, we’re seeing more comprehensive offerings from vendors. This blog talks about Microsoft’s offerings. I like the integration of active directory for strong security and the U-SQL approach to pulling data from data lakes.  While I’m not a huge fan of C#, the concept of having a library of extractors and outputters is a nice nut and bolt approach:
http://tomkerkhove.ghost.io/2015/10/22/exploring-azures-data-lake/

Another thread from the Data Lakes discussion highlights the critical importance of Data Governance. Waterline’s Data Inventory tool is a strong player for MetaData/Governance automation:
http://blog.waterlinedata.com/blog/the-d-artagnan-of-hadoop-spoiler-alert-data-governance-for-hadoop

Data Science

Facebook released its implementation of deep learning neural nets last year. The Data Science team has been spending some time with it as they evaluate and build AI tools:
https://github.com/facebook/MemNN

The team has also been using some great Java tools for natural language processing from Stanford:
http://stanfordnlp.github.io/CoreNLP/index.html

Categories: DBA Blogs

The Art of Mobilising Oracle Forms

It is one thing to have a vision to mobilise parts of your business to realise benefits, but it is quite another thing to embark on an undertaking. Yes you want an amazing mobile experience, push...

We share our skills to maximize your revenue!
Categories: DBA Blogs

2016 Oracle Digital Transformation EMEA Partner Community Forum 12.-13. April 2016, Budapest

Registration for the FREE exclusive two-day EMEA Digital Transformation partner Community forum, April 12&13, Budapest (Hungary), is now open! This is a must attend event if you are: ...

We share our skills to maximize your revenue!
Categories: DBA Blogs

Math Resources

Bobby Durrett's DBA Blog - Thu, 2016-03-17 17:51

I feel like I have not been posting very much on this blog lately. I have been focused on things outside of Oracle performance so I haven’t had a lot of new scripts to post.  I have been quietly updating my Python source code on GitHub so check that out. I have spent a lot of time educating myself in various ways including through the leadership and communication training program that comes from Toastmasters. My new job title is “Technical Architect” which is a form of technical leadership so I’m trying to expand myself beyond being an Oracle database administrator that specializes in performance tuning.

In addition to developing my leadership and communication skills I have gotten into a general computer science self-education kick. I took two introductory C.S. classes on edX. I also read a book on Linux hacking and a book on computer history. I was thinking of buying one of the Donald Knuth books or going through MIT’s free online algorithms class class 6.006. I have a computer science degree and spent two years in C.S. graduate school but that was a long time ago. It is kind of fun to refresh my memory and catch up with the latest trends. But the catch is that both the Knuth book and MIT’s 6.006 class require math that I either never learned or have forgotten. So, I am working my way through some math resources that I wanted to share with those who read this blog.

The first thing I did was to buy a computer math book, called Concrete Mathematics,  that seemed to cover the needed material. Reviews on Amazon.com recommended this book as good background for the Knuth series and one of the Oracle performance experts that I follow on Twitter recommended it for similar reasons. But, after finishing my second edX class I began exploring the MIT OCW math class that was a prerequisite to MIT’s 6.006 algorithms class. MIT calls the math class 6.042J and I am working through the Fall 2010 version of the class. There is a lot of overlap between the class and the book but they are not a perfect match. The book has some more difficult to follow material than the class. It is probably more advanced.  The class covers some topics, namely graph theory, that the book does not.  The free online class has some very good lecture videos by a top MIT professorTom Leighton. I even had my wife and daughters sit down and watch his first lecture with me on our family television for fun on my birthday.

The book led me to a great free math resource called Maxima. Maxima has all kinds of great math built into it such as solving equations, factoring integers, etc. Plus, it is free. There are other similar and I think more popular programs that are not free but for my use it was great to simply download Maxima and have its functionality at my fingertips.

The last resource that I wanted to mention is the Mathematics section of Stack Exchange. It is a pretty structured online forum with a question and answer format. It is helpful to me since I am going through 6.042J without a professor or teaching assistant to answer my questions. The people on math stack exchange are very helpful if you at least try to follow the etiquette for their forum. For example, they have an easy to use way to format math formulas in your questions and answers and the users of the forum expect you to use it. But it isn’t hard. I had one question from the Concrete Math book where I couldn’t understand the answer key in the back. I asked about it on stack exchange and got a great answer in no time.

Anyway, maybe all of this math and computer science study is a departure from my bread and butter Oracle database work and performance tuning. But the free online resources like the OCW web site, the Maxima program, and the stack exchange forum along with the book that I paid for are a great set of resources. I have already used some of the concepts that I have learned about number theory and its application to RSA encryption. But, at the same time I am enjoying studying these things and mostly see it as something fun to do in my spare time. (I’m weird I know.)

So, I have written this blog post to share the math related things that I am studying and using to those who might benefit from them. I am not a math expert, but I am getting a lot out of these materials. I hope that others find these resources as enjoyable and educational as I have.

Bobby

Categories: DBA Blogs

Drop all objects in your current schema

Learn DB Concepts with me... - Thu, 2016-03-17 14:49

**********************************************************************
You can use this simple pl-sql block to drop all objects in your current schema.
**********************************************************************

declare
VSQL varchar2(4000);
OBJ_NAME varchar2(100);
OBJ_TYPE varchar2(100);
OBJ_OWNER varchar2(100);
cursor c1 is select object_type,object_name from user_objects where object_type in ('TABLE','VIEW');

begin

open c1;
loop
fetch c1 into OBJ_TYPE,OBJ_NAME;
exit when c1%NOTFOUND;
IF OBJ_TYPE='TABLE' 
THEN
VSQL:=' drop '||OBJ_TYPE||' '||OBJ_NAME||' cascade constraints';
DBMS_OUTPUT.PUT_LINE(VSQL);
execute IMMEDIATE VSQL;
ELSE  
VSQL:=' drop '||OBJ_TYPE||' '||OBJ_NAME;
DBMS_OUTPUT.PUT_LINE(VSQL);
execute IMMEDIATE VSQL;
END IF;
end loop;
close c1;
end;
/




Note :

Remove "where object_type in ('TABLE','VIEW')" to drop all objects from current user.
Or you can edit to include only type of objects you want to be dropped.

You can Alter OBJECT TYPE any of below:


'TABLE','VIEW','SYNONYM','SEQUENCE','PROCEDURE','TRIGGER'
Categories: DBA Blogs

GoldenGate 12.2 Big Data Adapters: part 2 – FLUME

Pythian Group - Thu, 2016-03-17 12:36

In this blog post we continue our review of the new Oracle GoldenGate Big Data adapters. In the first part of the series I tested the basic HDFS adapter and checked how it worked with some DML and DDL. In this article I will try the Flume adapter and see how it works.

A quick reminder on what Flume is. It is not a topic about the popular Australian musician. Today we are talking about Apache Flume. In short, it is a pipeline or a kind of streaming system that allows you to move large amount of data. It has simple architecture and, in general, there are three main components:

a) Source: where data enters into Flume from an outside system.
b) Sink: responsible for passing data to the destination system whether it is the final destination, or another flow.
c) Channel: connects the Source and Sink.

I know that it is a rather simplistic description, but the main subject of this article is not what Flume can do, but how we can pass our data from Oracle to Flume using GoldenGate.
My first post discussed how you set up an Oracle source system, and how to start GoldenGate initial load and extract. I am not repeating it here. Let’s assume we have the source system. It is Oracle database, replicating DML and DDL for one particular schema, and GGTEST using Oracle GoldenGate 12.2 to trail files on our box where we already have GoldenGate for Big Data. Have a look at the first part to see how to set up the GoldenGate for Big Data (OGG BD).

So, we have our OGG BD setup and the manager up and running.

GGSCI (sandbox.localdomain) 1> info manager

Manager is running (IP port sandbox.localdomain.7839, Process ID 18521).


GGSCI (sandbox.localdomain) 2>

What we need now is to prepare our Flume agent to accept messages from OGG. I’ve already set up my Flume’s agent-ng service on my Linux box, and now we need to prepare the configuration file for the agent to handle the income stream, and pass it to the destination system. We will set up our source to “avro” and sink will be writing to HDFS. The source can be either avro or thrift. According to oracle documentation the Flume handler can stream data from a trail file to Avro or Thrift RPC Flume sources.

I have to admit that the destination as HDFS looks quite artificial since we have a special adapter for HDFS and don’t need a Flume to write there. But such a configuration can help us compare different adapters and what they can do.
I used Flume version 1.6.0:

[oracle@sandbox flume-ng]$ bin/flume-ng version
Flume 1.6.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
From source with checksum b29e416802ce9ece3269d34233baf43f
[oracle@sandbox flume-ng]$

Here is my configuration file for the Flume agent:

# Name/aliases for the components on this agent
agent.sources = ogg1
agent.sinks = hdfs1
agent.channels = ch1
#Avro source
agent.sources.ogg1.type = avro
agent.sources.ogg1.bind = 0.0.0.0
agent.sources.ogg1.port = 4141

# Describe the sink
agent.sinks.hdfs1.type = hdfs
agent.sinks.hdfs1.hdfs.path = hdfs://sandbox/user/oracle/ggflume
#agent.sinks.hdfs1.type = logger

# Use a channel which buffers events in memory
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 100000
agent.channels.ch1.transactionCapacity = 10000

# Bind the source and sink to the channel
agent.sources.ogg1.channels = ch1
agent.sinks.hdfs1.channel = ch1

I’ve made the configuration simple and clear. You may change agent.sources.ogg1.port and agent.sinks.hdfs1.hdfs.path  depending on your system.

On the target HDFS we have to create directory as it was defined in our sink configuration.

[oracle@sandbox ~]$ hadoop fs -mkdir /user/oracle/ggflume
[oracle@sandbox ~]$ hadoop fs -ls /user/oracle/ggflume
[oracle@sandbox ~]$

We can start our Flume agent now.

[root@sandbox conf]# service flume-ng-agent start
Starting Flume NG agent daemon (flume-ng-agent):           [  OK  ]
[root@sandbox conf]# service flume-ng-agent status
Flume NG agent is running                                  [  OK  ]
[root@sandbox conf]#
[root@sandbox conf]# tail /var/log/flume-ng/flume.log
25 Feb 2016 11:56:37,113 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:120)  - Monitored counter group for type: CHANNEL, name: ch1: Successfully registered new MBean.
25 Feb 2016 11:56:37,121 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:96)  - Component type: CHANNEL, name: ch1 started
25 Feb 2016 11:56:37,122 INFO  [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:173)  - Starting Sink hdfs1
25 Feb 2016 11:56:37,123 INFO  [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:184)  - Starting Source ogg1
25 Feb 2016 11:56:37,139 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.source.AvroSource.start:228)  - Starting Avro source ogg1: { bindAddress: 0.0.0.0, port: 4141 }...
25 Feb 2016 11:56:37,146 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:120)  - Monitored counter group for type: SINK, name: hdfs1: Successfully registered new MBean.
25 Feb 2016 11:56:37,147 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:96)  - Component type: SINK, name: hdfs1 started
25 Feb 2016 11:56:38,114 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:120)  - Monitored counter group for type: SOURCE, name: ogg1: Successfully registered new MBean.
25 Feb 2016 11:56:38,115 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:96)  - Component type: SOURCE, name: ogg1 started
25 Feb 2016 11:56:38,116 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.source.AvroSource.start:253)  - Avro source ogg1 started.
[root@sandbox conf]#

Flume is ready, and we can now prepare our OGG configuration. We have examples for the Flume adapter configuration files in $OGG_HOME/AdapterExamples/big-data/flume/ :

[oracle@sandbox oggbd]$ ll AdapterExamples/big-data/flume/
total 12
-rw-r--r--. 1 oracle oinstall 107 Dec  9 12:56 custom-flume-rpc.properties
-r-xr-xr-x. 1 oracle oinstall 812 Dec  9 12:56 flume.props
-rw-r--r--. 1 oracle oinstall 332 Dec  9 12:56 rflume.prm
[oracle@sandbox oggbd]$

We can copy the examples to our configuration directory and adjust them to our needs:

[oracle@sandbox oggbd]$ cp AdapterExamples/big-data/flume/* dirprm/

Here is configuration file for our adapter:

[oracle@sandbox oggbd]$ cat dirprm/flume.props

gg.handlerlist = flumehandler
gg.handler.flumehandler.type=flume
gg.handler.flumehandler.RpcClientPropertiesFile=custom-flume-rpc.properties
gg.handler.flumehandler.format=avro_op
gg.handler.flumehandler.mode=tx
#gg.handler.flumehandler.maxGroupSize=100, 1Mb
#gg.handler.flumehandler.minGroupSize=50, 500 Kb
gg.handler.flumehandler.EventMapsTo=tx
gg.handler.flumehandler.PropagateSchema=true
gg.handler.flumehandler.includeTokens=false
gg.handler.flumehandler.format.WrapMessageInGenericAvroMessage=true

goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE

gg.log=log4j
gg.log.level=INFO

gg.report.time=30sec

gg.classpath=dirprm/:/usr/lib/flume-ng/lib/*:

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

You will need to adjust your gg.classpath depending on your system, as it has to include Flume java classes and a file with Flume source properties (custom-flume-rpc.properties file).
Here is my example for the custom-flume-rpc.properties file which will be used by OGG adapter to connect to the flume-ng agent. I’ve placed it to the dirprm directory along with other parameters.

[oracle@sandbox oggbd]$ cat dirprm/custom-flume-rpc.properties
client.type=default
hosts=h1
hosts.h1=localhost:4141
batch-size=100
connect-timeout=20000
request-timeout=20000</pre>
<pre>

As you can see, my flume-ng agent is on the same host as the OGG which may not be the same for you. In your case you may need to provide hostname and port for your running glume-ng agent. We need to prepare the configuration file for our initial load. The OGG trail file is located in the dirdat/ directory and has name initld.

</pre>
<pre></pre> <pre>Here is our parameter file for initial load using passive replicat: [oracle@sandbox oggbd]$ cat dirprm/irflume.prm --initial REPLICAT irflume -- Command to run REPLICAT in passive mode -- ./replicat paramfile dirprm/irflume.prm reportfile dirrpt/irflume.rpt SPECIALRUN END RUNTIME EXTFILE /u01/oggbd/dirdat/initld --DDLERROR default discard DDL include all TARGETDB LIBFILE libggjava.so SET property=dirprm/flume.props REPORTCOUNT EVERY 1 MINUTES, RATE GROUPTRANSOPS 10000 MAP ggtest.*, TARGET bdtest.*; 

Let’s run the load and see what we get in the end:

[oracle@sandbox oggbd]$ ./replicat paramfile dirprm/irflume.prm reportfile dirrpt/irflume.rpt

The command completed successfully and we got three new files on HDFS. The first 2 files had the schema description and the 3-d one had the data for the replicated tables.

[root@sandbox ~]# hadoop fs -ls /user/oracle/ggflume
Found 12 items
-rw-r--r--   1 flume oracle       1833 2016-03-10 11:17 /user/oracle/ggflume/FlumeData.1457626634620
-rw-r--r--   1 flume oracle       1762 2016-03-10 11:17 /user/oracle/ggflume/FlumeData.1457626634621
-rw-r--r--   1 flume oracle       1106 2016-03-10 11:17 /user/oracle/ggflume/FlumeData.1457626634622
[root@sandbox ~]# hadoop fs -tail /user/oracle/ggflume/FlumeData.1457626634620
 {
        "name" : "PK_ID",
        "type" : [ "null", "string" ],
        "default" : null
      }, {
        "name" : "PK_ID_isMissing",
        "type" : "boolean"
      }, {
        "name" : "RND_STR",
        "type" : [ "null", "string" ],
        "default" : null
      }, {
        "name" : "RND_STR_isMissing",
        "type" : "boolean"
     ..................

[root@sandbox ~]# hadoop fs -tail /user/oracle/ggflume/FlumeData.1457626634621
 "string"
  }, {
    "name" : "primary_keys",
    "type" : {
      "type" : "array",
      "items" : "string"
    }
  }, {
    "name" : "tokens",
    "type" : {
      "type" : "map",
      "values" : "string"
    },
...........................
[root@sandbox ~]# hadoop fs -tail /user/oracle/ggflume/FlumeData.1457626634622
:?v??8?????	SaQm?"BDTEST.TEST_TAB_1Ñ??
                                          ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-03-10T11:17:14.448000(00000000-10000002012
PK_ID1371O62FX&2014-01-24:19:09:20RJ68QYM5&2014-01-22:12:14:30"BDTEST.TEST_TAB_1Ñ??
                                                                                   ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-03-10T11:17:14.459000(00000000-10000002155
PK_ID2371O62FX&2014-01-24:19:09:20HW82LI73&2014-05-11:05:23:23"BDTEST.TEST_TAB_1Ñ??
                                                                                   ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-03-10T11:17:14.459001(00000000-10000002298
PK_ID3RXZT5VUN&2013-09-04:23:32:56RJ68QYM5&2014-01-22:12:14:30"BDTEST.TEST_TAB_1Ñ??
                                                                                   ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-03-10T11:17:14.460000(00000000-10000002441
PK_ID4RXZT5VUN&2013-09-04:23:32:56HW82LI73&2014-05-11:05:23:23"BDTEST.TEST_TAB_2?????"BDTEST.TEST_TAB_2I42016-02-16 19:17:40.76289942016-03-10T11:17:14.466000(00000000-10000002926
PK_IDRND_STR_1ACC_DATE7IJWQRO7T&2013-07-07:08:13:52

The initial load has succeeded, and now we can create and start the proper ongoing replication to HDFS through Flume.
Let’s prepare a new parameter file for our permanent Flume replicat and starting it up.

GGSCI (sandbox.localdomain) 2> edit param rflume
REPLICAT rflume
-- Trail file for this example is located in "dirdat/" directory
-- Command to add REPLICAT
-- add replicat rflume, exttrail dirdat/or
TARGETDB LIBFILE libggjava.so SET property=dirprm/flume.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
HANDLECOLLISIONS
MAP ggtest.*, TARGET bdtest.*;


GGSCI (sandbox.localdomain) 1> add replicat rflume, exttrail dirdat/or, begin now
REPLICAT added.


GGSCI (sandbox.localdomain) 2> start replicat rflume

Sending START request to MANAGER ...
REPLICAT RFLUME starting

Let’s insert a row and see what we get on the target system.

orclbd> insert into ggtest.test_tab_1
  2  values (7,dbms_random.string('x', 8), sysdate-(7+dbms_random.value(0,1000)),
  3  dbms_random.string('x', 8), sysdate-(6+dbms_random.value(0,1000))) ;

1 row inserted.

orclbd> commit;

Commit complete.

orclbd>

As soon as commit had been executed we received a couple of new files on HDFS where the first had the schema for the changed table, and the second had the data for the transaction or  “payload”.

[root@sandbox ~]# hadoop fs -ls /user/oracle/ggflume
.................
-rw-r--r--   1 flume oracle       1833 2016-03-10 11:17 /user/oracle/ggflume/FlumeData.1457626634620
-rw-r--r--   1 flume oracle       1762 2016-03-10 11:17 /user/oracle/ggflume/FlumeData.1457626634621
-rw-r--r--   1 flume oracle       1106 2016-03-10 11:17 /user/oracle/ggflume/FlumeData.1457626634622
-rw-r--r--   1 flume oracle       1833 2016-03-10 12:43 /user/oracle/ggflume/FlumeData.1457631817021
-rw-r--r--   1 flume oracle        605 2016-03-10 12:43 /user/oracle/ggflume/FlumeData.1457631817022
[root@sandbox ~]#

[root@sandbox ~]# hadoop fs -cat /user/oracle/ggflume/FlumeData.1457631817021
SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable???:]B?9?k??	]kTSa?m??{
  "type" : "record",
  "name" : "TEST_TAB_1",
  "namespace" : "BDTEST",
  "fields" : [ {
    "name" : "table",
    "type" : "string"
  }, {
    "name" : "op_type",
    "type" : "string"
  }, {
..............................
[root@sandbox ~]# hadoop fs -cat /user/oracle/ggflume/FlumeData.1457631817022
{EQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable~,?`?aHTZRB?Sa?ny
  "type" : "record",
  "name" : "generic_wrapper",
  "namespace" : "oracle.goldengate",
  "fields" : [ {
    "name" : "table_name",
    "type" : "string"
  }, {
    "name" : "schema_hash",
    "type" : "int"
  }, {
    "name" : "payload",
    "type" : "bytes"
  } ]
}Sa?nz?"BDTEST.TEST_TAB_1Ñ??
                            ?"BDTEST.TEST_TAB_1I42016-03-10 17:43:31.00169042016-03-10T12:43:33.464000(00000000080001408270
PK_ID7XYJN3Z31&2014-04-21:09:01:21FL6Z8RPN&2013-08-06:21:40:02

I prepared and executed a small regression testing of inserts and updates to the table using jmeter, and started to push inserts and updates with a rate about 29 transaction per second.Even with one flume channel and my small Hadoop environment, it had a pretty good response time without trashing any errors. Flume put about 900 transactions per a HDFS file.

-rw-r--r--   1 flume oracle     123919 2016-03-10 14:52 /user/oracle/ggflume/FlumeData.1457639485465
-rw-r--r--   1 flume oracle      35068 2016-03-10 14:52 /user/oracle/ggflume/FlumeData.1457639485466
-rw-r--r--   1 flume oracle     145639 2016-03-10 14:52 /user/oracle/ggflume/FlumeData.1457639485467
-rw-r--r--   1 flume oracle     178943 2016-03-10 14:52 /user/oracle/ggflume/FlumeData.1457639485468
-rw-r--r--   1 flume oracle     103285 2016-03-10 14:52 /user/oracle/ggflume/FlumeData.1457639485469
[oracle@sandbox Downloads]$ hadoop fs -cat /user/oracle/ggflume/FlumeData.1457639485467 | wc -l
804
[oracle@sandbox Downloads]$ hadoop fs -cat /user/oracle/ggflume/FlumeData.1457639485468 | wc -l
988
[oracle@sandbox Downloads]$ hadoop fs -cat /user/oracle/ggflume/FlumeData.1457639485469 | wc -l
570
[oracle@sandbox Downloads]$

I’ve also tried the “thrift” datasource for Flume and it worked well too. To switch from “avro” to “thrift” I changed the value in the parameter agent.sources.ogg1.type in the flume.conf and restarted the flume agent. You also have to change client.type from default to thrift in your custom-flume-rpc.properties file. It worked fine, and I was able to get the information from the trail and write to the hdfs.

[oracle@sandbox oggbd]$ ./replicat paramfile dirprm/irflume.prm reportfile dirrpt/irflume.rpt
[oracle@sandbox oggbd]$ hadoop fs -ls /user/oracle/ggflume
Found 3 items
-rw-r--r--   1 flume oracle       1833 2016-02-25 16:05 /user/oracle/ggflume/FlumeData.1456434311892
-rw-r--r--   1 flume oracle       1762 2016-02-25 16:05 /user/oracle/ggflume/FlumeData.1456434311893
-rw-r--r--   1 flume oracle       1106 2016-02-25 16:05 /user/oracle/ggflume/FlumeData.1456434311894
[oracle@sandbox oggbd]$
[oracle@sandbox oggbd]$ hadoop fs -cat  /user/oracle/ggflume/FlumeData.1456434311892
SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritableOG?????$?{qS@]?{
  "type" : "record",
  "name" : "TEST_TAB_1",
  "namespace" : "BDTEST",
  "fields" : [ {
    "name" : "table",
    "type" : "string"
  }, {
.....


[oracle@sandbox oggbd]$ hadoop fs -cat  /user/oracle/ggflume/FlumeData.1456434311894
SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable?????t?E?9j??S@??"BDTEST.TEST_TAB_1Ñ??
                                                                                                              ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-02-25T16:05:11.480000(00000000-10000002012
PK_ID1371O62FX&amp;2014-01-24:19:09:20RJ68QYM5&amp;2014-01-22:12:14:30"BDTEST.TEST_TAB_1Ñ??
                                                                                   ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-02-25T16:05:11.498000(00000000-10000002155
PK_ID2371O62FX&amp;2014-01-24:19:09:20HW82LI73&amp;2014-05-11:05:23:23"BDTEST.TEST_TAB_1Ñ??
                                                                                   ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-02-25T16:05:11.498001(00000000-10000002298
PK_ID3RXZT5VUN&amp;2013-09-04:23:32:56RJ68QYM5&amp;2014-01-22:12:14:30"BDTEST.TEST_TAB_1Ñ??
                                                                                   ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-02-25T16:05:11.499000(00000000-10000002441
PK_ID4RXZT5VUN&amp;2013-09-04:23:32:56HW82LI73&amp;2014-05-11:05:23:23"BDTEST.TEST_TAB_2?????"BDTEST.TEST_TAB_2I42016-02-16 19:17:40.76289942016-02-25T16:05:11.505000(00000000-10000002926
PK_IDRND_STR_1ACC_DATE7IJWQRO7T&amp;2013-07-07:08:13:52[oracle@sandbox oggbd]$

You can see from the output that in the FlumeData.1456434311894 file we are getting the schema description and in the FlumeData.1456434311894 we have the data from the tables TEST_TAB_1 and TEST_TAB_2.

Let’s try some simple DDL commands.
If we truncate a table:

orclbd> truncate table ggtest.test_tab_1;

Table GGTEST.TEST_TAB_1 truncated.

orclbd>

It is not going to be replicated. If we are altering the table, we are not seeing it as a separate command, but it is going to be reflected in the new schema definition for any new transaction replicated to HDFS. You will get a file with new schema definition and the transaction itself in a next file.

orclbd> alter table ggtest.test_tab_1 add (new1 varchar2(10));

Table GGTEST.TEST_TAB_1 altered.

orcl> insert into ggtest.test_tab_1
  2  values (7,dbms_random.string('x', 8), sysdate-(7+dbms_random.value(0,1000)),
  3  dbms_random.string('x', 8), sysdate-(6+dbms_random.value(0,1000)),'new_col' );

1 row created.

orcl> commit;

Commit complete.

orcl>
[oracle@sandbox oggbd]$ hadoop fs -cat /user/oracle/ggflume/FlumeData.1457117136700
SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable????)0???4(SB?Yc{
  "type" : "record",
  "name" : "TEST_TAB_1",
  "namespace" : "BDTEST",
  "fields" : [ {
    "name" : "table",
    "type" : "string"
........
........
        "name" : "NEW1",
        "type" : [ "null", "string" ],
        "default" : null
      }, {
        "name" : "NEW1_isMissing",
        "type" : "boolean"
........
[oracle@sandbox oggbd]$ hadoop fs -cat /user/oracle/ggflume/FlumeData.1457117136701
........
}SB???"BDTEST.TEST_TAB_1?????"BDTEST.TEST_TAB_1I42016-03-04 18:45:30.00131442016-03-04T13:45:34.156000(00000000000000014363
PK_ID7U09D0CTU&2013-08-15:12:53:50W0BSUWLL&2013-08-16:09:28:12new_col

As I’ve mentioned in my previous post, the deeper investigation of supported DDL changes is going to be the subject of a dedicated blog post. Here we can conclude that the adapter worked as expected, and supported the flow of transactions from our Oracle database down to Flume using Avro and Thrift sources. Of course it is not production implementation, and serves only as a basic functional and elementary regression testing. For a serious production workflow we need to develop appropriate architecture.

In my next few posts I plan to check Kafka and HBASE adapters and see how they work. Stay tuned!

Categories: DBA Blogs

Connection timeout parameters in MySQL

Pythian Group - Thu, 2016-03-17 08:09
Introduction
  • wait_timeout
  • interactive_timeout
  • net_read_timeout
  • net_write_timeout

What do these timeouts do in MySQL? If you search the web for one or more of these, you may find complaints that no comprehensive explanation exists for all of these timeouts in one place (besides the obvious documentation of dynamic server system variables in MySQL). This blog post seeks to provide a central documentation source for timeouts and provide some practical explanation.

Knowing what timeouts do helps in a troubleshooting effort. It’s good to understand when an issue is timeout related and when it’s not, and to know the right reasons for changing timeout variables, or the right time to ask the developer or ad-hoc user to please tune the variables in the session, instead of asking the DBA to change the global variables.

Before diving into the meat of this topic, here are a few introductory concepts that I will touch on.

I mentioned session variables. The four timeout variables we are discussing: interactive_timeout, wait_timeout, net_read_timeout, net_write_timeout can all be set within the context of the session using the SET command. e.g. “set session wait_timeout=3600;“.  A general rule of thumb that I always try to follow: Once I have what I think are correct timeouts to protect my mission critical application, I do not want to change them unless there is a really good reason. If I’m encountering a client disconnection issue and it’s not coming from my main application, I need to see if the problem can be alleviated by adjusting timeouts on the session level using SET commands.

Second, it’s important to note that timeouts are there to protect your server and critical application. You don’t want too many clients connected to your server doing nothing (they can crowd out connections that need to do something). You don’t want clients that are in a failed state due to a network connection or other interruptive problem to continue to consume resources such as locks on your database. Timeouts should be set very thoughtfully, based on the unique environment. If you’re unsure, leave at defaults and adjust when needed.

Finally, I created a perl script to illustrate wait_timeout situations. I made a great effort to also incorporate net_read_timeout and net_write_timeout into the script and you’ll see empty functions there as place-holders. There are many layers to the “MySQL client/server onion” involving buffers on the system, client, and server layers. In lieu of providing a direct scripting examples of net_read_timeout and net_write_timeout, I hope you’ll find the other methods of explanation included below, useful.

interactive_timeout

According to the MySQL manual, Interactive_timeout is only used for connections from interactive clients like the command line MySQL clients.

The technical explanation is: “clients utilizing the “CLIENT_INTERACTIVE option to mysql_real_connect().” Check the specific documentation to your favorite client to see whether it fits this description.

If your client fits this behavior, wait_timeout is set to the value in interactive_timeout. The only benefit to this parameter is flexibility. If your command line or other interactive clients have a vastly different requirement than your application, feel free to set this differently than wait_timeout.

wait_timeout

Wait timeout is simply there to protect you in the common case of clients sitting there doing nothing but absorbing a connection. You want to set this to the lowest acceptable number of seconds in order to protect your server against an application malfunction or some other event that may cause too many connections to the database to be opened, crowding out other clients. If a client is doing nothing for wait_timeout seconds, the MySQL server will terminate the connection. The proper setting for this variable depends on the particular environment.

An environment I worked in for 3+ years set this to 120 seconds, because jboss connection pooling (and associated parameters) took care of making sure the mission critical application in the environment had available connections.

A common misconception and common misuse is to try to adjust wait_timeout for a situation in which a query has been interrupted. Wait timeout only applies to idle sessions. If the connection is in any other state besides idle, wait_timeout does not apply. This is why utilities like pt-kill exist. It does not help that an idle connection being interrupted by KILL produces the same error message.

Example using my perl script and the MySQL processlist:


jscott@js-trusty1:~$ ./test_timeouts.pl -K
Doing Killable Wait Timeout Test
ATTN: You have 20 seconds to kill the MySQL thread!!

in another shell

jscott@js-trusty1:~$ mysql -e "show processlist"
+----+------+-------------------+--------------+---------+-------+-------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-------------------+--------------+---------+-------+-------+------------------+
| 36 | js | 172.19.73.2:58892 | NULL | Sleep | 28514 | | NULL |
| 79 | root | localhost:43350 | timeout_test | Sleep | 4 | | NULL |
| 80 | root | localhost | NULL | Query | 0 | NULL | show processlist |
+----+------+-------------------+--------------+---------+-------+-------+------------------+
jscott@js-trusty1:~$ mysql -e "kill 79";

back to the perl script output.

DBD::mysql::st execute failed: MySQL server has gone away at ./test_timeouts.pl line 134.
-------------
Err: 2006
ErrStr: MySQL server has gone away
State: HY000

 

Note that we got MySQL error code 2006 “MySQL server has gone away”.

Now we’ll allow wait_timeout to trigger instead of killing a process.


jscott@js-trusty1:~$ ./test_timeouts.pl -W
Doing Wait Timeout Test
DBD::mysql::st execute failed: MySQL server has gone away at ./test_timeouts.pl line 151.
-------------
Err: 2006
ErrStr: MySQL server has gone away
State: HY000

The fact that a killed idle process and a process receiving wait_timeout get the same error message and code (2006 “MySQL server has gone away”) causes much confusion. Rest assured, wait_timeout only kills idle connections, even if the error messages are the same.  If there is a running query, you will typically get error code 2013: “Lost connection to MySQL server during query”.

The most common reason a query is killed, in my experience, is a backup. Many backup utilities for MySQL have the default behavior of killing long-running queries.

net_read_timeout

net_read_timeout is the number of seconds mysql will wait between bytes as you are sending it data. As mentioned above, I tried very hard to synthetically cause a net_read_timeout with a perl script and was unsuccessful.

The best way to explain net_read_timeout is to use the example of an ETL (Extract, Transform, Load) job. At the beginning of an ETL job, you select data from one data source, then transform and load it into another data source, such as a MySQL data warehouse. Tools like “Pentaho Data Integration” (cited below) provide a GUI view of ETL.

Pentaho ETL Process

Pentaho ETL Process

If your ETL job is complex, and you use something similar to Pentaho Kettle, you have very little visibility into what is happening behind the scenes.

If the steps in between the initial query in an ETL job and an output step consume more than net_read_timeout seconds, then an error may occur due to incomplete statements being sent to the “output” steps.  In mid-transaction, if the MySQL server fails to receive data within the timeout interval, the connection will be terminated.

Another way net_read_timeout can occur: Normally all database connections used in a Pentaho ETL transformation are opened at the beginning of the transformation. If you have “use result streaming cursor” turned off on the database connection used for “table input”, the result set may take longer than “net_read_timeout” seconds to receive, causing an error on the database connection being used for the output steps.

Have you ever executed a simple query (with a large result set) on the MySQL command line and wondered why there is a pause of several seconds or even minutes prior to beginning to receive the results? Your client actually has begun receiving the results, but they simply have not been presented to you. The reason for this is, it’s generally better for a MySQL server to be “finished” with a query as soon as possible; therefore, the default behavior of the MySQL client libraries on a read-only query is to receive all the results first, before passing the results onto the program. The default behavior is called “mysql_store_result“.  The MySQL reference manual does a good job of explaining this in detail. As the manual states, you should not change the default behavior unless you are doing a minimal amount of processing on each result row.

Homemade ETL programs in other languages can have the same issue. Take care to use auto_reconnect features and/or streaming cursors / “mysql_use_result”, when appropriate.

ETLs and scripts are good examples of clients which can be modified with SET commands.  “SET SESSION net_read_timeout=<longer value>;” should be considered for jobs outside of your main application.

Lastly, net_read_timeout is also often caused by bad network connections. This is the default explanation offered in many blog posts on this topic.

The Error returned to the client will be (2013) “Lost connection to MySQL server during query”.

net_write_timeout

net_write_timeout is the number of seconds MySQL will wait for the client to receive additional data before terminating a connection. Using the example of the ETL job above, if steps in the middle of the transformation cause the “input” steps to pause for longer than net_write_timeout seconds, it’s possible to receive an error. I’ve never seen net_write_timeout hit because of an ETL job. The MySQL client has default behavior of buffering result sets.  Also, I try to use the “use result streaming cursor” in Pentaho or the “mysql_use_result” option in the Perl DBI, sparingly.

The most typical case of receiving net_write_timeout is a backup such as a mysqldump. During a long data transfer such as a mysqldump, a myriad of factors could come into play, not the least of which is network connectivity. Remember, if you run into problems with net_write_timeout on a mysqldump backup, mysqldump has its own section of the my.cnf. You can set an option file on a client or a server to specifically set this and other variables for a mysqldump session.

Categories: DBA Blogs

Links for 2016-03-16 [del.icio.us]

Categories: DBA Blogs

Restricting database access to IP addeess using sqlnet.ora

Learn DB Concepts with me... - Wed, 2016-03-16 16:15
**************************************************************
                      FROM MY CLIENT MACHINE:
**************************************************************

C:\Windows\System32>tnsping pdborcl1

TNS Ping Utility for 32-bit Windows: Version 11.2.0.1.0 - Production on 07-MAR-2016 15:32:20

Copyright (c) 1997, 2010, Oracle.  All rights reserved.

Used parameter files:
C:\app\oracle\product\11.2.0\client_1\network\admin\sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 10.215.XX4.XX)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = pdbo
cl1)))
OK (0 msec)

**************************************************************
NOW I want my DB to be only be accessible from this machine and restrict all other. For this I will need to make changed to my DB server sqlnet.ora :
**************************************************************

ON DB SERVER:

oracle@Linux03 admin]$ cat sqlnet.ora
# sqlnet.ora Network Configuration File: /u01/app/oracle/product/12.1.0.2/db_1/network/admin/sqlnet.ora
# Generated by Oracle configuration tools.

NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)

tcp.validnode_checking = yes  ## I have just added this line to test if it works

[oracle@Linux03 admin]$ vi sqlnet.ora
[oracle@Linux03 admin]$ lsnrctl reload

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 07-MAR-2016 15:20:03

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=Linux03)(PORT=1521)))
TNS-00584: Valid node checking configuration error
 TNS-12560: TNS:protocol adapter error

As expected it doesn't work as listener needs to know where to accept or deny connections from.
Now I tweak the sqlnet.ora file to below

[oracle@Linux03 admin]$ cat sqlnet.ora
# sqlnet.ora Network Configuration File: /u01/app/oracle/product/12.1.0.2/db_1/network/admin/sqlnet.ora
# Generated by Oracle configuration tools.

NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)


tcp.validnode_checking = yes
tcp.invited_nodes = (Linux03)  ## I only want to accept connections from my localhost i.e Linux03

RESTART THE LISTENER NOW.

[oracle@Linux03 admin]$ lsnrctl reload

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 07-MAR-2016 15:21:06

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=Linux03)(PORT=1521)))
The command completed successfully

 (OR)

[oracle@Linux03 admin]$ lsnrctl stop

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 07-MAR-2016 15:28:36

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=Linux03)(PORT=1521)))
The command completed successfully
[oracle@Linux03 admin]$ lsnrctl start

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 07-MAR-2016 15:28:41

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Starting /u01/app/oracle/product/12.1.0.2/db_1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 12.1.0.2.0 - Production
System parameter file is /u01/app/oracle/product/12.1.0.2/db_1/network/admin/listener.ora
Log messages written to /u01/app/oracle/diag/tnslsnr/Linux03/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=Linux03)(PORT=1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=Linux03)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                07-MAR-2016 15:28:41
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/12.1.0.2/db_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/Linux03/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=Linux03)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
The listener supports no services
The command completed successfully

****************************************************
NOW I TRY TO CONNECT FROM CLIENT AGAIN :
****************************************************

C:\Windows\System32>tnsping pdborcl1

TNS Ping Utility for 32-bit Windows: Version 11.2.0.1.0 - Production on 07-MAR-2016 15:32:57

Copyright (c) 1997, 2010, Oracle.  All rights reserved.

Used parameter files:
C:\app\oracle\product\11.2.0\client_1\network\admin\sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 10.215.XX4.XX)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = pdbo
cl1)))
TNS-12537: TNS:connection closed

C:\Windows\System32>


Nope it doesn't allow me to connect.
*************************************************************
Lets try to remove the tcp.validnode_checking & tcp.invited_nodes and see
**************************************************************

[oracle@Linux03 admin]$ cat sqlnet.ora
# sqlnet.ora Network Configuration File: /u01/app/oracle/product/12.1.0.2/db_1/network/admin/sqlnet.ora
# Generated by Oracle configuration tools.

NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)


[oracle@Linux03 admin]$ lsnrctl reload

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 07-MAR-2016 15:31:58

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=Linux03)(PORT=1521)))
The command completed successfully
[oracle@Linux03 admin]$ vi sqlnet.ora
[oracle@Linux03 admin]$ lsnrctl reload

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 07-MAR-2016 15:32:16

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=Linux03)(PORT=1521)))
The command completed successfully

C:\Windows\System32>tnsping pdborcl1

TNS Ping Utility for 32-bit Windows: Version 11.2.0.1.0 - Production on 07-MAR-2016 15:32:36

Copyright (c) 1997, 2010, Oracle.  All rights reserved.

Used parameter files:
C:\app\oracle\product\11.2.0\client_1\network\admin\sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 10.215.XX4.XX)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = pdbo
cl1)))
OK (0 msec)

**************************************************************************
SUCCESS YOU HAVE SUCCESFULLY IMPLEMENTED ANOTHER LAYER OF SECURITY ON DB SERVER
*************************************************************************


tcp.validnode_checking = yes
tcp.invited_nodes = (Linux03,HOSTNAME1,HOSTNAME2 ...)

YOU CAN ALSO EXPLICITLY EXCLUDE NODE LIKE THIS :

tcp.excluded_nodes = (192.168.100.101,HOSTNAME2,)

But : If you only specify invited nodes with tcp.invited_nodes, all other nodes will be excluded, so there is really no reason to do both. The same is true for excluded nodes. If you put tcp.excluded_nodes = (192.168.100.101,192.168.100.160) then IP containing 192.168.100.101 and 192.168.100.160 will be excluded/denied to connect to database as a database user while allowing others to connect.





Categories: DBA Blogs

Find missing or stale statistics

DBA Scripts and Articles - Wed, 2016-03-16 15:20

Purpose Statistics are primordial for the optimizer to choose the best execution plan possible. Sometimes you encounter a sub-optimal plan and need to find out if the tables involved in the statement are up to date. This query will help you find out which tables have been modified significantly since the last statistics gathering. For … Continue reading Find missing or stale statistics

The post Find missing or stale statistics appeared first on Oracle DBA Scripts and Articles (Montreal).

Categories: DBA Blogs

Find missing or stale statistics

DBA Scripts and Articles - Wed, 2016-03-16 15:20

Purpose Statistics are primordial for the optimizer to choose the best execution plan possible. Sometimes you encounter a sub-optimal plan and need to find out if the tables involved in the statement are up to date. This query will help you find out which tables have been modified significantly since the last statistics gathering. For … Continue reading Find missing or stale statistics

The post Find missing or stale statistics appeared first on Oracle DBA Scripts and Articles (Montreal).

Categories: DBA Blogs

Log Buffer #465: A Carnival of the Vanities for DBAs

Pythian Group - Wed, 2016-03-16 12:17

This Log Buffer Edition covers some tricks, tips, workarounds, and tech-dives covered in various blog posts from Oracle, SQL Server and MySQL.

Oracle:

SQLcl can run Ruby script!!.. JRuby to be precise.

Financial Information Discovery Integration with Oracle Assets

Data Vault Modeling and Snowflake Elastic Data Warehouse

Couchbase Bucket Index Status in NetBeans IDE 8.1

Make SQL Developer Run Faster

SQL Server:

Find a String in a String

Microsoft cloud platform allows you to provide resiliency for your on-premises SQL Server deployments. Marcin Policht describes this functionality, available as part of the Azure Site Recovery solution.

Trust, or Verify – How FKs Can (or Cannot) Help Performance

10 New Features Worth Exploring in SQL Server 2016

Validating a Windows Cluster Prior to Installing SQL Server 2014

MySQL:

When you think of Cloud services for database administration, Database as a Service (DBaaS) is what tends to comes to mind, but one of the ways that the Cloud has had an enormous impact on our work productivity is in the area of collaboration.

Introduction to MySQL Troubleshooting Webinar: Q & A for How to Create a Test Setup

MariaDB is a community-developed fork of MySQL and aims to be an enhanced, drop-in replacement for it.

The MySQL binary package have been growing a lot the last GA versions.

Indexing JSON documents via Virtual Columns

Categories: DBA Blogs

Breaking Down Barriers with DevOps

Pythian Group - Wed, 2016-03-16 09:53

In a recently recorded webinar,  How To Deliver Faster, Better Software Releases with DevOps, Pythian’s Chief Digital Officer, Aaron Lee and guest speaker Amy DeMartine, Senior Research Analyst at Forrester Research, discussed DevOps and the important role it plays in the race to deliver the best customer experience with faster and better software releases.

One of the webinar participants asked about the critical success factors when implementing a rapid release software model. 

The case for adopting agile operations is very compelling. Rapid release cycles can reduce costs, provide better and more stable solutions, increase customer and employee satisfaction, and most importantly, outpace competition and drive revenue.

Driving both efficiency and effectiveness of operations is easier said than done. DeMartine highlighted 7 habits of effective DevOps, but stressed that culture is the number one enemy.

She said, “this fear of change is a huge cultural gap. DevOps is at the point of cultural change, and this is one of the biggest ones that IT has to break”.

Historically, IT teams have held large and lengthy change boards, requiring a strong reliance on rigid approval processes. Combine that history with the complexity and degree of unknowns that operations faces, and it isn’t surprising that some teams are risk averse and consensus driven.

While new technologies, processes, and automation are critical to implementing rapid release models and enhancing DevOps, both DeMartine and Lee agreed that it boils down to the people, and the way Dev and Ops interact with each other.

“It’s clear what the responsibilities of a product manager are. It’s clear where the developers responsibilities start and stop,” said Lee. Often this leads to operations “picking up everything else” and facing an increased amount of unknowns.

Part of the solution is understanding how to embrace and maximize the opportunity for change, as opposed to minimizing the occurrence of change.

Most importantly, Dev and Ops need to know and understand each other. By setting and balancing clear boundaries and expectations, each team can understand the role the other is playing, and begin to see the environment under a common goal.

For more on this topic, download the on-demand webinar below:

devopswebinar_CTA (1)

Categories: DBA Blogs

Converting Hortonworks Sandbox to run on Hyper-V

Pythian Group - Tue, 2016-03-15 10:58

It looks like Hortonworks recently decided to stop hosting a version of their Sandbox VM for Windows Hyper-V. I only see VirtualBox and VMware versions listed.

What if, like me, your primary learning lab machine runs Hyper-V?

Well, you can convert it fairly easily. My method is to use VirtualBox to do this.

I run VirtualBox on my Mac because it’s free, it has free conversion tools and I usually only run 1-2 VMs on it, but my Mac isn’t my learning lab. This tip WILL work on a Windows machine that has VirtualBox installed.

Note that VirtualBox and Hyper-V may not get along well if installed on the same device, hence my using two machines to do this.

In order to convert it, here’s what you need to do.

  • Download the VirtualBox Sandbox VM here.
  • Follow Hortonworks’ instructions to import the appliance into VirtualBox.
  • Find the disk that it created by looking at the properties of the VM you just created.
  • Open a terminal and navigate to that directory.
  • From that directory, run this:

VBoxManage clonehd Hortonworks_sanbox_with_hdp_2_4_virtualbox-disk1.vmdk HDP2.4.vhd --format vhd

This process runs for a bit and creates a copy in VHD format, which you can copy onto, and run from; any Hyper-V machine.

Simply create a new Hyper-V machine, as you normally would, but instead of creating a new disk, choose this one and fire it up.

On the subject of VM Config, you should give it access to your internal network so that you can access it via browser, a couple processors and on memory, a word of caution: when I did this with dynamic memory enabled, the VM took all of my available system memory, so you may want to limit consumption to a number that reserves some computing power for the host and any other VMs you may want to run in parallel.

After mounting and starting my new Hyper-V, VM I found that I hadn’t allocated enough RAM or processor and it was “dying” on boot so I upped the RAM to 6 GB and 4 processors from 2 GB and 1 respectively.

Next up, eth0 wasn’t found on boot so I checked what Google had to say and found this article.

I edited the first file, and upon checking the second (/etc/sysconfig/network-scripts/ifcfg-eth0) I found that the MAC address was not recorded so didn’t have anything to do.

I saved, rebooted, and watched and eth0 was found at this time – of course the VirtualBox add-in failed at boot, but that isn’t a big deal.

When the VM came up, it instructed me to connect to 127.0.0.1:8888 which didn’t work. I looked up the IP assigned by my router put that IP (without a port) into my browser and was able to connect without issue.

Happy learning!

Categories: DBA Blogs

There’s more to IT than just coding

Pythian Group - Tue, 2016-03-15 10:00

 

March 2, 2016 was officially the midpoint of the Technovation Challenge in Ottawa. The 2016 competition started on Sunday January 17, at Carleton University, where Anar Simpson, Global Ambassador for Technovation, kicked off the program.

Technovation is a global technology entrepreneurship competition for young women that sets out to prove that there’s more than just coding in the technology sector. The program is designed to inspire and educate young women to pursue a career in technology by showing them all aspects of starting a technology business.

Regional Technovation Chapters contact local high schools to introduce the program and recruit teams of high school girls. Thanks to the efforts of Jennifer Francis, chair of Women Powering Technology, an Ottawa Chapter of Technovation that started up in January 2015. The pilot was such a success, participation in 2016 has doubled with over 100 high school and middle school girls participating and 30 female mentors from the Ottawa tech sector.

In addition to IBM, Shopify, and L-Spark, Pythian is a proud sponsor of the 2016 competition. Having just announced the Pythia program , it was a natural fit for Pythian to sponsor Technovation. The Pythia program focuses on increasing the percentage of talented women who work at Pythian, especially in tech roles. It also encourages and supports the participation of girls and women in STEM fields, which is exactly what Technovation is all about.

The support of the sponsors allows the teams to meet weekly at the  sponsor’s facilities. Here the teams, along with their mentors, guest speakers and instructor from Carleton University’s Technology Innovation Management (TIM) program, focus on identifying a problem facing their community, creating an App to solve the problem, code the App, build a company, and pitch their business plan to experts in the field! It’s pretty impressive considering the high school girls squeeze this competition in on top of their day-to-day school classes and after-school activities. They are all committed and dedicated – a great sign of future leaders!

“My views of working in the technology sector have changed, since it feels like something anyone can be a part of, whereas it was a distant idea before,” said 17-year-old Doris Feng, a student at Merivale High School and member of the team Women With Ambition. “I came in with the notion that we would be coding during the first week, but it turns out much of the development takes place off screen, with many hours dedicated to brainstorming, surveying users, drawing a paper prototype, and mulling over the ideas with team members.”

I couldn’t have said it better Doris! This is exactly what happens in the real world.

Technovation is a program designed to inspire women to pursue the entrepreneurial spirit in all of us. For more information on Technovation and starting your own local chapter, visit Technovation online. Globally, Technovation is sponsored by Adobe Foundation, Google, Verizon, CA Technologies, Intel and Oracle, in partnership with UN Women, UNESCO and MIT Media Lab.

Categories: DBA Blogs

How to run OpenTSDB with Google Bigtable

Pythian Group - Mon, 2016-03-14 12:49

In a previous post (OpenTSDB and Google Cloud Bigtable) we discussed OpenTSDB, an open source distributed database specifically designed for storing timeseries data. We also explained how OpenTSDB relies on Apache HBase for a reliable and scalable data backend. However, deployment and administration of an HBase cluster is not a trivial task, as it requires a full Hadoop setup. This means that it takes a big data engineer (or better a team of them) to plan for the cluster sizing, provision the machines and setup the Hadoop nodes, configure all services and tune them for optimal performance. If this is not enough, Operations teams have to constantly monitor the cluster, deal with hardware and service failures, perform upgrades, backup regularly, and a ton of other tasks that make maintenance of a Hadoop cluster and OpenTSDB a challenge for most organizations.

With the release of Google Bigtable as a cloud service and its support for the HBase API, it was obvious that if we managed to integrate OpenTSDB with Google Bigtable, we would enable more teams to have access to the powerful functionality of OpenTSDB by removing the burden from maintaining an HBase cluster.

Nevertheless, integration of OpenTSDB with Bigtable was not as seamless as dropping a few jars in its release directory. This happened because the OpenTSDB developers went over and above the standard HBase libraries, by implementing their very own asynchbase library. Asynchbase is a fully asynchronous, non-blocking, thread-safe, high-performance HBase API. And no one can put it better than the asynchbase developers themselves who claim that ‘This HBase client differs significantly from HBase’s client. Switching to it is not easy as it requires one to rewrite all the code that was interacting with any HBase API.’

This meant that integration with Google Bigtable required OpenTSDB to switch back to the standard HBase API. We saw the value of such an effort here at Pythian and set about developing this solution.

The asyncbigtable library

Today, we are very happy to announce the release of the asyncbigtable library. The asyncbigtable library is a 100% compatible implementation of the great asynchbase library that can be used as a drop in replacement and enable OpenTSDB to use Google Bigtable as a storage backend.

Thanks to support from the OpenTSDB team, the asyncbigtable code is hosted in the OpenTSDB GitHub repository.

Challenges

To create asyncbigtable we had to overcome two great challenges. The first one was that OpenTSDB assumes that the underlying library (until now asynchbase) performs asynchronous and non-blocking operations. On the other hand, the standard HBase API only supports synchronous and blocking calls. As a workaround for this, we used the BufferedMutator  implementation that collects all Mutation operations in a buffer and performs them in batches, allowing for mutations with an extremely low latency.

The second challenge stemmed from the fact that the OpenTSDB project has a very limited set of jar dependencies, that are explicitly defined in Makefiles. Contrary to this spartan approach, HBase and Bigtable client libraries have a significant number of transitive dependencies. Since, adding those dependencies one-by-one in the OpenTSDB build process would complicate its dependency management, we decided to  package all asyncbigtable dependencies in an uber-jar using the Maven assembly plugin. Therefore, building OpenTSDB with asyncbigtable support is now as simple as downloading a single beefy jar.

Build stepsBefore you start

Before you build OpenTSDB with Google Bigtable support, you must complete the following required steps:

  1. Create a Google Bigtable cluster (https://cloud.google.com/bigtable/docs/creating-cluster)
  1. Install HBase shell with access to the Google Bigtable cluster (https://cloud.google.com/bigtable/docs/installing-hbase-shell)
  1. Download and install the required tools for compiling OpenTSDB from source (http://opentsdb.net/docs/build/html/installation.html#compiling-from-source)
Build and run OpenTSDB
  1. Clone and build the modified source code from the Pythian github repository:

git clone -b bigtable git@github.com:pythian/opentsdb.git
cd opentsdb
sh build-bigtable.sh

  1. Create OpenTSDB tables

OpenTSDB provides a script that uses HBase shell to create its tables.  To create the tables run the following command:
env COMPRESSION=NONE HBASE_HOME=/path/to/hbase-1.1.2 \
./src/create_table.sh

  1. Run OpenTSDB

export HBASE_CONF=/path/to/hbase-1.1.2/conf
mkdir -p <tmp_dir>
./build/tsdb tsd --port=4242 --staticroot=build/staticroot \
--cachedir=<tmp_dir>

Future work

By all means our work on asyncbigtable does not stop here. We are putting great effort towards improving the library to achieve the high quality standards of the rest of OpenTSDB code. Our first priority is to test the library against most real world scenarios and achieve the highest quality. In the future, we plan to benchmark the performance of OpenTSDB with Bigtable and compare how it competes against HBase.

We are also working on building a true asynchronous implementation of the asyncbigtable library by integrating deeper with the Google Bigtable API.

Acknowledgements

We would like to thank the OpenTSDB developers (Benoît Sigoure and Chris Larsen) for their brilliant work in building such great software and for embracing the asyncbigtable library. Their insights and code contributions helped us deal with some serious issues. Also, we would like to thank the Google Cloud Bigtable team because they expressed genuine interest in this project and they were very generous in providing us with cloud infrastructure and excellent support.

Categories: DBA Blogs

#EMd360 … OEM health checks made easy

DBASolved - Mon, 2016-03-14 12:01

Oracle Enterprise Manager 12c is a great tool! Now that 13c is out, it is getting even better. This post however it not really about Oracle Enterprise Manager, rather than a quick and simple health check tool that I’ve put together. With the help of of some really cool co-workers (Carlos Sierra and Mauro Pagano), I’ve put together a small diagnostic tool call EMd360.

EMd360 stands for Enterprise Manager d360. The concept behind this tool is just like other tools that have been released with the 360 concept (edb360 and sqld360); to provide a quick and easy approach to checking an environment. As with edb360 and sqld360, EMd360 is a completely free tool for anyone to use.

So, why is there a need for EMd360? It is quite simple, there are so many things that go into OEM and you get so much out of OEM it is overwhelming. As a consultant, I’ve been asked to review a lot of OEM architectures and the associated performance. A lot of this information is in the OMR and often time I’m using other tools like REPVFY and OMSVFY, plus a handful of scripts. I’ve decided to make my life (and hopefully yours) a bit easier by building EMd360.

The first (base) release of EMd360 is now live on GitHub (https://github.com/dbasolved/EMd360.git). Go and get it! Test it out!

Download

If you are interested in trying out EMd360, you can download it from GitHub.

Instructions

Download EMd360 from GitHub as a zip file
Unzip EMd360-master.zip on the OMR server and navigate to the directory where you unzipped it
Connect to the OMR using SQL*Plus and execute @emd360.sql

Options

The @emd360.sql script take two variables. You will be prompted for them if not passed on the sql command line.

Variable 1 – Server name of the Oracle Management Service (without domain names)
Variable 2 – Oracle Management Repository name (database SID)

Example:

$ sqlplus / as sysdba
SQL> @emd360 pebble oemrep

Let me know your thoughts and if there is something you would like to see in it. Every environment is different and there maybe something you are looking for that is not provided. Let me know via email or blog comment and I’ll try to get it added in the next release.

Enjoy!!!

about.me: http://about.me/dbasolved


Filed under: OEM
Categories: DBA Blogs

#EMd360 … OEM health checks made easy

DBASolved - Mon, 2016-03-14 12:01

Oracle Enterprise Manager 12c is a great tool! Now that 13c is out, it is getting even better. This post however it not really about Oracle Enterprise Manager, rather than a quick and simple health check tool that I’ve put together. With the help of of some really cool co-workers (Carlos Sierra and Mauro Pagano), I’ve put together a small diagnostic tool call EMd360.

EMd360 stands for Enterprise Manager d360. The concept behind this tool is just like other tools that have been released with the 360 concept (edb360 and sqld360); to provide a quick and easy approach to checking an environment. As with edb360 and sqld360, EMd360 is a completely free tool for anyone to use.

So, why is there a need for EMd360? It is quite simple, there are so many things that go into OEM and you get so much out of OEM it is overwhelming. As a consultant, I’ve been asked to review a lot of OEM architectures and the associated performance. A lot of this information is in the OMR and often time I’m using other tools like REPVFY and OMSVFY, plus a handful of scripts. I’ve decided to make my life (and hopefully yours) a bit easier by building EMd360.

The first (base) release of EMd360 is now live on GitHub (https://github.com/dbasolved/EMd360.git). Go and get it! Test it out!

Download

If you are interested in trying out EMd360, you can download it from GitHub.

Instructions

Download EMd360 from GitHub as a zip file
Unzip EMd360-master.zip on the OMR server and navigate to the directory where you unzipped it
Connect to the OMR using SQL*Plus and execute @emd360.sql

Options

The @emd360.sql script take two variables. You will be prompted for them if not passed on the sql command line.

Variable 1 – Server name of the Oracle Management Service (without domain names)
Variable 2 – Oracle Management Repository name (database SID)

Example:

$ sqlplus / as sysdbaSQL> @emd360 pebble oemrep

Let me know your thoughts and if there is something you would like to see in it. Every environment is different and there maybe something you are looking for that is not provided. Let me know via email or blog comment and I’ll try to get it added in the next release.

Enjoy!!!

about.me: http://about.me/dbasolved


Filed under: OEM
Categories: DBA Blogs

ORACLE GENERATED ALWAYS AS IDENTITY

Learn oracle 12c database management - Mon, 2016-03-14 10:52
Lets create a new table with Oracle 12c New feature Identity column.

CREATE TABLE TEST_ID
(
  ID NUMBER GENERATED ALWAYS AS IDENTITY NOT NULL
, FNAME VARCHAR2(20)
, LNAME VARCHAR2(20)
, CONSTRAINT TEST_ID_PK PRIMARY KEY
  (
    ID
  )
  ENABLE
);

Table TEST_ID created.


INSERT INTO "PDBORCL1_USR2"."TEST_ID" (FNAME, LNAME) VALUES ('test', 'me');
INSERT INTO "PDBORCL1_USR2"."TEST_ID" (FNAME, LNAME) VALUES ('and', 'me');

Commit Successful;

Oracle generate the PK values for us. We no longer have to worry about auto increment.Leaving everything in oracle hands.

select id,FNAME,LNAME from TEST_ID;

        ID FNAME                LNAME            
---------- -------------------- --------------------
         1 test                 me                
         2 and                  me  


In the Background Oracle creates the a sequence & uses it to generate the id for this column and auto populates these id's.

select sequence_name from user_sequences;

SEQUENCE_NAME                                                                
--------------------------------------------------------------------------------
ISEQ$$_91816  
Categories: DBA Blogs

Pages

Subscribe to Oracle FAQ aggregator - DBA Blogs