Pat Shuff

Corente on VirtualBox revisited

Wed, 2016-09-28 16:20
Last week we started talking about setting up Corente and came to the conclusion that you cannot run the Corente Gateway in VirtualBox. It turns out that not only was I wrong, but I also got a ton of mail from product managers, people who got it working, and people who generally did not agree with my conclusion. OK, I will admit that I read the manuals, played with the suggested configurations, and tried deploying it on my own. It appears that I did a few things backwards and cornered myself into a configuration that caused things not to work. Today we are going to walk through the steps needed to get Corente up and running in your data center using VirtualBox as a sandbox.

The first thing that you absolutely need is a Corente admin account. Without this you will not be able to create a configuration to download, and everything will fail. You should have received an account email from "no-reply-cloud@oracle.com" with the title "A VPN account was created for you". If you have multiple accounts you should have received multiple emails. This is a good thing if you got multiples. It is a bad thing if you did not get any. I received mine back on August 11th of this year. I received similar emails back on April 27th for some paid accounts that I have had for a while. The email reads:

The VPN account information included in this email enables you to sign in to App Net Manager Service Portal when setting up Corente Services Gateway (cloud gateway) on Oracle Cloud, which is Step 2 of the setup process.
Account Details
Username:  a59878_admin
Password: --not shown--
Corente Domain:  a59878
The email ends with a link for additional details about how to access your account. The link takes you to the documentation on how to set up a service gateway. The document was last updated in August and goes through the workflow for setting up a connection.

Step 1: Obtain a trial or paid subscription to Oracle Compute Cloud Service. After you subscribe to Oracle Compute Cloud Service, you will get your Corente credentials through email after you receive the Oracle Compute Cloud Service welcome email.

Step 2: Set up a Corente Services Gateway (on-premises gateway) in your data center. This is where everything went off the rails the first time. This actually is not step 2. Step 2 is to visit the App Net Manager and register your gateway using the credentials that you received in the email. I went down the foolish path of spinning up a Linux 6 instance and running the verification to make sure that the virtualization gets passed to the guest operating system. According to the documentation, this is step 2. VirtualBox fails all of the tests suggested. I then looked for a second way of running in VirtualBox, and the old way of doing this is being dropped from support. According to the product manager, support is being dropped because the standard gateway image does work in VirtualBox, and if you follow the cookbooks that are available internally at Oracle you can make it work properly. I found two cookbooks and both are too large to publish in this blog. I will try to summarize the key steps. Ask your local sales consultant to look for "Oracle Corente Cloud Services Cook Book" or "Oracle Cloud Platform - Corente VPN for PaaS and IaaS". Both walk you through installation with screenshots and recommended configurations.

Step 2a: Go to www.corente.com/web and execute the Java code that launches the App Net Manager. When I first did this it failed. I had to download a newer version of Java to get the javaws binary installed. If you are on a Linux desktop you can do this with a wget of http://javadl.oracle.com/webapps/download/AutoDL?BundleId=211989 or go to the web page https://java.com/en/download/linux_manual.jsp and download the Linux64 bundle. This allows you to uncompress and install the javaws binary and associate it with the jsp file provided on the Corente site. If you are on Windows or MacOS, go to https://java.com/en/download/ and it will figure out what your desktop is and ask you to download and install the latest version of Java. What you are looking for is a version containing the javaws binary. This binary is called from the web browser and executes the downloadable scripts from the Corente site.
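
On a Linux desktop, the download and unpack sequence looks roughly like the sketch below. The bundle name, the extracted jre1.8.0_101 directory, and the AppNetManager.jsp file name are placeholders; the actual names depend on the Java version you pull down and on the file the Corente site hands you.

   # download and unpack the Linux x64 Java bundle (file names are examples)
   wget -O jre-8-linux-x64.tar.gz "http://javadl.oracle.com/webapps/download/AutoDL?BundleId=211989"
   tar xzf jre-8-linux-x64.tar.gz
   # put the unpacked bin directory (which contains javaws) on the PATH
   export PATH=$PWD/jre1.8.0_101/bin:$PATH
   # launch the file downloaded from www.corente.com/web with javaws
   javaws AppNetManager.jsp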

Step 2b: When you go to the www.corente.com/web site it will download Java code and launch the App Net Manager console (the screenshot of the console is not reproduced here).

The first time there will be no locations listed. We will need to add a location. It is important to note that the physical address that you use for the location has no relevance to the actual address of your server, gateway, or cloud hosting service. I have been cycling through major league baseball park addresses as my location. My gateway is currently located at Minute Maid Park in Houston and my desktop is at the Texas Rangers Ballpark in Arlington with my server at Wrigley Field in Chicago.

Step 2c: Launch the New Location Wizard. The information that will be needed is name, address, maintenance window (date and reboot option), inline configuration, DHCP, DHCP client name (optional), and LAN interface. Note that it is important to know ahead of time what your LAN interface is going to be. Once you get your gateway configured and connected, the only way to get back into this console is from this network. When I first did this I did not write down the IP address and basically locked my account. I had to go to another account domain and retry the configuration. For the trial that I did I used 192.168.200.1 as the LAN address with 255.255.255.0 as the netmask. This will become your gateway for all subnets in your data center. By default there is a DHCP server in my house that assigns IP addresses on the 192.168.1.X network. You need to pick something different from this subnet because you can't have a broadband router acting as a gateway to the internet and a VPN router acting as a gateway router on the same subnet. The implication is that you will need to create a new network interface on your Oracle Compute Cloud instances so that they have a network connection that talks on the 192.168.200.X network. This is easy to do, but selection of this network is important and writing it down is even more important. The wizard will continue and ask about adding the subnet to the Default User Group. Click Yes and add the 192.168.200.X subnet to this group.
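
Before picking the LAN subnet it is worth double checking what your broadband router already hands out so the two do not overlap. A quick check from any Linux box on the home network looks something like this:

   # show the address and default route handed out by the broadband router;
   # pick a Corente LAN subnet (192.168.200.0/24 here) that does not overlap
   ip addr show | grep "inet "
   ip route | grep default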

Step 2d: At this point we are ready to install a Linux 6 or Linux 7 guest OS in VirtualBox and download the Corente Services Gateway software from http://www.oracle.com/technetwork/topics/cloud/downloads/network-cloud-service-2952583.html. From here you agree to the legal stuff and download the Corente Gateway Image. This is a bootable image that works with VirtualBox.

Step 2e: We need to configure the instance with 2 GB of RAM, at least 44 GB of disk, and two network interfaces. The first interface needs to be configured as active using the Bridged Adapter. The second interface needs to be configured as active using the Internal Network. The bridged adapter is going to get assigned to the 192.168.1.X network by our home broadband DHCP server. The second network is going to be statically mapped to 192.168.200.1 by the configuration that you download from the App Manager. You also need to mount the ISO image that was downloaded for the Corente Gateway Image. When the server boots it will load the operating system onto the virtual disk and ask to reboot once the OS is loaded.
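
If you prefer to script the VirtualBox setup rather than clicking through the GUI, something along the lines of the sketch below should produce an equivalent machine. The VM name, bridged adapter name, internal network name, disk name, and ISO file name are assumptions; substitute your own.

   # create and register the VM (all names here are placeholders)
   VBoxManage createvm --name corente-gw --ostype Oracle_64 --register
   # 2 GB of RAM, NIC1 bridged to the home network, NIC2 on an internal network
   VBoxManage modifyvm corente-gw --memory 2048 \
     --nic1 bridged --bridgeadapter1 eth0 \
     --nic2 intnet --intnet2 corente-lan
   # a 45 GB virtual disk plus the downloaded Corente Gateway ISO
   VBoxManage createmedium disk --filename corente-gw.vdi --size 46080
   VBoxManage storagectl corente-gw --name SATA --add sata
   VBoxManage storageattach corente-gw --storagectl SATA --port 0 --device 0 \
     --type hdd --medium corente-gw.vdi
   VBoxManage storagectl corente-gw --name IDE --add ide
   VBoxManage storageattach corente-gw --storagectl IDE --port 0 --device 0 \
     --type dvddrive --medium corente-gateway-image.iso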

Step 3: Rather than rebooting the instance, we should stop the reboot after the shutdown happens and remove the ISO as the default boot device. If we don't, we will go through the OS install again and it will keep looping until we do. Once we boot the OS it will ask us to download the configuration file from the App Manager. We do this by setting the download site to www.corente.com, selecting DHCP as the network configuration, and entering our login information for the App Manager on the next screen.

Step 4: At this point we have a gateway configured in our data center (or home in my case) and need to set up a desktop server to connect through the VPN and access the App Manager. Up to this point we have connected to the App Manager from our desktop to do the initial configuration. From this point forward we will need to do so from an IP address in the 192.168.200.X network. If you try to connect to the App Manager from your desktop you will get an error message and nothing can be done. To install a guest system we boot Linux 6 or Linux 7 in VirtualBox and connect to https://66.77.134.249. To do this we need to set up the network interfaces on our guest operating system. The network needs to be the internal network. For my example I used 192.168.200.100 as the guest OS IP address, and the default router is 192.168.200.1, which is our gateway server. This machine is configured with a static IP address because by default the 192.168.1.X DHCP server will answer the DHCP request and assign you to the wrong subnet. To get the App Manager to work I had to download javaws again for Linux and associate the jsp file from the www.corente.com/web site to launch using javaws. Once this was done I was able to add the guest OS as a new location.
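
For reference, on an Oracle Linux 6 guest the static address for the internal interface can be set with an ifcfg file similar to the sketch below. The interface name (eth0) and the exact values are assumptions; adjust them to match your guest.

   # /etc/sysconfig/network-scripts/ifcfg-eth0  (internal network interface)
   DEVICE=eth0
   ONBOOT=yes
   BOOTPROTO=static
   IPADDR=192.168.200.100
   NETMASK=255.255.255.0
   GATEWAY=192.168.200.1
   # then restart networking to pick up the change:  service network restart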

At this point we have a gateway server configured and running and a computer inside our private subnet that can access the App Manager. This is the foundation to getting everything to work. From here you can then provision a gateway instance in the cloud service and connect your guest OS to computers in the cloud as if they were in the same data center. More on that later.

In summary, this was more difficult to do than I was hoping for. I made a few key mistakes when configuring the service. The first was not recording the IP address when I set up everything the first time. The second was using the default network behind my broadband router rather than a different network address. The third was assuming that the steps presented in the documentation were the steps that I had to follow. The fourth was not knowing that I had to set up a guest OS to access the App Manager once I had the gateway configured. Each of these mistakes took hours to overcome. Each configuration and failure required starting over again from scratch, and once I got past a certain point in the install I could not undo the configuration and had to start over with another account. I am still trying to figure out how to reset the configuration for my initial account. Hopefully my slings and arrows will help you avoid the pitfalls of outrageous installations.

Making Hadoop easier

Mon, 2016-09-26 02:07
Last week we looked at provisioning a Hadoop server and realized that the setup was a little complex and somewhat difficult. This is what people typically do the first time they want to provision a service. They download the binaries (or source if you are really crazy) and install everything from scratch. Our recommendation is to do everything this way the first time. It does help you get a better understanding of how the setup works and its dependencies. For example, Hadoop 2.7.3 requires Java 1.8 or greater. If we go with Hadoop 2.7.2 we can get by with Java 1.7.

Rather than going through all of the relationships, requirements, and libraries needed to get something working, we are going to do what we would typically do to spin up a server if we suddenly need one up and running. We go to a service that provides pre-compiled and pre-configured public domain code sandboxes and get everything running that way. The service of choice for the Oracle Compute Cloud is Bitnami. We can search for a Hadoop configuration and provision it into our IaaS foundation. Note that we could do the same using Amazon EMR and get the same results. The key differences between the two are configuration, number of servers, and cost. We are going to go through the Bitnami deployment on the Oracle Cloud in this blog.

Step 1 Search for Hadoop on http://oracle.bitnami.com and launch the instance into your region of choice.

Step 2 Configure and launch the instance. We give the instance a name, we increase the default disk size from 10 GB to 60 GB to have room for data, we go with the Hadoop 2.7.2-1 version, select Oracle Linux 6.7 as the OS (Ubuntu is an alternative), and go with a small OC3 footprint for the compute size. Don't change the security rules. A new rule set will be generated for you, along with the ssh keys, when you provision through this service.

Step 3 Log into your instance. To do this you will need ssh and the keys that Bitnami generates for you. The instance creation takes 10-15 minutes and should show you a screen with the IP address and links for you to download the keys.
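
The login itself is a standard ssh with the downloaded key. The key file name and user name below are placeholders; use the values shown on the Bitnami launch page.

   chmod 600 bitnami-hadoop-key.pem
   ssh -i bitnami-hadoop-key.pem bitnami@<instance-ip-address>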

Step 4 Once you have access to the master system you can execute the commands that we did last week. The only key difference with this implementation is that you will need to install java-1.8 with a yum install, because by default the development kit is not installed and we need the jar utility as part of the configuration. The steps needed to repeat our tests from the previous blog entry are listed below.

 --- setup hdfs file system 
   hdfs namenode -format
   hdfs getconf -namenodes
   hdfs dfs -mkdir input
   cp /opt/bitnami/hadoop/etc/hadoop/*.xml input
   hdfs dfs -put input/*.xml input
 --- setup simple test with wordcount
   hdfs dfs -mkdir wordcount
   hdfs dfs -mkdir wordcount/input
   mkdir ~/wordcount
   mkdir ~/wordcount/input
   vi file01
   mv file01 ~/wordcount/input
   vi ~/wordcount/input/file02
   hdfs dfs -put ~/wordcount/input/* wordcount/input
   vi WordCount.java
 --- install java-1.8 to get all of the libraries
   sudo yum install java-1.8\*
 --- create wc.jar file
   export HADOOP_CLASSPATH=/opt/bitnami/java/lib/tools.jar
   hadoop com.sun.tools.javac.Main WordCount.java
   jar cf wc.jar WordCount*.class
   hadoop jar wc.jar WordCount wordcount/input wordcount/output
   hadoop fs -cat wordcount/output/part-r-00000
 --- download data and test pig
   mkdir data
   cd data
   wget http://stat-computing.org/dataexpo/2009/1987.csv.bz2
   wget http://stat-computing.org/dataexpo/2009/1988.csv.bz2
   bzip2 -d 1987.csv.bz2
   bzip2 -d 1988.csv.bz2
   hdfs dfs -mkdir airline
   hdfs dfs -copyFromLocal 19*.csv airline
   vi totalmiles.pig
   pig totalmiles.pig
   hdfs dfs -cat data/totalmiles/part-r-00000

Note that we can do the exact same thing using Amazon AWS. They have a MapReduce product called EMR. If you go to the main console and click on EMR at the bottom of the screen, you can create a Hadoop cluster. Once you get everything created and can ssh into the master you can repeat the steps above.

I had a little trouble with the WordCount.java program in that the library version was a little different. The JVM 1.7 libraries had a problem linking, and adding the JVM 1.8 binaries did not work properly with the Hadoop binaries. You also need to change the HADOOP_CLASSPATH to point to the proper tools.jar file since it is in a different location from the Bitnami install. I think with a little tweaking it would all work. The pig sample code works with no problem, so we were able to test that without changing anything.
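
One way to find the right tools.jar after the yum install is to ask rpm where the development kit put it and point HADOOP_CLASSPATH there. The package name and path below are examples and will vary with the exact JDK build on the image.

   # list the files in the installed JDK package and pick out tools.jar
   rpm -ql java-1.8.0-openjdk-devel | grep tools.jar
   # then export the path that the previous command printed, for example:
   export HADOOP_CLASSPATH=/usr/lib/jvm/java-1.8.0-openjdk/lib/tools.jar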

In summary, provisioning a Hadoop server or cluster in the cloud is very easy if someone else has done the heavy lifting and pre-configured a server or group of servers for you. I was able to provision two clusters before lunch, run through the exercises, and still have time to go through it again to verify. Using a service like private Marketplaces, Bitnami, or the AWS Marketplace makes it much simpler to deploy sandbox images.

Hadoop on IaaS - part 2

Fri, 2016-09-23 02:07
Today we are going to get our hands dirty and install a single instance standalone Hadoop cluster on the Oracle Compute Cloud. This is a continuing series on installing public domain software on Oracle Cloud IaaS. We are going to base our installation on three components: Oracle Linux 6.7 as the operating system, Java 1.8, and the Hadoop 2.7.3 bundle from the Apache Hadoop home page. We are using Oracle Linux 6.7 because it is the easiest to install on Oracle Compute Cloud Services. We could have done Ubuntu or SUSE or Fedora and followed some of the tutorials from HortonWorks or Cloudera or the Apache Single Node Cluster guide. Instead we are going old school and installing from the Hadoop home page by downloading a tar ball and configuring the operating system to run a single node cluster.

Step 1:

Install Oracle Linux 6.7 on an Oracle Compute Cloud instance. Note that you can do the same thing by installing on your favorite virtualization engine like VirtualBox, VMWare, HyperV, or any other cloud vendor. Beyond this point the only true dependency is the operating system. If you are installing on the Oracle Cloud, go with the OL_67_3GB..... option, go with the smallest instance, delete the boot disk, replace it with a 60 GB disk, rename it, and launch. The key reason that we need to delete the boot disk is that by default the 3 GB disk will not hold the Hadoop binary. We need to grow it to at least 40 GB; we pad a little bit with a 60 GB disk. If you check the new disk as a boot disk it replaces the default Root disk and allows you to create an instance with a 60 GB disk.

Step 2:

Run yum to update the OS and install wget and Java version 1.8. You need to log in to the instance as opc so that you can run commands as root.
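
A minimal version of this step, assuming the OpenJDK packages in the Oracle Linux yum repository, looks like this (the -devel package is what provides the tools.jar we need later):

   sudo yum -y update
   sudo yum -y install wget java-1.8.0-openjdk java-1.8.0-openjdk-devel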

Note that we are going to diverge from the Hadoop for Dummies book that we referenced yesterday. It suggests attaching to a yum repository and doing an install from the repository for the bigtop package. We don't have that option for Oracle Linux and need to do the install from the binaries by downloading a tar or src image. The bigtop package basically takes the Apache Hadoop bundle and translates it into rpm files for an operating system. Oracle does not provide this as part of the yum repository, and Apache does not create one for Oracle Linux or RedHat. We are going to download the tar file from the links provided at the Apache Hadoop home page, following the install instructions for a single node cluster.

Step 3:

Get the tar.gz file by pulling it from http://apache.osuosl.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

Step 4: We unpack the tar.gz file with the tar xvzf hadoop-2.7.3.tar.gz command

Step 5:

Next we add the following to the .bashrc file in the home directory to set up some environment variables. The Java location is where the yum command installed it; the location of the Hadoop code is based on downloading into the opc home directory.

export JAVA_HOME=/usr
export HADOOP_HOME=/home/opc/hadoop-2.7.3
export HADOOP_CONFIG_DIR=/home/opc/hadoop-2.7.3/etc/hadoop
export HADOOP_MAPRED_HOME=/home/opc/hadoop-2.7.3
export HADOOP_COMMON_HOME=/home/opc/hadoop-2.7.3
export HADOOP_HDFS_HOME=/home/opc/hadoop-2.7.3
export YARN_HOME=/home/opc/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin

Step 6

Source the .bashrc to pull in these environment variables
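
A quick way to do this and confirm the variables took effect:

   source ~/.bashrc
   # sanity check that the PATH now resolves the hadoop launcher
   hadoop version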

Step 7 Edit the /etc/hosts file to add namenode to the file.
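
A minimal sketch of the edit, assuming we just want the name namenode to resolve to the local machine:

   # append a namenode entry pointing at the loopback address
   echo "127.0.0.1   namenode" | sudo tee -a /etc/hosts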

Step 8

Set up ssh so that we can loop back to localhost and launch an agent. I had to edit authorized_keys to add a newline before the new entry. If you don't, the ssh won't work.

# generate a passwordless rsa key and authorize it for logins to localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# make sure the new key starts on its own line in authorized_keys
vi ~/.ssh/authorized_keys
# verify that the loopback login works without a password, then exit
ssh localhost
exit

Step 9 Test the configuration then configure the hadoop file system for single node.

cd $HADOOP_HOME
mkdir input
cp etc/hadoop/*.xml input
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
vi etc/hadoop/core-site.xml

When we ran this there were a couple of warnings, which we can ignore. The test should finish without error and generate a long output list. We then edit the core-site.xml file and change the configuration block at the end to the following:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>
  </property>
</configuration>

Step 10

Create the hadoop file system with the command hdfs namenode -format

Step 11

Verify the configuration with the command hdfs getconf -namenodes

Step 12

Start the hadoop file system with the command sbin/start-dfs.sh
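
Before moving on it is worth checking that the daemons actually came up. Two quick checks, assuming the JDK's jps is on the PATH and the default Hadoop 2.x web UI port:

   # should list NameNode, DataNode, and SecondaryNameNode processes
   jps
   # the NameNode status page listens on port 50070 by default
   curl -s http://localhost:50070/ | head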

At this point we have the hadoop filesystem up and running. We now need to configure MapReduce and test functionality.

Step 13

Make the HDFS directories required to execute MapReduce jobs with the commands

  hdfs dfs -mkdir /user
  hdfs dfs -mkdir /user/opc
  hdfs dfs -mkdir input
  hdfs dfs -put etc/hadoop/*.xml input

Step 14 Run a MapReduce example and look at the output

  hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
  hdfs dfs -get output output
  cat output/* output/output/*

Step 15

Create a test program to do a wordcount of two files. This example comes from an Apache MapReduce Tutorial

hdfs dfs -mkdir wordcount
hdfs dfs -mkdir wordcount/input
mkdir ~/wordcount
mkdir ~/wordcount/input
vi ~/wordcount/input/file01
   (add the line: Hello World Bye World)
vi ~/wordcount/input/file02
   (add the line: Hello Hadoop Goodbye Hadoop)
hdfs dfs -put ~/wordcount/input/* wordcount/input
vi ~/wordcount/WordCount.java

Create WordCount.java with the following code

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Step 16

Compile and run the WordCount.java code

cd ~/wordcount
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64
export HADOOP_CLASSPATH=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/lib/tools.jar
hadoop com.sun.tools.javac.Main WordCount.java
jar cf wc.jar WordCount*.class
hadoop jar wc.jar WordCount wordcount/input wordcount/output
hadoop fs -cat wordcount/output/part-r-00000
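
For reference, with the two sample input files above, the final cat should produce counts along these lines:

   Bye      1
   Goodbye  1
   Hadoop   2
   Hello    2
   World    2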

At this point we have a working system and can run more MapReduce jobs, look at results, and play around with Big Data foundations.

In summary, this is a relatively complex example. We have moved beyond a simple install of an Apache web server or Tomcat server and editing some files to get results. We have the foundations for a Big Data analytics solution running on the Oracle Compute Cloud Service. The steps to install are very similar to the other installation tutorials that we referenced earlier for Amazon and virtual machines. Oracle Compute is a good foundation for public domain code. Per core, the processors are cheaper than with other cloud vendors. Networking is non-blocking and higher performance. Storage throughput is faster, optimized for high I/O, and tied to the compute engine. Hopefully this tutorial has given you the foundation to start playing with Hadoop on Oracle IaaS.

Hadoop on IaaS

Thu, 2016-09-22 02:07
We are going to try a weekly series of posts that talks about public domain code running on the Oracle Cloud IaaS platform. A good topic seems to be Big Data and Hadoop. Earlier we talked about running Tomcat on IaaS as well as WordPress on IaaS using bitnami.com. To start this process we are going to review what is Big Data and what is Hadoop. We are going to start with the first place that most people start with and that is looking at what books are available on the subject and walking through one or two of them. Today we are going to start with Hadoop for Dummies by Dirk deRoos. This is not the definitive source on Hadoop but a good place to have terms and concepts defined for us.

Years ago one of the big business trends was to create a data warehouse. The idea was to take all of the corporate operational data and put it into one database and grind on it to generate reports. History has shown that aggregation of the data was a difficult task, as was the processing power required to grind through reports. The task took significant resources to architect the data, to host the data, and to write select statements to generate reports for users. As retail got more and more ingrained on the web, sources outside the company became highly relevant and influential on products and services. Big Data and Hadoop have come with tools to pull from non-structured data like Twitter, Yelp, and other public web services and correlate comments and reviews to products and services.

The three characterizations of Big Data according to Hadoop for Dummies are

  • Volume - high volumes of data ranging from dozens of terabytes to petabytes.
  • Variety - data that is organized in multiple structures, ranging from raw text to log files.
  • Velocity - data that enters an organization has some kind of value for a limited amount of time. The higher the volume of data entering an organization per second, the bigger the velocity of change.

Hadoop is architected to handle high volumes of data and data with a variety of structures, but it is not necessarily suited to analyzing data in motion as it enters the organization; it works on data once it is stored and at rest.

Since we touched on the subject, let's define different data structures. Structured data is characterized by a high degree of organization and is typically stored in a database or spreadsheet. There is a relational mapping to the data, and programs can be written to analyze and process the relationships. Semi-structured data is a bit more difficult to work with than structured data. It is typically stored in the form of text data or log files. The data is typically somewhat structured and is either comma, tab, or character delimited. Unfortunately, different log files have different formats, so parsing and analysis is a little more challenging. Unstructured data has none of the advantages of the other two data types. Structure might be in the form of directory structure, server location, or file type. The actual architecture of the data might or might not be predictable and needs a special translator to parse the data. Analyzing this type of data typically requires a data architect or data scientist to look at the data and reformat it to make it usable.

From the Dummies guide again, Hadoop is a framework for storing data on large clusters of commodity hardware. This lends itself well to running on a cloud infrastructure that is predictable and scalable. Layer 3 networking is the foundation for the cluster. An application that is running on Hadoop gets its work divided among the nodes in the cluster. Some nodes aggregate data through MapReduce or YARN, and the data is stored and managed by other nodes using a distributed file system known as the Hadoop Distributed File System (HDFS). Hadoop started back in 2002 with the Apache Nutch project. The purpose of this project was to create the foundation for an open source search engine. The project needed to be able to scale to billions of web pages, and in 2004 Google published a paper that introduced MapReduce as a way of parsing these web pages.

MapReduce performs a sequence of operations on distributed data sets. The data consists of key-value pairs, and processing has two phases: mapping and reduction. During the map phase, input data is split into a large number of fragments, each of which is assigned to a map task. Each map task processes the key-value pairs it is assigned and produces a set of intermediate key-value pairs. This data is sorted by key and stored into a number of fragments that matches the number of reduce tasks. If, for example, we are trying to parse data for the National Football League in the US we would want to spawn 32 task nodes so that we could parse data for each team in the league. Fewer nodes would cause one node to do double duty and more than 32 nodes would cause a duplication of effort. During the reduction phase each task processes the data fragment assigned to it and produces an output key-value pair. For example, if we were looking for passing yardage by team we would spawn 32 task nodes. Each node would look for yardage data for each team and categorize it as either passing or rushing yardage. We might have two quarterbacks play for a team or have a wide receiver throw a pass. The key for this team would be the passer and the value would be the yards gained. These reduce tasks are distributed across the cluster and the results of their output are stored on HDFS when finished. We should end up with 32 data files from 32 different task nodes reporting passing yardage by team.
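
As a toy illustration of the map, shuffle, and reduce phases (purely conceptual, and assuming file01 and file02 are small text files), the classic shell pipeline for a word count shows the same flow; this is not how Hadoop actually executes jobs:

   # map:     tr emits one word (key) per line
   # shuffle: sort groups identical keys together
   # reduce:  uniq -c sums the occurrences of each key
   cat file01 file02 | tr -s ' ' '\n' | sort | uniq -c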

Hadoop is more than just distributed storage and MapReduce. It also contains components to help administer and coordinate servers (HUE, Ambari, and ZooKeeper), data movement management (Flume and Sqoop), resource management (YARN), processing frameworks (MapReduce, Tez, Hoya), workflow engines (Oozie), data serialization (Avro), data collection (MapReduce, Pig, Hive, and HBase), and data analysis (Mahout). We will look into these systems individually later.

There are commercial and public domain offerings for Hadoop.

A good project to start a small Hadoop effort is log analysis. If you have a web server, it generates logs every time a web page is requested. When a change is made to the web site, logs are generated when people log in to manage the pages or change the page content. If your web site is a transactional system, orders are being placed for goods and services along with credit card transaction processing. All of these generate log files. If we wanted to look at a product catalog and correlate what people look at with what is ordered, we could do what Amazon has done for years. We could come up with recommendations on what other people are looking at as well as what other people ordered along with this item. If, for example, someone is buying a pair of athletic shoes, a common companion purchase is socks. We could give a recommendation on socks that could go with the shoes or a shoe deodorant product that yields a higher profit margin. These items could be displayed with the product in the catalog or shopping cart to facilitate more goods sold on the web. We can also look at the products that no one is looking at and reduce our inventories since they are not even getting browsed casually.

We can also use Hadoop as a fraud detection or risk modeling engine. Both provide significant value to companies and allow executives to look at revenue losses as well as potential transactions that could cause a loss. For example, we might want to look at the packing material that we use for a fragile item that we sell. If we have a high rate of return on a specific item we might want to change the packing, change the shipper, or stop shipping to a part of the country that tends to have a high return rate. Any and all of these solutions can be implemented but a typical data warehouse will not be able to coordinate the data and answer these questions. Some of the data might be stored in plain text files or log files on our return web site. Parsing and processing this data is a good job for Hadoop.

In the upcoming weeks we will dive into installation of a Hadoop framework on the Oracle Cloud. We will look at resources required, pick a project, and deploy sample code into an IaaS solution. We will also look at other books and resources to help us understand and deploy sandboxes to build a prototype that might help us solve a business problem.

Hadoop on IaaS

Thu, 2016-09-22 02:07
We are going to try a weekly series of posts that talks about public domain code running on the Oracle Cloud IaaS platform. A good topic seems to be Big Data and Hadoop. Earlier we talked about running Tomcat on IaaS as well as WordPress on IaaS using bitnami.com. To start this process we are going to review what is Big Data and what is Hadoop. We are going to start with the first place that most people start with and that is looking at what books are available on the subject and walking through one or two of them. Today we are going to start with Hadoop for Dummies by Dirk deRoos. This is not the definitive source on Hadoop but a good place to have terms and concepts defined for us.

Years ago one of the big business trends was to create a data warehouse. The idea was to take all of the coporate operational data and put it into one database and grind on it to generate reports. History has shown that aggregation of the data was a difficult task as well as the processing power required to grind through reports. The task took significant resources to architect the data, to host the data, and to write select statements to generate reports for users. As retail got more and more ingrained on the web, sources outside the company became highly relevant and influential on products and services. Big Data and Hadoop have come with tools to pull from non-structured data like Twitter, Yelp, and other public web services and correlate comments and reviews to products and services.

The three characterizations of Big Data according to Hadoop for Dummies are

  • Volume - high volumes of data ranging from dozens fo terabytes to petabytes.
  • Variety - data that is organized in multiple structures, ranging from raw text to log files.
  • Velocity - data that enters an organization has some kind of value for a limited amount of time. The higher the volume of data entering an organization per second, the bigger the velocity of change.

Hadoop is architected to view high volumes of data and data with a variety of structures but it is not necessarily suited to analyze data in motion as it enters the organization but once it is stored and at rest.

Since we touched on the subject, let's define different data structures. Structured data is characterized by a high degree of organization and is typically stored in a database or spreadsheet. There is a relational mapping to the data and programs can be written to analize and process the relationships. Semi-structured data is a bit more difficult to understand than structured data. It is typically stored in the form of text data or log files. The data is typically somewhat structured and is either comma, tab, or character delimited. Unfortunately multiple log files have different formats so the stream of formatting is different for each file and parsing and analysis is a little more challenging. Unstructured data has none of the advantages of the other two data types. Structure might be in the form of directory structure, server location, or file type. The actual architecture of the data might or might not be predictable and needs a special translator to parse the data. Analyzing this type of data typically requires a data architect or data scientist to look at the data and reformat it to make it usable.

From Dummies Guide again, Hadoop is a framework for storing data on large clusters of commodity hardware. This lends itself well to running on a cloud infrastructure that is predictable and scalable. Level 3 networking is the foundation for the cluster. An application that is running on Hadoop gets its work divided among the nodes in the cluster. Some nodes aggregate data through MapReduce or YARN and the data is stored and managed by other nodes using a distributed file system know as the Hadoop distributed file system (HDFS). Hadoop started back in 2002 with the Apache Nutch project. The purpose of this project was to create the foundation for an open source search engine. The project needed to be able to scale to billions of web pages and in 2004 Google published a paper that introduced MapReduce as a way of parsing these web pages.

MapReduce performs a sequence of operations on distributed data sets. The data consists of key-value pairs and has two phases, mapping and data reduction. During the map phase, input data is split into a large number of fragments which is assigned to a map task. Map tasks process the key-value pair that it assigned to look for and proces a set of intermediate key-value pairs. This data is sorted by key and stored into a number of fragments that matches the number of reduce tasks. If for example, we are trying to parse data for the National Football League in the US we would want to spawn 32 task nodes to that we could parse data for each team in the league. Fewer nodes would cause one node to do double duty and more than 32 nodes would cause a duplication of effort. During the reduction phase each task processes the data fragment that it was assigned to it and produces an output key-value pair. For example, if we were looking for passing yardage by team we would spawn 32 task nodes. Each node would look for yardage data for each team and categorize it as either passing or rushing yardage. We might have two quarterbacks pay for a team or have a wide receiver throw a pass. The key for this team would be the passer and the value would be the yards gained. These reduce tasks are distributed across the cluster and the results of their output is stored on the HDFS when finished. We should end up with 32 data files from 32 different task nodes updating passing yardage by team.

Hadoop is more than just distributed storage and MapReduce. It also contains components to help administer and coordinate servers (HUE, Ambari, and Zookeeper), data movement management (flume and sqoop), resource management (YARN), processing framework (MapReduce, Tez, Hoya), Workflow engines (Oozie), Data Serialization (Avro), Data Collection (MapReduce, Pig, Hive, and HBase), and Data Analysis (Mahout). We will look into these system individually later.

There are commercial and public domain offerings for Hadoop.

A good project to start a small Hadoop project is log analysis. If you have a web server, it generates logs every time that a web page is requested. When a change is made to the web site, logs are generated when people log into manage the pages or change the page content. If you web page is a transactional system, orders are being placed for goods and services as well as credit card transaction processing. All of these generate log files. If we wanted to look at a product catalog and correlate what people look at in relationship to what is ordered, we could do what Amazon has done for years. We could come up with recommendations on what other people are looking at as well as what other people ordered along with this item. If, for example, we are buying a pair of athletic shoes. A common purchase with a pair of shoes is also socks. We could give a recommendation on socks that could go with the shoes or a shoe deoderant product that yields a higher profit margin. These items could be displayed with the product in the catalog or shopping cart to facilitate more goods sold on the web. We can also look at the products that no one is looking at and reduce our inventories since they are not even getting looked at casually.

We can also use Hadoop as a fraud detection or risk modeling engine. Both provide significant value to companies and allow executives to look at revenue losses as well as potential transactions that could cause a loss. For example, we might want to look at the packing material that we use for a fragile item that we sell. If we have a high rate of return on a specific item we might want to change the packing, change the shipper, or stop shipping to a part of the country that tends to have a high return rate. Any and all of these solutions can be implemented but a typical data warehouse will not be able to coordinate the data and answer these questions. Some of the data might be stored in plain text files or log files on our return web site. Parsing and processing this data is a good job for Hadoop.

In the upcoming weeks we will dive into installation of a Hadoop framework on the Oracle Cloud. We will look at the resources required, pick a project, and deploy sample code into an IaaS solution. We will also look at other books and resources to help us understand and deploy sandboxes to build a prototype that might help us solve a business problem.

Corente DataCenter Setup

Tue, 2016-09-20 10:55
Yesterday we went through the theory of setting up a VPN to connect a subnet in our data center to a subnet in the Oracle Cloud. Today we are going to go through the setup of the Corente Gateway in your data center. We will be following the Corente Service Gateway Setup. Important: this lab has problems, because Corente does not work with VirtualBox.

The first step is to ensure that we have a Linux server in our data center that we can install the services on. We will be installing these services on an Oracle Linux 6.7 release running in VirtualBox. To get started we install a new version from an iso image. We could just as easily have cloned an existing instance. For the installation we select the software development desktop and add some administration tools to help look at things later down the road.

According to the instructions we need to make sure that our user has sudo rights and can reconfigure network settings as well as access the internet to download code. This is done by editing the /etc/sudoers file and adding our oracle user to the access rights. We then run

modprobe -v kvm-intel
egrep '^flags.*(vmx|svm)' /proc/cpuinfo

to verify that we have the right type of virtualization needed to run the VPN software. If the egrep command returns nothing, the vmx/svm hardware virtualization flags are not being exposed to the guest. It turns out that VirtualBox does not support nested virtualization, which is needed by the Corente software, so we are not able to run the Corente Gateway from a VirtualBox instance.

We need to follow a different set of instructions and download the binaries for the Corente Gateway Services - Virtual Environment. Unfortunately, this version was deprecated in version 9.4. We are at a roadblock and need to look at alternatives for connecting Corente Gateway Services from our sandbox to the Oracle Cloud.

I debated continuing on or showing different failed paths in this post. I decided that showing a failed attempt had as much value as showing a successful attempt. Our first attempt was to install the gateway software on a virtual instance using VirtualBox since it is a free product. Unfortunately, we can't do this since VirtualBox does not pass the virtualization extensions of the Intel Xeon chip into the guest operating system. The second attempt was to go with a binary specifically designed to work with VirtualBox and load it. It turns out that this version was decommitted and there really is no solution that works with VirtualBox. Tomorrow we will look for alternatives for running the gateway on a native Windows host and a MacOS host since I use both to write this blog. Installing a gateway on a physical host is not optimum because we might need to reconfigure ethernet connections. My preference is to stay in a sandbox, but setting up an OracleVM server, VMWare server, or Hyper-V server would all be difficult at best. An alternative that we might look at is setting up our gateway server in another cloud instance and connecting one cloud vendor to another cloud vendor. It all depends on who exposes the hardware virtualization to their guest instances. More on that tomorrow.

connecting subnets

Mon, 2016-09-19 02:07
This week we are going to focus on connecting computers. It seems like we have been doing this for a while. We have looked at connecting our desktop to a cloud server (Linux and Windows). We have also looked at hiding a server in the cloud so that you can only get to it from a proxy host and not from our desktop or anywhere on the cloud. In this blog we are going to start talking about changing the configuration so that we create a new network interface on our desktop and use this new network interface to connect through a secure tunnel to a second network interface on a compute cloud. A good diagram of what we are trying to accomplish can be seen below.

Note that there are three components to this. The first is the Corente Gateway running in your data center. The second is the Corente Gateway running in the Oracle Cloud. The third is the Corente App Net Manager Service Portal. The basic idea behind this is that we are going to create a second network as we did with hiding a server in the cloud. We initially set up one instance so that we could access it through a public IP address, 140.86.14.242. It also has a private IP address inside the Oracle Cloud of 10.196.135.XXX. I didn't record the internal IP address because I rarely use it for anything. The key here is that by default a subnet range of 10.196.135.0/24 was configured. We were able to connect to our second server at 10.196.135.74 because we allowed ssh access on the private subnet but not on the public network. When this blog was initially written Oracle did not support defining your own subnet and assigned you to a private network based on the rack that you got provisioned into inside the Cloud data center. As of OpenWorld, subnet support was announced so that you can define your own network range. One of the key pieces of feedback that Oracle got was that customers did not like creating a new subnet in their data center to match the Oracle subnet. They would rather define their own subnet in the Oracle Cloud to match the subnets that they have in their own data center.

Let's take each of these components one at a time. First, the Corente Gateway running in your data center. This is a virtual image that you download, or software components that you install on a Linux system, and run in your data center. The concept here is that you have a (virtual) computer that runs in your data center. The system has two network interfaces attached. The first connects to the public internet, either through network address translation, directly, or through a router. The second network interface connects to your private subnet. This IP address is typically non-routable, like a 10.x.x.x or 192.168.x.x network, so there is no mistaking it for a machine on the public internet. The key is that the Corente Gateway has a listener that looks for communications intended for this non-routable network and replicates the packets through a secure tunnel to the Corente Gateway running in the Oracle Cloud. All of the traffic passes from your local non-routable network to another network hundreds or thousands of miles away and gives you the ability to talk to more computers on that network. This effectively gives you a private virtual network from your data center to a cloud data center.

Rather than using a software virtual gateway you can use a hardware router to establish this same connection. We are not going to talk about this as we go through our setup exercises but realize that it can be done. This is typically what a corporation does to extend resources to another data center, another office, or a cloud vendor for seasonal peak periods or cheaper resources. The benefit to this configuration is that it can be done by corporate IT and not by an individual.

The key things that get set up during this virtual private network connection are name resolution (DNS), IP routing (gateways and routers), and broadcast/multicast of messages. Most VPN configurations support layer 3 and above; if you do an ARP request, it is not passed through the VPN and never reaches the other data center. Corente uses the GRE tunneling protocol, which can carry layer 2 traffic. Supporting layer 2 allows you to pass ping requests, multicast requests, and additional tunnel requests at a much lower and faster level. As we discussed in an earlier blog, Microsoft does not allow layer 2 to go into or out of their Azure cloud. Amazon allows layer 2 inside their cloud but not into and out of their cloud. This is a key differentiator between the AWS, Azure, and Oracle clouds.
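
For a feel of what a layer 2 capable tunnel looks like on plain Linux, the iproute2 commands below build a GRE based Ethernet tunnel (gretap) between two gateways. This is only an illustration of the underlying mechanism, not how Corente configures itself, and every address shown is a hypothetical placeholder.

# on the data center gateway: 203.0.113.10 is its public address, 198.51.100.20 is the cloud gateway's
ip link add corente-demo type gretap local 203.0.113.10 remote 198.51.100.20
ip link set corente-demo up
ip addr add 172.16.0.1/30 dev corente-demo       # tiny transfer network between the two tunnel endpoints
ip route add 10.196.135.0/24 via 172.16.0.2      # reach the cloud subnet through the far end of the tunnel

The cloud side would mirror the same commands with the local and remote addresses swapped and a route back to the data center subnet.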

The second component of the virtual private network is the Oracle Cloud Corente Gateway. This is the target side, where the gateway in your data center is the initiator. Both gateways allow traffic to go between the two networks on the designated subnet and allow for communication between servers in the data center and servers in the Oracle Cloud. When you combine the VPN gateways with Security Lists and Security Rules you get a secure network that allows you to share a rack of servers without worrying about someone else who is using the Oracle Cloud accessing your data center, even if they are assigned an IP address on the same subnet. When you define a Security List or Security Rule, these exceptions and holes allow traffic from computers in your account to access the VPN. No one else in the same rack or same cloud data center can access your VPN or your data center.

The third component is the app net management service portal. This portal establishes connections and rules for the other two components. When you install each of the components it communicates with the admin portal to get configuration information. If you need to change a configuration or keys or some aspect of the communication protocol this is done in the admin portal and it communicates to the other two components to update the configuration. This service also allows you to monitor traffic and record traffic between the Oracle Cloud and your data center.

The network resources for the service installed in your data center will have two bridge interfaces, with br0 being your public facing connection and br1 connecting to your subnet in your data center. A similar configuration is done in the Oracle Cloud, but this is pre-configured and can be provisioned as a public image. The only thing that you need to configure is the subnet address range and the relationship with the app net management service portal.

Today was a lot of theory and high level discussion. Tomorrow we will dive into configuration of the gateway in your data center. The day after that we will look at provisioning the gateway in the Oracle Cloud and connecting the two. Just a quick reminder: we talked about how to establish a connection between your desktop and a cloud server. By going to a VPN configuration we get around having to hide a server in the cloud. We can set up all of our servers with private network links and only open up web servers or secure web servers to talk to the public internet. We can use ssh and rdp from our desktops at home or in our offices to communicate with the cloud servers. Setting up the VPN is typically a corporate responsibility, as is giving you access to the resources. What you need to know is what cloud resources you have access to and how much money you have in your budget to solve your business problem.

subnets

Tue, 2016-08-30 09:24
Let's take a step back and look at networking from a different perspective. A good reference book to start with from a corporate IT perspective is CCENT Cisco Certified Entry Networking Technician ICND1 Study Guide (Exam 100-101) with Boson NetSim Limited Edition, 2nd Edition. You don't need to read this book to get Cisco Certified but it does define terms and concepts well.

At the lowest level you start with a LAN or local area network. From the study guide, "Usually, LANs are confined to a single room, floor, or building, although they can cover as much as an entire campus. LANs are generally created to fulfill basic networking needs, such as file and printer sharing, file transfers, e-mail, gaming, and connectivity to the Internet or outside world." This is typically connected with a single network hub or a series of hubs, and a router or gateway connects us to a larger network or the internet. The key services that you need on a LAN are a naming service and a gateway service. The naming service allows you to find services by name rather than ip address. The gateway service allows you to reach services that are not on your local network. It is basically as simple as that. A gateway typically also acts as a firewall and/or network address translation (NAT) device. The firewall either allows or blocks connections to a specific port on a specific ip address. It might have a rule that says drop all traffic, or allow traffic from anywhere, from a network range, or from a specific network address. Network address translation allows you to communicate with the outside world from your desktop on a private non-routable ip address and have the service that you are connecting to know how to get back to you. For example, my home network has an internet router that connects to AT&T. When the router connects to AT&T, it gets a public ip address from the internet provider. This address is typically something like 90.122.5.12. This is a routable address that can be reached from anywhere on the internet. The router assigns an ip address to my desktop from the range 192.168.1.1 to 192.168.1.100, which implies that I can have 100 devices in my house. When I connect to gmail.com to read my email I do a name search for gmail.com and get back the ip address. My desktop, assigned 192.168.1.100, does an http get from gmail.com on port 80. This http request is funneled through my internet router, which rewrites the ip header and sets the source address to 90.122.5.12. The router keeps track of the assignment so that a response coming back from gmail.com gets routed back to my desktop rather than my kids' desktop on the same network. To gmail.com it looks like you are connecting from AT&T and not from your desktop.
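
On a Linux box acting as the home router, that address translation boils down to one masquerade rule plus enabling forwarding; the WAN interface name eth0 is an assumption.

# rewrite the source address of anything leaving the WAN interface to the router's public address
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# let the kernel forward packets between the LAN and WAN interfaces
echo 1 > /proc/sys/net/ipv4/ip_forward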

It is important to take our discussion back to layer 2 and layer 3 when talking about routing. If we are operating on a LAN, we can use layer 2 multicast to broadcast packets to all computers on our local network. Most broadband routers support all of layer 3 and part of layer 2. You can't really take a video camera in your home and multicast it to your neighbors so that they can see your video feed, but you can do this inside your own house. You can ping their broadband router if you know the ip address. Typically the ip address of a router is not mapped to a domain name so you can't really ask for the ip address of the router two houses down. If you do know their ip address you can set up links between the two houses and share video through tcp/ip or udp/ip.

If we want to limit the number of computers that we can put on our home or office network we use subnet netmasks to limit the ip address range and program the router to look for all ip addresses in the netmask range. The study guide does a good job of describing subnetting. The example below shows how to use a netmask to define a network that can host around sixty computers.

Note that we have defined a network with a network id of 192.168.1.64 by using netmask 255.255.255.192. This yields 64 addresses (192.168.1.64 through 192.168.1.127), 62 of which are usable by hosts once the network and broadcast addresses are set aside. If we put a computer with an ip address of 192.168.1.200 on this network we won't be able to connect to the internet and we won't be able to use layer 2 protocols to communicate with the other computers on this network. With this configuration we have effectively created a subnet inside our network. If we combine this with the broadcast address that is used when we create our network connection we can divide our network into ranges. The study guide goes through an exercise of setting up a network for different floors in an office and limiting each floor to a fixed number of computers and devices.
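
The arithmetic is easy to check from a shell. The last command assumes the ipcalc utility that ships with Oracle Linux and other Red Hat style distributions is installed.

# 255.255.255.192 is a /26 mask, which leaves 6 bits for hosts
echo $(( 2 ** 6 ))       # 64 addresses in the block
echo $(( 2 ** 6 - 2 ))   # 62 usable hosts after removing the network and broadcast addresses
ipcalc -bnm 192.168.1.70 255.255.255.192   # prints NETWORK=192.168.1.64, BROADCAST=192.168.1.127, NETMASK=255.255.255.192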

One of the design challenges faced by people who write applications is where to layer security and connectivity. Do you configure an operating system firewall to restrict the address ranges that it will accept requests from? Do you push this out to the network and assume that the router will limit traffic on the network? Do you push this out to the corporate or network firewall and assume that everything is stopped at the castle wall? The real answer is yes, you should set up security at all of these layers. When you make an assumption, things fall apart the moment someone opens an email and lets the trojan horse through the castle gates.

If you look at the three major cloud vendors they all take the same basic approach. Microsoft and Oracle don't let you configure the subnet that you are assigned to. You get assigned to a subnet and have little choice on the ip address range for the computers that you place in the cloud solution. Amazon allows you to define a subnet and ip address range. This is good and bad. It makes routing a little more difficult in the cloud, and address translation needs to be programmed for the subnet that you pick. Vendors that assign an ip address range have hardwired routing for that network, which optimizes routing and simplifies the routing tables. Amazon faces problems with EC2 and S3 connectivity and ends up charging for data transmitted from S3 to EC2. Bandwidth is limited with these connections partly due to routing configuration limitations. Oracle and Microsoft have simpler routing maps and can put switched networks between compute and storage, which provides a faster and higher throughput storage network connection.

The fun part comes when we want to connect our network, which is on a non-routable network, to our neighbors. We might want to share our camera systems and record them into a central video archive. Corporations face this when they want to create a cloud presence yet keep servers in their data center. Last week we talked about hiding a server in the cloud and putting our database where you can't access it from the public internet. This is great for security, but what happens when we need to connect with SQL Developer to the database to upload a new stored procedure? We need to be able to connect to this private subnet and map it to our corporate network. We would like to be able to get to 10.10.1.122 from our network, which is mapped to 192.168.1.0. How do we do this? There are two approaches. First, we can define a secondary network in our data center to match the 10.10.1.0 network and create a secure tunnel between the two networks. The second is to remap the cloud network to the 192.168.1.0 subnet and create a secure tunnel between the two networks. Do you see a common theme here? You need a secure tunnel with both solutions and you need to change the subnet either at the cloud host or in your data center. Some shops have the flexibility to change subnets in their corporate network or data center to match the cloud subnet (as is required with Oracle and Microsoft) while others require the cloud vendor to change the subnet configuration to match their corporate policy (Amazon provides this).

Today we are not going to dive deep into virtual private networks, IPSec, or secure tunnels. We are going to touch on the subjects and discuss them in depth later. The basic concept is that a database developer working on their desktop needs to connect to a database server in the cloud, and a Java developer working on their desktop needs to connect to a Java server in the cloud. We also need to hide the database server so that no one from the public internet can connect to it. We want to limit connections to the Java server to port 443 for secure https from public ip addresses and allow ssh login on port 22 from our corporate network. If we set a subnet mask, define a virtual private secure network between our corporate network and cloud network, and allow local desktops to join this secure network, we can solve the problem. Defining the private subnet in the cloud and connecting it to our corporate network is not enough. This goes back to the castle wall analogy. We want to define firewall rules at the OS layer. We want to define routing protocols between the two networks and allow or block communication at different layers and ports. We want to create a secure connection from our SQL Developer, Java developer, or Eclipse development tools to our production servers. We also want to facilitate tools like Enterprise Manager to measure and control configurations as well as notify us of overload or failure conditions.

In summary, there are a variety of decisions that need to be made when deploying a cloud solution. Letting the application developer deploy the configuration is typically a bad idea because they don't think of all of the corporate requirements. Letting the IT security specialist deploy the configuration is also a bad idea; the solution will be so limiting that it makes the cloud services unusable. The architecture needs to be a mix of accessibility, security, and usability. Network configuration is not always the easiest discussion to have, but it is critical to have it early in the conversation. This blog is not trying to say that one cloud vendor is better than another but simply to point out the differences so that you as a consumer can decide what works best for your problem.

networking differences between cloud providers

Mon, 2016-08-29 09:00
In this blog entry we are going to perform a simple task of enabling an Apache Web Server on a Linux server and look at how to do this on the Oracle Cloud, Amazon AWS, and Microsoft Azure. Last week we did this for the Oracle Cloud but we will quickly review this again. As we go down this path we will look at the different options presented to you as you create a new instance and see how the three cloud vendors diverge in their approach to services. Which version of Linux we select is not critical. We are looking at the cloud tooling and what is required to deploy and secure an instance. Our goals are
  • Deploy a Linux instance into a cloud service
  • Enable port 22 to allow us to communicate from our desktop into the Linux instance
  • Enable port 80 to allow us to communicate from the public internet into the Linux instance
  • Disable all other services coming into this instance.
  • We will use DHCP initially to get an ip address assigned to us but look at static ip addresses in the end

Step 1: Deploy a Linux instance into a small compute service. Go with the smallest compute shape to save money, go with the smallest memory allocation because we don't need much for a test web server, go with the default network interfaces and have an ip address assigned, and go with the smallest disk you can to speed up the process.

Step 1a - Oracle Public Cloud

We go to the Compute Console and click on Create Instance. This takes us through screens that allow us to select an operating system, core count, and memory size. When we get to the instance config we have the option of defining network security rules with a Security List. We can either create a new security list or select an existing security list. We will in the end select the default that allows us to connect to port 22 and modify the security list at a later point. We could have selected the WebServer entry from the Security List because we have done this before, but for this exercise we will select the default and come back later to add another access point. Once we get to the review screen we can create the instance. The only networking question that we were asked was which Security List definition we want.

Step 1b - Amazon AWS

We go to the EC2 Console and click on EC2 followed by Launch Instance. From the launch screen we select a Linux operating system and start the configuration. Note that the network and subnet menus allow you to deploy your instance into an ip address range. This is different from the Oracle Cloud, where you are assigned a non-routable ip address range based on the server that you are dropped into. Since these are private ip addresses for a single server this is really not a significant issue. We are going to accept the defaults here and configure the ports in a couple of screens. We are going to go with a dhcp public ip address to be able to attach to our web server.

We accept the default storage and configure the ports that we want to open for our instance. We can define a new security group or accept an existing security group. For this example we are going to add http port 80 since it is a simple add at this point and move forward with this configuration. We could go with a predefined configuration that allows port 80 and 22 but for this example we will create a new one. We then review and launch the instance.

Step 1c - Microsoft Azure

We go to the Azure Portal and click on Virtual Machine -> Add which takes us to the Marketplace. From here we type in Linux and pick a random Linux operating system to boot from. We are assigned a subnet just like we were with the Oracle Cloud and have the ability to add a firewall rule to allow port 80 and 22 through from the public internet. Once we have this defined we can review and launch our instance.

Step 2: Log into your instance and add the Apache web server. On Oracle Linux this can easily be done with a yum install httpd command (the package is named apache2 on Debian and Ubuntu images). We then edit the /var/www/html/index.html file so that we can see an answer from the web server.
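
A minimal sketch of this step on an Oracle Linux 6 style image looks like the following; the service and chkconfig commands assume a pre-systemd release, and the iptables lines assume the OS level firewall is enabled.

sudo yum install -y httpd
echo "hello from the cloud web server" | sudo tee /var/www/html/index.html
sudo service httpd start                               # start Apache now
sudo chkconfig httpd on                                # and start it again on every reboot
sudo iptables -I INPUT -p tcp --dport 80 -j ACCEPT     # open port 80 in the OS level firewall as well
sudo service iptables save                             # persist the firewall rule across reboots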

Step 3: Verify the network security configuration of the instance to make sure that ports 80 and 22 are open.

Step 3a: Oracle Cloud

When we created the instance we went with the default network configuration, which only has port 22 open. We now need to add port 80 as an open inbound port for the public internet. This is done by going to the Compute Instance console and viewing our web server instance. By looking at the instance we can see that we have the default Security List associated with our instance. If we already have a rule defined for port 80 we can just click on Add Security List and add the value. We are going to assume that we have not defined a rule and need to do so. We create a new rule that allows http traffic from the public internet to our WebServer security list. We then need to go back and add a new Security List to our instance and select WebServer, which allows ports 80 and 22.

Step 3b and 3c: AWS and Azure

We really don't need to do anything here because both AWS and Azure gave us the ability to add a port definition in the instance creation menus. Had we selected a predefined security list there would be no step 3 for any of the services.
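
Whichever console you used, a quick sanity check from the instance and from your desktop confirms the ports really are open. The <public-ip> placeholder is whatever address your instance was assigned, and the login name varies by image (opc on Oracle Linux cloud images, ec2-user on Amazon Linux, whatever you chose on Azure).

sudo ss -ltn                  # run on the instance: sshd and httpd should be listening on ports 22 and 80
curl -I http://<public-ip>/   # run from your desktop: an HTTP response header means port 80 is reachable
ssh opc@<public-ip>           # and a login prompt means port 22 is reachable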

Surprisingly, we are done. Simple network configuration is simple for all three vendors. The key differences that we see are that Amazon and Microsoft give you the ability to define individual port definitions as you create your instance, while Oracle wants you to define this with Security Rules and Security Lists rather than one at a time for each instance. All three platforms allow you to configure firewall rules ahead of time and add those as configurations. In this example we were assuming a first time experience, which is not the normal way of doing things. The one differentiator that did stand out is that Amazon allows you to pick and choose your subnet assignment; Oracle and Microsoft really don't give you choices and assign you an ip range. All three give you the option of static or dynamic public ip addresses. For our experiment there really isn't much difference in how any of the cloud vendors provision and administer firewall configurations.

Instance and storage snapshot

Fri, 2016-08-26 07:51
Yesterday we went through and created an E-Business Suite 12.5.5 instance on three servers in the Oracle Public Cloud. On previous days we talked about how to protect these instances by hiding the database, removing access from the public internet, and only allowing the application server to connect to the database instance. Today we are going to assume that our work as an architect is done and we need to back up our work. We could go through the standard ufsdump and back up our system to network storage, but this only solves half the problem in the cloud. We can restore our data but things are a little different in the cloud world. We need to back up our Security Lists, Security Rules, and instance configurations. We might want to replicate this environment for a secondary dev/test or QA environment, so creating a golden master would be a nice thing to do.

With the Oracle Cloud we have the option of doing an instance snapshot as well as storage snapshot. This is equivalent to cloning our existing instance and having it ready to provision when we want. This is different from a backup. A backup traditionally assumes a fixed computer architecture and we can restore our operating system bits and application code onto a disk. If we suddenly change and add a virtual private network for communications with our on premise data center the backup might or might not have that configuration as part of the bits on the network disk. Many customers found that this was the case with VMWare. When you can redefine the network through software defined networks, create virtual disks and virtual network interfaces, these additions are not part of a ufsdump or OS level backup. You really need to clone the virtual disk as well as the configurations.

Oracle released snapshots of storage as well as snapshots of instances in the May/June update of cloud services. There really are no restrictions on the storage snapshots but there are a few on the instance snapshots. For an instance snapshot you need to make sure that the boot disk is non-persistent. This means that you don't pre-create the disk, attach it to the instance, and boot from it. The disk needs to have the characteristic of delete upon termination. This sounds very dangerous up front. If you create customizations like adding user accounts to /etc and init files to the /etc/init directory, these get deleted on termination. The key is that you create an instance, customize it, and create a snapshot of it. You then boot from a clone of this snapshot rather than a vanilla image of the operating system.

First, let's look at storage snapshots. We can find more information in the online documentation for the console or the online documentation for the REST API and command line interface. There are some features in the REST API that are worth diving a little deeper into. According to the REST API documentation you can create a snapshot in the same server to allow for faster restores by specifying /oracle/private/storage/snapshot/collocated as a property when you create the snapshot. From this you can create a storage volume from a snapshot. We can do most of these functions through the compute console. We select the storage volume and select the Create Snapshot menu item.

We can now restore this snapshot as a bootable disk and can create a new instance based on this volume. We restore by going to the storage snapshot tab, selecting the snapshot, and selecting Restore Volume from the menu. We can see the restored volume in the storage list.
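
The same two actions can in principle be scripted. The sketch below reuses the session cookie from the earlier authentication example; the endpoint, volume and snapshot names, the size, and most of the JSON field names are assumptions to be checked against the REST API documentation, with the collocated property being the one called out above.

# Snapshot an existing volume, colocating the snapshot on the same server for faster restores
curl -s -b cookies.txt -X POST \
  -H "Content-Type: application/oracle-compute-v3+json" \
  -d '{"name": "/Compute-mydomain/user@example.com/WebServer1_boot_snap", "volume": "/Compute-mydomain/user@example.com/WebServer1_boot", "property": "/oracle/private/storage/snapshot/collocated"}' \
  https://api-z16.compute.us6.oraclecloud.com/storage/snapshot/

# Restore by creating a new storage volume from that snapshot
curl -s -b cookies.txt -X POST \
  -H "Content-Type: application/oracle-compute-v3+json" \
  -d '{"name": "/Compute-mydomain/user@example.com/WebServer1_boot_restored", "size": "12G", "properties": ["/oracle/public/storage/default"], "snapshot": "/Compute-mydomain/user@example.com/WebServer1_boot_snap"}' \
  https://api-z16.compute.us6.oraclecloud.com/storage/volume/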

We can create an instance snapshot as well. The key limitation to creating a snapshot from an instance is that the disk needs to be non-persistent. This means that we have a disk that is deleted on termination rather than created and mounted as part of the instance. This is a little confusing at first. If you follow the default instance creation it creates a storage volume for you. You need to delete this storage volume and have it replaced by a ROOT disk that is deleted upon termination. If we walk through an instance creation we have to change our behavior when we get to the storage creation. The default creates a storage volume. We want to remove it and it will be automatically replaced by a nonpersistent volume.

Once we have this hurdle removed, we can create an instance snapshot. We select the instance and click on Create Snapshot from the menu. If the menu item is greyed out, we have a persistent storage volume as our boot image.

We can create a bootable image from this snapshot by clicking on the menu for the snapshot and selecting Associate Image with this snapshot. This allows us to create an instance from our image.

The key to using instance snapshots is that we create a bootable instance, configure it the way that we want, and then create a snapshot of this instance. This gives us a golden master of not only the boot disk but of the network and customizations that we have done to the instance. You have to think a little differently when it comes to instance snapshots. It is a little strange not having a persistent root disk. It is a little strange knowing that any customizations will be lost on reboot. It is a little strange knowing that default log files will be wiped out on reboot. You need to plan a little differently and potentially reconfigure your logs, configurations, and customizations to go to another disk rather than the default root disk. If you think about it, this is not a bad thing. The root disk should be protected and not customized. Once you have customized it, it should be frozen in time. One key advantage of this methodology is that you can't really insert a rootkit into the kernel. These types of intrusions typically need a reboot to load the malware. Rebooting reverts you back to a safe and secure kernel and default libraries. This does mean that any packages or customizations will require a new snapshot for the customization to be persistent.
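
One practical pattern is to attach a small persistent data volume to the instance before you take the golden snapshot and point logs and local state at it. A minimal sketch, assuming the attached volume shows up as /dev/xvdc and you want it mounted at /u02 (device names and paths will differ in your instance, and this is best done before the services that write those logs are started):

# Format and mount the persistent data volume
sudo mkfs.ext4 /dev/xvdc
sudo mkdir -p /u02
echo "/dev/xvdc /u02 ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab
sudo mount /u02

# Move /var/log onto the persistent volume so log history survives a reboot of the non-persistent root disk
sudo mkdir -p /u02/log
sudo rsync -a /var/log/ /u02/log/
sudo mv /var/log /var/log.orig
sudo ln -s /u02/log /var/log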

In summary, snapshots are a good way of freezing storage and an instance in time. This is good for development and test, allowing you to create a golden master that you can easily clone. It also adds a new level of security by freezing your boot disk with the packages that you want and locking out malware that requires a reboot. It does require a new way of thinking in that any package or root file customization requires a new golden image with a new snapshot. Hopefully this helps you think of how to use snapshots and create a best practice methodology for using them.

E-Business Suite in the Oracle Cloud

Thu, 2016-08-25 09:00
For the last three days we talked about deploying multiple servers and securing the different layers by hiding services in the cloud. Today we are going to go through the installation process and procedures to install E-Business Suite 12.2.5 (EBS) across multiple compute cloud instances. We could use this installation for development and test, quality assurance, or even production work if we wanted to. Note that there are many ways to install EBS. We could install everything into Oracle Compute Cloud Services (IaaS) and install WebLogic and the Oracle Database onto Linux servers. We could install the database component into Oracle Database Cloud Services (DBaaS) and the rest on IaaS. We could also install the database into DBaaS, the application into Oracle Java Cloud Service (JaaS), and the rest in IaaS. The current recommended deployment scenario is to deploy everything into IaaS and bring your own licenses for EBS, Database, WebLogic, and Identity Servers.

We are going to go through the tutorial for installing EBS on IaaS for this blog. We are going to go down the multi-node install which first requires installing the provisioning tools to boot all of the other images into standalone instances. We will need at least four compute instances with 500 GB of disk storage to deploy our test. The individual requirements are shown in the diagram below.

Before we can start deploying we must first go to the Oracle Cloud Marketplace and download five EBS bootable images. We start by going to the marketplace and searching for "e-business" images. A list of the images that we need is shown in the diagram below.

Step 1: Download EBS 12.2.5 Fresh Install DB Tier Image. This is done by selecting the image that is returned from the search. When we get to the image page we click on "Get App". This brings up a usage terms screen that we need to accept by clicking OK. Once we have accepted the terms we are presented with a list of cloud instances that we can deploy into. If you don't see a list of servers you need to go into the preferences for your instance and click the checkbox that allows you to provision Marketplace apps into your instance. You will also need the Compute_Admin role to provision these boot images. You don't need to go to the compute instance after you download the image. You are mainly trying to copy the DB Tier Image into your private images.

Step 2: Download EBS 12.2.5 Demo DB Tier Image. Unfortunately there is no go-back feature, so you need to go to the marketplace page, search again for e-business, and select the Demo DB Tier Image.

Step 3: Download EBS 12.2.5 Application Tier Image.

Step 4: Download EBS OS-Only Image.

Step 5: Download EBS Provisioning Tools Image.

Step 6: Verify that all of the images are ready. You should get an email confirmation that each image is ready. You should also be able to create a new instance and see the images in the private images area. You should have five images available and could create a bootable instance from any of them.

Step 7: Create a compute instance using the Provisioning Tools image. We are going to go with an OC3 instance and accept the defaults. We will create a new security list and rule that allows http access. We do have to select the boot image from the private image list.

You get to review this before it is provisioned.

This will create an Orchestration that will create the bootable disk and boot the instance. It will take a few minutes to do this and once it is done we should have all of the provisioning tools ready to execute and deploy our multi-node EBS instance.

Step 8: Connect to the server via ssh as the opc user. Get the IP address from the previous screen. When I first tried to connect, I had to add the default Security List to the instance, otherwise the connection timed out. Once the ssh rule was in place, everything worked as expected.
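
Assuming the private key that matches the public key you supplied at instance creation is at ~/.ssh/id_rsa, the connection and the user switch for the next step look something like this (the IP address is a placeholder):

# Connect as the opc user with the key registered when the instance was created
ssh -i ~/.ssh/id_rsa opc@<provisioning_instance_public_ip>

# Once logged in, switch to the oracle user for step 9
sudo su - oracle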

Step 9: Change user to oracle and execute

knife oc image list

You will need the REST endpoint of the compute service because you will be prompted for it. To find this you need to go to the Compute Dashboard and look at the Compute Detail. The REST API Endpoint is shown, but for our instance we need to change it a little bit. We have two zones associated with this domain. We want to connect to the z16 zone instead of the z17 zone. Once we enter the endpoint, identity domain, account id, and account password, we get a list of images that we can boot from. At the bottom of the list we see the EBS images and should be good to go. It is important to repeat that using the z17 zone will not show the images, so we had to change over to the z16 zone. This is due to a Marketplace configuration that always deploys images into the lowest numbered zone for your instance.

Step 10: Edit /u01/install/APPS/apps-unlimited-ebs/ProvisionEBS.xml and replace the identity domain and user name with the output of the knife command. It is important to note that your substitute command will be a little different from the screen shot below. I also had to change the OS-Image entry to include the date, otherwise the perl script that we are about to execute will fail. The file name should be /Compute-obpm44952/pat.shuff@oracle.com/Oracle-E-Business-Suite-OS-Image-12032015, but your instance and user will be different.
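
The edit can be scripted with sed once you know the values from the knife output. The old strings below are hypothetical placeholders for whatever your copy of the file contains; the new strings are the ones from this environment, so substitute your own identity domain, user, and image date:

cd /u01/install/APPS/apps-unlimited-ebs
# Replace the identity domain and user name with the values reported by knife
sed -i 's|/Compute-yourdomain/your.user@example.com|/Compute-obpm44952/pat.shuff@oracle.com|g' ProvisionEBS.xml
# Append the date suffix to the OS-Only image name so the perl script can find the image
sed -i 's|Oracle-E-Business-Suite-OS-Image|Oracle-E-Business-Suite-OS-Image-12032015|g' ProvisionEBS.xml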

Step 11: Run perl /u01/install/APPS/apps-unlimited-ebs/ProvisionEBS.pl to start the install. This will ingest the xml file from the previous step and present you a menu system to install the other instances. The system will again ask for the REST API endpoint for the compute service, the REST API endpoint for storage (go to the Dashboard and click on Storage to get this), your identity domain, account, and password. For our test installation we selected option 3 for a multi-node, single application server installation. The perl script then installs chef, pulls cookbooks, and installs the database, app server, and forms server instances into compute instances. This step will take a while. I recommend playing around with all of the options and configurations until you get comfortable with what you are installing. We were going for the demo installation rather than a dev/test installation. We went with a single app node and a single database node. We could have gone with multiple app nodes and either demo or dev deployments. Some of the screen shots from this process are below. We called our installation prsEBS, so if you see references to this it relates to our installation. The process deploys orchestrations to the cloud services and then starts these services in the Oracle Cloud.

We can confirm that this is doing what is expected by looking at the Orchestration page under the compute console.

When it is complete we will see four instances running in compute.

In summary, we are able to provision multiple instances that comprise a single application, E-Business Suite. This process is well documented and well scripted. Hopefully these screen shots and steps help you follow the online tutorial mentioned earlier. What is needed next is to apply the security principles that we talked about in the past few days to secure the database and hide it from the public.

hiding a server in the cloud

Wed, 2016-08-24 09:00
There was a question the other day about securing a server and not letting anyone see it from the public internet. Yesterday we talked about enabling a web server to talk on the private network and not be visible from the public internet. The crux of the question was whether we can hide console and shell access and only access the system from another machine in the Oracle Cloud.

To review, we can configure ports into and out of a server by defining a Security Rule and Security List. The options that we have are to allow ports to communicate between the public-internet, sites, or instances. You can find out more about Security Lists from the online documentation. You must have the Compute_Operations role to be able to define a new Security List. With a Security List you can drop inbound packets without acknowledgement or reject packets with acknowledgement. The recommended configuration is to Drop with no reply. The outbound policy allows you to permit, drop without acknowledgement, or reject the packet with acknowledgement. The outbound policy lets your programs communicate with the outside world or lets you lock down the instance. By default everything is configured to allow outbound and deny inbound.

Once you have a Security List defined, you create exceptions to the list through Security Rules. You can find out more about Security Rules from the online documentation. You must have the Compute_Operations role to manage Security Rules. With rules you create a name for the rule and either enable or disable communications on a specific port. For example, the defaultPublicSSHAccess rule is set up to allow communications on port 22 with traffic from the public-internet to your instance. This is mapped to the default Security List, which allows console and command line login to Linux instances. For our discussion today we are going to create a new Security List that allows local instances to communicate via ssh and disables public access. We will create a Security Rule that creates the routing locally on port 22. We define a port by selecting a Security Application. In this example we want to allow ssh, which corresponds to port 22. We additionally need to define the source and destination. We have the choice of connecting to a Security List or to a Security IP List. The Security IP List is either to or from an instance, the public internet, or a site. We can add other options using the Security IP List tab on the left side of the screen. If we look at the default definitions we see that instance is mapped to the instances that have been allocated into this administrative domain (AD). In our example this maps to 10.196.96.0/19, 10.2.0.0/26, and 10.196.128.0/19 because these three private ip address ranges can be provisioned into our AD. The public internet is mapped to 0.0.0.0/0. The site is mapped to 10.110.239.128/26, 10.110.239.0/26, and 10.110.239.192/26. Note that the netmask is the key difference between the site and instance definitions.

Our exercise for today is to take our WebServer1 (or Instance 1 in the diagram below) and disable ssh access from the public internet. We also want to enable ssh from WebServer2 (or Instance 2) so that we can access the console and shell on this computer. We effectively want to hide WebServer1 from all public internet access and only allow proxy login to this server from WebServer2. The network topology will look like the diagram below.

Step 1: Go through the configuration steps (all 9 of them) from two days ago and configure one compute instance with an httpd server, ports open, and the firewall disabled. We will call this instance WebServer1 and go with the default Security List that allows ssh from the public internet.
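
For reference, the web server setup from that earlier post boils down to a handful of commands on each instance. This sketch assumes the Oracle Linux 6 image used in this series; on a Linux 7 image the systemctl and firewalld equivalents apply.

# Install and start Apache, and have it start on boot
sudo yum install -y httpd
sudo service httpd start
sudo chkconfig httpd on

# For this sandbox we turn off the OS firewall and let the cloud Security Rules do the filtering
sudo service iptables stop
sudo chkconfig iptables off

# Drop in a trivial page so we can tell the two servers apart
echo "WebServer1 test page" | sudo tee /var/www/html/index.html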

Step 2: Repeat step 1, calling this instance WebServer2, and go with the default Security List that allows ssh from the public internet.

Step 3: The first thing that we need to do is define a new Security List. For this security list we want to allow ssh on the private network and not on the public network. We will call this list privateSSH.

Step 4: Now we need to define a Security Rule for port 22 and allow communication from the instance to our privateSSH Security List that we just created. We are allowing ssh on port 22 on the 10.x.x.x network but not the public network.
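
Steps 3 and 4 can also be scripted against the Compute Cloud REST API. As before, this is a hedged sketch rather than a tested recipe: it reuses the session cookie from the earlier authentication example, the endpoint and object names are placeholders, and the Security IP List path for the instance networks should be copied from the Security IP Lists tab in your account.

# Create the privateSSH security list (inbound deny by default, outbound permit)
curl -s -b cookies.txt -X POST \
  -H "Content-Type: application/oracle-compute-v3+json" \
  -d '{"name": "/Compute-mydomain/user@example.com/privateSSH", "policy": "deny", "outbound_cidr_policy": "permit"}' \
  https://api-z16.compute.us6.oraclecloud.com/seclist/

# Permit ssh from the instance networks into that list
# (the seciplist path below is illustrative; use the instance IP list defined in your domain)
curl -s -b cookies.txt -X POST \
  -H "Content-Type: application/oracle-compute-v3+json" \
  -d '{"name": "/Compute-mydomain/user@example.com/privateSSHRule", "src_list": "seciplist:/oracle/public/instance", "dst_list": "seclist:/Compute-mydomain/user@example.com/privateSSH", "application": "/oracle/public/ssh", "action": "PERMIT", "disabled": false}' \
  https://api-z16.compute.us6.oraclecloud.com/secrule/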

Step 5: We now need to update the instance network options for WebServer1, adding the privateSSH Security List and removing the default Security List. Before we make this change we have to set up a couple of things. We first copy the ssh key from our desktop to the ~opc/.ssh directory on WebServer2 to allow WebServer2 to ssh into WebServer1. We then test the ssh by logging into WebServer2 and sshing from WebServer2 to WebServer1. We can currently ssh into WebServer1 from our desktop. We can do this as opc just to test connectivity.
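
The key copy and the connectivity tests from the desktop might look like the following; key paths and IP addresses are placeholders, and forwarding your ssh agent with ssh -A is a reasonable alternative to copying the private key onto WebServer2.

# Copy the private key to WebServer2 so it can reach WebServer1, and lock down its permissions
scp -i ~/.ssh/id_rsa ~/.ssh/id_rsa opc@<WebServer2_public_ip>:.ssh/id_rsa
ssh -i ~/.ssh/id_rsa opc@<WebServer2_public_ip> "chmod 600 .ssh/id_rsa"

# Test the hop: log into WebServer2, then ssh across the private network to WebServer1
ssh -i ~/.ssh/id_rsa opc@<WebServer2_public_ip>
ssh -i ~/.ssh/id_rsa opc@<WebServer1_private_ip>

# While the default list is still attached, a direct connection from the desktop also works
ssh -i ~/.ssh/id_rsa opc@<WebServer1_public_ip>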

Step 6: We add privateSSH, remove default, and verify that the Security List is configured properly for WebServer1.

Step 7: Verify that we can still ssh from WebServer2 to WebServer1 but cannot access WebServer1 from our desktop across the public internet. In this example we connect to WebServer1 as opc from our desktop. We then execute step 6 and try to connect again. We expect the second connection to fail.
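
A quick way to check both sides of this is to retry from the desktop with a short timeout and then repeat the hop through WebServer2; the public attempt should now fail while the private one still works. IP addresses are placeholders.

# Should time out: the public internet can no longer reach port 22 on WebServer1
ssh -o ConnectTimeout=10 -i ~/.ssh/id_rsa opc@<WebServer1_public_ip>

# Should still succeed: in through WebServer2, then across the private network
ssh -i ~/.ssh/id_rsa opc@<WebServer2_public_ip>
ssh -i ~/.ssh/id_rsa opc@<WebServer1_private_ip>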

In summary, we have taken two web servers and hidden one from the public internet. We can log into a shell from the other web server but not from the public internet. We used web servers for this example because they are easy to test and play with. We could do something more complex like deploy PeopleSoft, JDE, E-Business Suite, or Primavera. Removing ssh access is the same and we can open up more ports for database or identity communication between the hidden and exposed services.

The basic answer to the question "can we hide a server from public internet access" is yes. We can easily hide a server with Security Lists, Security Rules, Security IP Lists, and Security Applications. We can script these in Orchestrations or CLI scripts if we want to. In this blog we went through how to do this from the compute console and provided links to additional documentation to learn more about using the compute console to customize this for different applications.
