Pat Shuff

Subscribe to Pat Shuff feed
Oracle Blogs
Updated: 2 hours 57 min ago

Database as a Service

Wed, 2016-05-04 16:38
Today we are going to dive into Database as a Service offered from Oracle. This product is the same product offered by Oracle as a perpetual processor license or perpetual named user license for running database software in your data center. The key different is that the database is provisioned onto a Linux server in the cloud and rather than paying $47,500 for a processor license and 22% annually after that, you pay for the database services on an hourly or monthly basis. If you have a problem that needs only a few weeks, you pay for the service for a few weeks. If you have a problem that takes a very large number of processors but for a very short period of time, you can effectively lease the large number of processors in the cloud and purchase a much smaller number of processors in your data center. Think of a student registration system. If you have 20K-30K students that need to log into a class registration system, you need to size this server for the peak number of students going through the system. In our example, we might need an 8 core system to handle the load during class registration. Outside the two or three weeks for registration, this system sits idle at less than 10% utilization because it is used to record and report grades during the semester. Rather than paying $47.5K times 8 cores times 0.5 for an x86 or Sparc server ($190K), we only have to pay $47.5K times 2 cores times 0.5 for x86 or Sparc cores ($47.5K) and lease the additional processors in the cloud for a month at $3K/core/month ($24K). We effectively reduced the cost from $190K to $71.5K by using the cloud for the peak period. Even if we do this three times during the year the price is $119.5K which is a cost savings of $70.5K. The second year we would be required to pay $41.8K in support cost for the larger server. By using the smaller server we drop the support cost to $10.5K. This effectively pays for leasing a third of the cloud resources by using a smaller server and bursting to the cloud for high peak utilization.

Now that we have looked at one of our use cases and the cost savings associated with using the cloud for peak utilization and reducing the cost of on servers and software in our data center, let's dive into the pricing and configuration of Database as a Service (DBaaS) offered by Oracle in the public cloud services. If we click on the Platform -> Database menu we see the following page.

If we scroll down to the bottom we see that there are effectively three services that we can use in the public cloud. The first is Database Schema as a Service. This allows you to access a database through a web interface and write programs to read and present data to the users. This is the traditional Application Express interface or APEX interface that was introduced in Oracle 9. This is a shared service where you are given a database instance that is shared with other users. The second service is Database as a Service. This is the 11g or 12c database installed on a Linux installation in the cloud. This is a full installation of the database with ssh access to the operating system and sqlplus access to the database from a client system. The third service is Exadata as a Service. This is the Oracle database on dedicated hardware that is optimized to run the Oracle database.

The Schema as a Service is also known as Application Express. If you have never played with apex.oracle.com, click on the link and register for a free account. You can create an instance, a database schema, and store upto 10 MB or 25 MB of data for free. If you want to purchase a larger storage amount it is sold in 5 GB, 20 GB, or 50 GB increments.

The 10 or 25 MB instance is free. The 5 GB instance is $175/month. The 20 GB is $900/month, and the 50 GB is $2,000/month.

Tomorrow we will dive a little deeper into Schema as a Service. In summary, this is a database instance that can contain multiple tables and has an application development/application web front end allowing you to access the database. You can not attach with sqlplus. You can not attach with port 1521. You can not put a Java or PHP front end in front of your database and use it as a back end repository. You can expose database data through applications and REST api interfaces. This instance is shared on a single computer with other instances. You can have multiple instances on the same computer and the login give you access to your applications and your data in your instance.

The Database as a Service (DBaaS) is slightly different. With this you are getting a Linux instance that has been provisioned with a database. It is a fully deployed, fully provisioned database based on your selection criteria. There are many options when you provision DBaaS. Some of the options are virtual vs full instance, 11g vs 12c, standard edition vs enterprise edition vs enterprise edition high performance vs enterprise edition extreme performance. You need to provide an expected data size and if you plan on backing up the data and a cloud object repository if you do. You need to provide ssh keys to login as oracle or opc/root to manage the database and operating system. You also need to pick a password for the sys/system user inside the database. Finally, you need to pick the processor and memory shape that will run the database. All of these options have a pricing impact. All of these options effect functionality. It is important to know what each of these options means.

Let's dive into some of these options. First, virtual vs full instance. If you pick a full instance you will get an Oracle Enterprise Linux installation that has the version of the database that you requested fully installed and operational. For standard installations the file system is the logical volume manager and the file system is provisioned across four file systems. The /u01 file system is the ORACLE_HOME. This is where the database binary is installed. The /u02 file system is the +DATA area. This is where table extents and table data is located. The /u03 file system is the +FRA area. This is where backups are dropped using the RMAN command which should run automatically every night for incremental backups and 2am on Sunday morning for a full backup. You can change the times and backup configurations with command line options. The /u04 area is teh +RECO area. This is where change logs and other log files are dropped. If you are using Data Guard to replicate data to another database or from another database, this is where the change logs are found.

If you pick a virtual instance you basically get a root file system running Oracle Enterprise Linux with a tar ball that contains the oracle database. You can mount file systems as desired and install the database as you have it installed in your data center. This configuration is intended to mirror what you have on-premise to test patches and new features. If you put everything into /u01 then install everything that way. If you put everything in the root file system, you have the freedom to do so even though this is not the recommended best practice.

The question that you are not asked when you try to create a DBaaS is if this service is metered or non-metered. This question is asked when you create your identity domain. If you request a metered service, you have the flexibility to select the shapes that you want and if you are billed hourly or monthly. The rates are determined by the processor shape, amount of memory, and what database option you select (standard, enterprise, high performance, or extreme performance). More on that later. With the metered option you are free to stop the database (but not delete it) and retain your data. You suspend the consumption of the database license but not the compute and storage. This is a good way of saving a configuration for later testing and not getting charged for using it. Think of it as having an Uber driver sit outside the store but not charge you to sit there. When you get back in the car the charge starts. A better analogy would be the Cars2Go. You can reserve a car for a few hours and drive it from Houston to Austin. You park the car in the Cars2Go parking slot next to the convention center and don't pay for parking. You come out at the end of your conference, swipe your credit card and drive the car back to Houston. You only get charged for the car when it is between parking lots. You don't get charged for it while it is parked in the reserved slot. You pay a monthly charge for the service (think of compute and storage) at a much lower rate. If you think of a non-metered service as renting a car from a car rental place, you pay for the car that they give you and it is your until you return it to the car rental place. You can't not pay for the car while you are in your convention as with Card2Go. You have to pay for parking at the hotel or convention center. You can't decide half way into your trip that you really need a truck instead of a car or a mini-van to hold more people and change out cars. The rental company will end your current agreement and start a new one with the new vehicle. Non-metered services are similar. If you select an OC3M shape then you can't upgrade it to an OC5 to get more cores. You can't decide that you need to use the diagnostics and tuning and upgrade from enterprise edition to enterprise edition high performance. You get what you started with and have 12 months to consume the services reserved for you.

The choice of 11g or 12c is a relatively simple one. You get 11.2.0.4 running on Oracle Enterprise Linux 6.6 or you get 12.1.0.2 running on Oracle Enterprise Linux 6.6. This is one of those binary questions. You get 11g or 12c. It really does not effect any other question. It does effect features because 12c has more features available to it but this choice is simple. Unfortunately, you can't select 11.2.0.3 or 10.whatever or 9.whatever. You get the latest running version of the database and have an option to upgrade to the next release when it is available or not upgrade. Upgrades and patches are applied after you approve them.

The next choice is the type of database. We will dive into this deeper in a couple of days. The basic is that you pick Standard Edition or Enterprise Edition. You have the option of picking just the base Enterprise Edition with encryption only, with most of the options in the High Performance Option, or all of the options with Extreme Performance Option. The difference between High Performance and Exterme Performance is the Extreme included Active DataGuard, In-Memory options, and Real Application Clustering options. Again, we will dive into this deeper in a later blog entry.

The final option is the configuration of the database. I wanted to include a screen shot here but the main options that we look at are the CPU and memory shape which dictates the database consumption cost as well as the amount of storage for table space (/u02) and backup space (/u03 and /u04). There are additional charges above 128 GB for table storage and for backups. We will not go into the other options on this screen in this blog entry.

In summary, DBaaS is charged on a metered or un-metered basis. The un-metered is a lower cost option but less flexible. If you know exactly what you need and the time that it is needed, this is a better option. Costs are fixed. Expenses are predictable. If you don't know what you need, metered service might be better. It gives you the option of starting and stopping different processor counts, shutting off the database to save money, and select different options to test out different features. Look at the cost option and a blog that we will do in a few days analyzing the details on cost. Basically, the database can be mentally budgeted as $3K/OCPU/month for Enterprise Edition, $4K/OCPU/month for High Performance, and $5K/OCPU/month for Extreme Performance. Metered options typically cross over at 21 days. If you use metered service for more than 21 days your charges will exceed this amount. If you use it for less, it will cost less.

The Exadata as a Service is a special use case of Database as a Service. In this service you are getting a quarter, half, or full rack of hardware that is running the database. You get dedicated hardware that is tuned and optimized to run the Oracle database. Storage is dedicated to your compute nodes and not one else can use these components. You get 16, 56, or 112 processors dedicated to your database. You can add additional processors to get more database power. This service is available in a metered or non-metered option. All of the database options are available with this product. All of the processors are clustered into one database and you can run one or many instances of a database in this hardware. With the 12c option you get multi-tenant features so that you can run multiple instances and manage them with the same management tools but give users full access to their instance but not other instances running on the same database.

Exadata cost for metered services

Exadata cost for non-metered services

In summary, there are two options for database as a service. You can get a web based front end to a database and access all of your data through http and https calls. You can get a full database running on a Linux server or Linux cluster that is dedicated to you. You can consume these services on a an hourly, monthly, or yearly basis. You can decide on less expensive or more expensive options as well as how much processor, memory, and storage that you want to allocate to these services. Tomorrow, we will dive a little deeper into APEX or Schema as a Service and look at how it compares to services offered by Amazon and Azure.

Intro to PaaS

Tue, 2016-05-03 16:44
Today we are going to move up the stack. We will first focus on the Oracle solutions talking about the different platform as a service offerings. It is important to spend a little time reviewing this layer because what one company calls PaaS, another calls SaaS. The best way to get started is to go to cloud.oracle.com and look at the pull downs at the top of the screen. We see Infrastructure, Platform, and Applications.

When we pull down the Platform menu we see that there are different areas that we can dive into.

Data management is the first area that we will review. This is basically a way to aggregate and look at data. We can store data in a database, store on-premise databases into the cloud, store data in NoSQL repositories, and do analytics on a variety of data with Big Data Preparation and Big Data services. All of these involve pulling data into a repository of some type and performing queries against the repository. The key difference is the way that the data is stored, how we can ask questions, and the results that we get back. At this point we will not dive into any of these deeply but at a later point dive deep into the database and database backup.

The Application Development is moving farther away from the technology of storing data and moving closer to how we present data to users. The Java platform, for example, allows us to do things like create a shopping cart or hosting more complex applications in a Java repository or container. The Mobile Cloud Service allows us to dive into existing applications and present a user interface to iPhones, Android Phones, and tablets. The idea is to customize existing web and fat clients into a mobile format that can be consumed on mobile devices. The Messaging Cloud Service is a messaging protocol that allows for transactions in the cloud. If you are looking at connecting different cloud services together it allows you to serialize the communication between vendors for a true transactional experience. The Application Container Cloud is a lightweight Java container allowing you to upload and run java applications but without access to the operating system. This is a shared multi-tenant version of a WebLogic server. The Developer Cloud Service is a DevOps integration for the Java and Database services. This service is an aggregation of public domain components used to develop microservices at the database or java layer. The Application Builder Cloud Service is a cloud based REST api development interface allowing you to integrate with Application software in the Oracle Cloud as well as other Clouds. The API Catalog is a way of publishing the REST apis that you have and expose them to your customers.

The Content and Process Cloud Services are an aggregation of services that address group communications as well as business process flow. The Documents Cloud Service is a way of file sharing on the web. The Process Cloud Service is an extension that allows you to launch business processes (think Business Process Manager or BPM) in the cloud. The Sites Cloud Service is a web portal interface that takes documents and processes and aggregates them into a single cloud site allowing you to take a wiki like presentation but put business processes into the presentation. The Social Network Cloud Service allows you to integrate social network services like Facebook and Twitter into your web presence. It allows you to integrate these services as well as search these repositories for information relating to your company.

The Business Analytics part of Platform services provides data visualization and analytic tools as well as data aggregation utilities. The Business Intelligence component is the traditional BI package that allows users to create custom queries into your database. The Big Data Preparation allows you to aggregate data from a variety of sources into a Big Data repository. The Big Data Discovery allows you to look at your data in a variety of ways and generate reports based on your data and views of data. The Data Visualization Cloud Service allows you to view and analyze your data from different perspectives. This is similar to the BI and Big Data but looks at data slightly differently. The Internet of Things Cloud Service allows you to aggregate monitoring and measuring devices into a repository.

The Cloud Integration part of Platform services is the traditional data aggregation tools from other repositories. The Integration Cloud Service allows you to aggregate traditional SaaS vendors to unify fields like how a customer is defined or what data elements are incorporated into a purchase order. The SOA Cloud Service is implementation of the Oracle SOA Suite in the cloud. The GoldenGate Cloud Service is an implementation of the Oracle Golden Gate software that allows you to take data from different databases and synchronize the different repositories independent of the database vendor. The Internet of Things Cloud Service is the same listed in the Business Analytics section mentioned before.

The Cloud Management part of Platform services allows you to take the log files that you have inside your data center and analyze them for a variety of things. You can aggregate your log files into the Log Analytics Cloud Services to look for patterns, intrusion attempts, and problems or issues with services. The IT Analytics Cloud Service looks at log files and looks for trends like disks filling up, processors being used or not used appropriately. The Application Performance Cloud Service looks at log files to look at how systems and applications are operating rather than how systems are working rather than how components are working.

In Summary, we looked at an overview of the Platform as a Services offered by Oracle. Unfortunately, the variety of topics are too great for one blog. We did a high level overview of these services. In upcoming blogs we will dive deeper into each of these services and look at not only what they are but how they work and how to provision these services. We will also compare and contrast how these services compare to services offered by Amazon and Azure as we dive into each service.

storage cloud appliance in the cloud

Mon, 2016-05-02 10:46
Last week we focused on getting infrastructure as a service up and running. I wanted to move up the stack and talk about platform as a service but unfortunately, I got distracted with yet another infrastructure problem. We were able to install the storage cloud appliance software in a virtual machine but how do you install this in a compute cloud instance? This brings up two issues. First, how do you run a Linux 7 - 3.10 kernel in the Oracle Compute Cloud Service. Second, how do you connect and manage this service both from an admin perspective and client from another compute engine in the cloud service.

Let's tackle the first problem. How do you spin up a Linux 7 - 3.10 kernel in the Oracle Compute Cloud Service? If we look at the compute instance creation we can see what images that we can boot from.

There is not Linux 7 - 3.10 kernel so we need to download and import and image that we can boot from. Fortunately, Oracle has gone through a good importing a bootable image tutorial. If we follow these steps, we need to first download a CentOS 7 bootable image from cloud.centos.org. The cloud instance that we use is the CentOS-7-x86_64-OracleCloud.raw.tar.gz. We first download this to a local directory then upload it to the compute cloud image area. This is done by going to the compute console and clicking on the "Images" tab at the top of the screen.

We then upload the tar.gz file that is a bootable image. This allows us to create a new storage instance that we can boot from. The upload takes a few minutes and once it is complete we need to associate it with a bootable instance. This is done by clicking on the "Associate Image" button where we basically enter a name to use for the operating system as well as description.

Note that the OS size is 9 GB which is really small. We don't have a compute instance at this point. We either need to create a bootable storage element or compute instance based on this image. We will go through the storage create first since this is the easiest way of getting started. We first have to change from the Image tab to the Storage tab. We click on the Create Storage Volume and go through selection of the image, storage name, and size. We went with the storage size rather than resizing the storage we are creating.

At this point we should be able to create a compute instance based on this boot disk. We can clone the disk, boot from it, or mount it on another instance. We will go through and boot from this instance once it is created. We do this by going to the Instance tab and clicking on Create Instance. It does take 5-10 minutes to create the storage instance and need to wait till it is completed before creating a compute instance. An example of a creation looks like

We select the default network, the CentOS7 storage that we previously created, the 2016 ssh keys that we uploaded, and review and launch the instance.

After about 15 minutes, we have a compute instance based on our CentOS 7 image. Up to this point, all we have done is create a bootable Linux 7 - 3.10 kernel. Once we have the kernel available we can focus on connecting and installing the cloud storage appliance software. This follows the making backup better blog post. There are a couple of things that are different. First, we connect as the user centos rather than oracle or opc. This is a function of the image that we downloaded and not a function of the compute cloud. Second, we need to create a second user that allows us to login. When we use the centos user and install the oscsa_install.sh script, we can't login with our ssh keys for some reason. If we create a new user then whatever stops us from logging in as the centos user does not stop us from logging in as oracle, for example. The third thing that we need to focus on is creating a tunnel from our local desktop to the cloud instance. This is done with ssh or putty. What we are looking for is routing the management port for the storage appliance. It is easier to create a tunnel rather than change the management port and opening up the port through the cloud firewall.

From this we execute the commands we described in the maker backup better blog. We won't go through the screen shots on this since we have done this already. One thing is missing from the screenshot, you need to disable selinux vy editing /etc/sysconfig/selinux. You need to disable SELINUX by editing the file and rebooting. Make sure that you add a second user before rebooting otherwise you will get locked out and the ssh keys won't work once this change is made.

The additional steps that we need to do are create a user, copy the authorized_keys from an existing user into the .ssh directory, change the ownership, and assign a password to the new user, and add the user to /etc/sudoers.

useradd oracle
mkdir ~oracle/.ssh
cp ~centos/.ssh/authorized_keys ~oracle/.ssh
chown -R oracle ~oracle
passwd oracle
vi /etc/sudoers
The second major step is to create an ssh tunnel to allow you to connect in from your localhost into the cloud compute service. When you create the oscsa instance it starts up a management console using port 32769. To tunnel this port we use putty to connect.

At this point we should be able to spin up other compute instances and mount this file system internally using the command

 mount -t nfs -o vers=4,port=32770 e53479.compute-metcsgse00028.oraclecloud.internal:/ /local_mount_point
We might want to use the internal ip address rather than the external dns name. In our example this would be the Private IP address of 10.196.89.62. We should be able to mount this file system and clone other instances to leverage the object storage in the cloud.

In summary, we did two things in this blog. First, we uploaded a new operating system that was not part of the list of operating systems presented by default. We selected a CentOS instance that conforms to the requirements of the cloud storage appliance. Second, we configured the cloud storage appliance software on a newly created Linux 7 - 3.10 kernel and created a putty tunnel so that we can manage the directories that we create to share. This gives us the ability to share the object storage as an nfs mount internal to all of our compute servers. It allows for things like spinning up web servers or other static servers all sharing the same home directory or static pages. We can use these same processes and procedures to pull data from the Marketplace and configure more complex installations like JD Edwards, PeopleSoft, or E-Business Suite. We can import a pre-defined image, spin up a compute instance based on that image, and provision higher level functionality onto infrastructure as a service. Up next, platform as a service explained.

private cloud vs public cloud

Fri, 2016-04-29 02:19
Today is our last day to talk about infrastructure as a service. We are moving up the stack into platform as a service after this. The higher up the stack we get, the more value it has to the business and end users. It is interesting to talk about storage and compute in the cloud but does this help us treat patients in a medical practice any better, find oil and gas faster, deliver our manufactured product any cheaper? Not really but not having them will negatively effect all of them. We need to make sure that these services are there so that we can perform higher functions without having to worry about triple redundant mirroring of a disk or load balancing compute servers to handle higher loads.

One of the biggest complaints with cloud services is that there are perception problems of security, latency, and governance. Why would I put my data on a computer that I don't control. There is a noisy neighbor issue where I am renting part of a computer, part of a disk, part of a network. If someone wants to play heavy metal at the highest volume (remember our apartment example) while I am trying to file a monthly report or do some analytics, my resources will suffer and it will take me longer to finish my job due to something out of my control.

Many people have decided to go with private hosted cloud solutions like VCE, VBlock, Cisco UCS clusters, and other products that provide raw compute, raw storage, and "hyper-converged" infrastructure to solve the cloud problem. I can create a golden master on my VMWare server and provision a database to my configuration as many times as I want. Yes, I run into a licensing issue. Yes, it is a really big licensing issue running Oracle on VMWare but Microsoft is not far behind with SQL Server licensing on a per core basis either. Let's look at the economics of putting together one of these private cloud servers.

It is important to dive into the licensing issue. The Oracle Database and WebLogic servers are licensed either on a processor or named user basis. Database licensing is detailed in a pdf on the Oracle site. The net of the license says that the database is licensed based on the core count of the processor running on the server. There is a multiplication factor (0.5 for X86) based on the chip type that factors into the license cost. A few years ago it was easy to do this calculation. If I have a dual core, dual socket system, this is a four core computer. The license price of the computer would be 4 cores x 0.5 (Intel x86 chip) x $47,500. The total price would be $95K. Suddenly the core count of computers went to 8, 16, or 32 cores per chip. A single system could easily have 64 cores on a single board. If you aggregate multiple boards as is done in a Cisco UCS system you can have 8 board or 256 cores that you can use. There are very few applications that can take advantage of 256 cores so a virtualization engine was placed on top of these cores so that you could sub-divide the system into smaller chunks. If you have a 4 core database problem, you can allocate 4 cores to it. If you need 8 cores, allocate 8 cores. Products like VMWare and HyperV took advantage of this and grew rapidly. These virtualization packages added features like live migration, dynamic sizing, and bursting utilization. If you allocate 4 cores and the processor goes to 90%, two more cores will be made available for a short burst. The big question comes up as to how you now license on a per core basis. If you can flex to more processors without rebooting or live migrate from a 2 core to a 24 core system, which do you license for?

Unfortunately, Oracle took a different position from the rest of the industry. None of the Oracle products contain a license key. None of the products require that you go to a web site and get a token to allow you to run the software. The code is wide open and freely available to load and run on any system that you want. Unfortunately, companies don't do a good job of tracking utilization. If someone from the sales or purchasing department rolls out a golden master onto a new virtual machine, no one is really tracking that. People outside of IT can easily spin up more licenses. They can provision a database in a cloud service and assume that the company has enough licenses to cover their project. After a while, licensing gets out of control and a license review is done to see what is actually being used and how it is being used. Named user licenses are great but you have to have a ratio of users to cores to meet minimums. You can't for example, buy a 5 user license and deploy it on a 64 core system. You have to maintain a typical ratio of 25 users to a core of 40 users to a core based on the product that you are using. You also need to make sure that you understand soft partitioning vs hard partitioning. Soft partitioning is the ability to flex or change the core count without having to reconfigure or reboot the system. A hard partition puts hard limits on the core count and does not allow you exceed it. Products like OracleVM, Solaris, and AIX contain hard partition virtualization. Products like HyperV and VMWare contain soft partitions. With soft partitions, you need to pay for all of the cores in the cluster since in theory you can consume all of the cores. To be honest, most people don't understand this and get in trouble with license reviews.

When we talk about cloud services, licensing is also important to understand. Oracle published cloud license rules to detail limits and restrictions. The database is still licensed on a per core basis. The Linux operating system is licensed on per server instance and is limited to 8 virtual cores. If you deploy the Oracle database or WebLogic server in AWS or Azure or any other cloud vendor, you have to own a perpetual license for the database using the formulas above. The license must correlate to the high water mark for the core count that you provision. If you provision a 4 core system, you need a 2 processor license. If you run the database for six months and shut it off, you still need to own the perpetual license. The only way to work around this is to purchase database as a service in the Oracle cloud. You can pay for the database license on an hourly or monthly basis with metered services or on an annual basis with non-metered services. This provides a great cost savings because if we only need a database for 6 months we only need to pay for 6 months x the number of cores x the database edition type. If, for example, we want just the Database Enterprise Edition, it is $3K/core/month. If we want 4 cores that is $12K per month. If we want 6 months then we get it for $72K. We can walk away from the license and not have to pay the 22% annual maintenance on the $95K. We save $23K the first year and $20K annually by only using the database in the cloud for six months. If we wanted to use the database for 9 months, it is cheaper to own the license and lease processor and storage. If we go to the next higher level of database, Database High Performance Edition at $4K/core/month, it becomes cheaper to use the cloud service because it contains so many options that cost $15K/processor. Features like partitioning, compression, diagnostics, tuning, real application testing, and encryption are part of this cloud service. Suddenly the economics is in favor of cloud hosting a database rather than hosting in a data center.

Let's go back to the Cisco UCS and network attached storage discussion. If we purchase a UCS 8 blade server with 32 cores per blade we are looking at $150K or higher for the server. We then want to attach a 100 TB disk array to it at about $300K (remember the $3K/TB). We will then have to pay $300K for VMWare. If we add 10% for hardware and 20% for software we are at just over $1M for a base solution. With a three year amortization we are looking at about $330K per year just to have a compute and storage infrastructure. We have to have a VMWare admin who doles out processors and storage, loads operating systems, creates golden masters, and acts as a traffic cop to manage and allocate resources. We still need to pay for the Oracle database license which is licensed on a per core basis. Unfortunately, with VMWare we must license all of the cores in the cluster so we either have to sub-divide the cluster into one blade and license all 32 cores or end up paying for all 256 cores. At roughly $25K/core that gets expensive quickly. Yes, you can run OracleVM or Solaris on one of the blades and subdivide the database into two cores and only pay for two cores since they both support hard partitioning but you would be amazed at how many people fight this solution. You now have two virtualization engines that you need to support with two different file formats. No one in mass wants two solutions just to solve a licensing issue.

Oracle has taken a radically different approach to this. Rather than purchasing hardware, storage, and a virtualization platform, run everything in the cloud and pay for it on a monthly basis. The biggest objection is that the cloud is in another city and security, latency, ... you get the picture. The new solution is to run this hardware in your data center with the Oracle Public Cloud Machine. The cost of this solution is roughly $260K/year with a three year commit. You get 200 plus cores and 100 ish TB of storage to use as you want. You don't manage it with VSphere but manage it with the same web page that you manage the public cloud services. If you want to script everything then you can manage it with REST apis or perl/java/insert your lanaguage scripts. The key benefit to this is that you no longer need to worry about what virtualization engine you are using. You manage the higher level and lease the database or weblogic or SOA license on an hourly or monthly basis.

Next week we will move up the stack and look at database hosting. Today we talked about infrastructure choices and how it impacts database license cost. Going with AWS or Azure still requires that you purchase the database license. Going with the Oracle public cloud or public cloud machine allows you to not own the database license but effectively lease it on an hourly or monthly basis. It might not be the right solution for 7x24x365 operation but it might be. It really is the right solution for bursty needs list holiday peak periods, student registration systems, development and testing where you only need a large footprint for a few weeks and don't need to buy for your highwater mark and run at 20% the rest of the year.

Archive Storage Services vs Archive Cloud Services

Thu, 2016-04-28 02:07
Yesterday we started talking about the cost comparison for storage in the cloud. We briefly touched on the cost of long term archive in the cloud. How much does it cost to backup data for long term archive and what is the best way to do this? Years ago the default way of doing this was to copy your data on disk to a tape unit and put the tape in a box. The box was then put in an environmentally controlled room to extend the lifetime of tape and a person was put on staff to pull the data off the shelf when the data was needed. The data might be a backup of data on disk or a secondary copy just in case the disk failed. Tape was typically used to provide separation of duties required by Sarbanes-Oxly to keep people who report on financial data separate from the financial data. It also allowed companies to take large volumes of data, like seismic data, and not keep it on spinning disks. The traces were reloaded when geophysicists wanted to look at the data.

The first innovation in this technology was to provide a robot to load and unload tapes as a tape unit gets full or needs to be reloaded. Magazines were created that could hold eight tapes and the robots had bar code readers so that they could seek to the right tape in the magazine, pull it out of the series of tapes and inserted into the tape unit for reading or writing. Management software got more advanced and understood the bar code values and could sequence the whopping 800 GB of data that could be written to an LT04 tape. Again, technology gets updated and the industry moved to LT05 and LT06 tapes with significantly higher densities. A single LT06 could hold 2.5 TB per tape unit. Technology marches on and compression allows us to store 6 TB on these disks. If we go back to our 120 TB case that we talked about yesterday this means that we will need 20 tapes (at $30-$45 for each tape) and $25K for a single tape drive unit. Most tape drive systems support 8 tapes per magazine so we are talking about something that will support three magazines. To support three magazines, we need a second shelf in our tape storage so the price goes up by about $20K. We are sitting at about $55K to backup our 120 TB and $5.5K in support annually for the hardware. We also need about $1K in tape for the number of full and incremental backups that we want which would be $20K for four months of retention before we recycle the tapes. These tapes are good for a dozen re-writes so every three years we will need to repurchase tapes. If we spread the cost of the tape unit, tape drives, and tapes across three years we are looking at $2K/month to backup our 120 TB. We also need to factor in $60/week for tape pickup and storage fees at a service like Iron Mountain and a couple of $250 charges to retrieve tapes in the event of a catastrophic failure to drive tapes back to our data center from cold storage. This bumps the cost to $2.2K/month which is significantly cheaper than the $10K/month for network storage in our data center or $3.6K/month for cloud storage services. Unfortunately, a tape unit requires someone to care and feed it and you will pay that person more than $600/month but not $7.8K/month which you would with the cloud or disk solutions.

If you had a ton of data to archive you could purchase a tape silo that supported hundreds or thousands of magazines. Unfortunately, this expandability cones at a cost. The tape backup unit grew from an eighth of a rack to twenty full racks. There isn't much in between. You can get an eighth of a rack solution, a full rack solution, or a twenty full rack solution. The larger solution comes in at hundreds of thousands of dollars rather than tens of thousands.

Enter cloud solutions. Amazon and Oracle offer tape solutions in the cloud. Both companies offer the twenty full rack solution but only charge a per tape charge to consumers. Amazon Glacier charges $7/TB/month to store data. Oracle charges $1/TB/month for the same service. Both companies charge for data restoration and outbound transfer of data. The Amazon Glacier cost of writing 120 TB and reading back 10% of it comes in at $2218/month. This is the same cost as having the tape unit on site. The key difference is that we can recover the data by requesting it from Amazon and get it back in less than four hours. There is no emergency recovery charges. There is not the weekly pickup charges. We can expand the amount that we backup and the bulk of this cost is reading back the data ($1300). Storage is relatively cheap for our backups, we just need to plan on the cost of recovery and try to limit this since it is the bulk of the cost.

We can drop this cost even more using the Oracle Archive Cloud Services. The price from Oracle is $1/TB/month but the recovery and transmission charges are about the same. The same archive service with Oracle is $1560/month with roughly $1300 being the charges for restoring and outbound transfer of the data. Unfortunately, Oracle does not offer an un-metered archive service so we have to guestimate how much we are going to restore on a monthly basis.

Both services use REST apis to write, restore, and read data. When a container (Oracle Archive) or bucket (Amazon Glacier) is created, a PUT call is done to the endpoint of the service. The first step required by both are authentication to provide credentials into the service. Below we show the Oracle authentication and creation process through the REST api.

The important part of this is the archive header extension. This differentiates if the container is spinning disk or if it is tape in the cloud.

Amazon recommends using a windows based tool like s3browser, CloudBerry, or using a language like Java, .NET, or Ruby and their published SDKs. CloudBerry works for the Oracle Archive as well. When you create a container you have the option of pulling down storage or archive as the container type.

Both services allow you to encrypt and compress the data as it is written with HTML Headers changing the characteristics and parameters of the container. Both services require you to issue a PUT request to write the data to tape. Below we show the Oracle REST api.

For CloudBerry and the other gui based tools, uploading is just a drag and drop from your local file system to the tape storage in the cloud.

Amazon details the readback procedure and job system that shows the status of the restore request. Oracle has a similarly defined retrieval policy as well as an archive tutorial. Both services offer a 4 hour window to allow for restoration. Below is an example of a restore request and checking on the job status of the job spawned to load the tape and transfer the data for reading. The file is ready to read when the completedPercentage is 100.

We can do the same thing with the S3 browser and Amazon Glacier. We need to request the restore, check the job status, then download the restored files. The files change color when they are ready to read.

In summary, we have looked at how to reduce cost of archives and backups. We looked at using a secondary disk at our data center or another data center. We looked at using on site tape units. We looked at disk in the cloud. Today we looked at tape in the cloud. It is important to remember that not one of these solutions is the answer. A combination of any or all of them are needed. Daily and weekly backups should happen to a secondary disk locally. This data is most likely to be restored on a regular basis. Once you get a full backup or two under your belt, move the data to another site. It might be spinning disk, it might be tape but something needs to be offsite in the event of a true catastrophic failure like a communication link going out (think Dell PowerVault and a thunderstorm) and you loose your primary lun and secondary lun that contains your backups. The whole idea of offsite backups are not for restore but primary for insurance and regulation compliance. If someone needs to see the old data, it is there. You are betting that you won't need to read it back and the cloud vendors are counting on that. If you do read it back on a regular basis you might want to significantly increase your budget, pass the charges onto the people who want to read data back, or look for another solution. Tape storage in the cloud is a great way of archiving data for a long time at a low cost.

metered vs un-metered vs dedicated services

Wed, 2016-04-27 02:07
One of the newest concepts that has been introduced for cloud services is the concept of un-metered or dedicated services. Before we dive into this subject, let's review what a cloud service really is. When you boil it all down, you are basically leasing computer resources on a computer that you don't own. You are taking a slice of a compute engine, slice of a disk drive, part of a network connection. You are renting space. Think of it as living in an apartment. Yes, this is a silly analogy but if you think about it, it makes sense. You can rent an efficiency, one, two, or three bedroom apartment. You can get parking with or without a roof over your car. You can get a storage closet or a garage to store stuff that you are not using but want to keep around. There are benefits to apartments. You don't have to cut the grass. You typically have access to a pool but don't need to maintain it. If the toilet backs up or the gas stops working you call the super and they come fix it. You still have to replace your own light bulbs that burn out. You still need to clean your own bathroom and kitchen and take out your trash on a regular basis. On the grander scale, you don't need to drop 10% down and get a mortgage to live there. Monthly rents are typically cheaper than paying down a mortgage. Your taxes are bundled into your rent cost. You basically show up, use the apartment and go on with your life. On the flip side, there are drawbacks to apartments. If your upstairs neighbor likes to play heavy metal at 2am or throw wild parties on the weekends it does make it hard to sleep. Someone might park in your parking spot so you need to park farther away from your front door. You can't pull into your garage to unload your groceries and have to potentially carry them in the rain across the parking lot and up the stairs to your third floor apartment. The super might decide that Tuesday they are going to repaint all bathrooms another color and you need to be out of the way for a day and put up with the smell even though you planned a dinner party the next night. It is difficult to grill on your balcony and you can't really sit out without sharing the space with all of your neighbors. The true downfall is that twenty years from now, you will still be renting your apartment (and the rent probably went up every other year) while your college buddies are celebrating a mortgage burning party and the only thing that they owe on a monthly basis is the taxes that the government takes annually.

Yes, our analogy is silly. Yes, our analogy is relevant. It is easy to decide that you want another job in another city so you hire a mover, pack up all your stuff, and move to another apartment. This is where our analogy breaks down. Cloud vendors charge you for every piece of furniture that you take out of the building. They charge you to use the stairs or elevator. They charge you every time a moving van exits the building full of furniture and boxes of clothes. It is free to bring stuff in because it locks you into the apartment. Just don't try to take anything out. Remember that storage closet or garage that you got with your apartment, you can open the door and put stuff in for free but if you carry anything out (even if you just relocate it to your apartment) you get charged per item that you carry across the threshold.

If you look at storage from any cloud vendor they offer a metered storage service. The same is true for compute services. You can lease a virtual processor and memory and grind on data all that you want. The catch is when you want to transfer your files or report results of you analysis to your desktop computer, you get charged on a per gigabyte transferred across the internet. Cost calculators help you calculate these costs but they are a little hard to estimate and use to calculate outbound data charges. Amazon, for example, has a calculators that you can use. The AWS pricing calculator allows you to look at the cost of all cloud services.

Let's walk through the cost of Amazon Glacier. The price list says that you should pay $0.007/GB/month or $7/TB/month to keep things in cold storage. We will use 120 TB as our basis for analysis. We put this as the amount to store and see the cost of storing the data is $860/month.

If we plan on reading back 10% of this data during the month the price goes up to $2217. The bulk of these charges are the outbound charges. The cost goes to $921 if we read the data to an EC2 instance and not all the way back to our data center or desktop computer. To use our apartment analogy, you are paying $860 to get a storage garage. You pay $61 every time you take something out and move it to your apartment. You can put all you want into the storage area (as long as your don't exceed the space of the storage unit) but taking something out will cost you. If you put your recently retrieved item in your car or a truck and drive it out the gate you get a surcharge of $1300. It is important to remember that pulling more stuff out of your storage will cost you more. Putting this in terms of computer archive, you can store all of your emails, contracts, customer transactions, patient records in long term storage. If your on-site storage fails for some reason or if you get a legal request to review five years of data, you can pull the data back from cold storage. It will cost you to pull the data back but it is still cheaper than keeping the seven years of longer data on spinning disk in your data center (estimate $3K/TB plus 10% per year for spinning disk in your data center).

We can do the same calculation for cloud storage using S3. We can store 120 TB for roughly $3950/month. If we want to read back 10%, or 12 TB, of that data, it will cost us $5150 or $1200 additional.

We can reduce the cost by using lower speed storage in the cloud. We put the S3 data into the infrequent category to save money. This drops the cost to just over $3K which does save us about $2100/month. We agree to pay a lower cost to get higher latency and longer retrieval times. It is better than using tape in the cloud and we can save some money with this option.

We can opt for reduced redundancy storage (aka non-mirrored and non-replicated data) but we risk data loss since we will only have a single copy in the cloud. This drops the cost to $4300 with the data retrieval but we have to weigh the cost vs data loss risks.

Let's not pick on Amazon. How does this compare to Azure? Unfortunately, we can't start with Microsoft tape in the cloud, they don't offer the service. We must start with block storage in the Azure cloud. Microsoft has an Azure pricing calculator that you can use to perform the same calculations. The calculator and pricing is a little difficult to use when you first get into it. You basically need to put together the calculation a piece at a time. You need to factor in the cost of the storage and the cost of transferring the data from the cloud to your data center. This is done in two different pieces. An example of what we are looking for can be seen below.

We need to piece together the calculator. First we add the storage component then the bandwidth component. There is a transaction component but this amount is trivial and we are going to ignore it for simplicity.

If we look at the options for Azure storage, we can basically select blob storage in different zones. In the grand scheme of things, the cost is not siginificantly higher one way or another. The basic cost is about the same.

The third class of storage that are going to look at is the Oracle Storage Cloud Service. We can look at Oracle Storage Cloud Service as well as Oracle Archive Cloud Service. The Archive service compares directly to the Amazon Glacier service except that it is $1/TB/month and suffers the same transfer charges for outbound data. The Oracle Storage Cloud Service is similar to the Amazon S3 and Azure Blob Storage Service but it is offered either as a metered service (as is S3 and Blobs) and un-metered services. Unfortunately, Oracle does not provide a cost calculator for general use. The Value Added Distributors are given a copy of the calculator but it is not generally available. The key difference with the Oracle storage services is that there are two significant flavor differences; metered and non-metered. The metered services are charged just like the Microsoft and Amazon services. You pay for what you use on a per GB basis and pay for outbound data transfer. An example of the pricing calculator is shown below. Note that we do need to have a good guestimate on how much data we will transfer outbound across the internet. These charges are not consumed if you are reading the data to a compute engine in the cloud unlike S3 which still consumes cost just for reading the data off the disk.

The most significant differential in storage offerings is the non-metered storage. Oracle offers storage in blocks that you reserve and allocate for 12 months. This is different from the metered storage in that metered can start with 10 TB and grow to 120 TB over the year. With the non-metered storage, you start with 120 TB and end with 120 TB. You can extend your contract and grow storage but you basically extend a new contract for more storage. You can not shrink your storage and pay for less. The benefit of this is that you don't have to pay for outbound data transfer. You can read and write as much as you want and not get charged for transferring the data across the internet. A pricing calculator for this is simple. How much do you need and how long do you need it?

If we piece all of this together and look a price comparison between the three service providers, the answer of which is cheapest comes down to it depends. Oracle non-metered storage has a significant advantage if you are planning on reading back your data at high or unpredictable rates. Amazon S3 infrequent is the cheapest if you don't plan on reading back your data and want it as an insurance policy only. I honestly would go with Glacier or Oracle Archive if this is the case since it is an order of magnitude cheaper. The chart below compares 120 TB of storage and the variable charge for reading back this data on a monthly basis. If you have 120 TB of storage and plan on reading back 120 TB on a monthly basis, the Oracle non-metered storage is significantly cheaper. If you are only planning on reading back 12 to 24 TB per month the cost is about the same for all of the services.

In summary, one option is not clearly better than the other (except for high read rates) and this blog is intended to help you decide on what fits your needs best. Pricing calculators can help with the cost based on transfer rates. It is important to remember that storage transfer is a significant part of the calculation. It is also important to look at your usage model. We assumed that you started with 120 TB and ended with 120 TB for our analysis. If you start with 12 TB and grow to 120 TB, the pricing calculation will be a little different. Neither the Amazon nor Azure calculators will help you run this simulation and you will have to calculate everything on a month by month basis. It is also interesting to take 120 TB of on-premise storage and assume that each TB can be purchased at $3K/TB. If we assume 10% annual hardware maintenance and a three year amortization, the charge for on-premise storage is $1030/month which might be more or less than cloud based storage. Your results might vary.

making backup better

Tue, 2016-04-26 02:07
Yesterday we looked at backing up our production databases to cloud storage. One of the main motivations behind doing this was cost. We were able to reduce the cost of storage from $3K/TB capex plus $300/TB/year opex to $400/TB/year opex. This is a great solution but some customers complain that it is not generic enough and latency to cloud storage is not that great. Today we are going to address both of these issues with the cloud storage appliance. First, let's address both of the typical customer complaints.

The database backup cloud service is just that. It backs up a database. It does it really well and it does it efficiently. You replace one of the backup library modules that translates writes of backup data to the cloud REST api rather than a tape driver. The software works well with commercial products like Symantec or Legato and integrates well into that solution. Unfortunately, the critics are right. The database backup cloud service does that and only that. It backs up Oracle databases. It does not backup MySQL, SQL Server, DB2, or other databases. It is a single use tool. A very useful single use tool but a single use tool. We need to figure out how to make it more generic and backup more than just databases. It would be nice if we could have it backup home directories, email servers, virtual machines, and other stuff that is used in the data center.

The second complaint is latency. If we are writing to an SSD or spinning disk attached to a server via high speed SCSI, iSCSI, or SAS, we should expect 10ms access time or less. If we are writing to a server half way across the country we might experience 80ms latency. This means that a typical read or write takes eight times longer when we read and write cloud storage. For some applications this is not an issue. For others this latency makes the system unusable. We need to figure out how to read adn write at 10ms latency but leverage the expandability of the cloud storage and lower cost.

Enter stage left the Oracle Cloud Storage Appliance. The appliance is a software component that listens on the data center internet using the NFS protocol and talks to the cloud services using the storage REST api. Local disks are used as a cache front end to store data that is written to and read from the network shares exposed by the appliance. These directories map directly to containers in the Oracle Storage Cloud Service and can be clear text or encrypted when stored. Data written from network servers is accepted and released quickly as it is written to local disk and slow tricked to the cloud storage. As the cache fills up, data is aged and migrated from the cache storage into cloud storage. The metadata representing the directory structure and storage location is updated to show that the data is no longer stored locally but stored in the cloud. If a read occurs from the file system, the meta data helps the appliance locate where the data is stored and it is presented to the network client from the cache or pulled from the cloud storage and temporarily stored in the local cache as long as there is space. A block diagram of this architecture is shown below

The concept of how to use this device is simple. We create a container in our cloud storage and we attach to it with the cloud storage appliance. This attachment is exposed via an nfs mount to clients on our corporate network and anyone on the client can read or write files in the cloud storage. Operations happen at local disk speed using the network security of the local network and group/owner rights in the directory structure. It looks, smells, and feels just like nfs storage that we would spend thousands of dollars per TB to own and operate.

For the rest of this blog we are going to go through the installation steps on how to configure the appliance. The minimum requirements for the appliance are

  • Linux 7 (3.10 kernel or later)
  • Docker 1.6.1 or later
  • two dual core x86 CPUs
  • 4 GB of RAM
We will be installing our configuration on a Windows desktop running VirtualBox. We will not go through the installation of Oracle Enterprise Linux 7 because we covered this a long time ago. We do need to configure the OS to have 4 GB of RAM and at least 2 virtual cores as shown in the images below.

We also need to configure a network. We configure two networks. One is for the local desktop console and the other is for the public internet. We could configure a third interface to represent our storage network but for simplicity we only configure two.

We can boot our Linux 7 system and will need to select the 3.10 kernel. By default it will want to boot to the 3.8 kernel which will cause problems in how the appliance works.

What we would like to do is remove the 3.8 kernel from our installation. This is done by removing the packages with the rpm -e command. We then update the grub.cfg file to list only the 3.10 kernels.

Once we have removed the kernels, we update the grub loader and enable additional options for the yum update.

The next step that we need to take is to install docker. This is done with the yum install command.

Once we have the docker package installed, we need to make sure that we have the nfs-client and nfs-server installed and started.

It is important to note that the tar bundle is not generally available. It does require product manager approval to get a copy of the software for installation. The file that I got was labeled oscsa-1.0.5.tar.gz. I had to unzup and untar this file after loading it on my Linux VirtualBox instance. I did not do a screen capture of the download but did go through the installation process.

We start the service with the oscsa command. When we start it it brings up a management web page so that we can make the connection to the cloud storage service. To see this page we need to start firefox and connect to the page.

One of the things that we need to know is the end point of our storage. We can find this by looking at the management console for our cloud services. If we click on the storage cloud service details link we can find it.

Once we have the end point we will need to enter this into the management console of the appliance as well as the cloud credentials.

We can add encryption and a container name for our network share and start reading and writing.

We can verify that everything is working from our desktop by mounting the nfs share or by using cloudberry to examine our cloud storage containers. In this example we use cloudberry just like we did when we looked at the generic Oracle Storage Cloud Services.

We can examine the properties of the container and network share from the management console. We can look at activity and resources available for the caching.

In summary, we looked at a solution to two problems offered by our database backup solution. The first was single purpose and the second was latency. By providing a network share to the data center we can not only backup or Oracle database but all of the databases by having the backup software write to the network share. We can backup other files like virtual machines, email boxes, and home directories. Disk latency operates at the speed of the local disk rather than the speed of the cloud storage. This software does not cost anything additional and can be installed on any virtual platform that supports Linux 7 with kernel 3.10 or greater. When we compare this to the Amazon Storage Gateway which requires 2x the processing power and $125/month to operate it looks significantly better. We did not compare it to the Azure solution because it is an iSCSI hardware solution and not easy to get a copy of for testing.

backing up a database to the cloud

Mon, 2016-04-25 02:07
Up to this point we have talked about the building blocks of the cloud. Today we are going to look into the real economics of using some of the cloud services that we have been examining. We have looked at moving compute and storage to the cloud. Let's look at some of the reasons why someone would look at storage in the cloud.

Storage is one of those funny things that everyone asks for. Think of uses for storage. You save emails that come in every day. If you host your email system in your corporation, you have to consider how many emails someone can keep. You have to consider how long you keep files associated with email. At Oracle we have just over 100,000 employees and limit everyone to 2GB for email. This means that we need 200 TB to store email. If we increase this to 20 GB this grows to 2 PB. At $3K/TB we are looking at $600K capex to handle email messages. If we grow this to 2 PB we are looking at $6M for storage. This is getting into real money. Associated with this storage is a 10% support cost ($60K opex annually) as well as part of a full time employee to replace defective disks, tune and feed the storage system, allocate disks and partitions not only to our storage but other projects at a cost of $80K payroll annually. If we use a 4 year depreciation, our email boxes will cost us ($150K capex + $60K opex + $80K opex) $290K per year or $29/user just to store the email. If we expand the email limits to 20 GB we grow almost everything by a factor of 10 as well so the email boxes cost us $220/user annually (we don't need 10x the storage admins). Pile on top of this home directories that people want to save attachments into and this number explodes. We typically do want to give everyone 20 GB for a home directory since this stores documents associated with operation of the company. We typically want people storing these documents on a network share and not on a disk on their laptop. Storing data on their laptop opens up security and data protection discussions as well as access to data if the laptop fails. Putting it on a shared home directory allows the company to backup the files as well as define protection mechanisms for sensitive data. We have basically justified $250/user for email and home directories by allocating 22 GB to each employee.

The biggest problem is not user data, it is corporate data. Databases typically consume upwards of 400 GB to 40 TB. There is a database for human resources, payroll, customer service, purchase orders, general ledger, inventory, transportation management, manufacturing.... the list goes on. Backing up this data as it changes becomes an issue. Fortunately, programs like E-Business Suite, PeopleSoft, and JD Edwards aggregate all of these business functions into a small number of database instances so we don't have tens or hundreds of databases to backup. Some companies do roll out multiple databases for each project to collect and store data but these are typically done with low cost, low function databases like MySQL, Postgress, MongoDB, and SQL Server. Corporate data that large numbers of people use are typically an Oracle database, DB2, or SQL Server. Backing up this data is critical to a corporation. Database backups are typically done nightly to make sure that you can recover disk or server failures. Fortunately, you don't need to backup all 400 GB every night but can do incremental backups and copy only the data blocks that have changed from the previous night. Companies typically reserve late at night on the weekends to do a full backup because users are typically not working and few if any people are hitting the database at 2am on Sunday morning. The database can be taken offline for a couple of hours to backup 400 GB or a live backup can be taken with little risk since few if anyone is on the system at this time. If you have a typical computer with SCSI or SAS disks you can reasonably get 2G/second throughput so reading 400 GB will take 200 seconds. Unfortunately, writing is typically about half that speed so the backup should reasonably take 400 seconds which is 7 minutes. If your database is 4 TB then you increase this by a factor of 10 so it takes just over an hour to backup everything. Typically you want to copy this data to another data center or to tape and your write speeds get doubled again. The 7 minutes becomes 15 minutes. The hour becomes two hours.

When we talk about backing up database data, there are two schools of thought. The database data is contained in a table extent file. You can backup your data by replicating your file system or using database tools to backup your data. Years ago files were kept on raw disk partitions. Few people do this anymore and table extent files are kept on file systems. Replicating data from a raw partition is difficult so most people used tools like RMAN to backup database files on raw partitions. Database vendors have figured out how to optimize reads and writes to disk despite the file system structures that operating system vendors created. File system vendors have figured out how to optimize backup and recovery to avoid disk failures. Terms like mirroring, triple mirroring, RAID, and logical volume management come up when you talk about protecting data in a file system. Other terms like snap mirror and off-site cloning sneak into the conversation as well. Earlier when we talked about $3K/TB we are really talking about $1K/TB but we triple mirror the disks thus triple the cost of usable storage. This makes sense when we go down to Best Buy or Fry's and look at a 1 TB USB disk for $100. We could purchase this but the 2G/second transfer rate suddenly drops to 200K/second. We need to pay more for a higher speed communication bridge to the disk drive. We could drop the cost of storage to $100/TB but the 7 minute backup and recovery time suddenly grows to 70 minutes. This becomes a cost vs recovery time discussion which is important to have. At home, recovering your family photos from a dead desktop computer can take hours. For a medical practice, waiting hours to recover patient records impacts how the doctors engage with patients. Waiting hours on a ticket sales or stock trading web site becomes millions of dollars lost as people go to your competitors to transact business.

Vendors like EMC and NetApp talk about cloning or snap mirrors of disks to another data center. This technology works for things like email and home directories but does not work well for databases. Database files are written to multiple files at times. If you partition your data, the database might be moving data from one file to another as data ages. We might have high speed SSD disks for current data and low speed, low latency disks for data greater than 30 days old. If we start a clone of our SSD disks during a data move, the recent data will get copied to our mirror at another site. The database might finish re-partitioning the data and the disk management software starts backing up the lower speed disks. We suddenly get into a data consistency problem. The disk management software and the database software don't talk to each other and tell each other that they are moving data between file systems. Data that was in the high speed SSD disks is now out of sequence with the low speed disks at our backup site. If we have a disk failure on our primary site, restoring data from or secondary site will cause database failure. The only way to solve this problem is to schedule disk clones while the database is shut down. Unfortunately, many IT departments select a disk cloning solution since it is the best solution for mirroring home directories, email servers, and virtualization servers. Database servers have a slightly different backup requirement and require a different way of doing things.

The recommended way of backing up a database is to use archive tools like RMAN or commercially available products like ComVault or Legato. The commercial products provide a common backup process that knows how virtualization servers and databases like to be backed up. It allows you to backup SQL Server and an Oracle database with the same user interface and process. Behind the scenes these tools talk to RMAN and the SQL Server backup utilities but presents a uniform user interface to schedule and manage backups and restores.

Enough rambling about disks and backups. Let's start talking about how to use cloud storage for our disk replication. Today we are going to talk about database backup to the cloud. The idea behind our use of the cloud is pure economics. We would like to reduce our cost of storage from $3K/TB to $400/TB/year and get rid of the capex cost. The true problem with purchasing storage for our data center is that we don't want to purchase 10 TB a month because that is what we are consuming for backups. What we are forced to do is look 36 months ahead and purchase 400 TB of disk to handle the monthly data consumption or start deleting data after a period. For things like Census data and medical records the retention period is decades and not months. For some applications, we can delete data after 12 months. If we are copying incremental database backups, we can delete the incrementals once we do a full backup. In our 10 TB a month example we will have to purchase $1.2M in storage today knowing that we will only consume 10 TB this month and 10 TB next month. Using the cloud storage we can pay $300 this month, $600 next month, and grow this amount at $300/month until we get to the 400 TB that we will consume in 36 months. If we guestimated low we will need to purchase more storage again in two years. If we guestimated high we will have overpurchased storage and spend more than $3K/TB for what we are using. Using cloud storage allows us to consume storage at $400/TB/year. If we guess wrong and have metered storage there is no penalty. If we are using non-metered storage, we might purchase a little too much but only have to look forward 12 months rather than 36. It typically is easier to guess a year ahead rather than three years.

Just to clarify, we are not talking about moving all of our backup data to the cloud all at once. What we are talking about is doing daily incremental backups to your high speed disk attached to your database. After a few days we do a full backup to lower cost network storage. After a few weeks we copy these backups to the cloud. In the diagram below we do high speed backups for five days, backup to low speed disks for 21 days, backup to the cloud for periods beyond that. We show moving data to tape in the cloud beyond 180 days. The cost benefit is to take data that we probably won't read to a lower cost storage. Using $400/TB/year gives us a $2600/TB cost savings in capex. Using $12/TB/year tape gives us an additional $388/TB cost savings in opex.

The way that we get this storage tiering is to modify the RMAN backup libraries and move the tape interface from an on-site tape unit to disk or tape in the cloud. The library module can be download from the oracle tech network. More information on this service can be found at Oracle documentation or Backup whitepaper. You can also watch videos that describe this service

The economics behind this can be seen in a TCO analysis that we did. In this example we look at moving 30 TB of backup from on-premise disk to cloud backup. The resulting 4 year savings is $120K. This does not take into account tangential savings but only looks at physical cost savings of not purchasing 30 TB of disk.

Let's walk through what is needed to make this work. First we have to download the library module that takes RMAN read and write commands and translates them into REST api commands. This library exists for Oracle Storage Cloud Services as well as Amazon S3. The key benefits to the Oracle Storage Cloud is that you get encryption and parallelism for free as part of the service. With Amazon S3 you need to pay for additional parallel channels at $1500/channel as well as encryption in the database at $10K/processor license. The Oracle Storage Cloud provides this as part of the $33/TB/month database backup bundle.

Once we download the module, we need to install it with a java command. Note that this is where we tie the oracle home and SID to the cloud credentials. The data is stored in a database wallet as well as the encryption keys used to encrypt the backups.

Now that we have replaced the tape interface with cloud storage, we need to define a tape interface for RMAN and link the library into the process. When we read and write to tape we are actually reading and writing to cloud storage.

Once we have everything configured, we use RMAN or ComVault or Legato as we have for years. Accessing the tape unit is really accessing the cloud storage.

In summary, this is our first use case of the cloud. We are offsetting the cost of on-premise storage and reduce the cost of our database backups. A good rule of thumb is that we can drop the cost of backups from $3K/TB plus $300/TB/year to $400/TB/year. Once we have everything downloaded an installed, nothing looks or feels different from what we have been doing for years. When you start looking at purchasing more disks because you are running out of space, look at moving your backups from local disk to the cloud.

So how about other cloud providers

Fri, 2016-04-22 02:07
If you are looking for a cloud hosting provider, the number one question that comes up is which one to use. There are a ton of cloud providers. How do you decide which one is best for you? To be honest, the answer is it depends. It depends on what your problem is and what problem you are trying to solve. Are you trying to solve how you communicate with customers? If so do you purchase something like SalesForce or Oracle Sales Cloud, you get a cloud based sales automation tool. Doing a search on the web yields a ton of references. Unfortunately, you need to know what you are searching for. Are you trying to automate your project management (Oracle Primavera or Microsoft Project)? Every PC magazine and trade publication have opinions on this. Companies like Gartner and Forrester write reviews. Oracle typically does not rate well with any of these vendors for a variety of reasons.

My recommendation is to look at the problem that you are trying to solve. Are you trying to lower your cost of on-site storage? Look at generic cloud storage. Are you trying to reduce your data center costs and go with a disaster recovery site in the cloud? Look at infrastructure in the cloud and compute in the cloud. I had a chance to play with VMWare VCloud this week and it has interesting features. Unfortunately, it is a really bad generic cloud storage company. You can't allocate 100 TB of storage and access it remotely without going through a compute engine and paying for a processor, operating system, and OS administrator. It is really good if I have VMWare and want to replicate the instances into the cloud or use VMotion to move things to the cloud. Unfortunately, this solution does not work well if I have a Solaris of AIX server running in my data center and want to replicate into the cloud.

The discussion on replication opens a bigger can of worms. How do you do replication? Do you take database and java files and snap mirror them to the cloud or replicate them as is done inside a data center today? Do you DataGuard the database to a cloud provider and pay on a monthly basis for the database license rather than owning the database? Do you setup a listener to switch between your on-site database and cloud database as a high availability failover? Do you setup a load balancer in front of a web server or Java app server to do the same thing? Do you replicate the visualization files from your VMWare/HyperV/OracleVM/Zen engine to a cloud provider that supports that format? Do you use a GoldenGate or SOA server to physically replicate objects between your on-site and cloud implementation? Do you use something like the Oracle Integration server to synchronize data between cloud providers and your on-premise ERP system?

Once you decide on what level to do replication/fail over/high availability you need to begin the evaluation of which cloud provider is best for you. Does your cloud provider have a wide range of services that fits the majority of your needs or do you need to get some solutions from one vendor and some from another. Are you ok standardizing on a foundation of a virtualization engine and letting everyone pick and choose their operating system and application of choice? Do you want to standardize at the operating system layer and not care about the way things are virtualized? When you purchase something like SalesForce CRM, do you even know what database or operating system they use or what virtualization engine supports it? Do or should you care? Where do you create your standards and what is most important to you? If you are a health care provider do you really care what operating system that your medical records systems uses or are you more interested in how to import/export ultrasound images into your patients records. Does it really matter which VM or OS is used?

The final test that you should look at is options. Does your cloud vendor have ways of easily getting data to them and easily getting data out. Both Oracle and Amazon offer tape storage services. Both offer disks that you can ship from your data center to their cloud data centers to load data. Which one offers to ship tapes to you when you want to get them back? Can you only backup from a database in the cloud to storage in the cloud? What does it cost to get your data back once you give it to a cloud provider? What is the outbound charge rate and did you budget enough to even terminate the service without walking away from your data? Do they provide an un-limited read and write service so that you don't get charged for outbound data transfer.

Picking a choosing a cloud vendor is not easy. It is almost as difficult as buying a house, a car, or a phone carrier. You will never know if you made the right choice until you get penalized for making the wrong choice. Tread carefully and ask the right questions as you start your research.

Storage on Azure

Thu, 2016-04-21 02:07
Yesterday we were out on a limb. Today we are going to be skating on thin ice. Not only do I know less about Azure than AWS but Microsoft has significantly different thoughts and solutions on storage than the other two cloud vendors. First, let's look at the available literature on Azure storage

There are four types of storage available with Azure storage services; blob storage, table storage, queue storage, and file storage. Blob storage is similar to the Oracle Block Storage or Amazon S3 storage. It provides blocks of pages that can be used for documents, large log files, backups, databases, videos, and so on. Blobs are objects placed inside of containers that have characteristics and access controls. Table storage offers the ability to store key/attribute entries in a semi-structured dataset similar to a NoSQL database. The queue storage provides a messaging system so that you can buffer and sequence events between applications. The third and final is file based storage similar to dropbox or google docs. You can read and write files and file shares and access them through SMB file mounts on Windows systems.

Azure storage does give you the option of deciding upon your reliability model by selecting the replication model. The options are locally triple redundant storage, replication between two data centers, replication between different geographical locations, or read access geo-redundant storage.

Since blob storage is probably more relevant for what we are looking for, let's dive a little deeper into this type of storage. Blobs can be allocated either as block blobs or page blobs. Block blobs are aggregation of blocks that can be allocated in different sizes. Page blobs are of smaller fixed size chunks of 512 bytes for each page blob. Page blogs are the foundation of virtual machines and are used by default to support operating systems running in a virtual machine. Blobs are allocated into containers and inherit the characteristics of the container. Blobs are accessed via REST apis. The address of a blob is formatted as http://(account-name).blob.core.windows.net/(container-name)/(blob-name). Note that the account name is defined by the user. It is important to note that the account-name is not unique to your account. This is something that you create and Microsoft adds it to their DNS so that your ip address on the internet can be found. You can't choose simple names like test, testing, my, or other common terms because they have been allocated by someone else.

To begin the process we need to log into the Azure portal and browser for the Storage create options.

Once we find the storage management page we have to click the plus button to add a new storage resource.

It is important to create a unique name. This name will be used as an extension of the REST api and goes in front of the server address. This name must be unique so picking something like the word "test" will fail since someone else has already selected it.

In our example, we select wwpf which is an abbreviation for a non-profit that I work with, who we play for. We next need to select the replication policy to make sure that the data is highly available.

Once we are happy with the name, replication policy, resource group, and payment method, we can click Create. It takes a while so we see a deploying message at the top of the screen.

When we are finished we should see a list of storage containers that we have created. We can dive into the containers and see what services each contains.

Note that we have the option of blob, table, queue, and files at this point. We will dive into the blob part of this to create raw blocks that can be used for backups, holding images, and generic file storage. Clicking on the blob services allows us to create a blob container.

Note that the format of the container name is critical. You can't use special characters or capital letters. Make sure that you follow the naming convention for container names.

We are going to select a blob type container so that we have access to raw blocks.

When the container is created we can see the REST api point for the newly created storage.

We can examine the container properties by clicking on the properties button and looking at when it was created, lease information, file count, and other things related to container access rights.

The easiest way to access this newly created storage is to do the same thing that we did with Oracle Storage. We are going to use the CloudBerry Explorer. In this gui tool we will need to create an attachment to the account. Note that the tool used for Azure is different from the Oracle and Amazon tools. Each cost a little money and they are not the same tool unfortunately. They also only work on a Windows desktop which is challenging if you use a Mac of Linux desktop.

To figure out your access rights, go to the storage management interface and click on the key at the top right. This should open up a properties screen showing you the account and shared access key.

From here we can access the Azure blob storage and drag and drop files. We first add the account information then navigate to the blob container and can read and write objects.

In this example, we are looking at virtual images located on our desktop "E:\" drive and can drag and drop them into a blob container for use by an Azure compute engine.

In summary, Azure storage is very similar to Amazon S3 and Oracle Storage Cloud Services. The cost is similar. The way we access it is similar. The way we protect and restrict access to it is similar. We can address it through a REST api (which we did not detail) and can access it from our desktop or compute server running in Azure. Overall, storage in the cloud is storage in the cloud. You need to examine your use cases and see which storage type works best for you. Microsoft does have an on-premise gateway product called Azure SimpleStor which is similar to the Amazon Storage Gateway or the Oracle Cloud Storage Appliance. It is more of a hardware solution that attaches via iSCSI to existing servers.

Storage in Amazon S3

Wed, 2016-04-20 02:07
To be honest, I am going out on a limb here. I know just enough about Amazon S3 to be dangerous. Most of the reference material that I used was from the amazon web site or Safari Books. The books that I relied upon the most are The key use cases according to the S3 Essentials book are file hosting, storing data on mobile based applications, static web hosting, video hosting, and data backup. We will look at a couple of these configurations and how to deploy and use them.

With Oracle Storage Cloud Services we had the concept of a container. This container had characteristics like spinning disk, archive, ownership, and other features related to ownership and security. Amazon S3 has a similar concept but they call this container a bucket. A bucket can contain nested folders and has properties associated with it. If we look at the AWS Console, we see six types of storage and content delivery

  • S3 - block storage in the cloud
  • Cloud Front - content delivery network
  • Elastic File System - fully managed file system for EC2
  • Glacier - tape storage in the cloud
  • Snowball - large scale data transport to get data into and out of S3
  • Storage Gateway - an appliance to reduce latency for S3 storage
We will be focusing on S3 and Glacier. Snowball is a mechanism to transport large amounts of data to and from the Amazon data center. The Storage Gateway is an appliance in a data center to reduce latency and provide a quicker access to data stored in S3. We will need to dive a little deeper into S3 and the Storage Gateway but not the Cloud Front, and the Elastic File System in this blog.

We first start with the AWS console and click on the S3 console. We can create a new bucket by clicking on Create Bucket.

When we create a new S3 bucket, we can name it and define which data center we want the storage to be allocated into. We have to be careful when we create a bucket name. The namespace is shared with all users. If we want to create a common name, it will probably be used by someone else and we will see an error in creating the name.

If we look at the properties associated with this storage we can see that we have a variety of options to configure.

  • Permissions
  • Static Web Hosting
  • Logging
  • Events
  • Versioning
  • Lifecycle
  • Cross-Region Replication
  • Tags
  • Requester Pays

Let's go through each of these individually. With Permissions, you have the ability to control who can see, modify, delete, and download the contents of the bucket. Bucket policies can get relatively complex and have a variety of conditions and restrictions applied to it. You can find out more at Detailing Advanced Policies. This feature allows you to restrict who can read content by ip address, access keys, or usernames.

Static web hosting allows you to create a web site based on the files in a container. If you have an index.html, it becomes to the basis for accessing all of the other files in this directory. This is both good and bad because you get the basic functionality of a web server but you don't get the configuration and access logs. It has some uses but is limited in how it works. It does make static web page presentation easy because you no longer need an EC2 instance, operating system, or application to host the web site.

Logging allows you to view how, who, and from where files were accessed. You can generate logs to look at access patterns and access locations.

Versioning allows you to keep past copies of files. If a file is edited and changed, previous versions and deltas are tracked. This is a good feature but does cause storage consumption to grow because you never delete older versions of a file but keep the deltas for a fixed amount of history.

Lifecycle allows you to automatically archive files from spinning disk to tape after a fixed amount of time and history of access. If no one has accessed a file in months, it can be written to Glacier for long term lower cost archive.

Cross-Region Replication allows you to replicate blocks between data centers automatically. This allows for high availability in the event that one data center fails or storage at one location is having significant problems.

Tags and Request Payer allows for charge-back features to allow people who consume resources to pay for the download and storage. The person creating the bucket is not charged for usage but has the mechanism to transfer the charges to the person reading the data.

Reading and writing to our newly created bucket requires a user interface or usage of the Amazon Rest api to transfer files. Amazon does provide a user interface to upload and edit the properties of the files and directories. We recommend using another interface like CloudBerry or other graphical tool or the command line utilities because this interface is a bit limiting.

This blog entry is significantly different from the one yesterday. Yesterday we started with pricing then got technical. Today we dove straight into the technical and ignored pricing. Let's dive into pricing. The cost of S3 storage is $30/TB/month plus outbound charges. I suggest using the S3 price list and the S3 price calculator to figure pricing. Attached are screen shots of pricing for 120 TB of storage using the calculator and screen shots of the price list.

One thing that we talked about with the Oracle Storage Cloud and have not talked about here is an on premise virtual machine to reduce latency. Amazon offers this with the AWS Storage Gateway. The key differences between the two products are 1) AWS Gateway uses iSCSI on the client side to provide storage access to the data center and 2) it cost $125/month/gateway. It solves the same latency problem but does it slightly differently. Unfortunately, we are not going to install and configure this virtual instance and play with it because it requires 8 virtual CPUs which is greater than my laptop will offer.

In summary, this is an initial review of S3 storage with Amazon AWS. We did not dive deep into Glacier or the Storage Gateway. We did not review elastic block services (EBS) because these are typically attached to EC2 instances. It is important to note that the focus of S3 is different than Oracle Storage Cloud Services but very similar. Files and directories can be stored in containers and access can be controlled. S3 extends services to provide things like video streaming, static web site hosting, and migrating data to and from tape in the cloud. You can use S3 for backup archives and generic block storage and access it via REST api or AWS api calls. Products like CloudBerry Explorer and S3 Explorer exist to help translate the human interface to S3 storage calls. The cost for S3 is roughly $30/TB/month with additional charges for outbound data on a per GB basis. Archive storage is roughly $7/TB/month with additional charges for data retrieval and outbound data on a per GB basis. The intent of this blog is not to say that one service is better than the other but provide resources to help you make your own decisions and decide what works best for your situation and corporation.

Storage in the Oracle Cloud

Tue, 2016-04-19 02:07
This week we are going to focus on storage. Storage is a slippery slope and difficult conversation to have. Are we talking about a file synchronization like dropbox.com, google.com/docs, or box.com? Are we talking about raw block storage or long term archive storage? There are many services available from many vendors. We are going to focus on block storage in the cloud that can be used for files if desired or for backups of databases and virtual machines. Some of the cloud vendors have specific focused storage like Azure tables that offer a noSQL type storage or Amazon S3 allowing you to run a website without a web server. Today we will look at the Oracle IaaS Storage set of products. This is different than the Oracle PaaS Documents option which is more of a Google Docs like solution. The IaaS Storage is a block of storage that you pay for either on a metered usage or non-metered usage basis.

Notice from the cloud.oracle.com web page, we click on Infrastructure and follow the Storage route. We see that we get the raw block storage or the archive storage as options. We also have the option of an on-site cache front end that reduces latency and offers an NFS front end to the users providing more of a document management strategy rather than a raw block option.

Before we dive a little deeper into the options and differences between the storage appliance, spinning disk, and spinning tape in the cloud, we need to have a discussion about pricing and usage models. If you click on the Pricing tab at the top of the screen you see the screens below.

Metered pricing consists of three parts. 1) how much storage are you going to start with, 2) how much storage are you going to grow to, and 3) how much are you going to read back? Metering is difficult to guestimate and unfortunately it has a significant cost associated with being wrong. Many long term customers of AWS S3 understand this and have gotten sticker shock when the first bill comes in. The basic cost for outbound transfer is measured on a per GB basis. The more that you read across the internet, the more you pay. You can circumvent this by reading into a compute server in the Oracle cloud and not have to pay the outbound transfer. If, for example, you are backing up video surveillance data and uploading 24 hours of video at the end of they day, you can read the 24 hour bundle into a compute server and extract the 10-15 minutes that you are interested in and pay for the outbound charges on compute for the smaller video file.

Non-Metered pricing consists of one part. How much storage are you going to use over the year. Oracle does not charge for the amount of data transferred in-bound or out-bound with this storage. You can read and write as much as you want and there is no charge for data transfer across the internet. In the previous example you could read the 24 hours of video from the cloud storage, throw away 90% of it from a server in your data center, and not incur any charges for the volume of transfer.

Given that pricing is difficult to calculate, we created our own spreadsheet to estimate pricing as well as part numbers that should be ordered when consuming Oracle cloud resources. The images below show the cost of 120 TB of archive storage, metered block storage, and non-metered block storage.

Note that the data transfer price is non-trivial. Reading the data back from the cloud can get significantly more expensive than the cost of the storage itself. A good rule of thumb is the cost of spinning disk in the cloud should not exceed $30/TB/month or $400/TB/year. If you look at the cost of a NetApp or EMC storage system, you are looking at $3K-$4K/TB purchase price with 10% annual maintenance per year ($300-$400). If you are currently running out of storage and your NFS filer is filling up, you can purchase cloud resources for a few months and see if it works. It won't cost you anything more than paying support and you can grow your cloud storage as needed rather than buying 3 years ahead as you would with a filer in your data center. The key issue with cloud storage is latency and access times. Access to a filer in your data center is typically 10ms where access time to cloud storage is typically 80+ms. All cloud storage vendors have on site appliance solutions that act as cache front ends to address this latency problem. Oracle has one that talks NFS. Amazon has one that talks iSCSI. Microsoft has one that talk SMB. There truly is no single vendor with a generic solution that addresses all problems.

Enough with the business side of storage. Unfortunately, storage is a commodity so the key conversation is economics, reliability, and security. We have already addressed economics. When it comes to reliability the three cloud vendors address data replication and availability in different ways. Oracle triple mirrors the data and provides public-private key encryption of all data uploaded to the cloud. Data can be mirrored to another data center in the same geography but can not be mirrored across an ocean. This selection is done post configuration and is tied to your account as a storage configuration.

Now to the ugly part of block storage. Traditionally, block storage has been addressed through an operating system as a logical unit or aggregation of blocks on a disk drive. Terms like tracks and sectors bleed into the conversation. With cloud storage, it is not part of the discussion. Storage in the cloud is storage. It is accessed through an interface called a REST api. The data can be created, read, updated, and deleted using html calls. All of this is documented in the Oracle Documents - Cloud Storage web site.

The first step is to authenticate to the cloud site with an instance name, username, and password. What is passed back is an authentication token. Fortunately, there are a ton of tools to help read and write HTML code and are specifically tuned to help create headers and JSON structured data packets for the REST api interfaces. The screen below shows the Postman interface available through Chrome. A similar one exists for Firefox called RESTClient API. Unfortunately, there is no extension for Internet Explorer.

The first step is to get an auth header by typing in the username and password into the Basic Authentication screen.

Once we are authorized, we connect to the service by going to https://storage.us2.oraclecloud.com/v1/Storage-(identity domain) where identity domain is the cloud provider account that we have been assigned. In our example we are connecting to metcsgse00029 as our identity domain and logging in as the user cloud.admin. We can see what "containers" are available by sending a GET call or create a new container by sending a PUT call with the new container name at the end of our html string. I use the word container because the top level of storage consists of different areas. These areas are not directories. They are not file systems. The are containers that hold special properties. We can create a container that is standard storage which represents spinning disk in the cloud or we can create a container that is archive storage which represents a tape unit in the cloud. This is done by sending the X-Storage-Class header. If there is no header, the default is block storage and spinning disk. If the X-Storage-Class is assigned to Archive it is tape in the cloud. Some examples of creating a container are shown below. We can do this via Postman inside Chrome or a command line

From the command line this would look like

export OUID=cloud.admin
export OPASS=mypassword
export ODOMAIN=metcsgse00029
c url -is -X GET -H "X-Storage-User:Storage-$ODOMAIN:$OUID" 
                 -H "X-Storage-Pass:$OPASS" 
                 https://$ODOMAIN.storage.oraclecloud.com/auth/v1.0

This should return an html header with HTTP 200 OK and an embedded header of X-Auth-Token: AUTH_tk578061b9ae7f864ae9cde3cfdd75d706. Note that the value after the X-Auth-Token is what we will use to pass into all other requests. This token will change with each request and is good for 30 minutes from first execution. Once we have the authentication finished we either change the request type from a GET to a PUT and append the container name to the end. The screen above shows how to do this with Postman. The results should look like the screen below. We can do this from the command line as show below as well.

c url -is -X PUT -H "X-Auth-Token: AUTH_tk578061b9ae7f864ae9cde3cfdd75d706" 
                 https://storage.us2.oraclecloud.com/v1/Storage-$ODOMAIN/new_area
In this example we create a new container from the command line called new_area. We can verify this by reviewing the cloud storage by changing the PUT to a GET.

c url -is -X GET -H "X-Auth-Token: AUTH_tk578061b9ae7f864ae9cde3cfdd75d706" 
                 https://storage.us2.oraclecloud.com/v1/Storage-$ODOMAIN
Both of these methods allow us to see the storage that we created. I personally do not like this interface. It is not intended to be human consumable. Uploading and downloading a file is difficult at best. A user interface that makes dragging and dropping files is desirable. This is where dropbox and google docs shine. They allow you to drag and drop as well as synchronize directories to cloud storage. The Oracle Storage Cloud is not intended to be this solution. It is designed so that you can drop a new library into your rman backup and backup straight from your database to the cloud. You can point your ComVault or Legato backup software to a cloud instance and replicate your data to the cloud. If you want a human readable interface you need to purchase something like the Cloudberry Explorer from Cloudberry. This give you a Windows Explorer like interface and allows your to drag and drop files, create containers and directories, and schedule archives or backups as desired.

Note that the way that you create a block storage container vs an archive container is a simple menu selection. Retrieving the archive storage is a little more complex because the tape unit must stage the file from the tape to disk and notify you that the restoration has been completed. This is a little more complex and we will defer this discussion to a later blog.

Copying files is little more than dragging and dropping a file between sections of a window in Cloudberry.

For completeness, I have included the command line screen shots so that you can see the request/response of a command line interaction.

It is important to remember our objective. We can use the cloud block storage as a repository for things like database and a holding point for our backups. When we configure a database in the cloud, we backup and restore from this storage. This is configured in the database provisioning screen. The Storage-metcsgse00029/backup is the location of RMAN backup and restores. The backup container is created through the REST api or Cloudberry interface. We can also attach to the cloud storage through the cloud storage appliance software which runs inside a virtual machine and listens for NFS requests and translates them into REST api calls. A small disk is attached to the virtual machine and it acts as a cache front end to the cloud storage. As files are written via NFS they are copied to the cloud storage. As the cache fills up, files contents are dropped from local storage and the metadata pointing to where the files are located are updated relocating the storage to the cloud rather than the cache disk. If a file is retrieved via NFS, the file is read from cache or retrieved from the cloud and inserted into the cache as it is written to the client that requested it.

In summary, we covered the economics behind why you would select cloud storage over on site storage. We talked about how to access the storage from a browser based interface, web based interface, or command line. We talked about improving latency and security. Overall, cloud based storage is something that everyone is familiar with. Products like Facebook, Picaso, or Instagram do nothing more than store photos in cloud storage for you to retrieve when you want. You pay for these services by advertisements injected into the web page. Corporations are turning more and more towards cloud storage as a cheaper way to consume long term storage at a much lower price. The Oracle Storage Cloud service is first of three that we will evaluate this week.

installing Tomcat on Docker

Mon, 2016-04-18 02:07
A different way of looking at running Tomcat is to ignore the cloud platform and install and configure everything inside a docker instance. Rather than picking a cloud instance we are going to run this inside VirtualBox and assume that all of the cloud vendors will allow you to run Docker or configure a Docker instance on a random operating system. What we did was to initially install and configure Oracle Enterprise Linux 7.0 from a iso into VirtualBox. We then installed Docker with the command and start the service
sudo yum install docker
sudo systemctl start docker

We can search for a Tomcat installation and pull it down to run. We find a Tomcat 7.0 version from the search and pull down the configuration

docker search tomcat
docker pull consol/tomcat-7.0

We can run the new image that we pulled down with the commands

docker run consol/tomcat-7.0
docker ps
The docker ps command allows us to look at the container id that is needed to find the ip address of the instance that is running in docker. In our example we see the container id is 1e381042bdd2. To pull the ip address we execute
docker inspect -f format='{{.NetworkSettings.IPAddress}}' 1e381042bdd2
This returns the ip address of 172.17.0.2 so we can open this ip address and port 8080 to see the Tomcat installation.

In summary, this was not much different than going through Bitnami. If you have access to docker containers in a cloud service then this might be an alternative. All three vendors not only support docker instances but all three have announced or have docker services available through IaaS. Time wise it did take a little longer because we had to download an operating system as well as Java and Tomcat. The key benefit is that we can create a master instance and create our own docker image to launch. We can script docker to restart if things fail and do more advanced options if we run out of resources. Overall, this might be worth researching as an alternative to provisioning and running services.

technical diversion - DBaaS Rest APIs

Fri, 2016-04-15 02:07
We are going to take a side trip today. I was at Collaborate 2016 and one of the questions that came up was how do you provision 40 database instances for a lab. I really did not want to sit and click through 40 screens and log into 40 accounts so I decided to do a little research. It turns out that there is a relatively robust REST api that allows you to create, read, update, and delete database instances. The DBaaS Rest Api Documentation is a good place to start to figure out how this works.

To list instances that are running in the database service use the following command, note that "c url" should be shortened to remove the space. Dang blogging software! Note to make things easier and to allow us to script creation we define three variables on the command line. I did most of this testing on a Mac so it should translate to Linux and Cygwin. The three variables that we need to create are

  • ODOMAIN - instance domain that we are using in the Oracle cloud
  • OUID - username that we log in as
  • OPASS - password for this instance domain/username
export ODOMAIN=mydomain
export OUID=cloud.admin
export OPASS=mypassword
c url -i -X GET -u $OUID:$OPASS -H "X-ID-TENANT-NAME: $ODOMAIN" -H "Content-Type:application/json" https://dbaas.oraclecloud.com/jaas/db/api/v1.1/instances/$ODOMAIN
What should return is
HTTP/1.1 200 OK
Date: Sun, 10 Apr 2016 18:42:42 GMT
Server: Oracle-Application-Server-11g
Content-Length: 1023
X-ORACLE-DMS-ECID: 005C2NB3ot26uHFpR05Eid0005mk0001dW
X-ORACLE-DMS-ECID: 005C2NB3ot26uHFpR05Eid0005mk0001dW
X-Frame-Options: DENY
X-Frame-Options: DENY
Vary: Accept-Encoding,User-Agent
Content-Language: en
Content-Type: application/json

{"uri":"https:\/\/dbaas.oraclecloud.com:443\/paas\/service\/dbcs\/api\/v1.1\/instances\/metcsgse00027","service_type":"dbaas","implementation_version":"1.0","services":[{"service_name":"test-hp","version":"12.1.0.2","status":"Running","description":"Example service instance","identity_domain":"metcsgse00027","creation_time":"Sun Apr 10 18:5:26 UTC 2016","last_modified_time":"Sun Apr 10 18:5:26 UTC 2016","created_by":"cloud.admin","sm_plugin_version":"16.2.1.1","service_uri":"https:\/\/dbaas.oraclecloud.com:443\/paas\/service\/dbcs\/api\/v1.1\/instances\/metcsgse00027\/test-hp"},{"service_name":"db12c-hp","version":"12.1.0.2","status":"Running","description":"Example service instance","identity_domain":"metcsgse00027","creation_time":"Sun Apr 10 18:1:21 UTC 2016","last_modified_time":"Sun Apr 10 18:1:21 UTC 2016","created_by":"cloud.admin","sm_plugin_version":"16.2.1.1","service_uri":"https:\/\/dbaas.oraclecloud.com:443\/paas\/service\/dbcs\/api\/v1.1\/instances\/metcsgse00027\/db12c-hp"}],"subscriptions":[]}

If you get back anything other than a 200 it means that you have the identity domain, username, or password incorrect. Note that we get back a json structure that contains two database instances that were previously created, test-hp and db12c-hp. Both are up and running. Both are 12.1.0.2 instances. We don't know much more than these but can dive a little deeper by requesting more information by included the service name as part of the request. A screen shot of the deeper detail is shown below.

A list of the most common commands are shown in the screen shot below

The key options to remember are:

  • list: -X GET
  • stop: -X POST --data '{ "lifecycleState" : "Stop" }'
  • restart: -X POST --data '{ "lifecycleState" : "Restart" }'
  • delete: -X DELETE **need to add the instance name at the end, for example db12c-hp in request above
  • create: -X POST --data @createDB.json
In the create option we include a json file that defines everything for the database instance.
{
  "serviceName": "test-hp",
  "version": "12.1.0.2",
  "level": "PAAS",
  "edition": "EE_HP",
  "subscriptionType": "HOURLY",
  "description": "Example service instance",
  "shape": "oc3",
  "vmPublicKeyText": "ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAnrfxP1Tn50Rvuy3zgsdZ3ghooCclOiEoAyIl81Da0gzd9ozVgFn5uuSM77AhCPaoDUnWTnMS2vQ4JRDIdW52DckayHfo4q5Z4N9dhyf9n66xWZM6qyqlzRKMLB0oYaF7MQQ6QaGB89055q23Vp+Pk5Eo+XPUxnfDR6frOYZYnpONyZ5+Qv6pmYKyxAyH+eObZkxFMAVx67VSPzStimNjnjiLrWxluh4g3XiZ1KEhmTQEFaLKlH2qdxKaSmhVg7EA88n9tQDWDwonw49VXUn/TaDgVBG7vsWzGWkRkyEN57AhUhRazs0tEPuGI2jXY3V8Q00w3wW38S/dgDcPFdQF0Q== rsa-key-20160107",
  "parameters": [
    {
      "type": "db",
      "usableStorage": "20",
      "adminPassword": "Test123_",
      "sid": "ORCL",
      "pdb": "PDB1",
      "failoverDatabase": "no",
      "backupDestination": "none"
    }
  ]
}
 
The vmPublicKeyText is our id_rsa.pub file that we use to connect to the service. I did not include a backup space but could have. I did not in this example because we have to embed a password in this space and I did not want to show a service with username and password.

Overall, I prefer scripting everything and running this from a command line. Call me old school but sitting for hours and clicking through screens when I can script it and get a notification when it is done appeals to me.

Tomcat on Azure

Thu, 2016-04-14 02:07
Today we are going to install Tomcat on Microsoft Azure. In the past three days we have installed Tomcat on Oracle Linux using Bitnami and onto a raw virtual image as well as on Amazon AWS using a raw virtual image. Microsoft does not really have a notion of a MarketPlace like the AWS Commercial or Public Domain AMI Markets. It does have Bitnami and we could go through the installation on Azure just like we did the Oracle Compute Cloud. Rather than repeating on yet another platform, let's do something different and look at how we would install Tomcat on Windows on Azure. The Linux installation would be no different than the Oracle Linux raw virtual machine install so let's do something different. You can find Tomcat on Linux Instructions or Tomcat on Windows Instructions. To be honest we won't deviate much from the second one so follow this or follow the instructions from Microsoft, they are basically the same.

The steps that we need to follow are

  • Create a virtual machine with Windows and Java enabled
  • Download and install Tomcat
  • open the ports on the Azure portal
  • open the ports on Windows
We start by loading a virtual machine in the Azure portal. Doing a search for Tomcat returns the Bitnami image as well as a Locker Tomcat Container. This might work but it does not achieve our desire for this exercise. We might want to look at a Container but for our future needs we need to be able to connect to a database and upload jar and war files. I am not sure that a Container will do this.

We search for a JDK and find three different versions. We select the JDK 7 and click Create.

In creating the virtual machine, we define a name for our system, a default login, a password (I prefer a confirmation on the password rather than just entering it once), our default way of paying, and where to place it is storage and which data center based on the storage we select. We go with the default East configuration and click OK.

Since we are cheap and this is only for demo purposes, we will select A0 Standard. The recommended is A1 Standard but it is $50 more per month and again this is only for demo purposes. After having played with the A0 Standard, we might be better off going with the A1 Standard. Yes, it is more expensive. The speed of the A0 shape is so painful that it is almost unusable.

We will want to open up ports 80, 8080, and 443. These will all be used for Tomcat. This can be done by creating an new security rule and adding port exceptions when we create the virtual machine. We can see this in the installation menu.

We add these ports and can click Create to provision the virtual machine

One of the things that I don't like about this configuration is that we have three additional ports that we want to add. When we add them we don't see the last two rules. It would be nice if we could see all of the ports that we define. We also need to make sure that we have a different priority for the port definition. The installation will fail if we assign priority 1000 to all of the ports.

Connection to the virtual machine is done through remote desktop. If you go to the portal and click on the virtual machine you will be able to connect to the console. I personally don't like connecting to a gui interface but prefer a command line interface. You must connect with a username and password rather than a digital certificate.

The first thing that comes up with Windows 2012 server is the server management screen. You can use this to configure the compute firewall and allow ports 80, 8080, and 443 to go to the internet. This also requires going to the portal to enable these ports as network rules. You have two configurations that you need to make to enable port 8080 to go from your desktop, through the internet, get routed to your virtual machine, then into your tomcat application.

For those of you that are Linux and Mac snobs, getting Windows to work in Azure was a little challenging. Simple things like opening a browser became a little challenging. This is more a lack of Windows understanding. To get Internet Explorer to come up you first have to move your mouse into the far right of the screen.

At first it did not work for me because the Windows screen was a little larger than my desktop and I had to scroll all the way to the bottom and all the way to the right before the pop up navigation window comes up. When the window does come up you see three icons. The bottom icon is the configuration that allows you to get the the Control Panel to configure the firewall. The icon above it is the Microsoft Windows icon which gives you an option to launch IE. Yes, I use Windows on one of my desktops. Yes, I do have an engineering degree. No, I don't get this user interface. Hovering over an empty spot on the screen (which is behind a scroll bar) makes no sense to me.

From this point forward I was able to easily follow the Microsoft Tomcat installation instructions. If you don't select the JDK 7 Virtual Machine you can download it from java.com download. You then download the Tomcat app server. We selected Tomcat 7 for the download and followed the defaults. We do need to configure the firewall on the Windows server to enable ports 80, 8080, and 443 to see everything from our desktop browser. We can first verify that Tomcat is properly installed by going to http://localhost:8080 from Internet Explorer in the virtual image. We can then get the ip address of our virtual machine and test the network connections from our desktop by replacing localhost with the ip address. Below are the screen shots from the install. I am not going to go through the instructions on installing Tomcat because it is relatively simple with few options but included the screen shots for completeness.

In Summary, we could have followed the instructions from Microsoft to configure Tomcat. We could pre-configure the ports as suggested in this blog. We could pre-load the JDK with a virtual machine rather than manually downloading it. It took about 10-15 minutes to provision the virtual machine. It then took 5-10 minutes to download the JDK and Tomcat components. It took 5-10 minutes to configure the firewall on Windows and the port access through the Azure portal. My suggestion is to use a service like Bitnami to get a preconfigured system because it takes about half the time and enables all of the ports and services automatically.

installing Tomcat on AWS

Wed, 2016-04-13 02:07
In our last two entries we installed Tomcat on the Oracle Compute Cloud. We first installed the application using oracle.bitnami.com and the second we installed Linux using Oracle Compute Cloud then downloading tomcat.apache.org and configuring the network and startup scripts. In this blog we will do the same thing for Amazon AWS. Note that there are a few blogs that do the same thing. We are going to cheat a little bit with AWS. Rather than configuring Linux, downloading Java and downloading Tomcat, we are going to go to the Amazon Marketplace and download an image that is already configured. This is similar to going through Bitnami but I thought it would be interesting to look at a different pre-configured instance and see how it differs from Bitnami. When we go to the marketplace we get the option of a community ami pool or a commercial ami pool. The selection is very diverse. I could not find anyone who pre-configured Tomcat on Oracle Enterprise Linux but did find Red Hat and Amazon Linux which are from the same codebase.

It is important to note that the commercial version does come with supplemental pricing on an hourly basis. This typically prices AWS as an option out of the running when compared to other cloud services.

We select an instance (the smallest since this is a demo of functionality) and go through the launch screens.

By default, the network configuration only opens up ssh and potentially port 80. We will need to add ports 8080 and 443. In hindsight we really don't need to add port 8080 because the commercial version remaps the catalina configuration file to port 80 but we did anyway for completeness.

Adding the new ports looks like

Note that this is different from the Oracle Compute network setup. Amazon sets this up during the instance configuration while Oracle allows you to add this after the instance is created. Neither are good or bad, just different. You do need to scroll to the far right to see the Security Group definition and follow the links to modify the rules to allow another port. My first assumption was to go to the instance configuration menu at the top but all the network options were greyed out. You need to scroll to the far right to change the ports using the security group link. I initially did not see this because my fonts were too large and I did not realize that I had to scroll to the far right to see this.

Once we have the network configured, we can review and launch the instance. Note that we can use our own ssh keys to attach to the instance.

When we finish and confirm everything we should get an initialization screen. If the startup takes too long we will get a waiting screen. Once the instance is created we should see that it is running in the EC2 console.

Once the instance is started we can connect to it. We do this by looking up the ip address and connecting with ssh.

It is important to note that Tomcat is installed in /opt/tomcat7 and the startup scripts in /etc/rc3.d/S80tomcat7_1 are already setup.

We restart the service just to test the startup script and test the instance locally by getting the html from the command line and confirm that everything works from our desktop browser.

In summary, we were able to install and configure everything using the marketplace in less than 15 minutes. The configuration was similar to the Bitnami instance but it is important to note that there is an extra cost associated with this instance on an hourly basis. The Bitnami economics are done on a per instance charge. I, for example, pay $30/month to allow me to deploy three instances across multiple cloud vendors. Note the model is on a per instance and not a per hour basis. We could have gone through the exact same configuration that we did with the Oracle Compute Cloud instance by installing Linux then using the Tomcat website to download the binaries and install. The same websites, same tar files, and same configurations work since both are Linux based installs.

installing Tomcat on Oracle

Tue, 2016-04-12 02:07
In our last entry we installed Tomcat onto the Oracle Compute Cloud using Bitnami. Just as a reminder, it was easy, it was simple, and it took 15 minutes. In this entry we are going to go through the manual process and show what has to happen on the server side and what has to happen on the cloud side. The steps that we will follow are
  • Install Oracle Enterprise Linux on the Oracle Compute Cloud Service
  • ssh into the box and install/update
    • Java 7
    • Tomcat
    • iptables to open port 80 and 443
  • Start the service and verify localhost:8080 works
  • Configure the ports on the cloud side so that 80 and 443 pass through

Installing Oracle Enterprise Linux has been done in a previous blog. We won't go through the screen shots for this other than to say that we called the box prsTomcat (as we did in the previous example) and requested OEL 6.6 with a 60 GB hard drive because this was the default installation and configuration. We selected the 60 GB hard drive because we had one preconfigured and it would reduce the creation time by not having to create and populate a new hard drive.

To ssh into the instance we need to go to the compute page and find the ip address. We need to login as opc so that we can execute sudo and install packages.

Once we have logged in we first need to verify that java is installed and configured. We do this with

java -version

This command verified that Java is properly installed. The next step is to download tomcat. To get the correct version and location to download it from we must go to tomcat.apache.org and figure out what version to install. This is a little confusing because there are numerous versions and numerous dot releases. We are looking for Tomcat 7 so we scroll down and download it from tar.gz binary distribution.

We look for Tomcat 7 and follow the download link.

Once we had downloaded the binary bundle we need to unzip this into a location that we want to run it from. In this example we are going to install it in /usr/local/apache-tomcat-version_number. This is done with

"w g e t " http://www-us.apache.org/dist/tomcat/tomcat-7/v7.0.68/bin/apache-tomcat-7.0.68.tar.gz
cd /usr/local
sudo tar xvzf /home/opc/apache-tomcat-7.0.68.tar.gz

Now that we have the binary downloaded, we have to start the server. This is done with the bin/startup.sh command. It is important to note that we will still need to install and configure an init script in /etc/rc3.d/S99tomcat to start and stop the service. This requires hand editing of the file to run the startup.sh script. Once we have Tomcat installed and running we can use "w g e t" to verify that the server is running

sudo /usr/local/apache-tomcat-7.0.68
"w g e t" http://localhost:8080
This should return the html page served by the Tomcat server. If it does not, we have an issue with the server starting or running.

Now that we have the server up and running, we need to update the iptables to add ports 80, 8080, and 443 as pass through ports. This is done by

sudo iptables -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
sudo iptables -A INPUT -p tcp -m tcp --dport 443 -j ACCEPT
sudo iptables -A INPUT -p tcp -m tcp --dport 8080 -j ACCEPT

sudo service iptables restart

Once we have the ports properly running at the operating system layer we need to go back to our compute cloud console and create the security rules for these ports. This is done by going into the instance and clicking on the network tab at the top of the screen.

We open up port 80 first from the public internet to the instance.

We then open up port 443 similarly

The final configuration should look like

We also need to add port 8080 which is by default not installed and configured. We do this by defining a new Security Application from the Networking tab. We add port 8080 then we have to add it as a security rule.

From this point we should be able to look from our desktop to the cloud instance and see port 80, 8080, and 443. We can test port 443 by logging into the management console as we did with the bitnami configuration. We can test port 8080 by going to the default link for our server ip address.

In summary, it took 5-6 minutes to get Linux installed. It took 5-10 minutes to do the y um install based on how many packages were out of sync and needed updating. It took 4-5 minutes to open up the ports and reconfigure the network access. To get the same configuration we would have to edit the catalina.conf file and redirect the browser from port 8080 to port 80 as well as create a startup script to initialize the server at boot time. Overall this method took us about 50% longer to install and configured the exact same thing as we did with Bitnami. The benefits to doing the configuration ourselves is that we could script it with tools like Puppet and Chef. We could automate this easily and make sure it is done the same way every time. Doing it by hand and creating the instance, logging in, and using a graphic interface to configure everything leads to error and divergence as time goes on.

Note that the "w g e t" should be one word but again, our blogging software does not allow me to use that word in the blog. Grrr!

installing Tomcat through bitnami

Mon, 2016-04-11 02:07
This week we are going to focus on installing Tomcat on to cloud servers. Today we are going to take the easy route. We are going to use bitnami and look at timings of how long everything takes as well as the automatic configurations that it sets up. In previous lesions we talked about linking your cloud accounts to bitnami and will not repeat that instructions. For those that are new to public domain software, Tomcat is a public domain software package that allows you to host java applications similar to WebLogic. I won't get in to a debate over which is better because we will be covering how to install WebLogic in a later blog. I will give you all the information that you need to make that decision of which is best for our company and implementation.

We login to our oracle.bitnami.com web site and verify our account credentials.

We want to launch a Tomcat server so we search for Tomcat and hover over the icon. When we hover over the icon the word Launch appears and we click on this button.

Once we click Launch we get the virtual machine configuration screen.

Things to note on this screen are

  • the name is what you want to add to identify the virtual macine
  • the cloud account identifies which data center, if it is metered or un-metered, and what shapes will be available to this virtual machine
  • the network is automatically configured for ports 80 and 443 and enabled not only in the cloud network security configuration but in the operating system as well
  • the operating system gives you the option but we default to OEL 6.7
  • we could increase the disk size and select the memory/cpu option but it does not show us the cost because bitnami does not know if your account is metered or un-metered which have different costing models.

After we click the create button we get an update that shows us how the installation is progressing. The installation took just under 15 minutes to finish everything, launch the instance, and show us the configuration.

Once everything finishes we get the ip address, the passwords, and the ssh keys that were used to create this virtual machine.

We are able to open the link to the Tomcat server by clicking on the Go To The Application on the top right of the screen. This allows us to see the splash screen as well as access the management console.

When you click on the Access my Application you get the detailed information about the Tomcat server. We can go to the management console and look at the configuration as well as bring the server up and down.

At this point we have a valid configuration that we can see across the internet. The whole process took 15 minutes and did not require any editing or configurations other than selecting the configuration and giving the virtual machine a name.

security diversion before going up the stack

Fri, 2016-04-08 02:07
This entry is going to be more of a Linux tutorial than a cloud discussion but it is relevant. One of the questions and issues that admins are faced is creation and deletion of accounts. With cloud access being something relatively new the last thing that you want is to generate a password with telnet access to a server in the cloud. Telnet is inherently insecure and any script kiddy with a desire to break into accounts can run ettercat and look for clear text passwords flying across an open wifi or wired internet connection. What you really want to do is login via secure ssh or secure putty is you are on windows. This is done with a public/private key exchange.

There are many good explanations of ssh key exchange, generating ssh keys, and using ssh keys. My favorite is a digitalocean.com writeup. The net, net of the writeup is that you generate a public and private key using ssh-keygen or putty-gen and upload the public file to the ~user/.ssh/authorized_keys location for that user. The following scripts should work on an Azure, Amazon, and Oracle Linux instance created in the compute shapes. The idea is that we initially created a virtual machine with the cloud vendor and the account that we created with the VM is not our end user but our cloud administrator. The next level of security is to create a new user and give them permissions to execute what they want to execute on this machine. For example, in the Oracle Database as a Service images there are two users created by default; oracle and opc. The oracle user has the rights to execute everything related to sqlplus, access the file systems where the database and backups are located, and everything related to the ora user. The opc user has sudo rights so that they can execute root scripts, add software packages, apply patches, and other things. The two users have different access rights and administration privileges. In this blog we are going to look at creating a third user so that we can have someone like a backup administrator login and copy backups to tape or a disk at another data center. To do this you need to execute the following instructions.

sudo useradd backupadmin -g dba
sudo mkdir ~backupadmin/.ssh
sudo cp ~oracle/.ssh/authorized_keys ~backupadmin/.ssh
sudo chown -R backupadmin:dba ~backupadmin
sudo chmod 700 ~backupadmin/.ssh

Let's walk through what we did. First we create a new user called backupadmin. We add this user to the dba group so that they can perform dba functions that are given to the dba group. If the oracle user is part of a different group then they need to be added to that group and not the dba group. Next we create a hidden directory in the backupadmin directory called .ssh. The dot in front of the file denotes that we don't want this listed with the typical ls command. The sshd program will by default look in this directory for authorized keys and known hosts. Next we copy a known authorized_keys file into the new backupadmin .ssh directory so that we can present a private key to the operating system as the backupadmin to login. The last two commands are setting the ownership and permissions on the new .ssh directory and all files under it so that backupadmin can read and write this directory and no one else can. The chown sets ownership to backupadmin and the -R says do everything from that directory down to the same ownership. While we are doing this we also set the group permissions on all files to the group dba. The final command sets permissions on the .ssh directory to read, write, and execute for the owner of the directory only. The zeros remove permissions for the group and world.

In our example we are going to show how to access a Linux server from Azure and modify the permissions. First we go to the portal.azure.com site and login. We then look at the virtual machines that we have created and access the Linux VM that we want to change permissions for. When we created the initial virtual machine we selected ssh access and uploaded a public key. In this example we created the account pshuff as the initial login. This account is created automatically for us and is given sudo rights. This would be our cloud admin account. We present the same ssh keys for all virtual machines that we create and can copy these keys or upload other keys for other users. Best practice would be to upload new keys and not replicate the cloud admin keys to new users as we showed above.

From the portal we get the ip address of the Linux server. In this example it is 13.92.235.160. We open up putty from Windows, load the 2016.ppk key that corresponds to the 2016.pub key that we initialized the pshuff account with. When asked for a user to authenticate with we login as pshuff. If this were an Oracle Compute Service instance we would login as opc since this is the default account created and we want sudo access. To login as backupadmin we open putty and load the ppk associated with this account.

When asked for what account to login as we type in backupadmin and can connect to the Linux system using the public/private key that we initialized.

If we examine the public key it is a series of randomly generated text values. To revoke the users access to the system we change the authorized_keys file to a different key. The pub file looks like

if we open it in wordpad on Windows. This is the file that we uploaded when we created the virtual machine.

To deny access to backupadmin (in the case of someone leaving the organization or moving to another group) all we have to do is edit the authorized_keys file as root and delete this public key. We can insert a different key with a copy and paste operation allowing us to rotate keys. Commercial software like key vaults and key management systems allow you to do this from a central control point and update/rotate keys on a regular basis.

In summary, best practices are to upload a key per user and rotate them on a regular basis. Accounts should be created with ssh keys and not password access. Rather than copying the keys from an existing account it would be an upload and an edit. Access can be revoked by the root user by removing the keys or from an automated key management system.

next generation of compute services

Thu, 2016-04-07 02:07
Years ago I was a systems administrator at a couple of universities and struggled making sure that systems were operational and supportable. The one thing that frustrated me more than anything else was how long it took to figure out how something was configured. We had over 100 servers in the data center and on each of these server we had departmental web servers, mail servers, and various other servers to serve the student and faculty users. We standardized on an Apache web server but there were different versions, different configurations, and different additions to each one. This was before virtualization and golden masters became a trendy topic and things were built from scratch. We would put together Linux server with Apache web servers, PHP servers, and MySQL. These later became called LAMP servers. Again, one frustration was the differences between the different versions, how they were compiled, and how they were customized to handle a department. It was bad enough that we had different Linux versions but we had different versions of every other software combination. Debugging became a huge issue because you first had to figure out how things were configure then you had to figure out where the logs were stored and then could start looking at what the issue was.

We have been talking about cloud compute services. In the past blogs we have talked about how to deploying an Oracle Linux 6.4 server onto compute clouds in Amazon, Azure, and Oracle. All three look relatively simple. All three are relatively robust. All three have advantages and disadvantages. In this blog we are going to look at using public domain pre-compiled bundles to deploy our LAMP server. Note that we could download all of these modules into out Linux compute services using a yum install command. We could figure out how to do this or look at web sites like digitalocean.com that go through tutorials on how to do this. It is interesting buy I have to ask why. It took about 15 minutes to provision our Linux server. Doing a yum update takes anywhere from 2-20 minutes based on how old you installation is and how many patches have been released. We then take an additional 10-20 minutes to download all of the other modules, edit the configuration files, open up the security ports, and get everything started. We are 60 minutes into something that should take 10-15 minutes.

Enter stage left, bitnami.com. This company does exactly what we are talking about. They take public domain code and common configurations that go a step beyond your basic compute server and provision these configurations into cloud accounts. In this blog we will look at provisioning a LAMP server. We could have just as easily have configured a wiki server, tomcat server, distance education moodle server, or any other of 100+ public domain configurations that bitmai supports.

The first complexity is linking your cloud accounts into the bitnami service. Unfortunately, the accounts are split into three different accounts; oracle.bitnami.com, aws.bitnami.com, and azure.bitnami.com. The Oracle and Azure account linkages are simple. For Oracle you need to look up the rest endpoint for the cloud service. First, you go to the top right, click the drop down to do account management.

From this you need to look up the rest endpoint from the Oracle Cloud Console by clicking on the Details link from the main cloud portal.

Finally, you enter the identity domain, username, password, and endpoint. With this you have linked the Oracle Compute Cloud Services to Bitnami.

Adding the Azure account is a little simpler. You go to the Account - Subscriptions pull down and add account.

To add the account you download a certificate from the Azure portal as described on the bitnami.com site and import it into the azure.bitnami.com site.

The Amazon linkage is a little more difficult. To start with you have to change your Amazon account according to Bitnami Instructions. You need to add a custom policy that allows bitnami to create new EC2 instances. This is a little difficult to initially understand but once you create the custom policy it becomes easy.

Again, you click on the Account - Cloud Accounts to create a new AWS linkage.

When you click on the create new account you get an option to enter the account name, shared key, and secret key to your AWS account.

I personally am a little uncomfortable providing my secret key to a third party because it opens up access to my data. I understand the need to do this but I prefer using a public/private ssh key to access services and data rather than a vendor provided key and giving that to a third party seems even stranger.

We are going to use AWS as the example for provisioning our LAMP server. To start this we go to http://aws.bitnami.com and click on the Library link at the top right. We could just as easily have selected azure.bitnami.com or oracle.bitnami.com and followed this exact same path. The library list is the same and our search for a LAMP server returns the same image.

Note that we can select the processor core count, disk size, and data center that we will provision into. We don't get much else to choose from but it does the configuration for us and provisions the service in 10-15 minutes. When you click the create key you get an updated screen that shows progress on what is being done to create the VM.

When the creation is complete you get a list of status as well as password access to the application if there were a web interface to the application (in this case apache/php) and an ssh key for authentication as the bitnami user.

If you click on the ppk link at the bottom right you will download the private ssh key that bitnami generates for you. Unfortunately, there is not a way of uploading your own keys but you can change that after the fact for the users that you will log in as.

Once you have the private key, you get the ip address of the service and enter it into putty for Windows and ssh for Linux/Mac. We will be logging in as the user bitnami. We load the ssh key into the SSH - Auth option in the bottom right of the menu system.

When we connect we will initially get a warning but can connect and execute common commands like uname and df to see how the system is configured.

The only differences between the three interfaces is the shapes that you can choose from. The Azure interface looks similar. Azure has fewer options for processor configuration so it is shown as a list rather than a sliding scale that changes the processor options and price.

The oracle.bitnami.com create virtual machine interface does not look much different. The server selection is a set of checkboxes rather than a radio checkbox or a sliding bar. You don't get to check which data center that you get deployed into because this is tied to your account. You can select a different identity domain which will list a different data center but you don't get a choice of data centers as you do with the other services. You are also not shown how much the service will cost through Oracle. The account might be tied to an un-metered service which comes in at $75/OCPU/month or might be tied to a metered service which comes in at $0.10/OCPU/hour. It is difficult to show this from the bitnami provisioning interface so I think that they decided to not show the cost as they do with the other services.

In summary, using a service like bitnami for pre-configured and pre-compiled software packages is the future because it has time and cost advantages. All three cloud providers have marketplace vendors that allow you to purchase commercial packages or deploy commercial configurations where you bring your own license for the software. More on that later. Up next, we will move up the stack and look at what it takes to deploy a the Oracle database on all three of these cloud services.

Pages