Pat Shuff

DBaaS in Oracle Public Cloud

Wed, 2016-05-11 02:07
Before we dive deep into database as a service with Oracle, we need to define some terms. We have thrown around concepts like Standard Edition, Enterprise Edition, High Performance Edition, and Extreme Performance Edition. We have talked about features like DataGuard, Real Application Clusters, Partitioning, and Compression. Today we will dive a little deeper into these so that we can compare them running in the Oracle Public Cloud as well as other cloud providers.

First, let's tackle Standard Edition (SE) vs Enterprise Edition (EE). Not only is there SE, there are also SE One (SE1) and SE2. SE2 is new with the 12c release of the database and is essentially the same as SE and SE1 but with different processor and socket restrictions. The Oracle 12c documentation details the differences between the versions; we will highlight them here. Note that all editions store data the same way. The data types do not change between the versions of the database, and a select statement that works in SE will work in SE2 and will work in EE.

The first big difference between SE and EE is that SE is licensed on a per socket basis while EE is licensed on a per core basis. The base cost of an SE system in the Oracle Public Cloud is $600 per month per processor, and Standard Edition is limited to 8 cores in the cloud. If you are purchasing a perpetual license, the cost is $17,500 and the license can run across two sockets in one system or a single socket in each of two systems. SE2 comes with a Real Application Clusters (RAC) license so that you can have a single instance running on two computers. SE2 also limits the database to 16 threads, so running on more cores provides no advantage. To learn more about the differences and limitations, I recommend reading Mike Dietrich's blog on SE2.

The second big difference is that many of the optional features are not available with SE. For example, you can't use the Diagnostics and Tuning packs to figure out if your SQL command is running at top efficiency. You can't use multi-tenant, but you can provision a single pluggable database. This means that you can unplug the database and move it to another container database (even one running another edition like EE). The multi-tenant option allows you to have multiple pluggable databases and control them with a master SGA. This allows admins to back up and patch a group of databases all at once rather than having to patch each one individually. You can separate security and have different logins to the different databases but use a global system or sys account to manage and control all of the databases. Storage optimization features like compression and partitioning are not available in SE either. Data recovery features like DataGuard and FlashBack are not supported in SE. DataGuard is a feature that ships changes from one system through the change logs and applies them to a second system. FlashBack uses the change logs in a similar way and allows you to query a database as of a previous time, reconstructing the state of the database as it was at the time requested. Tools like RMAN backup and Streams don't work in SE. Taking a copy of a database and copying it to another system is not allowed. The single exception is that RMAN does work in the cloud SE instances but not in the perpetual on-premise version. Security options like Transparent Data Encryption, Label Security, Data Vault, and Audit Vault are not supported in SE. The one exception is Transparent Data Encryption, which is supported for SE in the public cloud to allow for encryption of cloud data. All of these features are described here.

Enterprise Edition in the Oracle Public Cloud is $3K/OCPU/month or $5.04/OCPU/hour, and the only option bundled with the database is Transparent Data Encryption (TDE). This allows us to encrypt all or part of a table. TDE encrypts data on disk when it is written with a SQL insert or update command. Keys are used to encrypt this data, and it can only be read by presenting the keys through the Oracle Wallet interface. More information on TDE can be found here. The Security Inside Out blog is also a good place to look for updates and references relating to TDE. This version of the database allows us to scale up to 16 processors and 4.6 TB of storage. If we want to back up this database, the largest size that we can have for data storage is 2.3 TB. If our table requirements are greater than 2.3 TB (or 4.6 TB without backups), we need to go to Exadata as a Service or purchase a perpetual license and run it on-premise. If we are looking to run this database in our data center, we will need to purchase a perpetual license at $47.5K per processor license. If you are running on an IBM Power server you need a processor license for each core. If you are running on x86 or Sparc servers you multiply the number of cores by 0.5, so you can run two cores per processor license. TDE is part of the Advanced Security option, which lists at $15K per processor license. When calculating whether it is cheaper to run on-premise or in the public cloud, you need to factor in both license requirements. The same is true if you decide to run EE in AWS EC2 or Azure Compute. Make sure to read the Cloud Licensing Requirements to understand the limits on the cost of running on EC2 or Azure Compute. Since all cloud providers use x86 processors, the multiplication factor is 0.5 times the number of cores on the service.
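
To make this concrete, here is a minimal sketch of what TDE looks like in SQL. It assumes the wallet (keystore) has already been created and opened on the instance, and the table and column names are hypothetical examples rather than anything from the text above.

-- Encrypt a single sensitive column; data is encrypted on disk as it is written
-- (assumes the TDE wallet/keystore is already configured and open)
create table customer_payments (
  customer_id  number,
  card_number  varchar2(19) encrypt using 'AES256'
);

-- An existing column can also be converted in place
alter table customers modify (ssn encrypt using 'AES256');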

The High Performance Edition contains the EE features and TDE, as well as multi-tenant, partitioning, advanced compression, advanced security, real application testing, OLAP, DataGuard, and all of the database management packs. This is basically everything with the exception of Real Application Clusters (RAC), Active DataGuard, and the In-Memory option. High Performance comes in at $4K/OCPU/month or $6.72/OCPU/hour. If we wanted to bundle all of this together and run it in our data center, we would need to compare against the database at $47.5K per processor license plus roughly $15K per processor for each option (there are 12 of them). We can then calculate which is cheaper based on our accounting rules and amortization schedule. The key differentiator is that I can use this version on an hourly or monthly basis for less than a full year. For example, if we do patch testing once a quarter and allocate three weeks a quarter to test whether a patch is good or bad, we only need 12 weeks a year to run the database. This costs us roughly $12K per year to test on a single processor and $24K on a dual processor. If we purchased the system it would cost us $47.5K in capital expenditure plus 22% annually for support. Paying this amount just to do patch testing does not make sense. With a three year cost of ownership, running this on-premise will cost us $78,850. If we use the metered services in the public cloud, this will cost us $72K. The $6,850 difference does not seem like a lot, but with the public cloud service we won't need to pay for the hardware, storage, or operating system. We can provision the cloud service in an hour and replicate our on-site data to the cloud for the testing. If we did this with a computer or virtual image on site, it would take hours or days to provision a new computer, storage, database, and replicate the data.
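
The arithmetic in that comparison is easy to check. Here it is as a quick query against dual, using the list prices quoted above; this is a back-of-the-envelope sketch, not a formal pricing quote.

-- Three year patch-testing comparison from the paragraph above
select
  47500 + 3 * 0.22 * 47500  as on_premise_3yr,    -- one processor license plus three years of 22% support = 78,850
  3 * 3 * 4000 * 2          as metered_cloud_3yr  -- 3 years x ~3 months/year x $4K/OCPU/month x 2 OCPUs = 72,000
from dual;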

It is important to note here that you need to be careful with virtualization. You need to use software that allows for hard partitioning. Products like VMware and Hyper-V are soft partitioning virtualization software. This means that the number of processors can grow dynamically, so you are required to license the Oracle software for the potential high water mark, which is all of the cores in the cluster. If you are running on something like a Cisco UCS blade server with two 16 core sockets, you must license all 32 cores to run the database even though you might only create a 2 core virtual instance in this VMware installation. It gets even worse if you cluster 8 blades into one cluster: then you must license all 256 cores. This gets a little expensive at $47.5K times 128 processor licenses. Products like Oracle VM, Solaris Containers, and AIX LPARs solve this cost problem with hard partitioning.

The third Enterprise Edition option is the Extreme Performance Edition of the database. This edition is $5K/OCPU/month or $8.40/OCPU/hour. It comes with RAC, Active DataGuard, and In-Memory. RAC allows you to run across multiple compute instances and restart queries that might otherwise fail if one node fails. Active DataGuard allows you to have two databases replicating to each other with both open and active at the same time. Regular, or passive, DataGuard allows you to replicate the data but not keep the target open and active. In-Memory allows you to store data not only in row format but also in column format. When data is entered into the table it is stored on disk in row format. A copy is also placed in memory but stored in column format. This allows you to search faster, since the data in memory is already organized by column and you can skip data that does not apply to your search. This is typically done with an index, but we can't always predict what questions the users are going to ask, and adding too many indexes slows down all insert and update operations.
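
As a rough sketch, enabling the column store for a table is a single statement, assuming the instance has INMEMORY_SIZE set and the In-Memory option is licensed; the table name here is a hypothetical example.

-- Place a copy of the table in the In-Memory column store; the optimizer
-- then uses the columnar copy automatically for analytic scans
alter table orders inmemory priority high;

-- Remove it again if the memory is needed elsewhere
alter table orders no inmemory;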

It is important to reiterate that we can take our perpetual license and run it in IaaS or generic compute. We can also effectively lease these licenses on a monthly or hourly rate. If you are running the database, you are consuming licenses. If you stop the database, you stop consuming the database license but continue to consume the storage and processor services. If you terminate the database you stop consuming the database, processor, and storage services because they are all deleted upon termination.

In summary, there are four flavors of DBaaS: Standard Edition, Enterprise Edition, High Performance Edition, and Extreme Performance Edition. Standard Edition and Enterprise Edition are available from other cloud providers, but some require perpetual licenses and some do not. If you decide to run this service as PaaS or DBaaS in the Oracle Public Cloud, you can pay hourly or monthly and start/stop metered services to help save money. All of these services come with part of the management offloaded to Oracle: backups, patches, and restarts of services are done automatically for you. This allows you to focus more on how to apply the database service to provide business benefits rather than on the feeding and maintenance needed to keep the database operational.

Up next, we will dive into use cases for database as a service and look at different configurations and pricing models to solve a real business problem.

Exadata as a Service

Tue, 2016-05-10 02:07
For the last four days we have been focusing on Database as a Service in the cloud. We focused on Application Express, or Schema as a Service, for the last three days and looked at pricing and how to get APEX working in the Oracle Public Cloud, Amazon AWS, and Microsoft Azure. With the Oracle Public Cloud we have three options for database in the cloud at the platform as a service layer: Schema as a Service, Database as a Service, and Exadata as a Service. We could run a database on compute as a service, but we have already discussed the benefits of offloading some of the database administration work with platform as a service (backup, patching, restarting services, etc.).

The question that we have not adequately addressed is how you choose between the three services offered by Oracle. We touched on one of the key questions, database size, when we talked about Schema as a Service. You can have a free database in the cloud if your database is smaller than 25 MB. It will cost you a little money, $175/month, if you have a database smaller than 5 GB. You can grow this to 50 GB and stay with Schema as a Service. If your database is larger than 50 GB you need to look at Database as a Service or Exadata as a Service. You also need to look at these alternatives if you are running an application in a Java container and need to attach to the database through the standard port 1521, since Schema as a Service only supports http(s) connections to the database. If you can query the database with a REST api call, Schema as a Service is an option, but it is not necessarily tuned for performance. Products like WebLogic, Tomcat, or other Java containers can buffer select statements in cache and not have to ask the same question over and over again of the database. For example, if we query census data and are interested in the number of people who live in Texas, we get back roughly 27 million rows of data from the query. If we want to drill down and look at how many people live in San Antonio, we get back 1.5 million rows. If our Java code were smart enough and our application server had enough buffer space, we would not need to read the 27 million rows back when we just want to look at the 1.5 million rows relating to San Antonio. The database can keep the data in memory as well and does not need to read the data back from disk to run the select statement that finds the state or city rows that match the query.

Let's take a step back and talk about how a database works. We create a table and put information in columns like first name, last name, street address, city, state, zip code, email address, and phone number. This allows us to contact each person either through snail mail, email, or phone. If we allocate 32 bytes for each field, we have 8 fields and each row takes up 256 bytes to identify each person. If we store data for each person who lives in Texas we consume 27 million rows, and each row takes up 256 bytes, so the whole table will fit into 6.9 GB of storage. This data is stored in a table extent or file that we save into the /u02/data directory. If we expand our database to store information about everyone who lives in the United States we need 319 million rows. This will expand our database to 81.7 GB. Note that we have crossed the boundary for Schema as a Service. We can't store this much information in a single table there, so we have to look at Database as a Service or Exadata as a Service. Yes, we can optimize our database by using less than 32 bytes per column. We can store zip codes in 16 bytes. We can store phone numbers in 16 bytes. We can store state information in two bytes. We can also use compression in the database and not store the characters "San Antonio" in a 32 byte field, but instead store the string once in an alternate table and correlate it to the hexadecimal number 9c. We then store 9c in the city field, which tells us that the city name is stored in another table. This saves us 1.5 million times 31 bytes (we keep one byte to store the 9c), or about 46 MB of storage. If we do this for everyone in Texas, we shrink the storage by 840 MB, roughly 13% of what we had allocated for all of the information related to people who live in Texas. If we can do this for the city, state, and zip code fields we can reduce the storage required by 39%, shrinking the 81.7 GB to 49.8 GB. This is basically what is done with a technology called Hybrid Columnar Compression (HCC). You create a secondary table that correlates the 9c value to the character string "San Antonio". You only need to store the character string once and the city information shrinks from 32 bytes to 1 byte. When you read back the city name, the database or storage that does the compression returns the string to the application server or application.
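
The sizing arithmetic above is easy to reproduce. Here it is as a quick query against dual, using decimal gigabytes to match the numbers in the text.

-- Row and table sizes used in the example above
select
  8 * 32                           as bytes_per_row,          -- 256 bytes
  round(27000000  * 256 / 1e9, 1)  as texas_gb,               -- 6.9 GB
  round(319000000 * 256 / 1e9, 1)  as usa_gb,                 -- 81.7 GB
  round(27000000  * 31  / 1e6)     as texas_city_savings_mb   -- ~837 MB saved compressing the city column (the text rounds to 840 MB)
from dual;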

When you do a select statement the database looks for the columns that you are asking for in the table that you are doing a select from and returns all of the data that matches the where clause. In our example we might use

select * from census where state = 'Texas';
select * from census where city = 'San Antonio';
We can restrict what we get back by not using the "*" value. We can get just the first_name, last_name, and phone_number if that is all we are interested in. The select statement for San Antonio will return 1.5 million rows times 8 columns times 32 bytes, or 384 MB of data. A good application server will cache this 384 MB of data, and if we issue the same select statement again in a few seconds or minutes we do not need to ask the database again. We issue a simple request to the database asking it if anything has changed since the last query. If we are running on a slow internet connection, as we find in our homes, we are typically running at 3 MB/second download speeds. To transfer all of this data will take us 128 seconds, or about two minutes. Not reading the data a second time saves us two minutes.
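
For reference, the restricted version of the query looks like the following; the column names follow the fields listed earlier and are illustrative.

select first_name, last_name, phone_number
from census
where city = 'San Antonio';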

The way that the database finds which 384 MB to return to the application is similar. It looks at all of the 81.7 GB that stores the census data and compares the state name to 'Texas', or the hex value corresponding to the state name. If the compare matches, that row is put into a response buffer and transmitted to the application server. If someone comes back a few seconds later and requests the information correlating to the city name 'San Antonio', the 81.7 GB is read from disk again and the 384 MB is pulled out to return to the application server. A smart database will cache the Texas data and recognize that San Antonio is a subset of Texas, pulling the data from memory rather than reading the 81.7 GB from disk a second time. This can easily be done by partitioning the data in the database, storing the Texas data in one file or disk location and the data correlating to California in another file or disk location. Rather than reading back 81.7 GB to find Texas data, we only need to read back 6.9 GB since it has been split out in storage. For a typical SCSI disk attached to a computer, we read data back at 2.5 GB/second. To read back all of the US data takes us 33 seconds; to read back all of the Texas data takes us 2.76 seconds. We basically save 30 seconds by partitioning our data. If we read the Texas data first and the San Antonio data second with our select statements, we can cache the 6.9 GB in memory and not have to perform a second read from disk, saving us yet another 33 seconds (or 3 seconds with partitioned data). If we know that we will be asking for San Antonio data on a regular basis, we set up an index or materialized view in the database so that we don't have to scan through the 6.9 GB of data; we read just the relevant 384 MB the first time and reduce our disk access time to 0.15 seconds. It is important to note that we have done two simple things that reduced our access time from 33 seconds to 0.15 seconds. First, we partitioned the data and the way that we store it by splitting it by state in the file system. Second, we created an index that helps us access the San Antonio data in the file associated with Texas without having to scan through all of the data. We effectively pre-sort the data and provide the database with an index. The cost of this is that every insert command that adds a new person in San Antonio requires not only updating the Texas table but also updating the index associated with San Antonio. In fact, any insert must check whether the data goes into the Texas partition and update the index at the same time, whether the information correlates to San Antonio or not, because the index might change if data is inserted or updated in the middle of the file associated with the Texas information.
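
A sketch of what that partition-plus-index layout could look like is shown below. It assumes the Partitioning option is licensed (Enterprise Edition High Performance or above in the cloud), and the column list is a hypothetical rendering of the census example used throughout this post.

-- Split the data by state so a Texas query only touches the Texas partition
create table census (
  first_name   varchar2(32),
  last_name    varchar2(32),
  street_addr  varchar2(32),
  city         varchar2(32),
  state        varchar2(32),
  zip_code     varchar2(32),
  email        varchar2(32),
  phone_number varchar2(32)
)
partition by list (state) (
  partition p_texas      values ('Texas'),
  partition p_california values ('California'),
  partition p_other      values (default)
);

-- A local index keeps the San Antonio lookups inside the Texas partition
create index census_city_ix on census (city) local;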

Our original question was how do we choose between Schema as a Service, Database as a Service, and Exadata as a Service. The first metric that we used was table size. If our data is greater than 25 MB, we can't use the free APEX service. If our data is greater than 50 GB, we can't use the paid APEX, or Schema as a Service. If we want to use features like compression or partitioning, we can't use Schema as a Service either unless we have sys access to the database. We can create indexes for our data to speed up requests, but we might or might not be able to set up compression or partitioning since these are typically features associated with the Enterprise Edition of the database. If we look at the storage limitations of Database as a Service, we can currently store 4.8 TB worth of data in the database. If we have more data than that we need to go to Exadata as a Service. The Exadata service comes in different flavors as well and allows you to store up to 42 TB with a quarter rack, 84 TB with a half rack, and 168 TB with a full rack. If you have a database larger than 168 TB, there are no solutions in the cloud that can store your data attached to an active database. You can back up your data to cloud storage but you can not have an active database attached to it.

If we look a little deeper into the Exadata there are multiple advantages to going with Exadata as a Service. The first and most obvious is that you are suddenly working on dedicated hardware. In most cloud environments you share processors with other users as well as storage. You do not get a dedicated bandwidth from processor to disk but must time share this with other users. If you provision a 16 core system, it will typically consume half of a 32 core system that has two sockets. This means that you get a full socket but have to share the memory and disk bandwidth with the next person running in the same server. The data read from the disk is cached in the disk controller's cache and your reads are optimized until someone else reads data from the same controller and your cached data gets flushed to make room. Most cloud vendors go with commodity hardware for compute and storage so they are not optimized for database but for general purpose compute. With an Exadata as a Service you get hardware optimized for database and you get all of the processors in the quarter, half, or full rack. There is no competing for memory bandwidth or storage bandwidth. You are electrically isolated from someone in the other quarter or half rack through the Infiniband switch. Your data is isolated on spindles of your own. You get the full 40 GB/second to and from the disk. Reading the 81.7 GB takes 2.05 seconds compared to 32.68 seconds through a standard SCSI disk controller. The data is partitioned and stored automatically so that when we ask for the San Antonio data, we only read back the 384 MB and don't need to read back all of the data or deal with the index update delays when we write the data. The read scans all 81.7 GB and returns the results in 0.01 seconds. We effectively reduce the 33 seconds it took us previously and dropped it to 10 ms.

If you want to learn more about Exadata and how and why it makes queries run faster, there are several good books, YouTube video channels, and web sites that cover it in depth.

The Exadata as a Service is a unique offering in the cloud. Amazon and Microsoft have nothing that compares to it. Neither company offers dedicated compute that is specifically designed to run a database in the cloud with dedicated disk and dedicated I/O channels. Oracle offers this service to users of the Enterprise Edition of the database, allowing them to replicate their on-premise data to the cloud, ingest the data into an Exadata in the cloud, and operate on the data and processes unchanged and unmodified in the cloud. You could take your financial data that runs on an 8 or 16 core system in your data center and replicate it to an Exadata in the cloud. Once you have the data there you can crunch on it with long running queries that would take hours on your in-house system. We worked with a telecommunications company years ago that was using an on-premise transportation management system to generate an inventory load list to put parts on their service trucks, work orders for the maintenance repair staff, and a driving list to route the drivers on the optimum path to cover the largest number of customers in a day. The on-premise system took 15-16 hours to generate all of this workload and was prone to errors and outages, requiring the drivers to delay their routes or parts in inventory to be shipped overnight for loading onto the trucks in the morning. Running this load on an Exadata dropped the analytics to less than an hour. This allowed trucks to be rerouted mid-day to higher profit customers or to handle high priority outages, and it allowed next day delivery of inventory between warehouses rather than rush orders. Reducing the analytics from 15 hours to less than an hour allowed an expansion of services as well as a higher quality of service to their customer base.

Not all companies have daily issues like this; some look for higher level processing once a quarter or once or twice a year. Opening new retail outlets, calculating taxes due, or provisioning new services that were purchased as Christmas presents are three examples of predictable, periodic events where consuming a larger footprint in the cloud makes more sense than investing in resources that sit idle most of the year in your data center. Having the ability to lease these services on a monthly or annual basis allows for better utilization of resources in your data center, reduces the overall spend of the IT department, and expands the capabilities of business units to do things that they normally could not afford.

Exadata as a Service is offered in a non-metered configuration at $40K per month for a quarter rack (16 cores and 144 TB of disk), $140K per month for a half rack (56 cores and 288 TB of disk), or $280K per month for a full rack (112 cores and 576 TB of disk). The same service is offered on a metered basis for $80K for a quarter rack, $280K for a half rack, and $560K for a full rack (in the same configuration as the non-metered service). One of the things that we recommend is that you analyze the cost of this service. Is it cheaper to effectively lease a quarter rack at $80K for a month and get the results that you want, effectively lease a quarter rack at $480K for a year, or purchase the hardware, database license, RAC licenses, storage cell licenses, and other optional components to run this in your data center. We will not dive into this analysis because it truly varies based on use cases, value to your company for the use case, and cost of running one of these services in your data center. It is important to do this analysis to figure out which consumption model works for you.

In summary, Exadata as a Service is a unique service that no other cloud vendor offers. Having dedicated hardware to run your database is unique for cloud services. Having hardware that is optimized for long, complex queries is unique as well. Exadata is one of the most popular hardware solutions offered by Oracle, and having it available on a monthly or annual basis allows customers to use the service at a much lower cost than purchasing a box, or a larger box, for their data center. Having Oracle manage and run the service frees up your company to focus on the business impact of the hardware and accelerated database rather than spending months administering the server and database. Tomorrow we will dive into Database as a Service and see how a generic database in the cloud has a variety of use cases and different cost entry points as well as features and functions.

APEX in Azure

Mon, 2016-05-09 02:07
Today we are going to look at what it takes to get Schema as a Service running in the Microsoft Azure Cloud. Our last two entries looked at Schema as a Service, or Application Express (APEX), running in the Oracle Public Cloud for free, $175, $900, or $2000/month, and running in the Amazon RDS Service for $50 - $2700/month. The idea behind Schema as a Service is that you are leasing a database instance on a compute server in the cloud. You get automated backup, http and https interfaces into the database, tools to load and unload data, interfaces to run ad-hoc or scripted queries, and REST apis to access your data. The pricing on the Oracle Public Cloud is based on how much storage you consume for your database. The cost on Amazon RDS is based on the compute power allocated to the database; storage is charged separately and the more you consume, the more you get charged.

Azure uses the Amazon model for pricing in that it charges based on a processor shape. Automated backup and the APEX interface/libraries are not included and require a separate step to install and configure the software on top of the Oracle database. The default operating system installation is Windows Server and you are billed monthly for this shape as well. Azure does not offer backup or any other part of a managed service; it only provides a virtual image with the Oracle database pre-installed on a Windows Server in the cloud. The pricing for this software is:

Shape                  Cores  RAM      Disk      Standard Edition       Enterprise Edition       Shape price
A0 Basic or Standard   0.25   0.75 GB  19 GB     $1.11/hr or $826/mo    $3.16/hr or $2,351/mo    $0.02/hr or $15/mo
A1 Basic or Standard   1      1.75 GB  224 GB    $1.11/hr or $826/mo    $3.16/hr or $2,351/mo    $0.09/hr or $67/mo
A2 Basic or Standard   2      3.5 GB   489 GB    $1.11/hr or $826/mo    $3.16/hr or $2,351/mo    $0.18/hr or $134/mo
A5 Standard            2      14 GB    489 GB    $1.11/hr or $826/mo    $3.16/hr or $2,351/mo    $0.33/hr or $246/mo
A3 Basic or Standard   4      7 GB     999 GB    $1.28/hr or $952/mo    $6.32/hr or $4,702/mo    $0.36/hr or $268/mo
A6 Standard            4      28 GB    999 GB    $1.28/hr or $952/mo    $6.32/hr or $4,702/mo
A4 Basic or Standard   8      14 GB    2,039 GB  $2.55/hr or $1,897/mo  $12.63/hr or $9,397/mo   $0.72/hr or $536/mo
A7 Standard            8      56 GB    2,039 GB  $2.55/hr or $1,897/mo  $12.63/hr or $9,397/mo   $1.32/hr or $982/mo
A8 Standard            8      56 GB    382 GB    $2.55/hr or $1,897/mo  $12.63/hr or $9,397/mo
A10 Standard           8      56 GB    382 GB    $2.55/hr or $1,897/mo  $12.63/hr or $9,397/mo
A9 Standard            16     112 GB   382 GB    $5.10/hr or $3,794/mo  $25.27/hr or $18,801/mo
A11 Standard           16     112 GB   382 GB    $5.10/hr or $3,794/mo  $25.27/hr or $18,801/mo
D1 Standard            1      3.5 GB   50 GB     $1.11/hr or $826/mo    $3.16/hr or $2,351/mo    $0.14/hr or $104/mo
D2 Standard            2      7 GB     100 GB    $1.11/hr or $826/mo    $3.16/hr or $2,351/mo    $0.28/hr or $208/mo
D11 Standard           2      14 GB    100 GB    $1.11/hr or $826/mo    $3.16/hr or $2,351/mo    $0.33/hr or $246/mo
D3 Standard            4      14 GB    200 GB    $1.28/hr or $952/mo    $6.32/hr or $4,702/mo    $0.56/hr or $417/mo
D12 Standard           4      28 GB    200 GB    $1.28/hr or $952/mo    $6.32/hr or $4,702/mo    $0.652/hr or $485/mo
D4 Standard            8      28 GB    400 GB    $2.55/hr or $1,897/mo  $12.63/hr or $9,397/mo   $1.12/hr or $833/mo
D14 Standard           16     112 GB   800 GB    $5.10/hr or $3,794/mo  $25.27/hr or $18,801/mo  $2.11/hr or $1,571/mo

Note that this is generic database pricing with compute shape pricing. It is not Schema as a Service pricing; we still need to layer APEX on top of the generic database installation. The cheapest option for deploying Schema as a Service is the A0 shape at $841/month or $1.13/hr for Standard Edition and $2,366/month or $3.18/hr for Enterprise Edition. We can basically rule that shape out, though: running an Oracle database on a single core with less than 1 GB of memory is unusable. Going to the A1 Basic or Standard shape increases the memory to 1.75 GB, which might work for a single schema, and the 70 GB of disk will store enough tables for most use cases.

Let's walk through the creation of an Oracle database on Azure. First we go to the Azure portal and search for the Oracle virtual machine by clicking on New.

We are looking for either the Enterprise Edition or Standard Edition of the database.

We select the Standard Edition and ask that it be provisioned.

Once we select the product, we select the shape that we will run this instance on. The recommended shape is relatively large, so we look at all shapes and select the A1 shape, primarily for cost reasons. In practice we should look at what we are trying to do and pick the right core count and memory footprint.

We select the network and storage models for this instance. We go with the defaults in this example rather than adding another storage instance since we are just looking for a small footprint.

Some things to note here. First, we have the option of a username and password or an ssh key to connect to the operating system. Second, we are never presented with any information about the operating system. Based on the documentation that we read earlier, I assumed this would be Windows Server; it turns out that it is actually Oracle Enterprise Linux 6.7. Third, we are never asked for information about the database. We are not asked about the SID, the password for sys, or the ports to open up or use to connect to the database. It turns out that the database software is installed in the /u01 directory but no database is created. You still need to run dbca to create a database instance and start a listener. There are other OS options available to install the database on; we theoretically could have selected Windows 2012, but these options did not come up in our search.

It took a few minutes to start up the database virtual machine. We can look up the details on the instance to find the ip address to connect to and use putty or ssh to connect to the instance.

When we log in we can look at the installation directory and notice that everything is installed in /u01. When we look at /etc/oratab we notice that nothing is configured or installed. We will need to run dbca to create a database instance. We will then need to download and install APEX to configure Schema as a Service.

In summary, we can install and run an Oracle database on Azure, but the installation is lacking a bit. This is more of an Infrastructure as a Service with a binary pre-installed but not configured. It is not Database as a Service, and it comes up short when we compare it to Schema as a Service. To get Schema as a Service we need to download software, install it, change the network configuration, and update the virtual machine network (we did not show this). Microsoft has a good tutorial on the steps needed to create the database once the virtual machine is installed. You do get root access to the operating system. You do get sys access to the database. You do get file system access through the operating system. The documentation says that the service is on Windows but it got provisioned on Linux; we might have done something wrong, or the documentation is wrong. The service is priced as a generic database starting at $800/month or more based on the shape you select. This installation is not DBaaS or PaaS. Backups are not done automatically for you. Patches are not configured or installed. You basically get an operating system with a binary installed. The database is not created and ports are not configured to allow you to connect across the internet.

Apex in Amazon AWS

Fri, 2016-05-06 02:07
Today we are going to look at running Schema as a Service using Amazon AWS as your IaaS foundation. Yesterday we looked at Schema as a Service using Oracle DBaaS. To quickly review, you can run a schema in the cloud using apex.oracle.com for tables up to 25 MB for free, or cloud.oracle.com for tables of 5 GB, 20 GB, or 50 GB for a monthly fee. You do not get a login to the operating system, database, or file system but do everything through an html interface. You can query the database through an application that you write in the APEX interface or through REST api interfaces, both of which are accessible through http or https. Today we are looking at what it takes and how much it costs to do the same thing using Amazon AWS.

Amazon offers a variety of database options as a managed service. This is available through the Amazon RDS Console.

If you go to the RDS Getting Started page you will see that you can provision

  • MySQL
  • Oracle
  • SQL Server
  • Maria DB
  • or an Amazon custom database, Aurora

We won't go into a compare and contrast of the different databases in this blog entry; instead we will go into the Oracle RDS offerings and look at what you get that compares to the Schema as a Service offered by Oracle.

The Oracle database offerings that you get from Amazon RDS are

  • Oracle Standard Edition One
  • Oracle Standard Edition Two
  • Oracle Standard Edition
  • Oracle Enterprise Edition
Note that we can launch any version in EC2, but we are looking for a platform as a service where the database is pre-configured and some management is done for us, like patching, backups, operating system maintenance, failure detection, and service restarting.

You can launch an 11g or 12c version of the Oracle database, but it is important to note that through RDS you do not get access to the operating system, file system, or the sys/system users. There is an elevated user that lets you perform a limited list of functions, but not all options are available to this elevated user. Some features are also not enabled in the 12c instance

  • In-Memory
  • Spatial
  • multi-tenant or pluggable databases
  • Real Application Clusters (RAC)
  • Data Guard / Active Data Guard
  • Connection to Enterprise Manager
  • Automated Storage Management
  • Database Vault
  • Java libraries
  • Locator services
  • Label Security
In the 11g instance the above list plus the following features are not supported
  • Real Application Testing
  • Streams
  • XML DB
The following roles and privileges are not provided to users in Amazon RDS
  • Alter database
  • Alter system
  • Create any directory
  • Drop any directory
  • Grant any privilege
  • Grant any role

Before we dive into the usage of Amazon RDS, let's talk pricing and licensing. The only option that you have for a license included with RDS is the Standard Edition One license type. To figure out the cost, we must look at the sizes that we can provision as well as the RDS cost calculator. We start at the AWS console, go to the RDS console, and select Oracle SE1 as the instance type.

If we select the license-included License Model we get to look at the shapes that we can deploy as well as the versions of the database.

We can use the cost calculator in conjunction with this to figure out the monthly cost of deploying the service. For our example we selected 11.2.0.4 v7, db.t2.micro (1 vCPU, 1 GB RAM), and 20 GB of storage. For this shape we find that the monthly cost will be $25.62. We selected the 11.2.0.4 version because this is the only 11g option available to us for the SE1 license-included selection; we could also have selected 12.1.0.1. If we select any other version we must bring our own license to run on AWS. It is important to look at the outbound transfer rate because this cost is sometimes significant. If we add 20 GB of outbound traffic the price increases to $26.07, which is not significant. This says that we can back up our entire database offsite once a month and not have to pay a significant amount to get our data off RDS.

It is important to look at the shape options that we have for the different database versions. We should also look at the cost associated with it. For 11g we have

  • db.t2.micro (1 vCPU, 1 GB) - $25.62/month
  • db.t2.small (1 vCPU, 2 GB) - $51.24/month
  • db.t2.medium (2 vCPU, 4 GB) - $102.48/month
  • db.t2.large (2 vCPU, 8 GB)- $205.70/month
  • db.m4.large (2 vCPU, 8 GB)- $300.86/month
  • db.m4.xlarge (4 vCPU, 16 GB)- $602.44/month
  • db.m4.2xlarge (8 vCPU, 32 GB)- $1324.56/month
  • db.m4.4xlarge (16 vCPU, 64 GB) - $2649.11/month
  • db.m3.medium (1 vCPU, 3.75 GB) - $153.72/month
  • db.m3.large (2 vCPU, 7.5 GB) - $307.44/month
  • db.m3.xlarge (4 vCPU, 15 GB) - $614.88/month
  • db.m3.2xlarge (8 vCPU, 30 GB) - $1352.74/month
  • db.r3.large (2 vCPU, 15 GB) - $333.06/month
  • db.r3.xlarge (4 vCPU, 30 GB) - $666.12/month
  • db.r3.2xlarge (8 vCPU, 61 GB) - $1465.47/month
  • db.r3.4xlarge (16 vCPU, 122 GB) - $2930.93/month
  • db.m2.xlarge (2 vCPU, 17 GB) - $409.92/month
  • db.m2.2xlarge (4 vCPU, 34 GB) - $819.84/month
  • db.m2.4xlarge (8 vCPU, 68 GB) - $1803.65/month
  • db.m1.small (1 vCPU, 3.75 GB) - $84.18/month
  • db.m1.medium (2 vCPU, 7.5 GB) - $168.36/month
  • db.m1.large (4 vCPU, 15 GB) - $336.72/month
  • db.m1.xlarge (8 vCPU, 30 GB) - $673.44/month
For the 12c version we have

  • db.m1.small (1 vCPU, 3.75 GB) - $84.18/month
  • db.m3.medium (1 vCPU, 3.75 GB) - $153.72/month
  • db.m3.large (2 vCPU, 7.5 GB) - $307.44/month
  • db.m3.xlarge (4 vCPU, 15 GB) - $614.88/month
  • db.m3.2xlarge (8 vCPU, 30 GB) - $1352.74/month
  • db.m2.xlarge (2 vCPU, 17 GB) - $409.92/month
  • db.m2.2xlarge (4 vCPU, 34 GB) - $819.84/month
  • db.m2.4xlarge (8 vCPU, 68 GB) - $1803.65/month
  • db.m1.medium (2 vCPU, 7.5 GB) - $168.36/month
  • db.m1.large (4 vCPU, 15 GB) - $336.72/month
  • db.m1.xlarge (8 vCPU, 30 GB) - $673.44/month

    If we want to create the database, we can select the database version (11g), the processor size (smallest, just to be cheap for demo purposes), and storage. We define the SID, the username and password for the elevated user, and click next.

    We then confirm the selections and backup schedule (scroll down to see), and click on Launch.

    When we launch this, the system shows that the inputs were accepted and the database will be created. We can check on the status by going to the RDS console.

    It takes a few minutes to provision the database instance, 15 minutes in our test. When the creation is finished we see available rather than creating for the status.

    Once the instance is created we can connect to the database using the Oracle connection instructions: connect using sqlplus installed on a local machine pointing at the remote database we just created, use the aws command line tools to get the status (aws rds describe-db-instances --headers), or connect with SQL Developer to the IP address, port 1521, and the user and password we specified. We chose to open up port 1521 to the internet during the install, which is not necessarily best practice.
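
    As a quick illustration, a local SQL*Plus session against the new instance would look something like the following; the endpoint, user, and SID are hypothetical placeholders for the values shown in the RDS console, and SQL*Plus will prompt for the password.

    -- From a local SQL*Plus client; endpoint, user, and SID are placeholders
    connect dbadmin@//mydb.xxxxxxxx.us-east-1.rds.amazonaws.com:1521/ORCL
    select sysdate from dual;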

    Note that we have fallen short of Schema as a Service. We have Database as a Service at this point. We will need to layer Application Express on top of this to get Schema as a Service. We can install APEX 4.1.1 on the 11g instance that we just created by following the installation instructions. Note that this is a four step process followed by spinning up an EC2 instance and installing optional software to run a listener, because the APEX listener is not supported on the RDS instance. We basically add $15-$20/month to spin up a minimal EC2 instance, install the listener software, and follow the nine step installation process to link the listener to the RDS instance.

    The installation and configuration steps are similar for 12c. We can provision a 12c instance of the database in RDS, spin up an EC2 instance for the listener, and configure the EC2 instance to point to the RDS instance. At this point we have the same experience that we have as Schema as a Service with the Oracle DBaaS option.

    In summary, we can provision a database into the Amazon RDS. If we want anything other than Standard Edition One, we need to bring our own license at $47.5K/two cores for Enterprise Edition or $17.5K for Standard Edition and maintain our annual support cost at 22%. If we want to use a license provided by Amazon we can but are limited to the Standard Edition One version and APEX 4.1.1 with 11.2.0.4 or APEX 4.2.6 with 12.1.0.1. Both of these APEX versions are a major version behind the 5.0 version offered by Oracle through DBaaS. On the positive side, we can get a SE One instance with APEX installed for about $50/month for 11g or $100/month for 12c which is slightly cheaper than the $175/month through Oracle. The Oracle product is Enterprise Edition vs Standard Edition One on AWS RDS and the Oracle version does come with APEX 5.0 as well as being configured for you upon provisioning as opposed to having to perform 18 steps and spin up an EC2 instance to act as a listener. It really is difficult to compare the two products as the same product but if you are truly only interested in a simple query engine schema as a service in the cloud, RDS might be an option. If you read the Amazon literature, switching to Aurora is a better option but that is a discussion for another day.

Application Express or Schema as a Service

Thu, 2016-05-05 07:49
    Today we are going to dive headlong into Schema as a Service. This is an interesting option offered by Oracle. It is a unique service that is either free or can cost as much as $2K/month. As a quick review, you get the Oracle database through an http interface for:
    • 10 MB storage - free
    • 25 MB storage - free
    • 5 GB - $175/month
    • 20 GB - $900/month
    • 50 GB - $2000/month
    When you consume this service you don't get access to an operating system. You don't get access to a file system. You don't get access to the database from the command line. All access to the database is done through an http or https interface. You can access the management console to load, back up, and query data in the database. You can load, back up, and update applications that talk to the data in the database. You can create a REST api that allows you to read and write data in your database as well as run queries against the database. You get access to a single processor with 7.5 GB of RAM running the 12c version of the Oracle database inside a pluggable container, isolated from other users sharing this processor with you. Microsoft offers an Express Edition of SQL Server and a table storage service that let you do something similar, and Amazon offers a lightweight database that does simple table lookups. The two key differences between those products and the Oracle offering are that all access to the Oracle Schema as a Service is done through http or https, and that applications are written to run inside the database rather than on a separate server as is done with the other similar cloud options. Some say this is an advantage, some say it is a disadvantage.

    We covered this topic from a different angle a month ago in converting excel to apex and printing from apex. Both of these blog entries talk about how to use Schema as a Service to solve a problem. Some good reference books on Schema as a Service are also available; I typically use a Safari Books subscription to get these books on demand and on my iPad for reading on an airplane.

    When we login to access the database we are asked to present the schema that we created, a username, and a password. Note in this example we are either using the external free service apex.oracle.com or the Oracle corporate service for employees apex.oraclecorp.com. The two services are exactly the same. As an Oracle employee I do not have access to the public free service and am encouraged to use the internal service for employees. The user interface is the same but screen shots will bounce between the two as we document how to do things.

    Once we login we see a main menu system that allows us to manage applications, manage tables in the database, do team development, and download and install packaged applications.

    The Object Browser allows us to look at the data in the database. SQL Commands allows us to make queries into the database. SQL Scripts allows us to load and save sql commands to run against the database. Utilities allows us to load and unload data. The RESTful Services section allows us to define http interfaces into the database. If we look at the Object Browser, we can look at table definitions and the data stored in tables.

    We can use the SQL Commands tab to execute select statements against a table. For example, if we want to look at part number B77077 we can select it from the pricelist by matching the column part_number. We should get back one entry since there is only one part for this part number.

    If we search for part number B77473 we get back multiple entries with the same part number. This search returns six lines of data, with more data in other columns than the previous select statement.
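
    As typed into the SQL Commands tab, the two lookups described above look like this:

    select * from pricelist where part_number = 'B77077';  -- returns a single row
    select * from pricelist where part_number = 'B77473';  -- returns six rows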

    The SQL Scripts allows you to load scripts to execute from your laptop or desktop. You can take queries that have run against other servers and run them against this server.

    Up to this point we have looked at how to write queries, run queries, and execute queries. We need to look at how to load data so that we have something to query against. This is done in the Utilities section of the Schema as a Service. Typically we start with a data source either as an XML source or an Excel spreadsheet. We will look first at taking an Excel spreadsheet like the one below and importing it into a table.

    Note that the spreadsheet is well defined and headers exist for the data. We do have some comments at the top that we need to delete so that the first row becomes the column names in our new table. Step one is to save the sheet as a comma separated value file. Step two is to edit this file and delete the comment and blank lines. Step three is to upload the file into the Schema as a Service web site.

    At this point we have a data source loaded and ready to drop into a table. The user interface allows us to define the column types and tries to figure out if everything is character strings, numbers, or dates. The tool is good at the import but typically gets character lengths wrong and throws exceptions on specific rows. If this happens you can either manually enter the data or re-import the data into an existing table. Doing this can potentially duplicate data, so deleting the new table and re-importing into a fresh table might be a good idea. You have to play at this point to import your data and get the right column definitions so that all of your data loads.

    Alternatively we can import xml data into an existing table. This is the same process of how we backup our data by exporting it as xml.

    At this point we have loaded data using a spreadsheet or csv file and an xml file. We can query the database by entering sql commands or loading sql scripts. We could load data from a sql script if we wanted, but larger amounts of data need to be imported from a file. Unfortunately, we can not take a table from an on-premise database or an RMAN backup and restore it into this database. We can't unplug a pdb and plug it into this instance. This service does have limitations, but for free or less than $200 per month, the service provides a lot of functionality.
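
    For very small data sets, a sql script load is just a series of inserts like the sketch below; the column definitions and values are hypothetical, since only part_number and a description column are mentioned in this example.

    -- Hypothetical pricelist definition and a couple of sample rows
    create table pricelist (
      part_number  varchar2(16),
      description  varchar2(256),
      list_price   number
    );

    insert into pricelist values ('B77077', 'sample description', 100);
    insert into pricelist values ('B77473', 'sample description', 250);
    commit;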

    To read and display our data, we must create an application. To do this we go into the application development interface of Schema as a Service.

    We select the Application Builder tab at the top and click on the Create icon. This takes us to a selection of what type of application to build. We are going to build a desktop application since it has the most functionality. We could have just as easily selected the mobile option, which formats displays for a smaller screen.

    We have to select a name for our application. In this example we are calling it Sample_DB since we are just going to query our database and display contents of a table.

    We are going to select one page. Note that we can create multiple pages to display different information or views into a table. In our previous blog entry on translating excel to apex we created a page to display the cost of archive and the cost of storage on different pages. In this example we are going to create one page and one page only.

    If we have shared components from other apps or want to create libraries that can be called from other apps, we have the option at this point to define that. We are not going to do this; we will just create a basic application to query a database.

    We can create a variety of authorization sources to protect our data. In this example we are going to allow anyone to read and write our table. We select no authorization. We could use an external source or look at the user table in the database to authenticate users. For this example, we will leave everything open for our application.

    We get a final confirmation screen (not shown) and create the application. When the create is finished we see the following screen that lets us either run the application or edit it.

    If we click on the Home icon we can edit the page. This screen is a little daunting. There are too many choices and things that you can do. There are different areas, breadcrumbs for page to page navigation, and items that you can drop into a screen. In this example we are going to add a region by hovering the mouse over the Content Body and right clicking the mouse. This allows us to create a new region in the body of our page.

    Note that a new region is created and is highlighted in the editor. We are going to edit the content type. We have a wide variety of options. We could type in static text and this basically becomes a static web page. Note that we could create a graph or chart. We could create a classic report. We could create a form to submit data and query a table. We will use the interactive report because it allows us to enter sql for a query into our table.

    In this example we will enter the select statement in the query box. We could pop this box out into another window for full screen editing. For our example we are doing a simple query of our table with select * from pricelist.

    When we click the run button at the top right we execute this code and display the result in a new window. We can sort this data, and we can change the query and click Go. This is an interactive form to read data from our table. If we wanted to restrict the user from reading all of the data, we would have selected a standard report rather than an interactive report.

    The final part of our tutorial is creation of a REST api for our data. We would like to be able to go to a web page and display the data in the table. For example, if we want to look at the description of part number B77077 it would be nice to do it from a web page or get command at the command line. To do this we go to the SQL Workshop tab and click the RESTful Service icon at the top right.

    Again, this screen is a little daunting. We get a blank screen with a create button. Clicking the create button takes us to a screen where we need to enter information that might not be very familiar.

    The screen we see is asking us to enter a name for our service, a template, and a resource handler. Looking at this for the first time, I am clueless as to what this means. Fortunately, there is an example on how to enter this information if you scroll down and click on the Example button.

    If we look at the example we see that the service name is the header that we will hit from our web page.

    In our example we are going to create a REST api where we expose the pricelist. In this example we call the service cloud. We call the resource template pricelist and allow the user to pass in a part_number to query. In the resource handler we add a GET handler that does a select from the table. We could use the part number that is passed in, but for simplicity we ignore it and return all rows in the table. Once we click save, we have exposed our table to a web query with no authentication.
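
    A sketch of the GET resource handler source, as described above, is just the select statement; the commented line shows the form that would actually use the part_number bind variable from the URI template.

    -- Resource handler source for GET (returns every row, as in the text)
    select * from pricelist;
    -- select * from pricelist where part_number = :part_number;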

    Once we have created our REST service we can query the database from a web browser using a url of the form apex server/pls/apex/(schema name)/pricelist/(part number). In this example we go to apex.oraclecorp.com/pls/apex/parterncloud/pricelist/B77077. It executes the select statement and returns all rows in the table in JSON format.

    In summary, we are able to upload, query, and display database data using http and https protocols. We can upload data in xml or csv format. We can query the database using web based tools or REST interfaces. We can display data either by developing a web based application or by pulling the data from a REST interface in JSON format. This service is free if your database is small. For a larger database we can pay for the service as well as host the application that reads the data. We also have the option to read the data from a REST interface and pull it into an application server at a different location. We did not look at uploading data with a PUT interface through the REST service, but we could have done that as well. Up next, how do we implement this same service in AWS or Azure.

Database as a Service

Wed, 2016-05-04 16:38
    Today we are going to dive into Database as a Service offered by Oracle. This is the same product offered by Oracle as a perpetual processor license or perpetual named user license for running database software in your data center. The key difference is that the database is provisioned onto a Linux server in the cloud, and rather than paying $47,500 for a processor license and 22% annually after that, you pay for the database service on an hourly or monthly basis. If you have a problem that needs only a few weeks, you pay for the service for a few weeks. If you have a problem that takes a very large number of processors but for a very short period of time, you can effectively lease the large number of processors in the cloud and purchase a much smaller number of processors in your data center. Think of a student registration system. If you have 20K-30K students that need to log into a class registration system, you need to size this server for the peak number of students going through the system. In our example, we might need an 8 core system to handle the load during class registration. Outside the two or three weeks of registration, this system sits idle at less than 10% utilization because it is only used to record and report grades during the semester. Rather than paying $47.5K times 8 cores times 0.5 for an x86 or Sparc server ($190K), we only have to pay $47.5K times 2 cores times 0.5 for x86 or Sparc cores ($47.5K) and lease the additional processors in the cloud for a month at $3K/core/month ($24K). We effectively reduce the cost from $190K to $71.5K by using the cloud for the peak period. Even if we do this three times during the year the price is $119.5K, which is a cost savings of $70.5K. In the second year we would be required to pay $41.8K in support cost for the larger server; by using the smaller server we drop the support cost to $10.5K. This effectively pays for leasing a third of the cloud resources, just by using a smaller server and bursting to the cloud for peak utilization.

    Now that we have looked at one of our use cases and the cost savings associated with using the cloud for peak utilization and reducing the cost of servers and software in our data center, let's dive into the pricing and configuration of Database as a Service (DBaaS) offered by Oracle in the public cloud services. If we click on the Platform -> Database menu we see the following page.

    If we scroll down to the bottom we see that there are effectively three services that we can use in the public cloud. The first is Database Schema as a Service. This allows you to access a database through a web interface and write programs to read and present data to the users. This is the traditional Application Express interface or APEX interface that was introduced in Oracle 9. This is a shared service where you are given a database instance that is shared with other users. The second service is Database as a Service. This is the 11g or 12c database installed on a Linux installation in the cloud. This is a full installation of the database with ssh access to the operating system and sqlplus access to the database from a client system. The third service is Exadata as a Service. This is the Oracle database on dedicated hardware that is optimized to run the Oracle database.

    The Schema as a Service is also known as Application Express. If you have never played with apex.oracle.com, click on the link and register for a free account. You can create an instance, a database schema, and store up to 10 MB or 25 MB of data for free. If you want to purchase a larger storage amount it is sold in 5 GB, 20 GB, or 50 GB increments.

    The 10 or 25 MB instance is free. The 5 GB instance is $175/month. The 20 GB is $900/month, and the 50 GB is $2,000/month.

    Tomorrow we will dive a little deeper into Schema as a Service. In summary, this is a database instance that can contain multiple tables and has an application development/web front end that allows you to access the database. You cannot attach with sqlplus. You cannot attach with port 1521. You cannot put a Java or PHP front end in front of your database and use it as a back end repository. You can expose database data through applications and REST api interfaces. This instance is shared on a single computer with other instances. You can have multiple instances on the same computer and the login gives you access to your applications and your data in your instance.

    The Database as a Service (DBaaS) is slightly different. With this you are getting a Linux instance that has been provisioned with a database. It is a fully deployed, fully provisioned database based on your selection criteria. There are many options when you provision DBaaS. Some of the options are virtual vs full instance, 11g vs 12c, standard edition vs enterprise edition vs enterprise edition high performance vs enterprise edition extreme performance. You need to provide an expected data size, state whether you plan on backing up the data, and provide a cloud object repository if you do. You need to provide ssh keys to log in as oracle or opc/root to manage the database and operating system. You also need to pick a password for the sys/system user inside the database. Finally, you need to pick the processor and memory shape that will run the database. All of these options have a pricing impact. All of these options affect functionality. It is important to know what each of these options means.

    Let's dive into some of these options. First, virtual vs full instance. If you pick a full instance you will get an Oracle Enterprise Linux installation that has the version of the database that you requested fully installed and operational. For standard installations the storage uses the logical volume manager and is provisioned across four file systems. The /u01 file system is the ORACLE_HOME. This is where the database binaries are installed. The /u02 file system is the +DATA area. This is where table extents and table data are located. The /u03 file system is the +FRA area. This is where backups are dropped by the RMAN command, which runs automatically every night for incremental backups and at 2am on Sunday morning for a full backup. You can change the times and backup configurations with command line options. The /u04 area is the +RECO area. This is where change logs and other log files are dropped. If you are using Data Guard to replicate data to or from another database, this is where the change logs are found.
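
    A compact view of that layout (a sketch only; the actual sizes depend on the shape and storage you pick when provisioning):

    df -h /u01 /u02 /u03 /u04   # the four file systems on a full DBaaS instance
    # /u01  ORACLE_HOME - database binaries
    # /u02  +DATA       - table extents and table data
    # /u03  +FRA        - RMAN backup area (nightly incrementals, full backup Sunday morning)
    # /u04  +RECO       - change logs, used by Data Guard replication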

    If you pick a virtual instance you basically get a root file system running Oracle Enterprise Linux with a tar ball that contains the Oracle database. You can mount file systems as desired and install the database as you have it installed in your data center. This configuration is intended to mirror what you have on-premise so you can test patches and new features. If you put everything into /u01 on-premise, then install everything that way here. If you put everything in the root file system, you have the freedom to do so even though this is not the recommended best practice.

    The question that you are not asked when you try to create a DBaaS instance is whether this service is metered or non-metered. This question is asked when you create your identity domain. If you request a metered service, you have the flexibility to select the shapes that you want and whether you are billed hourly or monthly. The rates are determined by the processor shape, amount of memory, and which database option you select (standard, enterprise, high performance, or extreme performance). More on that later. With the metered option you are free to stop the database (but not delete it) and retain your data. You suspend the consumption of the database license but not the compute and storage. This is a good way of saving a configuration for later testing and not getting charged for using it. Think of it as having an Uber driver sit outside the store but not charge you to sit there. When you get back in the car the charge starts. A better analogy would be Car2Go. You can reserve a car for a few hours and drive it from Houston to Austin. You park the car in the Car2Go parking slot next to the convention center and don't pay for parking. You come out at the end of your conference, swipe your credit card, and drive the car back to Houston. You only get charged for the car when it is between parking lots. You don't get charged for it while it is parked in the reserved slot. You pay a monthly charge for the service (think of compute and storage) at a much lower rate. Think of a non-metered service as renting a car from a car rental agency: you pay for the car that they give you and it is yours until you return it. You keep paying for the car while you are at your convention, unlike with Car2Go, and you have to pay for parking at the hotel or convention center. You can't decide half way into your trip that you really need a truck instead of a car, or a mini-van to hold more people, and change out cars. The rental company will end your current agreement and start a new one with the new vehicle. Non-metered services are similar. If you select an OC3M shape then you can't upgrade it to an OC5 to get more cores. You can't decide that you need diagnostics and tuning and upgrade from Enterprise Edition to Enterprise Edition High Performance. You get what you started with and have 12 months to consume the services reserved for you.

    The choice of 11g or 12c is a relatively simple one. You get 11.2.0.4 running on Oracle Enterprise Linux 6.6 or you get 12.1.0.2 running on Oracle Enterprise Linux 6.6. This is one of those binary questions. You get 11g or 12c. It really does not affect any other question. It does affect features, because 12c has more features available to it, but the choice itself is simple. Unfortunately, you can't select 11.2.0.3 or 10.whatever or 9.whatever. You get the latest running version of the database and have an option to upgrade to the next release when it is available or not upgrade. Upgrades and patches are applied after you approve them.

    The next choice is the type of database. We will dive into this deeper in a couple of days. The basics are that you pick Standard Edition or Enterprise Edition. You have the option of picking just the base Enterprise Edition with encryption only, most of the options with the High Performance edition, or all of the options with the Extreme Performance edition. The difference between High Performance and Extreme Performance is that Extreme Performance includes Active Data Guard, the In-Memory option, and Real Application Clusters. Again, we will dive into this deeper in a later blog entry.

    The final option is the configuration of the database. I wanted to include a screen shot here but the main options that we look at are the CPU and memory shape which dictates the database consumption cost as well as the amount of storage for table space (/u02) and backup space (/u03 and /u04). There are additional charges above 128 GB for table storage and for backups. We will not go into the other options on this screen in this blog entry.

    In summary, DBaaS is charged on a metered or un-metered basis. The un-metered option is lower cost but less flexible. If you know exactly what you need and when you need it, it is the better option. Costs are fixed. Expenses are predictable. If you don't know what you need, the metered service might be better. It gives you the option of starting and stopping different processor counts, shutting off the database to save money, and selecting different options to test out different features. We will do a blog in a few days analyzing the details on cost. Basically, the database can be mentally budgeted as $3K/OCPU/month for Enterprise Edition, $4K/OCPU/month for High Performance, and $5K/OCPU/month for Extreme Performance. Metered options typically cross over at 21 days: if you use the metered service for more than 21 days in a month your charges will exceed these amounts, and if you use it for less, it will cost less.
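
    As a rough budgeting sketch using those rule-of-thumb monthly rates (the core count and duration below are hypothetical; your shape and edition will change the numbers):

    ocpus=4; months=6
    echo "Enterprise Edition:  \$$((3000 * ocpus * months))"   # $72,000
    echo "High Performance:    \$$((4000 * ocpus * months))"   # $96,000
    echo "Extreme Performance: \$$((5000 * ocpus * months))"   # $120,000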

    The Exadata as a Service is a special use case of Database as a Service. In this service you are getting a quarter, half, or full rack of hardware that is running the database. You get dedicated hardware that is tuned and optimized to run the Oracle database. Storage is dedicated to your compute nodes and no one else can use these components. You get 16, 56, or 112 processors dedicated to your database. You can add additional processors to get more database power. This service is available in a metered or non-metered option. All of the database options are available with this product. All of the processors are clustered together and you can run one or many database instances on this hardware. With the 12c option you get multi-tenant features so that you can run multiple instances and manage them with the same management tools, while giving users full access to their own instance but not to other instances running on the same system.

    Exadata cost for metered services

    Exadata cost for non-metered services

    In summary, there are two options for database as a service. You can get a web based front end to a database and access all of your data through http and https calls. You can get a full database running on a Linux server or Linux cluster that is dedicated to you. You can consume these services on an hourly, monthly, or yearly basis. You can decide on less expensive or more expensive options as well as how much processor, memory, and storage you want to allocate to these services. Tomorrow, we will dive a little deeper into APEX, or Schema as a Service, and look at how it compares to services offered by Amazon and Azure.

    Intro to PaaS

    Tue, 2016-05-03 16:44
    Today we are going to move up the stack. We will first focus on the Oracle solutions talking about the different platform as a service offerings. It is important to spend a little time reviewing this layer because what one company calls PaaS, another calls SaaS. The best way to get started is to go to cloud.oracle.com and look at the pull downs at the top of the screen. We see Infrastructure, Platform, and Applications.

    When we pull down the Platform menu we see that there are different areas that we can dive into.

    Data management is the first area that we will review. This is basically a way to aggregate and look at data. We can store data in a database, back up on-premise databases into the cloud, store data in NoSQL repositories, and do analytics on a variety of data with the Big Data Preparation and Big Data services. All of these involve pulling data into a repository of some type and performing queries against the repository. The key difference is the way that the data is stored, how we can ask questions, and the results that we get back. At this point we will not dive into any of these deeply but at a later point we will dive deep into the database and database backup.

    The Application Development area moves farther away from the technology of storing data and closer to how we present data to users. The Java platform, for example, allows us to do things like creating a shopping cart or hosting more complex applications in a Java repository or container. The Mobile Cloud Service allows us to dive into existing applications and present a user interface to iPhones, Android phones, and tablets. The idea is to customize existing web and fat clients into a mobile format that can be consumed on mobile devices. The Messaging Cloud Service is a messaging protocol that allows for transactions in the cloud. If you are looking at connecting different cloud services together it allows you to serialize the communication between vendors for a true transactional experience. The Application Container Cloud is a lightweight Java container allowing you to upload and run Java applications but without access to the operating system. This is a shared multi-tenant version of a WebLogic server. The Developer Cloud Service is a DevOps integration for the Java and Database services. This service is an aggregation of public domain components used to develop microservices at the database or Java layer. The Application Builder Cloud Service is a cloud based REST api development interface allowing you to integrate with application software in the Oracle Cloud as well as other clouds. The API Catalog is a way of publishing the REST apis that you have and exposing them to your customers.

    The Content and Process Cloud Services are an aggregation of services that address group communications as well as business process flow. The Documents Cloud Service is a way of file sharing on the web. The Process Cloud Service is an extension that allows you to launch business processes (think Business Process Manager or BPM) in the cloud. The Sites Cloud Service is a web portal interface that takes documents and processes and aggregates them into a single cloud site allowing you to take a wiki like presentation but put business processes into the presentation. The Social Network Cloud Service allows you to integrate social network services like Facebook and Twitter into your web presence. It allows you to integrate these services as well as search these repositories for information relating to your company.

    The Business Analytics part of Platform services provides data visualization and analytic tools as well as data aggregation utilities. The Business Intelligence component is the traditional BI package that allows users to create custom queries into your database. The Big Data Preparation allows you to aggregate data from a variety of sources into a Big Data repository. The Big Data Discovery allows you to look at your data in a variety of ways and generate reports based on your data and views of data. The Data Visualization Cloud Service allows you to view and analyze your data from different perspectives. This is similar to the BI and Big Data but looks at data slightly differently. The Internet of Things Cloud Service allows you to aggregate monitoring and measuring devices into a repository.

    The Cloud Integration part of Platform services is the traditional data aggregation tooling for other repositories. The Integration Cloud Service allows you to aggregate traditional SaaS vendors to unify fields like how a customer is defined or what data elements are incorporated into a purchase order. The SOA Cloud Service is an implementation of the Oracle SOA Suite in the cloud. The GoldenGate Cloud Service is an implementation of the Oracle GoldenGate software that allows you to take data from different databases and synchronize the different repositories independent of the database vendor. The Internet of Things Cloud Service is the same service listed in the Business Analytics section mentioned before.

    The Cloud Management part of Platform services allows you to take the log files that you have inside your data center and analyze them for a variety of purposes. You can aggregate your log files into the Log Analytics Cloud Service to look for patterns, intrusion attempts, and problems or issues with services. The IT Analytics Cloud Service looks at log files for trends like disks filling up and processors being used or not used appropriately. The Application Performance Cloud Service looks at log files to show how applications are performing rather than how individual systems or components are working.

    In summary, we looked at an overview of the Platform as a Service offerings from Oracle. Unfortunately, the variety of topics is too great for one blog. We did a high level overview of these services. In upcoming blogs we will dive deeper into each of these services and look at not only what they are but how they work and how to provision these services. We will also compare and contrast how these services compare to services offered by Amazon and Azure as we dive into each service.

    storage cloud appliance in the cloud

    Mon, 2016-05-02 10:46
    Last week we focused on getting infrastructure as a service up and running. I wanted to move up the stack and talk about platform as a service but unfortunately, I got distracted with yet another infrastructure problem. We were able to install the storage cloud appliance software in a virtual machine, but how do you install this in a compute cloud instance? This brings up two issues. First, how do you run a Linux 7 - 3.10 kernel in the Oracle Compute Cloud Service? Second, how do you connect and manage this service, both from an admin perspective and as a client from another compute engine in the cloud service?

    Let's tackle the first problem. How do you spin up a Linux 7 - 3.10 kernel in the Oracle Compute Cloud Service? If we look at the compute instance creation we can see which images we can boot from.

    There is no Linux 7 - 3.10 kernel in the list, so we need to download and import an image that we can boot from. Fortunately, Oracle has a good tutorial on importing a bootable image. If we follow these steps, we need to first download a CentOS 7 bootable image from cloud.centos.org. The cloud image that we use is CentOS-7-x86_64-OracleCloud.raw.tar.gz. We first download this to a local directory then upload it to the compute cloud image area. This is done by going to the compute console and clicking on the "Images" tab at the top of the screen.

    We then upload the tar.gz file that is a bootable image. This allows us to create a new storage instance that we can boot from. The upload takes a few minutes and once it is complete we need to associate it with a bootable instance. This is done by clicking on the "Associate Image" button where we basically enter a name to use for the operating system as well as a description.

    Note that the OS size is 9 GB which is really small. We don't have a compute instance at this point. We either need to create a bootable storage element or a compute instance based on this image. We will go through the storage creation first since this is the easiest way of getting started. We first have to change from the Images tab to the Storage tab. We click on Create Storage Volume and go through selection of the image, storage name, and size. We kept the image's default storage size rather than resizing the volume we are creating.

    At this point we should be able to create a compute instance based on this boot disk. We can clone the disk, boot from it, or mount it on another instance. We will go through and boot from this instance once it is created. We do this by going to the Instance tab and clicking on Create Instance. It does take 5-10 minutes to create the storage instance and we need to wait until it is completed before creating a compute instance. An example of the creation step is shown below.

    We select the default network, the CentOS7 storage that we previously created, the 2016 ssh keys that we uploaded, and review and launch the instance.

    After about 15 minutes, we have a compute instance based on our CentOS 7 image. Up to this point, all we have done is create a bootable Linux 7 - 3.10 kernel. Once we have the kernel available we can focus on connecting and installing the cloud storage appliance software. This follows the making backup better blog post. There are a couple of things that are different. First, we connect as the user centos rather than oracle or opc. This is a function of the image that we downloaded and not a function of the compute cloud. Second, we need to create a second user that allows us to log in. When we use the centos user and install the oscsa_install.sh script, we can no longer log in with our ssh keys for some reason. If we create a new user then whatever stops us from logging in as the centos user does not stop us from logging in as oracle, for example. The third thing that we need to focus on is creating a tunnel from our local desktop to the cloud instance. This is done with ssh or putty. What we are looking for is routing the management port for the storage appliance. It is easier to create a tunnel than to change the management port and open up the port through the cloud firewall.

    From this we execute the commands we described in the making backup better blog. We won't go through the screen shots on this since we have done this already. One thing is missing from the screen shots: you need to disable SELinux by editing /etc/sysconfig/selinux and rebooting. Make sure that you add a second user before rebooting, otherwise you will get locked out once the ssh keys stop working after this change.
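
    A minimal sketch of that change (this assumes the stock CentOS 7 image, where /etc/sysconfig/selinux is a link to /etc/selinux/config):

    # set SELinux to disabled, then reboot for the change to take effect
    sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
    sudo reboot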

    The additional steps that we need to perform are to create a user, copy the authorized_keys from an existing user into the new user's .ssh directory, change the ownership, assign a password to the new user, and add the user to /etc/sudoers.

    # create the oracle user and reuse the ssh key that was uploaded for the centos user
    useradd oracle
    mkdir ~oracle/.ssh
    cp ~centos/.ssh/authorized_keys ~oracle/.ssh
    chmod 700 ~oracle/.ssh
    chown -R oracle:oracle ~oracle
    # set a password and grant sudo rights so we can administer the system as oracle
    passwd oracle
    vi /etc/sudoers
    
    The second major step is to create an ssh tunnel that allows you to connect from your localhost into the cloud compute service. When you create the oscsa instance it starts up a management console on port 32769. To tunnel this port we use putty to connect.
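
    If you are on Linux or a Mac rather than putty, the equivalent tunnel looks something like the sketch below. The key file, user, and public IP are placeholders for your own values; 32769 is the management port mentioned above.

    # forward local port 32769 to the appliance management console on the cloud instance
    ssh -i ~/.ssh/2016_rsa -L 32769:localhost:32769 oracle@<public-ip-of-instance>
    # then point a local browser at port 32769 to reach the console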

    At this point we should be able to spin up other compute instances and mount this file system internally using the command

     mount -t nfs -o vers=4,port=32770 e53479.compute-metcsgse00028.oraclecloud.internal:/ /local_mount_point
    
    We might want to use the internal ip address rather than the external dns name. In our example this would be the Private IP address of 10.196.89.62. We should be able to mount this file system and clone other instances to leverage the object storage in the cloud.
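
    To make the mount survive a reboot, a line like the following could be added to /etc/fstab on the client instance (a sketch that reuses the private IP, NFS port, and mount point from the example above):

    10.196.89.62:/    /local_mount_point    nfs    vers=4,port=32770    0 0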

    In summary, we did two things in this blog. First, we uploaded a new operating system that was not part of the list of operating systems presented by default. We selected a CentOS instance that conforms to the requirements of the cloud storage appliance. Second, we configured the cloud storage appliance software on a newly created Linux 7 - 3.10 kernel and created a putty tunnel so that we can manage the directories that we create to share. This gives us the ability to share the object storage as an nfs mount internal to all of our compute servers. It allows for things like spinning up web servers or other static servers all sharing the same home directory or static pages. We can use these same processes and procedures to pull data from the Marketplace and configure more complex installations like JD Edwards, PeopleSoft, or E-Business Suite. We can import a pre-defined image, spin up a compute instance based on that image, and provision higher level functionality onto infrastructure as a service. Up next, platform as a service explained.

    private cloud vs public cloud

    Fri, 2016-04-29 02:19
    Today is our last day to talk about infrastructure as a service. We are moving up the stack into platform as a service after this. The higher up the stack we get, the more value it has to the business and end users. It is interesting to talk about storage and compute in the cloud, but does this help us treat patients in a medical practice any better, find oil and gas faster, or deliver our manufactured product any cheaper? Not really, but not having them will negatively affect all of those. We need to make sure that these services are there so that we can perform higher functions without having to worry about triple redundant mirroring of a disk or load balancing compute servers to handle higher loads.

    One of the biggest complaints with cloud services is that there are perception problems of security, latency, and governance. Why would I put my data on a computer that I don't control? There is a noisy neighbor issue where I am renting part of a computer, part of a disk, part of a network. If someone wants to play heavy metal at the highest volume (remember our apartment example) while I am trying to file a monthly report or do some analytics, my resources will suffer and it will take me longer to finish my job due to something out of my control.

    Many people have decided to go with private hosted cloud solutions like VCE, VBlock, Cisco UCS clusters, and other products that provide raw compute, raw storage, and "hyper-converged" infrastructure to solve the cloud problem. I can create a golden master on my VMWare server and provision a database to my configuration as many times as I want. Yes, I run into a licensing issue. Yes, it is a really big licensing issue running Oracle on VMWare but Microsoft is not far behind with SQL Server licensing on a per core basis either. Let's look at the economics of putting together one of these private cloud servers.

    It is important to dive into the licensing issue. The Oracle Database and WebLogic servers are licensed either on a processor or named user basis. Database licensing is detailed in a pdf on the Oracle site. The net of the license says that the database is licensed based on the core count of the processor running on the server. There is a multiplication factor (0.5 for x86) based on the chip type that factors into the license cost. A few years ago it was easy to do this calculation. If I have a dual core, dual socket system, this is a four core computer. The license price of the computer would be 4 cores x 0.5 (Intel x86 chip) x $47,500. The total price would be $95K. Suddenly the core count of computers went to 8, 16, or 32 cores per chip. A single system could easily have 64 cores on a single board. If you aggregate multiple boards as is done in a Cisco UCS system you can have 8 boards or 256 cores that you can use. There are very few applications that can take advantage of 256 cores so a virtualization engine was placed on top of these cores so that you could sub-divide the system into smaller chunks. If you have a 4 core database problem, you can allocate 4 cores to it. If you need 8 cores, allocate 8 cores. Products like VMWare and HyperV took advantage of this and grew rapidly. These virtualization packages added features like live migration, dynamic sizing, and bursting utilization. If you allocate 4 cores and the processor goes to 90%, two more cores will be made available for a short burst. The big question comes up as to how you now license on a per core basis. If you can flex to more processors without rebooting or live migrate from a 2 core to a 24 core system, which do you license for?

    Unfortunately, Oracle took a different position from the rest of the industry. None of the Oracle products contain a license key. None of the products require that you go to a web site and get a token to allow you to run the software. The code is wide open and freely available to load and run on any system that you want. Unfortunately, companies don't do a good job of tracking utilization. If someone from the sales or purchasing department rolls out a golden master onto a new virtual machine, no one is really tracking that. People outside of IT can easily spin up more licenses. They can provision a database in a cloud service and assume that the company has enough licenses to cover their project. After a while, licensing gets out of control and a license review is done to see what is actually being used and how it is being used. Named user licenses are great but you have to have a ratio of users to cores to meet minimums. You can't, for example, buy a 5 user license and deploy it on a 64 core system. You have to maintain a typical ratio of 25 users per core or 40 users per core based on the product that you are using. You also need to make sure that you understand soft partitioning vs hard partitioning. Soft partitioning is the ability to flex or change the core count without having to reconfigure or reboot the system. A hard partition puts hard limits on the core count and does not allow you to exceed it. Products like OracleVM, Solaris, and AIX contain hard partition virtualization. Products like HyperV and VMWare contain soft partitions. With soft partitions, you need to pay for all of the cores in the cluster since in theory you can consume all of the cores. To be honest, most people don't understand this and get in trouble with license reviews.

    When we talk about cloud services, licensing is also important to understand. Oracle published cloud license rules to detail limits and restrictions. The database is still licensed on a per core basis. The Linux operating system is licensed on a per server instance basis and is limited to 8 virtual cores. If you deploy the Oracle database or WebLogic server with AWS, Azure, or any other cloud vendor, you have to own a perpetual license for the database using the formulas above. The license must correlate to the high water mark for the core count that you provision. If you provision a 4 core system, you need a 2 processor license. If you run the database for six months and shut it off, you still need to own the perpetual license. The only way to work around this is to purchase database as a service in the Oracle cloud. You can pay for the database license on an hourly or monthly basis with metered services or on an annual basis with non-metered services. This provides a great cost savings because if we only need a database for 6 months we only need to pay for 6 months x the number of cores x the database edition rate. If, for example, we want just the Database Enterprise Edition, it is $3K/core/month. If we want 4 cores that is $12K per month. If we want 6 months then we get it for $72K. We can walk away from the license and not have to pay the 22% annual maintenance on the $95K. We save $23K the first year and $20K annually by only using the database in the cloud for six months. If we wanted to use the database for 9 months, it is cheaper to own the license and lease processor and storage. If we go to the next higher level of database, Database High Performance Edition at $4K/core/month, it becomes cheaper to use the cloud service because it contains so many options that cost $15K/processor. Features like partitioning, compression, diagnostics, tuning, real application testing, and encryption are part of this cloud service. Suddenly the economics is in favor of hosting a database in the cloud rather than hosting it in a data center.

    Let's go back to the Cisco UCS and network attached storage discussion. If we purchase a UCS 8 blade server with 32 cores per blade we are looking at $150K or higher for the server. We then want to attach a 100 TB disk array to it at about $300K (remember the $3K/TB). We will then have to pay $300K for VMWare. If we add 10% a year for hardware maintenance and 20% a year for software support we are at just over $1M for a base solution over three years. With a three year amortization we are looking at about $330K per year just to have a compute and storage infrastructure. We have to have a VMWare admin who doles out processors and storage, loads operating systems, creates golden masters, and acts as a traffic cop to manage and allocate resources. We still need to pay for the Oracle database license, which is licensed on a per core basis. Unfortunately, with VMWare we must license all of the cores in the cluster, so we either have to sub-divide the cluster into one blade and license all 32 cores or end up paying for all 256 cores. At roughly $25K/core that gets expensive quickly. Yes, you can run OracleVM or Solaris on one of the blades and subdivide the database into two cores and only pay for two cores since they both support hard partitioning, but you would be amazed at how many people fight this solution. You now have two virtualization engines that you need to support with two different file formats. No one really wants two solutions just to solve a licensing issue.

    Oracle has taken a radically different approach to this. Rather than purchasing hardware, storage, and a virtualization platform, run everything in the cloud and pay for it on a monthly basis. The biggest objection is that the cloud is in another city and security, latency, ... you get the picture. The new solution is to run this hardware in your data center with the Oracle Public Cloud Machine. The cost of this solution is roughly $260K/year with a three year commit. You get 200 plus cores and 100 ish TB of storage to use as you want. You don't manage it with vSphere but with the same web page that you use to manage the public cloud services. If you want to script everything then you can manage it with REST apis or perl/java/insert your language scripts. The key benefit to this is that you no longer need to worry about what virtualization engine you are using. You manage at the higher level and lease the database or WebLogic or SOA license on an hourly or monthly basis.

    Next week we will move up the stack and look at database hosting. Today we talked about infrastructure choices and how they impact database license cost. Going with AWS or Azure still requires that you purchase the database license. Going with the Oracle public cloud or public cloud machine allows you to not own the database license but effectively lease it on an hourly or monthly basis. It might or might not be the right solution for 7x24x365 operation. It really is the right solution for bursty needs like holiday peak periods, student registration systems, and development and testing, where you only need a large footprint for a few weeks and don't want to buy for your high water mark and run at 20% the rest of the year.

    Archive Storage Services vs Archive Cloud Services

    Thu, 2016-04-28 02:07
    Yesterday we started talking about the cost comparison for storage in the cloud. We briefly touched on the cost of long term archive in the cloud. How much does it cost to back up data for long term archive and what is the best way to do this? Years ago the default way of doing this was to copy your data on disk to a tape unit and put the tape in a box. The box was then put in an environmentally controlled room to extend the lifetime of the tape and a person was put on staff to pull the data off the shelf when the data was needed. The data might be a backup of data on disk or a secondary copy just in case the disk failed. Tape was typically used to provide the separation of duties required by Sarbanes-Oxley to keep people who report on financial data separate from the financial data. It also allowed companies to take large volumes of data, like seismic data, and not keep it on spinning disks. The traces were reloaded when geophysicists wanted to look at the data.

    The first innovation in this technology was to provide a robot to load and unload tapes as a tape unit gets full or needs to be reloaded. Magazines were created that could hold eight tapes and the robots had bar code readers so that they could seek to the right tape in the magazine, pull it out of the series of tapes, and insert it into the tape unit for reading or writing. Management software got more advanced and understood the bar code values and could sequence the whopping 800 GB of data that could be written to an LTO-4 tape. Again, technology gets updated and the industry moved to LTO-5 and LTO-6 tapes with significantly higher densities. A single LTO-6 could hold 2.5 TB per tape. Technology marches on and compression allows us to store 6 TB on these tapes. If we go back to our 120 TB case that we talked about yesterday this means that we will need 20 tapes (at $30-$45 for each tape) and $25K for a single tape drive unit. Most tape drive systems support 8 tapes per magazine so we are talking about something that will support three magazines. To support three magazines, we need a second shelf in our tape storage so the price goes up by about $20K. We are sitting at about $55K to back up our 120 TB and $5.5K in support annually for the hardware. We also need about $1K in tapes for each set of full and incremental backups that we want to keep, which works out to about $20K for four months of retention before we recycle the tapes. These tapes are good for a dozen re-writes so every three years we will need to repurchase tapes. If we spread the cost of the tape unit, tape drives, and tapes across three years we are looking at $2K/month to back up our 120 TB. We also need to factor in $60/week for tape pickup and storage fees at a service like Iron Mountain and a couple of $250 charges to retrieve tapes and drive them back to our data center from cold storage in the event of a catastrophic failure. This bumps the cost to $2.2K/month, which is significantly cheaper than the $10K/month for network storage in our data center or $3.6K/month for cloud storage services. Unfortunately, a tape unit requires someone to care for and feed it, and you will pay that person more than $600/month, but less than the $7.8K/month you save compared to the disk solution.

    If you had a ton of data to archive you could purchase a tape silo that supported hundreds or thousands of magazines. Unfortunately, this expandability comes at a cost. The tape backup unit grows from an eighth of a rack to twenty full racks. There isn't much in between. You can get an eighth of a rack solution, a full rack solution, or a twenty full rack solution. The larger solution comes in at hundreds of thousands of dollars rather than tens of thousands.

    Enter cloud solutions. Amazon and Oracle offer tape solutions in the cloud. Both companies run the twenty full rack solution but only charge consumers on a per terabyte basis. Amazon Glacier charges $7/TB/month to store data. Oracle charges $1/TB/month for the same service. Both companies charge for data restoration and outbound transfer of data. The Amazon Glacier cost of writing 120 TB and reading back 10% of it comes in at $2218/month. This is the same cost as having the tape unit on site. The key difference is that we can recover the data by requesting it from Amazon and get it back in less than four hours. There are no emergency recovery charges. There are no weekly pickup charges. We can expand the amount that we back up, and the bulk of this cost is reading back the data ($1300). Storage is relatively cheap for our backups; we just need to plan on the cost of recovery and try to limit it since it is the bulk of the cost.

    We can drop this cost even more using the Oracle Archive Cloud Service. The price from Oracle is $1/TB/month but the recovery and transmission charges are about the same. The same archive service with Oracle is $1560/month, with roughly $1300 being the charges for restoring and outbound transfer of the data. Unfortunately, Oracle does not offer an un-metered archive service so we have to guesstimate how much we are going to restore on a monthly basis.

    Both services use REST apis to write, restore, and read data. When a container (Oracle Archive) or bucket (Amazon Glacier) is created, a PUT call is done to the endpoint of the service. The first step required by both is authentication, to provide credentials to the service. Below we show the Oracle authentication and creation process through the REST api.
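
    Since the screen shot does not reproduce well here, a hedged curl sketch of the flow follows. The identity domain, user name, and container name are placeholders, and the header names should be checked against the current Oracle Storage Cloud documentation.

    # step 1: authenticate and capture the X-Auth-Token and X-Storage-Url headers from the response
    curl -i https://myidentitydomain.storage.oraclecloud.com/auth/v1.0 \
         -H "X-Storage-User: Storage-myidentitydomain:myuser@example.com" \
         -H "X-Storage-Pass: mypassword"

    # step 2: create an archive (tape) container by adding the archive storage class header
    curl -i -X PUT https://myidentitydomain.storage.oraclecloud.com/v1/Storage-myidentitydomain/myarchive \
         -H "X-Auth-Token: <token from step 1>" \
         -H "X-Storage-Class: Archive"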

    The important part of this is the archive header extension. This differentiates whether the container is spinning disk or tape in the cloud.

    Amazon recommends using a Windows based tool like S3 Browser or CloudBerry, or using a language like Java, .NET, or Ruby and their published SDKs. CloudBerry works for the Oracle Archive service as well. When you create a container you have the option of selecting storage or archive as the container type.

    Both services allow you to encrypt and compress the data as it is written, with HTTP headers changing the characteristics and parameters of the container. Both services require you to issue a PUT request to write the data to tape. Below we show the Oracle REST api.
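
    Again as a hedged sketch standing in for the screen shot, uploading an object into the archive container created above might look like the following with curl (the file name is hypothetical):

    # PUT a local backup file into the archive container
    curl -i -T backup_2016_04.dmp \
         https://myidentitydomain.storage.oraclecloud.com/v1/Storage-myidentitydomain/myarchive/backup_2016_04.dmp \
         -H "X-Auth-Token: <token from the authentication step>"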

    For CloudBerry and the other gui based tools, uploading is just a drag and drop from your local file system to the tape storage in the cloud.

    Amazon details the readback procedure and the job system that shows the status of the restore request. Oracle has a similarly defined retrieval policy as well as an archive tutorial. Both services offer a 4 hour window to allow for restoration. Below is an example of a restore request and a check on the status of the job spawned to load the tape and transfer the data for reading. The file is ready to read when the completedPercentage is 100.

    We can do the same thing with the S3 browser and Amazon Glacier. We need to request the restore, check the job status, then download the restored files. The files change color when they are ready to read.

    In summary, we have looked at how to reduce the cost of archives and backups. We looked at using a secondary disk at our data center or another data center. We looked at using on site tape units. We looked at disk in the cloud. Today we looked at tape in the cloud. It is important to remember that no one of these solutions is the answer; a combination of any or all of them is needed. Daily and weekly backups should happen to a secondary disk locally. This data is most likely to be restored on a regular basis. Once you get a full backup or two under your belt, move the data to another site. It might be spinning disk, it might be tape, but something needs to be offsite in the event of a true catastrophic failure like a communication link going out (think Dell PowerVault and a thunderstorm) where you lose your primary LUN and the secondary LUN that contains your backups. The whole idea of offsite backups is not fast restore but primarily insurance and regulatory compliance. If someone needs to see the old data, it is there. You are betting that you won't need to read it back and the cloud vendors are counting on that. If you do read it back on a regular basis you might want to significantly increase your budget, pass the charges on to the people who want to read the data back, or look for another solution. Tape storage in the cloud is a great way of archiving data for a long time at a low cost.

    metered vs un-metered vs dedicated services

    Wed, 2016-04-27 02:07
    One of the newest concepts that has been introduced for cloud services is the concept of un-metered or dedicated services. Before we dive into this subject, let's review what a cloud service really is. When you boil it all down, you are basically leasing computer resources on a computer that you don't own. You are taking a slice of a compute engine, slice of a disk drive, part of a network connection. You are renting space. Think of it as living in an apartment. Yes, this is a silly analogy but if you think about it, it makes sense. You can rent an efficiency, one, two, or three bedroom apartment. You can get parking with or without a roof over your car. You can get a storage closet or a garage to store stuff that you are not using but want to keep around. There are benefits to apartments. You don't have to cut the grass. You typically have access to a pool but don't need to maintain it. If the toilet backs up or the gas stops working you call the super and they come fix it. You still have to replace your own light bulbs that burn out. You still need to clean your own bathroom and kitchen and take out your trash on a regular basis. On the grander scale, you don't need to drop 10% down and get a mortgage to live there. Monthly rents are typically cheaper than paying down a mortgage. Your taxes are bundled into your rent cost. You basically show up, use the apartment and go on with your life. On the flip side, there are drawbacks to apartments. If your upstairs neighbor likes to play heavy metal at 2am or throw wild parties on the weekends it does make it hard to sleep. Someone might park in your parking spot so you need to park farther away from your front door. You can't pull into your garage to unload your groceries and have to potentially carry them in the rain across the parking lot and up the stairs to your third floor apartment. The super might decide that Tuesday they are going to repaint all bathrooms another color and you need to be out of the way for a day and put up with the smell even though you planned a dinner party the next night. It is difficult to grill on your balcony and you can't really sit out without sharing the space with all of your neighbors. The true downfall is that twenty years from now, you will still be renting your apartment (and the rent probably went up every other year) while your college buddies are celebrating a mortgage burning party and the only thing that they owe on a monthly basis is the taxes that the government takes annually.

    Yes, our analogy is silly. Yes, our analogy is relevant. It is easy to decide that you want another job in another city so you hire a mover, pack up all your stuff, and move to another apartment. This is where our analogy breaks down. Cloud vendors charge you for every piece of furniture that you take out of the building. They charge you to use the stairs or elevator. They charge you every time a moving van exits the building full of furniture and boxes of clothes. It is free to bring stuff in because it locks you into the apartment. Just don't try to take anything out. Remember that storage closet or garage that you got with your apartment, you can open the door and put stuff in for free but if you carry anything out (even if you just relocate it to your apartment) you get charged per item that you carry across the threshold.

    If you look at storage from any cloud vendor they offer a metered storage service. The same is true for compute services. You can lease a virtual processor and memory and grind on data all that you want. The catch is that when you want to transfer your files or report the results of your analysis to your desktop computer, you get charged per gigabyte transferred across the internet. Cost calculators help you estimate these costs, but the outbound data charges are a little hard to predict. Amazon, for example, has a calculator that you can use. The AWS pricing calculator allows you to look at the cost of all cloud services.

    Let's walk through the cost of Amazon Glacier. The price list says that you should pay $0.007/GB/month or $7/TB/month to keep things in cold storage. We will use 120 TB as our basis for analysis. We put this as the amount to store and see the cost of storing the data is $860/month.

    If we plan on reading back 10% of this data during the month the price goes up to $2217. The bulk of these charges are the outbound charges. The cost drops to $921 if we read the data to an EC2 instance and not all the way back to our data center or desktop computer. To use our apartment analogy, you are paying $860 to get a storage garage. You pay $61 every time you take something out and move it to your apartment. You can put all you want into the storage area (as long as you don't exceed the space of the storage unit) but taking something out will cost you. If you put your recently retrieved item in your car or a truck and drive it out the gate you get a surcharge of $1300. It is important to remember that pulling more stuff out of your storage will cost you more. Putting this in terms of computer archives, you can store all of your emails, contracts, customer transactions, and patient records in long term storage. If your on-site storage fails for some reason or if you get a legal request to review five years of data, you can pull the data back from cold storage. It will cost you to pull the data back but it is still cheaper than keeping seven years or more of data on spinning disk in your data center (estimate $3K/TB plus 10% per year for spinning disk in your data center).
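
    Pulling those calculator numbers together into one rough monthly total (approximate figures from the example above; the calculator output will differ slightly):

    # 120 TB stored in Glacier, 12 TB (10%) read back during the month
    storage=860; retrieval=61; outbound=1300
    echo "read back to the data center: \$$((storage + retrieval + outbound))/month"   # ~ $2217 in the calculator
    echo "read back only to an EC2 instance: \$$((storage + retrieval))/month"         # ~ $921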

    We can do the same calculation for cloud storage using S3. We can store 120 TB for roughly $3950/month. If we want to read back 10%, or 12 TB, of that data, it will cost us $5150 or $1200 additional.

    We can reduce the cost by using lower speed storage in the cloud. We put the S3 data into the infrequent category to save money. This drops the cost to just over $3K which does save us about $2100/month. We agree to pay a lower cost to get higher latency and longer retrieval times. It is better than using tape in the cloud and we can save some money with this option.

    We can opt for reduced redundancy storage (aka non-mirrored and non-replicated data) but we risk data loss since we will only have a single copy in the cloud. This drops the cost to $4300 with the data retrieval but we have to weigh the cost vs data loss risks.

    Let's not pick on Amazon. How does this compare to Azure? Unfortunately, we can't start with Microsoft tape in the cloud because they don't offer the service. We must start with blob storage in the Azure cloud. Microsoft has an Azure pricing calculator that you can use to perform the same calculations. The calculator and pricing are a little difficult to use when you first get into them. You basically need to put together the calculation a piece at a time. You need to factor in the cost of the storage and the cost of transferring the data from the cloud to your data center. This is done in two different pieces. An example of what we are looking for can be seen below.

    We need to piece together the calculator. First we add the storage component then the bandwidth component. There is a transaction component but this amount is trivial and we are going to ignore it for simplicity.

    If we look at the options for Azure storage, we can basically select blob storage in different zones. In the grand scheme of things, the cost is not significantly higher one way or another. The basic cost is about the same.

    The third class of storage that we are going to look at is the Oracle Storage Cloud Service. We can look at the Oracle Storage Cloud Service as well as the Oracle Archive Cloud Service. The Archive service compares directly to the Amazon Glacier service except that it is $1/TB/month and carries the same transfer charges for outbound data. The Oracle Storage Cloud Service is similar to the Amazon S3 and Azure Blob Storage services but it is offered either as a metered service (as are S3 and Blobs) or as an un-metered service. Unfortunately, Oracle does not provide a cost calculator for general use. The Value Added Distributors are given a copy of the calculator but it is not generally available. The key difference with the Oracle storage services is that there are two significant flavors: metered and un-metered. The metered services are charged just like the Microsoft and Amazon services. You pay for what you use on a per GB basis and pay for outbound data transfer. An example of the pricing calculator is shown below. Note that we do need to have a good guesstimate of how much data we will transfer outbound across the internet. These charges are not incurred if you are reading the data to a compute engine in the Oracle cloud, unlike S3, which still charges just for reading the data off the disk.

    The most significant differential in storage offerings is the non-metered storage. Oracle offers storage in blocks that you reserve and allocate for 12 months. This is different from the metered storage in that metered can start with 10 TB and grow to 120 TB over the year. With the non-metered storage, you start with 120 TB and end with 120 TB. You can extend your contract and grow storage, but you basically sign a new contract for the additional storage. You cannot shrink your storage and pay for less. The benefit of this is that you don't have to pay for outbound data transfer. You can read and write as much as you want and not get charged for transferring the data across the internet. A pricing calculator for this is simple: how much do you need and how long do you need it?

    If we piece all of this together and look at a price comparison between the three service providers, the answer to which is cheapest comes down to "it depends." Oracle non-metered storage has a significant advantage if you are planning on reading back your data at high or unpredictable rates. Amazon S3 infrequent access is the cheapest if you don't plan on reading back your data and want it as an insurance policy only. I honestly would go with Glacier or Oracle Archive if this is the case since they are an order of magnitude cheaper. The chart below compares 120 TB of storage and the variable charge for reading back this data on a monthly basis. If you have 120 TB of storage and plan on reading back 120 TB on a monthly basis, the Oracle non-metered storage is significantly cheaper. If you are only planning on reading back 12 to 24 TB per month the cost is about the same for all of the services.

    In summary, one option is not clearly better than the other (except for high read rates) and this blog is intended to help you decide on what fits your needs best. Pricing calculators can help with the cost based on transfer rates. It is important to remember that storage transfer is a significant part of the calculation. It is also important to look at your usage model. We assumed that you started with 120 TB and ended with 120 TB for our analysis. If you start with 12 TB and grow to 120 TB, the pricing calculation will be a little different. Neither the Amazon nor Azure calculators will help you run this simulation and you will have to calculate everything on a month by month basis. It is also interesting to take 120 TB of on-premise storage and assume that each TB can be purchased at $3K/TB. If we assume 10% annual hardware maintenance and a three year amortization, the charge for on-premise storage comes to roughly $13K/month, which you can compare against the cloud numbers above. Your results might vary.
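
    The on-premise figure works out roughly as follows (a back of the envelope sketch using the $3K/TB assumption above):

    tb=120; per_tb=3000
    capex=$((tb * per_tb))                         # $360,000 of disk purchased up front
    monthly=$((capex / 36 + capex / 10 / 12))      # three year amortization plus 10% annual maintenance
    echo "~\$$monthly/month for $tb TB on premise" # roughly $13,000/month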

    making backup better

    Tue, 2016-04-26 02:07
    Yesterday we looked at backing up our production databases to cloud storage. One of the main motivations behind doing this was cost. We were able to reduce the cost of storage from $3K/TB capex plus $300/TB/year opex to $400/TB/year opex. This is a great solution but some customers complain that it is not generic enough and latency to cloud storage is not that great. Today we are going to address both of these issues with the cloud storage appliance. First, let's address both of the typical customer complaints.

    The database backup cloud service is just that. It backs up a database. It does it really well and it does it efficiently. You replace one of the backup library modules so that writes of backup data are translated to the cloud REST api rather than to a tape driver. The software works well with commercial products like Symantec or Legato and integrates well into those solutions. Unfortunately, the critics are right. The database backup cloud service does that and only that. It backs up Oracle databases. It does not back up MySQL, SQL Server, DB2, or other databases. It is a single use tool. A very useful single use tool, but a single use tool. We need to figure out how to make it more generic and back up more than just databases. It would be nice if we could have it back up home directories, email servers, virtual machines, and other stuff that is used in the data center.

    The second complaint is latency. If we are writing to an SSD or spinning disk attached to a server via high speed SCSI, iSCSI, or SAS, we should expect 10ms access time or less. If we are writing to a server half way across the country we might experience 80ms latency. This means that a typical read or write takes eight times longer when we read and write cloud storage. For some applications this is not an issue. For others this latency makes the system unusable. We need to figure out how to read and write at 10ms latency but leverage the expandability and lower cost of cloud storage.

    Enter stage left the Oracle Cloud Storage Appliance. The appliance is a software component that listens on the data center network using the NFS protocol and talks to the cloud services using the storage REST api. Local disks are used as a cache front end to store data that is written to and read from the network shares exposed by the appliance. These directories map directly to containers in the Oracle Storage Cloud Service and can be clear text or encrypted when stored. Data written from network servers is acknowledged quickly because it is written to local disk and then slowly trickled up to the cloud storage. As the cache fills up, data is aged and migrated from the cache storage into cloud storage. The metadata representing the directory structure and storage location is updated to show that the data is no longer stored locally but stored in the cloud. If a read occurs from the file system, the metadata helps the appliance locate where the data is stored and it is presented to the network client from the cache or pulled from the cloud storage and temporarily stored in the local cache as long as there is space. A block diagram of this architecture is shown below.

    The concept of how to use this device is simple. We create a container in our cloud storage and we attach to it with the cloud storage appliance. This attachment is exposed via an nfs mount to clients on our corporate network and anyone on those clients can read or write files in the cloud storage. Operations happen at local disk speed using the network security of the local network and group/owner rights in the directory structure. It looks, smells, and feels just like the nfs storage that we would spend thousands of dollars per TB to own and operate.

    For the rest of this blog we are going to go through the installation steps on how to configure the appliance. The minimum requirements for the appliance are

    • Linux 7 (3.10 kernel or later)
    • Docker 1.6.1 or later
    • two dual core x86 CPUs
    • 4 GB of RAM
    We will be installing our configuration on a Windows desktop running VirtualBox. We will not go through the installation of Oracle Enterprise Linux 7 because we covered this a long time ago. We do need to configure the OS to have 4 GB of RAM and at least 2 virtual cores as shown in the images below.

    We also need to configure a network. We configure two networks. One is for the local desktop console and the other is for the public internet. We could configure a third interface to represent our storage network but for simplicity we only configure two.

    We can boot our Linux 7 system and will need to select the 3.10 kernel. By default it will want to boot to the 3.8 kernel which will cause problems in how the appliance works.

    What we would like to do is remove the 3.8 kernel from our installation. This is done by removing the packages with the rpm -e command. We then update the grub.cfg file to list only the 3.10 kernels.
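
    From the command line this looks something like the sketch below. The exact package names and versions are placeholders; use whatever rpm -qa reports for the 3.8 (UEK) kernel on your system.

    rpm -qa | grep kernel-uek                        # list the installed 3.8 UEK kernel packages
    sudo rpm -e kernel-uek-3.8.13-xx.el7uek.x86_64   # remove the 3.8 kernel using the exact version reported above
    sudo grub2-mkconfig -o /boot/grub2/grub.cfg      # regenerate grub.cfg so only the 3.10 kernels are listed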

    Once we have removed the kernels, we update the grub loader and enable additional options for the yum update.

    The next step that we need to take is to install docker. This is done with the yum install command.

    Once we have the docker package installed, we need to make sure that we have the nfs-client and nfs-server installed and started.
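
    A rough sketch of the install and service startup on Oracle Linux 7 is shown below. The nfs-utils package provides both the NFS client and server pieces, and the systemd unit names shown are the ones used on Linux 7.

    sudo yum install -y docker nfs-utils             # docker engine plus the NFS client/server utilities
    sudo systemctl enable docker nfs-server          # have both services start at boot
    sudo systemctl start docker nfs-server           # start them now
    sudo systemctl status docker nfs-server          # verify that both are running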

    It is important to note that the tar bundle is not generally available. It does require product manager approval to get a copy of the software for installation. The file that I got was labeled oscsa-1.0.5.tar.gz. I had to unzip and untar this file after loading it onto my Linux VirtualBox instance. I did not do a screen capture of the download but did go through the installation process.

    We start the service with the oscsa command. When we start it, it brings up a management web page so that we can make the connection to the cloud storage service. To see this page we need to start Firefox and connect to the page.

    One of the things that we need to know is the end point of our storage. We can find this by looking at the management console for our cloud services. If we click on the storage cloud service details link we can find it.

    Once we have the end point we will need to enter this into the management console of the appliance as well as the cloud credentials.

    We can add encryption and a container name for our network share and start reading and writing.

    We can verify that everything is working from our desktop by mounting the nfs share or by using CloudBerry to examine our cloud storage containers. In this example we use CloudBerry just like we did when we looked at the generic Oracle Storage Cloud Services.
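
    If you prefer a command line check over CloudBerry, mounting the share from a Linux client works as well. The appliance host name and share name below are hypothetical; use the host that runs the appliance and the container name you created in its management console.

    sudo mkdir -p /mnt/cloudbackup
    sudo mount -t nfs oscsa-host:/backup /mnt/cloudbackup   # appliance host and share name are placeholders
    cp /tmp/testfile /mnt/cloudbackup/                      # lands in the local cache at disk speed
    ls -l /mnt/cloudbackup                                  # the file should show up immediately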

    We can examine the properties of the container and network share from the management console. We can look at activity and resources available for the caching.

    In summary, we looked at a solution to two problems with our database backup solution. The first was single purpose and the second was latency. By providing a network share to the data center we can back up not only our Oracle database but all of our databases by having the backup software write to the network share. We can back up other files like virtual machines, email boxes, and home directories. Disk latency operates at the speed of the local disk rather than the speed of the cloud storage. This software does not cost anything additional and can be installed on any virtual platform that supports Linux 7 with kernel 3.10 or greater. When we compare this to the Amazon Storage Gateway, which requires 2x the processing power and $125/month to operate, it looks significantly better. We did not compare it to the Azure solution because it is an iSCSI hardware solution and not easy to get a copy of for testing.

    backing up a database to the cloud

    Mon, 2016-04-25 02:07
    Up to this point we have talked about the building blocks of the cloud. Today we are going to look into the real economics of using some of the cloud services that we have been examining. We have looked at moving compute and storage to the cloud. Let's look at some of the reasons why someone would look at storage in the cloud.

    Storage is one of those funny things that everyone asks for. Think of uses for storage. You save emails that come in every day. If you host your email system in your corporation, you have to consider how many emails someone can keep. You have to consider how long you keep files associated with email. At Oracle we have just over 100,000 employees and limit everyone to 2GB for email. This means that we need 200 TB to store email. If we increase this to 20 GB this grows to 2 PB. At $3K/TB we are looking at $600K capex to handle email messages. If we grow this to 2 PB we are looking at $6M for storage. This is getting into real money. Associated with this storage is a 10% support cost ($60K opex annually) as well as part of a full time employee to replace defective disks, tune and feed the storage system, and allocate disks and partitions not only to our storage but to other projects, at a cost of $80K payroll annually. If we use a 4 year depreciation, our email boxes will cost us ($150K capex + $60K opex + $80K opex) $290K per year, or roughly $2.90/user annually just to store the email. If we expand the email limits to 20 GB almost everything grows by a factor of 10 as well, so the email boxes cost us roughly $22/user annually (we don't need 10x the storage admins). Pile on top of this home directories that people want to save attachments into and this number explodes. We typically do want to give everyone 20 GB for a home directory since this stores documents associated with operation of the company. We typically want people storing these documents on a network share and not on a disk on their laptop. Storing data on their laptop opens up security and data protection discussions as well as access to data if the laptop fails. Putting it on a shared home directory allows the company to back up the files as well as define protection mechanisms for sensitive data. We have basically justified roughly $25/user annually for email and home directories by allocating 22 GB to each employee.

    The biggest problem is not user data, it is corporate data. Databases typically consume upwards of 400 GB to 40 TB. There is a database for human resources, payroll, customer service, purchase orders, general ledger, inventory, transportation management, manufacturing.... the list goes on. Backing up this data as it changes becomes an issue. Fortunately, programs like E-Business Suite, PeopleSoft, and JD Edwards aggregate all of these business functions into a small number of database instances so we don't have tens or hundreds of databases to back up. Some companies do roll out multiple databases for each project to collect and store data but these are typically done with low cost, low function databases like MySQL, Postgres, MongoDB, and SQL Server. Corporate data that large numbers of people use is typically in an Oracle database, DB2, or SQL Server. Backing up this data is critical to a corporation. Database backups are typically done nightly to make sure that you can recover from disk or server failures. Fortunately, you don't need to back up all 400 GB every night but can do incremental backups and copy only the data blocks that have changed from the previous night. Companies typically reserve late at night on the weekends to do a full backup because users are typically not working and few if any people are hitting the database at 2am on Sunday morning. The database can be taken offline for a couple of hours to back up 400 GB or a live backup can be taken with little risk since few if any people are on the system at this time. If you have a typical computer with SCSI or SAS disks you can reasonably get 2G/second read throughput, so reading 400 GB will take 200 seconds. Unfortunately, writing is typically about half that speed, so the backup should reasonably take 400 seconds, which is about 7 minutes. If your database is 4 TB then you increase this by a factor of 10 so it takes just over an hour to back up everything. Typically you want to copy this data to another data center or to tape and your write times get doubled again. The 7 minutes becomes 15 minutes. The hour becomes two hours.

    When we talk about backing up database data, there are two schools of thought. The database data is contained in a table extent file. You can back up your data by replicating your file system or by using database tools to back up your data. Years ago files were kept on raw disk partitions. Few people do this anymore and table extent files are kept on file systems. Replicating data from a raw partition is difficult so most people used tools like RMAN to back up database files on raw partitions. Database vendors have figured out how to optimize reads and writes to disk despite the file system structures that operating system vendors created. File system vendors have figured out how to optimize backup and recovery to survive disk failures. Terms like mirroring, triple mirroring, RAID, and logical volume management come up when you talk about protecting data in a file system. Other terms like snap mirror and off-site cloning sneak into the conversation as well. Earlier when we talked about $3K/TB we were really talking about $1K/TB, but we triple mirror the disks, thus tripling the cost of usable storage. This makes sense when we go down to Best Buy or Fry's and look at a 1 TB USB disk for $100. We could purchase this but the 2G/second transfer rate suddenly drops to 200M/second. We need to pay more for a higher speed communication bridge to the disk drive. We could drop the cost of storage to $100/TB but the 7 minute backup and recovery time suddenly grows to 70 minutes. This becomes a cost vs recovery time discussion which is important to have. At home, recovering your family photos from a dead desktop computer can take hours. For a medical practice, waiting hours to recover patient records impacts how the doctors engage with patients. Waiting hours on a ticket sales or stock trading web site becomes millions of dollars lost as people go to your competitors to transact business.

    Vendors like EMC and NetApp talk about cloning or snap mirrors of disks to another data center. This technology works for things like email and home directories but does not work well for databases. A database writes to multiple files, sometimes at the same time. If you partition your data, the database might be moving data from one file to another as data ages. We might have high speed SSD disks for current data and lower cost, higher latency disks for data greater than 30 days old. If we start a clone of our SSD disks during a data move, the recent data will get copied to our mirror at another site. The database might finish re-partitioning the data and the disk management software then starts backing up the lower speed disks. We suddenly get into a data consistency problem. The disk management software and the database software don't talk to each other and tell each other that they are moving data between file systems. Data that was in the high speed SSD disks is now out of sequence with the low speed disks at our backup site. If we have a disk failure on our primary site, restoring data from our secondary site will cause database failure. The only way to solve this problem is to schedule disk clones while the database is shut down. Unfortunately, many IT departments select a disk cloning solution since it is the best solution for mirroring home directories, email servers, and virtualization servers. Database servers have a slightly different backup requirement and require a different way of doing things.

    The recommended way of backing up a database is to use archive tools like RMAN or commercially available products like Commvault or Legato. The commercial products provide a common backup process that knows how virtualization servers and databases like to be backed up. They allow you to back up SQL Server and an Oracle database with the same user interface and process. Behind the scenes these tools talk to RMAN and the SQL Server backup utilities but present a uniform user interface to schedule and manage backups and restores.

    Enough rambling about disks and backups. Let's start talking about how to use cloud storage for our disk replication. Today we are going to talk about database backup to the cloud. The idea behind our use of the cloud is pure economics. We would like to reduce our cost of storage from $3K/TB to $400/TB/year and get rid of the capex cost. The true problem with purchasing storage for our data center is that we can't simply purchase 10 TB a month even though that is what we are consuming for backups. What we are forced to do is look 36 months ahead and purchase roughly 400 TB of disk to handle the monthly data consumption or start deleting data after a period. For things like Census data and medical records the retention period is decades and not months. For some applications, we can delete data after 12 months. If we are copying incremental database backups, we can delete the incrementals once we do a full backup. In our 10 TB a month example we will have to purchase $1.2M in storage today knowing that we will only consume 10 TB this month and 10 TB next month. Using the cloud storage we can pay $300 this month, $600 next month, and grow this amount by $300/month until we get to the 360+ TB that we will consume over 36 months. If we guesstimated low we will need to purchase more storage again in two years. If we guesstimated high we will have over-purchased storage and spent more than $3K/TB for what we are using. Using cloud storage allows us to consume storage at $400/TB/year. If we guess wrong and have metered storage there is no penalty. If we are using non-metered storage, we might purchase a little too much but only have to look forward 12 months rather than 36. It typically is easier to guess a year ahead rather than three years.

    Just to clarify, we are not talking about moving all of our backup data to the cloud all at once. What we are talking about is doing daily incremental backups to high speed disk attached to your database. After a few days we do a full backup to lower cost network storage. After a few weeks we copy these backups to the cloud. In the diagram below we do high speed backups for five days, back up to low speed disks for 21 days, and back up to the cloud for periods beyond that. We show moving data to tape in the cloud beyond 180 days. The cost benefit is to move data that we probably won't read onto lower cost storage. Using $400/TB/year gives us a $2600/TB cost savings in capex. Using $12/TB/year tape gives us an additional $388/TB cost savings in opex.

    The way that we get this storage tiering is to modify the RMAN backup libraries and move the tape interface from an on-site tape unit to disk or tape in the cloud. The library module can be downloaded from the Oracle Technology Network. More information on this service can be found in the Oracle documentation or the backup whitepaper. You can also watch videos that describe this service.

    The economics behind this can be seen in a TCO analysis that we did. In this example we look at moving 30 TB of backup from on-premise disk to cloud backup. The resulting 4 year savings is $120K. This does not take into account tangential savings but only looks at physical cost savings of not purchasing 30 TB of disk.

    Let's walk through what is needed to make this work. First we have to download the library module that takes RMAN read and write commands and translates them into REST api commands. This library exists for the Oracle Storage Cloud Services as well as Amazon S3. The key benefit of the Oracle Storage Cloud is that you get encryption and parallelism for free as part of the service. With Amazon S3 you need to pay for additional parallel channels at $1500/channel as well as encryption in the database at $10K/processor license. The Oracle Storage Cloud provides this as part of the $33/TB/month database backup bundle.

    Once we download the module, we need to install it with a java command. Note that this is where we tie the Oracle home and SID to the cloud credentials. The credentials are stored in a wallet along with the encryption keys used to encrypt the backups.
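
    The install command looks something like the sketch below. The jar name and parameters are the ones documented for the backup module at the time; check the readme that ships with your download because the flags can change between versions, and the identity domain, username, password, and directories shown here are examples.

    java -jar opc_install.jar -serviceName Storage -identityDomain metcsgse00029 \
         -opcId cloud.admin -opcPass 'mypassword' \
         -walletDir $ORACLE_HOME/dbs/opc_wallet -libDir $ORACLE_HOME/lib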

    Now that we have replaced the tape interface with cloud storage, we need to define a tape interface for RMAN and link the library into the process. When we read and write to tape we are actually reading and writing to cloud storage.
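
    Typed at the RMAN prompt, the configuration looks something like the sketch below; the ORACLE_HOME path and the opcORCL.ora file name are placeholders for wherever the install step above put the library and configuration file.

    RMAN> CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS 'SBT_LIBRARY=/u01/app/oracle/product/12.1.0/dbhome_1/lib/libopc.so, SBT_PARMS=(OPC_PFILE=/u01/app/oracle/product/12.1.0/dbhome_1/dbs/opcORCL.ora)';
    RMAN> CONFIGURE DEFAULT DEVICE TYPE TO sbt;
    RMAN> SHOW ALL;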

    Once we have everything configured, we use RMAN or Commvault or Legato as we have for years. Accessing the tape unit is really accessing the cloud storage.
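
    With the channel in place, a nightly backup is the same RMAN you have always run; a minimal example typed at the RMAN prompt (the backup service requires encrypted backups, and the password here is a placeholder for password-based encryption):

    RMAN> SET ENCRYPTION ON IDENTIFIED BY 'MyBackupPassword' ONLY;
    RMAN> BACKUP DEVICE TYPE SBT DATABASE PLUS ARCHIVELOG;
    RMAN> LIST BACKUP SUMMARY;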

    In summary, this is our first use case of the cloud. We are offsetting the cost of on-premise storage and reducing the cost of our database backups. A good rule of thumb is that we can drop the cost of backups from $3K/TB plus $300/TB/year to $400/TB/year. Once we have everything downloaded and installed, nothing looks or feels different from what we have been doing for years. When you start looking at purchasing more disks because you are running out of space, look at moving your backups from local disk to the cloud.

    So how about other cloud providers

    Fri, 2016-04-22 02:07
    If you are looking for a cloud hosting provider, the number one question that comes up is which one to use. There are a ton of cloud providers. How do you decide which one is best for you? To be honest, the answer is it depends. It depends on what your problem is and what problem you are trying to solve. Are you trying to solve how you communicate with customers? If so, you purchase something like Salesforce or Oracle Sales Cloud and you get a cloud based sales automation tool. Doing a search on the web yields a ton of references. Unfortunately, you need to know what you are searching for. Are you trying to automate your project management (Oracle Primavera or Microsoft Project)? Every PC magazine and trade publication has opinions on this. Companies like Gartner and Forrester write reviews. Oracle typically does not rate well with any of these vendors for a variety of reasons.

    My recommendation is to look at the problem that you are trying to solve. Are you trying to lower your cost of on-site storage? Look at generic cloud storage. Are you trying to reduce your data center costs and go with a disaster recovery site in the cloud? Look at infrastructure in the cloud and compute in the cloud. I had a chance to play with VMWare vCloud this week and it has interesting features. Unfortunately, it is a really poor fit for generic cloud storage. You can't allocate 100 TB of storage and access it remotely without going through a compute engine and paying for a processor, operating system, and OS administrator. It is really good if I have VMWare and want to replicate the instances into the cloud or use VMotion to move things to the cloud. Unfortunately, this solution does not work well if I have a Solaris or AIX server running in my data center and want to replicate into the cloud.

    The discussion on replication opens a bigger can of worms. How do you do replication? Do you take database and java files and snap mirror them to the cloud or replicate them as is done inside a data center today? Do you DataGuard the database to a cloud provider and pay on a monthly basis for the database license rather than owning the database? Do you set up a listener to switch between your on-site database and cloud database as a high availability failover? Do you set up a load balancer in front of a web server or Java app server to do the same thing? Do you replicate the virtualization files from your VMWare/HyperV/OracleVM/Xen engine to a cloud provider that supports that format? Do you use a GoldenGate or SOA server to physically replicate objects between your on-site and cloud implementations? Do you use something like the Oracle Integration server to synchronize data between cloud providers and your on-premise ERP system?

    Once you decide at what level to do replication/failover/high availability you need to begin the evaluation of which cloud provider is best for you. Does your cloud provider have a wide range of services that fits the majority of your needs or do you need to get some solutions from one vendor and some from another? Are you ok standardizing on a foundation of a virtualization engine and letting everyone pick and choose their operating system and application of choice? Do you want to standardize at the operating system layer and not care about the way things are virtualized? When you purchase something like Salesforce CRM, do you even know what database or operating system they use or what virtualization engine supports it? Do or should you care? Where do you create your standards and what is most important to you? If you are a health care provider do you really care what operating system your medical records system uses or are you more interested in how to import/export ultrasound images into your patients' records? Does it really matter which VM or OS is used?

    The final test that you should look at is options. Does your cloud vendor have ways of easily getting data to them and easily getting data out? Both Oracle and Amazon offer tape storage services. Both offer disks that you can ship from your data center to their cloud data centers to load data. Which one offers to ship tapes to you when you want to get them back? Can you only back up from a database in the cloud to storage in the cloud? What does it cost to get your data back once you give it to a cloud provider? What is the outbound charge rate and did you budget enough to even terminate the service without walking away from your data? Do they provide an unlimited read and write service so that you don't get charged for outbound data transfer?

    Picking and choosing a cloud vendor is not easy. It is almost as difficult as buying a house, a car, or a phone carrier. You will never know if you made the right choice until you get penalized for making the wrong choice. Tread carefully and ask the right questions as you start your research.

    Storage on Azure

    Thu, 2016-04-21 02:07
    Yesterday we were out on a limb. Today we are going to be skating on thin ice. Not only do I know less about Azure than AWS but Microsoft has significantly different thoughts and solutions on storage than the other two cloud vendors. First, let's look at the available literature on Azure storage

    There are four types of storage available with the Azure storage services: blob storage, table storage, queue storage, and file storage. Blob storage is similar to the Oracle block storage or Amazon S3 storage. It provides blocks of pages that can be used for documents, large log files, backups, databases, videos, and so on. Blobs are objects placed inside of containers that have characteristics and access controls. Table storage offers the ability to store key/attribute entries in a semi-structured dataset similar to a NoSQL database. Queue storage provides a messaging system so that you can buffer and sequence events between applications. The fourth and final is file based storage similar to dropbox or google docs. You can read and write files and file shares and access them through SMB file mounts on Windows systems.

    Azure storage does give you the option of deciding upon your reliability model by selecting the replication model. The options are locally triple redundant storage, replication between two data centers, replication between different geographical locations, or read access geo-redundant storage.

    Since blob storage is probably more relevant for what we are looking for, let's dive a little deeper into this type of storage. Blobs can be allocated either as block blobs or page blobs. Block blobs are aggregations of blocks that can be allocated in different sizes. Page blobs are collections of fixed size 512 byte pages. Page blobs are the foundation of virtual machines and are used by default to support operating systems running in a virtual machine. Blobs are allocated into containers and inherit the characteristics of the container. Blobs are accessed via REST apis. The address of a blob is formatted as http://(account-name).blob.core.windows.net/(container-name)/(blob-name). Note that the account name is defined by the user. It is important to note that the account name is not scoped to your account; the namespace is shared by everyone. This is something that you create and Microsoft adds it to their DNS so that your endpoint on the internet can be found. You can't choose simple names like test, testing, my, or other common terms because they have been allocated by someone else.

    To begin the process we need to log into the Azure portal and browse for the Storage create options.

    Once we find the storage management page we have to click the plus button to add a new storage resource.

    It is important to create a unique name. This name will be used as an extension of the REST api and goes in front of the server address. This name must be unique so picking something like the word "test" will fail since someone else has already selected it.

    In our example, we select wwpf, which is an abbreviation for a non-profit that I work with, Who We Play For. We next need to select the replication policy to make sure that the data is highly available.

    Once we are happy with the name, replication policy, resource group, and payment method, we can click Create. It takes a while so we see a deploying message at the top of the screen.

    When we are finished we should see a list of storage containers that we have created. We can dive into the containers and see what services each contains.

    Note that we have the option of blob, table, queue, and files at this point. We will dive into the blob part of this to create raw blocks that can be used for backups, holding images, and generic file storage. Clicking on the blob services allows us to create a blob container.

    Note that the format of the container name is critical. You can't use special characters or capital letters. Make sure that you follow the naming convention for container names.

    We are going to select a blob type container so that we have access to raw blocks.

    When the container is created we can see the REST api point for the newly created storage.
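
    Once you know the endpoint, access follows the URL pattern described earlier. As a sketch, if the wwpf account had a container named backups with public read access, a blob could be pulled back with nothing more than the command below; the container, blob name, and public access level are assumptions, and a private container would additionally need a signed Authorization header or a shared access signature.

    curl -O https://wwpf.blob.core.windows.net/backups/website-backup.vhd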

    We can examine the container properties by clicking on the properties button and looking at when it was created, lease information, file count, and other things related to container access rights.

    The easiest way to access this newly created storage is to do the same thing that we did with Oracle Storage. We are going to use the CloudBerry Explorer. In this gui tool we will need to create an attachment to the account. Note that the tool used for Azure is different from the Oracle and Amazon tools. Each costs a little money and they are unfortunately not the same tool. They also only work on a Windows desktop, which is challenging if you use a Mac or Linux desktop.

    To figure out your access rights, go to the storage management interface and click on the key at the top right. This should open up a properties screen showing you the account and shared access key.

    From here we can access the Azure blob storage and drag and drop files. We first add the account information then navigate to the blob container and can read and write objects.

    In this example, we are looking at virtual images located on our desktop "E:\" drive and can drag and drop them into a blob container for use by an Azure compute engine.

    In summary, Azure storage is very similar to Amazon S3 and Oracle Storage Cloud Services. The cost is similar. The way we access it is similar. The way we protect and restrict access to it is similar. We can address it through a REST api (which we did not detail) and can access it from our desktop or a compute server running in Azure. Overall, storage in the cloud is storage in the cloud. You need to examine your use cases and see which storage type works best for you. Microsoft does have an on-premise gateway product called StorSimple which is similar to the Amazon Storage Gateway or the Oracle Cloud Storage Appliance. It is more of a hardware solution that attaches via iSCSI to existing servers.

    Storage in Amazon S3

    Wed, 2016-04-20 02:07
    To be honest, I am going out on a limb here. I know just enough about Amazon S3 to be dangerous. Most of the reference material that I used was from the Amazon web site or Safari Books, with the Amazon S3 Essentials book being the one I relied upon the most. The key use cases according to the S3 Essentials book are file hosting, storing data for mobile based applications, static web hosting, video hosting, and data backup. We will look at a couple of these configurations and how to deploy and use them.

    With Oracle Storage Cloud Services we had the concept of a container. This container had characteristics like spinning disk, archive, ownership, and other features related to ownership and security. Amazon S3 has a similar concept but they call this container a bucket. A bucket can contain nested folders and has properties associated with it. If we look at the AWS Console, we see six types of storage and content delivery

    • S3 - block storage in the cloud
    • Cloud Front - content delivery network
    • Elastic File System - fully managed file system for EC2
    • Glacier - tape storage in the cloud
    • Snowball - large scale data transport to get data into and out of S3
    • Storage Gateway - an appliance to reduce latency for S3 storage
    We will be focusing on S3 and Glacier. Snowball is a mechanism to transport large amounts of data to and from the Amazon data center. The Storage Gateway is an appliance in a data center to reduce latency and provide quicker access to data stored in S3. We will dive a little deeper into S3 and the Storage Gateway, but not into CloudFront or the Elastic File System, in this blog.

    We first start with the AWS console and click on the S3 console. We can create a new bucket by clicking on Create Bucket.

    When we create a new S3 bucket, we can name it and define which data center we want the storage to be allocated into. We have to be careful when we create a bucket name. The namespace is shared with all users. If we want to create a common name, it will probably be used by someone else and we will see an error in creating the name.

    If we look at the properties associated with this storage we can see that we have a variety of options to configure.

    • Permissions
    • Static Web Hosting
    • Logging
    • Events
    • Versioning
    • Lifecycle
    • Cross-Region Replication
    • Tags
    • Requester Pays

    Let's go through each of these individually. With Permissions, you have the ability to control who can see, modify, delete, and download the contents of the bucket. Bucket policies can get relatively complex and have a variety of conditions and restrictions applied to it. You can find out more at Detailing Advanced Policies. This feature allows you to restrict who can read content by ip address, access keys, or usernames.
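
    As a sketch of what an address-restricted read policy looks like, here is a hypothetical policy applied with the AWS command line tools; the bucket name and ip range are placeholders.

    aws s3api put-bucket-policy --bucket my-backup-bucket --policy '{
      "Version": "2012-10-17",
      "Statement": [{
        "Sid": "OfficeOnlyRead",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-backup-bucket/*",
        "Condition": { "IpAddress": { "aws:SourceIp": "203.0.113.0/24" } }
      }]
    }'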

    Static web hosting allows you to create a web site based on the files in a container. If you have an index.html, it becomes the basis for accessing all of the other files in this directory. This is both good and bad because you get the basic functionality of a web server but you don't get the configuration and access logs. It has some uses but is limited in how it works. It does make static web page presentation easy because you no longer need an EC2 instance, operating system, or application to host the web site.

    Logging allows you to view how, who, and from where files were accessed. You can generate logs to look at access patterns and access locations.

    Versioning allows you to keep past copies of files. If a file is edited and changed, previous versions and deltas are tracked. This is a good feature but does cause storage consumption to grow because you never delete older versions of a file but keep the deltas for a fixed amount of history.

    Lifecycle allows you to automatically archive files from spinning disk to tape after a fixed amount of time and history of access. If no one has accessed a file in months, it can be written to Glacier for long term lower cost archive.
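
    A lifecycle rule can also be set from the command line; the sketch below moves anything older than 180 days to Glacier. The bucket name and the 180 day window are placeholders.

    aws s3api put-bucket-lifecycle-configuration --bucket my-backup-bucket --lifecycle-configuration '{
      "Rules": [{
        "ID": "archive-old-backups",
        "Filter": { "Prefix": "" },
        "Status": "Enabled",
        "Transitions": [{ "Days": 180, "StorageClass": "GLACIER" }]
      }]
    }'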

    Cross-Region Replication allows you to replicate blocks between data centers automatically. This allows for high availability in the event that one data center fails or storage at one location is having significant problems.

    Tags and Requester Pays allow for charge-back features so that the people who consume resources pay for the download and storage. The person creating the bucket is not charged for usage but has a mechanism to transfer the charges to the person reading the data.

    Reading and writing to our newly created bucket requires a user interface or usage of the Amazon Rest api to transfer files. Amazon does provide a user interface to upload and edit the properties of the files and directories. We recommend using another interface like CloudBerry or other graphical tool or the command line utilities because this interface is a bit limiting.
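
    For scripted uploads the AWS command line utilities are the simplest route. Assuming the CLI is installed and configured with your access keys, and using a placeholder bucket name:

    aws s3 cp backup_piece_001.bkp s3://my-backup-bucket/rman/    # upload a file into the bucket
    aws s3 ls s3://my-backup-bucket/rman/                         # list what is stored there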

    This blog entry is significantly different from the one yesterday. Yesterday we started with pricing then got technical. Today we dove straight into the technical and ignored pricing. Let's dive into pricing. The cost of S3 storage is $30/TB/month plus outbound charges. I suggest using the S3 price list and the S3 price calculator to figure pricing. Attached are screen shots of pricing for 120 TB of storage using the calculator and screen shots of the price list.

    One thing that we talked about with the Oracle Storage Cloud and have not talked about here is an on premise virtual machine to reduce latency. Amazon offers this with the AWS Storage Gateway. The key differences between the two products are 1) the AWS Gateway uses iSCSI on the client side to provide storage access to the data center and 2) it costs $125/month/gateway. It solves the same latency problem but does it slightly differently. Unfortunately, we are not going to install and configure this virtual instance and play with it because it requires 8 virtual CPUs which is more than my laptop will offer.

    In summary, this is an initial review of S3 storage with Amazon AWS. We did not dive deep into Glacier or the Storage Gateway. We did not review elastic block services (EBS) because these are typically attached to EC2 instances. It is important to note that the focus of S3 is different than Oracle Storage Cloud Services but very similar. Files and directories can be stored in containers and access can be controlled. S3 extends services to provide things like video streaming, static web site hosting, and migrating data to and from tape in the cloud. You can use S3 for backup archives and generic block storage and access it via REST api or AWS api calls. Products like CloudBerry Explorer and S3 Explorer exist to help translate the human interface to S3 storage calls. The cost for S3 is roughly $30/TB/month with additional charges for outbound data on a per GB basis. Archive storage is roughly $7/TB/month with additional charges for data retrieval and outbound data on a per GB basis. The intent of this blog is not to say that one service is better than the other but provide resources to help you make your own decisions and decide what works best for your situation and corporation.

    Storage in the Oracle Cloud

    Tue, 2016-04-19 02:07
    This week we are going to focus on storage. Storage is a slippery slope and difficult conversation to have. Are we talking about a file synchronization like dropbox.com, google.com/docs, or box.com? Are we talking about raw block storage or long term archive storage? There are many services available from many vendors. We are going to focus on block storage in the cloud that can be used for files if desired or for backups of databases and virtual machines. Some of the cloud vendors have specific focused storage like Azure tables that offer a noSQL type storage or Amazon S3 allowing you to run a website without a web server. Today we will look at the Oracle IaaS Storage set of products. This is different than the Oracle PaaS Documents option which is more of a Google Docs like solution. The IaaS Storage is a block of storage that you pay for either on a metered usage or non-metered usage basis.

    Notice from the cloud.oracle.com web page, we click on Infrastructure and follow the Storage route. We see that we get the raw block storage or the archive storage as options. We also have the option of an on-site cache front end that reduces latency and offers an NFS front end to the users providing more of a document management strategy rather than a raw block option.

    Before we dive a little deeper into the options and differences between the storage appliance, spinning disk, and spinning tape in the cloud, we need to have a discussion about pricing and usage models. If you click on the Pricing tab at the top of the screen you see the screens below.

    Metered pricing consists of three parts: 1) how much storage are you going to start with, 2) how much storage are you going to grow to, and 3) how much are you going to read back? Metering is difficult to guesstimate and unfortunately it has a significant cost associated with being wrong. Many long term customers of AWS S3 understand this and have gotten sticker shock when the first bill comes in. The basic cost for outbound transfer is measured on a per GB basis. The more that you read across the internet, the more you pay. You can circumvent this by reading into a compute server in the Oracle cloud and not have to pay the outbound transfer. If, for example, you are backing up video surveillance data and uploading 24 hours of video at the end of the day, you can read the 24 hour bundle into a compute server and extract the 10-15 minutes that you are interested in and pay for the outbound charges on compute for the smaller video file.

    Non-Metered pricing consists of one part. How much storage are you going to use over the year. Oracle does not charge for the amount of data transferred in-bound or out-bound with this storage. You can read and write as much as you want and there is no charge for data transfer across the internet. In the previous example you could read the 24 hours of video from the cloud storage, throw away 90% of it from a server in your data center, and not incur any charges for the volume of transfer.

    Given that pricing is difficult to calculate, we created our own spreadsheet to estimate pricing as well as part numbers that should be ordered when consuming Oracle cloud resources. The images below show the cost of 120 TB of archive storage, metered block storage, and non-metered block storage.

    Note that the data transfer price is non-trivial. Reading the data back from the cloud can get significantly more expensive than the cost of the storage itself. A good rule of thumb is the cost of spinning disk in the cloud should not exceed $30/TB/month or $400/TB/year. If you look at the cost of a NetApp or EMC storage system, you are looking at $3K-$4K/TB purchase price with 10% annual maintenance per year ($300-$400). If you are currently running out of storage and your NFS filer is filling up, you can purchase cloud resources for a few months and see if it works. It won't cost you much more than what you are already paying in support and you can grow your cloud storage as needed rather than buying 3 years ahead as you would with a filer in your data center. The key issue with cloud storage is latency and access times. Access to a filer in your data center is typically 10ms where access time to cloud storage is typically 80+ms. All cloud storage vendors have on site appliance solutions that act as cache front ends to address this latency problem. Oracle has one that talks NFS. Amazon has one that talks iSCSI. Microsoft has one that talks SMB. There truly is no single vendor with a generic solution that addresses all problems.

    Enough with the business side of storage. Unfortunately, storage is a commodity so the key conversation is economics, reliability, and security. We have already addressed economics. When it comes to reliability the three cloud vendors address data replication and availability in different ways. Oracle triple mirrors the data and provides public-private key encryption of all data uploaded to the cloud. Data can be mirrored to another data center in the same geography but can not be mirrored across an ocean. This selection is done post configuration and is tied to your account as a storage configuration.

    Now to the ugly part of block storage. Traditionally, block storage has been addressed through an operating system as a logical unit or aggregation of blocks on a disk drive. Terms like tracks and sectors bleed into the conversation. With cloud storage, it is not part of the discussion. Storage in the cloud is storage. It is accessed through an interface called a REST api. The data can be created, read, updated, and deleted using html calls. All of this is documented in the Oracle Documents - Cloud Storage web site.

    The first step is to authenticate to the cloud site with an instance name, username, and password. What is passed back is an authentication token. Fortunately, there are a ton of tools to help create and send these requests that are specifically tuned to help build headers and JSON structured data packets for REST api interfaces. The screen below shows the Postman interface available through Chrome. A similar one exists for Firefox called RESTClient. Unfortunately, there is no extension for Internet Explorer.

    The first step is to get an auth header by typing in the username and password into the Basic Authentication screen.

    Once we are authorized, we connect to the service by going to https://storage.us2.oraclecloud.com/v1/Storage-(identity domain) where identity domain is the cloud provider account that we have been assigned. In our example we are connecting to metcsgse00029 as our identity domain and logging in as the user cloud.admin. We can see what "containers" are available by sending a GET call or create a new container by sending a PUT call with the new container name at the end of our html string. I use the word container because the top level of storage consists of different areas. These areas are not directories. They are not file systems. They are containers that hold special properties. We can create a container that is standard storage, which represents spinning disk in the cloud, or we can create a container that is archive storage, which represents a tape unit in the cloud. This is done by sending the X-Storage-Class header. If there is no header, the default is block storage and spinning disk. If the X-Storage-Class is set to Archive it is tape in the cloud. Some examples of creating a container are shown below. We can do this via Postman inside Chrome or from a command line.

    From the command line this would look like

    export OUID=cloud.admin
    export OPASS=mypassword
    export ODOMAIN=metcsgse00029
    curl -is -X GET -H "X-Storage-User:Storage-$ODOMAIN:$OUID" \
         -H "X-Storage-Pass:$OPASS" \
         https://$ODOMAIN.storage.oraclecloud.com/auth/v1.0
    

    This should return an HTTP header with HTTP 200 OK and an embedded header of X-Auth-Token: AUTH_tk578061b9ae7f864ae9cde3cfdd75d706. Note that the value after the X-Auth-Token is what we will use to pass into all other requests. This token will change with each request and is good for 30 minutes from first execution. Once we have the authentication finished we change the request type from a GET to a PUT and append the container name to the end. The screen above shows how to do this with Postman. The results should look like the screen below. We can do this from the command line as shown below as well.

    curl -is -X PUT -H "X-Auth-Token: AUTH_tk578061b9ae7f864ae9cde3cfdd75d706" \
         https://storage.us2.oraclecloud.com/v1/Storage-$ODOMAIN/new_area
    
    In this example we create a new container from the command line called new_area. We can verify this by reviewing the cloud storage by changing the PUT to a GET.

    curl -is -X GET -H "X-Auth-Token: AUTH_tk578061b9ae7f864ae9cde3cfdd75d706" \
         https://storage.us2.oraclecloud.com/v1/Storage-$ODOMAIN
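
    Since the token changes on each authentication and expires 30 minutes after it is issued, it is handy when scripting to capture it into a variable rather than pasting it into every request; a sketch using the same variables defined above.

    export OTOKEN=$(curl -is -X GET -H "X-Storage-User:Storage-$ODOMAIN:$OUID" \
                         -H "X-Storage-Pass:$OPASS" \
                         https://$ODOMAIN.storage.oraclecloud.com/auth/v1.0 \
                    | grep -i 'X-Auth-Token:' | awk '{print $2}' | tr -d '\r')
    curl -is -X GET -H "X-Auth-Token: $OTOKEN" https://storage.us2.oraclecloud.com/v1/Storage-$ODOMAIN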
    
    Both of these methods allow us to see the storage that we created. I personally do not like this interface. It is not intended to be human consumable. Uploading and downloading a file is difficult at best. A user interface that makes dragging and dropping files easy is desirable. This is where dropbox and google docs shine. They allow you to drag and drop as well as synchronize directories to cloud storage. The Oracle Storage Cloud is not intended to be this solution. It is designed so that you can drop a new library into your rman backup and back up straight from your database to the cloud. You can point your Commvault or Legato backup software at a cloud instance and replicate your data to the cloud. If you want a human readable interface you need to purchase something like the CloudBerry Explorer from CloudBerry. This gives you a Windows Explorer like interface and allows you to drag and drop files, create containers and directories, and schedule archives or backups as desired.

    Note that the way that you create a block storage container vs an archive container is a simple menu selection. Retrieving the archive storage is a little more complex because the tape unit must stage the file from the tape to disk and notify you that the restoration has been completed. This is a little more complex and we will defer this discussion to a later blog.

    Copying files is little more than dragging and dropping a file between sections of a window in Cloudberry.

    For completeness, I have included the command line screen shots so that you can see the request/response of a command line interaction.

    It is important to remember our objective. We can use the cloud block storage as a repository for things like databases and a holding point for our backups. When we configure a database in the cloud, we back up and restore from this storage. This is configured in the database provisioning screen. The Storage-metcsgse00029/backup container is the location of RMAN backups and restores. The backup container is created through the REST api or the CloudBerry interface. We can also attach to the cloud storage through the cloud storage appliance software, which runs inside a virtual machine and listens for NFS requests and translates them into REST api calls. A small disk is attached to the virtual machine and it acts as a cache front end to the cloud storage. As files are written via NFS they are copied to the cloud storage. As the cache fills up, file contents are dropped from local storage and the metadata pointing to where the files are located is updated, relocating the storage to the cloud rather than the cache disk. If a file is retrieved via NFS, the file is read from cache or retrieved from the cloud and inserted into the cache as it is written to the client that requested it.

    In summary, we covered the economics behind why you would select cloud storage over on site storage. We talked about how to access the storage from a browser based interface, web based interface, or command line. We talked about improving latency and security. Overall, cloud based storage is something that everyone is familiar with. Products like Facebook, Picasa, or Instagram do nothing more than store photos in cloud storage for you to retrieve when you want. You pay for these services through advertisements injected into the web page. Corporations are turning more and more towards cloud storage as a cheaper way to consume long term storage at a much lower price. The Oracle Storage Cloud service is the first of three that we will evaluate this week.

    installing Tomcat on Docker

    Mon, 2016-04-18 02:07
    A different way of looking at running Tomcat is to ignore the cloud platform and install and configure everything inside a docker instance. Rather than picking a cloud instance we are going to run this inside VirtualBox and assume that all of the cloud vendors will allow you to run Docker or configure a Docker instance on a random operating system. What we did was to initially install and configure Oracle Enterprise Linux 7.0 from an iso into VirtualBox. We then installed Docker and started the service with the commands
    sudo yum install docker
    sudo systemctl start docker
    

    We can search for a Tomcat installation and pull it down to run. We find a Tomcat 7.0 version from the search and pull down the image

    docker search tomcat
    docker pull consol/tomcat-7.0
    

    We can run the new image that we pulled down with the commands

    docker run consol/tomcat-7.0
    docker ps
    
    The docker ps command allows us to look at the container id that is needed to find the ip address of the instance that is running in docker. In our example we see the container id is 1e381042bdd2. To pull the ip address we execute
    docker inspect -f '{{ .NetworkSettings.IPAddress }}' 1e381042bdd2
    
    This returns the ip address of 172.17.0.2 so we can open this ip address and port 8080 to see the Tomcat installation.
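
    Before opening a browser, a quick check from inside the Linux virtual machine confirms that Tomcat is answering on that address.

    curl -I http://172.17.0.2:8080/      # expect an HTTP 200 response from the Tomcat default page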

    In summary, this was not much different than going through Bitnami. If you have access to docker containers in a cloud service then this might be an alternative. All three vendors not only support docker instances but all three have announced or have docker services available through IaaS. Time wise it did take a little longer because we had to download an operating system as well as Java and Tomcat. The key benefit is that we can create a master instance and create our own docker image to launch. We can script docker to restart if things fail and do more advanced options if we run out of resources. Overall, this might be worth researching as an alternative to provisioning and running services.

    technical diversion - DBaaS Rest APIs

    Fri, 2016-04-15 02:07
    We are going to take a side trip today. I was at Collaborate 2016 and one of the questions that came up was how do you provision 40 database instances for a lab. I really did not want to sit and click through 40 screens and log into 40 accounts so I decided to do a little research. It turns out that there is a relatively robust REST api that allows you to create, read, update, and delete database instances. The DBaaS Rest Api Documentation is a good place to start to figure out how this works.

    To list instances that are running in the database service use the curl command shown below. Note that to make things easier and to allow us to script creation we define three variables on the command line. I did most of this testing on a Mac so it should translate to Linux and Cygwin. The three variables that we need to create are

    • ODOMAIN - instance domain that we are using in the Oracle cloud
    • OUID - username that we log in as
    • OPASS - password for this instance domain/username
    export ODOMAIN=mydomain
    export OUID=cloud.admin
    export OPASS=mypassword
    curl -i -X GET -u $OUID:$OPASS -H "X-ID-TENANT-NAME: $ODOMAIN" -H "Content-Type:application/json" https://dbaas.oraclecloud.com/jaas/db/api/v1.1/instances/$ODOMAIN
    
    What should return is
    HTTP/1.1 200 OK
    Date: Sun, 10 Apr 2016 18:42:42 GMT
    Server: Oracle-Application-Server-11g
    Content-Length: 1023
    X-ORACLE-DMS-ECID: 005C2NB3ot26uHFpR05Eid0005mk0001dW
    X-ORACLE-DMS-ECID: 005C2NB3ot26uHFpR05Eid0005mk0001dW
    X-Frame-Options: DENY
    X-Frame-Options: DENY
    Vary: Accept-Encoding,User-Agent
    Content-Language: en
    Content-Type: application/json
    
    {"uri":"https:\/\/dbaas.oraclecloud.com:443\/paas\/service\/dbcs\/api\/v1.1\/instances\/metcsgse00027","service_type":"dbaas","implementation_version":"1.0","services":[{"service_name":"test-hp","version":"12.1.0.2","status":"Running","description":"Example service instance","identity_domain":"metcsgse00027","creation_time":"Sun Apr 10 18:5:26 UTC 2016","last_modified_time":"Sun Apr 10 18:5:26 UTC 2016","created_by":"cloud.admin","sm_plugin_version":"16.2.1.1","service_uri":"https:\/\/dbaas.oraclecloud.com:443\/paas\/service\/dbcs\/api\/v1.1\/instances\/metcsgse00027\/test-hp"},{"service_name":"db12c-hp","version":"12.1.0.2","status":"Running","description":"Example service instance","identity_domain":"metcsgse00027","creation_time":"Sun Apr 10 18:1:21 UTC 2016","last_modified_time":"Sun Apr 10 18:1:21 UTC 2016","created_by":"cloud.admin","sm_plugin_version":"16.2.1.1","service_uri":"https:\/\/dbaas.oraclecloud.com:443\/paas\/service\/dbcs\/api\/v1.1\/instances\/metcsgse00027\/db12c-hp"}],"subscriptions":[]}
    

    If you get back anything other than a 200 it means that you have the identity domain, username, or password incorrect. Note that we get back a json structure that contains two database instances that were previously created, test-hp and db12c-hp. Both are up and running. Both are 12.1.0.2 instances. We don't know much more than these but can dive a little deeper by requesting more information by included the service name as part of the request. A screen shot of the deeper detail is shown below.

    A list of the most common commands are shown in the screen shot below

    The key options to remember are:

    • list: -X GET
    • stop: -X POST --data '{ "lifecycleState" : "Stop" }'
    • restart: -X POST --data '{ "lifecycleState" : "Restart" }'
    • delete: -X DELETE (add the instance name to the end of the URL, for example db12c-hp in the request above)
    • create: -X POST --data @createDB.json
    In the create option we include a json file that defines everything for the database instance.
    {
      "serviceName": "test-hp",
      "version": "12.1.0.2",
      "level": "PAAS",
      "edition": "EE_HP",
      "subscriptionType": "HOURLY",
      "description": "Example service instance",
      "shape": "oc3",
      "vmPublicKeyText": "ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAnrfxP1Tn50Rvuy3zgsdZ3ghooCclOiEoAyIl81Da0gzd9ozVgFn5uuSM77AhCPaoDUnWTnMS2vQ4JRDIdW52DckayHfo4q5Z4N9dhyf9n66xWZM6qyqlzRKMLB0oYaF7MQQ6QaGB89055q23Vp+Pk5Eo+XPUxnfDR6frOYZYnpONyZ5+Qv6pmYKyxAyH+eObZkxFMAVx67VSPzStimNjnjiLrWxluh4g3XiZ1KEhmTQEFaLKlH2qdxKaSmhVg7EA88n9tQDWDwonw49VXUn/TaDgVBG7vsWzGWkRkyEN57AhUhRazs0tEPuGI2jXY3V8Q00w3wW38S/dgDcPFdQF0Q== rsa-key-20160107",
      "parameters": [
        {
          "type": "db",
          "usableStorage": "20",
          "adminPassword": "Test123_",
          "sid": "ORCL",
          "pdb": "PDB1",
          "failoverDatabase": "no",
          "backupDestination": "none"
        }
      ]
    }
     
    
The vmPublicKeyText is the contents of our id_rsa.pub file that we use to connect to the service. I could have included a backup destination but chose not to in this example, because doing so requires embedding a cloud storage username and password in the file and I did not want to publish a service with those credentials.
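Pulling this together, the 40-instance lab provisioning that started this side trip could be scripted along the following lines. This is a minimal sketch with no error handling or polling for completion, and it assumes a createDB.json template that contains the placeholder string SERVICENAME in place of a real service name:

#!/bin/bash
# Create lab01 through lab40 from a template JSON file.
# Assumes ODOMAIN, OUID, and OPASS are exported as shown earlier and that
# createDB.json contains the string SERVICENAME as a placeholder.
for i in $(seq 1 40); do
  name=$(printf "lab%02d" "$i")
  sed "s/SERVICENAME/$name/" createDB.json > /tmp/$name.json
  curl -i -X POST -u $OUID:$OPASS \
       -H "X-ID-TENANT-NAME: $ODOMAIN" \
       -H "Content-Type:application/json" \
       --data @/tmp/$name.json \
       https://dbaas.oraclecloud.com/jaas/db/api/v1.1/instances/$ODOMAIN
done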

Overall, I prefer scripting everything and running it from a command line. Call me old school, but scripting the work and getting a notification when it is done appeals to me far more than sitting for hours clicking through screens.

    Tomcat on Azure

    Thu, 2016-04-14 02:07
Today we are going to install Tomcat on Microsoft Azure. In the past three days we have installed Tomcat on Oracle Linux using Bitnami and onto a raw virtual image, as well as on Amazon AWS using a raw virtual image. Microsoft does not really have a notion of a Marketplace like the AWS Commercial or Public Domain AMI markets. It does have Bitnami images, and we could go through that installation on Azure just like we did on the Oracle Compute Cloud. The Linux installation would be no different than the Oracle Linux raw virtual machine install, so rather than repeating it on yet another platform, let's do something different and look at how we would install Tomcat on Windows on Azure. You can find Tomcat on Linux Instructions or Tomcat on Windows Instructions. To be honest we won't deviate much from the second one, so follow this post or follow the instructions from Microsoft; they are basically the same.

    The steps that we need to follow are

    • Create a virtual machine with Windows and Java enabled
    • Download and install Tomcat
• Open the ports on the Azure portal
• Open the ports on Windows
We start by loading a virtual machine in the Azure portal. Doing a search for Tomcat returns the Bitnami image as well as a Docker Tomcat container. This might work, but it does not fit what we want for this exercise. We might look at a container later, but for our future needs we need to be able to connect to a database and upload jar and war files, and I am not sure that a container will do this.

    We search for a JDK and find three different versions. We select the JDK 7 and click Create.

In creating the virtual machine, we define a name for our system, a default login, a password (I prefer a confirmation on the password rather than just entering it once), our payment method, where to place its storage, and which data center to use based on the storage we select. We go with the default East configuration and click OK.

Since we are cheap and this is only for demo purposes, we select the A0 Standard shape. The recommended shape is A1 Standard, but it is about $50 more per month. In hindsight, after having played with the A0 Standard, we might be better off going with the A1 Standard; yes, it is more expensive, but the A0 shape is so painfully slow that it is almost unusable.

We will want to open up ports 80, 8080, and 443, which will all be used by Tomcat. This can be done by creating a new security rule and adding port exceptions when we create the virtual machine. We can see this in the installation menu.

We add these ports and click Create to provision the virtual machine.

One of the things that I don't like about this configuration is that when we add the three additional ports, we don't see the last two rules in the list. It would be nice if we could see all of the ports that we define. We also need to make sure that each port definition has a different priority; the installation will fail if we assign priority 1000 to all of the ports.
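For completeness, the same network rules could also be scripted rather than clicked through. Here is a hedged sketch using today's Azure CLI (az), which post-dates the portal flow described in this post; the resource group and virtual machine names are made up for illustration, and each rule gets its own priority as noted above:

az vm open-port --resource-group tomcat-rg --name tomcat-vm --port 80 --priority 1000
az vm open-port --resource-group tomcat-rg --name tomcat-vm --port 443 --priority 1010
az vm open-port --resource-group tomcat-rg --name tomcat-vm --port 8080 --priority 1020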

Connection to the virtual machine is done through Remote Desktop. If you go to the portal and click on the virtual machine you will be able to connect to the console. I personally don't like connecting to a GUI and prefer a command line interface. Note that you must connect with a username and password rather than a digital certificate.

The first thing that comes up with Windows 2012 Server is the Server Manager screen. You can use this to configure the Windows firewall and allow ports 80, 8080, and 443 in from the internet. This is in addition to enabling these ports as network rules in the Azure portal: two separate configurations are required before traffic on port 8080 can travel from your desktop, through the internet, to your virtual machine, and into your Tomcat application.
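Rather than clicking through the firewall GUI, the same Windows-side rules can be added from an administrative command prompt. This is a sketch of the standard netsh syntax; the rule names are arbitrary:

rem Allow inbound TCP traffic to Tomcat and the web ports (run as Administrator).
netsh advfirewall firewall add rule name="HTTP 80" dir=in action=allow protocol=TCP localport=80
netsh advfirewall firewall add rule name="HTTPS 443" dir=in action=allow protocol=TCP localport=443
netsh advfirewall firewall add rule name="Tomcat 8080" dir=in action=allow protocol=TCP localport=8080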

For those of you that are Linux and Mac snobs, getting Windows to work in Azure was a little challenging. Simple things like opening a browser became an exercise, though that is more a lack of Windows understanding on my part. To get Internet Explorer to come up you first have to move your mouse to the far right of the screen.

At first this did not work for me because the Windows screen was a little larger than my desktop, and I had to scroll all the way to the bottom and all the way to the right before the pop-up navigation window appeared. When the window does come up you see three icons. The bottom icon is the configuration icon that lets you get to the Control Panel to configure the firewall. The icon above it is the Microsoft Windows icon, which gives you an option to launch IE. Yes, I use Windows on one of my desktops. Yes, I do have an engineering degree. No, I don't get this user interface. Hovering over an empty spot on the screen (which is behind a scroll bar) makes no sense to me.

From this point forward I was able to easily follow the Microsoft Tomcat installation instructions. If you don't select the JDK 7 virtual machine you can download the JDK from the java.com download page. You then download the Tomcat app server; we selected Tomcat 7 for the download and followed the defaults. We do need to configure the firewall on the Windows server to enable ports 80, 8080, and 443 so that we can see everything from our desktop browser. We can first verify that Tomcat is properly installed by going to http://localhost:8080 from Internet Explorer in the virtual image. We can then get the IP address of our virtual machine and test the network connections from our desktop by replacing localhost with the IP address. Below are the screen shots from the install. I am not going to walk through the instructions on installing Tomcat because it is relatively simple with few options, but I have included the screen shots for completeness.
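As a simple check from the desktop side, the same test can be run without a browser; the address below is a placeholder for the public IP address that the Azure portal reports for the virtual machine:

# Replace 40.76.0.10 with your virtual machine's public IP address.
curl -I http://40.76.0.10:8080
# An HTTP/1.1 200 OK response means Tomcat is reachable through both firewalls.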

In summary, we could have followed the instructions from Microsoft to configure Tomcat. We could pre-configure the ports as suggested in this blog. We could select a virtual machine image with the JDK pre-loaded rather than downloading it manually. It took about 10-15 minutes to provision the virtual machine, another 5-10 minutes to download the JDK and Tomcat components, and 5-10 minutes to configure the firewall on Windows and the port access through the Azure portal. My suggestion is to use a service like Bitnami to get a preconfigured system because it takes about half the time and enables all of the ports and services automatically.
