Pat Shuff

Oracle Blogs

database options - compression

Wed, 2016-06-08 02:07
A natural follow-on to database partitioning is database compression. With partitioning we wanted to split everything into buckets based on how frequently it is used and keep the frequently used stuff small enough that it fits into memory. The older stuff that we don't access that frequently can be put on slower and lower cost storage. In this blog we are going to look at different techniques to use the cheaper storage even more. Since we don't access this data very frequently, and most of the time when we access it we only need to read it and not write to it, we should be able to take advantage of common data and compress the information to consume less storage. If, for example, we are storing census data and we want to store city and state information, we can take advantage of not having Punxsutawney, Pennsylvania stored 5,900 times based on the current population. If we stored a copy of this roughly 6,000 times it would take up 6,000 times 12 bytes for the city and 6,000 times 12 bytes for the state. We would also store 15767 as the zip code roughly 6,000 times, consuming 6,000 times 9 bytes. If we could create a secondary table that correlates Punxsutawney, Pennsylvania 15767 to the hexadecimal value 2e, we could store 2e for the city, state, and zip code, consuming one byte each rather than 12, 12, and 9 bytes. We effectively save 180,000 bytes by storing a replacement value rather than storing the long strings multiple times. This is effectively the way that hybrid columnar compression works.
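Hybrid columnar compression is declared when the table or partition is created and requires storage that supports it (Exadata or ZFS). As a minimal sketch, an archival table for this kind of census data might be created like the following (the table name and column sizes are illustrative, matching the example above):

CREATE TABLE census_archive (
  city     VARCHAR2(12),
  state    VARCHAR2(12),
  zip_code VARCHAR2(9)
)
COLUMN STORE COMPRESS FOR ARCHIVE HIGH;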

Compression can be done at a variety of levels and locations. Disk vendors have for years touted compression in place on storage to consume less space. Compression has been used in a variety of industries. Audio compression, for example, takes recorded audio, undersamples the changes in volume and pitch, and records only 8,000 samples per second since the ear cannot really hear changes faster than that. These samples are then compressed and stored in a format like mp3. Programs know how to take the mp3 format, rebuild the 8K samples, and drive a speaker to approximate the sound that was originally created. Some people can hear the differences and still want to listen to music recorded on reel to reel tape or vinyl because the fidelity is better than CD-ROM or DVD. Video does the same thing by breaking the screen into squares of pixels. Only the squares that are changing are transmitted rather than sending all of the data for the whole screen, and the blocks that did not change are redisplayed rather than being retransmitted thirty times a second. This allows for video distribution of movies and video recordings across the internet and storage on a DVD rather than recording all of the data all of the time.

Generically compressing data for a database can be complex and, if done properly, works well. It can also be done very poorly and cause performance problems and issues when reading back the data. Let's take the census data that we talked about earlier. If we store the data as raw bytes it will consume 198K of space on the disk. If we use the compression ratio that we talked about we will consume roughly 20K of data. This gives us a 10x compression ratio and saves us a significant amount of space on the disk. If the disk sub-system does this compression for us we write 198K of data to the disk, it consumes 20K of storage on the spindles, but when we read it back it has to be rehydrated and we transfer 198K back to the processor and consume 198K of memory to hold the rehydrated data. If the database knew the compression algorithm and compressed the data initially in memory, it would only transmit 20K to the disk, store 20K on the spindles, read 20K back from the disk, and consume 20K of memory to hold the data. This might not seem significant but if we are reading the data across a 2.5 G/second SCSI connection it takes 80ms to read the data rather than 8ms. This 72ms difference can be significant if we have to repeat this a few thousand times. It can also be significant if we have a 1 GigE network connection rather than a direct attached disk. The transfer time jumps to 200ms by moving the data from an attached disk to an nfs or smb mounted disk. We see performance problems like this with database backups to third party storage solutions like Data Domain. If you take a database backup and copy it to a Data Domain solution you get the 10x compression and the backup takes roughly an hour. You have to estimate that it will take seven to eight times as long to rehydrate the data, so a restore will take 7-8 hours to recover your database.

The recommended solution is to use compression inside the database rather than third party compression solutions that are designed to compress backups, home directories, and email attachments. Oracle offers the Advanced Compression option for information stored in the database. If you look at the 12c Advanced Compression Data Sheet you will notice that there are a variety of options available for compression. You can compress

  • using historic access patterns (heat map and ADO options)
  • using row compression (by analyzing update and insert operations as they occur)
  • file compression (duplicate file links and file compression of LOBs, BLOBs, and CLOBs)
  • backup data compression
  • Data Guard compression of redo logs before transmission
  • index compressions
  • network transmission compression of results to client systems
  • hybrid columnar compression (Exadata and ZFS only)
  • storage snapshot optimization (ZFS only)

Heat Map Compression

At the segment level, Heat Map tracks the timestamps of the most recent modification and query of each table and partition in the database. At the block level, Heat Map tracks the most recent modification timestamp. These timestamps are used by Automatic Data Optimization to define compression and storage policies which will be automatically maintained throughout the lifecycle of the data. Heat Map skips internal operations done for system tasks -- automatically excluding Stats Gathering, DDLs, Table Redefinitions and similar operations. In addition, Heat Map can be disabled at the session level, allowing DBAs to exclude manual maintenance, avoiding pollution of Heat Map data.

With the data collected by Heat Map, Oracle Database can automatically compress each partition of a table independently based on Heat Map data, implementing compression tiering. This compression tiering can use all forms of Oracle table compression, including: Advanced Row Compression and all levels of Hybrid Columnar Compression (HCC) if the underlying storage supports HCC. Oracle Database can also compress individual database blocks with Advanced Row Compression based on Heat Map data.
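Heat Map tracking is enabled at the instance level and the tracking data can then be queried. A minimal sketch (the owner name is a placeholder, and this assumes the DBA_HEAT_MAP_SEGMENT view in 12c):

ALTER SYSTEM SET HEAT_MAP = ON;
-- after the workload has run for a while, review segment-level access tracking
SELECT object_name, segment_write_time, full_scan, lookup_scan
  FROM dba_heat_map_segment
 WHERE owner = 'SALES_APP';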

Row Compression

In this first example, a segment-level ADO policy is created to automatically compress the entire table after there have been no modifications for at least 30 days, using Advanced Row Compression:

ALTER TABLE employee ILM ADD POLICY ROW STORE COMPRESS ADVANCED SEGMENT AFTER 30 DAYS OF NO MODIFICATION;
In this next example, a row-level ADO policy is created to automatically compress blocks in the table, after no rows in the block have been modified for at least 3 days, using Advanced Row Compression:
ALTER TABLE employee ILM ADD POLICY ROW STORE COMPRESS ADVANCED ROW AFTER 3 DAYS OF NO MODIFICATION;
In addition to Smart Compression, other ADO policy actions can include data movement to other storage tiers, including lower cost storage tiers or storage tiers with other compression capabilities such as Hybrid Columnar Compression (HCC). HCC requires the use of Oracle Storage – Exadata, Pillar Axiom or Sun ZFS Storage Appliance (ZFSSA).

In this example, a tablespace-level ADO policy automatically moves the table to a different tablespace when the tablespace currently containing the object meets a pre-defined tablespace fullness threshold:

ALTER TABLE employee ILM ADD POLICY tier to ilmtbs;
Another option when moving a segment to another tablespace is to set the target tablespace to READ ONLY after the object is moved. This is useful for historical data during database backups, since subsequent full database backups will skip READ ONLY tablespaces.
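As a sketch, the target tablespace from the example above can be marked read only once the data has moved; the ILM policy variant with a READ ONLY clause is shown as an assumption, so check the syntax for your release:

-- manual approach once segments have been tiered to ilmtbs
ALTER TABLESPACE ilmtbs READ ONLY;
-- or, if supported in your release, as part of the tiering policy itself
ALTER TABLE employee ILM ADD POLICY TIER TO ilmtbs READ ONLY;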

Advanced Row Compression uses a unique compression algorithm specifically designed to work with OLTP applications. The algorithm works by eliminating duplicate values within a database block, even across multiple columns. Compressed blocks contain a structure called a symbol table that maintains compression metadata. When a block is compressed, duplicate values are eliminated by first adding a single copy of the duplicate value to the symbol table. Each duplicate value is then replaced by a short reference to the appropriate entry in the symbol table.
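To use Advanced Row Compression directly rather than waiting for an ADO policy to trigger, the attribute can be declared on the table. A minimal sketch reusing the employee table from the examples above (the history table name is made up for illustration):

-- newly written blocks are compressed as DML occurs
ALTER TABLE employee ROW STORE COMPRESS ADVANCED;
-- new tables can declare compression up front
CREATE TABLE employee_history
  ROW STORE COMPRESS ADVANCED
  AS SELECT * FROM employee;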

File Compression

Consider an email application where 10 users receive an email with the same 1MB attachment. Without Advanced LOB Deduplication, the system would store one copy of the file for each of the 10 users – requiring 10MB of storage. If the email application in our example uses Advanced LOB Deduplication, it will store the 1MB attachment just once. That’s a 90% savings in storage requirements.

In addition to the storage savings, Advanced LOB Deduplication also increases application performance. Specifically, write and copy operations are much more efficient since only references to the SecureFiles data are written. Further, read operations may improve if duplicate SecureFiles data already exists in the buffer cache.
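A minimal sketch of such an attachment table with SecureFiles deduplication and compression turned on (the table and column names are made up for illustration):

CREATE TABLE email_attachments (
  msg_id     NUMBER,
  file_name  VARCHAR2(256),
  attachment BLOB
)
LOB (attachment) STORE AS SECUREFILE (
  DEDUPLICATE      -- store a single copy of identical attachments
  COMPRESS HIGH    -- compress the LOB data as well
);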

Backup data compression

RMAN makes a block-by-block backup of the database data, also known as a “physical” backup, which can be used to perform database, tablespace or block level recovery. Data Pump is used to perform a “logical” backup by offloading data from one or more tables into a flat file.

Due to RMAN’s tight integration with Oracle Database, backup data is compressed before it is written to disk or tape and doesn’t need to be uncompressed before recovery – providing an enormous reduction in storage costs and a potentially large reduction in backup and restore times. There are three levels of RMAN Compression: LOW, MEDIUM, and HIGH. The amount of storage savings increases from LOW to HIGH, while potentially consuming more CPU resources.
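For example, from an RMAN session the compression level can be set once and then used for compressed backups. A minimal sketch; note that the LOW, MEDIUM, and HIGH algorithms require the Advanced Compression option, while BASIC is included with the database:

CONFIGURE COMPRESSION ALGORITHM 'MEDIUM';
BACKUP AS COMPRESSED BACKUPSET DATABASE;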

Data Pump compression is an inline operation, so the reduced dump file size means a significant savings in disk space. Unlike operating system or file system compression utilities, Data Pump compression is fully inline on the import side as well, so there is no need to decompress a dump file before importing it. The compressed dump file sets are automatically decompressed during import without any additional steps by the Database Administrator.
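On the export side, compression is requested with a single parameter (a sketch; the directory, dump file, and schema names here are placeholders), and impdp handles decompression automatically:

expdp system DIRECTORY=dp_dir DUMPFILE=sales.dmp SCHEMAS=sales COMPRESSION=ALL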

Data Guard redo log compression

Data Guard Redo Transport Services are used to transfer this redo data to the standby site(s). With Advanced Compression, redo data may be transmitted in a compressed format to reduce network bandwidth consumption and in some cases reduce transmission time of redo data. Redo data can be transmitted in a compressed format when the Oracle Data Guard configuration uses either synchronous redo transport (SYNC) or asynchronous redo transport (ASYNC).
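Redo compression is an attribute of the redo transport destination on the primary. A sketch, where the service name and DB_UNIQUE_NAME are placeholders:

ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 =
  'SERVICE=standby_db ASYNC COMPRESSION=ENABLE DB_UNIQUE_NAME=standby_db'
  SCOPE=BOTH;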

Index Compression

Advanced Index Compression is a new form of index block compression. Creating an index using Advanced Index Compression reduces the size of all supported unique and non-unique indexes -- while still providing efficient access to the indexes. Advanced Index Compression works well on all supported indexes, including those that are not good candidates for the existing index Prefix Compression feature (indexes with no duplicate values, or few duplicate values, for a given number of leading columns).
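Enabling Advanced Index Compression is a one-clause change on a create or rebuild. A minimal sketch using a hypothetical employee index:

CREATE INDEX emp_name_ix ON employee (last_name, first_name)
  COMPRESS ADVANCED LOW;
-- or rebuild an existing index with compression enabled
ALTER INDEX emp_name_ix REBUILD COMPRESS ADVANCED LOW;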

Network Compression

Advanced Network Compression, also referred to as SQL Network Data Compression, can be used to compress the network data to be transmitted at the sending side and then uncompress it at the receiving side to reduce the network traffic. Advanced Network Compression reduces the size of the session data unit (SDU) transmitted over a data connection. Reducing the size of data reduces the time required to transmit the SDU.

Advanced Network Compression not only makes SQL query responses faster but also saves bandwidth. On narrow bandwidth connections with a fast CPU, it can significantly improve performance. The compression is transparent to client applications.
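Advanced Network Compression is switched on in sqlnet.ora on the client and/or server. A sketch of the commonly used parameters:

SQLNET.COMPRESSION = on
SQLNET.COMPRESSION_LEVELS = (low, high)
SQLNET.COMPRESSION_THRESHOLD = 1024   # minimum payload size in bytes before compression is attempted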

We won't cover the last two options since they don't apply to database services in the cloud unless you purchase the Exadata as a Service option. There is a Compression Estimation Tool to help you estimate the benefits of compression. A sample of this looking at 100 TB of database data shows a significant cost savings in the millions of dollars.

There is also a Compression Advisor that can be downloaded and installed in your database to look at your tables and estimate how much storage you can save based on your data and your usage patterns. You can watch a four minute marketing video on the tool and how to use it. I recommend Tyler Mouth's blog entry on customizing the output of the compression advisor to make it a little more user friendly. I would also look at Mike Haas's blog on compression and the DBAORA blog that provides a good overview of 11g compression. Mike Messin's blog is a good walkthrough of installing and executing the compression advisor.

In summary, compression can be used with a variety of mechanisms based on your usage patterns and objectives. This option is not one size fits all and requires a DBA with knowledge of the usage patterns and familiarity with the data and applications. Letting a non-DBA decide on the compression mechanism can lead to poor performance, missed recovery time objectives, increased network traffic, and higher processor utilization than necessary. The Database 12c Compression Documentation details how to create tables that are compressed, how to check whether tables are compressed, and how to update tables for compression. Compression is a mechanism that can directly reduce your storage costs by consuming significantly less storage to hold the same data. In the cloud this correlates directly to storage cost savings. You get compression as an option for High Performance Edition and Extreme Performance Edition but not the Standard Edition or Enterprise Edition versions of the database.

database option - partitioning part 2

Tue, 2016-06-07 02:07
Yesterday we looked at partitioning. Today we are going to continue this evaluation but actually execute code rather than talk in abstracts. If we want to create a partition, this is easily done by appending partitioning to a table create. It is important to remember that this option costs money when done on-premise and is typically used either to improve performance by having a smaller table to bring into memory or to split storage so that higher speed disk can be assigned to more relevant data and lower speed, lower cost disk can be assigned to data we typically don't need to read regularly. If we are looking at using partitioning in the cloud, tiering storage is not an option. We get one disk, one type of disk, and can't assign higher speed storage to that disk partition with PaaS or DBaaS. We pay $50/TB/month to attach a disk to a compute engine and that stores our data. Tables are stored in either the USERS tablespace or the SYSTEM tablespace based on who creates them. To quickly review, we have tables that contain our data. This data is stored in a tablespace. The tablespace might contain multiple tables or parts of tables if partitioning is used. We can assign tablespaces to different directories and typically do with on-premise systems. This allows us to put data that we need fast access to in flash memory and historic data that we might read once a year in lower cost network storage, and not have to back up the historic data on a daily basis. With DBaaS we get a /u02 directory that contains the oradata folder. All tablespaces are created in this area by default. Theoretically we could mount an nfs file share if we ran the storage cloud appliance on a compute instance and pay $30/TB/month for this storage. We would have to install the nfs client on our database instance, install OSCSA on a compute instance and share the nfs directory, create a cloud storage container to hold our historic tablespaces, and point our historic partitions to our nfs mounted directories. We are not going to do this in this blog but it is an interesting thought on how to reduce the cost of storage as well as expand the amount of data that you can support with a DBaaS instance.

Let's create a few tablespaces and a partitioned table to see how it works. Most of these examples are liberally hijacked from other blogs and tutorials on the internet.

We need to note that the DBaaS that we provisioned needs to be High Performance Edition or Extreme Performance Edition. This option does not work with Standard Edition or Enterprise Edition and will fail when you try to create the table. We begin by creating a few tablespaces as well as a partitioned table that stores data into these tablespaces. It is important to note that we can easily do this because consuming storage only happens when we insert data and not create a table. We can play with creation all we want at very little cost. First, let's look at our layout using SQL Developer. If we connect to our database as a sys user we can see that by default we have the following tablespaces defined in our PDB1 pluggable container. The same is true for an 11g instance or container database. We are going to look at pluggable because it is easy to make sure that what we are creating is for this instance and not someone else playing with the system. If we add our database instance to the DBA view in SQL Developer we notice that Tablespaces appears as one of the line entries under our database. We can click on this and look at the tablespaces and files associated with them provisioned in our instance. To see the file allocation and which file system the tablespace is allocated in we need to scroll across the screen to see the information on the right.

We are going to create a few tablespaces then create a table and allocate provisions into these tablespaces. Note that these commands might not work on Amazon RDS because you need to have system level access to the database to create a tablespace and assign the file name. If we let the system use the default oradata area the create works fine. If we want to create the tablespace in /nfs/historic_oradata then the create will fail and is not allowed with RDS. Let's look at a simple example

CREATE TABLESPACE T1;
CREATE TABLESPACE T2;
CREATE TABLESPACE T3;
CREATE TABLESPACE T4;
CREATE TABLE credential_evaluations
( eval_id VARCHAR2(16) primary key
, grad_id VARCHAR2(12)
, grad_date DATE
, degree_granted VARCHAR2(12)
, degree_major VARCHAR2(64)
, school_id VARCHAR2(32)
, final_gpa NUMBER(4,2))
PARTITION BY RANGE (grad_date)
( PARTITION grad_date_70s
VALUES LESS THAN (TO_DATE('01-JAN-1980','DD-MON-YYYY')) TABLESPACE T1
, PARTITION grad_date_80s
VALUES LESS THAN (TO_DATE('01-JAN-1990','DD-MON-YYYY')) TABLESPACE T2
, PARTITION grad_date_90s
VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')) TABLESPACE T3
, PARTITION grad_date_00s
VALUES LESS THAN (TO_DATE('01-JAN-2010','DD-MON-YYYY')) TABLESPACE T4 )
ENABLE ROW MOVEMENT;
The create tablespace t1 is needed prior to creating the partition that stores data in the tablespace t1 or the create table command will fail. We have to have the tablespace created before we allocate a partition into it. After we create the tablespace, we can look at the tablespace allocation with SQL Developer by going to the DBA view and looking at PDB1, tablespaces.

Note that the file /u02/app/oracle/oradata/ORCL/339C06AF452F1EB6E0531635C40AD41B/datafile/o1_mf_t1_co5fjnr3_.dbf was created for us. If we change our tablespace create command to

CREATE TABLESPACE T1 datafile '/u02/app/oracle/oradata/ORCL/PDB1/t1.dbf' size 2G;
CREATE TABLESPACE T2 datafile '/u02/app/oracle/oradata/ORCL/PDB1/t2.dbf' size 2G;
CREATE TABLESPACE T3 datafile '/u02/app/oracle/oradata/ORCL/PDB1/t3.dbf' size 2G;
CREATE TABLESPACE T4 datafile '/u02/app/oracle/oradata/ORCL/PDB1/t4.dbf' size 2G;
we drop the files into the directory that we want and have control over the file name and location. It is important to note that this will fail on Amazon RDS because we do not have access to the filesystem and can't specify the filename or location.

When we execute this command it takes significantly longer than our first execution because the system creates a 2 GB file before creating our tablespace and table. We would typically want to add other options like how to grow our partitions, limits on the size, and other dynamic commands. We are primarily concerned with where the file is created and not post maintenance at this point.

We need to make sure that we are running on High Performance Edition or Extreme Performance Edition because Standard Edition and Enterprise Edition fail during the create table command.

In summary, we looked a little deeper at partitioning by looking at the create tablespace statement and where it creates the files in the file system. We also looked at how we can control the naming as well as the location with the create statement options. We briefly touched on two of the advantages that partitioning brings, speed and cost, and talked about how to reduce cost by using an nfs share to store more data than a typical DBaaS provides as well as using $30/TB/month storage rather than $50/TB/month storage in the cloud. Hopefully this code example will allow you to play with partitioning and speed up select statements using the High Performance Edition of DBaaS.

database option - partitioning

Mon, 2016-06-06 02:07
Database partitioning has been around since the 8i version of the database over ten years ago. The initial features of partitioning were manual processes that allowed you to split data by ranges like dates or sequences like zip codes. Tables were able to be split across multiple tablespace files with indexes applied to each piece. If a select statement were executed with a where clause that met the partition boundary, a full table scan was not necessary. Splitting the data into different tablespaces allows us not only to read only the relevant data into memory but also to split our database into storage tiers. We can keep the most used data on high speed disk and historic data on slower, lower cost storage. Not only can we use lower cost storage but we can compress the data that is not changing so it takes up less space. We keep our frequently used data on a high speed disk (or in memory if we are lucky) and our older data in lower cost storage. This is only available with partitioning and is the reason why many customers purchase this as an option. The return on the software investment significantly reduces the cost of our database storage. We can use flash drives for our current quarter/month data, 10K rpm drives for last quarter/month data, 7.5K rpm drives for the rest of the year's data, and nfs mounts for data greater than a year old. The cost savings on storage more than pays for the cost of partitioning. Unfortunately, this does not correlate into cloud services since you really don't get tiered storage behind a database when you consume DBaaS or PaaS. We need to focus on improving performance by fitting partitions and subpartitions into the available memory to speed up select statements.


Before we go down the rabbit hole and dive deep into partitioning, let's review how a select statement works and how data is stored. Say for example we have a database that contains addresses for customers. The table contains an id number, a first name, last name, address, phone number, city, state, zip code, credit card number, credit card expiration, and email address. We have a second table for our on-line catalog that contains part numbers, a title, a description, and a file link for photos. We have a third table for our orders and it contains a customer id number, a part number, an order quantity, and order date. We would create our tables with the following commands

create table customers (
  customer_id number(8),
  first_name varchar2(32),
  last_name varchar2(32),
  address varchar2(64),
  phone_number varchar2(10),
  city varchar2(32),
  state varchar2(16),
  zip_code varchar2(16),
  credit_card_number varchar2(16),
  credit_card_expiration varchar2(8),
  email_address varchar2(64)
);
create table catalog (
  part_number number(8),
  title varchar2(32),
  description varchar2(128),
  part_image blob
);
create table order_entry (
  order_number number(8),
  customer_id number(8),
  part_number number(8),
  part_quantity number(8),
  order_date date
);

If we have ten million items in our catalog we potentially consume 128 + 32 + 8 + 16 bytes times 10,000,000. This makes our table roughly 2 TB in size. If we have two million orders we have about 0.5 TB for the order_entry table. When we create a database we have the option of defining not only the storage type that we want our table to reside in but also how and where to store the data associated with the table. By default all tables that we create as a user are stored in the SYSTEM tablespace. All three of these tables will be stored in the DATA area under the SYSTEM tablespace since we did not specify a storage area or tablespace to hold the tables. For the database that we created in previous blog entries using Oracle DBaaS, these files are stored in /u02. We can dive down into /u02/app/oracle/oradata/ORCL/PDB1 and see that there is a system01.dbf file. This correlates to the SYSTEM tablespace in the PDB1 pluggable database. As tables are added, they are added to the system01.dbf file. If we are in the container database ORCL the files are stored in the /u02/app/oracle/oradata/ORCL/system01.dbf file.

To help with database performance, indexes are created on tables so that a reference to a table knows where in the system01.dbf file the customers and catalog tables are located. We can also create an index on a table column. This index is also stored in the system01.dbf file so that we can look up common queries as they are executed. For example, if we are looking for all orders that happened in February we can select this data quicker with an index by presorting all of the data related to order_date. The index allows us to directly access the table entries in the system01.dbf file by creating an index link to the entries. This index is also stored in the system01.dbf file and re-created when we enter new data into the order_entry table. Hopefully our indexes are small enough to stay resident in memory and we don't need to go to storage to reload and reindex our indexes. Partitioning helps keep indexes smaller as well, and unused indexes can be aged out to disk to free up memory. If we never look at data that is two years old, we don't need to keep an index on our two year old data in memory but can pull it in from disk when needed.

To reduce the access time and select time we can pre-sort the data in a different way. We can partition the data and store the table information in different files. Rather than storing everything in system01.dbf, we can store February order data in february.dbf. When an insert into the table is done, the row is written into the january.dbf, february.dbf, or march.dbf file rather than the system01.dbf file. When we transition into April an april.dbf file is created and the january.dbf data is moved into a q1_2016.dbf file. The key advantage to this is that when we perform a select statement and look for data in March and April, we only look in the march.dbf and april.dbf files. The rest of the data is not loaded because we know that the data is not in the other table extents. This reduces the amount of data that is loaded into memory and reduces the number of disk operations that are performed for every select statement. If everything was stored in the system01.dbf file, we would need to load all two million orders just to find the one or two hundred that happened in April. We basically read, then throw away, 97% of the data because it does not match our request. True, the index would help but this requires multiple writes to the disk when an insert happens. With partitioning enabled for the order_date column, all order entries are stored pre-sorted by date in the different table extents. Since 11g, interval partitioning automatically creates new partitions as they are needed (see the sketch below). As we cross from February to March, the march.dbf is created and all writes corresponding to March orders are written to the new partition.
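A minimal sketch of interval partitioning on the order_entry table defined above; the initial partition boundary is an assumption for illustration, and the database then creates a new monthly partition automatically the first time a row for that month is inserted:

create table order_entry (
  order_number number(8),
  customer_id number(8),
  part_number number(8),
  part_quantity number(8),
  order_date date
)
partition by range (order_date)
interval (numtoyminterval(1, 'MONTH'))
( partition p_history values less than (to_date('01-JAN-2016', 'DD-MON-YYYY')) );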

There are a variety of partition types that you can use to divide data

  • Range partitioning - typically based on date columns, months, quarters, a range of numeric or character values. You can specify a value less than or value greater than when defining the partition. The value that you compare can be relative or specific to a current date or number.
  • List partitioning - this describes a discrete value and assigns them to their own tablespace. We might split our catalog into plumbing products, lawn and garden products, or appliances. This helps searches into our catalog for a specific item. Note when you do a search at HomeDepot or Amazon you have the option of search in department. They are using list partitions on the back end.
  • Hash partitioning - this is good if you don't have a specific range that will split the data. If you want to sort by address for example, it is difficult to list all of the addresses or sort them into a range. Hash partitioning allows you to split your data into however many partitions you define (16, for example) and the database will make a best effort to spread the data evenly across that number of partitions (see the sketch after this list).
  • Composite partitioning - this is a combination of two of the types described above. Composite partitioning is accomplished with the subpartition command where we first sort by one method then sub-sort by another. We could use a list-list or a list-range. We can use two of any of the above to help manage a large tablespace into smaller chunks.
  • Reference partitioning - this allows you to partition data based on referential constraints. If, for example, you want to create a constraint in a table creation and sort on that constraint, you can do this with partition by reference. If we create a table and add a constraint that an order_id must be tied to a customer_id in our customers table we can partition by this constraint which effectively splits the orders table into orders by customer_id which is not defined in this table.
  • Virtual column-based partitioning - virtual column partitioning allows us to split a table based on part of a column value. If, for example, we index our parts in our catalog by sequence numbers with 1-3 representing plumbing and 4-6 representing appliances, we can partition based on the first number in our part_id and effectively split the catalog based on departments without having to define the department as a column in our table. We just need to make sure that all part numbers that are inserted into our catalog follow our numbering convention and that we don't put a gas range into the catalog with a part number starting with 2.
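As referenced in the hash partitioning bullet above, a minimal sketch splitting customers across sixteen hash partitions (the column list is shortened for illustration):

create table customers_by_hash (
  customer_id number(8),
  last_name varchar2(32),
  state varchar2(16)
)
partition by hash (customer_id)
partitions 16;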

If we change the customers table described and append a partition by range statement with the command

create table customers ( .... ) partition by range (state);
we divide the table into potentially fifty different partitions. As a new customer is added, they are added to the partition for their state. Inserts happen quicker, selects happen quicker, and backups happen quicker unless all of our customers are located in one state.

If we group our customers into regions and want to store data not in fifty states but in three regions we could do this with a list partition. Note that we can define the partition name and target tablespace when we define the partition.

create table customers (....) partition by list (state)
(partition part1 values ('Texas', 'Louisiana', 'Oklahoma', 'Arkansas') tablespace tola_ts ,
 partition part2 values ('California', 'Oregon', 'Washington', 'Hawaii') tablespace pac_ts,
 partition category_other values (default));
In this example we use the tola_ts and pac_ts tablespaces plus a default partition and store customers into those areas. We split eight named states into two buckets and everything else into a default bucket. This makes reporting simpler and optimizes for select statements looking for customers in or around Texas or along the Pacific Ocean. Note that we could also subpartition this data to separate the big cities from rural areas
create table customers (.....) partition by list (state)
subpartition by list (city)
(partition part1 values ('Texas')
  (subpartition texas_cities values ('Houston', 'Dallas', 'San Antonio', 'Austin', 'Fort Worth', 'El Paso') tablespace big_texas_ts,
   subpartition texas_other values (default) tablespace small_texas_ts),
 partition part2 values ('California', 'Oregon', 'Washington', 'Hawaii') tablespace pac_ts,
 partition category_other values (default) tablespace other_ts);
This example spreads the data across four tablespaces (which must already exist): one for Texas big cities, one for Texas small cities, one for the Pacific rim states, and one for all other states.

Database 12c added a few new commands to help manage and maintain partitions. We can now alter partitions and add, truncate, drop, split, and merge. The add and merge are very valuable functions that allow us to update ranges. If, for example, we paid a consultant two years ago to define a partition by range and they went out four years with the following

create table sales (.....) partition by range (salesdate)
(
 partition part_2015 values less than (TO_DATE('01-Jan-2016', 'DD-MON-YYYY')),
 partition part_2016 values less than (TO_DATE('01-Jan-2017', 'DD-MON-YYYY'))
)
ENABLE ROW MOVEMENT;
If we want to start saving data by quarter rather than by year, we could execute the following
alter table sales add
 partition p_q1_2016 values less than (TO_DATE('01-Apr-2016', 'DD-MON-YYYY')),
 partition p_q2_2016 values less than (TO_DATE('01-Jul-2016', 'DD-MON-YYYY'));
This would slide in quarterly reporting and allow us to handle a larger volume than was created before. If at the end of the year we want to aggregate everything back into a year rather than a quarter basis we can do this with a merge command
alter table sales
merge partitions p_q1_2016, p_q2_2016, p_q3_2016, p_q4_2016
into partition part_2016;

Fortunately, Enterprise Manager has a partition advisor that looks at the history of your select statements and suggests how you should divide your tables into partitions. It notices that you do a lot of select by state or select by zip code and recommends partitioning by list or by hash based on your usage patterns. This was a new feature added with Enterprise Manager 11 and has gotten more robust and reliable with 13c. We should see a significant speed up if we get the right combination of partitions and indexes and could potentially take a select statement from 45 seconds to sub seconds as shown in the Enterprise Manager screen shots below.

In summary, partitioning is very powerful. It helps you split up your larger tables so that they fit into the memory that you have allocated. The return on investment is difficult to calculate because the cost of partitioning vs the cost of memory and the resulting speed up for queries is hard to measure. Enterprise Manager has tools to help you with this analysis but it is difficult to put into future dollars and what-if analysis. It would be nice if you could say that splitting your table into partitions would reduce your buffer cache and allow you to shrink your SGA size by 25%. The tools are not quite there. They do tell you that you can reduce your select times by partitioning the data and predict relatively accurately how much faster a select statement will be with partitioning based on your current hardware configuration. All of these functions should work on Amazon RDS with the exception of manipulating a tablespace. This requires a different command syntax since manipulation of a tablespace requires system access. Typically the command would be alter database default tablespace users2 but with Amazon RDS you have to execute exec rdsadmin.rdsadmin_util.alter_default_tablespace('users2') instead. Given that this is not done very often, it is up to you to decide how and where you deploy your large table database.

preparing a desktop for PaaS

Fri, 2016-06-03 02:07
Before we can start looking at the different options of a database, we need to get a desktop ready to do database development. It sounds a little strange that we need to download software to get access to a cloud database. We could do everything from the command line but it is much simpler and easier if we can do this using desktop tools. The software packages that we are going to download and install are
  • To access cloud storage
    • Mozilla Firefox
    • RestClient extension for Firefox
    • Google Chrome
    • Postman extension for Chrome
    • CloudBerry for OpenStack
  • To access files in our instance
    • Putty
    • Filezilla
    • Cygwin
  • To access our database instance
    • SQL Developer
    • Microsoft Visual C++ libraries

The eventual target that we are looking to get to is a Windows server in the cloud that we can connect to with remote desktop and that has all of these tools installed.

To do this we need to go to the Oracle Cloud Marketplace and look for the Windows 2012 Server instance. What we want to do is provision a Windows instance and use it as a remote desktop for connecting to the cloud. We could do this on our desktop but the benefit of using a Windows Server is that we can create more users and use this instance for a hands on workshop. We don't need to have anyone load any virtual machines, fight the differences between Mac and Windows, or wait for the binaries to download and install. We can do most of this on a virtual machine in the cloud and just add and delete users for workshops. To provision the Windows server, we go to the cloud marketplace, select Infrastructure, and Compute. We can then search for Windows and get a bootable image to use as our foundation.

Once we agree to the legal terms we can select an instance to provision this into. The way it works is that we copy a bootable image into a cloud instance. We can then create compute instances from this bootable image and customize it to our liking. Once we agree to the terms the marketplace connects to the Oracle Cloud and uses your authentication credentials to connect to the instance. From this it gets a list of instances associated with this account and checks to see if you have agreed to the terms of marketplace use for this instance by looking at your profile settings for the instance. Once the bootable image is ready, a splash screen is presented stating that you are ready to provision a compute instance.

The screen dumps you into a compute creation wizard that walks you through the compute provisioning. Rather than going through that interface we decided to start from scratch and log into the instance and provision a compute engine from scratch. We first select the boot image from our private images, select the shape to boot, define the instance name, and configure ssh connectivity as well as set the Administrator password (not shown). Once we get the confirmation screen it takes a few minutes to create the boot disk then boot the compute instance on this newly formatted disk.

We can check the progress by looking at the storage and compute instance. When everything is done we should see a public ip address for our instance. If we don't see our instance it is either still building or we should see an error in the history. Unfortunately, the history is hidden and a failed provisioning is not shown unless you look for it by expanding the history.

Before we can connect to our instance with remote desktop, we need to define a security list to allow for rdp, associate this rule with our instance, and define the security rule for rdp and associate it with the security list and instance.

Once we have rdp enabled to our instance, we look up the public ip address and connect as the Administrator user with the password that we passed in with a json header at the bottom of the creation screen (not shown). When we log in we see the server tools splash screen pop up.

We want to create a secondary user, give this user admin rights as well as rights to remote desktop connect to the server. We might want to add more users not as admins but with remote desktop rights for hands on labs. We can add and delete users using this method and it refreshes the workshop for the next class.

At this point we can create a staging directory and install the software that we listed above. The only product that causes a problem with the install is SQL Developer because it requires a Microsoft package that is not installed by default. We need to download that library, and then all of the packages that we downloaded are ready to install. I did not go through customization of the desktop or downloading the public and private keys used for the workshop. These are obvious steps using filezilla from a shared network storage on a server in the cloud. We downloaded Firefox and Chrome primarily because Internet Explorer does not support the REST API client extensions and we will need a way to create and list storage containers. We could have skipped this installation and done everything through CloudBerry but we can do everything similarly on a Mac (no need for putty or cygwin). With Firefox you need to install the REST Client extension and Chrome requires the Postman extension.

In summary, we created a compute Windows 2012 Server instance in the Oracle Compute IaaS. We added a new user as a backup to our Administrator user. We enabled remote desktop and configured a Mac to connect to this service remotely. We then downloaded a set of binaries to our Windows desktop to allow us to manage and manipulate storage containers and database instances. We also downloaded some utilities to help us use command line tools to access our database and customize our instances. We technically could do all of this with a Windows desktop, Internet Explorer, and SQL Developer. We went to the extra steps so that we can do the same from a Mac or Windows desktop using the same tools.

database options

Thu, 2016-06-02 12:16
Before we dive into features and functions of database as a service, we need to look at the options that you have with the Oracle Database. We have discussed the differences between Standard Edition and Enterprise Edition but we really have not talked about the database options. When we select a database in the Oracle Cloud we are given the choice of Enterprise Edition, High Performance Edition, and Extreme Performance Edition. Today we are going to dive into the different editions and talk about the options that you get with each one. It is important to note that all of the options are extra cost options that are licensed on a per processor or per user basis. If you go with Amazon RDS, EC2, or Azure Compute you need to purchase these options to match your processor deployment.

One of the standard slides that I use to explain the differences in the editions is shown below.

The options are cumulative when you look at them. The Enterprise Edition, for example, comes with Transparent Data Encryption (TDE). TDE is also included in the High Performance and Extreme Performance Editions. We are going to pull the pricing for all of these options from the Technology Price List. Below is a list of the options.

  • Enterprise Edition
    • Transparent Data Encryption
  • High Performance Edition
    • Diagnostics
    • Tuning
    • Partitioning
    • Advanced Compression
    • Advanced Security
    • Data Guard
    • Label Security
    • Multitenant
    • Audit Vault
    • Database Vault
    • Real Application Testing
    • OLAP
    • Spatial and Graphics
  • Extreme Performance Edition
    • Active Data Guard
    • In Memory
    • Real Application Clusters (RAC)
    • RAC One

Transparent Data Encryption

TDE is a subset of the Advanced Security option. TDE stops would-be attackers from bypassing the database and reading sensitive information from storage by enforcing data-at-rest encryption in the database layer. Data is stored in the table extents encrypted and read into the database encrypted. The Oracle Wallet is needed to read the data back and perform operations on the data. Advanced Security and Security Inside Out are blogs that dive deeper into TDE features, functions, and tutorials. There is also a Community Security Discussion Forum. The Advanced Security option is priced at $300 per named user or $15,000 per processor. If we assume a four year amortization the cost of this option is $587.50 per month per processor. The database license is $1,860 per month per processor. This says that a dual core system on Amazon EC2, RDS, or Azure Compute running the Oracle database will cost you the cost of the server plus $2,448 per month. If we go with a t2.large on Amazon EC2 (2 vCPUs and 8 GB of RAM) and 128 GB of disk our charge is $128 per month. If we bump this up to an r3.large (2 vCPUs, 15 GB of RAM) the price goes up to $173 per month. The cost will be $2,620 per month which compares to Enterprise Edition at $3,000 per month per processor for PaaS/DBaaS. We could also run this in Oracle IaaS Compute at $150 per month (2 vCPUs, 30 GB of RAM) to compare apples to apples. It is strongly recommended that any data that you put in the cloud be encrypted. Security is good in the cloud but encryption of data in storage is much better. When you replicate data or back up data it is copied in the format that it is stored in. If your data is clear text, your backups could be clear text thus exposing you to potential loss of data. Encrypting the data at rest in storage is a baseline for running a database in the cloud.
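Assuming the keystore (wallet) has already been created and opened, new data can be encrypted at rest simply by placing it in an encrypted tablespace. A sketch, where the tablespace name and key length are illustrative and Oracle Managed Files is assumed for the datafile location:

CREATE TABLESPACE secure_data
  DATAFILE SIZE 1G
  ENCRYPTION USING 'AES256'
  DEFAULT STORAGE (ENCRYPT);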

Diagnostics

Diagnostics is a subset of the Database Management Packs that allows you to look into the database and figure out things like lock contention, what is holding up a wait queue, and what resources are being consumed by processes inside the database. Historic views into the automated workload repository (AWR) reports are available with this option. Without it you can get spot reports but not historical views and comparative analytics on AWR information. Some of the tools are free like the compression advisor and partitioning advisor while others are part of the diagnostics pack. Diagnostics is licensed at $150 per named user or $7,500 per processor. This correlates to $294 per processor per month. Unfortunately, you can't purchase Enterprise Edition DBaaS and add this but need to go with IaaS Compute and add this to a bring your own database license. The only way to get this feature in DBaaS is to go with the High Performance Edition. The binary that is installed on the cloud service specifically labels the database as Enterprise Edition, High Performance Edition, or Extreme Performance Edition. All of the features listed from here on down are prohibited from running on the Enterprise Edition when provisioned into the Oracle DBaaS. If you just want the Diagnostics Pack on Enterprise Edition it does not make economic sense to purchase High Performance Edition at $4,000 per month per processor when you can do this on IaaS at $2,914 (the $2,620 from above plus $294).

Tuning

Tuning is also a subset of the Database Management Packs that allows you to look into sql queries, table layouts, and overall performance issues. Options like the SQL Tuning Advisor and Automatic SQL Tuning are part of this option. Tuning pack is $100 per named user or $5,000 per processor. This comes in at $196 per processor per month if purchased separately. A Tuning Whitepaper details some of the features and functions of the tuning pack if you want to learn more.

Partitioning

Partitioning is a way of improving performance of your database and backup by splitting how data is stored and read. Partitioning is powerful functionality that allows tables, indexes, and index-organized tables to be subdivided into smaller pieces, enabling these database objects to be managed and accessed at a finer level of granularity. Oracle provides a comprehensive range of partitioning schemes to address every business requirement. The key improvement is to reduce the amount of data that you are reading into memory on a query. For example, if you are looking for financial summary data for the last quarter, issuing a query into eight years of financial data should not need to read in 32 quarters of data but only data from the last quarter. If we partition the data on a monthly basis we only read in three monthly partitions rather than the whole eight years. Partitioning also allows us to compress older data to consume less storage while at rest. When we back up the database we don't need to copy the older partitions that don't change, only the partitions that have updated since our last backup. Partitioning is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. The three most purchased database options are diagnostics, tuning, and partitioning. The combined cost of these three options is $940 per processor per month. When we compare the $4,000 per processor per month of DBaaS to IaaS with these three options we are at parity.

Advanced Compression

Advanced Compression is a feature that allows you to compress data at rest (and in memory) so that it consumes fewer resources. Oracle Advanced Compression provides a comprehensive set of compression capabilities to help improve performance and reduce storage costs. It allows organizations to reduce their overall database storage footprint by enabling compression for all types of data: relational (table), unstructured (file), network, Data Guard redo and backup data. Cost comparisons for this feature are directly comparable to storage costs. Advanced Compression is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. Typical compression ratios are 3x to 10x. This means that 1 TB of data will take up roughly 333 GB or 100 GB at these compression ratios. Lower compression rates are recommended for data that changes lightly and high compression for data that will not change. The penalty for compression comes in when you update data that is compressed. The data must be uncompressed, the new data inserted, and the block recompressed.

Advanced Security

Advanced Security allows you to secure and encrypt data in the database. Advanced Security provides two important preventive controls to protect sensitive data at the source: transparent database encryption and on-the-fly redaction of display data. TDE stops would-be attackers from bypassing the database and reading sensitive information directly from storage by enforcing data-at-rest encryption in the database layer. Data Redaction complements TDE by reducing the risk of unauthorized data exposure in applications, redacting sensitive data before it leaves the database. Advanced Security is priced at $300 per named user or $15,000 per processor. The monthly cost will be $587.50 per month per processor for this option. Data redaction is typically required when replicating production data to development and test. If you have credit card numbers, social security numbers, home addresses, or driver's license information in your database, redaction is important to have to remain Sarbanes-Oxley and PCI compliant.

Data Guard

Data Guard is a key foundation piece of Maximum Availability Architecture and does not cost any additional money. You get data replication between two databases at no additional cost and data can be replicated as physical or logical replication between the database instances. This feature ensures high availability, data protection, and disaster recovery for enterprise data. Data Guard provides a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases to enable production Oracle databases to survive disasters and data corruptions. Data Guard maintains these standby databases as transactionally consistent copies of the production database. Then, if the production database becomes unavailable because of a planned or an unplanned outage, Data Guard can switch any standby database to the production role, minimizing the downtime associated with the outage. Data Guard can be used with traditional backup, restoration, and cluster techniques to provide a high level of data protection and data availability. It is important to note that Data Guard is not allowed in Amazon RDS and you must use EC2 or another cloud service to use this feature.

Label Security

Label Security has the ability to control access based on data classification and to enforce traditional multi-level security (MLS) policies for government and defense applications. Oracle Label Security benefits commercial organizations attempting to address numerous access control challenges including those associated with database and application consolidation, privacy laws and regulatory compliance requirements. When a user requests data, the database looks at the user credentials and roles that they have access to and filters the results that the user sees from a query. Label Security is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. Note that this is different than data redaction. With redaction, data is scrambled when data is copied. With Label Security, the data is not returned if the user does not have rights to read the data. An error is not returned from a query but a null value is returned if the user does not have rights to read a column. The biggest benefit to this option is that it does not require program changes to restrict access to data and present results to users. If, for example, we are going to show sales in a customer relationship program, we don't need to change the code based on the user being a sales rep or sales manager. The sales manager can see all of the sales rep information to track how their team is performing. Each sales rep can see their data but not the other sales reps' data. It is important to note that Label Security is not allowed in Amazon RDS and you must use EC2 or another cloud service to use this feature.

Multitenant

Multitenant or Pluggable Database allows you to consolidate instances onto one server and reduce your overall management cost. The many pluggable databases in a single multitenant container database share its memory and background processes. This enables consolidation of many more pluggable databases compared to the old architecture, offering similar benefits to schema-based consolidation but with none of the major application changes required by that approach. Backups are done at the parent layer. Users are provisioned at the pluggable layer. Features of the instance (RAC, Data Guard, etc.) are inherent to the parent and adopted by the pluggable container. To take a test system from single instance to Data Guard replicated only requires unplugging the database from the single instance system and plugging it into a Data Guard system. The same is true for RAC and all other features. Multitenant is licensed at $350 per user or $17,500 per processor. This comes in at $685 per processor per month. It is important to note that this option is not available on Amazon RDS. This option is specifically disabled and not allowed. You must run this on EC2 to use this functionality or on another cloud platform.

Audit Vault

Audit Vault and Database Firewall monitors Oracle and non-Oracle database traffic to detect and block threats, as well as improves compliance reporting by consolidating audit data from databases, operating systems, directories, and other sources. Audit Vault is licensed at $6,000 per processor and is not available on a per user basis. This comes in at $235 per processor per month. This option typically requires a separate server for security reasons, where logs and logging information are copied so that audit data cannot be manipulated on the same system that is being audited.

Database Vault

Database Vault reduces the risk of insider and outsider threats and addresses common compliance requirements by preventing privileged users (DBAs) from accessing sensitive application data, preventing compromised privileged user accounts from being used to steal sensitive data or make unauthorized changes to databases and applications, providing strong controls inside the database over who can do what as well as when and how applications, data and databases can be accessed, and providing privilege analysis for all users and applications inside the database to help achieve a least privilege model and make the databases and applications more secure. Database Vault is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. It is important to note that this option is not available on Amazon RDS. This option is specifically disabled and not allowed. You must run this on EC2 to use this functionality or on another cloud platform.

Real Application Testing

Real Application Testing helps you fully assess the effect of system changes on real-world applications in test environments before deploying the change in production. Oracle Real Application Testing consists of two features, Database Replay and SQL Performance Analyzer. Together they enable enterprises to rapidly adopt new technologies that add value to the business while minimizing risk. Traces can be recorded for reads and writes and replayed on a test system, which makes the replay option perfect for development and testing instances. The product is licensed at $230 per named user or $11,500 per processor. This comes in at $450 per processor per month. It is important to note that not having sys level access might or might not break this feature in Amazon RDS based on what you are trying to replay.
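
As a rough sketch, capturing a workload for Database Replay is a pair of calls to the DBMS_WORKLOAD_CAPTURE package (the capture name and directory object below are assumptions for illustration):

BEGIN
  DBMS_WORKLOAD_CAPTURE.START_CAPTURE(name => 'peak_load', dir => 'CAPTURE_DIR');
END;
/
-- run the production workload for the period of interest, then stop the capture
BEGIN
  DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE;
END;
/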

OLAP

Online Analytical Processing or OLAP is a multidimensional analytic engine embedded in Oracle Database 12c. Oracle OLAP cubes deliver sophisticated calculations using simple SQL queries - producing results with speed of thought response times. This outstanding query performance may be leveraged transparently when deploying OLAP cubes as materialized views, enhancing the performance of summary queries against detail relational tables. Because Oracle OLAP is embedded in Oracle Database 12c, it allows centralized management of data and business rules in a secure, scalable and enterprise-ready platform. OLAP is licensed at $460 per user or $23,000 per processor. This comes in at $901 per processor per month. This feature is good for BI analytics packages and data warehouse systems.

Spatial and Graph

Spatial and Graph supports a full range of geospatial data and analytics for land management and GIS, mobile location services, sales territory management, transportation, LiDAR analysis and location-enabled Business Intelligence. The graph features include RDF graphs for applications ranging from semantic data integration to social network analysis to linked open data as well as network graphs used in transportation, utilities, energy and telcos and drive-time analysis for sales and marketing applications. This option is licensed at $350 per user or $17,500 per processor. This comes in at $685 per processor per month. It is important to note that this option is not supported in Amazon RDS. You must select EC2 or another cloud service to get this option.

All of the above options are bundled into the High Performance Edition. If we add up all of the options we get a total of

  • Transparent Data Encryption - $587.50 per month
  • Diagnostics - $294 per month
  • Tuning - $196 per month
  • Partitioning - $450 per month
  • Advanced Compression - $450 per month
  • Advanced Security - $587.50 per month
  • Data Guard - bundled
  • Label Security - $450 per month
  • Multitenant - $685 per month
  • Audit Vault - $235 per month
  • Database Vault - $450 per month
  • Real Application Testing - $450 per month
  • OLAP - $901 per month
  • Spatial and Graph - $685 per month
This roughly bubbles up to $5,833.50 per processor per month for the High Performance options (Transparent Data Encryption is part of the Advanced Security option, so it is not counted a second time in the total). Oracle bundles all of this for an additional $1,000 per processor per month. The Extreme Performance Edition options include Active Data Guard, In Memory, and RAC.

Active Data Guard

Active Data Guard has the same features and functions as Data Guard but also allows the standby database to be open for read-only queries and reporting while redo from the primary continues to be applied. Active Data Guard is licensed at $230 per user or $11,500 per processor. This comes in at $450 per processor per month.
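
As a sketch, enabling real-time query on an existing physical standby is just a matter of opening it read only and restarting redo apply (standard Data Guard SQL, shown here without the broker):

-- on the physical standby
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;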

In Memory

In Memory optimizes both analytics and mixed workload OLTP, delivering outstanding performance for transactions while simultaneously supporting real-time analytics, business intelligence, and reports. Most DBAs optimize performance by creating indexes to find data quicker. This works if you know the questions ahead of time. If you don't know the questions it is difficult to tune for everything. In Memory keeps the existing row based format of the data while adding a column based in-memory copy for quick column sorts and searches. In Memory is licensed at $460 per user or $23,000 per processor. This comes in at $901 per month per processor. The key advantage of this option is that it eliminates the need to purchase a second database for analytics and reporting; both workloads can run on the same box as your transactional system.
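
A minimal sketch of turning the column store on (the table names are illustrative, and changing inmemory_size requires an instance restart):

ALTER SYSTEM SET inmemory_size = 4G SCOPE=SPFILE;
-- restart the instance, then mark the hot tables for population into the column store
ALTER TABLE sales INMEMORY PRIORITY HIGH;
ALTER TABLE order_items INMEMORY;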

Real Application Clusters (RAC)

RAC is a cluster database with a shared cache architecture that overcomes the limitations of traditional shared-nothing and shared-disk approaches to provide highly scalable and available database solutions for all your business applications. Oracle RAC is a key component of Oracle's private cloud architecture and a critical part of the MAA strategy, and RAC support is included in Oracle Database Standard Edition for higher levels of system uptime. RAC is licensed at $460 per user or $23,000 per processor. This comes in at $901 per month per processor. It is important to note that RAC is not supported in Amazon or Azure. The system requires shared storage between compute instances that neither platform provides. The only option for this configuration is Oracle DBaaS/PaaS.

The options for Extreme Performance come in at $2,252 per processor per month but Oracle only charges an extra $1,000 for the Extreme Performance Edition.

In summary, there are a ton of options for the database. You need to figure out which options you need, and if you need more than a couple it is economically beneficial to go with High Performance. If you need RAC, Active Data Guard, or In Memory you must purchase the Extreme Performance Edition. It is also important to note that not all features are supported in Amazon RDS, so you must either go with Oracle Database as a Service or build a system using IaaS. RAC is the exception; it is only available with Oracle DBaaS. We will go into a couple of these features in upcoming days to look at the value, how to use them, and what is required to make the functionality work with other cloud providers.

Oracle Database 12c SQL by Jason Price

Wed, 2016-06-01 02:07
Given that we have a database in Amazon RDS and Oracle PaaS we can go through some books from Oracle Press and see if anything breaks running through a book. Let's start with something simple, Oracle Database 12c SQL by Jason Price, published by Oracle Press. This is an introductory book that goes through the basic data types, sql commands, and an introduction to XML at the end of the book. The material should be relatively straightforward and not have any issues or problems executing the sample code. The sample code can be downloaded from Oracle Press Books by searching for the book title and downloading the Chapter 1 sample code. This will give us a way to load a table with data and execute code against the table. We will use SQL Developer to execute the code from d:\workshops\sql books\SQL and see what works and what does not work.

To get started, we need to use our Amazon RDS instance and SQL Developer that we installed yesterday. We connect with the user oracle to port 1521 after opening up the port to anyone. From this connection we can execute sql code in the main part of the SQL Developer window and load the sample code to execute. We can test the connection with the following command

select sysdate from dual;

We can follow along with the book, create the store user, load the schema into the database, and look at the examples throughout the book.
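
If you want to follow along, the setup boils down to something like the following (the user name, password, and script name here are illustrative; use whatever the book's sample download specifies):

CREATE USER store IDENTIFIED BY store_password;
GRANT CONNECT, RESOURCE TO store;
ALTER USER store QUOTA UNLIMITED ON users;
-- then connect as the new user and run the schema script from the sample code download
-- @store_schema.sql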

Everything worked on Amazon RDS. We were able to create users, grant them audit functionality, execute XML code, and generally do everything listed in the book. The audit did not report back as expected but this could have been a user error. According to the Amazon RDS documentation auditing should work. We might not have had something set properly to report back the right information.

In summary, Amazon RDS is a good platform to learn how to program 12c SQL and the various user level commands. If you go through a book like Oracle Database 12c SQL everything should work. This or the Oracle PaaS equivalent makes an excellent sandbox that you can use for a day or two and then turn off, minimizing your cost of experimenting.

resizing Amazon RDS Oracle EE

Tue, 2016-05-31 09:38
Yesterday we looked at connecting to DBaaS with the Oracle platform as a service option. We wanted to extend the database table size because we expect to grow the tablespace beyond current storage capacity. We are going to do the same thing today for Amazon RDS running an Oracle Enterprise Edition 12c instance. To start our journey, we need to create a database instance in Amazon. This is done by going to the Amazon AWS Console and clicking on RDS to launch a database instance. We then click on the Launch DB Instance button and click on the Oracle tab. We select Enterprise Edition and then the Dev/Test option for our example. We accept the defaults, define our database with ORACLE_SID=pri, a username of oracle, no multi-AZ replication, and select the processor and memory size.

While the database is creating we need to change the default network configuration. By default port 1521 is open, but only to a restricted ip address range. We need to open this up to everyone so that we can connect to the database from our desktop instance. We are using a Windows 2012 instance in the Oracle IaaS cloud so the default mapping back to the desktop we used to create the database does not work. Note that since we do not have permission to connect to the RDS instance via ssh, connecting with a tunnel is not an option for linking these two instances. The only security options that we have are ip white listing, a vpn, or opening up port 1521 to the world. This is done by going into the security groups definition on the detail page of the instance. We change the default inbound rule from an ip address to anywhere.

We could have alternatively defined a security group that links the ip address of our IaaS instance as well as our desktop prior to creation of this database to keep security a little tighter. Once the database is finished creating we can connect to it. We get the connection string (DNS address) and open up SQL Developer. We create a new database connection using the sid of pri, username of oracle, and port 1521. Once we connect we can define the DBA view to allow us to manage parts of the database since we do not have access using Enterprise Manager.

It looks like the tablespace will autoextend into the available space, so all we should have to do is extend the /rdsdata partition. This is done by modifying the RDS instance from the console. We change the storage from the 20 GB that we created to 40 GB, turn on advanced monitoring (not necessary for this exercise), and check the apply immediately button. This reconfigures the database and extends the storage. Note the resize command that happens for us. This is a sysdba level command that is executed on our behalf since we do not have sys rights outside the console.
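
One way to confirm that the data files really will autoextend is to query the data dictionary from SQL Developer; a quick check against the standard dictionary view looks like this:

select tablespace_name, file_name, autoextensible, maxbytes/1024/1024 as max_mb
  from dba_data_files;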

We can look at the new instance and see that the size has grown and we have more space to expand into.

We can see the changes from the monitoring console

In summary, we are able to easily scale storage in an Amazon RDS instance even though we do not have sys access to the system. We do need to use the AWS Console to make this happen and can not do this through Enterprise Manager because we can't add the agent to the instance. It is important to note that some options are available from the console and some are available through altered command line calls that give you elevated admin privileges without giving you system access. Look at the new command structures and decide if forking your admin tools just to run on RDS is something worth doing or too much effort. These changes effectively lock you into running your Oracle database on Amazon RDS. For example, to change the default tablespace in Oracle you would typically type

alter database default tablespace users2;

but with Amazon RDS you would need to type

exec rdsadmin.rdsadmin_util.alter_default_tablespace('users2');
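
Other sys-level operations get similar rdsadmin wrappers; for example, switching the online redo log file is done with a call like the one below (check the Amazon RDS for Oracle documentation for the full list of procedures available in your version):

exec rdsadmin.rdsadmin_util.switch_logfile;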

Is this a show stopper? It could be or it might be trivial. It could stop some pre-packaged applications from working in Amazon RDS, forcing you to go with EC2 and EBS instead. The upgrade storage example that we just went through would then be a manual process with the involvement of an operating system administrator, thus incurring additional cost and time. Again, this blog is not about A is better than B. This blog is about showing things that are hidden and helping you decide which is better for you. Our recommendation is to play with Amazon RDS and Oracle PaaS and see which fits your needs best.

resizing database as a service with Oracle

Mon, 2016-05-30 02:07
Historically, what happens to a database months after deployment has always been an issue and a problem. Suppose we go out and purchase a computer and disk storage and then deploy a database onto the server. If we oversize the hardware and storage, we wasted budget. If we undersize the hardware and storage, we had to purchase a new computer or new storage, get an operating system expert to reconfigure everything on the new server, and get a database administrator to reconfigure the database installation to run on the new server or new storage. For example, if we purchased a 1 TB disk drive and allocated it all to /u02, the database had a ton of space to grow into. We put the DATA area there and put the RECO area into /u03. Our database service suddenly grows wildly: we have a record number of transactions, we increase the offerings in our product catalog, and our tablespace grows to over 800 GB. Disk performance starts to suffer and we want to grow our 1 TB to 2 TB. To do this we have to shut down our database, shut down the operating system, attach the new disk, format and mount it as /u05, copy the data from /u02 to /u05, remount /u05 as /u02, and reboot the system. We could have backed up the database from /u02 and reformatted /u02 and /u05 as a logical volume to allow us to dynamically grow the disk, letting us purchase a 1 TB disk for /u05 rather than a 2 TB disk and reduce our cost. We successfully grew our tablespace by purchasing more hardware and involving an operating system admin and our database administrator. We were only down for a day or half a day while we copied all of our data and modified the disk layout.

Disk vendors attacked this problem early by offering network or fiber attached storage rather than direct attached storage. They allow you to add disks dynamically, keeping you from having to take the server down to add storage. You can attach your disk as a logical unit number and add spindles as desired. This now requires you to get a storage admin involved to update your storage layout and grow your logical unit space from 1 TB to 2 TB. You then need to get your operating system admin to grow the file system that is on your /u02 logical unit mount to allow your database admin to grow the tablespace beyond the 1 TB boundary. Yes, this solves the problem of having to bring down the server, touch the hardware, and add new cables and spindles to the computer. It allows data centers to be remote and configurations to be done dynamically with remote management tools. It also makes disk failures much easier and quicker to address by pushing the problem to the storage admin to monitor and fix single disk issues. It solves a problem but there are better ways today to address this issue.

With infrastructure as a service we hide these issues by treating storage in the cloud as dynamic storage. With Amazon we can provision our database on EC2 and its storage on EBS block volumes. If we need more space, we allocate more storage to the volume and grow the file system in EC2. The database admin then needs to go in and grow the tablespace to fill the new storage area. We got rid of the need for a storage admin, reduced our storage cost, and eliminated a step in our process. We still need an operating system admin to grow the file system and a database admin to grow the tablespace. The same is true if we use Azure compute or Oracle IaaS.

Let's go through how to attach and grow storage to a generic compute instance. We have a CentOS image running in IaaS on the Oracle Cloud. We can see that the instance has 9 GB allocated to it as the root operating system. We would like to add a 20 GB disk and then grow the disk to 40 GB as a second test. At first we notice that our instance is provisioned and we see the 9 GB disk labeled CentOS7 allocated to our instance as /dev/xvdb. We then create a root partition /dev/xvdb1, provision an operating system onto it using the xfs file system, and mount it as the root filesystem.

To add a 20 GB disk, we go into the Compute management screen, and create a new storage volume. This is easy because we just create a new volume and allocate 20 GB to it.

Given that this disk is relatively small, we don't have to wait long and can then attach it to our CentOS7 instance by clicking on the hamburger menu to the right of our new 20 GB disk and attaching it to our CentOS7 instance.

It is important to note that we did not need to reboot the instance but suddenly the disk appears as /dev/xvdc. We can then partition the disk with fdisk, create a file system with mkfs, and mount the disk by creating a new /u02 mount point and mounting /dev/xvdc1 on /u02.
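
The operating system side of this is only a handful of commands (run with sudo; the device name is what showed up on our instance and may differ on yours):

# create a single partition /dev/xvdc1 on the new disk
sudo fdisk /dev/xvdc
# lay a file system on the partition (ext3 here, a choice that becomes important below)
sudo mkfs -t ext3 /dev/xvdc1
# create the mount point and mount the new file system
sudo mkdir /u02
sudo mount /dev/xvdc1 /u02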

The real exercise here is to grow this 20 GB mounted disk to 40 GB. We can go into the Volume storage and Update the storage to a larger size. This is simple and does not require a reboot or much work. We go to the Storage console, Update the disk, grow it to 40 GB, and go back to the operating system and notice that our 20 GB disk is now 40 GB. We can create a new partition /dev/xvdc2 and allocate it to our storage.

Note that we selected poorly when we made our file system selection. We chose to lay out an ext3 file system onto our /dev/xvdc1 partition. We can't grow the ext3 filesystem. We should have selected ext4. We did this on purpose to prove a point. The file system selection is critical and if you make the wrong choice there is no turning back. The only way to correct this is to get a backup of our /u02 mount and restore it onto a newly formatted ext4 partition. We also made a second wrong choice by laying the file system directly on the raw partition. We really should have created a logical volume from this one disk and put the file system on the logical volume. This would allow us to take our new /dev/xvdc2, create a new physical volume, add it to the volume group, extend the logical volume, and grow the ext4 file system. Again, we did this on purpose to prove a point. You need to plan for expansion when you first lay out a system. To solve this problem we need to unmount the /u02 disk, delete the /dev/xvdc1 and /dev/xvdc2 partitions, create a physical volume with the logical volume manager, create a volume group and a logical volume, and lay an ext4 file system onto this new volume. We then restore our data from the backup and can grow the volume much more easily in the future. We are not going to go through these steps here, although a sketch follows for reference, because the point of the exercise is to show you that it is much easier with platform as a service, not how to do it on infrastructure as a service.
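
The layout that plans for expansion looks something like this (the volume group and logical volume names are illustrative):

# put the file system on a logical volume instead of directly on the raw partition
sudo pvcreate /dev/xvdc1
sudo vgcreate dataVG /dev/xvdc1
sudo lvcreate -l 100%FREE -n datalv dataVG
sudo mkfs -t ext4 /dev/dataVG/datalv
sudo mount /dev/dataVG/datalv /u02
# later, after the cloud volume is grown and /dev/xvdc2 is created
sudo pvcreate /dev/xvdc2
sudo vgextend dataVG /dev/xvdc2
sudo lvextend -l +100%FREE /dev/dataVG/datalv
sudo resize2fs /dev/dataVG/datalv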

If we look at a database as a service disk layout we notice that we have /dev/xvdc1 as /u01 which represents the ORACLE_HOME, /dev/mapper/dataVolGroup-lvol0 as /u02 which represents the tablespace area for the database, /dev/mapper/fraVolGroup-lvol0 which represents the fast recovery area (where RMAN dumps backups), and /dev/mapper/redoVolGroup-lvol0 which represents the redo log area (where the redo that Data Guard ships is written). The file systems are built on logical volumes and are created by default for us. The file systems are ext4, which can be seen by looking at the /etc/fstab file. If we need to grow the /u02 partition we can do this by using the scale up option for the database. We can add 20 GB and extend the data partition or the fra partition. We also have the option of attaching the storage as /u05 and manually growing partitions as desired. It is important to note that scaling up the database does require a reboot and restart of the database. When we try to scale up this database instance we get a warning that there is a Java service that depends upon the database and it must be stopped before we can add the storage desired.

In summary, we can use IaaS to host a database. It does get rid of the need for a storage administrator. It does not get rid of the need for an operating system administrator. We still have to know the file system and operating system commands. If we use PaaS to host a database, we can add storage as a database administrator and not need to mess with the logical volume or file system commands. We can grow the file system and add table extents quickly and easily. If we undersize our storage, correcting for this mistake is much easier than it was years ago. We don't need to overpurchase storage anymore because we can allocate it on demand and pay for the storage as we use it. We can easily remove one of the headaches that has been an issue for years: we no longer need to triple our storage estimates, we can go with realistic estimates, and we can control the budget better and more easily.

SQL Developer connection to DBaaS

Fri, 2016-05-27 02:07
Today we are going to connect to our database using SQL Developer. We could connect using sqlplus with a remote command but instead we are going to use a graphical tool to connect to our database in the cloud. It is important to note that this is the same tool that is used to connect to our on premise database. We can execute sql commands, look at the status of the database, clone pluggable databases from one service to another, and generally manipulate and manage the database with command line features or wizards.

SQL Developer is a free integrated development environment that simplifies the development and management of Oracle Database in both traditional and Cloud deployments. SQL Developer offers complete end-to-end development of your PL/SQL applications, a worksheet for running queries and scripts, a DBA console for managing the database, a reports interface, a complete data modeling solution, and a migration platform for moving your 3rd party databases to Oracle. There are a few books that have been written about this product.

There are also a number of blogs on SQL Developer that are worth reading.

We are not going to dive deep into SQL Developer but rather introduce a couple of concepts for monitoring our database in the cloud. We are running version 4.1.3 on a Windows desktop. We actually are cheating a little bit and running it on a Windows 2012 Server that is provisioned into IaaS in the Oracle Cloud. It makes a good scratch space for demos and development hands on labs. When we connect we can connect to the public ip address of our database on port 1521 or we can create an ssh tunnel and connect to localhost on port 1521. We will first connect via an ssh tunnel. To start, we need to log into our database service and figure out what the ip address is for the system we provisioned. For our system we notice that the ip address is 129.152.150.120.

We are going to first connect with an ssh tunnel (port forwarding) through putty. We launch putty and enter the ip address, the ssh keys, and open up port 1521 as a tunnel. We open a connection and all connections to port 1521 on localhost will be forwarded to our cloud service at the ip address specified. Note that this solution works if we have one database that we are connecting to. If we have two database instances in the cloud we will need to map a different port number on localhost to port 1521 or open up the ports to the internet, which we will talk about later. We need to keep this shell active and open but we can iconify the window.
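
If you are not on a Windows desktop, the same tunnel can be built with the command line ssh client (the key file is whatever private key matches the public key uploaded to the service):

ssh -i privateKey -L 1521:localhost:1521 opc@129.152.150.120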

In SQL Developer we can now create a new connection to our database. This is done by clicking on the green plus sign in the top right of the screen. This opens a dialog window to define the connection to the database. We will call this connection prs12cHP which is the name of our service in the cloud. We are going to connect as sys so we need to select the advanced connection to connect as sysdba. It is important to note that you can not do this with Amazon RDS if you provision an Oracle database in the Amazon PaaS. Amazon does not allow you to login as sys or system and does not give you sysdba privileges. If you want sysdba access you will need to deploy Oracle into Amazon EC2 to get access. Once we define our connection to localhost, port 1521, sys as sysdba, and a SID of ORCL we can test our interface and accept the connection once it is successful. Note that we can execute commands in the right window and look at things like what version of the database we are running. In this example we are running the High Performance Edition so we can use diag and tuning extensions from SQL Developer.
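
For example, a quick query in the worksheet confirms the version and edition we are connected to:

select banner from v$version;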

There is a new DBA feature in the latest release of SQL Developer. We can launch a navigation menu to add our cloud database by going to the View ... DBA option at the top of the screen. This gives us another green plus sign so that we can add the database and expose typical management views and functions. Two things that are of note here are a simple exposure to pluggable databases as well as a clone option associated with this exposure.

We can do other things like look at backup jobs, look at table space allocation and location, look at users that are authorized and active. This is not a replacement for Enterprise Manager because it is looking at immediate and not historic data.

Now that we have connected through a tunnel, let's look at another option. We can open up port 1521 on the database service and connect straight to the ip address. This method is not recommended because it opens up your database to all ip addresses on the internet if you are using a demo or evaluation account. You can whitelist ip addresses, use a vpn, or limit by subnet the systems that it answers. This is done through the compute service management interface under the networking tab. We need to enable the dblistener for our database service. Once we do this we can connect SQL Developer to the database using the ip address of the database service. We might need to do this if we are connecting to multiple cloud servers and don't want to create a tunnel for each of them.

In summary, we have connected to our database service using SQL Developer. This is the same tool that we use to connect to databases in our data center. We can connect the same way that we normally do via an ip address or tunnel to keep the server in the cloud a little more secure. We noted the differences between the Amazon RDS and Oracle DBaaS options and provided a workaround with EC2 or Azure Compute as an alternative. It is important to remember the differences between PaaS features and IaaS features when it comes time to calculating the cost of services. PaaS gives you expanded features like automated backup and size up/down which we will look at next week.

Using Enterprise Manager to manage cloud services

Thu, 2016-05-26 08:18
Yesterday we talked about the virtues of Enterprise Manager. To be honest, the type of monitoring tool is not important but the fact that you have one is. One of the virtues that VMWare touts of VSphere is that you can manage instances on your server as well as instances in VCloud. This is something worthy of playing with. Having the same tool manage your on premise instances and your instances in the cloud is powerful. Unfortunately, VCloud only allows you to allocate virtual machines and the storage associated with them, so you only have an IaaS compute option. You can't allocate just storage. You can't deploy a database server unless you have a database deployed that you want to clone. You need to start with an operating system and build from there. There are benefits of PaaS and SaaS that you will never see in the VCloud implementation.

Oracle Enterprise Manager provides the same universal management interface for on premise and in cloud services. Amazon falls short on this. First, they don't have on premise instances so the tools that they have don't monitor anything in your data center, only in their cloud. Microsoft has monitoring tools and plugins for looking at Azure services. It is important to note that you need a gateway server in the Azure cloud to aggregate the data and ship the telemetry data back and report it in the monitoring tool. There is a good blog detailing the cost of IaaS monitoring in Azure. The blog points out that the outbound data transfer for monitoring can cost up to $17/month/server so this is not something that comes for free.

Today we are going to look at using Enterprise Manager as a management tool for on premise systems, the Oracle Public Cloud, Amazon AWS, and Microsoft Azure. We are going to cheat a little and use a VirtualBox instance of Enterprise Manager 13c. We are not going to go through the installation process. The books and blogs that we referenced yesterday detail how to do this. Fortunately, the VirtualBox instance is available from edelivery.oracle.com. We are not going to use this instance but are going to use an instance for demo purposes only available internal to Oracle. The key difference between the two systems is that the edelivery instance is 21 GB in size for download and expands to provide an OEM 13c instance for testing while the internal system (retriever.us.oracle.com) has a 12c and 11g database installed and is 39.5 GB (expanded to almost 90 GB when uncompressed). Given the size of the instance I really can't provide external access to this instance. You can recreate this by downloading the edelivery system, installing an 11g database instance, installing a 12c database instance, and configuring OEM to include data from those instances to replicate the screen shots that we are including.

If we look at the details on the virtual box instance we notice that we need at least 2 cores and 10 GB of memory to run this instance. The system is unusable at 8 GB of RAM. We really should bump this up to 12 GB of RAM but given that it is for demo purposes and for training it is ok if it runs a little slow. If we were running this in production it is recommended to grow this to 4 cores and 16 GB of memory and also recommended that you not use a downloaded VirtualBox instance for production but install from scratch.

The key things that we are going to do are walk through what it takes to add a monitoring agent onto the service that we are trying to monitor and manage. If we look at the architecture of Enterprise Manager we notice that there are three key components: the Oracle Management Repository (OMR), the Oracle Management Service (OMS), and the Oracle Management Agent (OMA). The OMR is basically a database that keeps a history of all telemetry actions as well as reports and analytics for the systems being monitored. The OMS is the heart of Enterprise Manager and runs on a WebLogic server. The code is written in Java and presents the primary user interface to the administrators as well as being the gateway between the OMR and the agents or OMAs. The agents are installed on the target systems and collect operating system data, database data, WebLogic data, and all other log data to ship back to the OMR for analysis by the users.

It is important to note at this point that most PaaS and SaaS providers do not allow you to install an Enterprise Manager Agent or any other management agent on their instances. They want to manage the services for you and force you to use their tools to manage their instance. SalesForce, for example, only gives you access to your customer relationship data. You can export your contact lists to a csv file to backup your data but you can't correlate the contact list to the documents that you have shared with these users. Amazon RDS does not provide file system access, system access to the database, or access to the operating system so that you can install the management agent. You must use their tools to monitor services provided on their sites. Unfortunately, this inhibits you from looking at important things like workload repository reports or sql tuning guides to see if something is running slow or waiting on a lock. Your only choice is to deploy the desired PaaS or SaaS as a manual or bundled install on IaaS, forcing you to manually manage things like backups and patching on your own.

The first thing that we need to do in Enterprise Manager is to log in and click on the Setup button on the top right. We need to define named credentials since we are going to connect to the cloud service using public and private ssh keys. We need to follow the Security pull down to Named Credentials.

We click on the Create icon in the top left and add credentials with public and private keys. If we don't have an ssh key to access the service we can generate an ssh key using ssh-keygen which generates a public and private key and upload the key using the SSH Access pull down in the hamburger menu. Once we upload the ssh key we can use ssh -i keyname.ppk opc@ip_address for our database server. We will use this keyname.ppk to connect with Enterprise Manager and have all telemetry traffic transferred via the ssh protocol.
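
If you need to generate the key pair, ssh-keygen does it in one step (the file name here is arbitrary):

ssh-keygen -t rsa -b 2048 -f oem_cloud_key
# this produces oem_cloud_key (the private key) and oem_cloud_key.pub (the public key to upload)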

Once we have the credentials valid in the cloud account we can create the ssh access through Enterprise Manager. To do this we go to Setup at the top right, Security, Named Credentials. We then click on the Create button in the middle left to start entering data about the credentials. The name in the screen shot below failed because it begins with a number, so we switched it to ssh2017 since 2017ssh failed the naming convention. We are trying to use host access via ssh which is done with pull down menu definitions. The system defaults to host access but we need to change the scope from host to global, which does not tie our credentials to one ip address. We upload our public and private key as well as associate this with the opc user since that user has sudo rights. We can verify the credentials by looking at the bottom of the list. This should allow us to access our cloud host via ssh and deploy an agent to our cloud target.

Note that we created two credentials because we had a step fail later. We created credentials for the opc user and for the oracle user. The opc credentials are called ssh2017 as shown in the screen shots. The oracle credentials are called oracle2017 and are not shown. The same steps are used just the username is changed as well as the name of the credentials.

If we want to install the management agent onto our instance we need to know the ip address of the service that we are going to monitor as well as an account that can sudo to root or run elevated admin services. We go to the Enterprise Manager splash screen, login, select the Setup button in the top right and drill down to Add Target and Add Target Manually. This takes us to the Add Target screen where we can Install Agent on Host. To get rid of the warnings, we added our cloud target ip address to the /etc/hosts file and used a fully qualified and short name associated with the ip address. We probably did not add the right external dns name but it works with Enterprise Manager. When we add the host we use the fully qualified host name. We can find this by logging into the cloud target and looking at the /etc/hosts file on that server. This gives us the local ip address and a fully qualified host name. Once we have this we can enter a directory to upload the agent software to. We had to create an agent directory under the /u01/app/oracle directory. We select the oracle2017 credentials (the screen shots use ssh2017 but this generates an error later) we defined in the previous step and start uploading the agent software and configuring the host as a target.

Note that we could have entered the ip address rather than going through adding the ip address to /etc/hosts. We would have received a warning with the ip address.
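
The resulting /etc/hosts entry on the Enterprise Manager host looks something like the following (the host names shown are hypothetical; use the fully qualified and short names you found on the target):

129.152.150.120   prs12cHP.example.internal   prs12cHP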

When we first tried this we got an error during the initialization phase that opc did not own the /u01/app/oracle directory and had to create an agent directory and change ownership. Fortunately, we could easily resubmit and enter a new directory without having to reenter all of the other information. The deployment takes a while because Enterprise Manager needs to upload the agent binaries, extract them, and install them. The process is updated with status so that you can see the progress and restart when errors happen. When we changed the ownership, the installation failed at a later step stating that opc did not have permission to add the agent to the inventory. We corrected this by installing as oracle and setting the /u01/app/oracle/agent directory to be owned by oracle.

When we commit the ip address or host name as well as the ssh credentials, we can track progress as the management server deploys the agent. We get to a point where we note that the oracle user does not have ssh capabilities and we will need to run some stuff manually from the opc account.

At this point we should have an enterprise manager connection to a cloud host. To get this working from my VirtualBox behind my AT&T Uverse wireless router I first had to configure a route on my broadband connection and set the ip address of the Enterprise Manager VirtualBox image to a static ip address. This allows the cloud instance to talk back to the OMS and store data in the OMR.

The next step is to discover the database instances. This is done by going through a guided discovery on the host that we just provisioned. It took a few minutes to sync up with the OMS but we could verify this with the emctl status agent command on the target host. We add the target manually using the guided discovery and select database services to look for on the target.
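
Checking the agent from the target host uses the standard emctl utility (the path below assumes the agent home we created under /u01/app/oracle/agent):

/u01/app/oracle/agent/agent_inst/bin/emctl status agent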

At this point we should have a database, listener, and host connected to our single pane of management glass. We should see a local database (em12c) and a cloud based database (prs12cHP). We can look at the host characteristics as well as dive into sql monitoring, database performance, and database management like backup and restore options or adding users to the repository. We could add a Java Cloud Service as well as link these two systems together and trace a web page request down to a sql read and look at what the longest latency component is. We can figure out if the network, java memory allocation, or database disk is causing the slowest response. We can also look at sql tuning recommendations to get suggestions on changing our sql code or execution plans using the AWR report and sql tuning utilities in Enterprise Manager.

In summary, we can connect to an on premise server as well as a cloud server. We can't connect to an Amazon RDS instance because we don't get file system level access to push a client to or a root user to change the agent permissions. We do get this with IaaS on Oracle, Compute servers on Azure, and EC2 on Amazon. We also get this with PaaS on Oracle and potentially even Force.com from SalesForce. No one gives you this ability with SaaS. It is assumed that you will take the SaaS solution as is and not need to look under the covers. Having a single pane of glass for monitoring and provisioning services is important. The tool should do more than tell you how full a disk is or how much of a cpu is loaded or available. It should dive into the application and let you look at where bottlenecks are and help troubleshoot issues. We could spend weeks diving into Enterprise Manager and the different management packs but we are on a journey to look at PaaS options from Amazon, Microsoft, and Oracle.

Managing servers and instances in the cloud

Wed, 2016-05-25 07:59
Managing servers and instances has been an ongoing issue since the introduction of the first computer. Recently with the advent of virtualization the idea of a management console to control what processors are running what services and what storage is allocated to what operating system has gained popularity. Many people are familiar with VMWare VSphere where you get a view of processors. We get a view of a server and can see virtual images deployed on this server. We can see how well the resources (memory, cpu, and disk) are being utilized. We can allocate more or less resources since this is a dynamic allocation and make sure that we are not over allocating resources and wasting them or under allocating them and causing applications to run slower.

In this example we can see that we have two processors, 2 GB of memory, and just under 300 GB of disk on this computer. We have five virtual machines running on this computer and can dive into each virtual machine and look at what operating system is installed and how the limited resources are allocated and utilized. What we can't see is what applications are installed and how the applications are running. For example, is the Windows Home Server 2011 running an Apache Web Server and how many hits did the web server get in the past four days? Monitoring tools raise the question of what you are actually monitoring. If you are managing limited resources and making sure that you have not over or under allocated services, tools like VSphere are excellent tools. Unfortunately, you will need other tools to dive into the other layers of the stack. EMC, for example, has a storage manager that lets you look not only at a logical unit level but also at a controller and disk level. It understands VMWare and lets you look at how disks are related to virtualization engines and how they are consuming resources.

Again, this is a very good tool to look at how well a disk is performing, how well data is laid out across spindles, and how well your data network is being transmitted between disk and server. We can see hot spots. We can see disks that are over and under utilized. We can manage a scarce resource and make sure that it is properly utilized.

When we talk about monitoring we need to shift our thought process. Yes, it is important to manage compute, memory, and storage resources but it is also important to realize that these resources are commodities. If we run low, we get more. If we use too much we are wasting resources. We should be able to automate allocation of resources and size up or size down resources without manual monitoring. What we are really interested in is how well our company is running. If we are a university we might be interested in the latency of delivering online video classes. We might be interested in how many classes are being added to a student schedule during registration. If we are a ticket retailer we might be interested in how many tickets were requested and paid for on a minute by minute basis. Note that we are not talking about how well a disk drive is allocated or if we have enough processors allocated to a virtual machine, we are talking in business terms. We are looking at tying revenue generating services back to computer resources and trying to figure out what is causing a problem. In the online video classroom example, we might have our processors allocated properly, storage tuned to the last IOP, and memory allocated to buffer data and reduce disk reads. If our basketball team made it to the Elite Eight during March Madness and the athletic department live streams the game on the same network as our classroom servers, our classes will be offline due to the demand to watch the basketball game. Tools from EMC and VMware will show that everything is working fine and life is good. Meanwhile the help desk is getting calls from students off campus that can't access their assignments during midterms and their Thursday class is not available. What we need is a monitoring system that can look at systems and incorporate more than just processor and disk. What we need is a tool that can look at systems and services and not just resources. We would like to look at the video distribution system and be able to dive into the disk, network, or processor and see what the bottleneck is and fix it quickly.

Oracle released a tool years ago called Enterprise Manager. The tool started out as a database monitoring tool that allowed you to dive into sql calls and figure out why it was taking longer than necessary. With acquisitions of companies like BEA and Sun Microsystems the tool expanded to look at how Java was performing inside a WebLogic server and how disk drives were performing that were serving up requests for the database and WebLogic server. Acquisitions of companies like JD Edwards and PeopleSoft drove the monitoring tools in the opposite direction and screens showing how many purchase orders were being processed on an hourly basis were suddenly available. You could look at what was the bottleneck in closing your books for the end of month reconciliation. Was it a manual process waiting on a report to drop into a directory or was it a sql statement that was taking minutes rather than seconds to complete? You could start looking at a process like purchase orders and dive into a database to see if a table was reaching storage limits as well as figure out that someone recently patched the database which caused an index to not look at a new column that was created and searches are now going against this column so select statements are doing a full table scan rather than using an index to report answers quicker. Adding more storage in this case will be a waste of time. Yes, we are running out of storage on a table but the real issue is we need to re-index the database or execute a new sql execution plan. Below is a screen shot of how well a database is performing with links to look at all the sub-components of the database.

Books have been written on Enterprise Manager. We are not going to cover everything in this blog to make you an expert on the subject.

There are also a number of blogs related to Enterprise Manager that are worth reading.

A Google search returns a long list of blogs on the subject, and I am sure any such list misses a few. Note that the list of books and blogs is not a short one. There are also classes offered by Oracle University that you can take virtually or in a classroom (both cost money).

The way that Oracle Enterprise Manager is paid for is simple. The base system is free and you pay for the options that you want to use. Unfortunately, the Technology Price Guide is not very clear as to what is and is not Enterprise Manager and what is an option on the database. For example, on page 7, most of the management packs are listed. If you want diagnostics for the database you will need to license your database at $7,500 per processor and not Enterprise Manager. You can license at $150 per named user but the licensing metrics for your database need to match the licensing for your management pack. You could have a two processor license for production and a 25 named user license for development and testing, so you will need to blend these licenses into Enterprise Manager with the management packs. Diagnostics is specifically confusing because you enable or disable this feature in Enterprise Manager and not in the database. The telemetry data is being collected for the database but the reporting on the results of the analysis is not being done in the database. You could turn on the reporting in Enterprise Manager without involving the DBA, thus incurring an additional license fee that you had not paid for. There is no license key or email that is sent to Oracle saying that you enabled the license; it is a simple checkbox in Enterprise Manager that says turn on diagnostic reporting. In recent versions a warning screen pops up telling you that this is not a free feature. In OEM 10g the feature was turned on by default and you had to turn it off. This has changed in recent releases. If you try to turn this feature on when connecting to an Enterprise Edition in the Oracle Public Cloud you will get a feature not available message. You need to go with the High Performance or Extreme Performance edition of the database to get the diagnostics enabled.

There are also management packs for Oracle Applications and the pricing for these products can be found in the Oracle Applications Price List. You need to search for the word "packs" to find the price of the management packs in this list. You can get a list of all the management packs from the Oracle Tech Network page for Enterprise Manager.

It is important to note that the Enterprise Manager that runs in your data center monitoring your servers and Oracle hardware and software products is the same tool that you can use to monitor and manage PaaS and IaaS resources in the Oracle Public Cloud. You can connect to the instance in the cloud using ssh and read the telemetry from the cloud instance as if it were installed on one of your servers. You can use extensions to the latest version of Enterprise Manager, 13c, to clone a pluggable database instance from your on site installation to a cloud instance.

You can also setup reporting and self service requests to have end users ask for a new service to be provisioned either on site or in the cloud. Below is a screen shot of how to do this for a database. We could do something similar for a WebLogic server, an Apache Web server, a PeopleSoft instance for dev/test, or any layer of the Oracle stack.

In summary, selection of a management tool is important. Tools are good to understand and properly use. At some point you need to step back and ask what questions you need answers to. Am I diving too deep trying to optimize something that is not worth deep analysis? Could I automate this and not have to monitor it at all? If I run out of processing power does it make sense to automatically scale up the number of processors? Should I scale out by spinning up more web servers? Do I need to re-architect my network topology to isolate disk traffic from client traffic? If I generate a report who will consume the results? Is the report for someone in IT? Purchasing? The process owner? Is it a technology or financial report? Products like Enterprise Manager allow you to generate all of these reports using different management extensions. My suggestion is to look at some of the introductory videos on the Oracle Tech Network to get an introduction to the problem that you are trying to solve, then figure out how much it will cost to measure what is important to you.

database alternatives

Tue, 2016-05-24 10:05
One of the key questions that I get asked on a regular basis is to justify the cost of some product. Why not use freeware? Why not put things together and use free stuff? When I worked at Texas A&M and Rice University we first looked at public domain software. We heavily used the Apache web server, Tomcat, MySQL, Postgres, Linux, and BSD. These applications worked up to a point. Yes, you can spin up one Apache web server on one server. Yes, you can have one Apache web server listen on multiple IP addresses and host multiple web servers. The issue typically is not how many web servers you can host but how many clients you can answer. Easily 90% of the web servers could handle the load that they saw on a regular basis. We spent 80% of our time on the 10% that could not handle the load. Not all of the web servers could handle the functionality. For example, a student registration system needs to keep a shopping cart of classes selected and you need to level up to an Apache Tomcat server to persistently keep this data and database connections live. If you use a web server you need to store all transactions in the database, all of the classes selected, and all of the fees associated with the class. Every interaction with the web server causes multiple connections with the database server. Doing this drives up the number of processors needed by the database, thus driving up the cost of the hardware and software licenses.

If we use an application server that can handle caching of data, we can keep a list of available classes on the application server and only have to go back to the database server for transactions. When a student selects a class, it takes it out of inventory and puts it in their class schedule for the next year. The same is true for on-line shopping, purchasing tickets to a play or an airline flight, or drafting for a fantasy football team. Years ago ESPN ran a March Madness contest on-line. They presented your selections with an Apache web server and every team selection required an interaction with their database on the back end. The system operated miserably and it took hours to select all rounds to fill out your bracket. They updated the server with Javascript and a Tomcat server and allowed you to fill out all of round one in your browser. Once you finished the first round you submitted your selections and were presented with a round two based on your first round selections. They later put this on WebLogic and put all of the round selections in Java code on the WebLogic server. The single interaction with the database became submission of your complete bracket. They went from thousands of interactions with a database to a single interaction per submission.

We can have similar architecture discussions at the database layer as well. If I am looking at a simple table lookup, why pay for a robust database like Oracle 12c? Why not use something like Azure Table Storage Services and do a simple select statement from a file store? Why not put this in a free version of Oracle in APEX on the web and define a REST api to pull the data based on a simple or potentially more complex select statement? Again, 90% of the problems can be solved with simple solutions. Simple table lookups like translating a simple part name to a price can be done with Excel, MySQL, APEX, JSON processing, or REST apis. The difficulty comes up with the remaining 10%. How do I correlate multiple tables together to figure out the price of an item based on cost of inventory, cost of shipping, electrical costs, compensation costs for contractors and sales people, and other factors that determine profitability and pricing? How do I run a shortest-route algorithm for a trucking system based on traffic, customer orders, inventory in a warehouse, the size of a truck, and the salary of the driver and loading dock personnel? For things like this you need a more complex database that can handle multiple table joins, spatial data, and pulling in road conditions and traffic patterns from external sources. Products like IBM DB2, Oracle Database, and Microsoft SQL Server can address some of these issues.

We also need to look at recovery and restoration time. When a Postgres server crashes, how long does it take to recover the database and get it back online? Can I fail over to a secondary parallel server, since downtime is lost revenue or lost sales? If you go to HomeDepot to order plumbing parts and their site goes down, how long does it take to go to the Ace or Lowes web site and order the same part and have it delivered by the same delivery truck to your home or office? Keeping inventory, order entry, and web services up becomes more than just answering a query. It becomes a mission critical service that can not go down for more than a few seconds. Services like Data Guard, GoldenGate, and Real Application Clusters are required to keep services up and active. MySQL, MongoDB, Amazon Aurora, and other new entry level database technologies can handle simple requests but take minutes/hours to recover information for a database. Failing over through storage to another site is typically not an answer in this case. It takes minutes/hours to recover and restart a moderate database of 20 TB or larger. First the data replication needs to finish, then the database needs to be booted at a secondary site, and it needs to maintain consistency in the data as it comes back up. The application server then needs to connect to the new service and recommit requests that came in during and since the system failure. As this is happening, customers are opening a new browser tab and going to your competition to find the same part on another site.

In summary, it takes more than just getting a bigger and faster application server or database. Moving the services to the cloud isn't necessarily the answer. You need to make sure that you move the application server and the database together the majority of the time. Look at your application and ask where you spend most of your time. Is it tuning sql statements? Is it writing new queries to answer business questions? Is it optimizing your disk layout to get tables to the database faster? Take a step back and ask why the database is pounding the disk so hard. Can I cache this data in the database by adding a little more memory to the disk controller or database server? Can I cache the data at the application server by adding more memory there and keep from asking the database for the same information over and over again? In the next few days we are going to look at database options and database monitoring. We are going to look at some of these tools and refer back to the bigger picture. Yes, we can tune the storage to deliver all of the bits at the highest rate possible. Our question will not be how to do this but should we be doing this. Would something like an Exadata or an in-memory option allow us to transfer less data across the storage network and get us answers faster? Would adding memory somewhere allow us to buffer more data and reduce the database requests, which reduces the amount of data needed from the disk?

database management

Mon, 2016-05-23 14:52
Today we are going to look at managing an Oracle database. We are going to start with a 12c database that we created in the Oracle Public Cloud. We selected database as a service (as opposed to a virtual image), monthly billing, 12c, and the Enterprise Edition High Performance edition. We accepted the defaults for the table size so that we can later figure out how to extend it, and selected no backups rather than starting RMAN for daily incrementals or cloud object storage for weekly full backups.

We basically have four options for managing a database. If we have a small number of databases we might look at using sqlplus sysdba command line access and grinding through administration. We also have a database monitor that is installed by default with the database cloud service. We can dive into the database through the monitor and look at long-running queries, tablespace sizes, and generic utilization. We can also connect with SQL Developer and look at the new DBA interfaces that were added in the latest release in early 2016. The fourth and final way of administering is to look at commercial management tools like Oracle Enterprise Manager (OEM) or other tools that aggregate multiple systems and servers and give you exposure beyond just the database. These commercial tools allow you to look at the layer that you are most interested in. You can get a PeopleSoft Management Pack for OEM that allows you to look at purchase order flow or payroll requests. You can get diagnostics and tuning packs for the application server and database that allow you to look at what part of the PeopleSoft implementation is taking the longest. Is it the network connection? Is it a poorly tuned Java Virtual Machine that is thrashing memory? Is it a SQL statement that is waiting on a lock? Is it a storage spindle that is getting hammered by another application? Is it a runaway process on your database server that is consuming all of the resources? All of these questions can be answered with a monitoring tool if you not only know how to use it but also know what is available for free and what you need to purchase to get the richer and more valuable information.

To get to the database monitor we go to the cloud services console (which changed over the weekend so it looks a little different), click on database, click on Service Console, and click on the database name.

If we click on the dbaas_monitor menu item in the hamburger menu to the right of the service name, it might fail to connect the first time. It takes the IP address of the database and tries to open https://<ip address>/dbaas_monitor. We first need to open up port 443 to be able to communicate with this service.

To get to the network settings we go to the Compute Service Monitor, click on the Network tab, and enable the proper port for our server prs12cHP. If we hover over the labels on the left we see which ports we are looking for. We are specifically interested in the https protocol. If we click on the hamburger menu next to this line item we can update the security list, which pops up a new window.

To enable this protocol we enable the service and click the Update button. Once we do this we can retry the dbaas_monitor web page. We should expect a security exception the first time and need to add an exception. We log in as dbaas_monitor with the password that we entered in the bottom left of the screen for the system passwords when we created the database.
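Once the rule is enabled, a quick reachability check from a terminal can confirm that port 443 is now open before fighting with browser certificate warnings. This is only a sketch; the IP address is a placeholder for your own instance and -k skips validation of the self-signed certificate.

# check that the monitor responds over https (placeholder IP address)
curl -k -I https://203.0.113.10/dbaas_monitor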

At this point we can look at CPU utilization, tablespace usage, whether the database is running, and all of the other monitoring capabilities. Below are the screen shots for the listener and the table sizes and storage by pluggable database.

We can look a little deeper at things like alerts, wait times, and real time SQL monitoring. These are all available through the command line, but providing a service like this allows junior database administrators to look at things quickly and easily.

The biggest drawback to this system is that you get a short snapshot and not a long term historic archive of this data. If we use Enterprise Manager, which we will look at in a later blog, we collect the data from a central site into a local repository and can look back at months-old data rather than just live data or data from the past few hours.

In summary, if we use platform as a service, we get tooling and reporting integrated into the service rather than having to spin these up or look at everything from the command line as is done with infrastructure as a service. We get other features as well, but we are diving into database monitoring this week. We briefly touched on database monitoring through what was historically called dbmonitor and is moving towards dbaas_monitor, or a central Enterprise Manager pane of glass for database services in our data center and in the cloud. One of the key differentials between Oracle Database as a Service and Amazon RDS is database monitoring. We will look at database monitoring for Amazon RDS later this week and note that there are significant differences.

Database in Microsoft Azure

Fri, 2016-05-20 02:07
Today we are going to look at what it takes to install Oracle Database Enterprise Edition 12c in Microsoft Azure. We had previously looked at deploying Application Express in Azure. The steps to deploy Enterprise Edition are almost the same. We start with the same process by logging into the portal, clicking on New, searching for Oracle, and looking for the enterprise edition of the database.

In this example we are going to select Enterprise Edition 12c.

The two links at the bottom take you to the licensing and privacy statements on the Oracle website. Note that the license is not included for this edition of the database and you need to adhere to the licensing restrictions of a perpetual license for a cloud deployment. If we refer back to our calculations for a perpetual license on AWS, amortizing the database license over four years brings this cost to $3,720/month for a four-core server as recommended by Microsoft. Note that we can go with a smaller core count and smaller memory count, unlike with Amazon. AWS restricts us to a minimum core count for the Oracle database, but Azure allows you to go below the suggested minimums all the way to a system that is unusable. It is impractical to run the database on a single core with 1 GB of RAM, but the option is presented to you. From the previous screen, we click Create to start the deployment. We can only deploy into a Classic Virtual Machine instance.

The first things that we need to define are the server name, username to log in as, and password or ssh keys for the username. We can also define a new storage group or pull from an existing storage group. For our test either works.

When we look at the shapes suggested by Microsoft, a D12 Standard shape (4 cores and 28 GB) is the smallest configuration. This comes in at $290/month or roughly $10/day. This is a little more than we want to pay for a simple test system. We can get by with 2 cores and 3.75 GB for a simple experiment. We can do this at $89/month or roughly $3/day with an A2 Standard shape. We select the shape and click Select.

On the next screen we select the storage profile. The first option is Standard or Premium disk. If we select Premium SSD our shape gets resized to D2 Standard at a much higher per month charge. This gives us higher IOPS to storage, which might or might not be required for our deployment. If we default back to Standard to get the lower shape cost, we have the option of locally replicated data, replication between data centers, or read access in a second geo; the price goes from $2.40 per 100 GB/month to $4.80 to $6.10. We will go for the locally replicated data to minimize cost. We can define a new domain name for this account or accept the default. We can also define a virtual network for this instance as well. We can select the subnet as well as dynamic or static IP address assignment. We are going to accept the defaults for the network.

We do need to open port 1521 by adding an endpoint to this instance. If we scroll down on the network screen we can add a port by adding an endpoint. We might or might not want to open up this port. When we do this it opens up the port to the world. We can tunnel through ssh to access port 1521, but for demonstration purposes we are going to open up this port to the world and potentially look at whitelisting or restricting access to this instance by IP address. We might also want to open port 1158 to see the Enterprise Manager console, and port 80 for Application Express, which is also available in the Enterprise Edition of the database.

We do have the option of monitoring extensions to look at how things are performing. We are going to skip this option for our experiment but it is interesting to note that you do have additional options for monitoring.

We are not going to explore the diagnostics storage or availability sets because they really don't apply to the database. They are more concerned with the operating system and do not extend into the database. At this point we are ready to launch the instance, so we click Ok. We do get one final review before we provision the instance with the database installed.

When we click Ok we get a message that the instance is deploying. We can look at more detail by clicking on the bell icon at the top and drilling down into the deployment detail.

It is important to note that the database binaries are installed but the database is not configured. There is no listener running. The ORACLE_SID has not been set. We need to run dbca to create a database instance.

Other tutorials on installing an Oracle Database on Azure can be found at

To create a database at this point we need to run the dbca command. When I first tried to execute this command I got a strange error in that the system asked for a password then cleared the screen. This is a known issue relating to line wrap and xterm configurations. It can be fixed by going into the PuTTY settings and turning off line wrap.

If we look at the command line needed to create a database with dbca we notice that we first need -silent to keep the system from using a default X Window screen to walk you through the installation. We do not have the X Window system enabled or the ports configured, so we need to install the database from the command line. This is done with the -silent option. The second option is -createDatabase. This tells dbca to create a new database. We also need to define a template to use as the foundation. Fortunately we have pre-defined templates in the /u01/app/oracle/product/12.1.0/dbhome_1/assistants/dbca/templates directory. We will be using the General_Purpose.dbc template. We could use the Data_Warehouse.dbc or create a new one with the New_Database.dbt template. We also need to define the ORACLE_SID and character set with the -gdbName, -sid, and -characterSet parameters. We finally wrap up the command options with -responseFile set to NO_VALUE. The entire command looks like

dbca -silent -createDatabase -templateName General_Purpose.dbc -gdbname orcl -sid orcl -responseFile NO_VALUE -characterSet AL32UTF8 -memoryPercentage 30 -emConfiguration LOCAL
This will create a database with ORACLE_SID set to orcl. We add a couple of other parameters to configure Enterprise Manager to be local rather than using a central Enterprise Manager agent and to limit the memory that we will use to 30% of the memory on the system.
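If we wanted the install to create an initial pluggable database as well (something we note below that we skipped), a hedged variant of the same command using the 12c dbca container database options would look something like the following. The PDB name and admin password are placeholders.

dbca -silent -createDatabase -templateName General_Purpose.dbc \
     -gdbname orcl -sid orcl -responseFile NO_VALUE -characterSet AL32UTF8 \
     -memoryPercentage 30 -emConfiguration LOCAL \
     -createAsContainerDatabase true -numberOfPDBs 1 \
     -pdbName pdb1 -pdbAdminPassword MyPdbPassword1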

The database configuration assistant (dbca) will then configure the database. This step will take 10-15 minutes to get to 100%. Some tutorials on how to use dbca in silent mode can be found at

There are really no videos on YouTube showing an install. In our example we should have included the -pdbName option to create an initial pluggable database as part of our database installation. Once we see the 100%, the database creation is complete. We then need to set our ORACLE_SID, ORACLE_HOME, and PATH, and start the listener so that we can connect to the database. This is done with the commands


export ORACLE_SID=orcl
. oraenv                      # oraenv must be sourced; it sets the environment from /etc/oratab
export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1
export PATH=$PATH:$ORACLE_HOME/bin
lsnrctl start                 # start the default listener on port 1521

From here we can look at the header information to verify that we installed a 12c Enterprise Edition and look at the location of the data files with the following commands


select * from v$version;
select con_id, name from v$datafile order by 1;

We can connect with SQL Developer because we opened up port 1521.
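We could also connect from any machine that has the Oracle client installed. A minimal sketch using an EZConnect string is below; the IP address is a placeholder for the VM's public address, the password is whatever was set for system during database creation, and the service name matches the -gdbname we passed to dbca.

# EZConnect format is //host:port/service_name
sqlplus system/MyPassword@//203.0.113.20:1521/orcl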

In summary, we can deploy Oracle Database 12c into the Microsoft Azure cloud. We get a partial install when we provision the database from the Marketplace. We still need to go through the dbca configuration as well as spinning up the listener and opening up the right ports for the database. The solution is not PaaS but database on IaaS. We can not size up the database with a single command. We do not get patching or automated backup; in fact we have not even set up backups at this point. This is similar to the Amazon AWS installation in EC2 but falls short of the database as a service delivered as PaaS in the Oracle Public Cloud. Pricing has the same considerations as the Database on AWS EC2 discussion we had yesterday, with the only difference being the price for the compute and storage instance. We did not need to look at the online calculator because Microsoft does a very good job of presenting pricing options when you are configuring the instance. Again, we are not trying to say that one implementation is better or worse than the other but to provide information so that you can decide your tradeoffs when selecting one cloud vendor over another.

Database in Amazon EC2

Thu, 2016-05-19 02:07
Today we are going to look at what it takes to get a 12c database instance up and running in Amazon EC2. Note that this is different than our previous posts on getting Standard Edition running on Amazon and running Enterprise Edition running on Amazon RDS. We are going to take the traditional approach as if we were installing the database on a virtual image like VMWare, HyperV, or OracleVM. The approach is to take IaaS and layer the database upon it.

There are a few options on how to create the database instance. We can load everything from scratch, we can load a pre-defined AMI, we can create a golden image and clone it, we can do a physical to virtual then import the instance into the cloud, or we can create a Chef recipe and automate everything. In this blog we are going to skip the load everything because it is very cumbersome and time consuming. You basically would have to load the operating system, patch the operating system, create users and groups, download the binaries, unpack the binaries, manage the firewall, and manage the cloud port access rights. Each of these steps takes 5-30 minutes so the total time to get the install done would be 2-3 hours. Note that this is much better than purchasing hardware, putting it in a data center, loading the operating system and following all the same steps. We are also going to skip the golden image and cloning option since this is basically loading everything from scratch then cloning an instance. We will look at cloning a physical and importing into the cloud in a later blog. In this blog we are going to look at selecting a pre-defined AMI and loading it.

One of the benefits of the Marketplace model is that you get a pre-defined and pre-configured installation of a software package. Oracle provides the bundle for Amazon in the form of an AMI. For these instances you need to own your own perpetual license. It is important to understand the licensing implications and how Oracle defines licensing for AWS. Authorized Cloud Environment instances with 4 or fewer virtual cores are counted as 1 socket, which is considered equivalent to a processor license. For Authorized Cloud Environment instances with more than 4 virtual cores, every 4 virtual cores used (rounded up to the closest multiple of 4) equate to a licensing requirement of 1 socket. This is true for the Standard Edition license. For the Enterprise Edition license the assumption is that the cloud processor is an x86 chip set, so a processor license is required for every 2 virtual cores. All of the other software like partitioning, diagnostics, tuning, compression, advanced security, etc. also needs to be licensed with the same metric.

If we look at the options for AMIs available we go to the console, click on EC2, and click on Launch Instance.

When we search for Oracle we get a wide variety of products like Linux, SOA, and database. If we search for Oracle database we refine the search a little more but get other supplementary products that are not the database but products that relate to the database. If we search for Oracle database 12c we get six return values.

We find two AMIs that look the same but the key difference is that one limits you to 16 cores and the other does not. We can select either one for our tests. If we search the Community AMIs we get back a variety of 11g and 10g installation options but no 12c options. (Note that the first screen shot is the Standard Edition description; it should be the Enterprise Edition since two are listed.)

We are going to use the Commercial Marketplace and select the first 12c database instance. This takes us to a screen that lets us select the processing shape. Note that the smaller instances are not allowed because you need a little memory and a single core does not run the database very well. This is one of the advantages over selecting an operating system ourselves and finding out that we selected too few cores or not enough memory. Our selections are broken down into general purpose, compute optimized, or storage optimized. The key difference is how many cores, how much memory, and dedicated vs generic IOPs to the disk.

We could select an m3.xlarge or c3.xlarge and the only difference would be the amount of memory allocated. Network appears to be a little different, with the c3.xlarge having less network throughput. We are going to select the m3.xlarge. Looking at pricing, we should be charged $0.351/hour for the EC2 instance, $0.125 per GB-month provisioned or $5/month for our 40 GB of disk, and $0.065 per provisioned IOP-month or $32.50/month. Our total cost of running this m3.xlarge instance will be $395.52/month or $13.18/day. We can compare this to a similarly configured Amazon RDS at $274.29/month. We need to take into account that we will need to purchase two processor licenses of the Enterprise Edition license at $47,500 per processor license. The cost of this license over four years will be $95,000 for the initial license plus 22%, or $20,900 per year, for support. Our four year cost of ownership will be $178,600. Amortizing this over four years brings this cost to $3,720/month. Our all in cost for the basic Enterprise Edition will be $4,116.35/month. If we want to compare this to the DBaaS cost that we covered earlier we also need to add the cost of Transparent Data Encryption so that we can encrypt data in the cloud. This module is included in the Advanced Security option, which is priced at $15,000 per processor license. The four year cost of ownership for this package is $56,400, bringing the additional cost to $1,175/month. We will be spending $5,291.35 for this service with Amazon.
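The license math above is easy to sanity check. A quick sketch of the four-year amortization using the list prices quoted in this post:

# Enterprise Edition: 2 processor licenses at $47,500 list plus 22% annual support over 4 years
awk 'BEGIN {
  licenses = 2 * 47500;              # $95,000 up front
  support  = 4 * 0.22 * licenses;    # $83,600 of support over four years
  total    = licenses + support;     # $178,600 four-year cost of ownership
  printf "total=%d per_month=%.2f\n", total, total / 48
}'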

If we want to compare this with PaaS we have the option of purchasing the same instance at $1,500/OCPU/month, or $3,000/month, or $2.52/OCPU/hour for the Enterprise Edition on a Virtual Image. We only need two OCPUs because an OCPU provides two threads per core where Amazon provides one thread per virtual core. We are really looking for thread count and not virtual core count. Four virtual processors in Amazon are equivalent to two OCPUs, so our cost for a virtual image will be $1.5K/OCPU * 2 OCPUs. If we go with Database as a Service we are looking at $3,000/OCPU/month, or $6,000/month, or $5.04/OCPU/hour for the Enterprise Edition as a service. What we need to rationalize is the extra $708/month for the PaaS service. Do we get enough benefit from having this as a service, or do we spend more time and energy up front to pay less each month?

If we are going to compare the High Performance edition against the Amazon EC2 edition we have to add in the options that we get with High Performance. There are 13 features that need to be licensed to make the comparison the same. Each of these options costs anywhere from $11,500 per processor to $23,000 per processor. We saw earlier that each option adds roughly $1,175/month, so adding the three most popular options, partitioning, diagnostics, and tuning, will cost $3,525/month more. The High Performance edition will cost us $2,000/OCPU/month or $4K/month for the virtual image, and $4,000/OCPU/month or $8K/month as a service. Again, we get ten more options bundled in with the High Performance option at $8K/month compared to $8,816.35 with the AWS EC2 option. We also get all of the benefits of PaaS vs IaaS for this feature set.

Once we select our AMI and instance type, we have to configure the options. We can request a spot instance but this is highly discouraged for a database. If you get terminated because your instance is needed elsewhere, you could easily lose data unless you have Data Guard configured and set up for synchronous data commit. We can provision this instance into a virtual private network, which is different from the way it is done in the Oracle cloud. In the Oracle cloud you provision the service then configure the virtual instance. In Amazon EC2 it is done at the same time. You do have the option of provisioning the instance into one of five instance zones, but all are located in US East. You can define the administration access roles with the IAM role option. You have to define these prior to provisioning the database. You can also control operation of this instance from the console. You can stop or terminate the instance when it is shut down as well as prohibit someone from terminating the instance unless they have rights to do so. You can enable CloudWatch (at an additional charge of $7.50/month) to monitor this service and restart it if it fails. We can also add elastic block attachment so that our data can migrate from one instance to another at an additional cost.

We now have to consider the reserved IOPS for our instance when we look at the storage. By default we get 8 GB for the operating system, 50 GB for the data area with 500 provisioned IOPS, and 8 GB for log space. The cost of the reserved IOPS adds $38.75/month. If we were looking at every penny we would also have to look at outbound traffic from the database. If we read all of our 50 GB back it would increase the price of the service by a little over $3/month. Given that this is relatively insignificant we can ignore it, but it was worth looking at with the simple monthly calculator.

Our next screen is the tags, which we will not use but which could be used for searching if we have a large number of instances. The screen after that defines the open ports for this service. We want to add other ports like 1521 for the database, and 443 and 80 for Application Express. Ports 1158 and 22 were predefined for us to allow for Enterprise Manager and ssh access.
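If you prefer scripting over the console wizard, the same rules can be added with the AWS CLI. This is only a sketch; the security group ID is hypothetical and the source CIDR should be narrowed to your own address range rather than the whole internet.

# open the listener and Application Express ports on a hypothetical security group
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 1521 --cidr 203.0.113.0/24
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 443 --cidr 203.0.113.0/24
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 80 --cidr 203.0.113.0/24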

At this point we are ready to launch our instance. We will have 50 GB of table space available and the database will be provisioned and ready for us upon completion.

Some things to note in the provisioning of this instance: we were never asked for a SID for the database. We were never asked for a password associated with the sys, system, or sysdba user account. We were never asked for a password to access the operating system instance. When we click on launch we are asked for an ssh key to access the instance once it is created.

When you launch the instance you see a splash screen then a detail screen as the instance is created. You also get an email confirming that you are provisioning an instance from the marketplace. At this point I notice that I provisioned Standard Edition and not Enterprise Edition. The experience is the same and nothing should change up to this point so we can continue with the SE AMI.

Once the instance is created we can look at the instance information and attach to the service via putty or ssh. The IP address that we were assigned was 54.242.14.146. We load the private key and IP address into putty and connect. We first failed logging in as oracle, then got an error message as root. Once we connect as ec2-user we are asked if we want to create a database, enter the SID, and enter the sys, system, and dbsnmp passwords.

The database creation takes a while (15-30 minutes according to the create script) and you get a percent complete notification as it progresses. At this point we have a database provisioned, the network configured, security through ssh keys to access the instance, and should be ready to connect to our database with SQL Developer. In our example it took over an hour to create the database after taking only five minutes to provision the operating system instance. The process stalled at 50% complete and sat there for a very long time. I also had to copy /home/ec2-user/.ssh/authorized_keys into the /home/oracle/.ssh directory (after I created it) to allow the oracle user to log in. The ec2-user account has rights to execute as root so you can create this directory, copy the file, and change ownership of the .ssh directory and contents to oracle. After you do this you can log in as oracle, the user that owns the database processes and the directories under /u01.
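A sketch of that workaround, run as ec2-user, is below. The oinstall group name is an assumption; check the oracle user's primary group on your instance before changing ownership.

# copy the ssh key so the oracle user can log in directly
sudo mkdir -p /home/oracle/.ssh
sudo cp /home/ec2-user/.ssh/authorized_keys /home/oracle/.ssh/authorized_keys
sudo chown -R oracle:oinstall /home/oracle/.ssh
sudo chmod 700 /home/oracle/.ssh
sudo chmod 600 /home/oracle/.ssh/authorized_keys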

It is important to note that the database in EC2 provides more features and functions than the Amazon RDS version of the database. Yes, you get automated backup with RDS but it is basically a snapshot to another storage cloud instance. With the EC2 instance you get features like spatial, multi-tenant, and sys access to the database. You also get the option to use RMAN for backups to directories that you can read offsite. You can setup DataGuard and Enterprise Manager. The EC2 feature set is significantly more robust but requires more work to setup and operate.

In summary, we looked at what it takes to provision a database onto Amazon EC2 using a pre-defined AMI. We also looked at the cost of doing this and found out that we can minimally do this at roughly $5.3K/month. When we add features that are typically desired this price grows to $8.8K/month. We first compared this to running DBaaS in a virtual instance in the Oracle Public Cloud at $6K/month (with a $3K/month smaller footprint available) and DBaaS as a service at $8K/month (with a $4K/month smaller footprint available). We talked about the optional packs and packages that are added with the High Performance option and talked about the benefits of PaaS vs IaaS. We did not get into the patching, backup, and restart features provided with PaaS but did touch on them briefly when we went through our instance launch. We also compared this to the Amazon RDS instance, which is similar in features and functions at about a hundred dollars per month cheaper. The bulk of the cost is the database license and not the compute or storage configuration. It is important to note that the cost of the database perpetual license is still being paid whether you are running the service or not. With PaaS you do get the option of keeping the data active in cloud storage attached to a compute engine that is running, but you can turn off the database license on an hourly or monthly basis to save money if this fits your usage model of a database service.

What's New in the Cloud

Wed, 2016-05-18 02:07
One thing that the last year has taught me is that things change quickly. One of the biggest challenges is to keep up with this change and figure out what is new and what is not. We are going to take a diversion today and look at changes in the Oracle Public Cloud, then get back to provisioning databases into different cloud platforms tomorrow. This is important because it helps us define how to differentiate platform as a service from infrastructure as a service with software installed on it. Entries like scale up and scale down of services, Data Guard between two data centers for DBaaS, temporary bursting to larger instances, and various connectors and plug-ins for integration and SOA services are examples of PaaS advantages. Many of these features happen automatically or replace hundreds of commands that would otherwise need to be executed to configure a service or integration. Provisioning a database into an IaaS service comes with tradeoffs and sacrifices. It is important to know what added services you are getting when you select PaaS over IaaS. The list of new features helps us understand the added value of PaaS and how we can leverage it.

Let's start with infrastructure and see how things have changed. If you go to the Oracle Public Cloud Documentation you see tabs listing all of the services. For infrastructure this corresponds to compute, storage, and networking. If we click on infrastructure then compute, it takes us to the Compute Documentation. Note that there is a What's New page. At the time of writing this blog, the newest entry is April 2016. The key announcements in this entry include

  • April 2016
    • Oracle Compute Cloud Service — Generally Available (GA)- it was controlled availability
    • 1 OCPU subscription - previous min was 500 OCPUs
    • Bursting - non-metered services can temporarily double the allocated cores, with the additional cores billed like a metered service
    • Oracle-provided Windows images - Windows 2012 R2
    • Oracle-provided Solaris images - Solaris x86 11.3
    • Cloning storage volumes using snapshots
    • Cloning instances using snapshots
    • Resizing storage volumes - storage can be resized while attached to active instance
    • Private Images page moved to a new tab on the web console
    • Instance IP addresses now shown on the Instances page
    • Improved image upload tool
  • March 2016
    • Changes in the web console for creating storage volumes
    • opc-init documentation - startup initialization scripts when a new image is booted
  • February 2016
    • Oracle Network Cloud Service - VPN for Dedicated Compute
    • Security IP list as the destination in a security rule created using the web console
    • SSH key management actions moved to the Network tab of the web console
    • Summary information displayed for each resource in the web console
    • Simplified navigation and improved performance in the web console - Orchestration tab changed

There isn't a What's New page for storage and networking because they are folded into the compute page. Note that there were a few storage entries (resizing while attached to an active instance and cloning storage volumes) and network entries (VPN, security list, ssh key management) in the compute page.

For platform as a service, there is a What's new for DBaaS that details changes to the database as a service and schema as a service options.

  • May 2016
    • Oracle Data Guard available - database creation and replication between data centers
    • Backup and recovery available through the console - previously required ssh access
    • Updated version of Oracle REST Data Services
    • Oracle GlassFish Server removed - services now available through REST services
  • April 2016
    • Configure a service instance’s database as the replication database for Golden Gate
  • March 2016
    • Add an SSH public key to a service instance - allows for multiple ssh keys to an instance
    • Jan 2016 PSU integrated into base image for single-instance databases
    • Jan 2016 bundle patch integrated into base image for Oracle RAC databases
  • February 2016
    • Selectable database character set and national character set during instance creation
    • Jan 2016 PSU available for patching
  • January 2016
    • 2 TB (terabyte) storage volumes now supported
    • Ability to create “temporary” storage volumes using Oracle Compute Cloud Service - storage can be short term added and removed as needed.

In the Application Development area there is a What's New for

  • Application Container Service
    • May 2016
      • New Command-Line Interface
      • New utilities for JavaScript and Node packaging and dependency management
      • New deployment configurations for Java-based applications target Oracle Application Container Cloud Service
      • A new Oracle Developer Cloud Service sample project
    • April 2016
      • Node.js 0.10.x, 0.12.x
      • Oracle Linux 6.6
      • Oracle Java SE 7, 8
  • Developer Cloud Service
    • May 2016
      • Deploy to Oracle Application Container Cloud Service instances
      • Snippets support
      • New Member dialog
      • Home tab remembers your last opened child tab
      • Upload artifacts to the project’s Maven repository from the Code tab
      • View the dependency information for Gradle builds
      • The Code button in the Commits view displays files of the current path
      • More pre-defined standard search queries added in the Merge Request tab
      • Audit Log in the Job Details page
      • Build is triggered on push to Git repository
      • Deploy to Oracle Java Cloud Service using Oracle WebLogic REST APIs
      • Lock a Git repository branch
      • Restrict push and merge actions on a protected branch
      • HipChat Webhook support
  • Java Cloud Service
    • May 2016
      • Manage Oracle platform services from a command line interface (CLI)
      • Create and manage access rules
      • Create service instances that use database deployments with cloud-only backups
      • Flexible usage changes to Oracle Java Cloud Service non-metered subscriptions - additional processors can be short term allocated and billed on a metered basis
    • April 2016
      • Create WebLogic Server 12.2.1 service instances
      • Provision service instances with a domain partition
      • Create service instances that use Oracle Real Application Clusters (RAC) databases
      • New patches are available, WebLogic server, Java Developer Kit
    • March 2016
      • Manage SSH access for service instances
      • Add a second load balancer to a service instance
  • Mobile Cloud Service
    • May 2016
      • Location Platform API
      • Microsoft Azure Active Directory authentication
      • export and import artifacts across MCS instances
      • OAuth and JWT token policies for REST connectors
    • April 2016
      • Facebook credentials or their corporate single-sign on credentials
      • JavaScript SDK has been re-tooled to specifically support browser-based mobile apps
      • Cordova SDK supports hybrid development on the Cordova framework

For Content and Collaboration Services

  • Process Cloud Services
    • April 2016
      • New Process Editor
      • New Data Association editor
      • Transformation editor
      • Business Indicator metrics
      • Business Analytics dashboards
      • Outbound REST Connector editor
      • Document-Initiated Process
      • Web Service Message Protection
      • Security Certificates
      • New REST APIs
      • Workspace Enhancements
      • SSO and Authentication
      • Web Form Snapshots
      • Business Objects from JSON instance

For the Integration Cloud Service

  • Integration Cloud Service
    • April 2016
      • Orchestration support - BPEL Process integration
      • Oracle Sales Cloud Adapter - REST APIs and interface catalog
      • REST Adapter enhancements
      • SAP Adapter - inbound integration support
      • Microsoft SQL Server Adapter - inbound integration support
      • File Adapter - inbound integration support
      • Java Messaging Server Adapter - outbound integration support
      • DocuSign Adapter - outbound integration support
      • SuccessFactors Adapter - outbound integration support
      • ServiceNow Adapter - outbound integration support
      • Oracle Field Service Adapter - inbound and outbound integration support
      • Adapter Portal
      • Search improvements
      • Mapper visual enhancements
      • Execution Agent (on-premises Oracle Integration Cloud Service)
    • March 2016
      • Adobe eSign Adapter - outbound integration support
      • File Adapter - outbound integration support (support for 5 MB)
      • Microsoft SQL Server Adapter - outbound integration support
      • FTP Adapter - secure FTP server support
      • SAP Adapter - TRFC, QRFC, and error document support
      • Oracle Database adapter - inbound integration support
      • Oracle Siebel Adapter - inbound integration support
      • Salesforce Adapter - custom WSDL support
      • REST Adapter - multidimensional, nested array support in JSON documents
      • Scheduler - Delete files upon successful retrieval after an error
      • Large payload support - 10 MB
  • SOA Cloud Service
    • May 2016
      • Oracle Enterprise Scheduler is now available as part of Oracle SOA Cloud Service
      • Three new tutorials
    • March 2016
      • Scale Oracle SOA Cloud Service Nodes
      • Non-Metered Subscriptions
      • Oracle Managed File Service
      • Oracle B2B

For Business Analytics the changes are

  • March 2016
    • File size limit increased to 50MB
    • Visualize data in Oracle Applications
    • Update data sources after upload
    • New ways to present data visualizations; Donut charts, Tile views, Text boxes
    • Enhancements to visualizations; Trends, Color management, Thumbnails, Sort data elements, Filter data
    • Quickly copy report columns with “Save Column As…”
    • Build multiple data models
    • Upload data from Excel spreadsheets and OTBI (Oracle Transactional Business Intelligence) data sources
    • Data Loader deprecated
    • Integrate with multiple data sources
    • Whitelist safe domains
    • Index content and schedule crawls
    • Download the public key for remote data connectivity
    • Updates to the REST API

In summary, it is important to look at the new services and new announcements. Some of the changes are relatively small and of low impact. Other changes provide new features and functions that might change the way that you can leverage cloud services. These pages are updated monthly while the cloud services are typically updated every other week. It is recommended that you get into a routine schedule of checking the What's New links in the documentation. Unfortunately, there is not a single location to look at all of these updates. This blog is an attempt to aggregate the new features for IaaS and PaaS.

Amazon RDS

Tue, 2016-05-17 02:07
Today we are going to look at Amazon RDS as a solution for running the Oracle Database in the cloud. According to the Amazon RDS website, RDS makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business. Amazon RDS provides six familiar database engines to choose from, including Amazon Aurora, Oracle, Microsoft SQL Server, PostgreSQL, MySQL and MariaDB.

We are going to focus on the Oracle Standard Edition and Enterprise Edition deployments via RDS. We previously looked at Application Express using Amazon RDS, so we won't dive deep into the different shapes available and will only skim over pricing in this discussion. With Amazon RDS, you can deploy multiple editions of Oracle Database. You can run Amazon RDS for Oracle under two different licensing models – “License Included” and “Bring-Your-Own-License (BYOL)”. In the "License Included" service model, you do not need separately purchased Oracle licenses; the Oracle Database software has been licensed by AWS and the license cost is built into the "License Included" pricing. Oracle has a policy paper on cloud licensing. It is important to understand database licensing and how it applies to hard partitions, soft partitions, and cloud environments.

Automated backups are turned on by default and snapshots are enabled as they are for EC2 instances. You can scale up and down processors as well as scale up and down IOPs allocated for the Oracle instance. You can use Amazon VPC to connect this instance to your data center as well as Amazon HSM to encrypt your data.

The two biggest issues that you need to consider with any cloud strategy are security and lock-in. Backups are done from Amazon zone to Amazon zone. Oracle RMAN is not available as a backup mechanism and neither is Oracle Advanced Security. Encryption is done at the disk layer and not inside the database. Amazon removes the ability to replicate your data back to your data center or to use your own security keys to encrypt your data. You have to use their backup tools and their encryption technology, using their keys, to protect your data. Key management and key rotation become an issue for security sensitive applications and data sets.

Amazon RDS pricing is available on the Amazon web page. Pricing starts at $0.035/hour for a quarter virtual processor and 1 GB of RAM and goes up to $3.64/hour for the large standard and high-memory instances. This pricing is for Standard Edition One of the database and includes the database license. For Standard Edition Two and Enterprise Edition, you must bring your own license. Pricing for this model starts at $0.017/hour and grows to $7.56/hour. You can also pay for a reserved instance that dedicates an instance to you for as low as $99/year, growing up to $33.8K/year. Data ingestion to load the database is free, but there is a cost associated with copying data from the RDS instance to your client, at a charge ranging from $0.09/GB/month down to $0.05/GB/month at higher transfer rates. We recommend that you use the Amazon AWS Pricing Calculator to figure out your monthly charges.

To create an RDS instance, we go to the AWS Console and select the RDS service. From here we select the "Get Started Now" button in the middle of the screen. We then select the Oracle tab and the Oracle EE Select button. To save money we are going to select the Dev/Test version, but both effectively do the same thing. The differences between the dev and production selections are minor. The production selection preloads the Instance Class with an m3.xlarge, turns Multi-AZ on, and sets the Storage Type to SSD. There is one item displayed in the create screen, Provisioned IOPS, that is not in the dev option. We can make the dev option look just like the production option by selecting a large enough instance and turning on Multi-AZ in the next screen.

Production instance

Development instance

We are going to select the latest 12.1.0.2 version, an instance with a little memory, general purpose storage to get 3 IOPS/GB since this is a test instance, and define our ORACLE_SID and account to access the instance.

The next screen is what defines this as Platform as a Service and not an EC2 instance with an AMI. Ports are automatically opened in the operating system and network by opening port 1521 for the listener; we confirm the SID, select the character set, turn on or off encryption in the storage, define the backup window and retention period, as well as the patching policies. We are going to accept the defaults and not change anything. The one thing that Amazon does that Oracle does not is define the VPC connection when you define the database. Oracle requires a separate step to create a VPN connection. When selecting Multi-AZ, I would have expected to see a selection of zones that you can replicate across. For all of the options that I selected, the Availability Zone was greyed out and I could not select the failover zone. I assume that you have to pre-define a zone relationship to make this work, but it was never an option for my tests.

Once you click on Create Instance you see a splash screen and can go to the RDS monitor to look at instances that you have launched.

Once the provisioning is finished we can look at the connection details and use SQL Developer to connect to the instance. It is important to note here that we do not have SSH or command line access to this database instance. We only have access through port 1521 and do not have sys, system, or sysdba access to this database instance.

We can connect with SQL Developer by using the details from the previous screen to get the endpoint, port, and instance identifier.

The first thing to note with the database connection is that the RDS instance does not have a container. You are connecting to the base instance and pluggable databases are not available. If you have purchased the multi-tenant option for the database, RDS is not an option. If we dive into the database configuration we note that auditing is turned off for RDS. The primary reason for this is that you would not have access to the audit logs since you don't have file system access to read the logs. If you look at the control_management_pack_access parameter, the diagnostics and tuning packs are turned on and enabled. This means that you have to purchase these additional packages to run in RDS. There is no way to turn this option off, and these licenses are not included as part of the RDS usage license. You also do not have direct access to this data; you have to go through the monitoring screens and SQL Developer to view it. The database compatibility is set to 12.0.0. Given that we do not have sys access we can not run in a different compatibility mode to help migrate 11g databases into a container. Some important parameters are set that we can not change: enable_ddl_logging is false, enable_goldengate_replication is false (disabling Golden Gate), and enable_pluggable_database is false (disabling Multi-Tenant). Default_tbs_type is set to bigfile and there are no mechanisms to change this.

It is important to figure out what the default user that we created can and can't do when managing the database. The roles assigned to this user are rather limited. We can compare the roles of the oracle user (the one we created) to the sys user. Note that the oracle roles are a short list.
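Both the parameter settings and the role comparison can be checked from a client session. The sketch below assumes a hypothetical RDS endpoint, the master user name oracle that we created, and that the master user has enough catalog access to read these views.

# connect as the RDS master user (sys and sysdba are not available)
sqlplus oracle/MyPassword@//mydb.abcdefgh.us-east-1.rds.amazonaws.com:1521/ORCL <<'EOF'
-- feature and licensing related parameters discussed above
SELECT name, value FROM v$parameter
 WHERE name IN ('compatible', 'audit_trail', 'control_management_pack_access',
                'enable_ddl_logging', 'enable_goldengate_replication',
                'enable_pluggable_database');
-- compare the roles granted to the master user with those granted to SYS
SELECT grantee, granted_role FROM dba_role_privs
 WHERE grantee IN ('ORACLE', 'SYS')
 ORDER BY grantee, granted_role;
EOF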

The RDS Option Documentation talks about connecting to enterprise manager and application express. We were not able to connect to ports 1158 or 5500 as suggested. My gut says that this has to do with the port routing rules that were created by default.

If we are running out of table space we can modify the existing instance and grow the storage. This is done by going to the RDS instance page and selecting modify instance. You type in the new storage size and click apply immediately.
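The same change can be scripted with the AWS CLI if you prefer. This is a sketch with a hypothetical instance identifier and a new allocation of 100 GB.

# grow the allocated storage and apply the change now rather than in the next maintenance window
aws rds modify-db-instance --db-instance-identifier mydb --allocated-storage 100 --apply-immediately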

Once the modification finishes we can see the new storage in the details page in the RDS console.

We should note that we do not see the addition in the tablespaces because the new space is added to the file system, and the tablespaces are all configured to auto extend and consume all available space. Unfortunately, this makes it look like all of the tablespace is full and our used percentage will always be relatively high for the files in which our tables are stored. We need to monitor disk usage separately with a different part of the RDS console.

In summary, you can run the Oracle database in Amazon RDS. There are limitations and issues that you need to be aware of when doing this. Some of the features are limited or not available to you. Some features are required even though you might not be using them today. If you are running an application server in EC2, running the database in RDS makes sense. The intention of this blog is not to tear down one implementation and elevate another, but to raise the discussion of what to look out for when you have decided to run in a specific cloud instance. Up next, how does Amazon RDS differ from Amazon EC2 with a pre-configured AMI?

Database as a Virtual Image

Mon, 2016-05-16 02:07
The question that we are going to dive into this week is what does it really mean to be platform as a service vs infrastructure as a service. Why not go to Amazon and spin up an EC2 instance or search for an Oracle provided AMI on Amazon or Virtual Image on Azure? What benefit do I get from PaaS? To answer that we need to look at the key differences. Let's look at the two options when you provision a database in the Oracle DBaaS. When you provision a database you have the option of service levels; Database Cloud Service and Database Cloud Service - Virtual Image. We looked at the provisioning of the cloud service. It provisions a database, creates the network rules, and spins up an instance for us. What happens when we select Virtual Image?

The release and version screens are the same. We selected 12c for the release and High Performance for the version. Note that the questions are much simpler. We are not asked about how much storage. We are not asked for an SID or sys password. We are not asked about backup options. We are not given the option of DataGuard, RAC, or GoldenGate. We are only asked to name the instance, pick a compute shape, and provide an ssh public key.

This seems much simpler and better. Unfortunately, this isn't true. What happens from here is that a Linux 6.6 instance is created and a tarball is dropped into a staging area. The database is not provisioned. The file system is not prepared. The network ports are not configured and enabled. True, the virtual instance creation only takes a few minutes but all we are doing is provisioning a Linux instance and copying a tarball into a directory. Details on the installation process can be found at Database Cloud Installation - Virtual Image Documentation.

If you look at the detailed information about a system that is being created with a virtual image and a system that is being created as a service there are vast differences.

The first key difference is the amount of information displayed. Both instances have the same edition, Enterprise Edition - High Performance. Both will display this in the database as well as in the banner if asked what version the database is. The Service Level is different, with the virtual image displayed as part of the service level. This affects the billing. The virtual image is a lower cost because less is done for you.

Product (per OCPU):
  • Standard Edition Service - General Purpose: $600/month ($1.008/hour); High-Memory: $700/month ($1.176/hour)
  • Enterprise Edition Service - General Purpose: $3,000/month ($5.040/hour); High-Memory: $3,100/month ($5.208/hour)
  • High Performance Service - General Purpose: $4,000/month ($6.720/hour); High-Memory: $4,100/month ($6.888/hour)
  • Extreme Performance Service - General Purpose: $5,000/month ($8.401/hour); High-Memory: $5,100/month ($8.569/hour)

Virtual Image Product (per OCPU):
  • Standard Edition Service - General Purpose: $400/month ($0.672/hour); High-Memory: $500/month ($0.840/hour)
  • Enterprise Edition Service - General Purpose: $1,500/month ($2.520/hour); High-Memory: $1,600/month ($2.688/hour)
  • High Performance Service - General Purpose: $2,000/month ($3.360/hour); High-Memory: $2,100/month ($3.528/hour)
  • Extreme Performance Service - General Purpose: $3,000/month ($5.040/hour); High-Memory: $3,100/month ($5.208/hour)

The only other information that we get from the management screen is that the instance consumes 30 GB rather than the 100 GB that the database service instance consumes. Note that the database service instance also has the container name and a connection string for connecting to the database. Both will eventually show an IP address, and we should look into the operating system to see the differences. The menu to the right of the instance is also different. If we look at the virtual machine instance we only see ssh access, access rules, and deletion of the instance as options.

The ssh access option allows us to upload the public key or look at the existing public key that is used to access the instance. The access rules option takes us to a new screen that shows the security rules that have been defined for this instance, which allow only ssh and nothing else.

If we look at a database as a service instance, the menu is different and allows us to look at things like the DBaaS Monitor, APEX, Enterprise Manager monitor, as well as the ssh and access rules.

Note that the database as a service instance has a lot more security rules defined with most of them being disabled. We can open up ports 80, 443, 4848, 1158, 5500, and 1521. We don't have to define these rules, just enable them if we are accessing them from a whitelist, ip address range, or public internet.

Once we connect to both instances we can see that both are running

Linux hostname 3.8.13-68.2.2.2.el6uek.x86_64 #2 SMP Fri Jun 19 16:29:40 PDT 2015  x86_64 x86_64 x86_64 GNU/Linux
We can see that the file system is different with the /u01, /u02, /u03, and /u04 partitions not mounted in the screen shots below.

If we look at the installation instructions we see that we have to create the /u01, /u02, /u03, and /u04 disks by hand. These are not created for us. We also need to create logical volumes as well as the storage services. Step one is to scale up the service by adding a disk. We need to grow the existing file system by first attaching a volume then laying out and expanding the logical volume that we have. Note that we can exactly mirror our on-premise system at this point. If we put everything into a 1 TB /u01 partition and blend the log files and data files onto one disk (not really recommended) we can do this.

To add the /u01 disk we need to scale up the service and add storage. Note that we can only add a raw disk and can not grow the data volume as we can with the database service.

Note that this scale up does require a reboot of the service. We have the option of adding one logical unit or a full 1 TB disk and then partitioning it, or we can add the different volumes as different disks. The drawback of doing this is that attached storage is charged at $50/TB/month, so adding four disks that consume 20 GB each will cost $200/month because we are allocated the full 1 TB even though we only allocate 20 GB on each disk. We do not subdivide the disk when it is attached and are charged on a per TB basis and not a per GB basis. To save money it is recommended to allocate a full TB rather than a smaller amount. To improve performance and reliability it is recommended to allocate multiple disks and stripe data across multiple spindles and logical units. This can be done at the logical volume management part of disk management detailed in the documentation on provisioning the virtual image instance.

We can look at the logical volume configuration with lvm pvdisplay, lvm vgdisplay, and lvm lvdisplay. This allows us to look at the mapping of physical volumes to logical unit numbers, at logical volumes for mirroring and striping options, and at volume group options, which get mapped to the data, reco, and fra areas.

Once our instance has rebooted we note that we added /dev/xvdc, which is 21.5 GB in size. After we format this disk it partitions down to a 20 GB disk as we asked. If we add a second disk we will get /dev/xvdd and can map these two new disks into a logical volume that we can mount as /u01 and /u02. A nicer command to use to look at this is lsblk, which does not require elevated root privileges to run.
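A hedged sketch of striping two new disks into a single volume and mounting it as /u02 is shown below. The device names match the example above, but the volume group name, file system type, and mount point are assumptions; follow the virtual image documentation for the full /u01 through /u04 layout.

# create physical volumes on the two newly attached disks
sudo pvcreate /dev/xvdc /dev/xvdd
# group them into a volume group and carve out a logical volume striped across both disks
sudo vgcreate datavg /dev/xvdc /dev/xvdd
sudo lvcreate --stripes 2 --extents 100%FREE --name u02 datavg
# lay down a file system and mount it
sudo mkfs -t ext4 /dev/datavg/u02
sudo mkdir -p /u02
sudo mount /dev/datavg/u02 /u02
# add a matching /etc/fstab entry so the volume comes back after a reboot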

Once we go through the mapping of the /u01, /u02, /u03, and /u04 disks (the documentation only goes into single disks with no mirroring to mount /u01 and /u02) we can expand the binary bits located in /scratch/db. There are two files in this directory, db12102_bits.tar.gz and db12102_se2bits.tar.gz. These are the enterprise edition and standard edition versions of the database.

We are not going to go through the full installation but look at some of the key differences between IaaS with a tarball (or EC2 with an AMI) and a DBaaS installation. The primary delta is that the database is fully configured and ready to run in about an hour with DBaaS. With IaaS we need to create and mount a file system, untar and install the database, configure network ports, define security rules, and write scripts to automatically start the database when the operating system restarts. We lose the menu items in the management page to look at the DBaaS Monitor, Enterprise Manager monitor, and Application Express interface. We also lose the patching options that appear in the DBaaS management screen. We lose the automated backup and the database instance and PDB creation that is done with DBaaS.

In summary, the PaaS/DBaaS provisioning is not only a shortcut but also removes manual steps in configuring the service as well as in daily operations. We could have just as easily provisioned a compute service, attached storage, and downloaded the tarball that we want to use from edelivery.oracle.com. The key reasons that we don't want to do this are first pricing and second patching. If we provision a virtual image of database as a service, the operating system is ready to accept the tarball and we don't need to install the odbc drivers and other kernel modules. We also get to lease the database on an hourly or monthly basis rather than purchasing a perpetual license to run on our compute instance.

Up next, selecting a pre-configured AMI on Amazon and running it in AWS compared to a virtual image on the Oracle Public Cloud.

DBaaS for real this time

Fri, 2016-05-13 02:07
We have danced around creating a database in the Oracle Public Cloud for almost a week now. We have talked about Schema as a Service, Exadata as a Service, licensing, and the different versions of DBaaS. Today, let's tackle what it takes to actually create a database. It is important to note that the accounts we are using are metered services accounts. We don't have the option to run as a non-metered service and have to provision the services on an hourly or monthly basis. Unfortunately, we are not going to go through the step-by-step process of creating a database; there are plenty of other sites that do this well

And my personal favorite

I personally like the Oracle by Example links. Most of the screen shots are out of date and look slightly different if you go through the steps now. For example, the Configure Backup and Recovery screen shots from the first link above show local backup as an option; this option has been removed from the menu. My guess is that a few months from now more of this will change and a backup container will be created for you automatically rather than you having to enter a previously created container as is done now. The critical steps needed to follow these examples are:

  1. Get a trial cloud account - instructions on how to do this
  2. Log into your cloud account - Account documentation
  3. Navigate to the Database Cloud Service console
  4. Click the Create Instance button
  5. Define the Subscription type, billing type, software release, software edition
  6. Configure your instance with name, description, ssh public key, compute shape, backup mechanism and location, storage size, sys password, SID and PDB name, and optional configurations (such as DataGuard, RAC, and GoldenGate).
  7. Wait for instance to be provisioned
  8. Connect to the database via ssh using your ssh private key and putty/ssh (see the sketch after this list)
  9. Optionally open up ports (port 1521 for client connections, port 80 for APEX)
  10. Do something productive
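
For step 8, a minimal sketch of connecting and dropping into sqlplus; the key file and instance address are placeholders, and the opc/oracle account split reflects the default layout of these images.

    ssh -i ~/.ssh/mykey opc@<instance-ip>     # log in as the opc user
    sudo su - oracle                          # switch to the database owner
    sqlplus / as sysdba                       # connect locally as sysdba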

The tutorials go through screen shots for all of these steps. You can also watch this on YouTube

Things to watch out for when you create a database instance in the Oracle Public Cloud:
  1. If you configure a backup service on a demo system and increase the database size to anything substantial, you will overflow the 500 GB of storage in about three weeks and things will stop working when you try to create a new service
  2. All ports are locked down with the exception of ssh. You can use an ssh tunnel to securely connect to localhost:1521 if you tunnel this port (see the sketch after this list). If you are using a demo account you can only open port 1521 to the world; white listing and IP address lists are not supported in the demo accounts
  3. Play with SQL Developer connections across the internet. It works just like it does on-premise. The DBA view in SQL Developer has good management interfaces that allow you to perform simple administration tasks from the tool
  4. Play with Enterprise Manager 13c. It is easy to connect to your database via ssh and add your cloud instance to the OEM console. You can manage it just like an on-premise database. Cloning a PDB to the cloud is trivial. Database backup to the cloud is trivial
  5. Play with unplugging and replugging a PDB in 12c. You can clone and unplug from your on-premise system, copy the xml files to the cloud, and plug in the PDB to create a clone in the cloud.
  6. The longer you let a database run, the smaller your credit will get. If you are playing with a sandbox you can stop the database. This stops the database charge (at $3-$5/hour) and you only get charged for the compute and storage (at roughly $0.10/hour). If you leave a database running for 24 hours you burn through $72-$120 based on your edition selection; if you turn the database off and restart it when you want to jump back into your sandbox you burn through roughly $3 in 24 hours. Your data will still be there. That is what you are paying $3 a day for.
  7. If you are using a demo system, you can extend your evaluation once or twice. There is a button at the top right allowing you to extend your evaluation period. Make sure you do this before time runs out. Once time runs out you need to request another account from another email address.
  8. If you are playing with an application, make sure that you spin up WebLogic or Tomcat in a Java or Compute instance in the same account. Running an application server on-premise and a database in the cloud will suffer from latency. You are shipping MB/GB across the wire with select statement returns but only KB/MB to paint part of a screen, so it is better to put the latency between the browser and the app server than between the app server and the database server
  9. Request an account in Amazon and Azure. The more you play with DBaaS in the Oracle environment the more you will appreciate it. Creating a RAC cluster is simple. Linking a Java Service to a Database Service is simple. Running a load balancer in front of a Java Service is easy. Play with the differences between IaaS with a database and PaaS DBaaS. There is a world of difference.
  10. If you run your demo long enough, look at the patch administration. It is worth looking at since this is a major differentiator between Oracle, Amazon, and Azure.
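
For the tunnel mentioned in item 2, a minimal sketch; the key file and instance address are placeholders.

    # forward local port 1521 to the database listener on the cloud instance
    ssh -i ~/.ssh/mykey -L 1521:localhost:1521 opc@<instance-ip>
    # while the tunnel is open, point SQL Developer or sqlplus at localhost:1521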

In summary, we didn't go through a tutorial on how to create a database as a service. At this point you should have looked at one or two tutorials, one or two videos, and one or two documentation pages, and you should have a sample database to move forward with. It does not matter if it is Standard Edition, Enterprise Edition, High Performance, or Extreme Performance; you should have a simple database that we can start to play with. The whole exercise should have taken you about an hour to learn and play and an hour of waiting for the service to finish provisioning. Connect via ssh and run sqlplus as the oracle user. Open up port 1521, download SQL Developer, and connect to your cloud instance. Explore, play, and have fun experimenting. That is the primary reason why we give you a full database account and not a quarter of an account that you can't really do much with.

technology behind DBaaS

Thu, 2016-05-12 02:07
Before we can analyze different use cases we need to first look at a couple of things that enable these use cases. The foundation for most of these use cases is data replication. We need to be able to replicate data from our on-premise database into a cloud database. The first issue is replicating the data; the second is having the access rights to the data and the database that allow you to pull the data into your cloud database.

Let's first look at how data is stored in a database. If you use a Linux operating system, this is typically done by splitting information into four categories: ORACLE_HOME, +DATA, +FRA, and +RECO. The binaries that represent the database and all of the database processes go into the ORACLE_HOME or ORACLE_BASE. In the cloud this is dropped into /u01. If you are using a non-RAC installation the file system sits on a logical volume manager (LVM), where you stripe multiple disks and mirror or triple mirror data to keep a single disk failure from bringing down your database or data. If you are using a RAC database this goes into ASM, a storage technology that manages replication and performance. There are a variety of books and websites written on this technology

LVM links

ASM links

The reason why we go into storage technologies is that we need to know how to manage how and where data is stored in our DBaaS. If we access everything with IaaS and roll out raw compute and storage, we need to know how to scale up storage if we run out of space. With DBaaS this is done with the scale up menu item. We can grow the file system by adding logical units to our instance and grow the space allocated for data storage or data logging.

The second file system that we should focus on is the +DATA area. This is where data is stored and where all of our file extents and tables are located. For our Linux cloud database this is auto-provisioned into /u02. In our test system we create a 25 GB data area and get a 20 GB file system in the +DATA area.

If we look at the /u02 file system we notice that there is one major directory, /u02/app/oracle/oradata. In oradata there is one directory associated with the ORACLE_SID; in our example we called it ORCL. In this directory we have control01.dbf, sysaux01.dbf, system01.dbf, temp01.dbf, undotbs01.dbf, and users01.dbf. These files are where data is stored for the ORCL SID. There is also a PDB1 directory in this file structure, which corresponds to the pluggable database that we called PDB1. The files in this directory correspond to the tables, system, and user information relating to this pluggable database; if we create a second pluggable database a new directory is created and the same set of files is created in that directory. The users01.dbf file (PDB1_users01.dbf in the PDB1 directory) defines all of the users and their access rights. The system01.dbf file defines the tables and system level structures; in a pluggable database the system01 file defines the structures for that PDB and not the entire database. The temp01.dbf file holds temp tables and scratch areas. The sysaux01.dbf file contains auxiliary system information such as the control area structures and management information. The undotbs01.dbf file holds the undo data that supports flashback, so that we can look at information that was stored in a table three days ago. Note that there is no undotbs01.dbf file in the pluggable directory because undo is handled at the global level and not at the pluggable layer. Backups are done for the SID and not for each PDB, and tuning of memory and system tunables is done at the SID layer as well.
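
A hedged sketch of checking this layout from the instance; the SID, PDB name, and paths follow the example above, and the exact file list will differ on your system.

    # on disk
    ls /u02/app/oracle/oradata/ORCL
    ls /u02/app/oracle/oradata/ORCL/PDB1
    # from the database: list the data files and temp files the instance knows about
    sqlplus -s / as sysdba <<EOF
    select name from v\$datafile;
    select name from v\$tempfile;
    EOF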

Now that we have looked at the files corresponding to tables and table extents, we can talk about data replication. If you follow the methodology of EMC and NetApp you should be able to replicate the dbf files between two file systems. Products like SnapMirror allow you to block copy any changes that happen to the file to another file system in another data center. This is difficult to do between an on-premise server and a cloud instance. EMC and NetApp do this in the controller layer: they log write changes to the disk, track which blocks get changed, and communicate the changes to the controller on the target system. The target system takes these block changes, figures out which actual blocks they correspond to on its disk layout, and updates the blocks as needed. This does not work in a cloud storage instance, where we deal at the file layer and not at a track and sector or block layer. The fundamental problem with this replication mechanism is that you must restart the database or re-ingest the changed file. The database server does not do well if files change underneath it because it caches information in memory, and indexes into the data get broken if data is moved to another location. This type of replication is fine if you have a recovery point objective of an hour or more. If you are looking at a recovery point of minutes you will need to go with something like DataGuard, GoldenGate, or Active DataGuard.

DataGuard works similarly to the block change recording but does so at the database layer and not at the file system/block layer. When an update or insert command is executed in the database, the changes are written to the /u04 directory. In our example the +REDO area is allocated 9.8 GB of disk, and if we look at our /u04 structure we see that /u04/app/oracle/redo contains the redoXX.log files. With DataGuard we take these redo files, compress them, and transfer them to our target system. The target system takes each redo file, uncompresses it, and applies the changes to the database. You can structure the changes either as physical logging or logical logging. Physical logging records the block level changes across the database. Logical logging takes the actual SQL statement and replicates it to the target system. The target system either applies the physical changes to its files or executes the SQL statement on the target database. Physical replication is used more than logical replication because logical has limitations on some statements; for example, blob or file operations cannot be translated to the target system because you can't guarantee that the file structure is the same between the two systems. There are a variety of books available on DataGuard (a minimal configuration sketch follows the list below). It is also important to note that DataGuard is not available with Standard Edition or Enterprise Edition but only with High Performance Edition and Extreme Performance Edition.

  • Oracle Data Guard 11g Handbook
  • Oracle Dataguard: Standby Database Failover Handbook
  • Creating a Physical Standby Documentation
  • Creating a Logical Standby Documentation
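
As a minimal sketch of what physical redo shipping looks like on the primary, here are the two core initialization parameters; the ORCL/ORCLSTBY names and the standby service are illustrative assumptions, and a full manual or broker-based setup involves considerably more than this.

    sqlplus -s / as sysdba <<EOF
    -- declare the members of the Data Guard configuration
    alter system set log_archive_config='DG_CONFIG=(ORCL,ORCLSTBY)' scope=both;
    -- ship redo asynchronously to the standby service
    alter system set log_archive_dest_2='SERVICE=ORCLSTBY ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=ORCLSTBY' scope=both;
    EOF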

Golden Gate is a similar process, but there is an intermediary agent that takes the redo log, analyzes it, and translates it for the target system. This allows us to take data from an Oracle database and replicate it to SQL Server, and it also allows us to go in the other direction. SQL Server, for example, is typically used for SCADA or process control systems, while the Oracle database is typically used for analytics and heavy duty number crunching on a much larger scale. If we want to look at how our process control systems are operating in relation to our budget we will want to pull in the data from the process systems and look at how much we spend on each system. We can do this either by selecting data from the SQL Server or by replicating the data into a table on the Oracle system. If we are doing complex join statements and pulling data in from multiple tables we typically want to do this on one system rather than pulling the data across the network multiple times. Golden Gate allows us to pull the data into a local table and perform the complex select statements without suffering network latency beyond the initial copy. Golden Gate is a separate product that you must pay for either on-premise or in the cloud. If you are replicating between two Oracle databases you could use Active DataGuard instead, which is included with the Extreme Performance Edition of the database.
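
A hedged sketch of what the capture side looks like in the GoldenGate command interface (ggsci); the extract name and trail path are illustrative assumptions, and a real deployment also needs parameter files, a manager process, and a replicat on the target system.

    -- capture changes from the source transaction log
    ADD EXTRACT ext1, TRANLOG, BEGIN NOW
    -- write the captured changes to a local trail file
    ADD EXTTRAIL ./dirdat/aa, EXTRACT ext1
    START EXTRACT ext1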

The /u03 area in our file system is where backups are placed. The file system for our sample system shows /u03/app/oracle/fast_recovery_area/ORCL, where ORCL is the ORACLE_SID of our installation. Note that there is no PDB1 area because backups are done at the system layer and not at the pluggable layer. The tool used to back up the database is RMAN. There are a variety of books available to help with RMAN as well as an RMAN online tutorial
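
A minimal sketch of a full backup into the fast recovery area with RMAN; this assumes the defaults described above and is not the tuned script that the automated cloud backups use.

    rman target / <<EOF
    # back up the database plus the archived logs into /u03 (the fast recovery area)
    BACKUP DATABASE PLUS ARCHIVELOG;
    LIST BACKUP SUMMARY;
    EOF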

It is important to note that RMAN requires system level access to the database. Amazon RDS does not allow you to back up your data using RMAN; it uses a volume snapshot and copies this to another zone. The first impact is that you cannot get your data out of Amazon with an RMAN backup and cannot copy your changes and data from Amazon RDS to your on-premise system. The second impact is that you can't use Amazon RDS for DataGuard: you don't have the sys access into the database that is required to set up DataGuard and you don't have access to a file system into which to copy the redo logs. To make this available with Amazon you need to deploy the Oracle database into EC2 with S3 storage as the back end. The same is true with Azure: everything is deployed into raw compute and you have to install the Oracle database on top of the operating system. This is more of an IaaS play and not a PaaS play. You lose patching of the OS and database, automated backups, and automatic restart of the database if something fails. You also need to lay out the file system on your own and select LVM or some other clustering file system to prevent data loss from a single disk corruption. All of this is done for you with PaaS and DBaaS. Oracle does offer a manual process to perform backups without having to dive deep into RMAN technology. If you are making a change to your instance and want a backup copy before you make the change, you can back up your instance manually and not have to wait for the automated backup. You can also change the timing if 2am does not work for your backup window and you need to move it to 4am instead.

We started this conversation talking about growing storage because we ran out of space. With the Amazon and Azure solutions this must be done manually: you have to attach a new logical unit, map it into the file system, grow the file system, and potentially reboot the operating system (a minimal sketch of that manual path follows). With the Oracle DBaaS we have the option of growing the file system either as a new logical unit, growing the /u02 file system to handle more table spaces, or growing the /u03 file system to handle more backup space.
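
A hedged sketch of that manual path on a Linux IaaS instance; the device, volume group, and logical volume names are assumptions, and the resize command depends on the file system type.

    # after attaching the new logical unit (appearing here as /dev/xvde)
    pvcreate /dev/xvde                           # register the disk with LVM
    vgextend datavg /dev/xvde                    # grow the volume group
    lvextend -l +100%FREE /dev/datavg/datalv     # grow the logical volume
    resize2fs /dev/datavg/datalv                 # grow an ext4 file system online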

Once we finish our scale up the /u03 file system is no longer 20 GB but 1020 GB in size. The PaaS management console allocates the storage, attaches the storage to the instance, grows the logical volume to fill the additional space, and grows the file system to handle the additional storage. It is important to note that we did not require root privileges to do any of these operations. The DBA or cloud admin can scale up the database and expand table resources. We did not need to involve an operating system administrator. We did not need to request an additional logical unit from the storage admin. We did not need to get a senior DBA to reconfigure the system. All of this can be done either by a junior DBA or by an automated script that grows the file system if we run out of space. The only thing missing for the automated script is a monitoring tool to recognize that we are running into a limit. Oracle Enterprise Manager (OEM) 12c and 13c can do this monitoring and kick off processes if thresholds are crossed. It is important to note that you cannot use OEM with Amazon RDS because you don't have the root, file system, or system access to the installation that is required to install the OEM agent.

In summary, we looked at the file system structure that is required to replicate data between two instances. We talked about how many people use third party disk replication technologies to "snap mirror" between two disk installations and about how this does not work when replicating from an on-premise system to a cloud instance. We talked about DataGuard and GoldenGate replication, which allow us to replicate data to the cloud and back to our data center. We looked at some of the advantages of using DBaaS rather than a database on IaaS when growing the file system and backing up the database. Operations like backup, growing the file system, and temporarily adding or removing processors can be done by a cloud admin or junior DBA; in the past these operations required multiple people to make them happen. All of these technologies are needed when we start talking about use cases. Most of the use cases assume that the data and data structures that exist in your on-premise database also exist in the cloud and that you can replicate data to the cloud as well as back from the cloud. If you are going to run a disaster recovery instance in the cloud, you need to be able to copy your changes to the cloud, make the cloud instance the primary, and replicate the changes back to your data center once you bring your database back online. The same is true for development and testing. It is important to be able to attach to both your on-premise database and the database provisioned in the cloud and look at the differences between the two configurations.
