Steve Karam
Web Friendly Interactive Data Visualization with D3.js

I have been tinkering quite a bit with new database and development frameworks. Recently I decided to beef up my knowledge of data visualization as it’s a very hot topic and the ultimate way to make your data understood.
R is an incredibly popular option which can be as simple or as complex as you want it to be, showing incredibly detailed charts, tables, and more. However, I wanted to find something which was 100% web based and interactive, as this is the easiest way to really show your data visualization creations to the masses and put it to best use in dashboards or other sites.
So, enter D3.js. This incredible package is built in JavaScript with interactive components similar to jQuery but adding the drawing, math, and tweening capabilities necessary to build highly complex data visualizations. Anything that can be made with paths, shapes, and colors is possible as you can see on their example page. Personal favorites are the Streamgraph, Cubism, and concept network browser.
But in honor of Pi Day and due to its hierarchical capabilities, I decided to go with an interactive Sunburst. A Sunburst chart is a data visualization method similar to a pie or donut chart but with additional layers for child data. In the case of my tinkering, I decided to create one which includes:
- Oracle Schemas
- Object Types
- Objects
Each of these components is represented along with its respective size in MB. Even cooler, you can click on a component to drill down into it and click the center of the circle to zoom out. All of this was accomplished with no Flash, and it is tested on Chrome, Firefox, Safari, Internet Explorer, and iOS. Chrome and Safari definitely handle it best, but the other browsers appear to work fine. However, if you are reading this from a news aggregator like Oracle Base or OraNA you may need to click over to my actual site or use the full page demo link below.
The original D3.js example on Sunburst charts can be found here. I also took concepts from this page for zooming. The JSON supplying the data was created using the Alexandria PL/SQL Library which includes SQL to JSON capabilities. Alternatively you can use CSV, TSV, and XML to populate D3.js charts. It is worth noting that a decent amount of JavaScript knowledge is needed to really grasp what the code is doing. D3.js makes heavy use of attribute chaining like jQuery which is powerful but sometimes difficult to read and understand.
If you don’t see the example below then there may be a compatibility issue I’m not aware of; please post it in the comments! You can also try my full page demo which may work better for you. You can also view source on the full page demo to see everything I did to make it come together, or just click here.
Be sure to hover over different areas on the chart (labels show underneath) and click around! You can always click the center circle to zoom out.
Sunburst Data Visualization var width = 340, height = 340, radius = Math.min(width, height) / 2; var x = d3.scale.linear() .range([0, 2 * Math.PI]); var y = d3.scale.sqrt() .range([0, radius]); var color = d3.scale.category20c(); var content = d3.select("#valtext"); var svg = d3.select("#chart").append("svg") .attr("width", width) .attr("height", height) .append("g") .attr("transform", "translate(170,170)"); var partition = d3.layout.partition() .value(function(d) { return d.size; }); var arc = d3.svg.arc() .startAngle(function(d) { return Math.max(0, Math.min(2 * Math.PI, x(d.x))); }) .endAngle(function(d) { return Math.max(0, Math.min(2 * Math.PI, x(d.x + d.dx))); }) .innerRadius(function(d) { return Math.max(0, y(d.y)); }) .outerRadius(function(d) { return Math.max(0, y(d.y + d.dy)); }); d3.json("/sunburst.json", function(error, root) { var path = svg.selectAll("path") .data(partition.nodes(root)) .enter().append("path") .attr("d", arc) .style("fill", function(d) { return color((d.children ? d : d.parent).name); }) .on("click", click) .on("mouseover", mouseover) .on("mouseout", mouseout); function click(d) { path.transition() .duration(750) .attrTween("d", arcTween(d)); } function mouseover(d) { content.append("h2") .attr("id", "current") .text(d.name + (d.size ? " - " + (d.size / 1024 / 1024) + "MB" : '')); } function mouseout(d) { content.html(''); } }); d3.select(self.frameElement).style("height", height + "px"); // Interpolate the scales! function arcTween(d) { var xd = d3.interpolate(x.domain(), [d.x, d.x + d.dx]), yd = d3.interpolate(y.domain(), [d.y, 1]), yr = d3.interpolate(y.range(), [d.y ? 20 : 0, radius]); return function(d, i) { return i ? function(t) { return arc(d); } : function(t) { x.domain(xd(t)); y.domain(yd(t)).range(yr(t)); return arc(d); }; }; }The post Web Friendly Interactive Data Visualization with D3.js appeared first on Steve Karam :: The Oracle Alchemist.
Cloning Your Oracle Installation

If there is one thing a DBA hates it is to waste time. Particularly on boring and tedious tasks. One such task is installing Oracle on a new system (or duplicating a home on an existing system) which generally involves X server setup (Xming for Windows, XQuartz on Mac, pure as the driven snow on Linux) and obnoxious Oracle Installer GUI screens.
Yet despite the widespread angst at performing Oracle installs, I’m surprised at the number of clients and DBAs I see trudging through Next buttons on every single server to get it installed. Nevermind the fact that with modern server deployment we should be able to create new systems which already have Oracle installed and configured; a DBA can clone Oracle Homes from server to server even if enterprise provisioning not an option.
Cloning the Oracle Home is done through simple tools: cp if you are cloning on the same server, tar (or other tools) and scp to go to a new server, and the vanilla runInstaller package from Oracle conveniently located in $ORACLE_HOME/oui/bin.
[oracle@orahost ~]$ echo $ORACLE_HOME /u01/app/oracle/product/11.2.0/db_1 [oracle@orahost ~]$ cd /u01/app/oracle/product/11.2.0 [oracle@orahost 11.2.0]$ cp -rp db_1 db_2 cp: cannot open `db_1/bin/nmhs' for reading: Permission denied cp: cannot open `db_1/bin/nmb' for reading: Permission denied cp: cannot open `db_1/bin/nmo' for reading: Permission denied
In the steps above, we copied the ORACLE_HOME located at /u01/app/oracle/product/11.2.0/db_1 to another directory (/u01/app/oracle/product/11.2.0/db_2). If you use the cp command, make sure you use the -rp flags; -r copies directories recursively, and -p preserves file attributes. You may also notice that a few files could not be copied. Don’t worry about these, when we run the installer it will take care of that.
Once the copy (or tar/untar between servers) is complete, the rest is simple. We will run the runInstaller command in the new Oracle Home and provide the information it requires. This is done silently, so no X Windows server is required at all.
[oracle@orahost 11.2.0]$ cd db_2/oui/bin
[oracle@orahost bin]$ ./runInstaller -silent -clone ORACLE_BASE="/u01/app/oracle" ORACLE_HOME="/u01/app/oracle/product/11.2.0/db_2" ORACLE_HOME_NAME="OraHome2"
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 3960 MB Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2013-03-12_09-46-15AM. Please wait ...[oracle@orahost bin]$ Oracle Universal Installer, Version 11.2.0.1.0 Production
Copyright (C) 1999, 2009, Oracle. All rights reserved.
You can find the log of this install session at:
/u01/app/oraInventory/logs/cloneActions2013-03-12_09-46-15AM.log
.................................................................................................... 100% Done.
Installation in progress (Tuesday, March 12, 2013 9:46:35 AM EDT)
............................................................................. 77% Done.
Install successful
Linking in progress (Tuesday, March 12, 2013 9:46:43 AM EDT)
Link successful
Setup in progress (Tuesday, March 12, 2013 9:48:14 AM EDT)
Setup successful
End of install phases.(Tuesday, March 12, 2013 9:50:31 AM EDT)
WARNING:
The following configuration scripts need to be executed as the "root" user.
/u01/app/oracle/product/11.2.0/db_2/root.sh
To execute the configuration scripts:
1. Open a terminal window
2. Log in as "root"
3. Run the scripts
The cloning of OraHome2 was successful.
Please check '/u01/app/oraInventory/logs/cloneActions2013-03-12_09-46-15AM.log' for more details.
The actual command we used was:
./runInstaller -silent -clone ORACLE_BASE=”/u01/app/oracle” ORACLE_HOME=”/u01/app/oracle/product/11.2.0/db_2″ ORACLE_HOME_NAME=”OraHome2″These flags defined:
- -silent: specifies that the command should be done without interaction
- -clone: specifies that the home is a clone of another location
- ORACLE_BASE: the base directory for Oracle products as per the Optimal Flexible Architecture (OFA)
- ORACLE_HOME: the new directory location
- ORACLE_HOME_NAME: a unique name for the Oracle Home for the server you have cloned to
Once you have finished running the command and have confirmed success, the standard root.sh (and orainstRoot.sh if this is the first Oracle install on the server) execution by the root user applies.
Good luck, and happy cloning!
The post Cloning Your Oracle Installation appeared first on Steve Karam :: The Oracle Alchemist.
Wearable Computing and Personal Data Collection

Wearable computing is all the rage right now and is poised to explode in the consumer market once a number of high-profile projects go into full production. The bulk of buzz surrounding the technology is focused on entertainment value and consumer usage. But how might wearable computing and data analysis come together to improve our lives? What kind of data can we collect and harness?
While wearable computing as a whole didn’t even make it into the Gartner Emerging Technologies 2012 Hype Cycle, it is obviously up and coming. Let’s take a look at some of the current and up-and-coming offerings. Or if you’d rather get right to the Personal Data Collection portion, click here.
PebblePebble was and is a Kickstarter sensation, smashing through its $100,000 goal and raising a total of $10,266,845. This watch comes in multiple colors and features an e-ink screen for low battery consumption and customization. Owners can select custom watch faces from Pebble and eventually from developers.
![]()
But what is really special about Pebble is the tech behind the scenes. Bluetooth connectivity allows it to interact with your smartphone (iOS and Android) to receive notifications wrist-top. SMS, calendar events, even Twitter mentions and Facebook likes will set your wrist abuzz and the message will pop up. In fact, you can even use it for Grid Control alerts. At the moment apps for the Pebble are scarce, but once they catch up with their production requirements it is expected that the SDK will be released to developers to allow custom watch faces and apps which interact over bluetooth with a phone.
While the Pebble has some great potential as a consumer device and does contain a 3-axis accelerometer, it is not yet fully exploitable for data collection. Pebble is at this time primarily a consumer device, though with the right additions it could one day become a true input/output device on its own. That’s not stopping Pebble though, and it sure as heck isn’t stopping me from wearing my Kickstarter Edition!
Google GlassGoogle Glass is a much hyped Minority Report meets Geordi La Forge headset that fits on your face like a pair of glasses. It includes a small screen that acts as a heads up display and various inputs including audio and video. With Google Glass, you can walk around taking pictures and recording anything you see all by saying a few commands. You can send and receive messages, video (including real time for chat), get directions overlaid in your visual view, get pertinent information such as flights or monetary conversions, and more.

It all starts with the words “Ok Glass”.
This level of geekism and ease of audio and video storage has already prompted one PR savvy bar to ban Google Glass preemptively. Google is masterful when it comes to data collection/analysis and a device which people will beg to wear around that captures video, audio, and interactions with a simple phrase is right up their alley.
And to be honest I can’t help but be taken in by it. Granted, most of the data collection it performs will be consumed by individuals and harnessed by Google; this is a monetization model which already works very well for them and social media sites. With the presentation of Google Shoes yesterday at SXSW, there’s a real possibility Google may be going the whole gamut in regards to wearable computing.
MYO Gesture Control Armband
The MYO armband is an up and coming technology that you wear on the forearm, granting you Jedi powers over wireless devices and is a strong contender in the ultimate goal of creating a Spider-Man web shooter. It is purely an input device (think of it as a Wii controller without the controller) which can read muscle activity in the arm and hand, allowing control over a variety of devices.
With a pair of MYOs on your arms, you can conduct an electronic symphony through arm and finger movements, control wireless quadcopters, flip through slides, or turn devices on and off. It has excellent implications for any wireless technology which requires any form of touch or accelerometer based input. Thanks to its ability to read electrical signals from the muscles, it has a lot of potential for personal data collection (a topic we will cover shortly).
Fitbit, Jawbone UP
I put these together because they’re close competitors and are used for the same purpose. Both of these gadgets are primarily used as advanced pedometers.
The Jawbone UP is a band you wear on your wrist which collects information about how your arms swing. By wearing the UP 24×7 it can keep track of the amount of activity you get, how many steps you walk, how long you stay idle, and even how you sleep at night. All the data it collects is shipped wirelessly to your smartphone where you can view and publish the information.
The Fitbit is the same concept (and in my opinion, a better device) which you wear on your pants or in your pocket when awake and tucked into a wristband when you sleep. They also have an upcoming device called the Flex which will be purely for the wrist and is a direct competitor to the UP. Fitbit can record your steps, activity level, and the amount of stairs you climb throughout the day which is sent wirelessly to your phone for interaction.
Both of these devices have a cool feature in common: their phone and web based tools allow you not only to see the data, but to correlate it with other activities. You are encouraged to enter the foods you ate or the mood you were in during certain times so you can view analytics on how the quality of your sleep, the food you eat, and your activity levels play on your day to day life. Both devices are pure sensors, but the data they collect is used primarily for helping you understand more about your activity levels and how you can improve.
Personal Data CollectionPersonal Data Collection (PDC for short from this point on) is not a new concept; in fact it has been going on since people started writing diaries and journals. But with wearable computing, both its ease of collection and its benefits can increase exponentially. Part of the problem with forecasting or correlating events in the lives of a person is that you have to rely on that person to keep accurate records in regards to both activities and time. And often, there are details that are simply impossible for a person to measure or record, leading to missing data.
But what if we could capture most, if not all, the personal data regarding the daily use of the human?
The problem is the sensors. While we can put glass on a person’s face, bands on their forearms, and shoes on their feet we are still missing a wide variety of data that comes from the brain and other organs. Of course it would be completely invasive to put a chip in everyone’s brain, gut, and kidneys just go grab some data, but boy would it be enlightening. In the case of organs, perhaps X PRIZE has it right and we should be looking at external collection methods like their joint venture with Qualcomm challenging teams to make a working medical tricorder a la Star Trek for a $10 million prize.
Just with the devices I listed above, you can collect the following information individually:
- Steps walked, stairs climbed, whether the person limps or slows after a certain time
- Sleep movement, wakefulness, amount of time it takes to fall into deep sleep
- Downward pressure of each step, foot angle, general posture, walk type, activity level
- Arm movement frequency, finger movement frequency, limb dexterity, muscle group usage, muscle density, flexibility, frequent arm muscle group usage
- Voice data, talking speed, language, dialect, social interaction methods and frequency, aural and visual interactions with people and objects, audio preferences, visual preferences, chat preferences and frequency, and even an idea of pleasing visual stimulus from characteristics of photos taken, mood via voice analysis
That is a ton of data (most of it from the Google device of course) and it doesn’t even begin to correlate the data between the different wearable gadgets. If these devices could talk to each other in some standardized way (Wearable Communication Protocol?) the amount of information we could put in use in our own personal lives would be astounding. For instance:
- Gain insight into what activities and visuals make us the most happy
- Gauge activity levels during a day following a certain sleep pattern
- Discover who you prefer to communicate with following meals, sleep, or exercise
- Determine the people who have the most positive influence on your life (very neat)
- Hand-eye coordination, hand-foot coordination
- Physical movement, fidgeting, and audio cues while using a computer (muscle movement in the fingers from typing)
It all makes for an extremely impressive set of information which can be used for trending and forecasting along with comprehensive personal data visualization. This information can be used to make decisions, provide data to doctors or psychotherapists, help with posture, sleeping habits, pain management, activity levels, even social interactions. Knowledge is power as they say and personal data collection through sensor measurement has the ability to help us transform our lives.
There are plenty of other things we could easily detect with modern technology: sweat levels, eye movements, breath depth, even muscle movements in the face to detect smiling and frowning. And who knows what the future will bring? Is there some way to measure caloric and nutritional intake without requiring the eater to jot down their meal? The more we can collect and store, the more we can crunch and analyze. No matter how you look at it, the real value of wearable computing is not consumer entertainment but consumer enlightenment.
The post Wearable Computing and Personal Data Collection appeared first on Steve Karam :: The Oracle Alchemist.
The Social Circle
Despite the popular idea of tech folks silently skulking in their cubicles with indie music streaming in through white earbuds, the fact is that most IT folks are a chatty and social bunch. More than likely you are in possession of at least a Facebook account and more than likely a Google+ account, Twitter account, and maybe even a LinkedIn account as well (we’re all in the top 5% or 1%, right?).
So I wanted to take this opportunity to offer my readers a quick list of where you can find me in the social world.
Follow Me on TwitterI do quite a bit of chatting on Twitter because it’s such an easy and quick platform for day to day chatter. At 140 characters you can say a surprising amount as long as you don’t try to talk about the DBMS_TRANSFORM_EXPIMP_INTERNAL or something. You can find me there at @OracleAlchemist. I’ll be happy to follow back!
Like (and Share!) My Facebook PageI recently signed up for a Facebook Page for Oracle Alchemist as well. I plan to use this to share blog articles as well as other content that is not right for the blog and too long for Twitter. You can find my Facebook Page at facebook.com/oraclealchemist.
Add Me on LinkedInI’m still up in the air on the LinkedIn platform. On the one hand it’s a social network of professionals and that’s always a good thing. On the other hand I find the interface to be extremely clunky and the mobile apps to be abysmal. Be that as it may, I post my blog content there as well as participate in Group discussions. Feel free to look me up at linkedin.com/in/stevekaram!
Add Me to Your Google+ CirclesI was originally in the Google+ Beta, but ended up leaving due to a lack of unique content. However, they have definitely grown quite a bit and I am glad to say I have a new profile there. Not only is it useful for SEO purposes, but it also is a great platform for random chat, ideas, and collaboration.
Got a Blog?Outside of the social networks there is, of course, this blog. Please feel free to comment on posts. I don’t mind link dropping as long as it’s relevant (though you may have to wait for me to approve the comment). Also depending on the blog I may add it to the sidebar or link it from future articles.
Thanks all, and happy Friday! I hope to see you out there on the social web.
The post The Social Circle appeared first on Steve Karam :: The Oracle Alchemist.
Oracle GoldenGate in the Enterprise
GoldenGate – The Bridge to Everywhere
Oracle GoldenGate has generated a lot of buzz in the Oracle community since the software was purchased by Oracle in 2009. It is marketed by Oracle as a niche application, but one that can be of use in nearly any Oracle environment. Additionally, Oracle has made it clear that they expect GoldenGate to be the chief replication method in the future instead of Streams of Advanced Replication.
The reasoning behind this is simple: GoldenGate is a true heterogeneous replication solution that is not tied directly to the Oracle software. Using GoldenGate it is possible to replicate from nearly any database to any other database. Available platforms include: DB2, MySQL, Oracle, SQL/MX, SQL Server, Sybase, Teradata, TimesTen, flat files, and many other platforms via Java. The software is capable of replicating in near real time or on a lag, and can take a single source and pump it to multiple destinations. GoldenGate is like the highway system for databases, allowing quick transfers to and from any location with many extra destinations easily available.
GoldenGate accomplishes these tasks through a simple set of processes distributed around a centralized messaging service for communication. Before we dive into how GoldenGate can be used in your Oracle environment, let’s go through the components of GoldenGate to get a better understanding of the software.
The Extract ProcessThe first step in any replication process is to capture changes on the source environment. GoldenGate accomplishes this via Extract processes that are able to read transaction logs on the source system and write change data into a trail file (we will cover this next). Multiple Extract processes can be configured which pull different sets of data. This data can be pulled constantly (24×7 replication), on a scheduled basis, or even configured as an event-based system where capturing of data begins when certain data changes are made.
By reading from transaction logs, GoldenGate is capable of recording changes with very minimal overhead to the source system. Even better, it is possible to use an Oracle 11g Active DataGuard environment as a GoldenGate source, which means that the source system is barely impacted at all.
The Extract process is also capable of using stored data as a source instead of transaction logs. This is mostly useful for making an initial copy of a database to a target environment.
Trail FilesThe Trail is where all extracted changes are stored. It holds a combination of DDL and DML. Trail files can be designated as either a local trail that stays on the same server as the extract, or a remote trail which is sent via TCP/IP to a target system.
GoldenGate maintains the trail files; the main responsibility of the DBA is to initially set up the storage and locations for the files. As data is extracted, it is written to the trail files. As the files fill up, new trail files are formed and the data keeps writing. As data is consumed, GoldenGate is capable of purging old trail information.
Oracle documentation recommends using separate disk for the trail file to guarantee GoldenGate processing is fast and does not contend with other resources on the environment. Even with fast systems and purging turned on, the trail file will need room to grow particularly in the event of a network outage. In order to know how much space is required, Oracle recommends the following formula:
(log volume in one hour) * (max number of hours network downtime) x 0.4 = trail space
This is based on Oracle’s estimate (page 6) that the GoldenGate Extract will actually record only 40% of data in redo logs. One of the things that makes GoldenGate so fast is the limited amount of data required for Extract; for instance, GoldenGate only extracts committed transactions whereas the redo logs hold all transactions whether committed or uncommitted.
Trail files are almost always written remotely for the purposes of replication. However, the Extract can store trails locally so that they can be picked up by another process called a Data Pump that will send the trail data to multiple destinations. If a Data Pump is not used, all written data will be on the target system and no storage is required for trail files on the source environment.
CollectorsThe Collector runs on the target system and receives trail file information to be written to disk. As data is extracted and sent over the network, the Collector will place the data in the proper files to be picked up by the Replicat process.
ReplicatThe Replicat process consumes data from trail files on the target system. This process is able to parse all trail information, run DDL and DML against the target environment, and keeps track of written records into an area called the checkpoint table. In order to maintain consistency and guarantee data is picked up properly after errors, the checkpoint table records the progress of replicat processes as data is consumed.
Replicats are capable of loading large sums of data at once or in a synchronized configuration. For instance, one replicat can be created to do an initial data load from a source environment, and another can be created to continue synchronizing changes once the initial load is complete.
If the replicat process receives errors, it will record its progress and abort. In GoldenGate, the act of aborting the process due to error is called abending. If a process is abended, it has aborted due to error and must be fixed.
Usage OverviewGoldenGate uses these components in order to build a complete replication environment. Extract processes pull data and write to trail files which are processed by collector processes and consumed by replicat processes. The real magic behind GoldenGate happens by exploiting this configuration in various ways:
- Multiple extract processes can pull different sets of data in parallel and broadcast it to multiple targets.
- Single extract processes can pull data and broadcast via Data Pumps
- Multiple systems can extract to a single target where multiple replicat processes can consolidate data
- Use multiple database types in any combination for the source(s) or target(s)
- Filter and/or map data en route to the target for custom data requirements
In this way, GoldenGate can be used to completely replace even the most vigorous replication or ETL processing structure. With these capabilities in mind, let’s talk about some of the ways GoldenGate could be used in the average Oracle environment.
Making the Most of GoldenGateThe GoldenGate installation is easy enough that it can be deployed on any system. It works on Unix, Linux, and Windows and transmits data over TCP/IP. The software itself is self-contained and can literally be set up within minutes. So with all this ease of use, how can we put it to work?
Simplified Extract, Transform, and Load (ETL)Oracle has many options for ETL and ELT. Most notable is Oracle Data Integrator (ODI), which now includes integration with GoldenGate for enhance change synchronization. ODI utilizes a highly efficient ELT process to collect multiple sets of data, load to a staging location, then transform the data for real-time warehousing. If you’re interested in learning more about ODI, this blog has some great information.
Even without ODI, GoldenGate is easily usable for the aggregation of data for warehouse loading. Extract processes can pull data simultaneously and in real-time from Oracle, SQL Server, AS/400 environments, XML, flat files, and many more sources. All of this data will be written as multiple trail files to the target environment that will then be written to the warehouse (again, on any platform) by replicat processes.
Using this method, GoldenGate can even filter and clean the data during the process in order to achieve the bulk of ETL work in a single set of steps. Or if you prefer, the data can simply be aggregated then cleaned by other methods on the target system for ELT.
Data DistributionWith the use of Data Pumps to send single data sources to multiple locations, it is very easy to guarantee that data across your enterprise is consistent and reliable. The extract process writes data to a local trail file that is picked up by a data pump. The data pump in turn sends this information to multiple destinations (near or far) that are collected and written to the target environment via the replicat. In this modern era of disk storage, data redundancy such as this eliminates the need for costly database links and instead allows real-time access to remote data by making use of data duplication rather than network reads.
In fact, the ‘real-time’ buzzword is huge with GoldenGate, and Oracle is using it as much as possible. The core theme behind GoldenGate is real-time access to real-time data; that is, the data you require will be accessible immediately and will always be up to date. Replacing database links with synchronized copies will improve performance on many systems tremendously as long as the data is guaranteed to be fresh.
Refresh Development ServersOne of the biggest annoyances for many DBAs is refreshing development systems with up to date data. GoldenGate is formed of reusable components, and it is very easy to extract and replicat initial data loads. By creating these processes in GoldenGate once, a DBA can initiate easy on-demand refreshes of development environments.
By using data pumps, the DBA can use a single extraction and use it to populate multiple development environments, individual environments for different development teams, DBA sandboxes, or development/QA environments in parallel. These environments can be spread across multiple operating systems, database systems, or even versions. Testing on 11g (or 12c soon) will be a breeze due to the ability to seamlessly replicate.
While it may not be as fast as solutions like Delphix, it is a valid method of refreshing via replication.
UpgradesThe capability to move between versions introduces another logical use of GoldenGate: upgrading to new releases. GoldenGate can replicate seamlessly from server to server or locally and can be used to duplicate a database onto a new version using existing processing configurations.
For instance, if GoldenGate is used to replicate data between a 10g database and another 10g database, an upgrade is as simple as stopping replication to the target system temporarily, upgrading to 11g, then starting replication again. GoldenGate operates outside of the data dictionary and can seamlessly work between these versions. The same goes for moving to another server or even to another operating system.
Rolling Development into ProductionMultiple extracts and replicats can be easily configured to pull development data and merge it with production tables. Alternatively development data can be pulled from multiple locations, or development and production data can be pulled from their respective databases and merged into a new environment. Whichever way you choose, the configurations for the extracts and replicats can be re-used to perform these operations on a regular basis.
Migrating 3rd Party Utility DatabasesSome 3rd party utilities only allow certain database platforms. If you have a 3rd party application which is only designed to run on MySQL or SQL Server for instance, the data from this system can be send to an Oracle database for reporting or historical record keeping. On some 3rd party utilities it may even be possible to integrate data from other systems into the 3rd party database. With Java connectors, it is even possible to load data into NoSQL options like MongoDB or Riak with the right coding.
Using Business Data in SharePointMany companies have started using SharePoint or other portal tools for their corporate infrastructure. Using GoldenGate, information from financial, customer, HR, or other databases can easily be integrated into the company portal no matter which server houses the SharePoint database.
Trouble Ticketing Systems and Monitoring ToolsUsing GoldenGate, it is possible to migrate data from trouble ticketing systems over to monitoring environments. For instance, if your company uses a ticketing tool to receive trouble tickets on a customer-facing application and a monitoring environment for IT personnel, GoldenGate can act as a messaging system by copying ticket information into the monitoring environment. Alternatively the flow could be reversed, and monitoring information could be used to supply extra information on tickets.
Master Data Management (MDM)The central component to many distributed systems, MDM is becoming more and more popular in large corporate environments. MDM solutions are central databases which hold key corporate data in an attempt to always ensure accurate information across distributed environments. For instance, if a company makes use of five different databases and applications which always requires information about employees, an MDM environment could be a central aggregation point for employee data which can then be used in the external environments.
In these types of setups, GoldenGate can be used both for supplying MDM data into other databases and for loading MDM data. Data pumps from the MDM central repository can spread information to all subscribing databases, and applicable information from databases in the enterprise can ship information to MDM for central storage.
ConclusionOracle GoldenGate 11g is poised to make big changes in the way data is moved between environments. While replication has existed for a long time, GoldenGate is the first highly available real-time replication tool with full heterogeneous support across many platforms. With the extensible extract/replicat design it is easy to deploy and reuse on many different systems for many needs. While there are many other viable replication options out there, Oracle GoldenGate is a very strong contender for enterprise usage.
Note: This article originally appeared in IOUG SELECT Journal Q4 2011.
The post Oracle GoldenGate in the Enterprise appeared first on Steve Karam :: The Oracle Alchemist.
Add History and Tab Completion to SQLPlus

In the uncharted backwaters of the Internet there exists a really awesome tool called rlwrap. This readline wrapper is able to call programs that are deficient in modern amenities like history, color, and tab completion, intercept input/output, and replace it as necessary with useful goodies. And what program could be more deficient than SQLPlus (sorry Tanel)?
With rlwrap you can instantly have history in SQLPlus by pressing the ‘up’ key. Not only that, but the history is remembered even if you exit and come back in later. Tab completion is also possible with a dictionary file, which you can easily populate with your favorite collection of tables, packages, etc. If you’re like me and for whatever reason have never taken to graphical tools but hate the missing features in SQLPlus, this tool is definitely a necessity. So let’s get it installed!
Download and InstallI was able to find an RPM of rlwrap on pbone for EL6 x86_64. You can download it directly here if that is what you are running, or go to this page if you need a different release.
After downloading to the server, I just had to do an RPM install:
[root@orahost ~]# rpm -Uvh rlwrap-0.37-1.el6.x86_64.rpm warning: rlwrap-0.37-1.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY Preparing... ########################################### [100%] 1:rlwrap ########################################### [100%]Trying it with SQLPlus
Out of excitement I had to try it right away just to see what it could do. Be aware, rlwrap only works with interactive command line so I don’t recommend trying to use it in scripts. Instead, I ran the simple command: rlwrap sqlplus hr/hr
[oracle@orahost ~]$ rlwrap sqlplus hr/hr SQL*Plus: Release 11.2.0.1.0 Production on Tue Mar 5 20:31:11 2013 Copyright (c) 1982, 2009, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL>
I was greeted with the familiar prompt and no indication anything was different. But when I typed a command (select * from departments;) and ran it, then pressed up, my previous command appeared. Unfortunately I can’t really show you this easily so you’ll just have to imagine it. Press up on your keyboard. Pretend it showed your last SQLPlus command. Be amazed.
Adding a DictionaryBut now we’re going to do something even better: add a dictionary so we can use tab completion. I generated the dictionary with the following script:
#!/bin/bash sqlplus -s hr/hr << ! > sqlplus.dict set head off pages 0 linesize 150 echo off feedback off verify off heading off select object_name from all_objects where object_type in ( 'TABLE', 'VIEW', 'PACKAGE', 'PROCEDURE', 'FUNCTION' ); !
You can have this script login as whoever you want (a DBA user would give you every table, package, procedure, and function in the entire DB). When it was finished, I had a nice sqlplus.dict file which contained everything I wanted tab completion to recognize.
We can fire up rlwrap again, this time with the -i flag which tells rlwrap to ignore case for tab complete, and the -f flag which tells it where the dictionary file is.
[oracle@orahost ~]$ rlwrap -if sqlplus.dict sqlplus hr/hr SQL*Plus: Release 11.2.0.1.0 Production on Tue Mar 5 21:54:15 2013 Copyright (c) 1982, 2009, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> select * from USER_TAB USER_TABLES USER_TAB_COL_STATISTICS USER_TAB_MODIFICATIONS USER_TAB_PRIVS_MADE USER_TAB_STAT_PREFS USER_TABLESPACES USER_TAB_COMMENTS USER_TAB_PARTITIONS USER_TAB_PRIVS_RECD USER_TAB_SUBPARTITIONS USER_TAB_COLS USER_TAB_HISTGRM_PENDING_STATS USER_TAB_PENDING_STATS USER_TAB_STATISTICS USER_TAB_COLUMNS USER_TAB_HISTOGRAMS USER_TAB_PRIVS USER_TAB_STATS_HISTORY SQL> select * from USER_TABLES
When I typed user_tab, I pressed tab three times and it showed me my available choices. Adding the letters le and pressing tab again completed it to USER_TABLES. If there are too many choices you get the familiar “Show all?” message from bash:
SQL> select * from ALL_ Display all 403 possibilities? (y or n)
One important note, it seems the default behavior for CTRL+C is changed when using rlwrap. You have to press CTRL+C then press Enter to cancel your command. Worth it!
ConclusionNeedless to say I instantly decided that I was in love with rlwrap and that it was going to retain a permanent position in my toolset. So I made an alias for it:
alias sql='rlwrap -if ~oracle/sqlplus.dict -pgreen sqlplus'
Now I can type the ‘sql’ command followed by any sqlplus arguments and it will automatically load the dictionary file for me and load up rlwrap. You can name the alias anything you want: sqlplusplus, sqlplussest, omgsql, it doesn’t matter. Just don’t overwrite your actual sqlplus command with the alias, because you will need the original for scripts where rlwrapper does not play nice.
One other thing you may have noticed is the -pgreen. I thought SQLPlus needed a smidgeon of color.
Have fun!
The post Add History and Tab Completion to SQLPlus appeared first on Steve Karam :: The Oracle Alchemist.
Getting Busy with Riak
This entry is part 2 of 2 in the series Database DiversityWelcome back to Database Diversity, where the queries are made up and the relationships don’t matter. Today we’re going to talk about a simple yet insanely powerful NoSQL Database called Riak. While its core functionality is a key/value datastore, Riak is capable of so much more: full text querying, MapReduce, and complex clusters. In our examples, we will install Riak on Oracle Enterprise Linux 6.3 and use it to store some data then retrieve and delete it. This basic operation is very easy; in fact, it would be boring to show you from the command line because it is well documented and involves some simple curl commands. Instead, we’re going to write, read, and delete via PL/SQL functions.
But first, let’s talk about what makes Riak so cool. Even if you don’t know how to use it, you should know the features; with 25% of the Fortune 50 using it, you never know when you might get asked to support it.
Riak the RESTfulRiak has a full HTTP 1.1 compliant API. This means that you can perform read, write, delete, and query operations using standard PUT (POST), GET, and DELETE operations. As such, curl is a great tool for working from the command line but any tool that can make HTTP calls will work.
Riak the BountifulOn top of the standard HTTP API, it also has APIs for Java, PHP, Python, Ruby, JQuery, node.js, and C/C++. The language of choice is Erlang; after all, Riak was written in it. Between the RESTful API and other programming language support, Riak can be used from nearly anywhere.
Riak the ScalableA huge selling point for Riak is that it is very scalable. Adding a node to a cluster is as simple as your first install, and doing so gives near linear performance on complex functions and with loads of concurrency. The data replication allows very robust fault tolerance (many failed nodes will still allow full access) and its distribution algorithms yield predictable results from even distribution.
Riak the Free
Riak is open source, though the parent company Basho does offer paid options. Riak Enterprise is offered for multi-datacenter replication, monitoring, and support. For storage, Riak Cloud Storage offers an S3 API compatible storage engine with replication capabilities. If you don’t need these services, you can scale Riak to your heart’s content for nothing.
Now that I’ve sung its praises, let’s get down to the install. I have a simple Oracle Enterprise Linux 6.3 x86_64 VM I spun up for the purpose using a standard server install. In order to install Riak, we must:
- Install the expect library
- Download riak 1.3 from AWS
- Install riak via rpm
- Set open file limit to 4096
- Start Riak
[root@orahost ~]# yum install expect ... Installed: expect.x86_64 0:5.44.1.15-4.el6 [root@orahost ~]# wget http://s3.amazonaws.com/downloads.basho.com/riak/1.3/1.3.0/rhel/6/riak-1.3.0-1.el6.x86_64.rpm --2013-03-03 19:01:08-- http://s3.amazonaws.com/downloads.basho.com/riak/1.3/1.3.0/rhel/6/riak-1.3.0-1.el6.x86_64.rpm Resolving s3.amazonaws.com... 72.21.195.97 Connecting to s3.amazonaws.com|72.21.195.97|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 23875120 (23M) [application/x-rpm] Saving to: “riak-1.3.0-1.el6.x86_64.rpm” 100%[============================================================================================================================>] 23,875,120 6.64M/s in 4.0s 2013-03-03 19:01:12 (5.64 MB/s) - “riak-1.3.0-1.el6.x86_64.rpm” saved [23875120/23875120] [root@orahost ~]# rpm -Uvh riak-1.3.0-1.el6.x86_64.rpm Preparing... ########################################### [100%] 1:riak ########################################### [100%] root@orahost ~]# ulimit -n 4096 [root@orahost ~]# riak start Attempting to restart script through sudo -H -u riak
And we are done and installed. At this point Riak is fully usable for reads and writes.
How Riak Storage WorksRiak data is stored inside of buckets, a flat namespace you can think of like a table. Buckets are unique in a Riak installation and can store oodles of key/value pairs. Keys are like primary keys, unique within a bucket. Values are where the magic happens: they can be anything. Strings, XML, JSON, binary, you name it. You can use it to store session data (login, expiry, preferences, etc), log files, backups, large XML files, whatever. The power to quickly store and retrieve data of any kind is where Riak shines.
Now, when an Oracle or other RDBMS person looks at this they may wonder why it’s so appealing; after all, you can make a table with a KEY, VALUE column combination and use that for the same purpose. But the strength of Riak is in its scalability and flexibility. It’s not meant to take the place of a relational database but it can be a great tool alongside one (or on its own).
Let’s PUT in Some Data!In order to write to Riak, we write to a URL in the form of: http://hostname:8098/buckets/bucketname/keys/key (note 8098 is the default port). The body of the HTTP PUT request will be the data to store into the bucket with the key you specified. For instance, to write into a bucket called ‘logdata’ on localhost with a numeric key, you could use: http://localhost:8098/buckets/logdata/keys/1.
There are a few required/recommended parameters, but the ones we’ll concern ourselves with here are Content-Type and Content-Length. And to do it, we will use the UTL_HTTP package built into Oracle.
Before we can do that, we have to connect to Oracle as SYS and setup Access Control Lists (ACL):
BEGIN
dbms_network_acl_admin.create_acl (
acl => 'riak.xml',
description => 'Permissions to access risk datastore',
principal => 'RIAK_USR',
is_grant => TRUE,
privilege => 'connect'
);
end;
/
begin
dbms_network_acl_admin.assign_acl (
acl => 'riak.xml',
host => '127.0.0.1'
);
end;
/In the two PL/SQL blocks above, we are creating an ACL called riak.xml which gives the RIAK_USR user the ability to connect to host 127.0.0.1. Now that it’s done, we can run UTL_HTTP operations against localhost.
Here is the function I’ve created to do this:
create or replace function testput(pv_bucket in varchar2, pv_key in varchar2, pv_data in varchar2)
return varchar2
is
lv_req utl_http.req;
lv_rsp utl_http.resp;
lv_out varchar2(32767);
begin
lv_req := utl_http.begin_request(url => 'http://127.0.0.1:8098/buckets/' || pv_bucket || '/keys/' || pv_key,
method => 'PUT');
utl_http.set_header(lv_req, 'Content-Type', 'text/plain');
utl_http.set_header(lv_req, 'Content-Length', length(pv_data));
utl_http.write_text(lv_req, pv_data);
lv_rsp := utl_http.get_response(lv_req);
lv_out := lv_rsp.status_code;
utl_http.end_response(lv_rsp);
return lv_out;
exception when utl_http.end_of_body then
utl_http.end_response(lv_rsp);
return lv_out;
end;
/In this function, you can pass in a bucket name, key, and value. The function opens a PUT request, sets the headers, and writes the body with the utl_http.write_text procedure. The response that is returned is the status code from riak (204 is normal for inserts).
SQL> select testput('newbucket', '1', 'My test document') from dual;
TESTPUT('NEWBUCKET','1','MYTESTDOCUMENT')
--------------------------------------------------------------------------------
204
GET It?
Using a GET operation I can pull data by key, like with this function:
create or replace function testget(pv_bucket in varchar2, pv_key in varchar2)
return varchar2
is
lv_req utl_http.req;
lv_rsp utl_http.resp;
lv_out varchar2(32767);
begin
lv_req := utl_http.begin_request(url => 'http://127.0.0.1:8098/buckets/' || pv_bucket || '/keys/' || pv_key,
method => 'GET');
lv_rsp := utl_http.get_response(lv_req);
utl_http.read_text(lv_rsp, lv_out);
utl_http.end_response(lv_rsp);
return lv_out;
exception when utl_http.end_of_body then
utl_http.end_response(lv_rsp);
return lv_out;
end;
/This function looks an awful lot like the PUT function. It retrieves data from the Riak database using a bucket and key with a GET operation. The body of the returned page is returned from the function.
SQL> select testget('newbucket', '1') from dual;
TESTGET('NEWBUCKET','1')
--------------------------------------------------------------------------------
My test documentAnd to make sure we cover it, let’s delete that row (and verify).
SQL> select testdelete('newbucket', '1') from dual;
TESTDELETE('NEWBUCKET','1')
--------------------------------------------------------------------------------
204
SQL> select testget('newbucket', '1') from dual;
TESTGET('NEWBUCKET','1')
--------------------------------------------------------------------------------
not foundIf you guessed we did this with an HTTP DELETE operation, you’d be right:
create or replace function testdelete(pv_bucket in varchar2, pv_key in varchar2)
return varchar2
is
lv_req utl_http.req;
lv_rsp utl_http.resp;
lv_out varchar2(4000);
begin
lv_req := utl_http.begin_request(url => 'http://127.0.0.1:8098/buckets/' || pv_bucket || '/keys/' || pv_key,
method => 'DELETE');
lv_rsp := utl_http.get_response(lv_req);
lv_out := lv_rsp.status_code;
utl_http.end_response(lv_rsp);
return lv_out;
exception when utl_http.end_of_body then
utl_http.end_response(lv_rsp);
return lv_out;
end;
/With those three functions, we can store (and update), retrieve, and delete any data we want. You can store VARCHAR2 and CLOB data, BLOBs as binary data, XML from DBMS_XMLGEN, JSON using the Alexandria Library, and really anything else you can think of. This data can be used for programming logic (like session data storage) or for later analysis with MapReduce.
And because they look so much alike, it would probably be really easy to put them in a nice Riak helper package. A project for another day, perhaps.
ConclusionJust key/value storage to a free, highly distributed and redundant database is useful and impressive. But as I’ve said, there’s a lot more you can do with Riak. For instance:
- Riak Search allows you to perform full text searches across the values of a bucket. This is an indexed search which is not quite as powerful as a MapReduce, but performs better and more predictably.
- Secondary Indexes (2i) allow you to pick and choose what will be indexed and how. You can define tags for your data which allow you to search it on your terms; for instance, for a bucket full of emails, you can add tags for twitter and blog addresses, then search on those.
- Links which allow one way relationships between objects that can be walked to form loose graph-like relationships.
- MapReduce is the big kahuna. As with most systems, you can do a Map to gather a single set of data or use Reduce functions to further aggregate the info. Riak supports MapReduce functions in JavaScript and Erlang.
I plan on publishing some MapReduce examples in the near future, but for now I value my sanity and wish to go on pretending GROUP BY CUBE is the best aggregating mechanism since they fired up the hazelnut chocolate mixer at the Nutella plant.
The post Getting Busy with Riak appeared first on Steve Karam :: The Oracle Alchemist.
A Few Words on Oracle Licenses

Despite reams of legalese and countless blog/forum posts, basic Oracle RDBMS licensing still remains a mystery to many. While I won’t even begin to discuss components like Oracle Apps, hopefully this post may help you attain a basic understanding of Oracle, what you can license, and what it costs.
EditionsThe first and most important thing to learn are the editions of Oracle RDBMS. These editions are:
- Oracle Express Edition (XE) – A free use version of Oracle which can use a single CPU (though it can be installed on a server with as many CPUs as you want), 1GB RAM, and 4GB data.
- Oracle Personal Edition (PE) – Single User, Single Machine development/deployment license which can use any SE1, SE, or EE feature with the exception of RAC and Management Packs.
- Oracle Enterprise Edition (EE) – Full featured edition of Oracle RDBMS which allows all core features along with the option to purchase add-on features (like Partitioning) and Management Packs (like the Diagnostics Pack).
- Oracle Standard Edition (SE) – A damped down of edition of Oracle which can run on a server with a maximum of four CPU sockets. It cannot take advantage of add-ons or Management Packs. It does, however, include Oracle RAC as long as all nodes combined have no more than 4 sockets.
- Oracle Standard Edition One (SE1) – A further reduced edition which shares the same features with SE but is limited to 2 CPU Sockets with no RAC option.
On top of knowing the editions, you also need to know about the type of license. The most common is a Processor based license, where your license count is based on the number of CPU Sockets/Cores and the CPU Type for EE or the number of CPU Sockets for SE and SE1. The other type is a Named User License, where you buy a number of seats on the database instead of licensing it for unlimited users. Please remember if you are pursuing this type of license: non-human processes are also named users. So any application server, script, or even a scheduled job is a ‘named user’.
Enterprise Edition Core FactorFor Enterprise Edition and all of its add-ons and management packs, there is a concept called “core factor” which is applied for licensing purposes. This factor is based on the type of CPU you are using in your server. For instance, if you are using a SUN M5000 with SPARC VII processors, the core factor is 0.75. That means 4 Quad-Core CPUs (16 cores) would equal 16 * 0.75 = 12 licenses. You would purchase 12 EE licenses to cover the server, and then purchase 12 licenses of whatever add-ons or management packs you might be using.
If, on the other hand, you used SPARC VII+ processors, the core factor is 0.5. The same CPU setup would only require 8 licenses in that case.
For the most part, x86/x86_64 architecture CPUs are a 0.5 core factor. A small number of processors are 0.25, but there is a reason: they are not very suitable for running Oracle, generally System On A Chip (SOC) solutions with a very high amount of hyperthreading.
You can find the full core factor table here.
Special CasesTwo types of databases are exempt from licensing restrictions: RMAN Catalogs and Grid/Cloud Control Repositories. If one of these repositories is the only database on a server, that server does not have to be licensed. But if the server or those databases are used for any custom or other Oracle purpose, they must be licensed. Generally speaking it can be beneficial to put these databases on the same server. See this page under “Infrastructure Repository Databases” for details.
Add-Ons and Management PacksLet’s get one thing out of the way: Standard Edition and Standard Edition One CAN NOT use any add-on or Management Pack features (except RAC for SE). That means you can’t even buy Diagnostics Pack for Oracle SE. The option does not exist. Based on licensing requirements, AWR/ASH/ADDM are completely off limits on an SE or SE1 database.
However, Enterprise Edition has options for add-ons like RAC, Partitioning, and Active DataGuard. These add-ons are extra cost and apply to any database on which they are used. For instance, if you want to use Enterprise Edition with Partitioning and RAC across four nodes, each of the four nodes must be fully licensed for EE, Partitioning, and RAC. If Oracle is running on it, you must license it.
MisconceptionsYour standby databases need to be licensed. I’ve heard a lot of mixed opinions on DataGuard and whether your standby database must be licensed. The definitive answer is YES. A DataGuard standby is running Oracle (in MOUNT mode), which means it must be fully licensed for all features.
This goes for Development, QA, and Testing servers as well. This one has a lot of misinformation surrounding it. While you can install Oracle on your PC and use it for your own development/deployment purposes with Oracle Personal Edition, you cannot have a dedicated production-cycle development or QA server without a license. If it is part of the development cycle, then it must be paid. The only way you can get around that is if it is single-machine, single-user, for personal development.
One other big misconception is Oracle on a virtual server, like an ESX environment. Oracle does not recognize soft partitioning of CPUs. If you host Oracle on a VM in an ESX environment, you must license the entire ESX environment. (Note, this article mentions that it is possible to work with Oracle on this).
Cost CasesSo let’s take a look at some costs. The license costs are:
- Enterprise Edition – $47,500 per unit (sockets * cores per socket * core factor)
- Standard Edition – $17,500 per unit (sockets)
- Standard Edition One – $5,800 per unit (sockets)
You can find the full cost list on Oracle’s site, which includes all licenses types, add-ons, and management package.
2 Node RAC + DataGuard + AWR/ASH
In this example we are going to set up a 2 node RAC cluster with DataGuard to a standby 2 node RAC cluster. We also want AWR/ASH for diagnostics. Each server (4 total) will have 2 Intel Xeon X7560 processors (8 core).
Our costs for Enterprise Edition would be (in list prices):
- 4 servers * 2 sockets per server * 8 cores per socket * 0.5 core factor = 32 units
- 32 Units Enterprise Edition ($47,500ea) = $1,520,000
- 32 Units RAC ($23,000ea) = $736,000
- 32 Units Diagnostics Pack ($5,000ea) = $160,000
Our grand total for this setup in EE would be $2,416,000. Hopefully you have a good discount!
Now, if we wanted to do the same thing in Standard Edition, we would have to make some exceptions. DataGuard does not work on SE, so we would have to use a third party software like DBVisit or homegrown scripts. Also, AWR/ASH don’t work in SE and can’t be added, so you might have to fall back to Statspack and adapt with excellent tools like Tanel Poder’s Snapper. In return for these trade-offs, let’s look at pricing:
- 4 servers * 2 sockets per server = 8 units
- 8 units Standard Edition ($17,500ea) = $140,000
The total price for two 2-node RAC clusters which 2 sockets per server (you can’t have more than 4 sockets in a single SE RAC cluster) is $140,000. Quite a big savings, and definitely enough leftover to buy some extra utilities.
Production, QA, and Development with Partitioning and AWR
In this example we’ll have three databases which must be licensed as they are part of the development production cycle. Each server is a SUN M5000 with 4 Quad-Core SPARC VII processors.
Enterprise Edition:
- 3 servers * 4 sockets per server * 4 cores per socket * 0.75 core factor = 36 units
- 36 units Enterprise Edition ($47,500ea) = $1,710,000
- 36 units Partitioning ($11,500ea) = $414,000
- 36 units Diagnostics Pack ($5,000ea) = $180,000
The grand total is $2,304,000.
If we were able to forego Partitioning and Diagnostics Pack, the requirement in Standard Edition would be:
- 3 servers * 4 sockets per server = 12 units
- 12 units Standard Edition ($17,500ea) = $210,000
Remember, a lot of this is up to you and your sales rep or third party reseller. Make sure you work out all the details and understand exactly what you can and can’t deploy. It is highly recommended that you keep an eye out for license use; the last thing you want is a licensing audit that you are not prepared for, with database installations ranging all over the environment. This paragraph also servers as a DISCLAIMER: This post is based on my personal knowledge and references available online and is not a substitute for actual information from Oracle Corporation. Please check with your sales rep on all licensing questions.
As you can see in these prices, it is VERY important that you buy what you need and nothing more. Instead of going the whole hog and getting Active DataGuard, Diagnostics and Tuning Packs, Partitioning, etc. on every single server, really take a look and see what your requirements are. You can save a huge amount of money that way.
Also note that there are special types of licensing structures out there such as site licenses. These are special deals worked out with Oracle in order to license your entire enterprise for a certain period, and can often save you a lot of money on licensing depending on your plans. If you decide to pursue this kind of agreement with Oracle, it is important to really work with your team to decide what you need and what you don’t, as well as anticipated business requirements for the near and long-term future.
The post A Few Words on Oracle Licenses appeared first on Steve Karam :: The Oracle Alchemist.
Deploying Outside the Box
This entry is part 4 of 4 in the series Grow Your Career- The best IT career book I’ve ever owned
- DBA Team, Front and Center
- DBA, Grow Thyself – Moving and Shaking in the Era of Data Dominance
- Deploying Outside the Box
Let’s face it: DBAs are frequently seen as a bottleneck. By some, they are even seen as a part of the bureaucracy, the shady organization seemingly designed to stop things from getting done. But to be fair, this is generally due to our central role in the application stack. Developers depend on us for a sane environment or in most cases, many sane environments. On the flip side, DBAs depend upon the system administrators for servers, storage, operating system, etc. But more importantly, DBAs are often victims of their own requirements: fault tolerance, reliable backups, performance, scalability. Giving these requirements up is out of the question; hence the bottleneck. The answer is not to reduce our requirements, but do them smarter.
In the rest of this post, we will look at a few new(ish) technologies which can help us accomplish that goal.
DelphixHave you ever felt embarrassed when you tell a developer or project manager that you will need a few days to deploy a new QA environment? Or that you really have to think about refreshes because they’re just so time consuming and complex? I hate being in that position, because it’s a position that is generally a result of being on an inflexible platform. And as mainstream and comfortable as it is, the standard “box plugged into a SAN” is a highly inflexible platform most of the time.
Delphix is a software appliance which uses a concept called “thin provisioning” to quickly deploy and refresh QA, development, reporting, testing, etc. environments. It currently works with Oracle and MS SQL Server, and this is what it does: it uses ZFS concepts and proprietary storage and compression logic to near-instantly deploy read/write database environments from any configured source at any stored point in time with almost no storage footprint on the target side.
To accomplish this, source databases are configured in Delphix and cloned to a disk area attached to the Delphix host. The data is compressed, filtered, and stored in DxFS (Delphix File System) where they can take advantage of extremely efficient caching for concurrency and block mapping for versioning. Change records are also shipped from the source to Delphix, and Delphix keeps a running history of changes to each source in order to enable point in time deployments.
A target is any system running Oracle. The target environment hooks into Delphix and, upon deployment, creates an instance and points to the source data. Delphix presents the point in time source data as datafiles to the target which can be opened and read. Writes are made to a dedicated area on the Delphix host which is versioned over the source data via block mapping. The result is as many read/write environments as you need as quickly as you need them with minimal disk requirements. In the end you save money on disk space, money on man hours, and time to deployment which eliminates the DBA as a bottleneck on critical development environments.
Honestly, this technology sounds almost too good to be true. Imagine being able to create unlimited QA servers, instant reporting databases, and daily refreshed development databases on the PC of every developer. If you want to see it in action, have a look at this video. You can also get a more comprehensive (and most likely better explained) explanation of technical details here.
VCE Vblock SystemAs an Oracle-centric kind of guy and a huge fan of Oracle products and tools, you would think I’d be talking about Exadata here. And Exadata is an amazing ‘engineered system’, capable of mind blowing batch performance, cluster configurations, and more. No doubt there will be posts about Exadata at a later time, but this one focuses on a technology called Vblock because of a major factor: it is application neutral.
You can run nearly anything on Vblock, from Windows to Linux to Oracle to Apache to…well, iTunes. Likewise you get to choose how your disk is configured (RAID level), where you want virtualization, and how you want to manage backups. Behind the scenes is a robust compute/network/storage stack which takes advantage of massive parallelization and a mix of disk resources (Flash, SAS, SATA) with caching mechanisms that dynamically tier data where it needs to be based on your working data set. And most importantly, they claim to provide a “5 nines solution”, which means it has a 99.999% uptime or 5 minutes per year.
My real attraction to this type of system is that it gives DBAs a critical tool they need: a pool. A Vblock System would be ideal to setup as a Database Farm, where DBAs can monitor performance, fault tolerance, and capacity in a large node based environment and quickly create new systems on demand. Instead of requiring new quotes, POs, and management processes every time a new database is required this solution would allow huge flexibility in deployments while providing outstanding compute performance and fault tolerance on the backend. This would include any database or version, clustering or standalone, virtual (VMWare is a collaborator on the Vblock environment) or physical system.
Every day new innovations are being made in CPU and network speeds (less so with disk). New solutions which incorporate many of these solutions into a flexible and heterogenous architecture are going to appeal greatly to businesses and forward thinking IT groups.
DBVisitDeploying new environments is incredibly important, and with the two technologies I just mentioned it can become a trivial exercise. But at the same time, there is still the need at times for traditional replication or standby databases. Oracle provides technologies like DataGuard and Streams, but the former requires Enterprise Edition (sometimes not cost effective) and the latter can be unnecessarily complex. Software like Golden Gate is incredibly flexible, but also fairly expensive for what it does.
DBVisit Standby is a third party disaster recovery software which can either replace or augment Oracle DataGuard. DBVisit Replicate can replicate data between Oracle, MySQL, and SQL Server, and provides a very user friendly interface to get the work done. To me, the coolest part of the software is that it can emulate and in some cases beat the features of Enterprise Edition, even on a Standard Edition database.
Oracle Enterprise Edition costs $47,500 per unit, and the number of units required is based on your CPU cores times a core factor. If you are running a standby database, both your production server and standby server must be fully licensed. So if we have two Intel x3960 boxes with 2 CPUs with 8 cores each, we would pay a list price of $47,500 * (2 * 8 * 0.5) * 2, which is $760,000. Even with a significant discount that is a significant amount of money. Standard Edition, on the other hand, is licensed by the socket at $17,500 per socket for a total of $17,500 * 2 * 2, which is $70,000. If you can live without Enterprise features and add-ons like AWR/ASH, this is a hugely significant cost savings. DBVisit Standby could be used to fill the disaster recovery gap here and then some; for instance, a new feature of DBVisit called Standby Database Synchronize which can recover a standby even in cases where NOLOGGING was used, or where archive logs are lost or behind. Even with Enterprise Edition, that is a tempting feature.
Simplicity is a big part of the decision as well. If you have Enterprise Edition, you can definitely use Streams if you wish for replication purposes. However, DBVisit is extremely easy to configure and use. A few weeks ago I made a (tongue in cheek) comparison of replication platforms:

Though GoldenGate is extraordinarily powerful, it is also a large and costly centralized environment. DBVisit is quick; I can pretty much guarantee you’ll have replication flowing (perhaps in many directions) within the day.
ConclusionThere are tons of new technologies coming out from databases to hardware to network to handheld devices. All too often, the DBA will learn about these technologies from management or other groups when it would behoove them to be the flag bearers for the new tech at their company. Keep an eye out for the real movers and shakers: the tech that makes it easier and safer to deploy quality systems.
Errata: Upon review, Streams is not an Enterprise Edition feature. It can be used in SE and SE1 as well.
The post Deploying Outside the Box appeared first on Steve Karam :: The Oracle Alchemist.
Start Using MongoDB
This entry is part 1 of 2 in the series Database DiversityGuten Tag, Namaste, Hola, Zdravstvuite! While you should always work to master your first language, a simple understanding of other cultures and their languages is always beneficial. In these days of social/career networks and rising global connectedness, being open to diversity is an absolute must.
This is true in the IT world as well. As I mentioned in my article “DBA, Grow Thyself – Moving and Shaking in the Era of Data Dominance” new technologies will happen and the last thing you want is to be seen as either ignorant or obstinate to its potential. While I never advocate senseless database migrations or platform changes, with the incredible diversity available in the data world today learning to at least say “hello” in a variety of environments can only be beneficial.
So for the near future, I will be doing a “Database Diversity” series (along with other posts) where we perform basic operations in a variety of database environments. The basic operations (if applicable) will be: install, startup, create a table, insert rows, create an index, and run a query. Again I stress “if applicable”, as these operations won’t always exist in every type of database. For today, our topic is…
MongoDBMongoDB is a well-known and popular NoSQL option created by 10gen. It is an open source solution designed to be highly scalable and available with robust options for data replication, processing via built in query operations and custom MapReduce jobs, and document storage. Data in Mongo is document oriented, meaning there are no tables per se but collections of documents, each represented in JSON notation. This makes it an extremely powerful tool for storing dynamic data and working directly with most application development frameworks. It is extensible as well; frameworks like Meteor combine the powerful features of node.js with MongoDB as the default storage engine, and there are even tools out there to convert SQL queries to work with it.
But like it or not, MongoDB is not an RDBMS. In today’s Database Diversity, we will download and install Mongo, create a collection, insert some data, create an index, and query our collection.
Installing MongoDB
MongoDB can be installed on most Linux repos using their respective software installation tools. On RHEL/CentOS/OEL, you can add the 10gen repositories to yum and install it like any other RPM. For Debian/Ubuntu, you can add the apt repo and install via apt-get.
Personally I prefer to download the binaries for my platform and install in the location of my choosing. If you have root privileges, you can choose to put the binaries into a central location like /usr/local/bin. If not, you can download and run mongod (the MongoDB daemon) from any location as any user for very fast deployment and flexibility. The overall download page can be found here, but on a Linux x86_64 platform you don’t even have to go that far. Here I am downloading the latest x86_64 MongoDB and extracting it:
steve@UbuntuVM:~/mongo$ pwd /home/steve/mongo steve@UbuntuVM:~/mongo$ wget http://downloads.mongodb.org/linux/mongodb-linux-x86_64-latest.tgz --2013-02-25 08:58:56-- http://downloads.mongodb.org/linux/mongodb-linux-x86_64-latest.tgz Resolving downloads.mongodb.org (downloads.mongodb.org)... 72.21.215.171 Connecting to downloads.mongodb.org (downloads.mongodb.org)|72.21.215.171|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 94187936 (90M) [application/x-tar] Saving to: `mongodb-linux-x86_64-latest.tgz' 100%[======================================================================================================================================>] 94,187,936 7.12M/s in 13s 2013-02-25 08:59:10 (6.76 MB/s) - `mongodb-linux-x86_64-latest.tgz' saved [94187936/94187936] steve@UbuntuVM:~/mongo$ tar -zxvf mongodb-linux-x86_64-latest.tgz mongodb-linux-x86_64-2013-02-24/README mongodb-linux-x86_64-2013-02-24/THIRD-PARTY-NOTICES mongodb-linux-x86_64-2013-02-24/GNU-AGPL-3.0 mongodb-linux-x86_64-2013-02-24/bin/mongodump mongodb-linux-x86_64-2013-02-24/bin/mongorestore mongodb-linux-x86_64-2013-02-24/bin/mongoexport mongodb-linux-x86_64-2013-02-24/bin/mongoimport mongodb-linux-x86_64-2013-02-24/bin/mongostat mongodb-linux-x86_64-2013-02-24/bin/mongotop mongodb-linux-x86_64-2013-02-24/bin/mongooplog mongodb-linux-x86_64-2013-02-24/bin/mongofiles mongodb-linux-x86_64-2013-02-24/bin/bsondump mongodb-linux-x86_64-2013-02-24/bin/mongoperf mongodb-linux-x86_64-2013-02-24/bin/mongosniff mongodb-linux-x86_64-2013-02-24/bin/mongod mongodb-linux-x86_64-2013-02-24/bin/mongos mongodb-linux-x86_64-2013-02-24/bin/mongo steve@UbuntuVM:~/mongo$ mv mongodb-linux-x86_64-2013-02-24/ mongo/
Normally, another step in a default installation is to create the /data/db directory, which is the default location for MongoDB file storage. However, as we are running this in a non-root setup for this tutorial, we will change the default location. To start the MongoDB daemon, the “mongod” command is used:
steve@UbuntuVM:~/mongo$ pwd /home/steve/mongo steve@UbuntuVM:~/mongo$ mkdir -p data logs steve@UbuntuVM:~/mongo$ mongo/bin/mongod --fork --dbpath /home/steve/mongo/data --logpath /home/steve/mongo/logs/mongoDB.log about to fork child process, waiting until server is ready for connections. forked process: 16471 all output going to: /home/steve/mongo/logs/mongoDB.log child process started successfully, parent exiting
The --dbpath option allows a custom data directory to be used (in our case, /home/steve/mongo/data). The --fork option tells MongoDB to run as a background daemon. If this command is not used, MongoDB will run in the foreground and a separate window must be used for other operations. However, if you do choose to use the --fork option, you must either supply the --syslog option (log MongoDB operations to syslog) or --logpath with your own custom logfile path and name. The --port command is also an option, but if unset the default connection port is 27017.
Now that we have installed MongoDB, we will attempt a connection via the CLI:
steve@UbuntuVM:~/mongo$ mongo/bin/mongo MongoDB shell version: 2.4.0-rc1-pre- connecting to: test Welcome to the MongoDB shell. For interactive help, type "help". For more comprehensive documentation, see http://docs.mongodb.org/ Questions? Try the support group http://groups.google.com/group/mongodb-user >
If we want to see which database we are currently connected to, we can issue the “db” command. You can also switch databases with the “use” command:
> db test > use mydb switched to db mydb
Note that the “mydb” database was never created and even though I’ve switched to it, doesn’t actually exist. You can switch to any database but until you create a collection inside it, it doesn’t actually exist except on your screen. However, if you want to see existing databases, you can use the “show dbs” command.
Creating a Collection and Inserting Data
I put these two topics in the same area because in MongoDB they are actually one and the same. There are no pre-defined tables in MongoDB but collections, a loose gathering of documents that are meant to be logically related in some way. For the sake of learning, you can consider a collection like a table, a document like a row, and a key:value pair like a column. But given the dynamic nature of MongoDB it can almost be dangerous to do so. For instance, in MongoDB a collection can have documents which share no keys in common (except the unique identifier _id key). You are not constrained by specifics and can freely dump nearly any data of any type into any collection. As you can imagine, this can be both good and bad depending on your development process, framework, and standards.
Here is an example MongoDB document:
{
name: 'Steve Karam'
database: 'MongoDB'
nickname: 'Mongo Alchemist'
really: false
}
But then here is an example of another document that can be part of the same collection:
{
name: { first: 'Steve', last: 'Karam' },
contact: [
{ type: 'blog', val: 'http://www.oraclealchemist.com' },
{ type: 'twitter', val: '@OracleAlchemist' } ]
}
This loose collection capability is both extremely intuitive and at the same time confusing for most relational oriented developers and DBAs. However, it grants us powerful capabilities for data storage/retrieval and complex aggregations.
So let’s create a simple collection with three rows and get the data from it (like a “select *”):
> db.employees.insert( { name: 'Bob', title: 'DBA', salary: 80000 } );
> db.employees.insert( { name: 'Jane', title: 'Developer', salary: 70000 } );
> db.employees.insert( { name: 'Kim', title: 'Manager', salary: 130000 } );
> db.employees.find();
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
{ "_id" : ObjectId("512b7674dfea6825c38b0b56"), "name" : "Jane", "title" : "Developer", "salary" : 70000 }
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "title" : "Manager", "salary" : 130000 }
For these three simple rows I decided to let MongoDB define the unique _id field, and also kept the elements the same (name, title, salary). However, adding a new element is as simple as updating a document:
> db.employees.update( { name: 'Kim' }, { $set: { office: '37L' } } );
> db.employees.find( { name: 'Kim' } );
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "office" : "37L", "salary" : 130000, "title" : "Manager" }
By searching for name = ‘Kim’ and using the $set operator to set a new element called ‘office’, the document has been updated and can be seen in the query. It should also be noted that while this mimics what we might do to update a row in a relational database, MongoDB is incredibly more powerful when it comes to data manipulation in-document. Basic operations like insert, update, delete, upsert, etc. are supported along with many more.
Remember how I said the ‘mydb’ database only existed on screen until we actually did something with it? By performing that first insert (Bob), we not only created the mydb database but the employees collection and its first row all in one step.
Indexing
Indexes are just as important in MongoDB as they are in any relational database. The rules for proper indexing performance are nearly identical as well. Columns used for searching or sorting are good candidates for index, with indexes sortable in ascending and descending order. Of course, MongoDB doesn’t stop there. In addition to standard b-tree and b-tree compound indexes it also allows 2d spatial indexes (for geohashed location queries), indexes on sub-documents, TTL indexes for document expiration pruning, and more.
Creating an index is as simple as using the ensureIndex command:
> db.employees.ensureIndex( { 'salary': 1 } );
In this example, an index was created on the “salary” element of the employees collection in ascending order. A value of -1 would have been descending order.
Note that because of the loose style in MongoDB, you can create an index on anything, even elements that don’t exist:
> db.employees.ensureIndex( { 'notexists': 1 } );
With the getIndexes() command we can see what indexes exist on a collection:
> db.employees.getIndexes();
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "mydb.employees",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"salary" : 1
},
"ns" : "mydb.employees",
"name" : "salary_1"
},
{
"v" : 1,
"key" : {
"notexists" : 1
},
"ns" : "mydb.employees",
"name" : "notexists_1"
}
]
Note that there are three: the default unique index on _id (required), an index on salary, and an index on notexists…even though it doesn’t exist at the moment as an element in any document. Since I don’t really need that index, I’ll drop it:
> db.employees.dropIndex( 'notexists_1' );
{ "nIndexesWas" : 3, "ok" : 1 }
Querying a Collection
This is a highly complex topic and we are not going to be able to cover every possible query that MongoDB can do…mostly because it can do nearly anything. Thanks to the group command (GROUP BY on steroids) for relatively simple aggregations with under 20,000 unique groupings and MapReduce (GROUP BY times infinity + 1) which can aggregate data in pretty much any method that can be coded, there is no limit to how you can query MongoDB. But for relatively simple queries, we use the find command:
> db.employees.find();
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
{ "_id" : ObjectId("512b7674dfea6825c38b0b56"), "name" : "Jane", "title" : "Developer", "salary" : 70000 }
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "office" : "37L", "salary" : 130000, "title" : "Manager" }
A find command with no arguments is simply like a “select *” on a table. It will return all documents with all elements. However, you can query for specific elements easily:
> db.employees.find( { office: '37L' } );
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "office" : "37L", "salary" : 130000, "title" : "Manager" }
> db.employees.find( { name: 'Jane' } );
{ "_id" : ObjectId("512b7674dfea6825c38b0b56"), "name" : "Jane", "title" : "Developer", "salary" : 70000 }
> db.employees.find( { title: /d.*/i } );
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
{ "_id" : ObjectId("512b7674dfea6825c38b0b56"), "name" : "Jane", "title" : "Developer", "salary" : 70000 }
Oh, did I mention you can use regular expressions? The incorporation of perl regular expressions makes the MongoDB engine extremely powerful. You can, of course, also use comparison operators:
> db.employees.find( { salary: { $gt: 75000 } } );
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "office" : "37L", "salary" : 130000, "title" : "Manager" }
> db.employees.find( { salary: { $gt: 75000, $lt: 85000 } } );
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
Again, the list of capabilities with querying in MongoDB goes on and on. Rather than dive into each of them, I do want to make sure our previous information on creating indexes actually did something. Let’s run the salary queries again, this time using explain plans. You can do this by running the explain() method after your find() query:
> db.employees.find( { salary: { $gt: 75000 } } ).explain();
{
"cursor" : "BtreeCursor salary_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"salary" : [
[
75000,
1.7976931348623157e+308
]
]
}
}
Notice that the query used the salary_1 b-tree index, and shows the manner in which it was used with “indexBounds”. If I had used a non-indexed query, you would have seen:
> db.exmployees.find( { name: 'Kim' } ).explain();
{
"cursor" : "BasicCursor",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
Notice no indexBounds are defined, and the cursor type is “BasicCursor”. This is more or less a full table scan. If you think it should have used an index, you can supply a hint where you pass in the index name:
> db.employees.find( { salary: { $gt: 75000, $lt: 85000 } } ).hint('salary_1');
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
Conclusion
As I’ve mentioned many times, there is a lot more to MongoDB than these simple operations. However, even knowing the basics can be invaluable in the event that you or your business decides it is the right tool for a job at some point. The things you saw here just scratch the surface of what MongoDB actually does once you incorporate sharding, GridFS, MapReduce, etc. As you learn and grow with MongoDB, keep the following points in mind:
- Don’t be afraid to try things out with MongoDB. It is very easy to get up and running, create multiple collections, databases, etc.
- MongoDB can be very unforgiving. Dropping all data in a collection is as simple as db.employees.remove(), and once it is gone it is gone.
- The documentation for MongoDB is incredible. You can find it all on their site.
- If you’re a book learner, MongoDB: The Definitive Guide
is an invaluable resource and is very easy to follow.
- MongoDB is not SQL, but there are some easy comparisons that may help you come to grips with it.
- Want to try MongoDB without ever even installing it? Here’s a mini Mongo implementation at MongoDB’s website.
See you soon for the next Database Diversity article!
The post Start Using MongoDB appeared first on Steve Karam :: The Oracle Alchemist.
The Overly Shared Pool

Ever been in a really crowded pool? I have… and chances are so have your queries. This article will go into a little bit of high level detail about the shared pool and a portion of what it does for us, then show a real “good pools gone bad” issue resolution. If you feel like the shared pool is something you’ve already done triple backflips with a jackknife for a 10/10 finish into and you’d rather just sit poolside and order an extravagant beverage while reading my client’s tale of woe, you can skip down a bit.
Parsing = Translation and Optimization
When someone walks up to you and says, “Hello, how are you doing?” your brain receives audio input from your ears in the form of the English language. This input is then ‘parsed’ by the cerebral cortex in a place called “Wernicke’s Area”. This section of the brain takes the audio information and breaks it down based on inflection, syntax, and other factors. Once the question registers and is ‘understood’, your brain is able to come up with a suitable response. It can either dig through short-term memory to piece together recent events and describe, in detail, exactly how you are doing; or more efficiently, it can offer up a quick instinctual response like “Good, you?”
Oracle parsing works much the same way. When a statement is given to Oracle in the form of Structured Query Language (SQL), the statement must first be understood by the Oracle Instance (the brain, as it were). The statement is checked for syntax and permissions, and then broken down to a hash value so Oracle can offer an optimal response. The optimization phase of the parse is most important, and is done via one of three methods: hard parse, soft parse, or session cached cursor.
- Hard Parse – Oracle searches for the statement in an area called the Library Cache, a subsection of the Shared Pool. If the statement is not found, Oracle invokes a tool called the Cost Based Optimizer (CBO) in order to come up with an optimal execution plan. This invocation is widely considered a very expensive process for CPU, just as a well thought out response to a question requires more brainpower.
- Soft Parse – Oracle searches for the statement in the Library Cache and finds it. This allows Oracle to re-use execution plans without having to re-optimize the query and is akin to responding “Good, you?” when asked how you are doing.
- Cached Cursors – Oracle notices that you have run the same statement over and over again, and caches the execution plan into your private memory so the Shared Pool is no longer burdened for your session. This is akin to someone asking you the same question over and over again, at which point the answer is immediately given with no real thought.
Hard Parses are a necessary evil in order to initially optimize a query. Soft parses are generally considered to be good, but can be problematic under heavy concurrency (explained later). Cached Cursor responses are great for performance, but require RAM resources granted explicitly via Oracle parameters or client/OCI settings.
Query executions go through three main phases:
- Parse – Translate and optimize the query
- Execute – Lay down the execution plan and run the query
- Fetch – Pull back data from Oracle objects based on the execution plan
It is important to note that most programming languages explicitly call each of the three phases. When a query is run, it is possible to parse the query (known as a parse call) and then not execute or fetch the data. Reusing parsed queries (parse once, execute many) is a highly recommended practice.
The Shared Pool
When queries are hard parsed, the execution plans generated by the CBO are placed in the Shared Pool; specifically, in a subsection called the Library Cache. If the same query comes in again, Oracle must simply search the Library Cache and return the stored execution plan (soft parse).
However, both hard and soft parses put strain on the Shared Pool. The Shared Pool is simply a chunk of RAM, and CPU must manage RAM. This means that if 100 users are simultaneously trying to access the Shared Pool, CPUs will be dedicated to these users to access the RAM. If there are more users than there are CPUs, a queue will form and the CPU will work until all the users are done.
Think of it as a trip to the grocery store. To take items from the store, a cashier must first process you. If there are 100 shoppers trying to check out and only two cashiers, a large queue will form and long wait times will ensue. The bottleneck in this case is cashiers, and you could even call the event “waiting on resource: cashiers”. We can fix this by adding more cashiers, thereby cutting down on the wait times overall despite high concurrency.
In Oracle, the items are RAM segments, the cashiers are CPUs, and the shoppers are users or sessions. When more sessions need chunks of RAM than there are CPUs, enqueues and waits occur which cause the overall accessibility to the database to degrade. Wait events such as “latch: library cache” and “cursor pin S wait on X” are Oracle’s internal names for various types of wait events, just as “waiting on the slow cashier with no bagger” is our name for a wait event at the grocery store.
Continuing on this analogy, let’s say we have 20 shopper and 10 cashiers. This type of queue is very reasonable and will be processed rather quickly. Now imagine we throw in 20 more shoppers who come in to buy the exact same three or four items every few minutes. Then imagine we bring in a few shoppers who load up their cart, get to the front of the line, and then decide they don’t want the items after all. On top of that, we throw in a few store employees who walk through the line demanding various sub-tasks on top of the normal cashier duties. This situation will fly into chaos, as anyone who has ever gone to Wal-Mart on a Saturday afternoon can tell you.
The same thing occurs on Oracle. The shared pool is required for parsing, for queries from dual, for sequences, and countless other components. There are sessions which come through over and over and over to run the same thing every time, sessions that make a parse call but don’t actually execute or fetch, sessions that bring in huge queries that take forever to parse, and so on. In addition, Oracle has its own control structures, sequences, views, and other components that must be handled behind the scene while all this is going on. And here we’ve only discussed the checkout process! Imagine if during all this the cashier also had to order custom or out-of-stock items on the fly from warehouses (akin to disk reads from storage).
However, the answer to this predicament is not always to add more CPUs. We can also cut down on this behavior by introducing better design and optimizations. For instance:
- Making the shared pool bigger allows more queries to be soft parsed instead of hard parsed, cutting down on CPU and allocating more available latches.
- Allowing repeat queries to parse in session memory instead of the shared pool reduces strain.
- Ensuring that queries parse only when absolutely necessary at the application level reduces parse calls and therefore contention on the shared pool.
- Not running meaningless queries like “select 1 from dual” to check for database availability cuts down on parse calls, not to mention hits to the dual table which is in the shared pool.
A Real Life Shared Pool Problem
I had a client who called me in due to an incredibly severe issue with the shared pool. They followed all the “rules”: they used bind variables religiously, avoided anonymous PL/SQL blocks, and set a decent value for session_cached_cursors. Yet the system would become unresponsive and the following waits kept occurring when concurrency got really heavy:
- latch: library cache
- row cache lock
- cursor pin S wait on X
- latch: shared pool
AWR/ASH and V$ data showed that soft parses were happening almost exclusively. But even though the system was properly caching and re-using SQL in the shared pool resulting in soft parse, the contention buildup against the shared pool was too much for the system to handle and it would come to a grinding halt.
In the end, we went through a massive effort to reduce usage on the shared pool (hard or soft). This was accomplished by removing unnecessary DUAL calls (the code was doing a select from DUAL on every page) and most importantly, setting up true client cursor caching in the JBoss application server.
First have a look at this bit of nastiness:
Elapsed: 59.50 (mins) DB Time: 16,227.94 (mins)
Remember that Average Active Sessions (AAS) is equal to DB Time over Elapsed Time. This gives us a good indication of wait/work load during the 60 minute window. As you can see, the AAS during this time was ~273, far too high for most any respectable system.
Looking at the Top 5 Timed Events, you can see that we’re abnormally tied up in concurrency wait:
Top 5 Timed Events Avg %Total ~~~~~~~~~~~~~~~~~~ wait Call Event Waits Time (s) (ms) Time Wait Class ------------------------------ ------------ ----------- ------ ------ ---------- latch: library cache 169,214 3,677,789 21735 377.7 Concurrenc row cache lock 31,355 60,136 1918 6.2 Concurrenc enq: TX - row lock contention 20,268 58,719 2897 6.0 Applicatio latch: shared pool 102,219 54,985 538 5.6 Concurrenc cursor: pin S wait on X 4,895,909 53,456 11 5.5 Concurrenc
There were reams of other statistics and information used to correctly identify the issue, but the key symptom could be seen right here in the AWR under the SQL ordered by Parse Calls section (query text has been removed to protect the guilty):
% Total
Parse Calls Executions Parses SQL Id
------------ ------------ --------- -------------
283,466 283,061 12.07 bd7nzg41xqmtu
Module: JDBC Thin Client
CONFIDENTIAL QUERY #1
204,990 204,987 8.73 bajtpw34f1h8n
Module: JDBC Thin Client
CONFIDENTIAL QUERY #2
166,868 166,870 7.11 avc1jqzz04wpr
SELECT 'x' FROM DUAL
108,239 108,239 4.61 4rg9kqppkwb2x
Module: JDBC Thin Client
CONFIDENTIAL QUERY #3
You should notice three things from the above info:
- The SELECT ‘x’ FROM DUAL query is definitely excessive, having occurred 46 times per second.
- More importantly, the Parse Calls match Executions 100% in every query
- There are a LOT of query executions
Interestingly enough conventional wisdom, advice, and even training always warns of the dangers of hard parsing vs. soft parsing, and vaguely mentions parse to execute ratio and how important it is, but for the most part the focus is given to the type of parse. But the case above shows that even soft parses aren’t enough with massive concurrency, an overabundance of small queries, and meticulous bind variable practices combined to create a huge case of shared pool contention with ~99% soft parses.
The application software in use was JBoss which has a feature called the Prepared Statement Cache. Since querying in JBoss is done by setting up a Prepared Statement and running it against the DB, the Prepared Statement Cache allowed JBoss to dedicate memory to keep the cursor open and reusable following a successful parse/execute/fetch. Even with all the bind variables in the world, we couldn’t “Parse Once, Execute Many” without the client keeping the parse cursor open. Take a look what happened when we found the root cause of the issue in the application server settings and corrected it (a value of 15 for prepared-statement-cache-size worked nicely):
% Total
Parse Calls Executions Parses SQL Id
------------ ------------ --------- -------------
13,481 66,349 2.32 8g5h090r1uxmc
Module: JDBC Thin Client
CONFIDENTIAL QUERY #1
11,346 104,237 1.95 5nz8u1h2qyr92
Module: JDBC Thin Client
CONFIDENTIAL QUERY #2
11,083 63,029 1.90 78xz4j1g270uw
Module: JDBC Thin Client
CONFIDENTIAL QUERY #3
10,663 35,397 1.83 02s458bkx4z4a
Module: JDBC Thin Client
CONFIDENTIAL QUERY #4
Notice that the parse to execute ratio was improved dramatically. We also got rid of the DUAL queries. Consequently, the latch issue went away never to return. The shared pool operates at about 10% of the wait it used to incur and never hits that breaking point where performance degrades exponentially.
The final lessons:
- Soft parses are preferable to hard parses in an OLTP application where query stability is preferred (though auto-gathered histograms sometimes messed this up in 10g before Adaptive Cursor Sharing was introduced in 11g)
- Bind variables can and should be used to improve plan reuse and reduce hard parses (CURSOR_SHARING is something of an alternative, but not preferred to actual bind variable usage in the code)
- To actually lower your parse to execute ratio, bind variables aren’t always enough. Your client needs to actually be configured to take advantage of them. This one doesn’t seem to make it into all the how-tos or best practices, but it should.
The post The Overly Shared Pool appeared first on Steve Karam :: The Oracle Alchemist.
DBA, Grow Thyself – Moving and Shaking in the Era of Data Dominance
This entry is part 3 of 4 in the series Grow Your Career- The best IT career book I’ve ever owned
- DBA Team, Front and Center
- DBA, Grow Thyself – Moving and Shaking in the Era of Data Dominance
- Deploying Outside the Box
Earlier this week I wrote an article about the role of DBA teams in the business today, focusing on advice for the DBA manager (though it is also very useful for DBAs as well). Today I want to focus on the DBA specifically with some points which, in my experience, are highly important at the business level and possibly unknown at the technical IT employee level.
Development/QA – It’s the New ProductionI remember years ago, I made it a personal mission to tell every DBA I worked with that backups should be treated like production. Even then it was such a novel concept to some people; who knew that the bundle of files that may one day save my career and company might actually be a production environment? Well certainly not the people who lost their jobs due to lack of backup. Ye sorry souls.
Nowadays that should be common knowledge and a common phrase, harkening back to the days when Russian roulette was played by DBAs and administrators for lack of disk space or free cycles. However, a new paradigm has arisen which is in many ways difficult to support but wholly necessary: development and QA are just as important as a production environment. In fact it’s not really a new paradigm at all, it’s just being enforced more and more.
To a company, being able to reliably push out software updates, new products, roll out new features or content, fix bugs, and provide solid user acceptance testing is paramount. It increases both user and executive confidence in the business and IT as a whole, and it creates fresh sources of revenue and opportunity. The last thing any DBA wants is to be the bottleneck in that process. In all areas of development from a database perspective – creation of development/QA databases, refreshing from production, providing viable statistics and metrics, monitoring, and backup and recovery – the DBA must be as active and safe as they would be in production. At some companies, even development environments have change request procedures that must be followed by the DBA or developers.
Let’s face it, DevOps is taking off and businesses know it. If the DBA (and system administration) groups can’t keep up with fast paced frameworks and QA requirements while keeping all aspects of development moving smoothly and safely, then they will be consigned to the “doesn’t have a sense of urgency” bin. The business will see the database as just a part of the software stack (and rightly so), and will not understand the reasoning behind a bottleneck however logical it may seem.
That’s why tools like Delphix are taking off. It’s not another monitoring tool or performance management tool…it’s a storage agnostic thin provisioning layer for databases allowing fast deployment and refresh of QA and development environments through virtualization. They are experiencing huge growth by reducing the dev/QA footprint and speeding up release schedules. I predict that in the very near future, a bottleneck to provisioning development/QA/reporting databases despite incredible technology support will be viewed as highly unacceptable.
Lobby for control over new technologyIf it’s data centric, own it. I know this may be a controversial one because of the serious lack of resources some DBA teams already encounter, but if at all possible the DBA should be keeping up on modern trends and embracing them. Like it or not, at some point your business will bring up the idea of using MongoDB, Hadoop, MySQL, or a wealth of non-Oracle technologies. While you need to make sure they are using the right tools for the job (like not replacing every single Oracle database with MongoDB just to do it), at some point the adoption of new technologies will be inevitable. And whether you feel those technologies are fads, useless, or boring you will need to keep up on them to stay relevant.
Take Hadoop, for example. It’s not a database, it’s a filesystem. The data stored on it is as dirty as it is plentiful, the environment is spread out over dozens if not hundreds of nodes, and doing anything meaningful with it requires coding or complex tools/layers. Yet if your company is not going to hire an experienced Hadoop Administrator, it’s a great job for the DBA. Gwen Shapira makes a great note of this, arguing that the DBAs experience with complex tuning requirements, data warehouses, and developers makes us a very suitable candidate.
Honestly, I wouldn’t say you need to become an expert in every data technology out there. There are just too many. But try to remain open and somewhat versed in the environments. And if your company starts moving toward one, be ready to jump at the opportunity (should it be logical) and take part in it. If possible, own it.
Sometimes it is as simple as saying that you want it. When new work comes down the pipe it is very common for everyone to take a step back; if the project is data centric, you should be taking a step forward. The goal is to make you undisputed heavyweight data champion of the world. New technologies mean new opportunity both inside and outside your company. Even niche products can look really great on your resume later down the line. If you can’t simply take over a technology, then lobby for it. Research and discuss it with developers, business analysts, and project managers. With just a little bit of conversation and a few emails, it isn’t hard to convince people that it is in their best interests to hand the data management to the DBA no matter what form it is in.
Speak to your target audienceOne of my favorite reddit subs (don’t judge) is r/explainlikeimfive (ELI5 for brevity). The idea is to take a complex idea and pound it down until it can be explained to a five year old. Now I’m not advocating that a DBA talk to everyone as if they were a child (though I know many IT professionals who do so), but more advocating that you speak appropriately to your target audience. Many of us already do this. But I can’t count the number of times I’ve seen an IT professional stand in front of a business analyst spewing out details about kernels and kdumps and semaphores and concurrency while the analysts eyes glazed over faster than a piping hot Krispy Kreme donut. Like it or not, there will be people at your company that don’t understand parameter settings and b-tree indexes. Trust me, they have their own language that you probably wouldn’t grok too quickly either (see what I did there?).
Just like learning different technologies, learn different business dialects. If not the verbiage, at least learn what interests them and their level of technical understanding. Metaphor is a great vehicle for this. I’ve had great success selling ideas or projects when I used metaphors that interested the person in question; sports metaphors for sports types, car metaphors for car types. Just make sure your metaphors make sense… don’t be that person everyone will one day giggle about for comparing improving database performance to the running back scoring a home run.
So when you talk to project managers, talk in terms of milestones and deadlines. For business analysts, talk about goals and roadmaps. For VPs, talk about objectives and projects. Notice every single one of these examples has to do with things getting done, not the process involved. That’s right! For the most part, they don’t care how it is done as long as it is done quickly, efficiently, and correctly. Leave the hidden parameter discussions for your technical brethren and your final report.
Of course, your mileage may vary. If you work for a company where even the CEO has written kernel code, you might look like a fool if you try to coat over the technical details. Use your gut, but keep a critical eye on your target audience before you flood them with details.
Above all, keep your ear to the ground and your mind open. Data is fast moving out of the cubicle and into the boardroom. If you want to improve your impact and influence, be ready to go along for the ride.
The post DBA, Grow Thyself – Moving and Shaking in the Era of Data Dominance appeared first on Steve Karam :: The Oracle Alchemist.
DBA Team, Front and Center
This entry is part 2 of 4 in the series Grow Your Career- The best IT career book I’ve ever owned
- DBA Team, Front and Center
- DBA, Grow Thyself – Moving and Shaking in the Era of Data Dominance
- Deploying Outside the Box
Let’s start by defining data. In his book Programming the Universe: A Quantum Computer Scientist Takes on the Cosmos, Seth Lloyd says the simplest unit of information is a bit; a single true or false. Add those bits together and you’ve got some meaningful data. Add that data up and you’ve got business logic. Collect, aggregate, and you’ve got corporate knowledge. Companies know that the data they collect is their memory. By collecting it, they build a brain for the organization with streams of data running through the business like a nerve center.
Now let’s talk teams. Different companies do it different ways, but generally speaking you’ve got server/storage/system admins, DBAs, developers, and QA. Developers are generally fairly close to the business; through analysts or project managers they take the high level requirements and turn it into applications. As such (and as many DBAs may notice) they have a lot of pull with the business and in some cases are more trusted as technical experts than the DBA team. QA teams are also usually grouped either close to the development team or close to the project manager/business analysts. Behind the scenes you’ve got your DBAs and system administrators; at times stereotypically seen as a crew with big heads and bigger egos, making cowboy changes in between rounds of video games.
Data is one of the most critical corporate assets and is at the forefront of modern technology requirements. Its distribution and management is an integral process for a company as a whole and for nearly every project that comes down the pipe. Yet at many companies (certainly not all), the DBA team is seen at best as a final dumping ground for project requirements and at worst as an obstacle. Either way, the DBA team is a custodian of the data, central to every IT group and the business itself. We manage over environments that are perceived as shadowy and obfuscated, highly complex and prone to failure.
If that is the case for your team, it needs to change. The best way to do that is by taking a close look at the goals of the DBA team.
Know your client and make sure they know youMany of the non-consulting DBAs I speak with describe their job as a support role to developers. This is a dangerous stance to take. When I think of a ‘support role’ I think of a person sitting in their cubicle incessantly starting up and shutting down databases, taking action only when prodded by the development team or management. Unfortunately, this describes a person who is highly expendable.
As DBA manager your team should of course support the organization, but less like support staff and more like consultants. Your client is the business. When a project comes down the pipe, make sure that it doesn’t land on your desk as requirements. Instead, try to be part of the architectural process. Play a part in making the requirements. Make sure your client, the business, sees your team as a valuable resource, crack specialists who understand their content and their needs. Only with your team’s leadership and help can that data be managed and served in the most efficient manner possible.
Think of it like an Oracle environment. What is a database? Just files on disk, capable of nothing. They store information without description, data without goals. Only by starting an instance and mounting that database does the data begin to flow and take on meaningful purpose. Your team should be the instance. You have the knowledge, the processes, and the experience with the database that your client needs to harness.
Above all, make sure each of the DBAs on the team feel that empowerment. Taking in requirements and passing them down to DBAs leads to rote work and a sense of uselessness. Instead, encourage the DBA team to interface directly with the business as technical leads. Your team will end up owning the issues if a query performs slow or data is lost anyways. So make sure they own the process and productivity required to bring a project to fruition as well. Help them become invaluable to their client.
The problem may not be in the database, but the database is part of the applicationIf I had a Flainian Pobble Bead for every time I’ve heard a DBA say “It’s not a database problem” I’d be a hyperrich man. These words are generally uttered when a problem escalates to the point where people are scrambling to find an issue with slowness. Other phrases you might hear around the same time are “it worked in QA,” “that code has been there for X years,” and “maybe you should increase the SGA.”
Yes, the problem could very well be rising concurrency or data volume. Yes, queries might be dynamically generated and resolving sub-optimally. And yes, some ad hoc query might be eating up your processes and RAM. Absolutely dig for that root cause and absolutely report on it. But not as a way to shift blame; instead, do it as a way to own the solution.
Look at the problem from the client’s eyes: a query may be written by developers, but it is run against a database. An index might be missing, but it’s missing from the database. The server might be out of resources or encountering slow disk reads, but it’s manifesting through the database. Yes, Virginia, it is a database problem.
While this puts you in a bad position when things go sour, it also gives you an outstanding opportunity to own both the problem and the solution. Even if there is nothing you can do about it, your team can express their mastery over their domain by assisting with diagnosis of the problem and delegating tasks and research questions to the appropriate teams. The DBA team is not at the bottom of the development ladder, but at the center of it. Projecting that image both when times are good and when times are bad is very rewarding to the business and your team.
Focus on solutionsDBAs are a fairly vocal bunch, and at times critical of other groups. Because we are the hub of data management in a company we get to see every bottleneck along the path to production or problem resolution. And, quite frankly, this can make some DBAs a little jaded. And said DBAs might be inclined to discuss it. Publicly. In meetings.
Work with your team to focus on solutions, not problems. Sitting in a meeting discussing the reasons why something happened are not productive, particularly when it is already known where the root cause lies. This one seems very obvious, but I see it happen on a daily basis. Some people just need to complain about their aches and pains. Instead, they should be offering the elixir.
A problem should be described once. A solution should be pored over and evaluated constantly. Work with your team to focus on talking about what can be done, not what happened.
ConclusionData is a complex topic, and has made its way to the forefront of the minds behind your business. As the team who architects, maintains, protects, optimizes, and coordinates the flow of data the DBA team has the ability to play a powerful central role in product development and management. If your team has been relegated to a backend support role, asserting your team’s knowledge of business needs surrounding data can be a huge boon to both your position and that of your DBAs.
The post DBA Team, Front and Center appeared first on Steve Karam :: The Oracle Alchemist.
5 quick and dirty Linux tips you may not know

Remember that day you were sitting at a friendly bash$ prompt and discovered tab completion? It was like a choir of angels came down from the heavens and sang Hallelujah, the Shell Hath Come (yes, that’s a bourne again reference). While these 5 tips may not be that monumental, they are still things I wouldn’t be able to do without… perhaps they will help you as well.
1. Grep without the grepI frequently see a lot of people running grep commands like this: ps -ef | grep pmon | grep -v grep
Since the ‘ps’ command is a process list, it unfortunately will not only return the pmon process you are looking for but the ‘grep pmon’ command you just ran.
[skaram@server2 ~]$ ps -ef | grep pmon skaram 15755 15704 0 11:35 pts/0 00:00:00 grep pmon oracle 31151 1 0 2012 ? 23:38:20 ora_pmon_ORCL
It is obnoxious, but easy to workaround. However, the ‘grep -v’ command is the long way. Here is the short way around the issue:
ps -ef | grep [p]mon
Simply wrap the first character of the search term in square brackets, and you are good to go.
[skaram@server2 ~]$ ps -ef | grep [p]mon oracle 31151 1 0 2012 ? 23:38:20 ora_pmon_ORCL2. Kill them all
Have you ever needed to, say, kill all LOCAL=NO processes on an Oracle server? Yeah, me neither. But if you did, not saying that I have, this is the kind of command I would use:
kill -9 `ps -ef | grep -i [l]ocal=no | awk '{print $2}'`
DISCLAIMER: DO NOT DO THAT ON A PRODUCTION SYSTEM.
3. Space hogsThis one is probably a little better known, but if you need to find the top space users in a directory just run the following command:
du -k | sort -n
[root@220883 log]# du -k | sort -n 4 ./conman 4 ./conman.old 4 ./cups 4 ./news/OLD 4 ./pm 4 ./ppp 4 ./samba 4 ./squid 4 ./vbox 8 ./mail 8 ./news 52 ./prelink 2864 ./directadmin 4940 ./httpd/domains 7000 ./httpd 15576 ./proftpd 20292 ./audit 44856 ./exim 216272 .
Note that the output number will be in kilobytes.
4. Getting the magic backDoes your job or client require you to use sudo in order to log into a database server as Oracle? Depending on the way you get X Windows going, that may have caused you grief in the past (particularly with installing Oracle). But you can easily grab your auth data and transfer it when you sudo to a new user like so:
sma11x:~ steve$ ssh -X skaram@server2 skaram@server2's password: [skaram@server2 ~]$ echo $DISPLAY localhost:10.0 [skaram@server2 ~]$ xclock
And you get the clock, right? But you need to be oracle to do an install. Before you do that sudo command, type xauth list
[skaram@server2 ~]$ xauth list server2/unix:10 MIT-MAGIC-COOKIE-1 c183a5016b775c9692b2da6b31552dae
When I sudo to oracle, notice xclock doesn’t work:
[skaram@server2 ~]$ sudo -u oracle -H bash Password: bash-3.2$ xclock X11 connection rejected because of wrong authentication. X connection to localhost:10.0 broken (explicit kill or server shutdown).
But I can add back my authorization by using xauth add and pasting the entire line from the xauth list output.
bash-3.2$ xauth add server2/unix:10 MIT-MAGIC-COOKIE-1 c183a5016b775c9692b2da6b31552dae xauth: creating new authority file /export/home/oracle/.Xauthority bash-3.2$ xclock
Doing so will make X it work once again, but now as the oracle user.
5. Screen your sessionsThis one was a godsend. Have you ever been working on a long running operation and you lost your connection to Oracle, or had to go home, or god knows what else? It is a horrible pain and one that really messes up a lot if it gets you at the wrong time. That should be a thing of the past if you use screen.
Screen is a multiplexed terminal, allowing you to spawn multiple terminals in a single terminal session. Opening a new ‘screen’ is simple; simply type the word screen at a Linux command line:
[skaram@server2 ~]$ echo HELLO! HELLO! [skaram@server2 ~]$ screen
Notice your terminal clears and you start at a new prompt. Now I will go ahead and make it wait for input.
[skaram@server2 ~]$ read -p "Press Enter" Press Enter
If I press CTRL+A, then CTRL+D, my screen will detach, putting me back on the ‘parent’ terminal session:
[skaram@server2 ~]$ [skaram@server2 ~]$ [skaram@server2 ~]$ [skaram@server2 ~]$ [skaram@server2 ~]$ echo HELLO! HELLO! [skaram@server2 ~]$ screen [detached] [skaram@server2 ~]$
If I want to get back to the child screen, I can type screen -rx to re-attach. Or start a new terminal with a fresh ‘screen’ command. You can even detach from multiple terminals and connect to the one of your choosing:
[skaram@server2 ~]$ screen -list There are screens on: 10913.pts-2.server2 (Detached) 10883.pts-2.server2 (Detached) 9700.pts-2.server2 (Detached) 3 Sockets in /var/run/screen/S-skaram. [skaram@server2 ~]$ screen -rx 10883.pts-2.server2
But wait, there’s more! If you and a buddy, coworker, consultant, engineer, parole officer, etc. want to share the same screen, you can login as the same user via SSH and then both type screen -rx. You will both join the multiplexed session (screen) and be able to type and see what each other is typing. It is awesome for following along, mentoring, etc.
Good luck with these, and don’t do anything dangerous on production. If you have any of your own tips to share, feel free to leave a comment!
The post 5 quick and dirty Linux tips you may not know appeared first on Steve Karam :: The Oracle Alchemist.
blobFS – Imported an Oracle dump from a BLOB using a FUSE filesystem

For those who have not been following, I reported previously that I had created a query-based filesystem with node.js, fuse4js, FUSE, and Oracle. This filesystem is mountable by any OS user and uses queries against Oracle tables as its source instead of disk. In this specific case, a table containing filenames and BLOBs was used as the source; the FUSE filesystem then presented the BLOB contents as a file to the OS. No actual files were written.
Curiosity led me to try a new test. What if I:
- Created a table and populated it
- Exported the table with ‘exp’
- Dropped the new table
- Loaded the exp dump file into my BLOB table
- Mounted the BLOB table as a filesystem
- Imported the table from the BLOB filesystem
I executed the test this morning, and it went incredibly smoothly. The exp dump file was easily stored as a BLOB, and ‘imp’ was able to read the file without any issue when it was translated to filesystem from the select query.
Set It UpTable Creation
SQL> create table test_import as select * from dba_users;
Table created.
SQL> select count(*) from test_import;
COUNT(*)
----------
13
Exporting the Table
steve@UbuntuVM:/var/www/blobFS$ exp steve/steve file=test_import.dmp tables=test_import Export: Release 11.2.0.2.0 - Production on Wed Feb 13 09:47:47 2013 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. Connected to: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production Export done in US7ASCII character set and AL16UTF16 NCHAR character set server uses AL32UTF8 character set (possible charset conversion) About to export specified tables via Conventional Path ... . . exporting table TEST_IMPORT 13 rows exported Export terminated successfully without warnings.
Load the Dumpfile Into BLOBTAB then Drop the Exported Table
steve@UbuntuVM:/var/www/blobFS$ php loadfile.php test_import.dmp
SQL> select * from blobtab;
FILENAME
--------------------------------------------------------------------------------
FILECONTENT
--------------------------------------------------------------------------------
test_import.dmp
0300014558504F52543A5631312E30322E30300A4453544556450A525441424C45530A383139320A
300A37320A300A0001036907D000010000000000000000000F002020202020202020202020202020
SQL> drop table test_import;
Table dropped.
SQL> select count(*) from test_import;
select count(*) from test_import
*
ERROR at line 1:
ORA-00942: table or view does not exist
Enter blobFS
Starting the Filesystem
steve@UbuntuVM:/var/www/blobFS$ node jsonFS.js mnt Mount point: mnt File system started at mnt To stop it, type this in another shell: fusermount -u mnt steve@UbuntuVM:/var/www/blobFS$ cd mnt steve@UbuntuVM:/var/www/blobFS/mnt$ ls -ltr total 0 -rwxr-xr-x 0 steve steve 21848 Dec 31 1969 test_import.dmp
Let’s Import!
steve@UbuntuVM:/var/www/blobFS/mnt$ imp steve/steve file=test_import.dmp fromuser=steve touser=steve Import: Release 11.2.0.2.0 - Production on Wed Feb 13 09:50:31 2013 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. Connected to: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production Export file created by EXPORT:V11.02.00 via conventional path import done in US7ASCII character set and AL16UTF16 NCHAR character set import server uses AL32UTF8 character set (possible charset conversion) . importing STEVE's objects into STEVE . . importing table "TEST_IMPORT" 13 rows imported Import terminated successfully without warnings.
The Moment of Truth
SQL> select count(*) from test_import;
COUNT(*)
----------
13
Conclusion
Even though this test is not much different from my last one in that I loaded files into BLOBs then mounted the FUSE filesystem to manifest them as files to the OS, but it’s awesome that I was able to use it for an Oracle-specific tool. Very meta, as they say.
I wonder if I can run an Oracle instance purely from BLOB-stored datafiles? Hmmm…
UpdateKevin Closson provided some links from his own blog which detail DBFS, Oracle’s implementation of FUSE (Note, I’d like to claim I did it before Oracle). Thanks for the great information, Kevin.
Also adding a test done by Ronald Rood, a full RW database created inside DBFS. Now I definitely want to get this working in my homebrew version.
The post blobFS – Imported an Oracle dump from a BLOB using a FUSE filesystem appeared first on Steve Karam :: The Oracle Alchemist.
Replication Platform Comparison

Based on my experience.

Hope this helps.
The post Replication Platform Comparison appeared first on Steve Karam :: The Oracle Alchemist.
What’s the diff?

If you’re a DBA on Enterprise Edition with the Diagnostics Pack and you don’t use the Automatic Workload Repository (AWR) reports, you are seriously missing out. AWR is the holy grail of snapshot-based metrics at the instance level. There are a ton of resources out there covering AWR, so I won’t be getting into much detail about standard run-of-the-mill AWR reports. Suffice it to say that:
- AWR reports are an outstanding starting point for delving into bottlenecks during a single snapshot window
- They provide metrics for all major components of the Oracle instance
- Database level metrics are also included for I/O times
- Query details and statistics are also included for the snapshot window being viewed
Sometimes it is enough just to look at the DB Time and Top 5 Timed Events (remember, DB Time / Elapsed Time = Average Active Sessions, a great indicator of load for the snapshot window). Other times, you have to delve a little deeper to get a good view of what the instance/database were up to for the duration of the snapshot. And sometimes, that’s not enough; what about when you need to compare it to another time period?
I can’t tell you how many times I’ve walked into a customer’s meeting room to find stacks of AWR reports side by side, ready for comparison. While that is definitely helpful (and sometimes necessary for a long running issue that is difficult to trend), it is often forgotten that Oracle has a built in tool right in the rdbms/admin directory which can compare snapshot windows for you: the AWR Diff Report (awrddrpt). Like AWR, it can be run via PL/SQL or from the OS command line. I used the latter case with:
SQL> @?/rdbms/admin/awrddrpt
Like AWR, you have to choose a snapshot window, except you do it twice. You pick a first begin/end snapshot pair, then a second begin/end snapshot pair. For instance, you could compare last Tuesday from 7-9AM against this Tuesday from 7-9AM.
Specify the First Pair of Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 13747
First Begin Snapshot Id specified: 13747
Enter value for end_snap: 13749
First End Snapshot Id specified: 13749
Specify the Second Pair of Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap2: 13915
Second Begin Snapshot Id specified: 13915
Enter value for end_snap2: 13917
Second End Snapshot Id specified: 13917
Once you have generated the report (you can choose between text and html like a standard AWR report). The final result is formatted like an AWR report (though much wider), with the same major sections. However, each section will report on the 1st time window, the 2nd time window, the metric delta, and the % difference. You can truly say that your I/O was x% higher or lower this week than last week, or that a wait event had double the impact. For instance, here is an image comparing the top 5 timed events:
Note that it reads like a combination AWR and diff report; items present in one time window but not in the other are shown, and items present in both time windows have deltas for comparison. SQL Metrics have the same capability, so you can compare executions, time, and other key metrics:
You can also compare I/O metrics:
And for the “what changed?” crowd, compare initialization parameters:
You can see an example AWR diff report here. Note that object names have been changed and query text removed to protect the innocent.
If you are not using AWR diff reports, I highly suggest you make your terminal a little wider and give them a shot. For viewing them, I’d definitely recommend HTML or Notepad with word wrapping turned off.
The post What’s the diff? appeared first on Steve Karam :: The Oracle Alchemist.
A BLOB Filesystem built with FUSE, node.js, and fuse4js

Here’s one for the “neato” category. This weekend I decided to try making a filesystem with FUSE (Filesystem in Userspace), a package which allows mountable filesystems to be customized and modified with a source other than disk. The data can come from anywhere, and it will take the traditional appearance of directories and files on the system where it is mounted.
So for this one, I decided to make a quick filesystem which uses an Oracle table containing BLOB data as its source. While FUSE has many language bindings (C, C++, Java, Python, Ruby, etc) I wanted to go with something a little more trendy, so I decided on JavaScript.
Proof of emptiness:
steve@UbuntuVM:/var/www/blobFS$ cd mnt steve@UbuntuVM:/var/www/blobFS/mnt$ ls -ltr total 0
The BLOB table (filename, filecontent):
SQL> select * from blobtab; FILENAME -------------------------------------------------------------------------------- FILECONTENT -------------------------------------------------------------------------------- pip.jpg FFD8FFE000104A46494600010100000100010000FFFE003B43524541544F523A2067642D6A706567 2076312E3020287573696E6720494A47204A50454720763632292C207175616C697479203D203930 40.jpg FFD8FFE000104A46494600010100000100010000FFDB004300030202020202030202020303030304 060404040404080606050609080A0A090809090A0C0F0C0A0B0E0B09090D110D0E0F101011100A0C wow.docx FILENAME -------------------------------------------------------------------------------- FILECONTENT -------------------------------------------------------------------------------- 504B030414000600080000002100E95110B08D010000C2050000130008025B436F6E74656E745F54 797065735D2E786D6C20A2040228A000020000000000000000000000000000000000000000000000
Mounting the Filesystem with node:
steve@UbuntuVM:/var/www/blobFS$ node jsonFS.js mnt/ Mount point: mnt/ File system started at mnt/ To stop it, type this in another shell: fusermount -u mnt/
And there they are (wow those are old)!
steve@UbuntuVM:/var/www/blobFS$ cd mnt steve@UbuntuVM:/var/www/blobFS/mnt$ ls -ltr total 0 -rwxr-xr-x 0 steve steve 77584 Dec 31 1969 wow.docx -rwxr-xr-x 0 steve steve 132528 Dec 31 1969 pip.jpg -rwxr-xr-x 0 steve steve 57376 Dec 31 1969 40.jpg
Now let’s make sure they are readable:

Thanks PIP-Boy!
Sweet! A few cool things about this implementation:
- You can use commands like ‘cp’ to copy the file out of the FUSE mounted location and onto a normal disk based filesystem for quick BLOB offloading.
- Filesystems can be mounted by a non-root user.
- It uses node.js for mounting the filesystem and retrieves JSON from a PHP server-side component that queries the database, meaning this filesystem can be mounted using remote data.
- Except for the small amount of C++ changes I had to make to fuse4js, the bulk of the programming is in Javascript which makes it fairly easy for developers.
- You can use any node.js modules to extend it, or even jQuery or other frameworks.
In order to make this madness a few components were required. The first main one was an Ubuntu Server 12.10 Quantal Quetzal VM with Apache 2, PHP5 (with OCI8), and Oracle XE 11.2.0.2 installed. I also installed node.js (downloading from the site seemed a little easier to work with than getting it from Ubuntu’s apt repo) for server-side Javascript functions. Node.js is an awesome modular package which allows event-driven, scalable HTTP servers to be created and started using pure JavaScript. It uses a tool called npm to download and compile new modules which can include anything from client requests, jQuery for easy AJAX calls, filesystem modification, socket management, and tons more. In my case, I used fuse4js, a github project from VMWare Labs which I was able to modify and expand a bit.
Overall Components:
- Oracle 11.2.0.2 XE
- PHP5 w/ OCI8
- PHP script called blobFSServer.php – queries the BLOB table in Oracle and emits a base64 encoded JSON array with files and content
- node.js with fuse4js and request modules installed via npm
- Modified the fuse4js code a bit for arguments
- Expanded on the jsonFS.js example to use HTTP requests to the PHP server for pulling JSON from BLOB data, modified arguments and some basic parsing code
Now that the basics are done (mounting, listing, reading) I plan to add the ability to cp/mv a binary file into the FUSE mount to write the BLOB data into the database. I would also like to separate out the JSON used for listing/getattr (just filenames and attributes) and actual file content (BLOB data) since returning a huge JSON array with all file information and content is just not going to work with bigger data/more rows. One caveat of FUSE is that it has strict user control at the OS level by default; while I can set the USER_ALLOW_OTHER flag in /etc/fuse.conf, I still have to find where in fuse4js I can set the ALLOW_OTHER flag so Apache doesn’t have to run as the same user that mounted the filesystem. Once all of that is wrapped up, I would like to incorporate subdirectories, better file management, and actual generic code (there’s a lot of hard coding in my prototype thus far). If people are interested, I’d consider making a github project or something for this.
All in all, FUSE is a really awesome piece of kit. The filesystem is low level enough that it plays a part in everything a developer, DBA, sysadmin, etc. does and being able to source that component from high level hierarchies/content stores (databases, web pages, big data, etc) has fascinating possibilities. From an Oracle database, a few ideas (other than BLOBs) might be XML files of table data generated from DBMS_XMLGEN (particularly if it queries every time the files are read), a directory/file structure built from the data dictionary, or a /proc like filesystem with low level data pulled from V$ and X$ views.
The post A BLOB Filesystem built with FUSE, node.js, and fuse4js appeared first on Steve Karam :: The Oracle Alchemist.
Just how big is your data?
For the want of a nail the shoe was lost, For the want of a shoe the horse was lost, For the want of a horse the rider was lost, For the want of a rider the battle was lost, For the want of a battle the kingdom was lost, And all for the want of a horseshoe-nail. – Benjamin Franklin
A while back (2007 to be exact, an eternity in Internet years), Google released a product called Google 411. You could call either 1-800-GOOG-411 or 1-877-GOOG-411 and search for businesses by city and state, category, or other criteria. It was a direct competitor to the local expensive 411 services, and it was completely free.
I remember at the time a lot of people I worked with scoffed at Google. It can’t be monetized, they said. Just like people used to say about Google’s 95% blank home page. What could they possibly have to gain from launching a free 411 phone service, obviously at some expense?
The answer is of course data. Google 411 has been shut down since 2010, but while it was around Google was able to harvest millions of examples of dialects, speech patterns, and vocal nuances. By launching a free service to be used and enjoyed by everyone, they were harvesting valuable data into a phoneme database that would play a key role in audio indexing.
To me, this is a fine example of how big data knows no bounds. For some companies big data isn’t just about the data they feel they may need, but the data they may never need. It’s collecting both the signal and the noise because even the noise is valuable. Even small inconsequential bits of log data could one day save your company as Ben Franklin’s horseshoe-nail saves the kingdom. And you’ll never know unless you collect it.
Think of it this way: every time a log file ages out, or you delete/truncate a table, every time a single bit is wiped away a piece of information is lost. Who knows what that information may have taught us once combined and visualized? Lost information is lost measurement returning whatever details it described to an unknown state. And what is unknown to your data is unknown to your company. Data is corporate memory. Even at the very least, storing and tracking data could help prepare your company for an uncertain future.
So should I just save everything?
Well no, that would be crazy. If you haven’t noticed I get a little overzealous when it comes to the possibilities of mass storage and correlation of data. But that’s on a futuristic and unrealistic level. At the corporate level storing every single bit of user, server, network, customer, and business data and retaining it ad infinitum could easily number in the petabytes. Good luck with that MapReduce at realistic prices. Author’s Note: if you’re reading this and you work at Facebook, great job on those petabytes.
Space is big. You just won’t believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it’s a long way down the road to the chemist’s, but that’s just peanuts to space. – Douglas Adams, Hitchhiker’s Guide to the GalaxyIt’s called Big Data, not Biggest Data. What is big data to your company might be peanuts to someone else (like Google and their phonobase). If you decide a foray into big data is worth it to your company and if you can assemble the team, the tools, and the information necessary to get it going the decisions on how far you take it are up to you. You get to pick what to keep and what to discard, how long to keep it, and what you hope to learn from it. If you’re not getting the results you hoped for then you can broaden your horizons or hunker down and keep storing data.
With the awesome compute power of software/hardware like Hadoop, Hive, HBase, MongoDB, Exadata, Teradata, Netezza, GreenPlum, etc, and the ‘resurgence’ of data visualization and statistical computing through commercialization the beauty of Big Data is that it can be as small or big as you want it to be.
The post Just how big is your data? appeared first on Steve Karam :: The Oracle Alchemist.
%The fabulous YAML

-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">
---
The Problem: Moving data outside of a database offers multiple options,
each with their own pros and cons.
The Solutions:
XML (Extensible Markup Language):
What it is: A markup language designed for full featured
data exchange
Info: http://en.wikibooks.org/wiki/XML_-_Managing_Data_Exchange
Provides:
- a schema to provide structure and definitions for your document
- markup to describe your content
- broad acceptance in the business and development communities
JSON (JavaScript Object Notation):
What it is: A more human readable serialized data interchange
format used originally for JavaScript
Info: http://www.json.org/
Provides:
- easy to read key/value pairing
- native datatypes with schema validation
- excellent compatibility with major development platforms
- strong use in AJAX development
CSV (Comma Separated Values):
What it is: Ol' Faithful in data interchange, not very standardized
or descriptive, but still useful
Info: http://en.wikipedia.org/wiki/Comma-separated_values
Provides:
- a quick format for loading structured data in delimited or fixed
format
- comfort for those who don't know or have the inclination to learn a
more modern format
- broad compatibility with nearly any language or database environment
natively
# Which brings us to YAML
YAML:
Info: http://www.yaml.org/
What it is: Data serialization that is super easy to read and write, first
proposed in 2001 by Clark Evans.
What it ain't: YAML Ain't Markup Language # get it?
Description:
YAML is actually designed to be very close to JSON; in fact, every JSON
document is a valid YAML document (but not the other way around). The big
difference is readability. YAML is focused on being extremely human
readable. In fact as you've probably guessed, this blog post is formatted
in YAML.
It is important to remember that YAML is really incomparable to XML.
While both of them can be used as data interchange formats, the purposes
are fundamentally different. Whereas XML is all about defining self
describing data through markup and providing values for that data in
a variety of ways, YAML is purely focused on serializing data in a
readable and parseable format.
And I have to say, YAML just squeezes by on being a parseable format.
Combinations of characters and indentation determine the type of node
each document line is (though multiline content is possible, as is
the case in this text). In the end, the only true types of nodes which
exist in YAML are Collections and Scalars. A collection can be either a
sequence of data or a key/value pair. Scalars are any type of integer,
string, date, and so on. Very few rules are enforced when it comes to
the actual content; however, YAML is very particular about indentation
(of course), and that each node of a collection must be unique. This
is a bonus for relational table loading/unloading, as primary key
integrity is guaranteed in a given collection of mappings. It also
forces more organized and more understandable data for human
reading.
Code exists for Python, PHP, C++, Ruby, perl, JavaScript, and more.
To learn more about YAML, check out the spec and play around with it
a bit. You can even find tools like a YAML Parser online, which can
convert YAML to JSON. In fact, you can copy the contents of this blog
post (dashes to dots) straight into it to see the result.
You can also check out a quick usage with PHP that I made using this
blog text by clicking here.
...
The post %The fabulous YAML appeared first on Steve Karam :: The Oracle Alchemist.








