Skip navigation.

Steve Karam

Syndicate content
Steve Karam's Oracle Blog
Updated: 7 hours 15 min ago

Web Friendly Interactive Data Visualization with D3.js

Thu, 2013-03-14 09:14
Oracle Storage Sunburst

I have been tinkering quite a bit with new database and development frameworks. Recently I decided to beef up my knowledge of data visualization as it’s a very hot topic and the ultimate way to make your data understood.

R is an incredibly popular option which can be as simple or as complex as you want it to be, showing incredibly detailed charts, tables, and more. However, I wanted to find something which was 100% web based and interactive, as this is the easiest way to really show your data visualization creations to the masses and put it to best use in dashboards or other sites.D3.js List Topology

So, enter D3.js. This incredible package is built in JavaScript with interactive components similar to jQuery but adding the drawing, math, and tweening capabilities necessary to build highly complex data visualizations. Anything that can be made with paths, shapes, and colors is possible as you can see on their example page. Personal favorites are the Streamgraph, Cubism, and concept network browser.

But in honor of Pi Day and due to its hierarchical capabilities, I decided to go with an interactive Sunburst. A Sunburst chart is a data visualization method similar to a pie or donut chart but with additional layers for child data. In the case of my tinkering, I decided to create one which includes:

  • Oracle Schemas
  • Object Types
  • Objects

Each of these components is represented along with its respective size in MB. Even cooler, you can click on a component to drill down into it and click the center of the circle to zoom out. All of this was accomplished with no Flash, and it is tested on Chrome, Firefox, Safari, Internet Explorer, and iOS. Chrome and Safari definitely handle it best, but the other browsers appear to work fine. However, if you are reading this from a news aggregator like Oracle Base or OraNA you may need to click over to my actual site or use the full page demo link below.

The original D3.js example on Sunburst charts can be found here. I also took concepts from this page for zooming. The JSON supplying the data was created using the Alexandria PL/SQL Library which includes SQL to JSON capabilities. Alternatively you can use CSV, TSV, and XML to populate D3.js charts. It is worth noting that a decent amount of JavaScript knowledge is needed to really grasp what the code is doing. D3.js makes heavy use of attribute chaining like jQuery which is powerful but sometimes difficult to read and understand.

If you don’t see the example below then there may be a compatibility issue I’m not aware of; please post it in the comments! You can also try my full page demo which may work better for you. You can also view source on the full page demo to see everything I did to make it come together, or just click here.

Be sure to hover over different areas on the chart (labels show underneath) and click around! You can always click the center circle to zoom out.

Sunburst Data Visualization   var width = 340, height = 340, radius = Math.min(width, height) / 2; var x = d3.scale.linear() .range([0, 2 * Math.PI]); var y = d3.scale.sqrt() .range([0, radius]); var color = d3.scale.category20c(); var content = d3.select("#valtext"); var svg = d3.select("#chart").append("svg") .attr("width", width) .attr("height", height) .append("g") .attr("transform", "translate(170,170)"); var partition = d3.layout.partition() .value(function(d) { return d.size; }); var arc = d3.svg.arc() .startAngle(function(d) { return Math.max(0, Math.min(2 * Math.PI, x(d.x))); }) .endAngle(function(d) { return Math.max(0, Math.min(2 * Math.PI, x(d.x + d.dx))); }) .innerRadius(function(d) { return Math.max(0, y(d.y)); }) .outerRadius(function(d) { return Math.max(0, y(d.y + d.dy)); }); d3.json("/sunburst.json", function(error, root) { var path = svg.selectAll("path") .data(partition.nodes(root)) .enter().append("path") .attr("d", arc) .style("fill", function(d) { return color((d.children ? d : d.parent).name); }) .on("click", click) .on("mouseover", mouseover) .on("mouseout", mouseout); function click(d) { path.transition() .duration(750) .attrTween("d", arcTween(d)); } function mouseover(d) { content.append("h2") .attr("id", "current") .text(d.name + (d.size ? " - " + (d.size / 1024 / 1024) + "MB" : '')); } function mouseout(d) { content.html(''); } }); d3.select(self.frameElement).style("height", height + "px"); // Interpolate the scales! function arcTween(d) { var xd = d3.interpolate(x.domain(), [d.x, d.x + d.dx]), yd = d3.interpolate(y.domain(), [d.y, 1]), yr = d3.interpolate(y.range(), [d.y ? 20 : 0, radius]); return function(d, i) { return i ? function(t) { return arc(d); } : function(t) { x.domain(xd(t)); y.domain(yd(t)).range(yr(t)); return arc(d); }; }; }

The post Web Friendly Interactive Data Visualization with D3.js appeared first on Steve Karam :: The Oracle Alchemist.

Cloning Your Oracle Installation

Tue, 2013-03-12 09:58
Clone Maker

If there is one thing a DBA hates it is to waste time. Particularly on boring and tedious tasks. One such task is installing Oracle on a new system (or duplicating a home on an existing system) which generally involves X server setup (Xming for Windows, XQuartz on Mac, pure as the driven snow on Linux) and obnoxious Oracle Installer GUI screens.

Yet despite the widespread angst at performing Oracle installs, I’m surprised at the number of clients and DBAs I see trudging through Next buttons on every single server to get it installed. Nevermind the fact that with modern server deployment we should be able to create new systems which already have Oracle installed and configured; a DBA can clone Oracle Homes from server to server even if enterprise provisioning not an option.

Cloning the Oracle Home is done through simple tools: cp if you are cloning on the same server, tar (or other tools) and scp to go to a new server, and the vanilla runInstaller package from Oracle conveniently located in $ORACLE_HOME/oui/bin.

[oracle@orahost ~]$ echo $ORACLE_HOME
/u01/app/oracle/product/11.2.0/db_1

[oracle@orahost ~]$ cd /u01/app/oracle/product/11.2.0
[oracle@orahost 11.2.0]$ cp -rp db_1 db_2
cp: cannot open `db_1/bin/nmhs' for reading: Permission denied
cp: cannot open `db_1/bin/nmb' for reading: Permission denied
cp: cannot open `db_1/bin/nmo' for reading: Permission denied

In the steps above, we copied the ORACLE_HOME located at /u01/app/oracle/product/11.2.0/db_1 to another directory (/u01/app/oracle/product/11.2.0/db_2). If you use the cp command, make sure you use the -rp flags; -r copies directories recursively, and -p preserves file attributes. You may also notice that a few files could not be copied. Don’t worry about these, when we run the installer it will take care of that.

Once the copy (or tar/untar between servers) is complete, the rest is simple. We will run the runInstaller command in the new Oracle Home and provide the information it requires. This is done silently, so no X Windows server is required at all.

[oracle@orahost 11.2.0]$ cd db_2/oui/bin
[oracle@orahost bin]$ ./runInstaller -silent -clone ORACLE_BASE="/u01/app/oracle" ORACLE_HOME="/u01/app/oracle/product/11.2.0/db_2" ORACLE_HOME_NAME="OraHome2"

Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 3960 MB    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2013-03-12_09-46-15AM. Please wait ...[oracle@orahost bin]$ Oracle Universal Installer, Version 11.2.0.1.0 Production
Copyright (C) 1999, 2009, Oracle. All rights reserved.

You can find the log of this install session at:
 /u01/app/oraInventory/logs/cloneActions2013-03-12_09-46-15AM.log
.................................................................................................... 100% Done.


Installation in progress (Tuesday, March 12, 2013 9:46:35 AM EDT)
.............................................................................                                                   77% Done.
Install successful

Linking in progress (Tuesday, March 12, 2013 9:46:43 AM EDT)
Link successful

Setup in progress (Tuesday, March 12, 2013 9:48:14 AM EDT)
Setup successful

End of install phases.(Tuesday, March 12, 2013 9:50:31 AM EDT)
WARNING:
The following configuration scripts need to be executed as the "root" user.
/u01/app/oracle/product/11.2.0/db_2/root.sh
To execute the configuration scripts:
    1. Open a terminal window
    2. Log in as "root"
    3. Run the scripts
    
The cloning of OraHome2 was successful.
Please check '/u01/app/oraInventory/logs/cloneActions2013-03-12_09-46-15AM.log' for more details.

The actual command we used was:

./runInstaller -silent -clone ORACLE_BASE=”/u01/app/oracle” ORACLE_HOME=”/u01/app/oracle/product/11.2.0/db_2″ ORACLE_HOME_NAME=”OraHome2″

These flags defined:

  • -silent: specifies that the command should be done without interaction
  • -clone: specifies that the home is a clone of another location
  • ORACLE_BASE: the base directory for Oracle products as per the Optimal Flexible Architecture (OFA)
  • ORACLE_HOME: the new directory location
  • ORACLE_HOME_NAME: a unique name for the Oracle Home for the server you have cloned to

Once you have finished running the command and have confirmed success, the standard root.sh (and orainstRoot.sh if this is the first Oracle install on the server) execution by the root user applies.

Good luck, and happy cloning!

The post Cloning Your Oracle Installation appeared first on Steve Karam :: The Oracle Alchemist.

Wearable Computing and Personal Data Collection

Sun, 2013-03-10 18:35
Vitruvian Man

Wearable computing is all the rage right now and is poised to explode in the consumer market once a number of high-profile projects go into full production. The bulk of buzz surrounding the technology is focused on entertainment value and consumer usage. But how might wearable computing and data analysis come together to improve our lives? What kind of data can we collect and harness?

While wearable computing as a whole didn’t even make it into the Gartner Emerging Technologies 2012 Hype Cycle, it is obviously up and coming. Let’s take a look at some of the current and up-and-coming offerings. Or if you’d rather get right to the Personal Data Collection portion, click here.

Pebble

Pebble was and is a Kickstarter sensation, smashing through its $100,000 goal and raising a total of $10,266,845. This watch comes in multiple colors and features an e-ink screen for low battery consumption and customization. Owners can select custom watch faces from Pebble and eventually from developers.

Pebble Watch

But what is really special about Pebble is the tech behind the scenes. Bluetooth connectivity allows it to interact with your smartphone (iOS and Android) to receive notifications wrist-top. SMS, calendar events, even Twitter mentions and Facebook likes will set your wrist abuzz and the message will pop up. In fact, you can even use it for Grid Control alerts. At the moment apps for the Pebble are scarce, but once they catch up with their production requirements it is expected that the SDK will be released to developers to allow custom watch faces and apps which interact over bluetooth with a phone.

While the Pebble has some great potential as a consumer device and does contain a 3-axis accelerometer, it is not yet fully exploitable for data collection. Pebble is at this time primarily a consumer device, though with the right additions it could one day become a true input/output device on its own. That’s not stopping Pebble though, and it sure as heck isn’t stopping me from wearing my Kickstarter Edition!

Google Glass

Google Glass is a much hyped Minority Report meets Geordi La Forge headset that fits on your face like a pair of glasses. It includes a small screen that acts as a heads up display and various inputs including audio and video. With Google Glass, you can walk around taking pictures and recording anything you see all by saying a few commands. You can send and receive messages, video (including real time for chat), get directions overlaid in your visual view, get pertinent information such as flights or monetary conversions, and more.
Stay Glassy
It all starts with the words “Ok Glass”.

This level of geekism and ease of audio and video storage has already prompted one PR savvy bar to ban Google Glass preemptively. Google is masterful when it comes to data collection/analysis and a device which people will beg to wear around that captures video, audio, and interactions with a simple phrase is right up their alley.

And to be honest I can’t help but be taken in by it. Granted, most of the data collection it performs will be consumed by individuals and harnessed by Google; this is a monetization model which already works very well for them and social media sites. With the presentation of Google Shoes yesterday at SXSW, there’s a real possibility Google may be going the whole gamut in regards to wearable computing.

MYO Gesture Control Armband

Jedi Myo TricksThe MYO armband is an up and coming technology that you wear on the forearm, granting you Jedi powers over wireless devices and is a strong contender in the ultimate goal of creating a Spider-Man web shooter. It is purely an input device (think of it as a Wii controller without the controller) which can read muscle activity in the arm and hand, allowing control over a variety of devices.

With a pair of MYOs on your arms, you can conduct an electronic symphony through arm and finger movements, control wireless quadcopters, flip through slides, or turn devices on and off. It has excellent implications for any wireless technology which requires any form of touch or accelerometer based input. Thanks to its ability to read electrical signals from the muscles, it has a lot of potential for personal data collection (a topic we will cover shortly).

Fitbit, Jawbone UP

UP UP and awayI put these together because they’re close competitors and are used for the same purpose. Both of these gadgets are primarily used as advanced pedometers.

The Jawbone UP is a band you wear on your wrist which collects information about how your arms swing. By wearing the UP 24×7 it can keep track of the amount of activity you get, how many steps you walk, how long you stay idle, and even how you sleep at night. All the data it collects is shipped wirelessly to your smartphone where you can view and publish the information.

Fitbit UltraThe Fitbit is the same concept (and in my opinion, a better device) which you wear on your pants or in your pocket when awake and tucked into a wristband when you sleep. They also have an upcoming device called the Flex which will be purely for the wrist and is a direct competitor to the UP. Fitbit can record your steps, activity level, and the amount of stairs you climb throughout the day which is sent wirelessly to your phone for interaction.

Both of these devices have a cool feature in common: their phone and web based tools allow you not only to see the data, but to correlate it with other activities. You are encouraged to enter the foods you ate or the mood you were in during certain times so you can view analytics on how the quality of your sleep, the food you eat, and your activity levels play on your day to day life. Both devices are pure sensors, but the data they collect is used primarily for helping you understand more about your activity levels and how you can improve.

Personal Data Collection

Personal Data Collection (PDC for short from this point on) is not a new concept; in fact it has been going on since people started writing diaries and journals. But with wearable computing, both its ease of collection and its benefits can increase exponentially. Part of the problem with forecasting or correlating events in the lives of a person is that you have to rely on that person to keep accurate records in regards to both activities and time. And often, there are details that are simply impossible for a person to measure or record, leading to missing data.

But what if we could capture most, if not all, the personal data regarding the daily use of the human?

The problem is the sensors. While we can put glass on a person’s face, bands on their forearms, and shoes on their feet we are still missing a wide variety of data that comes from the brain and other organs. Of course it would be completely invasive to put a chip in everyone’s brain, gut, and kidneys just go grab some data, but boy would it be enlightening. In the case of organs, perhaps X PRIZE has it right and we should be looking at external collection methods like their joint venture with Qualcomm challenging teams to make a working medical tricorder a la Star Trek for a $10 million prize.

Just with the devices I listed above, you can collect the following information individually:

  • Steps walked, stairs climbed, whether the person limps or slows after a certain time
  • Sleep movement, wakefulness, amount of time it takes to fall into deep sleep
  • Downward pressure of each step, foot angle, general posture, walk type, activity level
  • Arm movement frequency, finger movement frequency, limb dexterity, muscle group usage, muscle density, flexibility, frequent arm muscle group usage
  • Voice data, talking speed, language, dialect, social interaction methods and frequency, aural and visual interactions with people and objects, audio preferences, visual preferences, chat preferences and frequency, and even an idea of pleasing visual stimulus from characteristics of photos taken, mood via voice analysis

That is a ton of data (most of it from the Google device of course) and it doesn’t even begin to correlate the data between the different wearable gadgets. If these devices could talk to each other in some standardized way (Wearable Communication Protocol?) the amount of information we could put in use in our own personal lives would be astounding. For instance:

  • Gain insight into what activities and visuals make us the most happy
  • Gauge activity levels during a day following a certain sleep pattern
  • Discover who you prefer to communicate with following meals, sleep, or exercise
  • Determine the people who have the most positive influence on your life (very neat)
  • Hand-eye coordination, hand-foot coordination
  • Physical movement, fidgeting, and audio cues while using a computer (muscle movement in the fingers from typing)

Data VisualizationIt all makes for an extremely impressive set of information which can be used for trending and forecasting along with comprehensive personal data visualization. This information can be used to make decisions, provide data to doctors or psychotherapists, help with posture, sleeping habits, pain management, activity levels, even social interactions. Knowledge is power as they say and personal data collection through sensor measurement has the ability to help us transform our lives.

There are plenty of other things we could easily detect with modern technology: sweat levels, eye movements, breath depth, even muscle movements in the face to detect smiling and frowning. And who knows what the future will bring? Is there some way to measure caloric and nutritional intake without requiring the eater to jot down their meal? The more we can collect and store, the more we can crunch and analyze. No matter how you look at it, the real value of wearable computing is not consumer entertainment but consumer enlightenment.

The post Wearable Computing and Personal Data Collection appeared first on Steve Karam :: The Oracle Alchemist.

The Social Circle

Fri, 2013-03-08 09:32
Social Networking Icons

Despite the popular idea of tech folks silently skulking in their cubicles with indie music streaming in through white earbuds, the fact is that most IT folks are a chatty and social bunch. More than likely you are in possession of at least a Facebook account and more than likely a Google+ account, Twitter account, and maybe even a LinkedIn account as well (we’re all in the top 5% or 1%, right?).

So I wanted to take this opportunity to offer my readers a quick list of where you can find me in the social world.

Follow Me on Twitter

I do quite a bit of chatting on Twitter because it’s such an easy and quick platform for day to day chatter. At 140 characters you can say a surprising amount as long as you don’t try to talk about the DBMS_TRANSFORM_EXPIMP_INTERNAL or something. You can find me there at @OracleAlchemist. I’ll be happy to follow back!

Like (and Share!) My Facebook Page

I recently signed up for a Facebook Page for Oracle Alchemist as well. I plan to use this to share blog articles as well as other content that is not right for the blog and too long for Twitter. You can find my Facebook Page at facebook.com/oraclealchemist.

Add Me on LinkedIn

I’m still up in the air on the LinkedIn platform. On the one hand it’s a social network of professionals and that’s always a good thing. On the other hand I find the interface to be extremely clunky and the mobile apps to be abysmal. Be that as it may, I post my blog content there as well as participate in Group discussions. Feel free to look me up at linkedin.com/in/stevekaram!

Add Me to Your Google+ Circles

I was originally in the Google+ Beta, but ended up leaving due to a lack of unique content. However, they have definitely grown quite a bit and I am glad to say I have a new profile there. Not only is it useful for SEO purposes, but it also is a great platform for random chat, ideas, and collaboration.

Got a Blog?

Outside of the social networks there is, of course, this blog. Please feel free to comment on posts. I don’t mind link dropping as long as it’s relevant (though you may have to wait for me to approve the comment). Also depending on the blog I may add it to the sidebar or link it from future articles.

Thanks all, and happy Friday! I hope to see you out there on the social web.

The post The Social Circle appeared first on Steve Karam :: The Oracle Alchemist.

Oracle GoldenGate in the Enterprise

Thu, 2013-03-07 08:25
Oracle GoldenGate Architecture DiagramGoldenGate – The Bridge to Everywhere

Oracle GoldenGate has generated a lot of buzz in the Oracle community since the software was purchased by Oracle in 2009. It is marketed by Oracle as a niche application, but one that can be of use in nearly any Oracle environment. Additionally, Oracle has made it clear that they expect GoldenGate to be the chief replication method in the future instead of Streams of Advanced Replication.

The reasoning behind this is simple: GoldenGate is a true heterogeneous replication solution that is not tied directly to the Oracle software. Using GoldenGate it is possible to replicate from nearly any database to any other database. Available platforms include: DB2, MySQL, Oracle, SQL/MX, SQL Server, Sybase, Teradata, TimesTen, flat files, and many other platforms via Java. The software is capable of replicating in near real time or on a lag, and can take a single source and pump it to multiple destinations. GoldenGate is like the highway system for databases, allowing quick transfers to and from any location with many extra destinations easily available.

GoldenGate accomplishes these tasks through a simple set of processes distributed around a centralized messaging service for communication. Before we dive into how GoldenGate can be used in your Oracle environment, let’s go through the components of GoldenGate to get a better understanding of the software.

Oracle GoldenGate Architecture Diagram
Source: Oracle Corporation

The Extract Process

The first step in any replication process is to capture changes on the source environment. GoldenGate accomplishes this via Extract processes that are able to read transaction logs on the source system and write change data into a trail file (we will cover this next). Multiple Extract processes can be configured which pull different sets of data. This data can be pulled constantly (24×7 replication), on a scheduled basis, or even configured as an event-based system where capturing of data begins when certain data changes are made.

By reading from transaction logs, GoldenGate is capable of recording changes with very minimal overhead to the source system. Even better, it is possible to use an Oracle 11g Active DataGuard environment as a GoldenGate source, which means that the source system is barely impacted at all.

The Extract process is also capable of using stored data as a source instead of transaction logs. This is mostly useful for making an initial copy of a database to a target environment.

Trail Files

The Trail is where all extracted changes are stored. It holds a combination of DDL and DML. Trail files can be designated as either a local trail that stays on the same server as the extract, or a remote trail which is sent via TCP/IP to a target system.

GoldenGate maintains the trail files; the main responsibility of the DBA is to initially set up the storage and locations for the files. As data is extracted, it is written to the trail files. As the files fill up, new trail files are formed and the data keeps writing. As data is consumed, GoldenGate is capable of purging old trail information.

Oracle documentation recommends using separate disk for the trail file to guarantee GoldenGate processing is fast and does not contend with other resources on the environment. Even with fast systems and purging turned on, the trail file will need room to grow particularly in the event of a network outage. In order to know how much space is required, Oracle recommends the following formula:

(log volume in one hour) * (max number of hours network downtime) x 0.4 = trail space

This is based on Oracle’s estimate (page 6) that the GoldenGate Extract will actually record only 40% of data in redo logs. One of the things that makes GoldenGate so fast is the limited amount of data required for Extract; for instance, GoldenGate only extracts committed transactions whereas the redo logs hold all transactions whether committed or uncommitted.

Trail files are almost always written remotely for the purposes of replication. However, the Extract can store trails locally so that they can be picked up by another process called a Data Pump that will send the trail data to multiple destinations. If a Data Pump is not used, all written data will be on the target system and no storage is required for trail files on the source environment.

Collectors

The Collector runs on the target system and receives trail file information to be written to disk. As data is extracted and sent over the network, the Collector will place the data in the proper files to be picked up by the Replicat process.

Replicat

The Replicat process consumes data from trail files on the target system. This process is able to parse all trail information, run DDL and DML against the target environment, and keeps track of written records into an area called the checkpoint table. In order to maintain consistency and guarantee data is picked up properly after errors, the checkpoint table records the progress of replicat processes as data is consumed.

Replicats are capable of loading large sums of data at once or in a synchronized configuration. For instance, one replicat can be created to do an initial data load from a source environment, and another can be created to continue synchronizing changes once the initial load is complete.

If the replicat process receives errors, it will record its progress and abort. In GoldenGate, the act of aborting the process due to error is called abending. If a process is abended, it has aborted due to error and must be fixed.

Usage Overview

GoldenGate uses these components in order to build a complete replication environment. Extract processes pull data and write to trail files which are processed by collector processes and consumed by replicat processes. The real magic behind GoldenGate happens by exploiting this configuration in various ways:

  • Multiple extract processes can pull different sets of data in parallel and broadcast it to multiple targets.
  • Single extract processes can pull data and broadcast via Data Pumps
  • Multiple systems can extract to a single target where multiple replicat processes can consolidate data
  • Use multiple database types in any combination for the source(s) or target(s)
  • Filter and/or map data en route to the target for custom data requirements

In this way, GoldenGate can be used to completely replace even the most vigorous replication or ETL processing structure. With these capabilities in mind, let’s talk about some of the ways GoldenGate could be used in the average Oracle environment.

Making the Most of GoldenGate

The GoldenGate installation is easy enough that it can be deployed on any system. It works on Unix, Linux, and Windows and transmits data over TCP/IP. The software itself is self-contained and can literally be set up within minutes. So with all this ease of use, how can we put it to work?

Simplified Extract, Transform, and Load (ETL)

Oracle has many options for ETL and ELT. Most notable is Oracle Data Integrator (ODI), which now includes integration with GoldenGate for enhance change synchronization. ODI utilizes a highly efficient ELT process to collect multiple sets of data, load to a staging location, then transform the data for real-time warehousing. If you’re interested in learning more about ODI, this blog has some great information.

Even without ODI, GoldenGate is easily usable for the aggregation of data for warehouse loading. Extract processes can pull data simultaneously and in real-time from Oracle, SQL Server, AS/400 environments, XML, flat files, and many more sources. All of this data will be written as multiple trail files to the target environment that will then be written to the warehouse (again, on any platform) by replicat processes.

Using this method, GoldenGate can even filter and clean the data during the process in order to achieve the bulk of ETL work in a single set of steps. Or if you prefer, the data can simply be aggregated then cleaned by other methods on the target system for ELT.

Data Distribution

With the use of Data Pumps to send single data sources to multiple locations, it is very easy to guarantee that data across your enterprise is consistent and reliable. The extract process writes data to a local trail file that is picked up by a data pump. The data pump in turn sends this information to multiple destinations (near or far) that are collected and written to the target environment via the replicat. In this modern era of disk storage, data redundancy such as this eliminates the need for costly database links and instead allows real-time access to remote data by making use of data duplication rather than network reads.

In fact, the ‘real-time’ buzzword is huge with GoldenGate, and Oracle is using it as much as possible. The core theme behind GoldenGate is real-time access to real-time data; that is, the data you require will be accessible immediately and will always be up to date. Replacing database links with synchronized copies will improve performance on many systems tremendously as long as the data is guaranteed to be fresh.

Refresh Development Servers

One of the biggest annoyances for many DBAs is refreshing development systems with up to date data. GoldenGate is formed of reusable components, and it is very easy to extract and replicat initial data loads. By creating these processes in GoldenGate once, a DBA can initiate easy on-demand refreshes of development environments.

By using data pumps, the DBA can use a single extraction and use it to populate multiple development environments, individual environments for different development teams, DBA sandboxes, or development/QA environments in parallel. These environments can be spread across multiple operating systems, database systems, or even versions. Testing on 11g (or 12c soon) will be a breeze due to the ability to seamlessly replicate.

While it may not be as fast as solutions like Delphix, it is a valid method of refreshing via replication.

Upgrades

The capability to move between versions introduces another logical use of GoldenGate: upgrading to new releases. GoldenGate can replicate seamlessly from server to server or locally and can be used to duplicate a database onto a new version using existing processing configurations.

For instance, if GoldenGate is used to replicate data between a 10g database and another 10g database, an upgrade is as simple as stopping replication to the target system temporarily, upgrading to 11g, then starting replication again. GoldenGate operates outside of the data dictionary and can seamlessly work between these versions. The same goes for moving to another server or even to another operating system.

Rolling Development into Production

Multiple extracts and replicats can be easily configured to pull development data and merge it with production tables. Alternatively development data can be pulled from multiple locations, or development and production data can be pulled from their respective databases and merged into a new environment. Whichever way you choose, the configurations for the extracts and replicats can be re-used to perform these operations on a regular basis.

Migrating 3rd Party Utility Databases

Some 3rd party utilities only allow certain database platforms. If you have a 3rd party application which is only designed to run on MySQL or SQL Server for instance, the data from this system can be send to an Oracle database for reporting or historical record keeping. On some 3rd party utilities it may even be possible to integrate data from other systems into the 3rd party database. With Java connectors, it is even possible to load data into NoSQL options like MongoDB or Riak with the right coding.

Using Business Data in SharePoint

Many companies have started using SharePoint or other portal tools for their corporate infrastructure. Using GoldenGate, information from financial, customer, HR, or other databases can easily be integrated into the company portal no matter which server houses the SharePoint database.

Trouble Ticketing Systems and Monitoring Tools

Using GoldenGate, it is possible to migrate data from trouble ticketing systems over to monitoring environments. For instance, if your company uses a ticketing tool to receive trouble tickets on a customer-facing application and a monitoring environment for IT personnel, GoldenGate can act as a messaging system by copying ticket information into the monitoring environment. Alternatively the flow could be reversed, and monitoring information could be used to supply extra information on tickets.

Master Data Management (MDM)

The central component to many distributed systems, MDM is becoming more and more popular in large corporate environments. MDM solutions are central databases which hold key corporate data in an attempt to always ensure accurate information across distributed environments. For instance, if a company makes use of five different databases and applications which always requires information about employees, an MDM environment could be a central aggregation point for employee data which can then be used in the external environments.

In these types of setups, GoldenGate can be used both for supplying MDM data into other databases and for loading MDM data. Data pumps from the MDM central repository can spread information to all subscribing databases, and applicable information from databases in the enterprise can ship information to MDM for central storage.

Conclusion

Oracle GoldenGate 11g is poised to make big changes in the way data is moved between environments. While replication has existed for a long time, GoldenGate is the first highly available real-time replication tool with full heterogeneous support across many platforms. With the extensible extract/replicat design it is easy to deploy and reuse on many different systems for many needs. While there are many other viable replication options out there, Oracle GoldenGate is a very strong contender for enterprise usage.

Note: This article originally appeared in IOUG SELECT Journal Q4 2011.

The post Oracle GoldenGate in the Enterprise appeared first on Steve Karam :: The Oracle Alchemist.

Add History and Tab Completion to SQLPlus

Wed, 2013-03-06 07:10
SQLPlus With Color

In the uncharted backwaters of the Internet there exists a really awesome tool called rlwrap. This readline wrapper is able to call programs that are deficient in modern amenities like history, color, and tab completion, intercept input/output, and replace it as necessary with useful goodies. And what program could be more deficient than SQLPlus (sorry Tanel)?

With rlwrap you can instantly have history in SQLPlus by pressing the ‘up’ key. Not only that, but the history is remembered even if you exit and come back in later. Tab completion is also possible with a dictionary file, which you can easily populate with your favorite collection of tables, packages, etc. If you’re like me and for whatever reason have never taken to graphical tools but hate the missing features in SQLPlus, this tool is definitely a necessity. So let’s get it installed!

Download and Install

I was able to find an RPM of rlwrap on pbone for EL6 x86_64. You can download it directly here if that is what you are running, or go to this page if you need a different release.

After downloading to the server, I just had to do an RPM install:

[root@orahost ~]# rpm -Uvh rlwrap-0.37-1.el6.x86_64.rpm 
warning: rlwrap-0.37-1.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
Preparing...                ########################################### [100%]
   1:rlwrap                 ########################################### [100%]

Trying it with SQLPlus

Out of excitement I had to try it right away just to see what it could do. Be aware, rlwrap only works with interactive command line so I don’t recommend trying to use it in scripts. Instead, I ran the simple command: rlwrap sqlplus hr/hr

[oracle@orahost ~]$ rlwrap sqlplus hr/hr

SQL*Plus: Release 11.2.0.1.0 Production on Tue Mar 5 20:31:11 2013

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>

I was greeted with the familiar prompt and no indication anything was different. But when I typed a command (select * from departments;) and ran it, then pressed up, my previous command appeared. Unfortunately I can’t really show you this easily so you’ll just have to imagine it. Press up on your keyboard. Pretend it showed your last SQLPlus command. Be amazed.

Adding a Dictionary

But now we’re going to do something even better: add a dictionary so we can use tab completion. I generated the dictionary with the following script:

#!/bin/bash

sqlplus -s hr/hr << ! > sqlplus.dict
set head off pages 0 linesize 150 echo off feedback off verify off heading off
select object_name from all_objects where object_type in (
  'TABLE', 'VIEW', 'PACKAGE', 'PROCEDURE', 'FUNCTION'
);
!

You can have this script login as whoever you want (a DBA user would give you every table, package, procedure, and function in the entire DB). When it was finished, I had a nice sqlplus.dict file which contained everything I wanted tab completion to recognize.

We can fire up rlwrap again, this time with the -i flag which tells rlwrap to ignore case for tab complete, and the -f flag which tells it where the dictionary file is.

[oracle@orahost ~]$ rlwrap -if sqlplus.dict sqlplus hr/hr

SQL*Plus: Release 11.2.0.1.0 Production on Tue Mar 5 21:54:15 2013

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select * from USER_TAB
USER_TABLES                     USER_TAB_COL_STATISTICS         USER_TAB_MODIFICATIONS          USER_TAB_PRIVS_MADE             USER_TAB_STAT_PREFS
USER_TABLESPACES                USER_TAB_COMMENTS               USER_TAB_PARTITIONS             USER_TAB_PRIVS_RECD             USER_TAB_SUBPARTITIONS
USER_TAB_COLS                   USER_TAB_HISTGRM_PENDING_STATS  USER_TAB_PENDING_STATS          USER_TAB_STATISTICS             
USER_TAB_COLUMNS                USER_TAB_HISTOGRAMS             USER_TAB_PRIVS                  USER_TAB_STATS_HISTORY          
SQL> select * from USER_TABLES

When I typed user_tab, I pressed tab three times and it showed me my available choices. Adding the letters le and pressing tab again completed it to USER_TABLES. If there are too many choices you get the familiar “Show all?” message from bash:

SQL> select * from ALL_
Display all 403 possibilities? (y or n)

One important note, it seems the default behavior for CTRL+C is changed when using rlwrap. You have to press CTRL+C then press Enter to cancel your command. Worth it!

Conclusion

Needless to say I instantly decided that I was in love with rlwrap and that it was going to retain a permanent position in my toolset. So I made an alias for it:

alias sql='rlwrap -if ~oracle/sqlplus.dict -pgreen sqlplus'

Now I can type the ‘sql’ command followed by any sqlplus arguments and it will automatically load the dictionary file for me and load up rlwrap. You can name the alias anything you want: sqlplusplus, sqlplussest, omgsql, it doesn’t matter. Just don’t overwrite your actual sqlplus command with the alias, because you will need the original for scripts where rlwrapper does not play nice.

One other thing you may have noticed is the -pgreen. I thought SQLPlus needed a smidgeon of color.

SQLPlus With Color

Have fun!

The post Add History and Tab Completion to SQLPlus appeared first on Steve Karam :: The Oracle Alchemist.

Getting Busy with Riak

Tue, 2013-03-05 06:00
1410258_95416504This entry is part 2 of 2 in the series Database Diversity


Database Diversity

Welcome back to Database Diversity, where the queries are made up and the relationships don’t matter. Today we’re going to talk about a simple yet insanely powerful NoSQL Database called Riak. While its core functionality is a key/value datastore, Riak is capable of so much more: full text querying, MapReduce, and complex clusters. In our examples, we will install Riak on Oracle Enterprise Linux 6.3 and use it to store some data then retrieve and delete it. This basic operation is very easy; in fact, it would be boring to show you from the command line because it is well documented and involves some simple curl commands. Instead, we’re going to write, read, and delete via PL/SQL functions.

But first, let’s talk about what makes Riak so cool. Even if you don’t know how to use it, you should know the features; with 25% of the Fortune 50 using it, you never know when you might get asked to support it.

Riak the RESTful

Riak has a full HTTP 1.1 compliant API. This means that you can perform read, write, delete, and query operations using standard PUT (POST), GET, and DELETE operations. As such, curl is a great tool for working from the command line but any tool that can make HTTP calls will work.

Riak the Bountiful

On top of the standard HTTP API, it also has APIs for Java, PHP, Python, Ruby, JQuery, node.js, and C/C++. The language of choice is Erlang; after all, Riak was written in it. Between the RESTful API and other programming language support, Riak can be used from nearly anywhere.

Riak the Scalable

A huge selling point for Riak is that it is very scalable. Adding a node to a cluster is as simple as your first install, and doing so gives near linear performance on complex functions and with loads of concurrency. The data replication allows very robust fault tolerance (many failed nodes will still allow full access) and its distribution algorithms yield predictable results from even distribution.

Riak the Free

basho-riak Riak is open source, though the parent company Basho does offer paid options. Riak Enterprise is offered for multi-datacenter replication, monitoring, and support. For storage, Riak Cloud Storage offers an S3 API compatible storage engine with replication capabilities. If you don’t need these services, you can scale Riak to your heart’s content for nothing.

Installing Riak

Now that I’ve sung its praises, let’s get down to the install. I have a simple Oracle Enterprise Linux 6.3 x86_64 VM I spun up for the purpose using a standard server install. In order to install Riak, we must:

  • Install the expect library
  • Download riak 1.3 from AWS
  • Install riak via rpm
  • Set open file limit to 4096
  • Start Riak

[root@orahost ~]# yum install expect
...
Installed:
  expect.x86_64 0:5.44.1.15-4.el6

[root@orahost ~]# wget http://s3.amazonaws.com/downloads.basho.com/riak/1.3/1.3.0/rhel/6/riak-1.3.0-1.el6.x86_64.rpm
--2013-03-03 19:01:08--  http://s3.amazonaws.com/downloads.basho.com/riak/1.3/1.3.0/rhel/6/riak-1.3.0-1.el6.x86_64.rpm
Resolving s3.amazonaws.com... 72.21.195.97
Connecting to s3.amazonaws.com|72.21.195.97|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23875120 (23M) [application/x-rpm]
Saving to: “riak-1.3.0-1.el6.x86_64.rpm”

100%[============================================================================================================================>] 23,875,120  6.64M/s   in 4.0s

2013-03-03 19:01:12 (5.64 MB/s) - “riak-1.3.0-1.el6.x86_64.rpm” saved [23875120/23875120]

[root@orahost ~]# rpm -Uvh riak-1.3.0-1.el6.x86_64.rpm
Preparing...                ########################################### [100%]
   1:riak                   ########################################### [100%]

root@orahost ~]# ulimit -n 4096
[root@orahost ~]# riak start
Attempting to restart script through sudo -H -u riak

And we are done and installed. At this point Riak is fully usable for reads and writes.

How Riak Storage Works

Riak data is stored inside of buckets, a flat namespace you can think of like a table. Buckets are unique in a Riak installation and can store oodles of key/value pairs. Keys are like primary keys, unique within a bucket. Values are where the magic happens: they can be anything. Strings, XML, JSON, binary, you name it. You can use it to store session data (login, expiry, preferences, etc), log files, backups, large XML files, whatever. The power to quickly store and retrieve data of any kind is where Riak shines.

Now, when an Oracle or other RDBMS person looks at this they may wonder why it’s so appealing; after all, you can make a table with a KEY, VALUE column combination and use that for the same purpose. But the strength of Riak is in its scalability and flexibility. It’s not meant to take the place of a relational database but it can be a great tool alongside one (or on its own).

Let’s PUT in Some Data!

In order to write to Riak, we write to a URL in the form of: http://hostname:8098/buckets/bucketname/keys/key (note 8098 is the default port). The body of the HTTP PUT request will be the data to store into the bucket with the key you specified. For instance, to write into a bucket called ‘logdata’ on localhost with a numeric key, you could use: http://localhost:8098/buckets/logdata/keys/1.

There are a few required/recommended parameters, but the ones we’ll concern ourselves with here are Content-Type and Content-Length. And to do it, we will use the UTL_HTTP package built into Oracle.

Before we can do that, we have to connect to Oracle as SYS and setup Access Control Lists (ACL):

BEGIN
  dbms_network_acl_admin.create_acl (
    acl          => 'riak.xml',
    description  => 'Permissions to access risk datastore',
    principal    => 'RIAK_USR',
    is_grant     => TRUE,
    privilege    => 'connect'
  );
end; 
/

begin
  dbms_network_acl_admin.assign_acl (
    acl => 'riak.xml',
    host => '127.0.0.1'
  );
end;
/

In the two PL/SQL blocks above, we are creating an ACL called riak.xml which gives the RIAK_USR user the ability to connect to host 127.0.0.1. Now that it’s done, we can run UTL_HTTP operations against localhost.

Here is the function I’ve created to do this:

create or replace function testput(pv_bucket in varchar2, pv_key in varchar2, pv_data in varchar2)
return varchar2
is
  lv_req utl_http.req;
  lv_rsp utl_http.resp;
  lv_out  varchar2(32767);
begin
  lv_req := utl_http.begin_request(url => 'http://127.0.0.1:8098/buckets/' || pv_bucket || '/keys/' || pv_key,
                                   method => 'PUT');
  utl_http.set_header(lv_req, 'Content-Type', 'text/plain');
  utl_http.set_header(lv_req, 'Content-Length', length(pv_data));
  utl_http.write_text(lv_req, pv_data);
  lv_rsp := utl_http.get_response(lv_req);
  lv_out := lv_rsp.status_code;
  utl_http.end_response(lv_rsp);

  return lv_out;

exception when utl_http.end_of_body then
  utl_http.end_response(lv_rsp);
  return lv_out;

end;
/

In this function, you can pass in a bucket name, key, and value. The function opens a PUT request, sets the headers, and writes the body with the utl_http.write_text procedure. The response that is returned is the status code from riak (204 is normal for inserts).

SQL> select testput('newbucket', '1', 'My test document') from dual;

TESTPUT('NEWBUCKET','1','MYTESTDOCUMENT')
--------------------------------------------------------------------------------
204

GET It?

Using a GET operation I can pull data by key, like with this function:

create or replace function testget(pv_bucket in varchar2, pv_key in varchar2)
return varchar2
is
  lv_req utl_http.req;
  lv_rsp utl_http.resp;
  lv_out  varchar2(32767);
begin
  lv_req := utl_http.begin_request(url => 'http://127.0.0.1:8098/buckets/' || pv_bucket || '/keys/' || pv_key,
                                   method => 'GET');
  lv_rsp := utl_http.get_response(lv_req);
  utl_http.read_text(lv_rsp, lv_out);
  utl_http.end_response(lv_rsp);

  return lv_out;

exception when utl_http.end_of_body then
  utl_http.end_response(lv_rsp);
  return lv_out;

end;
/

This function looks an awful lot like the PUT function. It retrieves data from the Riak database using a bucket and key with a GET operation. The body of the returned page is returned from the function.

SQL> select testget('newbucket', '1') from dual;

TESTGET('NEWBUCKET','1')
--------------------------------------------------------------------------------
My test document

And to make sure we cover it, let’s delete that row (and verify).

SQL> select testdelete('newbucket', '1') from dual;

TESTDELETE('NEWBUCKET','1')
--------------------------------------------------------------------------------
204

SQL> select testget('newbucket', '1') from dual;

TESTGET('NEWBUCKET','1')
--------------------------------------------------------------------------------
not found

If you guessed we did this with an HTTP DELETE operation, you’d be right:

create or replace function testdelete(pv_bucket in varchar2, pv_key in varchar2)
return varchar2
is
  lv_req utl_http.req;
  lv_rsp utl_http.resp;
  lv_out  varchar2(4000);
begin
  lv_req := utl_http.begin_request(url => 'http://127.0.0.1:8098/buckets/' || pv_bucket || '/keys/' || pv_key,
                                   method => 'DELETE');
  lv_rsp := utl_http.get_response(lv_req);
  lv_out := lv_rsp.status_code;
  utl_http.end_response(lv_rsp);

  return lv_out;

exception when utl_http.end_of_body then
  utl_http.end_response(lv_rsp);
  return lv_out;

end;
/

With those three functions, we can store (and update), retrieve, and delete any data we want. You can store VARCHAR2 and CLOB data, BLOBs as binary data, XML from DBMS_XMLGEN, JSON using the Alexandria Library, and really anything else you can think of. This data can be used for programming logic (like session data storage) or for later analysis with MapReduce.

And because they look so much alike, it would probably be really easy to put them in a nice Riak helper package. A project for another day, perhaps.

Conclusion

Just key/value storage to a free, highly distributed and redundant database is useful and impressive. But as I’ve said, there’s a lot more you can do with Riak. For instance:

  • Riak Search allows you to perform full text searches across the values of a bucket. This is an indexed search which is not quite as powerful as a MapReduce, but performs better and more predictably.
  • Secondary Indexes (2i) allow you to pick and choose what will be indexed and how. You can define tags for your data which allow you to search it on your terms; for instance, for a bucket full of emails, you can add tags for twitter and blog addresses, then search on those.
  • Links which allow one way relationships between objects that can be walked to form loose graph-like relationships.
  • MapReduce is the big kahuna. As with most systems, you can do a Map to gather a single set of data or use Reduce functions to further aggregate the info. Riak supports MapReduce functions in JavaScript and Erlang.

I plan on publishing some MapReduce examples in the near future, but for now I value my sanity and wish to go on pretending GROUP BY CUBE is the best aggregating mechanism since they fired up the hazelnut chocolate mixer at the Nutella plant.

The post Getting Busy with Riak appeared first on Steve Karam :: The Oracle Alchemist.

A Few Words on Oracle Licenses

Fri, 2013-03-01 10:37
statistics

Despite reams of legalese and countless blog/forum posts, basic Oracle RDBMS licensing still remains a mystery to many. While I won’t even begin to discuss components like Oracle Apps, hopefully this post may help you attain a basic understanding of Oracle, what you can license, and what it costs.

Editions

The first and most important thing to learn are the editions of Oracle RDBMS. These editions are:

  • Oracle Express Edition (XE) – A free use version of Oracle which can use a single CPU (though it can be installed on a server with as many CPUs as you want), 1GB RAM, and 4GB data.
  • Oracle Personal Edition (PE) – Single User, Single Machine development/deployment license which can use any SE1, SE, or EE feature with the exception of RAC and Management Packs.
  • Oracle Enterprise Edition (EE) – Full featured edition of Oracle RDBMS which allows all core features along with the option to purchase add-on features (like Partitioning) and Management Packs (like the Diagnostics Pack).
  • Oracle Standard Edition (SE) – A damped down of edition of Oracle which can run on a server with a maximum of four CPU sockets. It cannot take advantage of add-ons or Management Packs. It does, however, include Oracle RAC as long as all nodes combined have no more than 4 sockets.
  • Oracle Standard Edition One (SE1) – A further reduced edition which shares the same features with SE but is limited to 2 CPU Sockets with no RAC option.
License Type

On top of knowing the editions, you also need to know about the type of license. The most common is a Processor based license, where your license count is based on the number of CPU Sockets/Cores and the CPU Type for EE or the number of CPU Sockets for SE and SE1. The other type is a Named User License, where you buy a number of seats on the database instead of licensing it for unlimited users. Please remember if you are pursuing this type of license: non-human processes are also named users. So any application server, script, or even a scheduled job is a ‘named user’.

Enterprise Edition Core Factor

For Enterprise Edition and all of its add-ons and management packs, there is a concept called “core factor” which is applied for licensing purposes. This factor is based on the type of CPU you are using in your server. For instance, if you are using a SUN M5000 with SPARC VII processors, the core factor is 0.75. That means 4 Quad-Core CPUs (16 cores) would equal 16 * 0.75 = 12 licenses. You would purchase 12 EE licenses to cover the server, and then purchase 12 licenses of whatever add-ons or management packs you might be using.

If, on the other hand, you used SPARC VII+ processors, the core factor is 0.5. The same CPU setup would only require 8 licenses in that case.

For the most part, x86/x86_64 architecture CPUs are a 0.5 core factor. A small number of processors are 0.25, but there is a reason: they are not very suitable for running Oracle, generally System On A Chip (SOC) solutions with a very high amount of hyperthreading.

You can find the full core factor table here.

Special Cases

Two types of databases are exempt from licensing restrictions: RMAN Catalogs and Grid/Cloud Control Repositories. If one of these repositories is the only database on a server, that server does not have to be licensed. But if the server or those databases are used for any custom or other Oracle purpose, they must be licensed. Generally speaking it can be beneficial to put these databases on the same server. See this page under “Infrastructure Repository Databases” for details.

Add-Ons and Management Packs

Let’s get one thing out of the way: Standard Edition and Standard Edition One CAN NOT use any add-on or Management Pack features (except RAC for SE). That means you can’t even buy Diagnostics Pack for Oracle SE. The option does not exist. Based on licensing requirements, AWR/ASH/ADDM are completely off limits on an SE or SE1 database.

However, Enterprise Edition has options for add-ons like RAC, Partitioning, and Active DataGuard. These add-ons are extra cost and apply to any database on which they are used. For instance, if you want to use Enterprise Edition with Partitioning and RAC across four nodes, each of the four nodes must be fully licensed for EE, Partitioning, and RAC. If Oracle is running on it, you must license it.

Misconceptions

Your standby databases need to be licensed. I’ve heard a lot of mixed opinions on DataGuard and whether your standby database must be licensed. The definitive answer is YES. A DataGuard standby is running Oracle (in MOUNT mode), which means it must be fully licensed for all features.

This goes for Development, QA, and Testing servers as well. This one has a lot of misinformation surrounding it. While you can install Oracle on your PC and use it for your own development/deployment purposes with Oracle Personal Edition, you cannot have a dedicated production-cycle development or QA server without a license. If it is part of the development cycle, then it must be paid. The only way you can get around that is if it is single-machine, single-user, for personal development.

One other big misconception is Oracle on a virtual server, like an ESX environment. Oracle does not recognize soft partitioning of CPUs. If you host Oracle on a VM in an ESX environment, you must license the entire ESX environment. (Note, this article mentions that it is possible to work with Oracle on this).

Cost Cases

So let’s take a look at some costs. The license costs are:

  • Enterprise Edition – $47,500 per unit (sockets * cores per socket * core factor)
  • Standard Edition – $17,500 per unit (sockets)
  • Standard Edition One – $5,800 per unit (sockets)

You can find the full cost list on Oracle’s site, which includes all licenses types, add-ons, and management package.

2 Node RAC + DataGuard + AWR/ASH

In this example we are going to set up a 2 node RAC cluster with DataGuard to a standby 2 node RAC cluster. We also want AWR/ASH for diagnostics. Each server (4 total) will have 2 Intel Xeon X7560 processors (8 core).

Our costs for Enterprise Edition would be (in list prices):

  • 4 servers * 2 sockets per server * 8 cores per socket * 0.5 core factor = 32 units
  • 32 Units Enterprise Edition ($47,500ea) = $1,520,000
  • 32 Units RAC ($23,000ea) = $736,000
  • 32 Units Diagnostics Pack ($5,000ea) = $160,000

Our grand total for this setup in EE would be $2,416,000. Hopefully you have a good discount!

Now, if we wanted to do the same thing in Standard Edition, we would have to make some exceptions. DataGuard does not work on SE, so we would have to use a third party software like DBVisit or homegrown scripts. Also, AWR/ASH don’t work in SE and can’t be added, so you might have to fall back to Statspack and adapt with excellent tools like Tanel Poder’s Snapper. In return for these trade-offs, let’s look at pricing:

  • 4 servers * 2 sockets per server = 8 units
  • 8 units Standard Edition ($17,500ea) = $140,000

The total price for two 2-node RAC clusters which 2 sockets per server (you can’t have more than 4 sockets in a single SE RAC cluster) is $140,000. Quite a big savings, and definitely enough leftover to buy some extra utilities.

Production, QA, and Development with Partitioning and AWR

In this example we’ll have three databases which must be licensed as they are part of the development production cycle. Each server is a SUN M5000 with 4 Quad-Core SPARC VII processors.

Enterprise Edition:

  • 3 servers * 4 sockets per server * 4 cores per socket * 0.75 core factor = 36 units
  • 36 units Enterprise Edition ($47,500ea) = $1,710,000
  • 36 units Partitioning ($11,500ea) = $414,000
  • 36 units Diagnostics Pack ($5,000ea) = $180,000

The grand total is $2,304,000.

If we were able to forego Partitioning and Diagnostics Pack, the requirement in Standard Edition would be:

  • 3 servers * 4 sockets per server = 12 units
  • 12 units Standard Edition ($17,500ea) = $210,000
Conclusion

Remember, a lot of this is up to you and your sales rep or third party reseller. Make sure you work out all the details and understand exactly what you can and can’t deploy. It is highly recommended that you keep an eye out for license use; the last thing you want is a licensing audit that you are not prepared for, with database installations ranging all over the environment. This paragraph also servers as a DISCLAIMER: This post is based on my personal knowledge and references available online and is not a substitute for actual information from Oracle Corporation. Please check with your sales rep on all licensing questions.

As you can see in these prices, it is VERY important that you buy what you need and nothing more. Instead of going the whole hog and getting Active DataGuard, Diagnostics and Tuning Packs, Partitioning, etc. on every single server, really take a look and see what your requirements are. You can save a huge amount of money that way.

Also note that there are special types of licensing structures out there such as site licenses. These are special deals worked out with Oracle in order to license your entire enterprise for a certain period, and can often save you a lot of money on licensing depending on your plans. If you decide to pursue this kind of agreement with Oracle, it is important to really work with your team to decide what you need and what you don’t, as well as anticipated business requirements for the near and long-term future.

The post A Few Words on Oracle Licenses appeared first on Steve Karam :: The Oracle Alchemist.

Deploying Outside the Box

Wed, 2013-02-27 12:44
waterflowThis entry is part 4 of 4 in the series Grow Your Career


Grow Your Career

I’ve been on quite a “broaden your horizons” kick lately, as evidenced by my team and DBA development posts and ultimately the Database Diversity series (in progress). But I truly think it is important to keep not only aware but conversant in emerging technologies both for your personal benefit and the benefit of your company and team.

The thornbush is the old obstacle in the road. It must catch fire if you want to go further. – Franz Kafka

Let’s face it: DBAs are frequently seen as a bottleneck. By some, they are even seen as a part of the bureaucracy, the shady organization seemingly designed to stop things from getting done. But to be fair, this is generally due to our central role in the application stack. Developers depend on us for a sane environment or in most cases, many sane environments. On the flip side, DBAs depend upon the system administrators for servers, storage, operating system, etc. But more importantly, DBAs are often victims of their own requirements: fault tolerance, reliable backups, performance, scalability. Giving these requirements up is out of the question; hence the bottleneck. The answer is not to reduce our requirements, but do them smarter.

In the rest of this post, we will look at a few new(ish) technologies which can help us accomplish that goal.

Delphix

Have you ever felt embarrassed when you tell a developer or project manager that you will need a few days to deploy a new QA environment? Or that you really have to think about refreshes because they’re just so time consuming and complex? I hate being in that position, because it’s a position that is generally a result of being on an inflexible platform. And as mainstream and comfortable as it is, the standard “box plugged into a SAN” is a highly inflexible platform most of the time.

Delphix is a software appliance which uses a concept called “thin provisioning” to quickly deploy and refresh QA, development, reporting, testing, etc. environments. It currently works with Oracle and MS SQL Server, and this is what it does: it uses ZFS concepts and proprietary storage and compression logic to near-instantly deploy read/write database environments from any configured source at any stored point in time with almost no storage footprint on the target side.

To accomplish this, source databases are configured in Delphix and cloned to a disk area attached to the Delphix host. The data is compressed, filtered, and stored in DxFS (Delphix File System) where they can take advantage of extremely efficient caching for concurrency and block mapping for versioning. Change records are also shipped from the source to Delphix, and Delphix keeps a running history of changes to each source in order to enable point in time deployments.

A target is any system running Oracle. The target environment hooks into Delphix and, upon deployment, creates an instance and points to the source data. Delphix presents the point in time source data as datafiles to the target which can be opened and read. Writes are made to a dedicated area on the Delphix host which is versioned over the source data via block mapping. The result is as many read/write environments as you need as quickly as you need them with minimal disk requirements. In the end you save money on disk space, money on man hours, and time to deployment which eliminates the DBA as a bottleneck on critical development environments.

Honestly, this technology sounds almost too good to be true. Imagine being able to create unlimited QA servers, instant reporting databases, and daily refreshed development databases on the PC of every developer. If you want to see it in action, have a look at this video. You can also get a more comprehensive (and most likely better explained) explanation of technical details here.

VCE Vblock System

As an Oracle-centric kind of guy and a huge fan of Oracle products and tools, you would think I’d be talking about Exadata here. And Exadata is an amazing ‘engineered system’, capable of mind blowing batch performance, cluster configurations, and more. No doubt there will be posts about Exadata at a later time, but this one focuses on a technology called Vblock because of a major factor: it is application neutral.

You can run nearly anything on Vblock, from Windows to Linux to Oracle to Apache to…well, iTunes. Likewise you get to choose how your disk is configured (RAID level), where you want virtualization, and how you want to manage backups. Behind the scenes is a robust compute/network/storage stack which takes advantage of massive parallelization and a mix of disk resources (Flash, SAS, SATA) with caching mechanisms that dynamically tier data where it needs to be based on your working data set. And most importantly, they claim to provide a “5 nines solution”, which means it has a 99.999% uptime or 5 minutes per year.

My real attraction to this type of system is that it gives DBAs a critical tool they need: a pool. A Vblock System would be ideal to setup as a Database Farm, where DBAs can monitor performance, fault tolerance, and capacity in a large node based environment and quickly create new systems on demand. Instead of requiring new quotes, POs, and management processes every time a new database is required this solution would allow huge flexibility in deployments while providing outstanding compute performance and fault tolerance on the backend. This would include any database or version, clustering or standalone, virtual (VMWare is a collaborator on the Vblock environment) or physical system.

Every day new innovations are being made in CPU and network speeds (less so with disk). New solutions which incorporate many of these solutions into a flexible and heterogenous architecture are going to appeal greatly to businesses and forward thinking IT groups.

DBVisit

Deploying new environments is incredibly important, and with the two technologies I just mentioned it can become a trivial exercise. But at the same time, there is still the need at times for traditional replication or standby databases. Oracle provides technologies like DataGuard and Streams, but the former requires Enterprise Edition (sometimes not cost effective) and the latter can be unnecessarily complex. Software like Golden Gate is incredibly flexible, but also fairly expensive for what it does.

DBVisit Standby is a third party disaster recovery software which can either replace or augment Oracle DataGuard. DBVisit Replicate can replicate data between Oracle, MySQL, and SQL Server, and provides a very user friendly interface to get the work done. To me, the coolest part of the software is that it can emulate and in some cases beat the features of Enterprise Edition, even on a Standard Edition database.

Oracle Enterprise Edition costs $47,500 per unit, and the number of units required is based on your CPU cores times a core factor. If you are running a standby database, both your production server and standby server must be fully licensed. So if we have two Intel x3960 boxes with 2 CPUs with 8 cores each, we would pay a list price of $47,500 * (2 * 8 * 0.5) * 2, which is $760,000. Even with a significant discount that is a significant amount of money. Standard Edition, on the other hand, is licensed by the socket at $17,500 per socket for a total of $17,500 * 2 * 2, which is $70,000. If you can live without Enterprise features and add-ons like AWR/ASH, this is a hugely significant cost savings. DBVisit Standby could be used to fill the disaster recovery gap here and then some; for instance, a new feature of DBVisit called Standby Database Synchronize which can recover a standby even in cases where NOLOGGING was used, or where archive logs are lost or behind. Even with Enterprise Edition, that is a tempting feature.

Simplicity is a big part of the decision as well. If you have Enterprise Edition, you can definitely use Streams if you wish for replication purposes. However, DBVisit is extremely easy to configure and use. A few weeks ago I made a (tongue in cheek) comparison of replication platforms:

Though GoldenGate is extraordinarily powerful, it is also a large and costly centralized environment. DBVisit is quick; I can pretty much guarantee you’ll have replication flowing (perhaps in many directions) within the day.

Conclusion

There are tons of new technologies coming out from databases to hardware to network to handheld devices. All too often, the DBA will learn about these technologies from management or other groups when it would behoove them to be the flag bearers for the new tech at their company. Keep an eye out for the real movers and shakers: the tech that makes it easier and safer to deploy quality systems.

Errata: Upon review, Streams is not an Enterprise Edition feature. It can be used in SE and SE1 as well.

The post Deploying Outside the Box appeared first on Steve Karam :: The Oracle Alchemist.

Start Using MongoDB

Mon, 2013-02-25 09:32
languagesThis entry is part 1 of 2 in the series Database Diversity


Database Diversity

Guten Tag, Namaste, Hola, Zdravstvuite! While you should always work to master your first language, a simple understanding of other cultures and their languages is always beneficial. In these days of social/career networks and rising global connectedness, being open to diversity is an absolute must.

This is true in the IT world as well. As I mentioned in my article “DBA, Grow Thyself – Moving and Shaking in the Era of Data Dominance” new technologies will happen and the last thing you want is to be seen as either ignorant or obstinate to its potential. While I never advocate senseless database migrations or platform changes, with the incredible diversity available in the data world today learning to at least say “hello” in a variety of environments can only be beneficial.

So for the near future, I will be doing a “Database Diversity” series (along with other posts) where we perform basic operations in a variety of database environments. The basic operations (if applicable) will be: install, startup, create a table, insert rows, create an index, and run a query. Again I stress “if applicable”, as these operations won’t always exist in every type of database. For today, our topic is…

MongoDB

MongoDB is a well-known and popular NoSQL option created by 10gen. It is an open source solution designed to be highly scalable and available with robust options for data replication, processing via built in query operations and custom MapReduce jobs, and document storage. Data in Mongo is document oriented, meaning there are no tables per se but collections of documents, each represented in JSON notation. This makes it an extremely powerful tool for storing dynamic data and working directly with most application development frameworks. It is extensible as well; frameworks like Meteor combine the powerful features of node.js with MongoDB as the default storage engine, and there are even tools out there to convert SQL queries to work with it.

But like it or not, MongoDB is not an RDBMS. In today’s Database Diversity, we will download and install Mongo, create a collection, insert some data, create an index, and query our collection.

Installing MongoDB

MongoDB can be installed on most Linux repos using their respective software installation tools. On RHEL/CentOS/OEL, you can add the 10gen repositories to yum and install it like any other RPM. For Debian/Ubuntu, you can add the apt repo and install via apt-get.

Personally I prefer to download the binaries for my platform and install in the location of my choosing. If you have root privileges, you can choose to put the binaries into a central location like /usr/local/bin. If not, you can download and run mongod (the MongoDB daemon) from any location as any user for very fast deployment and flexibility. The overall download page can be found here, but on a Linux x86_64 platform you don’t even have to go that far. Here I am downloading the latest x86_64 MongoDB and extracting it:

steve@UbuntuVM:~/mongo$ pwd
/home/steve/mongo
steve@UbuntuVM:~/mongo$ wget http://downloads.mongodb.org/linux/mongodb-linux-x86_64-latest.tgz
--2013-02-25 08:58:56--  http://downloads.mongodb.org/linux/mongodb-linux-x86_64-latest.tgz
Resolving downloads.mongodb.org (downloads.mongodb.org)... 72.21.215.171
Connecting to downloads.mongodb.org (downloads.mongodb.org)|72.21.215.171|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 94187936 (90M) [application/x-tar]
Saving to: `mongodb-linux-x86_64-latest.tgz'

100%[======================================================================================================================================>] 94,187,936  7.12M/s   in 13s     

2013-02-25 08:59:10 (6.76 MB/s) - `mongodb-linux-x86_64-latest.tgz' saved [94187936/94187936]

steve@UbuntuVM:~/mongo$ tar -zxvf mongodb-linux-x86_64-latest.tgz 
mongodb-linux-x86_64-2013-02-24/README
mongodb-linux-x86_64-2013-02-24/THIRD-PARTY-NOTICES
mongodb-linux-x86_64-2013-02-24/GNU-AGPL-3.0
mongodb-linux-x86_64-2013-02-24/bin/mongodump
mongodb-linux-x86_64-2013-02-24/bin/mongorestore
mongodb-linux-x86_64-2013-02-24/bin/mongoexport
mongodb-linux-x86_64-2013-02-24/bin/mongoimport
mongodb-linux-x86_64-2013-02-24/bin/mongostat
mongodb-linux-x86_64-2013-02-24/bin/mongotop
mongodb-linux-x86_64-2013-02-24/bin/mongooplog
mongodb-linux-x86_64-2013-02-24/bin/mongofiles
mongodb-linux-x86_64-2013-02-24/bin/bsondump
mongodb-linux-x86_64-2013-02-24/bin/mongoperf
mongodb-linux-x86_64-2013-02-24/bin/mongosniff
mongodb-linux-x86_64-2013-02-24/bin/mongod
mongodb-linux-x86_64-2013-02-24/bin/mongos
mongodb-linux-x86_64-2013-02-24/bin/mongo
steve@UbuntuVM:~/mongo$ mv mongodb-linux-x86_64-2013-02-24/ mongo/

Normally, another step in a default installation is to create the /data/db directory, which is the default location for MongoDB file storage. However, as we are running this in a non-root setup for this tutorial, we will change the default location. To start the MongoDB daemon, the “mongod” command is used:

steve@UbuntuVM:~/mongo$ pwd
/home/steve/mongo
steve@UbuntuVM:~/mongo$ mkdir -p data logs
steve@UbuntuVM:~/mongo$ mongo/bin/mongod --fork --dbpath /home/steve/mongo/data --logpath /home/steve/mongo/logs/mongoDB.log
about to fork child process, waiting until server is ready for connections.
forked process: 16471
all output going to: /home/steve/mongo/logs/mongoDB.log
child process started successfully, parent exiting

The --dbpath option allows a custom data directory to be used (in our case, /home/steve/mongo/data). The --fork option tells MongoDB to run as a background daemon. If this command is not used, MongoDB will run in the foreground and a separate window must be used for other operations. However, if you do choose to use the --fork option, you must either supply the --syslog option (log MongoDB operations to syslog) or --logpath with your own custom logfile path and name. The --port command is also an option, but if unset the default connection port is 27017.

Now that we have installed MongoDB, we will attempt a connection via the CLI:

steve@UbuntuVM:~/mongo$ mongo/bin/mongo
MongoDB shell version: 2.4.0-rc1-pre-
connecting to: test
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
	http://docs.mongodb.org/
Questions? Try the support group
	http://groups.google.com/group/mongodb-user
>

If we want to see which database we are currently connected to, we can issue the “db” command. You can also switch databases with the “use” command:

> db
test
> use mydb
switched to db mydb

Note that the “mydb” database was never created and even though I’ve switched to it, doesn’t actually exist. You can switch to any database but until you create a collection inside it, it doesn’t actually exist except on your screen. However, if you want to see existing databases, you can use the “show dbs” command.

Creating a Collection and Inserting Data

I put these two topics in the same area because in MongoDB they are actually one and the same. There are no pre-defined tables in MongoDB but collections, a loose gathering of documents that are meant to be logically related in some way. For the sake of learning, you can consider a collection like a table, a document like a row, and a key:value pair like a column. But given the dynamic nature of MongoDB it can almost be dangerous to do so. For instance, in MongoDB a collection can have documents which share no keys in common (except the unique identifier _id key). You are not constrained by specifics and can freely dump nearly any data of any type into any collection. As you can imagine, this can be both good and bad depending on your development process, framework, and standards.

Here is an example MongoDB document:

{
  name: 'Steve Karam'
  database: 'MongoDB'
  nickname: 'Mongo Alchemist'
  really: false
}

But then here is an example of another document that can be part of the same collection:

{
  name: { first: 'Steve', last: 'Karam' },
  contact: [
    { type: 'blog', val: 'http://www.oraclealchemist.com' },
    { type: 'twitter', val: '@OracleAlchemist' } ]
}

This loose collection capability is both extremely intuitive and at the same time confusing for most relational oriented developers and DBAs. However, it grants us powerful capabilities for data storage/retrieval and complex aggregations.

So let’s create a simple collection with three rows and get the data from it (like a “select *”):

> db.employees.insert( { name: 'Bob', title: 'DBA', salary: 80000 } );
> db.employees.insert( { name: 'Jane', title: 'Developer', salary: 70000 } );
> db.employees.insert( { name: 'Kim', title: 'Manager', salary: 130000 } );
> db.employees.find();
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
{ "_id" : ObjectId("512b7674dfea6825c38b0b56"), "name" : "Jane", "title" : "Developer", "salary" : 70000 }
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "title" : "Manager", "salary" : 130000 }

For these three simple rows I decided to let MongoDB define the unique _id field, and also kept the elements the same (name, title, salary). However, adding a new element is as simple as updating a document:

> db.employees.update( { name: 'Kim' }, { $set: { office: '37L' } } );
> db.employees.find( { name: 'Kim' } );
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "office" : "37L", "salary" : 130000, "title" : "Manager" }

By searching for name = ‘Kim’ and using the $set operator to set a new element called ‘office’, the document has been updated and can be seen in the query. It should also be noted that while this mimics what we might do to update a row in a relational database, MongoDB is incredibly more powerful when it comes to data manipulation in-document. Basic operations like insert, update, delete, upsert, etc. are supported along with many more.

Remember how I said the ‘mydb’ database only existed on screen until we actually did something with it? By performing that first insert (Bob), we not only created the mydb database but the employees collection and its first row all in one step.

Indexing

Indexes are just as important in MongoDB as they are in any relational database. The rules for proper indexing performance are nearly identical as well. Columns used for searching or sorting are good candidates for index, with indexes sortable in ascending and descending order. Of course, MongoDB doesn’t stop there. In addition to standard b-tree and b-tree compound indexes it also allows 2d spatial indexes (for geohashed location queries), indexes on sub-documents, TTL indexes for document expiration pruning, and more.

Creating an index is as simple as using the ensureIndex command:

> db.employees.ensureIndex( { 'salary': 1 } );

In this example, an index was created on the “salary” element of the employees collection in ascending order. A value of -1 would have been descending order.

Note that because of the loose style in MongoDB, you can create an index on anything, even elements that don’t exist:

> db.employees.ensureIndex( { 'notexists': 1 } );

With the getIndexes() command we can see what indexes exist on a collection:

> db.employees.getIndexes();
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"ns" : "mydb.employees",
		"name" : "_id_"
	},
	{
		"v" : 1,
		"key" : {
			"salary" : 1
		},
		"ns" : "mydb.employees",
		"name" : "salary_1"
	},
	{
		"v" : 1,
		"key" : {
			"notexists" : 1
		},
		"ns" : "mydb.employees",
		"name" : "notexists_1"
	}
]

Note that there are three: the default unique index on _id (required), an index on salary, and an index on notexists…even though it doesn’t exist at the moment as an element in any document. Since I don’t really need that index, I’ll drop it:

> db.employees.dropIndex( 'notexists_1' );
{ "nIndexesWas" : 3, "ok" : 1 }

Querying a Collection

This is a highly complex topic and we are not going to be able to cover every possible query that MongoDB can do…mostly because it can do nearly anything. Thanks to the group command (GROUP BY on steroids) for relatively simple aggregations with under 20,000 unique groupings and MapReduce (GROUP BY times infinity + 1) which can aggregate data in pretty much any method that can be coded, there is no limit to how you can query MongoDB. But for relatively simple queries, we use the find command:

> db.employees.find();
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
{ "_id" : ObjectId("512b7674dfea6825c38b0b56"), "name" : "Jane", "title" : "Developer", "salary" : 70000 }
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "office" : "37L", "salary" : 130000, "title" : "Manager" }

A find command with no arguments is simply like a “select *” on a table. It will return all documents with all elements. However, you can query for specific elements easily:

> db.employees.find( { office: '37L' } );
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "office" : "37L", "salary" : 130000, "title" : "Manager" }
> db.employees.find( { name: 'Jane' } );
{ "_id" : ObjectId("512b7674dfea6825c38b0b56"), "name" : "Jane", "title" : "Developer", "salary" : 70000 }
> db.employees.find( { title: /d.*/i } );
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
{ "_id" : ObjectId("512b7674dfea6825c38b0b56"), "name" : "Jane", "title" : "Developer", "salary" : 70000 }

Oh, did I mention you can use regular expressions? The incorporation of perl regular expressions makes the MongoDB engine extremely powerful. You can, of course, also use comparison operators:

> db.employees.find( { salary: { $gt: 75000 } } );
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "office" : "37L", "salary" : 130000, "title" : "Manager" }
> db.employees.find( { salary: { $gt: 75000, $lt: 85000 } } );
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }

Again, the list of capabilities with querying in MongoDB goes on and on. Rather than dive into each of them, I do want to make sure our previous information on creating indexes actually did something. Let’s run the salary queries again, this time using explain plans. You can do this by running the explain() method after your find() query:

> db.employees.find( { salary: { $gt: 75000 } } ).explain();
{
	"cursor" : "BtreeCursor salary_1",
	"nscanned" : 2,
	"nscannedObjects" : 2,
	"n" : 2,
	"millis" : 0,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"isMultiKey" : false,
	"indexOnly" : false,
	"indexBounds" : {
		"salary" : [
			[
				75000,
				1.7976931348623157e+308
			]
		]
	}
}

Notice that the query used the salary_1 b-tree index, and shows the manner in which it was used with “indexBounds”. If I had used a non-indexed query, you would have seen:

> db.exmployees.find( { name: 'Kim' } ).explain();
{
	"cursor" : "BasicCursor",
	"nscanned" : 0,
	"nscannedObjects" : 0,
	"n" : 0,
	"millis" : 0,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"isMultiKey" : false,
	"indexOnly" : false,
	"indexBounds" : {
		
	}
}

Notice no indexBounds are defined, and the cursor type is “BasicCursor”. This is more or less a full table scan. If you think it should have used an index, you can supply a hint where you pass in the index name:

> db.employees.find( { salary: { $gt: 75000, $lt: 85000 } } ).hint('salary_1');
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }

Conclusion

As I’ve mentioned many times, there is a lot more to MongoDB than these simple operations. However, even knowing the basics can be invaluable in the event that you or your business decides it is the right tool for a job at some point. The things you saw here just scratch the surface of what MongoDB actually does once you incorporate sharding, GridFS, MapReduce, etc. As you learn and grow with MongoDB, keep the following points in mind:

  • Don’t be afraid to try things out with MongoDB. It is very easy to get up and running, create multiple collections, databases, etc.
  • MongoDB can be very unforgiving. Dropping all data in a collection is as simple as db.employees.remove(), and once it is gone it is gone.
  • The documentation for MongoDB is incredible. You can find it all on their site.
  • If you’re a book learner, MongoDB: The Definitive Guide is an invaluable resource and is very easy to follow.
  • MongoDB is not SQL, but there are some easy comparisons that may help you come to grips with it.
  • Want to try MongoDB without ever even installing it? Here’s a mini Mongo implementation at MongoDB’s website.

See you soon for the next Database Diversity article!

The post Start Using MongoDB appeared first on Steve Karam :: The Oracle Alchemist.

The Overly Shared Pool

Thu, 2013-02-21 05:55
pool

Ever been in a really crowded pool? I have… and chances are so have your queries. This article will go into a little bit of high level detail about the shared pool and a portion of what it does for us, then show a real “good pools gone bad” issue resolution. If you feel like the shared pool is something you’ve already done triple backflips with a jackknife for a 10/10 finish into and you’d rather just sit poolside and order an extravagant beverage while reading my client’s tale of woe, you can skip down a bit.

Parsing = Translation and Optimization

When someone walks up to you and says, “Hello, how are you doing?” your brain receives audio input from your ears in the form of the English language. This input is then ‘parsed’ by the cerebral cortex in a place called “Wernicke’s Area”. This section of the brain takes the audio information and breaks it down based on inflection, syntax, and other factors. Once the question registers and is ‘understood’, your brain is able to come up with a suitable response. It can either dig through short-term memory to piece together recent events and describe, in detail, exactly how you are doing; or more efficiently, it can offer up a quick instinctual response like “Good, you?”

Oracle parsing works much the same way. When a statement is given to Oracle in the form of Structured Query Language (SQL), the statement must first be understood by the Oracle Instance (the brain, as it were). The statement is checked for syntax and permissions, and then broken down to a hash value so Oracle can offer an optimal response. The optimization phase of the parse is most important, and is done via one of three methods: hard parse, soft parse, or session cached cursor.

  • Hard Parse – Oracle searches for the statement in an area called the Library Cache, a subsection of the Shared Pool. If the statement is not found, Oracle invokes a tool called the Cost Based Optimizer (CBO) in order to come up with an optimal execution plan. This invocation is widely considered a very expensive process for CPU, just as a well thought out response to a question requires more brainpower.
  • Soft Parse – Oracle searches for the statement in the Library Cache and finds it. This allows Oracle to re-use execution plans without having to re-optimize the query and is akin to responding “Good, you?” when asked how you are doing.
  • Cached Cursors – Oracle notices that you have run the same statement over and over again, and caches the execution plan into your private memory so the Shared Pool is no longer burdened for your session. This is akin to someone asking you the same question over and over again, at which point the answer is immediately given with no real thought.

Hard Parses are a necessary evil in order to initially optimize a query. Soft parses are generally considered to be good, but can be problematic under heavy concurrency (explained later). Cached Cursor responses are great for performance, but require RAM resources granted explicitly via Oracle parameters or client/OCI settings.

Query executions go through three main phases:

  1. Parse – Translate and optimize the query
  2. Execute – Lay down the execution plan and run the query
  3. Fetch – Pull back data from Oracle objects based on the execution plan

It is important to note that most programming languages explicitly call each of the three phases. When a query is run, it is possible to parse the query (known as a parse call) and then not execute or fetch the data. Reusing parsed queries (parse once, execute many) is a highly recommended practice.

The Shared Pool

When queries are hard parsed, the execution plans generated by the CBO are placed in the Shared Pool; specifically, in a subsection called the Library Cache. If the same query comes in again, Oracle must simply search the Library Cache and return the stored execution plan (soft parse).

However, both hard and soft parses put strain on the Shared Pool. The Shared Pool is simply a chunk of RAM, and CPU must manage RAM. This means that if 100 users are simultaneously trying to access the Shared Pool, CPUs will be dedicated to these users to access the RAM. If there are more users than there are CPUs, a queue will form and the CPU will work until all the users are done.

Think of it as a trip to the grocery store. To take items from the store, a cashier must first process you. If there are 100 shoppers trying to check out and only two cashiers, a large queue will form and long wait times will ensue. The bottleneck in this case is cashiers, and you could even call the event “waiting on resource: cashiers”. We can fix this by adding more cashiers, thereby cutting down on the wait times overall despite high concurrency.

In Oracle, the items are RAM segments, the cashiers are CPUs, and the shoppers are users or sessions. When more sessions need chunks of RAM than there are CPUs, enqueues and waits occur which cause the overall accessibility to the database to degrade. Wait events such as “latch: library cache” and “cursor pin S wait on X” are Oracle’s internal names for various types of wait events, just as “waiting on the slow cashier with no bagger” is our name for a wait event at the grocery store.

Continuing on this analogy, let’s say we have 20 shopper and 10 cashiers. This type of queue is very reasonable and will be processed rather quickly. Now imagine we throw in 20 more shoppers who come in to buy the exact same three or four items every few minutes. Then imagine we bring in a few shoppers who load up their cart, get to the front of the line, and then decide they don’t want the items after all. On top of that, we throw in a few store employees who walk through the line demanding various sub-tasks on top of the normal cashier duties. This situation will fly into chaos, as anyone who has ever gone to Wal-Mart on a Saturday afternoon can tell you.

The same thing occurs on Oracle. The shared pool is required for parsing, for queries from dual, for sequences, and countless other components. There are sessions which come through over and over and over to run the same thing every time, sessions that make a parse call but don’t actually execute or fetch, sessions that bring in huge queries that take forever to parse, and so on. In addition, Oracle has its own control structures, sequences, views, and other components that must be handled behind the scene while all this is going on. And here we’ve only discussed the checkout process! Imagine if during all this the cashier also had to order custom or out-of-stock items on the fly from warehouses (akin to disk reads from storage).

However, the answer to this predicament is not always to add more CPUs. We can also cut down on this behavior by introducing better design and optimizations. For instance:

  • Making the shared pool bigger allows more queries to be soft parsed instead of hard parsed, cutting down on CPU and allocating more available latches.
  • Allowing repeat queries to parse in session memory instead of the shared pool reduces strain.
  • Ensuring that queries parse only when absolutely necessary at the application level reduces parse calls and therefore contention on the shared pool.
  • Not running meaningless queries like “select 1 from dual” to check for database availability cuts down on parse calls, not to mention hits to the dual table which is in the shared pool.


A Real Life Shared Pool Problem

I had a client who called me in due to an incredibly severe issue with the shared pool. They followed all the “rules”: they used bind variables religiously, avoided anonymous PL/SQL blocks, and set a decent value for session_cached_cursors. Yet the system would become unresponsive and the following waits kept occurring when concurrency got really heavy:

  • latch: library cache
  • row cache lock
  • cursor pin S wait on X
  • latch: shared pool

AWR/ASH and V$ data showed that soft parses were happening almost exclusively. But even though the system was properly caching and re-using SQL in the shared pool resulting in soft parse, the contention buildup against the shared pool was too much for the system to handle and it would come to a grinding halt.

In the end, we went through a massive effort to reduce usage on the shared pool (hard or soft). This was accomplished by removing unnecessary DUAL calls (the code was doing a select from DUAL on every page) and most importantly, setting up true client cursor caching in the JBoss application server.

First have a look at this bit of nastiness:

Elapsed:               59.50 (mins)
   DB Time:           16,227.94 (mins)

Remember that Average Active Sessions (AAS) is equal to DB Time over Elapsed Time. This gives us a good indication of wait/work load during the 60 minute window. As you can see, the AAS during this time was ~273, far too high for most any respectable system.

Looking at the Top 5 Timed Events, you can see that we’re abnormally tied up in concurrency wait:

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
latch: library cache                169,214   3,677,789  21735  377.7 Concurrenc
row cache lock                       31,355      60,136   1918    6.2 Concurrenc
enq: TX - row lock contention        20,268      58,719   2897    6.0 Applicatio
latch: shared pool                  102,219      54,985    538    5.6 Concurrenc
cursor: pin S wait on X           4,895,909      53,456     11    5.5 Concurrenc

There were reams of other statistics and information used to correctly identify the issue, but the key symptom could be seen right here in the AWR under the SQL ordered by Parse Calls section (query text has been removed to protect the guilty):

% Total
 Parse Calls  Executions     Parses    SQL Id
------------ ------------ --------- -------------
     283,466      283,061     12.07 bd7nzg41xqmtu
Module: JDBC Thin Client
CONFIDENTIAL QUERY #1

     204,990      204,987      8.73 bajtpw34f1h8n
Module: JDBC Thin Client
CONFIDENTIAL QUERY #2
     166,868      166,870      7.11 avc1jqzz04wpr
SELECT 'x' FROM DUAL

     108,239      108,239      4.61 4rg9kqppkwb2x
Module: JDBC Thin Client
CONFIDENTIAL QUERY #3

You should notice three things from the above info:

  1. The SELECT ‘x’ FROM DUAL query is definitely excessive, having occurred 46 times per second.
  2. More importantly, the Parse Calls match Executions 100% in every query
  3. There are a LOT of query executions

Interestingly enough conventional wisdom, advice, and even training always warns of the dangers of hard parsing vs. soft parsing, and vaguely mentions parse to execute ratio and how important it is, but for the most part the focus is given to the type of parse. But the case above shows that even soft parses aren’t enough with massive concurrency, an overabundance of small queries, and meticulous bind variable practices combined to create a huge case of shared pool contention with ~99% soft parses.

The application software in use was JBoss which has a feature called the Prepared Statement Cache. Since querying in JBoss is done by setting up a Prepared Statement and running it against the DB, the Prepared Statement Cache allowed JBoss to dedicate memory to keep the cursor open and reusable following a successful parse/execute/fetch. Even with all the bind variables in the world, we couldn’t “Parse Once, Execute Many” without the client keeping the parse cursor open. Take a look what happened when we found the root cause of the issue in the application server settings and corrected it (a value of 15 for prepared-statement-cache-size worked nicely):

% Total
Parse Calls  Executions     Parses    SQL Id
------------ ------------ --------- -------------
      13,481       66,349      2.32 8g5h090r1uxmc
Module: JDBC Thin Client
CONFIDENTIAL QUERY #1

      11,346      104,237      1.95 5nz8u1h2qyr92
Module: JDBC Thin Client
CONFIDENTIAL QUERY #2

      11,083       63,029      1.90 78xz4j1g270uw
Module: JDBC Thin Client
CONFIDENTIAL QUERY #3

      10,663       35,397      1.83 02s458bkx4z4a
Module: JDBC Thin Client
CONFIDENTIAL QUERY #4

Notice that the parse to execute ratio was improved dramatically. We also got rid of the DUAL queries. Consequently, the latch issue went away never to return. The shared pool operates at about 10% of the wait it used to incur and never hits that breaking point where performance degrades exponentially.

The final lessons:

  • Soft parses are preferable to hard parses in an OLTP application where query stability is preferred (though auto-gathered histograms sometimes messed this up in 10g before Adaptive Cursor Sharing was introduced in 11g)
  • Bind variables can and should be used to improve plan reuse and reduce hard parses (CURSOR_SHARING is something of an alternative, but not preferred to actual bind variable usage in the code)
  • To actually lower your parse to execute ratio, bind variables aren’t always enough. Your client needs to actually be configured to take advantage of them. This one doesn’t seem to make it into all the how-tos or best practices, but it should.

The post The Overly Shared Pool appeared first on Steve Karam :: The Oracle Alchemist.

DBA, Grow Thyself – Moving and Shaking in the Era of Data Dominance

Wed, 2013-02-20 15:05
1024773_57103207This entry is part 3 of 4 in the series Grow Your Career


Grow Your Career

Beefing up your technical skills is important, and it’s what makes you a good DBA, Developer, Architect, or whatever you happen to be. But at the same time, it is incredibly important to keep up with the professional trends in your organization. While all organizations are different, there are some definite trends that are taking shape in most businesses that it would be wise not to ignore. And after all, all those storage, database, application, and utility vendors promised you tons of free time when you adopt their platform, right? It’s not like you have a lot to do. /sarcasm

Earlier this week I wrote an article about the role of DBA teams in the business today, focusing on advice for the DBA manager (though it is also very useful for DBAs as well). Today I want to focus on the DBA specifically with some points which, in my experience, are highly important at the business level and possibly unknown at the technical IT employee level.

Development/QA – It’s the New Production

I remember years ago, I made it a personal mission to tell every DBA I worked with that backups should be treated like production. Even then it was such a novel concept to some people; who knew that the bundle of files that may one day save my career and company might actually be a production environment? Well certainly not the people who lost their jobs due to lack of backup. Ye sorry souls.

Nowadays that should be common knowledge and a common phrase, harkening back to the days when Russian roulette was played by DBAs and administrators for lack of disk space or free cycles. However, a new paradigm has arisen which is in many ways difficult to support but wholly necessary: development and QA are just as important as a production environment. In fact it’s not really a new paradigm at all, it’s just being enforced more and more.

To a company, being able to reliably push out software updates, new products, roll out new features or content, fix bugs, and provide solid user acceptance testing is paramount. It increases both user and executive confidence in the business and IT as a whole, and it creates fresh sources of revenue and opportunity. The last thing any DBA wants is to be the bottleneck in that process. In all areas of development from a database perspective – creation of development/QA databases, refreshing from production, providing viable statistics and metrics, monitoring, and backup and recovery – the DBA must be as active and safe as they would be in production. At some companies, even development environments have change request procedures that must be followed by the DBA or developers.

Let’s face it, DevOps is taking off and businesses know it. If the DBA (and system administration) groups can’t keep up with fast paced frameworks and QA requirements while keeping all aspects of development moving smoothly and safely, then they will be consigned to the “doesn’t have a sense of urgency” bin. The business will see the database as just a part of the software stack (and rightly so), and will not understand the reasoning behind a bottleneck however logical it may seem.

That’s why tools like Delphix are taking off. It’s not another monitoring tool or performance management tool…it’s a storage agnostic thin provisioning layer for databases allowing fast deployment and refresh of QA and development environments through virtualization. They are experiencing huge growth by reducing the dev/QA footprint and speeding up release schedules. I predict that in the very near future, a bottleneck to provisioning development/QA/reporting databases despite incredible technology support will be viewed as highly unacceptable.

Lobby for control over new technology

If it’s data centric, own it. I know this may be a controversial one because of the serious lack of resources some DBA teams already encounter, but if at all possible the DBA should be keeping up on modern trends and embracing them. Like it or not, at some point your business will bring up the idea of using MongoDB, Hadoop, MySQL, or a wealth of non-Oracle technologies. While you need to make sure they are using the right tools for the job (like not replacing every single Oracle database with MongoDB just to do it), at some point the adoption of new technologies will be inevitable. And whether you feel those technologies are fads, useless, or boring you will need to keep up on them to stay relevant.

Take Hadoop, for example. It’s not a database, it’s a filesystem. The data stored on it is as dirty as it is plentiful, the environment is spread out over dozens if not hundreds of nodes, and doing anything meaningful with it requires coding or complex tools/layers. Yet if your company is not going to hire an experienced Hadoop Administrator, it’s a great job for the DBA. Gwen Shapira makes a great note of this, arguing that the DBAs experience with complex tuning requirements, data warehouses, and developers makes us a very suitable candidate.

Honestly, I wouldn’t say you need to become an expert in every data technology out there. There are just too many. But try to remain open and somewhat versed in the environments. And if your company starts moving toward one, be ready to jump at the opportunity (should it be logical) and take part in it. If possible, own it.

Sometimes it is as simple as saying that you want it. When new work comes down the pipe it is very common for everyone to take a step back; if the project is data centric, you should be taking a step forward. The goal is to make you undisputed heavyweight data champion of the world. New technologies mean new opportunity both inside and outside your company. Even niche products can look really great on your resume later down the line. If you can’t simply take over a technology, then lobby for it. Research and discuss it with developers, business analysts, and project managers. With just a little bit of conversation and a few emails, it isn’t hard to convince people that it is in their best interests to hand the data management to the DBA no matter what form it is in.

Speak to your target audience

One of my favorite reddit subs (don’t judge) is r/explainlikeimfive (ELI5 for brevity). The idea is to take a complex idea and pound it down until it can be explained to a five year old. Now I’m not advocating that a DBA talk to everyone as if they were a child (though I know many IT professionals who do so), but more advocating that you speak appropriately to your target audience. Many of us already do this. But I can’t count the number of times I’ve seen an IT professional stand in front of a business analyst spewing out details about kernels and kdumps and semaphores and concurrency while the analysts eyes glazed over faster than a piping hot Krispy Kreme donut. Like it or not, there will be people at your company that don’t understand parameter settings and b-tree indexes. Trust me, they have their own language that you probably wouldn’t grok too quickly either (see what I did there?).

Just like learning different technologies, learn different business dialects. If not the verbiage, at least learn what interests them and their level of technical understanding. Metaphor is a great vehicle for this. I’ve had great success selling ideas or projects when I used metaphors that interested the person in question; sports metaphors for sports types, car metaphors for car types. Just make sure your metaphors make sense… don’t be that person everyone will one day giggle about for comparing improving database performance to the running back scoring a home run.

So when you talk to project managers, talk in terms of milestones and deadlines. For business analysts, talk about goals and roadmaps. For VPs, talk about objectives and projects. Notice every single one of these examples has to do with things getting done, not the process involved. That’s right! For the most part, they don’t care how it is done as long as it is done quickly, efficiently, and correctly. Leave the hidden parameter discussions for your technical brethren and your final report.

Of course, your mileage may vary. If you work for a company where even the CEO has written kernel code, you might look like a fool if you try to coat over the technical details. Use your gut, but keep a critical eye on your target audience before you flood them with details.

Above all, keep your ear to the ground and your mind open. Data is fast moving out of the cubicle and into the boardroom. If you want to improve your impact and influence, be ready to go along for the ride.

The post DBA, Grow Thyself – Moving and Shaking in the Era of Data Dominance appeared first on Steve Karam :: The Oracle Alchemist.

DBA Team, Front and Center

Mon, 2013-02-18 11:16
conference_roomThis entry is part 2 of 4 in the series Grow Your Career


Grow Your Career

A lot of database professionals seem to agree that the DBA acronym stands for “Default Blame Accepter”. Indeed, I’d be lying if I said that I’ve never felt that way. But I believe that a more in-depth look into why the DBA team always seems to be at the center of a critical issue is in order.

Let’s start by defining data. In his book Programming the Universe: A Quantum Computer Scientist Takes on the Cosmos, Seth Lloyd says the simplest unit of information is a bit; a single true or false. Add those bits together and you’ve got some meaningful data. Add that data up and you’ve got business logic. Collect, aggregate, and you’ve got corporate knowledge. Companies know that the data they collect is their memory. By collecting it, they build a brain for the organization with streams of data running through the business like a nerve center.

Now let’s talk teams. Different companies do it different ways, but generally speaking you’ve got server/storage/system admins, DBAs, developers, and QA. Developers are generally fairly close to the business; through analysts or project managers they take the high level requirements and turn it into applications. As such (and as many DBAs may notice) they have a lot of pull with the business and in some cases are more trusted as technical experts than the DBA team. QA teams are also usually grouped either close to the development team or close to the project manager/business analysts. Behind the scenes you’ve got your DBAs and system administrators; at times stereotypically seen as a crew with big heads and bigger egos, making cowboy changes in between rounds of video games.

Data is one of the most critical corporate assets and is at the forefront of modern technology requirements. Its distribution and management is an integral process for a company as a whole and for nearly every project that comes down the pipe. Yet at many companies (certainly not all), the DBA team is seen at best as a final dumping ground for project requirements and at worst as an obstacle. Either way, the DBA team is a custodian of the data, central to every IT group and the business itself. We manage over environments that are perceived as shadowy and obfuscated, highly complex and prone to failure.

If that is the case for your team, it needs to change. The best way to do that is by taking a close look at the goals of the DBA team.

Know your client and make sure they know you

Many of the non-consulting DBAs I speak with describe their job as a support role to developers. This is a dangerous stance to take. When I think of a ‘support role’ I think of a person sitting in their cubicle incessantly starting up and shutting down databases, taking action only when prodded by the development team or management. Unfortunately, this describes a person who is highly expendable.

As DBA manager your team should of course support the organization, but less like support staff and more like consultants. Your client is the business. When a project comes down the pipe, make sure that it doesn’t land on your desk as requirements. Instead, try to be part of the architectural process. Play a part in making the requirements. Make sure your client, the business, sees your team as a valuable resource, crack specialists who understand their content and their needs. Only with your team’s leadership and help can that data be managed and served in the most efficient manner possible.

Think of it like an Oracle environment. What is a database? Just files on disk, capable of nothing. They store information without description, data without goals. Only by starting an instance and mounting that database does the data begin to flow and take on meaningful purpose. Your team should be the instance. You have the knowledge, the processes, and the experience with the database that your client needs to harness.

Above all, make sure each of the DBAs on the team feel that empowerment. Taking in requirements and passing them down to DBAs leads to rote work and a sense of uselessness. Instead, encourage the DBA team to interface directly with the business as technical leads. Your team will end up owning the issues if a query performs slow or data is lost anyways. So make sure they own the process and productivity required to bring a project to fruition as well. Help them become invaluable to their client.

The problem may not be in the database, but the database is part of the application

If I had a Flainian Pobble Bead for every time I’ve heard a DBA say “It’s not a database problem” I’d be a hyperrich man. These words are generally uttered when a problem escalates to the point where people are scrambling to find an issue with slowness. Other phrases you might hear around the same time are “it worked in QA,” “that code has been there for X years,” and “maybe you should increase the SGA.”

Yes, the problem could very well be rising concurrency or data volume. Yes, queries might be dynamically generated and resolving sub-optimally. And yes, some ad hoc query might be eating up your processes and RAM. Absolutely dig for that root cause and absolutely report on it. But not as a way to shift blame; instead, do it as a way to own the solution.

Look at the problem from the client’s eyes: a query may be written by developers, but it is run against a database. An index might be missing, but it’s missing from the database. The server might be out of resources or encountering slow disk reads, but it’s manifesting through the database. Yes, Virginia, it is a database problem.

While this puts you in a bad position when things go sour, it also gives you an outstanding opportunity to own both the problem and the solution. Even if there is nothing you can do about it, your team can express their mastery over their domain by assisting with diagnosis of the problem and delegating tasks and research questions to the appropriate teams. The DBA team is not at the bottom of the development ladder, but at the center of it. Projecting that image both when times are good and when times are bad is very rewarding to the business and your team.

Focus on solutions

DBAs are a fairly vocal bunch, and at times critical of other groups. Because we are the hub of data management in a company we get to see every bottleneck along the path to production or problem resolution. And, quite frankly, this can make some DBAs a little jaded. And said DBAs might be inclined to discuss it. Publicly. In meetings.

Work with your team to focus on solutions, not problems. Sitting in a meeting discussing the reasons why something happened are not productive, particularly when it is already known where the root cause lies. This one seems very obvious, but I see it happen on a daily basis. Some people just need to complain about their aches and pains. Instead, they should be offering the elixir.

A problem should be described once. A solution should be pored over and evaluated constantly. Work with your team to focus on talking about what can be done, not what happened.

Conclusion

Data is a complex topic, and has made its way to the forefront of the minds behind your business. As the team who architects, maintains, protects, optimizes, and coordinates the flow of data the DBA team has the ability to play a powerful central role in product development and management. If your team has been relegated to a backend support role, asserting your team’s knowledge of business needs surrounding data can be a huge boon to both your position and that of your DBAs.

The post DBA Team, Front and Center appeared first on Steve Karam :: The Oracle Alchemist.

5 quick and dirty Linux tips you may not know

Thu, 2013-02-14 14:57
Tux Illustration

Remember that day you were sitting at a friendly bash$ prompt and discovered tab completion? It was like a choir of angels came down from the heavens and sang Hallelujah, the Shell Hath Come (yes, that’s a bourne again reference). While these 5 tips may not be that monumental, they are still things I wouldn’t be able to do without… perhaps they will help you as well.

1. Grep without the grep

I frequently see a lot of people running grep commands like this: ps -ef | grep pmon | grep -v grep

Since the ‘ps’ command is a process list, it unfortunately will not only return the pmon process you are looking for but the ‘grep pmon’ command you just ran.

[skaram@server2 ~]$ ps -ef | grep pmon
skaram   15755 15704  0 11:35 pts/0    00:00:00 grep pmon
oracle   31151     1  0  2012 ?        23:38:20 ora_pmon_ORCL

It is obnoxious, but easy to workaround. However, the ‘grep -v’ command is the long way. Here is the short way around the issue:

ps -ef | grep [p]mon

Simply wrap the first character of the search term in square brackets, and you are good to go.

[skaram@server2 ~]$ ps -ef | grep [p]mon
oracle   31151     1  0  2012 ?        23:38:20 ora_pmon_ORCL

2. Kill them all

Have you ever needed to, say, kill all LOCAL=NO processes on an Oracle server? Yeah, me neither. But if you did, not saying that I have, this is the kind of command I would use:

kill -9 `ps -ef | grep -i [l]ocal=no | awk '{print $2}'`

DISCLAIMER: DO NOT DO THAT ON A PRODUCTION SYSTEM.

3. Space hogs

This one is probably a little better known, but if you need to find the top space users in a directory just run the following command:

du -k | sort -n

[root@220883 log]# du -k | sort -n
4	./conman
4	./conman.old
4	./cups
4	./news/OLD
4	./pm
4	./ppp
4	./samba
4	./squid
4	./vbox
8	./mail
8	./news
52	./prelink
2864	./directadmin
4940	./httpd/domains
7000	./httpd
15576	./proftpd
20292	./audit
44856	./exim
216272	.

Note that the output number will be in kilobytes.

4. Getting the magic back

Does your job or client require you to use sudo in order to log into a database server as Oracle? Depending on the way you get X Windows going, that may have caused you grief in the past (particularly with installing Oracle). But you can easily grab your auth data and transfer it when you sudo to a new user like so:

sma11x:~ steve$ ssh -X skaram@server2
skaram@server2's password: 
[skaram@server2 ~]$ echo $DISPLAY
localhost:10.0
[skaram@server2 ~]$ xclock

And you get the clock, right? But you need to be oracle to do an install. Before you do that sudo command, type xauth list

[skaram@server2 ~]$ xauth list
server2/unix:10  MIT-MAGIC-COOKIE-1  c183a5016b775c9692b2da6b31552dae

When I sudo to oracle, notice xclock doesn’t work:

[skaram@server2 ~]$ sudo -u oracle -H bash
Password: 
bash-3.2$ xclock
X11 connection rejected because of wrong authentication.
X connection to localhost:10.0 broken (explicit kill or server shutdown).

But I can add back my authorization by using xauth add and pasting the entire line from the xauth list output.

bash-3.2$ xauth add server2/unix:10  MIT-MAGIC-COOKIE-1  c183a5016b775c9692b2da6b31552dae
xauth:  creating new authority file /export/home/oracle/.Xauthority
bash-3.2$ xclock

Doing so will make X it work once again, but now as the oracle user.

5. Screen your sessions

This one was a godsend. Have you ever been working on a long running operation and you lost your connection to Oracle, or had to go home, or god knows what else? It is a horrible pain and one that really messes up a lot if it gets you at the wrong time. That should be a thing of the past if you use screen.

Screen is a multiplexed terminal, allowing you to spawn multiple terminals in a single terminal session. Opening a new ‘screen’ is simple; simply type the word screen at a Linux command line:

[skaram@server2 ~]$ echo HELLO!
HELLO!
[skaram@server2 ~]$ screen

Notice your terminal clears and you start at a new prompt. Now I will go ahead and make it wait for input.

[skaram@server2 ~]$ read -p "Press Enter"
Press Enter

If I press CTRL+A, then CTRL+D, my screen will detach, putting me back on the ‘parent’ terminal session:

[skaram@server2 ~]$ 
[skaram@server2 ~]$ 
[skaram@server2 ~]$ 
[skaram@server2 ~]$ 
[skaram@server2 ~]$ echo HELLO!
HELLO!
[skaram@server2 ~]$ screen
[detached]
[skaram@server2 ~]$

If I want to get back to the child screen, I can type screen -rx to re-attach. Or start a new terminal with a fresh ‘screen’ command. You can even detach from multiple terminals and connect to the one of your choosing:

[skaram@server2 ~]$ screen -list
There are screens on:
	10913.pts-2.server2	(Detached)
	10883.pts-2.server2	(Detached)
	9700.pts-2.server2	(Detached)
3 Sockets in /var/run/screen/S-skaram.

[skaram@server2 ~]$ screen -rx 10883.pts-2.server2

But wait, there’s more! If you and a buddy, coworker, consultant, engineer, parole officer, etc. want to share the same screen, you can login as the same user via SSH and then both type screen -rx. You will both join the multiplexed session (screen) and be able to type and see what each other is typing. It is awesome for following along, mentoring, etc.

Conclusion

Good luck with these, and don’t do anything dangerous on production. If you have any of your own tips to share, feel free to leave a comment!

The post 5 quick and dirty Linux tips you may not know appeared first on Steve Karam :: The Oracle Alchemist.

blobFS – Imported an Oracle dump from a BLOB using a FUSE filesystem

Wed, 2013-02-13 10:33
binary

For those who have not been following, I reported previously that I had created a query-based filesystem with node.js, fuse4js, FUSE, and Oracle. This filesystem is mountable by any OS user and uses queries against Oracle tables as its source instead of disk. In this specific case, a table containing filenames and BLOBs was used as the source; the FUSE filesystem then presented the BLOB contents as a file to the OS. No actual files were written.

Curiosity led me to try a new test. What if I:

  1. Created a table and populated it
  2. Exported the table with ‘exp’
  3. Dropped the new table
  4. Loaded the exp dump file into my BLOB table
  5. Mounted the BLOB table as a filesystem
  6. Imported the table from the BLOB filesystem

I executed the test this morning, and it went incredibly smoothly. The exp dump file was easily stored as a BLOB, and ‘imp’ was able to read the file without any issue when it was translated to filesystem from the select query.

Set It Up

Table Creation

SQL> create table test_import as select * from dba_users;

Table created.

SQL> select count(*) from test_import;

   COUNT(*)
----------
        13

Exporting the Table

steve@UbuntuVM:/var/www/blobFS$ exp steve/steve file=test_import.dmp tables=test_import

Export: Release 11.2.0.2.0 - Production on Wed Feb 13 09:47:47 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 
64bit Production
Export done in US7ASCII character set and AL16UTF16 NCHAR character set
server uses AL32UTF8 character set (possible charset conversion)

About to export specified tables via Conventional Path ...
. . exporting table                    TEST_IMPORT         13 rows exported
Export terminated successfully without warnings.

Load the Dumpfile Into BLOBTAB then Drop the Exported Table

steve@UbuntuVM:/var/www/blobFS$ php loadfile.php test_import.dmp

SQL> select * from blobtab;

FILENAME
--------------------------------------------------------------------------------
FILECONTENT
--------------------------------------------------------------------------------
test_import.dmp
0300014558504F52543A5631312E30322E30300A4453544556450A525441424C45530A383139320A
300A37320A300A0001036907D000010000000000000000000F002020202020202020202020202020

SQL> drop table test_import;

Table dropped.

SQL> select count(*) from test_import;
select count(*) from test_import
                      *
ERROR at line 1:
ORA-00942: table or view does not exist

Enter blobFS

Starting the Filesystem

steve@UbuntuVM:/var/www/blobFS$ node jsonFS.js mnt
Mount point: mnt
File system started at mnt
To stop it, type this in another shell: fusermount -u mnt

steve@UbuntuVM:/var/www/blobFS$ cd mnt
steve@UbuntuVM:/var/www/blobFS/mnt$ ls -ltr
total 0
-rwxr-xr-x 0 steve steve 21848 Dec 31  1969 test_import.dmp

Let’s Import!

steve@UbuntuVM:/var/www/blobFS/mnt$ imp steve/steve file=test_import.dmp fromuser=steve touser=steve

Import: Release 11.2.0.2.0 - Production on Wed Feb 13 09:50:31 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights 
reserved.

Connected to: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 
64bit Production

Export file created by EXPORT:V11.02.00 via conventional path
import done in US7ASCII character set and AL16UTF16 NCHAR character set
import server uses AL32UTF8 character set (possible charset conversion)
. importing STEVE's objects into STEVE
. . importing table                  "TEST_IMPORT"         13 rows imported
Import terminated successfully without warnings.

The Moment of Truth

SQL> select count(*) from test_import;

   COUNT(*)
----------
        13

Conclusion

Even though this test is not much different from my last one in that I loaded files into BLOBs then mounted the FUSE filesystem to manifest them as files to the OS, but it’s awesome that I was able to use it for an Oracle-specific tool. Very meta, as they say.

I wonder if I can run an Oracle instance purely from BLOB-stored datafiles? Hmmm…

Update

Kevin Closson provided some links from his own blog which detail DBFS, Oracle’s implementation of FUSE (Note, I’d like to claim I did it before Oracle). Thanks for the great information, Kevin.

Also adding a test done by Ronald Rood, a full RW database created inside DBFS. Now I definitely want to get this working in my homebrew version.

The post blobFS – Imported an Oracle dump from a BLOB using a FUSE filesystem appeared first on Steve Karam :: The Oracle Alchemist.

Replication Platform Comparison

Tue, 2013-02-12 16:18
1360706706.jpg

Based on my experience.

Hope this helps.

The post Replication Platform Comparison appeared first on Steve Karam :: The Oracle Alchemist.

What’s the diff?

Tue, 2013-02-12 14:02
statistics

If you’re a DBA on Enterprise Edition with the Diagnostics Pack and you don’t use the Automatic Workload Repository (AWR) reports, you are seriously missing out. AWR is the holy grail of snapshot-based metrics at the instance level. There are a ton of resources out there covering AWR, so I won’t be getting into much detail about standard run-of-the-mill AWR reports. Suffice it to say that:

  • AWR reports are an outstanding starting point for delving into bottlenecks during a single snapshot window
  • They provide metrics for all major components of the Oracle instance
  • Database level metrics are also included for I/O times
  • Query details and statistics are also included for the snapshot window being viewed

Sometimes it is enough just to look at the DB Time and Top 5 Timed Events (remember, DB Time / Elapsed Time = Average Active Sessions, a great indicator of load for the snapshot window). Other times, you have to delve a little deeper to get a good view of what the instance/database were up to for the duration of the snapshot. And sometimes, that’s not enough; what about when you need to compare it to another time period?

I can’t tell you how many times I’ve walked into a customer’s meeting room to find stacks of AWR reports side by side, ready for comparison. While that is definitely helpful (and sometimes necessary for a long running issue that is difficult to trend), it is often forgotten that Oracle has a built in tool right in the rdbms/admin directory which can compare snapshot windows for you: the AWR Diff Report (awrddrpt). Like AWR, it can be run via PL/SQL or from the OS command line. I used the latter case with:

SQL> @?/rdbms/admin/awrddrpt

Like AWR, you have to choose a snapshot window, except you do it twice. You pick a first begin/end snapshot pair, then a second begin/end snapshot pair. For instance, you could compare last Tuesday from 7-9AM against this Tuesday from 7-9AM.

Specify the First Pair of Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 13747
First Begin Snapshot Id specified: 13747

Enter value for end_snap: 13749
First End Snapshot Id specified: 13749

Specify the Second Pair of Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap2: 13915
Second Begin Snapshot Id specified: 13915

Enter value for end_snap2: 13917
Second End Snapshot Id specified: 13917

Once you have generated the report (you can choose between text and html like a standard AWR report). The final result is formatted like an AWR report (though much wider), with the same major sections. However, each section will report on the 1st time window, the 2nd time window, the metric delta, and the % difference. You can truly say that your I/O was x% higher or lower this week than last week, or that a wait event had double the impact. For instance, here is an image comparing the top 5 timed events:

Screen Shot 2013-02-12 at 1.53.54 PM

Note that it reads like a combination AWR and diff report; items present in one time window but not in the other are shown, and items present in both time windows have deltas for comparison. SQL Metrics have the same capability, so you can compare executions, time, and other key metrics:

Screen Shot 2013-02-12 at 1.55.20 PM

You can also compare I/O metrics:

Screen Shot 2013-02-12 at 1.56.59 PM

And for the “what changed?” crowd, compare initialization parameters:

Screen Shot 2013-02-12 at 1.58.41 PM

You can see an example AWR diff report here. Note that object names have been changed and query text removed to protect the innocent.

If you are not using AWR diff reports, I highly suggest you make your terminal a little wider and give them a shot. For viewing them, I’d definitely recommend HTML or Notepad with word wrapping turned off.

The post What’s the diff? appeared first on Steve Karam :: The Oracle Alchemist.

A BLOB Filesystem built with FUSE, node.js, and fuse4js

Mon, 2013-02-11 05:45
Screen Shot 2013-02-10 at 5.28.01 PM

Here’s one for the “neato” category. This weekend I decided to try making a filesystem with FUSE (Filesystem in Userspace), a package which allows mountable filesystems to be customized and modified with a source other than disk. The data can come from anywhere, and it will take the traditional appearance of directories and files on the system where it is mounted.

So for this one, I decided to make a quick filesystem which uses an Oracle table containing BLOB data as its source. While FUSE has many language bindings (C, C++, Java, Python, Ruby, etc) I wanted to go with something a little more trendy, so I decided on JavaScript.

Proof of emptiness:

steve@UbuntuVM:/var/www/blobFS$ cd mnt
steve@UbuntuVM:/var/www/blobFS/mnt$ ls -ltr
total 0

The BLOB table (filename, filecontent):

SQL> select * from blobtab;

FILENAME
--------------------------------------------------------------------------------
FILECONTENT
--------------------------------------------------------------------------------
pip.jpg
FFD8FFE000104A46494600010100000100010000FFFE003B43524541544F523A2067642D6A706567
2076312E3020287573696E6720494A47204A50454720763632292C207175616C697479203D203930

40.jpg
FFD8FFE000104A46494600010100000100010000FFDB004300030202020202030202020303030304
060404040404080606050609080A0A090809090A0C0F0C0A0B0E0B09090D110D0E0F101011100A0C

wow.docx

FILENAME
--------------------------------------------------------------------------------
FILECONTENT
--------------------------------------------------------------------------------
504B030414000600080000002100E95110B08D010000C2050000130008025B436F6E74656E745F54
797065735D2E786D6C20A2040228A000020000000000000000000000000000000000000000000000

Mounting the Filesystem with node:

steve@UbuntuVM:/var/www/blobFS$ node jsonFS.js mnt/
Mount point: mnt/
File system started at mnt/
To stop it, type this in another shell: fusermount -u mnt/

And there they are (wow those are old)!

steve@UbuntuVM:/var/www/blobFS$ cd mnt
steve@UbuntuVM:/var/www/blobFS/mnt$ ls -ltr
total 0
-rwxr-xr-x 0 steve steve  77584 Dec 31  1969 wow.docx
-rwxr-xr-x 0 steve steve 132528 Dec 31  1969 pip.jpg
-rwxr-xr-x 0 steve steve  57376 Dec 31  1969 40.jpg

Now let’s make sure they are readable:

Thanks PIP-Boy!

Thanks PIP-Boy!

Sweet! A few cool things about this implementation:

  • You can use commands like ‘cp’ to copy the file out of the FUSE mounted location and onto a normal disk based filesystem for quick BLOB offloading.
  • Filesystems can be mounted by a non-root user.
  • It uses node.js for mounting the filesystem and retrieves JSON from a PHP server-side component that queries the database, meaning this filesystem can be mounted using remote data.
  • Except for the small amount of C++ changes I had to make to fuse4js, the bulk of the programming is in Javascript which makes it fairly easy for developers.
  • You can use any node.js modules to extend it, or even jQuery or other frameworks.

In order to make this madness a few components were required. The first main one was an Ubuntu Server 12.10 Quantal Quetzal VM with Apache 2, PHP5 (with OCI8), and Oracle XE 11.2.0.2 installed. I also installed node.js (downloading from the site seemed a little easier to work with than getting it from Ubuntu’s apt repo) for server-side Javascript functions. Node.js is an awesome modular package which allows event-driven, scalable HTTP servers to be created and started using pure JavaScript. It uses a tool called npm to download and compile new modules which can include anything from client requests, jQuery for easy AJAX calls, filesystem modification, socket management, and tons more. In my case, I used fuse4js, a github project from VMWare Labs which I was able to modify and expand a bit.

Overall Components:

  • Oracle 11.2.0.2 XE
  • PHP5 w/ OCI8
    • PHP script called blobFSServer.php – queries the BLOB table in Oracle and emits a base64 encoded JSON array with files and content
  • node.js with fuse4js and request modules installed via npm
    • Modified the fuse4js code a bit for arguments
    • Expanded on the jsonFS.js example to use HTTP requests to the PHP server for pulling JSON from BLOB data, modified arguments and some basic parsing code

Now that the basics are done (mounting, listing, reading) I plan to add the ability to cp/mv a binary file into the FUSE mount to write the BLOB data into the database. I would also like to separate out the JSON used for listing/getattr (just filenames and attributes) and actual file content (BLOB data) since returning a huge JSON array with all file information and content is just not going to work with bigger data/more rows. One caveat of FUSE is that it has strict user control at the OS level by default; while I can set the USER_ALLOW_OTHER flag in /etc/fuse.conf, I still have to find where in fuse4js I can set the ALLOW_OTHER flag so Apache doesn’t have to run as the same user that mounted the filesystem. Once all of that is wrapped up, I would like to incorporate subdirectories, better file management, and actual generic code (there’s a lot of hard coding in my prototype thus far). If people are interested, I’d consider making a github project or something for this.

All in all, FUSE is a really awesome piece of kit. The filesystem is low level enough that it plays a part in everything a developer, DBA, sysadmin, etc. does and being able to source that component from high level hierarchies/content stores (databases, web pages, big data, etc) has fascinating possibilities. From an Oracle database, a few ideas (other than BLOBs) might be XML files of table data generated from DBMS_XMLGEN (particularly if it queries every time the files are read), a directory/file structure built from the data dictionary, or a /proc like filesystem with low level data pulled from V$ and X$ views.

The post A BLOB Filesystem built with FUSE, node.js, and fuse4js appeared first on Steve Karam :: The Oracle Alchemist.

Just how big is your data?

Sat, 2013-02-09 14:21
binary For the want of a nail the shoe was lost, For the want of a shoe the horse was lost, For the want of a horse the rider was lost, For the want of a rider the battle was lost, For the want of a battle the kingdom was lost, And all for the want of a horseshoe-nail. – Benjamin Franklin

A while back (2007 to be exact, an eternity in Internet years), Google released a product called Google 411. You could call either 1-800-GOOG-411 or 1-877-GOOG-411 and search for businesses by city and state, category, or other criteria. It was a direct competitor to the local expensive 411 services, and it was completely free.

I remember at the time a lot of people I worked with scoffed at Google. It can’t be monetized, they said. Just like people used to say about Google’s 95% blank home page. What could they possibly have to gain from launching a free 411 phone service, obviously at some expense?

The answer is of course data. Google 411 has been shut down since 2010, but while it was around Google was able to harvest millions of examples of dialects, speech patterns, and vocal nuances. By launching a free service to be used and enjoyed by everyone, they were harvesting valuable data into a phoneme database that would play a key role in audio indexing.

To me, this is a fine example of how big data knows no bounds. For some companies big data isn’t just about the data they feel they may need, but the data they may never need. It’s collecting both the signal and the noise because even the noise is valuable. Even small inconsequential bits of log data could one day save your company as Ben Franklin’s horseshoe-nail saves the kingdom. And you’ll never know unless you collect it.

Think of it this way: every time a log file ages out, or you delete/truncate a table, every time a single bit is wiped away a piece of information is lost. Who knows what that information may have taught us once combined and visualized? Lost information is lost measurement returning whatever details it described to an unknown state. And what is unknown to your data is unknown to your company. Data is corporate memory. Even at the very least, storing and tracking data could help prepare your company for an uncertain future.

So should I just save everything?

Well no, that would be crazy. If you haven’t noticed I get a little overzealous when it comes to the possibilities of mass storage and correlation of data. But that’s on a futuristic and unrealistic level. At the corporate level storing every single bit of user, server, network, customer, and business data and retaining it ad infinitum could easily number in the petabytes. Good luck with that MapReduce at realistic prices. Author’s Note: if you’re reading this and you work at Facebook, great job on those petabytes.

Space is big. You just won’t believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it’s a long way down the road to the chemist’s, but that’s just peanuts to space. – Douglas Adams, Hitchhiker’s Guide to the Galaxy

It’s called Big Data, not Biggest Data. What is big data to your company might be peanuts to someone else (like Google and their phonobase). If you decide a foray into big data is worth it to your company and if you can assemble the team, the tools, and the information necessary to get it going the decisions on how far you take it are up to you. You get to pick what to keep and what to discard, how long to keep it, and what you hope to learn from it. If you’re not getting the results you hoped for then you can broaden your horizons or hunker down and keep storing data.

With the awesome compute power of software/hardware like Hadoop, Hive, HBase, MongoDB, Exadata, Teradata, Netezza, GreenPlum, etc, and the ‘resurgence’ of data visualization and statistical computing through commercialization the beauty of Big Data is that it can be as small or big as you want it to be.

The post Just how big is your data? appeared first on Steve Karam :: The Oracle Alchemist.

%The fabulous YAML

Thu, 2013-02-07 12:55
Screen Shot 2013-02-07 at 1.56.10 PM
					
				-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">
---

The Problem: Moving data outside of a database offers multiple options, 
             each with their own pros and cons.

The Solutions:

  XML (Extensible Markup Language): 
    What it is: A markup language designed for full featured 
                data exchange
    Info: http://en.wikibooks.org/wiki/XML_-_Managing_Data_Exchange
    Provides:
    - a schema to provide structure and definitions for your document
    - markup to describe your content
    - broad acceptance in the business and development communities

  JSON (JavaScript Object Notation): 
    What it is: A more human readable serialized data interchange 
                format used originally for JavaScript
    Info: http://www.json.org/
    Provides:
    - easy to read key/value pairing
    - native datatypes with schema validation
    - excellent compatibility with major development platforms
    - strong use in AJAX development

  CSV (Comma Separated Values): 
    What it is: Ol' Faithful in data interchange, not very standardized 
                or descriptive, but still useful
    Info: http://en.wikipedia.org/wiki/Comma-separated_values
    Provides:
    - a quick format for loading structured data in delimited or fixed 
      format
    - comfort for those who don't know or have the inclination to learn a 
      more modern format
    - broad compatibility with nearly any language or database environment 
      natively

# Which brings us to YAML

YAML: 
  Info: http://www.yaml.org/
  What it is: Data serialization that is super easy to read and write, first
              proposed in 2001 by Clark Evans.
  What it ain't: YAML Ain't Markup Language # get it?
  Description: 
    YAML is actually designed to be very close to JSON; in fact, every JSON
    document is a valid YAML document (but not the other way around). The big
    difference is readability. YAML is focused on being extremely human
    readable. In fact as you've probably guessed, this blog post is formatted
    in YAML. 

    It is important to remember that YAML is really incomparable to XML.
    While both of them can be used as data interchange formats, the purposes
    are fundamentally different. Whereas XML is all about defining self
    describing data through markup and providing values for that data in
    a variety of ways, YAML is purely focused on serializing data in a
    readable and parseable format.

    And I have to say, YAML just squeezes by on being a parseable format.
    Combinations of characters and indentation determine the type of node
    each document line is (though multiline content is possible, as is
    the case in this text). In the end, the only true types of nodes which
    exist in YAML are Collections and Scalars. A collection can be either a
    sequence of data or a key/value pair. Scalars are any type of integer,
    string, date, and so on. Very few rules are enforced when it comes to
    the actual content; however, YAML is very particular about indentation
    (of course), and that each node of a collection must be unique. This
    is a bonus for relational table loading/unloading, as primary key
    integrity is guaranteed in a given collection of mappings. It also
    forces more organized and more understandable data for human
    reading.

    Code exists for Python, PHP, C++, Ruby, perl, JavaScript, and more.

    To learn more about YAML, check out the spec and play around with it
    a bit. You can even find tools like a YAML Parser online, which can
    convert YAML to JSON. In fact, you can copy the contents of this blog
    post (dashes to dots) straight into it to see the result.

    You can also check out a quick usage with PHP that I made using this
    blog text by clicking here.

...

The post %The fabulous YAML appeared first on Steve Karam :: The Oracle Alchemist.