Skip navigation.

Feed aggregator

The Long Goodbye

Floyd Teter - Wed, 2015-05-20 10:43
I'm in the process of saying a long goodbye to my iPad 2...and tablets in general, at least for now.  I'm losing my tablet for a couple of reasons.

First, I've pushed the limits of my iPad's usability.  I really have...for a couple of years.  But I just don't find myself using it for any kind of serious work.  My MacBook Air has really taken over for both premise-based and cloud-based work.  Even with a keyboard on the tablet, I find myself much more productive on the Air.

Second, I'm finally upgrading my iPhone 5 next month.  The most likely upgrade candidate is the iPhone 6 Plus.  Yes, the Samsung Droid offerings look really great - even better in many respects.  But I'm invested in the Apple platform and there's not enough extra benefit in other platforms to justify a transition in my mind.  So it's likely the 6 Plus...with a very large screen.  And I see the 6 Plus taking over the few functions I now perform with my iPad.  So the iPad would become just another weight to lug through the many airports I visit every year.  I'm actually looking forward to cutting back on the amount of tech gear carried when I eliminate the iPad.

In all fairness, I think I would have come to this conclusion with any tablet.  I still consider the iPad to be the best of the bunch.  I have just come to the conclusion that the tablet format does not provide enough value for me...at least, not right now.

What's to become of the iPad?  It may become a third screen in my home office.  Or it may become a media viewer on a planned treadmill.  I mean, it's nice and all...but, at least for me, I've discovered the iPad concept to be much more of a nice to have than a got to have.  Cool but not necessary.

What about you?  Different view or experience?  Sound off in the comments.

CALL FOR NOMINATIONS – 2015 Oracle Excellence Award: Sustainability Innovation

Linda Fishman Hoyle - Wed, 2015-05-20 10:40

Is your organization using an Oracle product to help with a sustainability initiative while reducing costs? Saving energy? Saving gas? Saving paper?

For example, you may use Oracle’s Agile Product Lifecycle Management to design more eco-friendly products, Oracle Cloud Solutions to help drive down power consumption, Oracle Transportation Management to reduce fleet emissions, Oracle Exadata Database Machine to decrease power and cooling needs while increasing database performance, Oracle Environmental Accounting and Reporting to measure environmental impacts, or one of many other Oracle products.

Your organization may be eligible for the 2015 Oracle Excellence Award: Sustainability Innovation.

Submit a nomination form located here by Friday June 19 if your company is using any Oracle product to take an environmental lead as well as to reduce costs and improve business efficiencies using green business practices. These awards will be presented during Oracle OpenWorld 2015 (October 25-29) in San Francisco.

About the Award

  • Winners will be selected from the customer and/or partner nominations. Either a customer, their partner, or an Oracle representative can submit the nomination form on behalf of the customer.
  • Winners will be selected based on the extent of the environmental impact they have had as well as the business efficiencies they have achieved through their combined use of Oracle products.

Nomination Eligibility

  • Your company uses at least one component of Oracle products, whether it's the Oracle database, business applications, Fusion Middleware, or Oracle Sun servers/storage.  
  • This solution should be in production or in active development.
  • Nomination deadline:  Friday June 19, 2015.

Benefits to Award Winners

  • Award presented to winners during Oracle OpenWorld by Jeff Henley, Oracle Executive Vice Chairman of the Board  
  • 2015 Oracle Sustainability Innovation Award logo for inclusion on your own website and/or other marketing materials
  • Possible placement in Oracle Profit Magazine and/or Oracle Magazine

See last year's winners here.

Questions? Send an email to:

Follow Oracle’s Sustainability Solutions on Twitter, LinkedIn, YouTube, and the Sustainability Matters blog.

Flipkart and Focus 3 - There’s Something (Profitable) About Your Privacy

Abhinav Agarwal - Wed, 2015-05-20 09:45
The third in my series on Flipkart and focus appeared in DNA on April 18th, 2015.

Part III – There’s Something (Profitable) About Your Privacy
Why do so many companies hanker after apps? Smartphone apps, tablet apps, iOS apps, Android apps, app-this, app-that….
Leave aside for a moment the techno-pubescent excitement that accompanies the launch of every new technology (if you are not old enough to remember words like “client-server[1]”, then “soa[2]” will surely sound familiar enough). Every Marketing 101 course drills into its students that acquiring a new customer is way costlier than retaining an existing one. Loyal customers (leaving aside the pejorative connotation the word “loyal” carries, implying that customers who shop elsewhere for a better deal are of dubious moral character) are what you should aspire to – customers that keep buying from you for a longer period of time[3] – which allows you to refocus your marketing and advertising dollars towards the acquisition of newer customers, faster. If you spend less on unnecessary discounts and expensive retention schemes, then margins from existing customers are automatically higher.

Customers can stay loyal if you can build a bond of affinity with them. You should aspire to be more like the local kirana owner (only infinitely richer), who in a perfect world knew everything about you – your likes, dislikes, which festivals you celebrated, and therefore which sweets you would buy, when your relatives came over to stay and what their likes were, what exotic food items you wanted, and so on. And who knew your name. Hence the marketer’s love for loyalty programs[4], no matter that customer loyalty is notoriously difficult to guarantee[5].

In the world of online retailing (actually, it applies just as well to any kind of retailing), how do you get to acquire a deep level of intimacy with your customer? Smartphone apps provide this degree of intimacy that desktop / laptop browsers cannot. This is by simple virtue of the fact that the smartphone travels with the user, the user is constantly logged on to the app, and the app knows where you go and where you are. So no wonder that in December 2011, Amazon offered a “brazen[6]” deal to its customers in brick-and-mortar stores to do an “in-store” price-check of items using the Amazon Price Check app[7], and if the same product was available on Amazon, get it at a discount off the store’s price. Though termed “not a very good deal[8]”, it nonetheless angered[9] the Retail Industry Leaders Association, and elsewhere was described as “Evil But It's the Future[10]”. The combination of availability – the app was installed on the smartphone that was with the user – and the integrated capabilities in the device – a camera that fed into a barcode scanner app – made this possible. The appeal of apps is undeniable.

The magical answer is – “app”. Your best-thing-since-sliced-bread app is installed on the customer’s smartphone (or tablet or phablet), is always running (even when it is not supposed to be running), knows everyone in your contacts (from your proctologist to the illegal cricket bookie), can hear what you speak (even your TV can do this now[11]), knows where you are, who you call, what text messages you send and receive, knows what other apps you have installed on your smartphone (presumably so it can see how potentially disloyal you could be), knows which Wi-Fi networks you connect to, can access the photos and videos you have taken (naughty!), and so on and so forth. All this the better to hear you with, the better to see you with, and ultimately the better to eat you (your wallet) with – with due apologies to Little Red Riding Hood[12]. You may want to take a closer look at the permissions your favorite app wants when you install it – like Amazon India[13], eBay[14], Flipkart[15], Freecharge[16], HomeShop18[17], Jabong[18], MakeMyTrip[19], Myntra[20], SnapDeal[21]. Great minds do seem to think alike, don’t they?

[Technical aside: I covered the red herrings thrown in favour of apps in the first part, but here is some more… You can store more data, more effectively, and process that data better using an app than you can with a plain browser-based approach. True. But not quite. The ever-evolving world of HTML5 (the standard that underpins how information is structured and presented on the web) has progressed to make both these points moot – with offline storage[22] and local SQL database support[23]. Yes, there are arguments to be made about handling large amounts of data offline with browser-based mechanisms, but these are for the most part edge-cases. To be fair, there are some high-profile cases of companies switching to native apps after experimenting with HTML5-based apps (hybrid apps that wrapped a browser-based UI with a native shell), like LinkedIn[24] and Facebook[25]. The appeal of apps therefore is undeniable. But, as I argued earlier, the appeal of apps does not negate the utility of browser-based interfaces.]

What is all this useful for? Your app now knows that Ram, Shyam, and Laxman in your contacts have birthdays coming up, and it can suggest an appropriate gift for them. Convenient, isn’t it? While driving to work, you can simply tell your app – speak out the commands – to search for the latest perfume that was launched last week and to have it gift wrapped and delivered to your wife. The app already has your credit card details, and it knows your address. Your app knows that you are going on a vacation next week (because it can access your calendar, your SMS-es, and perhaps even your email) to Sikkim; it helpfully suggests a wonderful travel book and some warm clothing that you may need. The imagined benefits are immense.

But there is a distinctly dark side to apps – as it relates to privacy – that should be a bigger cause for concern for customers and smartphone users alike. Three sets of examples should suffice.

You get a flyer from your favourite brick-and-mortar store, letting you know that you can buy those items that your pregnant daughter will need in the coming weeks. You head over to the store, furious – because your daughter is most certainly not pregnant. Later you find out that she is, and that the store hadn’t made a mistake. It turns out the truth is a little subtler than that[26], and a little more sedate than what tabloid-ish coverage - with headlines like “How Companies Learn Your Secrets[27]” - made it out to be (the original presentation made at the PAW Conference is also available online[28]).

There are enough real dangers in this world without making it easier to use technology to make it even more unsafe. Considering how unsafe[29] air travel can be for women[30] and even girls[31], one has to question the wisdom of making it even[32] more so[33]. If this does not creep you out, then perhaps the Tinder app – which uses your location and “displays a pile of snapshots of potential dates in a user’s immediate area”[34], to as close as within 100 feet[35] - may give you pause for thought.

Do apps need all the permissions they ask for? No. But, … no! Would they work if they didn’t have all those permissions? 99% of the time, yes – they would work without a problem. For example, an app would need to access your camera if you wanted to scan a barcode to look up a product. The app would need access to your microphone if you wanted to speak out your query rather than type it in the app. What if you don’t particularly care about pointing your camera at the back of books to scan their barcodes, or speaking like Captain Kirk into your phone? Sorry, you are out of luck. You cannot selectively choose not to grant certain privileges to an app – at least on a device running the Android mobile operating system. In other words, it is a take-it-or-leave-it world, where the app developer is in control. Not you. And wanting to know your location? Even if you are a dating app, it’s still creepy.

But surely app makers will ask you before slurping your very personal, very private information to its servers in the cloud? Yes, of course – you believe that to be true, especially if you are still in kindergarten.

A few weeks before its IPO[36], JustDial’s app was removed from the Google Play Store[37]. It was alleged that the updated version of the JustDial app had “started retrieving and storing the user’s entire phone book, without a warning or disclaimer. [38],[39]” Thereafter, JustDial’s mobile “Terms and Conditions” were updated to include the following line: “You hereby give your express consent to Justdial to access your contact list and/or address book for mobile phone numbers in order to provide and use the Service.[40]”

In 2013, US-based social networking app Path was caught as it “secretly copied all its users’ iPhone address books to its private servers.”[41] Action was swift. The FTC investigated and reached a settlement with Path, which required “Path, Inc. to establish a comprehensive privacy program and to obtain independent privacy assessments every other year for the next 20 years. The company also will pay $800,000 to settle charges that it illegally collected personal information from children without their parents’ consent.”[42] In the US, a person’s address book “is protected under the First Amendment[43].” When the controversy erupted, it was also reported that “A person’s contacts are so sensitive that Alec Ross, a senior adviser on innovation to Secretary of State Hillary Rodham Clinton, said the State Department was supporting the development of an application that would act as a “panic button” on a smartphone, enabling people to erase all contacts with one click if they are arrested during a protest[44].” Of course, politics is not without its de-rigueur dose of irony. That dose was delivered in 2015 when it emerged that Hillary Clinton had maintained a private email account even as she was Secretary of State in the Barack Obama presidency and refused to turn over those emails[45].

So what happened to JustDial for allegedly breaching its users’ privacy? Nothing. No investigation. No fine. No settlement. No admission. No mea culpa. In short, nothing. It was business as usual.

Apps can be incredibly liberating in eliminating friction from the buying process. But hitching your strategy to an app-only world is needless. It is an expensive choice – from many, many perspectives, not just monetary. The biggest cost is that of looking immature should you have to reverse direction. As a case in point, consider the entirely avoidable brouhaha over Flipkart, Airtel, and Net Neutrality[46]. In this battle, no one came out smelling like roses, least of all Flipkart, which attracted mostly negative attention[47] from the ill-advised step, notwithstanding after-the-fact attempts to bolt the stable door[48].

Let me end with an analogy. The trackpad on your laptop is very, very useful. Do you then disable the use of an externally connected mouse?

Disclaimer: views expressed are personal.

[1] "Computerworld - Google Books",
[2] "SOA: Hype vs. Reality - Datamation",
[3] "How Valuable Are Your Customers? - HBR",
[4] "Loyalty programmes: Are points that consumers stockpile juicy enough to keep them coming back? - timesofindia-economictimes",
[5] "What Loyalty? High-End Customers are First to Flee — HBS Working Knowledge",
[6] "Amazon's Price Check App Undercuts Brick-and-Mortar Stores Prices |",
[7] " Help: About the Amazon Price Check App",
[8] "Amazon pushing Price Check app with controversial online discounts | The Verge",
[9] "Retail association pissed about's Price Check app - GeekWire",
[10] "Amazon Price Check May Be Evil But It's the Future - Forbes",
[11] "Samsung smart TV issues personal privacy warning - BBC News",
[12] "Little Red Riding Hood - Wikipedia, the free encyclopedia",
[22] "Web Storage",
[23] "Offline Web Applications",
[24] "Why LinkedIn dumped HTML5 & went native for its mobile apps | VentureBeat | Dev | by J. O'Dell",
[25] "Mark Zuckerberg: Our Biggest Mistake Was Betting Too Much On HTML5 | TechCrunch",
[26] "Did Target Really Predict a Teen’s Pregnancy? The Inside Story",
[27] "How Companies Learn Your Secrets -",
[28] "Predictive Analytics World Conference: Agenda - October, 2010",
[29] "Federal judge upholds verdict that North Bergen man molested woman on flight ‹ Cliffview Pilot",
[30] "Man accused of groping woman on flight to Newark - NY Daily News",
[31] "Man jailed for molesting girl, 12, on flight to Dubai | The National",
[32] "Virgin is Going to Turn Your Flight Into a Creepy Bar You Can't Leave",
[33] "KLM Introduces A New Way To Be Creepy On An Airplane - Business Insider",
[34] "Tinder Dating App Users Are Playing With Privacy Fire - Forbes",
[35] "Include Security Blog | As the ROT13 turns….: How I was able to track the location of any Tinder user.",
[36], accessed April 11, 2015
[37] "Updated: JustDial App Pulled From Google Play Store; Privacy Concerns? - MediaNama",
[38] "Updated: JustDial App Pulled From Google Play Store; Privacy Concerns? - MediaNama",
[39] "Bad App Reviews for Justdial JD",, accessed April 09, 2015
[40] "Terms Of Use”,, accessed April 09, 2015
[41] "The Path Fiasco Wasn't A Privacy Breach, It Was A Data Ownership Breach - The Cloud to Cloud Backup Blog",
[42] "Path Social Networking App Settles FTC Charges it Deceived Consumers and Improperly Collected Personal Information from Users' Mobile Address Books | Federal Trade Commission",
[43] "Anger for Path Social Network After Privacy Breach -",
[44] Ibid.
[45] "Hillary Clinton deleted 32,000 'private' emails, refuses to turn over server - Washington Times",
[46] "Flipkart Pulls Out of Airtel Deal Amid Backlash Over Net Neutrality",
[47] "Flipkart's stand on net neutrality - The Hindu",

[48] "Our Internet is headed in the right direction: Amod Malviya - Livemint",

© 2015, Abhinav Agarwal (अभिनव अग्रवाल). All rights reserved.

Another Take on Maker Faire 2015

Oracle AppsLab - Wed, 2015-05-20 09:05

Editor’s note: Here’s another Maker Faire 2015 post, this one from Raymond. Check out Mark’s (@mvilrokx) recap too for AppsLab completeness.

I went to the Maker Faire 2015 Bay Area show over the weekend. A lot of similarity to last year, but a few new things.

In place of our spot from last year were HP Sprout demo stations; I guess HP is the main sponsor this year.


Sprout is an acquisition by HP: a large touchpad and projector built as an attachment to an HP computer. It is a kind of combination of projector, extended screen, touch screen, and working pad – one that blends physical things with virtual computer objects, such as capturing physical objects into 3D graphics.

TechHive’s Mole-A-Whack is quite a good station too – it is a reverse of the classic Whack-A-Mole.


Here’s a video of it in action:

They use an Arduino-controlled mole to whack the kids, who hide in the mole holes but need to raise their heads out of the hole covers (which are Arduino-monitored) and reach out to push a button (MaKey MaKey-connected) to earn points.

The signals go into a Scratch program on a computer, which tallies the winner.

This pipe organ is an impressive build:


As usual, lots of 3D printers, CNC mills, etc. and lots of drones flying.

Also, I saw many college groups attending the event this year, bringing in all kinds of small builds for various applications.

Troubleshooting ASM Proxy instance startup

Oracle in Action - Wed, 2015-05-20 08:53


Recently, I had trouble starting the ASM proxy instance on one of the nodes in my 2-node Flex Cluster, having nodes host01 and host02. As a result, I could not access the volume I had created on an ASM diskgroup. This post explains how I resolved it.

While connected to host01, I created a volume VOL1 on the DATA diskgroup with corresponding volume device /dev/asm/vol1-106.

[grid@host01 root]$ asmcmd volcreate -G DATA -s 300m VOL1

[grid@host01 root]$ asmcmd volinfo -G DATA VOL1

Diskgroup Name: DATA

Volume Name: VOL1
Volume Device: /dev/asm/vol1-106
Size (MB): 320
Resize Unit (MB): 32
Redundancy: MIRROR
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS

I created an ACFS file system on the newly created volume:

[root@host01 ~]# mkfs -t acfs /dev/asm/vol1-106

I also created corresponding mount point /mnt/acfsmounts/acfs1 on both the nodes in the cluster.

[root@host01 ~]# mkdir -p /mnt/acfsmounts/acfs1

[root@host02 ~]# mkdir -p /mnt/acfsmounts/acfs1

When I tried to mount the volume device, I could mount it on host01 but not on host02.

[root@host01 ~]# mount -t acfs /dev/asm/vol1-106 /mnt/acfsmounts/acfs1

[root@host01 ~]# mount | grep vol1

/dev/asm/vol1-106 on /mnt/acfsmounts/acfs1 type acfs (rw)

[root@host02 ~]# mount -t acfs /dev/asm/vol1-106 /mnt/acfsmounts/acfs1

mount.acfs: CLSU-00100: Operating System function: open64 failed with error data: 2
mount.acfs: CLSU-00101: Operating System error message: No such file or directory
mount.acfs: CLSU-00103: error location: OOF_1
mount.acfs: CLSU-00104: additional error information: open64 (/dev/asm/vol1-106)
mount.acfs: ACFS-02017: Failed to open volume /dev/asm/vol1-106. Verify the volume exists.

The corresponding volume device was visible on host01 but not on host02:

[root@host01 ~]# cd /dev/asm
[root@host01 asm]# ls
vol1-106

[root@host02 ~]# cd /dev/asm
[root@host02 asm]# ls

Since ADVM / ACFS utilizes an ASM Proxy instance in a flex cluster to access metadata from a local / remote ASM instance, I checked whether an ASM Proxy instance was running on both nodes and realized that, whereas it was running on host01, it was not running on host02:

[root@host01 ~]# ps -elf | grep pmon | grep APX

0 S grid 27782 1 0 78 0 – 350502 – 10:09 ? 00:00:00 apx_pmon_+APX1

[root@host02 asm]# ps -elf | grep pmon | grep APX

[root@host01 ~]# srvctl status asm -proxy

ADVM proxy is running on node host01

[root@host01 ~]# crsctl stat res ora.proxy_advm -t
Name Target State Server State details
Local Resources

I tried to start the ASM proxy instance manually on host02:

[grid@host02 ~]$ . oraenv
ORACLE_SID = [grid] ? +APX2
The Oracle base has been set to /u01/app/grid

[grid@host02 ~]$ sqlplus / as sysasm

SQL*Plus: Release Production on Sat May 2 10:31:45 2015

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup

ORA-00099: warning: no parameter file specified for ASMPROXY instance
ORA-00443: background process "VUBG" did not start

SQL> ho oerr ORA 00443

00443, 00000, "background process \"%s\" did not start"
// *Cause: The specified process did not start.
// *Action: Ensure that the executable image is in the correct place with
// the correct protections, and that there is enough memory.

I checked the memory allocated to the VM for host02 – it was 1.5 GB, as against the 2.5 GB assigned to the VM for host01. I increased the memory of host02 to 2.5 GB, and the ASM proxy instance started automatically.

[root@host01 ~]# crsctl stat res ora.proxy_advm -t
Name Target State Server State details
Local Resources

Hope it helps!


Oracle documentation


Related Links:


12c RAC Index

12c RAC: ORA-15477: cannot communicate with the volume driver


Copyright © ORACLE IN ACTION [Troubleshooting ASM Proxy instance startup], All Right Reserved. 2015.

The post Troubleshooting ASM Proxy instance startup appeared first on ORACLE IN ACTION.

Categories: DBA Blogs

Irrecoverable full backup part II: reporting

Laurent Schneider - Wed, 2015-05-20 08:34

After my post Can you restore from a full online backup ?, I needed to come up with a report.

Assuming that each backup goes in a different directory, I just wrote two reports.

  1. Report gaps in v$backup_redolog (or rc_backup_redolog if you use the catalog)
    DIR     FIRST_CHANGE# NEXT_CHANGE#
    ------- ------------- ------------
    /bck01/        284891       285140
    /bck01/        285140       285178
    /bck02/        284891       285140
    === GAP ===
    /bck02/        285178       285245 
    /bck03/        285178       285245
    /bck03/        285245       286931
    /bck03/        286931       287803
    /bck03/        287803       288148

    This could be done with analytics, by checking where the last next_change is not the current first_change, within a directory

    SELECT dir, 
      LAG missing_from_change#, 
      first_change# missing_to_change#
    FROM (
      SELECT REGEXP_REPLACE (handle, '[^/\]+$') dir,
        first_change#,
        LAG(next_change#) OVER (
          PARTITION BY REGEXP_REPLACE (handle, '[^/\]+$')
          ORDER BY first_change#
        ) LAG
      FROM v$backup_piece p
      JOIN v$backup_redolog l 
        USING (set_stamp, set_count))
    WHERE LAG != first_change#;
    DIR     MISSING_FROM_CHANGE# MISSING_TO_CHANGE#
    ------- -------------------- ------------------
    /bck02/               285140             285178
  2. Reports directories where archivelogs don’t include changes (backup redolog) from the earliest to the latest checkpoint (backup datafile)
    SELECT
      REGEXP_REPLACE (handle, '[^/\]+$') dir,
      MIN (checkpoint_change#),
      MAX (checkpoint_change#),
      MIN (first_change#),
      MAX (next_change#)
    FROM v$backup_piece p
      LEFT JOIN v$backup_datafile f 
        USING (set_stamp, set_count)
      LEFT JOIN v$backup_redolog l 
        USING (set_stamp, set_count)
    WHERE handle IS NOT NULL
    GROUP BY REGEXP_REPLACE (handle, '[^/\]+$')
    HAVING MIN (checkpoint_change#) < MIN (first_change#)
      OR MAX (checkpoint_change#) > MAX (next_change#);
    ------- ---------- ---------- ---------- ----------
    /bck04/     954292     954299     959487    1145473

    The archives for the changes from 954292 to 959487 are missing.
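The gap check in report 1 can also be prototyped outside the database. Here is a hedged Python sketch of the same logic (the sample data is invented to mirror the listing above; this is not the production report): order the backed-up log ranges per directory, and flag every place where a range does not start at the previous range's next_change#.

```python
from itertools import groupby

def find_gaps(ranges):
    """ranges: list of (dir, first_change, next_change) tuples.
    Returns a (dir, missing_from, missing_to) tuple for every hole,
    per directory -- the same check as LAG(next_change#) != first_change#."""
    gaps = []
    for d, group in groupby(sorted(ranges), key=lambda r: r[0]):
        prev_next = None
        for _, first, nxt in group:
            if prev_next is not None and prev_next != first:
                gaps.append((d, prev_next, first))
            prev_next = nxt
    return gaps

# sample data mirroring the report listing
ranges = [
    ("/bck01/", 284891, 285140), ("/bck01/", 285140, 285178),
    ("/bck02/", 284891, 285140), ("/bck02/", 285178, 285245),
]
print(find_gaps(ranges))  # [('/bck02/', 285140, 285178)]
```

Sorting the tuples first orders each directory's ranges by first_change#, which is exactly what the ORDER BY inside the analytic window does.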

If some archive backups are missing in one directory, it does not mean the database is irrecoverable; the archive backups could be in another directory. But it does mean that this single directory alone would no longer permit you to restore or duplicate.

Another approach, with RESTORE PREVIEW, was provided by Franck in my previous post: List all RMAN backups that are needed to recover.

Usual disclaimer: there are plenty of other irrecoverability causes, from hardware defects to backup “optimization”, that are beyond the scope of this post.

Tabular Form - Add Rows Top - Universal Theme

Denes Kubicek - Wed, 2015-05-20 06:20
This old example shows how to add rows to the top of a tabular form. Unfortunately, this doesn't work with the new Universal Theme. In order to make it work, some small changes are required. See this example on how to do it using the new Universal Theme.


Categories: Development

I’m Iouri Chadour and this is how I work

Duncan Davies - Wed, 2015-05-20 06:00

May’s entry in the ‘How I Work’ series is PeopleSoft Blogger Iouri “Yury” Chadour. Yury has been sharing his knowledge on his Working Scripts blog for 7 years, so is a valuable and consistent member of our community. Yury’s site is full of tips, particularly new tools to try and techniques ‘around the edges’ of PeopleSoft.  Thanks, and keep up the good work Yury!


Name: Iouri Chadour

Occupation: Vice President at Lazard Freres
Location: In the office in midtown NYC
Current computer: At work I use either a standard Lenovo laptop or my VM client; my own machine is a Lenovo X1 Carbon
Current mobile devices: Samsung Galaxy S3, iPad Air 2, Kindle Fire (Original)
I work: best when I have a set goal in mind – I like being able to check off my achievements from the list (more on that below). As many other fellow bloggers have mentioned, challenge and the ability to learn new things on the job are very important as well.

What apps/software/tools can’t you live without?
I use all of these Software Development Tools:

Application Designer
Notepad++ with lots of plugins: a PeopleCode user-defined language, Compare, AutoSave, NppExport, and Explorer, to name a few
Firefox with Firebug, AdBlock and Hootsuite
Feedly – this my main tool for following all the blogs and keeping up to date on the news
LastPass – very convenient password management for desktop and phone
KeePass – open source password manager
Toad for Oracle 12
Oracle jDeveloper
Aptana Studio
PeopleSoft TraceMagic
Wunderlist – Android app and Desktop for Task Management
Microsoft Project or Project Libre
MS Excel
Greenshot Screen Capture
Gimp – basic image editing

Besides your phone and computer, what gadget can’t you live without?
I like my original Kindle Fire – I use it for reading more than any other device.

What’s your workspace like?

What do you listen to while you work?
Listening really depends on the mood at time of the day. I mostly use Slacker Radio to listen to everything from contemporary and classic jazz, Classical to Parisian Electro and House music.

What PeopleSoft-related productivity apps do you use?

App Designer
PeopleSoft Query Client for writing queries
Toad 12
Notepad++ to write and examine code and logs
TraceMagic for more advanced log review
Firefox with Firebug for HTML and JavaScript issues
On occasion Aptana Studio for JavaScript and HTML

Do you have a 2-line tip that some others might not know?
If I am stuck with a very difficult problem and can’t seem to find a good solution, I usually leave it and do something else – at some point the solution or the correct direction usually comes to mind on its own.

What SQL/Code do you find yourself writing most often?
I work with a lot of Financials modules, so mostly everything related to those modules. I also write some tools-related SQL when I need to examine Process Scheduler tables.

What would be the one item you’d add to PeopleSoft if you could?
Code completion and Code/Project navigator – I use Notepad++ for now.

What everyday thing are you better at than anyone else?
I do not think I do anything in particular better than anyone else, but I believe that I can be more efficient at some things than some people.

What’s the best advice you’ve ever received?
My family and my friends provided me with a lot of advice and support and I am greatly thankful for them being present in my life. But I do like the following quote:
“The more things that you read, the more things you will know. The more you learn, the more places you’ll go.” – Dr. Seuss

MemSQL 4.0

DBMS2 - Wed, 2015-05-20 03:41

I talked with my clients at MemSQL about the release of MemSQL 4.0. Let’s start with the reminders:

  • MemSQL started out as an in-memory OLTP (OnLine Transaction Processing) DBMS …
  • … but quickly positioned with “We also do ‘real-time’ analytic processing” …
  • … and backed that up by adding a flash-based column store option …
  • … before Gartner ever got around to popularizing the term HTAP (Hybrid Transaction and Analytic Processing).
  • There’s also a JSON option.

The main new aspects of MemSQL 4.0 are:

  • Geospatial indexing. This is for me the most interesting part.
  • A new optimizer and, I suppose, query planner …
  • … which in particular allow for serious distributed joins.
  • Some rather parallel-sounding connectors to Spark, Hadoop, and Amazon S3.
  • Usual-suspect stuff including:
    • More SQL coverage (I forgot to ask for details).
    • Some added or enhanced administrative/tuning/whatever tools (again, I forgot to ask for details).
    • Surely some general Bottleneck Whack-A-Mole.

There’s also a new free MemSQL “Community Edition”. MemSQL hopes you’ll experiment with this but not use it in production. And MemSQL pricing is now wholly based on RAM usage, so the column store is quasi-free from a licensing standpoint as well.

Before MemSQL 4.0, distributed joins were restricted to the easy cases:

  • Two tables are distributed (i.e. sharded) on the same key.
  • One table is small enough to be broadcast to each node.

Now arbitrary tables can be joined, with data reshuffling as needed. Notes on MemSQL 4.0 joins include:

  • Join algorithms are currently nested-loop and hash, and in “narrow cases” also merge.
  • MemSQL fondly believes that its in-memory indexes work very well for nested-loop joins.
  • The new optimizer is fully cost-based (but I didn’t get much clarity as to the cost estimators for JSON).
  • MemSQL’s indexing scheme, skip lists, had histograms anyway, with the cutesy name skiplistogram.
  • MemSQL’s queries have always been compiled, and of course have to be planned before compilation. However, there’s a little bit of plan flexibility built in based on the specific values queried for, aka “parameter-sensitive plans” or “run-time plan choosing”.
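To make the reshuffle step concrete, here is a toy Python sketch (invented for this post, not MemSQL internals; all names are hypothetical): rows from both tables are repartitioned by hashing the join key, so matching rows land on the same node, and each node then performs a local hash join.

```python
from collections import defaultdict

def reshuffle(shards, key, n_nodes):
    """Repartition rows across n_nodes by hashing the join key."""
    out = [[] for _ in range(n_nodes)]
    for shard in shards:
        for row in shard:
            out[hash(row[key]) % n_nodes].append(row)
    return out

def local_hash_join(left_rows, right_rows, key):
    """Classic hash join on one node: build an index on one side, probe with the other."""
    index = defaultdict(list)
    for r in right_rows:
        index[r[key]].append(r)
    return [{**l, **r} for l in left_rows for r in index[l[key]]]

def distributed_join(left_shards, right_shards, key, n_nodes=4):
    """Join two tables sharded on arbitrary keys by reshuffling both on the join key."""
    left = reshuffle(left_shards, key, n_nodes)
    right = reshuffle(right_shards, key, n_nodes)
    result = []
    for node in range(n_nodes):  # in a real system each node runs in parallel
        result.extend(local_hash_join(left[node], right[node], key))
    return result
```

The two "easy cases" above fall out as special cases: co-sharded tables need no reshuffle, and a broadcast join replicates the small side to every node instead.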

To understand the Spark/MemSQL connector, recall that MemSQL has “leaf” nodes, which store data, and “aggregator” nodes, which combine query results and ship them back to the requesting client. The Spark/MemSQL connector manages to skip the aggregation step, instead shipping data directly from the various MemSQL leaf nodes to a Spark cluster. In the other direction, a Spark RDD can be saved into MemSQL as a table. This is also somehow parallel, and can be configured either as a batch update or as an append; intermediate “conflict resolution” policies are possible as well.

In other connectivity notes:

  • MemSQL’s idea of a lambda architecture involves a Kafka stream, with data likely being stored twice (in Hadoop and MemSQL).
  • MemSQL likes and supports the Spark DataFrame API, and says financial trading firms are already using it.

Other application areas cited for streaming/lambda kinds of architectures are — you guessed it! — ad-tech and “anomaly detection”.

And now to the geospatial stuff. I thought I heard:

  • A “point” is actually a square region less than 1 mm per side.
  • There are on the order of 2^30 such points on the surface of the Earth.

Given that Earth’s surface area is a little over 500,000,000 square meters, I’d think 2^50 would be a better figure, but fortunately that discrepancy doesn’t matter to the rest of the discussion. (Edit: As per a comment below, that’s actually square kilometers, so unless I made further errors we’re up to the 2^70 range.)
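A quick back-of-the-envelope check of that revised figure (using rounded values for illustration):

```python
import math

earth_km2 = 5.1e8                  # Earth's surface area: ~510 million km^2
mm2_per_km2 = (10 ** 6) ** 2       # 1 km = 10^6 mm, so 1 km^2 = 10^12 mm^2
points = earth_km2 * mm2_per_km2   # number of ~1 mm x 1 mm "points"
print(round(math.log2(points)))    # prints 69, i.e. in the 2^70 range
```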

Anyhow, if the two popular alternatives for geospatial indexing are R-trees or space-filling curves, MemSQL favors the latter. (One issue MemSQL sees with R-trees is concurrency.) Notes on space-filling curves start:

  • In this context, a space-filling curve is a sequential numbering of points in a higher-dimensional space. (In MemSQL’s case, the dimension is two.)
  • Hilbert curves seem to be in vogue, including at MemSQL.
  • Nice properties of Hilbert space-filling curves include:
    • Numbers near each other always correspond to points near each other.
    • The converse is almost always true as well.*
    • If you take a sequence of numbers that is simply the set of all possibilities with a particular prefix string, that will correspond to a square region. (The shorter the prefix, the larger the square.)

*You could say it’s true except in edge cases … but then you’d deserve to be punished.
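These properties can be demonstrated with the classic bit-twiddling conversion from (x, y) to a Hilbert-curve index (the version popularised by the Wikipedia “Hilbert curve” article); this is an illustrative sketch, not MemSQL’s actual implementation.

```python
def rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve is oriented correctly."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y

def xy2d(n, x, y):
    """Map point (x, y) on an n x n grid (n a power of two) to its Hilbert index."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant, in curve order
        x, y = rot(n, x, y, rx, ry)    # re-orient for the next level down
        s //= 2
    return d
```

Consecutive indices always map to adjacent grid cells, and every aligned block of s² consecutive indices covers an s×s square – exactly the prefix property the indexing relies on.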

Given all that, my understanding of the way MemSQL indexes geospatial stuff — specifically points and polygons — is:

  • Points have numbers assigned to them by the space-filling curve; those are indexed in MemSQL’s usual way. (Skip lists.)
  • A polygon is represented by its vertices. Take the longest prefix they share. That could be used to index them (you’d retrieve a square region that includes the polygon). But actually …
  • … a polygon is covered by a union of such special square regions, and indexed accordingly, and I neglected to ask exactly how the covering set of squares was chosen.

As for company metrics — MemSQL cites >50 customers and >60 employees.

Categories: Other

Coding in PL/SQL in C style, UKOUG, OUG Ireland and more

Pete Finnigan - Wed, 2015-05-20 01:05

My favourite language is hard to pinpoint; is it C or is it PL/SQL? My first language was C and I love the elegance and expression of C. Our product PFCLScan has its main functionality written in C. The....[Read More]

Posted by Pete On 23/07/14 At 08:44 PM

Categories: Security Blogs

Integrating PFCLScan and Creating SQL Reports

Pete Finnigan - Wed, 2015-05-20 01:05

We were asked by a customer whether PFCLScan can generate SQL reports instead of the normal HTML, PDF, MS Word reports so that they could potentially scan all of the databases in their estate and then insert either high level....[Read More]

Posted by Pete On 25/06/14 At 09:41 AM

Categories: Security Blogs

Automatically Add License Protection and Obfuscation to PL/SQL

Pete Finnigan - Wed, 2015-05-20 01:05

Yesterday we released the new version 2.0 of our product PFCLObfuscate . This is a tool that allows you to automatically protect the intellectual property in your PL/SQL code (your design secrets) using obfuscation and now in version 2.0 we....[Read More]

Posted by Pete On 17/04/14 At 03:56 PM

Categories: Security Blogs

Twitter Oracle Security Open Chat Thursday 6th March

Pete Finnigan - Wed, 2015-05-20 01:05

I will be co-chairing/hosting a twitter chat on Thursday 6th March at 7pm UK time with Confio. The details are here . The chat is done over twitter so it is a little like the Oracle security round table sessions....[Read More]

Posted by Pete On 05/03/14 At 10:17 AM

Categories: Security Blogs

PFCLScan Reseller Program

Pete Finnigan - Wed, 2015-05-20 01:05

We are going to start a reseller program for PFCLScan and we have started the planning and recruitment process for this program. I have just posted a short blog on the PFCLScan website titled "PFCLScan Reseller Program". If....[Read More]

Posted by Pete On 29/10/13 At 01:05 PM

Categories: Security Blogs

PFCLScan Version 1.3 Released

Pete Finnigan - Wed, 2015-05-20 01:05

We released version 1.3 of PFCLScan our enterprise database security scanner for Oracle a week ago. I have just posted a blog entry on the PFCLScan product site blog that describes some of the highlights of the over 220 new....[Read More]

Posted by Pete On 18/10/13 At 02:36 PM

Categories: Security Blogs

PFCLScan Updated and Powerful features

Pete Finnigan - Wed, 2015-05-20 01:05

We have just updated PFCLScan our companies database security scanner for Oracle databases to version 1.2 and added some new features and some new contents and more. We are working to release another service update also in the next couple....[Read More]

Posted by Pete On 04/09/13 At 02:45 PM

Categories: Security Blogs

Oracle Security Training, 12c, PFCLScan, Magazines, UKOUG, Oracle Security Books and Much More

Pete Finnigan - Wed, 2015-05-20 01:05

It has been a few weeks since my last blog post but don't worry I am still interested to blog about Oracle 12c database security and indeed have nearly 700 pages of notes in MS Word related to 12c security....[Read More]

Posted by Pete On 28/08/13 At 05:04 PM

Categories: Security Blogs

Row Store vs Column Store in SAP HANA

Yann Neuhaus - Wed, 2015-05-20 00:00

The SAP HANA database allows you to create your tables in Row or Column Store mode. In this blog, I will demonstrate that each method has its advantages and disadvantages and should be used for specific cases.

Through two kinds of tests, I will show you that the Row Store mode should be used for simple SELECT queries without aggregation, and the Column Store mode for complex SELECT queries containing aggregations.

If you want more information about the Column Store or in-memory technologies, don't hesitate to attend the next dbi services event:

Test 1: Simple SELECT query

Goal of the test

This test will show you the difference in performance between a Row Store and a Column Store table in a simple SQL query.

Description of the test

A SELECT query will be sent to the database and we will check the server response time.

SQL query

Using a Row Store table

The SQL is the following:


Using a Column Store table

The SQL is the following:


Tables

Row Store table

You can find here information regarding the Row Store table used in the test.

Name:                 SALES_ROW

Table type:          Row Store

Row count:         10 309 873

Index:                1

Partition:            0 (SAP HANA does not allow partitioning of Row Store tables)




Column Store Table

You can find here information regarding the Column Store table used in the test.

Name:                  SALES_COLUMN

Table type:           Column Store

Row count:          10 309 873

Index:                 0 (SAP HANA automatically applies an index if needed)

Partition:             1 RANGE partition on CUST_ID


Result of the test

Using the Row Store table


Using the Column Store table


Test 2: Complex SELECT query

Goal of the test

This test will show you the difference in performance between a Row Store and a Column Store table in a complex SQL query.

Description of the test

A SELECT query will be sent to the database and we will check the server response time.

SQL query

Using a Row Store table

The SQL is the following:


Using a Column Store table

The SQL is the following:


Tables

Row Store fact table

You can find here information regarding the Row Store table used in the test.

Name:                  SALES_ROW

Table type:          Row Store

Row count:         10 309 873

Index:                   2

Partition:             0 (SAP HANA does not allow partitioning of Row Store tables)

Column Store Fact Table

You can find here information regarding the Column Store table used in the test.

Name:                  SALES_COLUMN

Table type:          Column Store

Row count:         10 309 873

Index:                   0 (SAP HANA automatically applies an index if needed)

Partition:             1 RANGE partition on CUST_ID

Result of the test

Using the Row Store tables


Using the Column Store tables



Row and Column store modes in SAP HANA should be used in two different contexts:

 - Tables in Row Store mode should be used in SELECT queries WITHOUT any aggregation functions

 - Tables in Column Store mode are powerful when used in analytical queries or views with aggregation functions (GROUP BY, …)

The performance can be highly optimized if the tables selected in the queries have the right store mode.
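The trade-off behind these results can be sketched in a few lines of Python (a toy illustration of the two layouts, invented for this post – not SAP HANA internals):

```python
# The same table held row-wise and column-wise.
rows = [{"cust_id": i, "amount": i * 10, "region": "EU" if i % 2 else "US"}
        for i in range(1000)]

# Row store: one record per row -- fetching a whole row touches one object.
def row_store_lookup(store, cust_id):
    for r in store:                       # a real engine would use an index
        if r["cust_id"] == cust_id:
            return r

# Column store: one array per column -- an aggregate scans a single array
# and never touches the columns it does not need.
columns = {name: [r[name] for r in rows] for name in rows[0]}

def column_store_sum(store, column):
    return sum(store[column])             # contiguous scan of just one column
```

A point lookup wants the whole row in one place (row store); an aggregation wants one column laid out contiguously (column store), which is why the two test queries above favour different modes.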




2 minute Tech Tip: Working with JSON in APEX

Dimitri Gielis - Tue, 2015-05-19 16:30
On Monday Bob Rhubart did a video call with me in his series of 2MTT (2 Minute Tech Tips) on YouTube. You can find my 2MTT here.

I talked about using JSON in APEX and gave two examples of where we use it.
In previous blog posts I gave more details on those techniques. Here's a quick overview:
Categories: Development

Using HBase and Impala to Add Update and Delete Capability to Hive DW Tables, and Improve Query Response Times

Rittman Mead Consulting - Tue, 2015-05-19 16:21

One of our customers is looking to offload part of their data warehouse platform to Hadoop, extracting data out of a source system and loading it into Apache Hive tables for subsequent querying using OBIEE11g. One of the challenges the project faces, though, is how to handle updates to dimensions (and, in their case, fact table records) when HDFS and Hive are typically append-only filesystems. Ideally, writes to fact tables should only require INSERTs and filesystem appends, but in this case they wanted to use an accumulating fact snapshot table, whilst the dimension tables all used SCD1-type attributes whose values were overwritten when updates to those values came through from the source system.

The obvious answer then was to use Apache HBase as part of the design, a NoSQL database that sits over HDFS but allows updates and deletes to individual rows of data rather than restricting you just to appends/inserts. I covered HBase briefly on the blog a few months ago when we used it to store webserver log entries brought into Hadoop via Flume, but in this case it makes an ideal landing point for data coming into our Hadoop system, as we can maintain a current-state record of the data brought in from the source system, updating and overwriting values if we need to. What was also interesting to me though was how well we could integrate this HBase data into our mainly SQL-style data processing: how much Java I’d have to use to work with HBase, and whether we could get OBIEE to connect to the HBase tables and query them directly (with a reasonable response time). In particular, could we use the Hive-on-HBase feature to create Hive tables over the HBase ones, and then query those efficiently using OBIEE, so that the data flow looked like this?


To test this idea out, I took the Flight Delays dataset from the OBIEE11g SampleApp & Exalytics demo data [PDF] and created four HBase tables to hold the data from them, using the BigDataLite 4.1 VM and the HBase Shell. This dataset has four tables:

  • FLIGHT_DELAYS – around 220m US flight records listing the origin airport, destination airport, carrier, year and a bunch of metrics (flights, late minutes, distance etc)
  • GEOG_ORIGIN – a list of all the airports in the US along with their city, state, name and so on
  • GEOG_DEST – a copy of the GEOG_ORIGIN table, used for filtering and aggregating on both origin and destination 
  • CARRIERS – a list of all the airlines associated with flights in the FLIGHT_DELAYS table

HBase is a NoSQL, key/value-store database where individual rows have a key, and then one or more column families made up of one or more columns. When you define a HBase table you only define the column families, and the data load itself creates the columns within them in a similar way to how the Endeca Server holds “jagged” data – individual rows might have different columns to each other and like MongoDB you can define a new column just by loading it into the database.

Using the HBase Shell CLI on the BigDataLite VM I therefore create the HBase tables using just these high-level column family definitions, with the individual columns within the column families to be defined later when I load data into them.

hbase shell
create 'carriers','details'
create 'geog_origin','origin'
create 'geog_dest','dest'
create 'flight_delays','dims','measures'

To get data into HBase tables there’s a variety of methods you can use. For the full project we’ll most probably write a Java application that uses the HBase client API to read, write, update and delete rows fed in from the source application (see this previous blog post for an example where we use Flume as the source). To set up some example data, though, we can use the HBase Shell and enter the HBase row/cell values directly, like this for the geog_dest table:

put 'geog_dest','LAX','dest:airport_name','Los Angeles, CA: Los Angeles'
put 'geog_dest','LAX','dest:city','Los Angeles, CA'
put 'geog_dest','LAX','dest:state','California'
put 'geog_dest','LAX','dest:id','12892'

and you can then use the “scan” command from the HBase shell to see those values stored in HBase’s key/value store, keyed on LAX.

hbase(main):015:0> scan 'geog_dest'
ROW                                    COLUMN+CELL                                                                                                     
 LAX                                   column=dest:airport_name, timestamp=1432067861347, value=Los Angeles, CA: Los Angeles                           
 LAX                                   column=dest:city, timestamp=1432067861375, value=Los Angeles, CA                                                
 LAX                                   column=dest:id, timestamp=1432067862018, value=12892                                                            
 LAX                                   column=dest:state, timestamp=1432067861404, value=California                                                    
1 row(s) in 0.0240 seconds

For testing purposes though we need a large volume of rows and entering them all in by-hand isn’t practical, so this is where we start to use the Hive integration that now comes with HBase. For the BigDataLite 4.1 VM all you need to do to get this working is install the hive-hbase package using yum (after first installing the Cloudera CDH5 repo into /etc/yum.repos.d), load the relevant JAR files when starting your Hive shell session, and then create a Hive table over the HBase table mapping Hive columns to the relevant HBase ones, like this:

ADD JAR /usr/lib/hive/lib/zookeeper.jar;
ADD JAR /usr/lib/hive/lib/hive-hbase-handler.jar;
ADD JAR /usr/lib/hive/lib/guava-11.0.2.jar;
ADD JAR /usr/lib/hive/lib/hbase-client.jar;
ADD JAR /usr/lib/hive/lib/hbase-common.jar;
ADD JAR /usr/lib/hive/lib/hbase-hadoop-compat.jar;
ADD JAR /usr/lib/hive/lib/hbase-hadoop2-compat.jar;
ADD JAR /usr/lib/hive/lib/hbase-protocol.jar;
ADD JAR /usr/lib/hive/lib/hbase-server.jar;
ADD JAR /usr/lib/hive/lib/htrace-core.jar;
CREATE EXTERNAL TABLE hbase_carriers
 (key string,
  carrier_desc string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,details:carrier_desc")
TBLPROPERTIES ("" = "carriers");
CREATE EXTERNAL TABLE hbase_geog_origin
 (key string,
  origin_airport_name string,
  origin_city string,
  origin_state string,
  origin_id string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,origin:airport_name,origin:city,origin:state,origin:id")
TBLPROPERTIES ("" = "geog_origin");
CREATE EXTERNAL TABLE hbase_geog_dest
 (key string,
  dest_airport_name string,
  dest_city string,
  dest_state string,
  dest_id string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,dest:airport_name,dest:city,dest:state,dest:id")
TBLPROPERTIES ("" = "geog_dest");
CREATE EXTERNAL TABLE hbase_flight_delays
 (key string,
  year string,
  carrier string,
  orig string,
  dest string,
  flights tinyint,
  late   tinyint,
  cancelled bigint,
  distance smallint)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,dims:year,dims:carrier,dims:orig,dims:dest,measures:flights,measures:late,measures:cancelled,measures:distance")
TBLPROPERTIES ("" = "flight_delays");

Bulk loading data into these Hive-on-HBase tables is then just a matter of loading the source data into a regular Hive table, and then running INSERT INTO TABLE … SELECT commands to copy the regular Hive rows into the HBase tables via their Hive metadata overlays:

insert into table hbase_carriers                           
select carrier, carrier_desc from carriers;
insert into table hbase_geog_origin
select * from geog_origin;
insert into table hbase_geog_dest
select * from geog_dest;
insert into table hbase_flight_delays
select row_number() over (), * from flight_delays;

Note that I had to create a synthetic sequence-number key for the fact table, as the source data for that table doesn’t have a unique key for each row – something fairly common for data warehouse fact table datasets. In fact, storing fact table data in an HBase table is not a very good idea for a number of reasons that we’ll see in a moment; bear in mind that HBase is designed for sparse datasets and low-latency inserts and row retrievals, so don’t read too much into this approach yet.

So going back to the original reason for using HBase to store these tables, updating rows within them is pretty straightforward. Taking the geog_origin HBase table from earlier, if we get the row for SFO using a Hive query over the HBase table, it looks like this:

hive> select * from hbase_geog_origin where key = 'SFO'; 
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
SFO   San Francisco, CA: San Francisco   San Francisco, CA   California   14771
Time taken: 29.126 seconds, Fetched: 1 row(s)

To update that row and others, I can load a new data file into the Hive table using HiveQL’s LOAD DATA command, or INSERT INTO TABLE … SELECT from another Hive table containing the updates, like this:

insert into table hbase_geog_origin    
select * from origin_updates;

To check that the value has in fact updated I can either run the same SELECT query against the Hive table over the HBase one, or drop into the HBase shell and check it there:

hbase(main):001:0> get 'geog_origin','SFO'
COLUMN                                 CELL                                                                                                           
 origin:airport_name                   timestamp=1432050681685, value=San Francisco, CA: San Francisco International                                  
 origin:city                           timestamp=1432050681685, value=San Francisco, CA                                                               
 origin:id                             timestamp=1432050681685, value=14771                                                                           
 origin:state                          timestamp=1432050681685, value=California                                                                      
4 row(s) in 0.2740 seconds

In this case the update file/Hive table changed the SFO airport name from “San Francisco” to “San Francisco International”. I can change it back again using the HBase Shell like this, if I want:

put 'geog_origin','SFO','origin:airport_name','San Francisco, CA: San Francisco'

and then checking it again using the HBase Shell’s GET command on that key value shows it’s back to the old value – HBase actually stores X number of versions of each cell with a timestamp for each version, but by default it shows you the current one:

hbase(main):003:0> get 'geog_origin','SFO'
COLUMN                                 CELL                                                                                                           
 origin:airport_name                   timestamp=1432064747843, value=San Francisco, CA: San Francisco                                                
 origin:city                           timestamp=1432050681685, value=San Francisco, CA                                                               
 origin:id                             timestamp=1432050681685, value=14771                                                                           
 origin:state                          timestamp=1432050681685, value=California                                                                      
4 row(s) in 0.0130 seconds
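This versioning behaviour can be sketched in a few lines of Python (a toy model of timestamped cell versions, invented for this post – not the real HBase client API):

```python
from collections import defaultdict

class VersionedStore:
    """Each (row, column) keeps timestamped versions; a plain get returns the latest."""

    def __init__(self):
        self.cells = defaultdict(list)   # (row, col) -> [(ts, value), ...]

    def put(self, row, col, value, ts):
        # A put never overwrites in place; it appends a new timestamped version.
        self.cells[(row, col)].append((ts, value))

    def get(self, row, col):
        # The "current" value is simply the version with the highest timestamp.
        versions = self.cells[(row, col)]
        return max(versions)[1] if versions else None
```

This is why the second put above "changed the value back": it added a newer version, and the shell’s GET simply shows whichever version carries the latest timestamp.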

So far, so good. We’ve got a way of storing data in Hive-type tables on Hadoop and a way of updating and amending records within them by using HBase as the underlying storage, but what are these tables like to query? Hive-on-HBase tables with just a handful of HBase rows return data almost immediately, for example when I create a copy of the geog_dest HBase table and put just a single row entry into it, then query it using a Hive table over it:

hive> select * from hbase_geog_dest2;
LAXLos Angeles, CA: Los AngelesLos Angeles, CACalifornia12892
Time taken: 0.257 seconds, Fetched: 1 row(s)

Hive on its own, even with a single row, would normally take 30 seconds or more to return just that row; but when we move up to larger datasets such as the flight delays fact table itself, running a simple row count on the Hive table and then comparing that to the same query running against the Hive-on-HBase version shows a significant time-penalty for the HBase version:

hive> select sum(cast(flights as bigint)) as flight_count from flight_delays;
Total jobs = 1
Launching Job 1 out of 1
Total MapReduce CPU Time Spent: 7 seconds 670 msec
Time taken: 37.327 seconds, Fetched: 1 row(s)

compared to the Hive-on-HBase version of the fact table:

hive> select sum(cast(flights as bigint)) as flight_count from hbase_flight_delays;
Total jobs = 1
Launching Job 1 out of 1
Total MapReduce CPU Time Spent: 1 minutes 19 seconds 240 msec
Time taken: 99.154 seconds, Fetched: 1 row(s)

And that’s to be expected; as I said earlier, HBase is aimed at low-latency single-row operations rather than full table scan, aggregation-type queries, so it’s not unexpected that HBase performs badly here, but the response time is even worse if I try and join the HBase-stored Hive fact table to one or more of the dimension tables also stored in HBase.

In our particular customer example though these HBase tables were only going to be loaded once-a-day, so what if we copy the current version of each HBase table row into a snapshot Hive table stored in regular HDFS storage, so that our data loading process looks like this:


and then OBIEE queries the snapshot of the Hive-on-HBase table joined to the dimension table still stored in HBase, so that the query side looks like this:


Let’s try it out by taking the original Hive table I used earlier on to load the hbase_flight_delays table, and join that to one of the Hive-on-HBase dimension tables; I’ll start first by creating a baseline response time by joining that source Hive fact table to the source Hive dimension table (also used earlier to load the corresponding Hive-on-HBase table):

select sum(cast(f.flights as bigint)) as flight_count, o.origin_airport_name from flight_delays f
join geog_origin o on f.orig = o.origin
and o.origin_state = 'California'
group by o.origin_airport_name;
17638    Arcata/Eureka, CA: Arcata
9146     Bakersfield, CA: Meadows Field
125433   Burbank, CA: Bob Hope
1653     Santa Maria, CA: Santa Maria Public/Capt. G. Allan Hancock Field
Time taken: 43.896 seconds, Fetched: 27 row(s)

So that’s just under 44 seconds to do the query entirely using regular Hive tables. So what if I swap-out the regular Hive dimension table for the Hive-on-HBase version, how does that affect the response time?

hive> select sum(cast(f.flights as bigint)) as flight_count, o.origin_airport_name from flight_delays f
    > join hbase_geog_origin o on f.orig = o.key
    > and o.origin_state = 'California'
    > group by o.origin_airport_name;
17638    Arcata/Eureka, CA: Arcata
9146     Bakersfield, CA: Meadows Field
125433   Burbank, CA: Bob Hope
1653     Santa Maria, CA: Santa Maria Public/Capt. G. Allan Hancock Field
Time taken: 51.757 seconds, Fetched: 27 row(s)

That’s interesting – even though we used the (updatable) Hive-on-HBase dimension table in the query, the response time only went up a few seconds to 51, compared to the 44 seconds when we used just regular Hive tables. Taking it one step further though, what if we used Cloudera Impala as our query engine and copied the Hive-on-HBase fact table into a Parquet-stored Impala table, so that our inward data flow looked like this:


By using the Impala MPP engine – running on Hadoop but directly reading the underlying data files, rather than going through MapReduce as Hive does – and in addition storing its data in column-store, query-orientated Parquet storage, we can take advantage of OBIEE’s new support for Impala and potentially bring the query response time down even further. Let’s go into the Impala Shell on the BigDataLite 4.1 VM, update Impala’s view of the Hive Metastore data dictionary, and then create the corresponding Impala snapshot fact table using a CREATE TABLE … AS SELECT Impala SQL command:

[oracle@bigdatalite ~]$ impala-shell
[bigdatalite.localdomain:21000] > invalidate metadata;
[bigdatalite.localdomain:21000] > create table impala_flight_delays
                                > stored as parquet
                                > as select * from hbase_flight_delays;

Now let’s use the Impala Shell to join the Impala version of the flight delays table with data stored in Parquet files, to the Hive-on-HBase dimension table created earlier within our Hive environment:

[bigdatalite.localdomain:21000] > select sum(cast(f.flights as bigint)) as flight_count, o.origin_airport_name from impala_flight_delays f
                                > join hbase_geog_origin o on f.orig = o.key
                                > and o.origin_state = 'California'
                                > group by o.origin_airport_name;
Query: select sum(cast(f.flights as bigint)) as flight_count, o.origin_airport_name from impala_flight_delays f
join hbase_geog_origin o on f.orig = o.key
and o.origin_state = 'California'
group by o.origin_airport_name
| flight_count | origin_airport_name                                              |
| 31907        | Fresno, CA: Fresno Yosemite International                        |
| 125433       | Burbank, CA: Bob Hope                                            |
| 1653         | Santa Maria, CA: Santa Maria Public/Capt. G. Allan Hancock Field |
Fetched 27 row(s) in 2.16s

Blimey – 2.16 seconds, compared to the best time of 44 seconds we got earlier when we just used regular Hive tables, let alone the queries that joined to the dimension table stored in HBase. Let’s crank it up a bit and join another dimension table in, filtering on both origin and destination values:

[bigdatalite.localdomain:21000] > select sum(cast(f.flights as bigint)) as flight_count, o.origin_airport_name from impala_flight_delays f
                                > join hbase_geog_origin o on f.orig = o.key
                                > join hbase_geog_dest d on f.dest = d.key
                                > and o.origin_state = 'California'
                                > and d.dest_state = 'New York'
                                > group by o.origin_airport_name;
Query: select sum(cast(f.flights as bigint)) as flight_count, o.origin_airport_name from impala_flight_delays f
join hbase_geog_origin o on f.orig = o.key
join hbase_geog_dest d on f.dest = d.key
and o.origin_state = 'California'
and d.dest_state = 'New York'
group by o.origin_airport_name
| flight_count | origin_airport_name                                   |
| 947          | Sacramento, CA: Sacramento International              |
| 3880         | San Diego, CA: San Diego International                |
| 4030         | Burbank, CA: Bob Hope                                 |
| 41909        | San Francisco, CA: San Francisco International        |
| 3489         | Oakland, CA: Metropolitan Oakland International       |
| 937          | San Jose, CA: Norman Y. Mineta San Jose International |
| 41407        | Los Angeles, CA: Los Angeles International            |
| 794          | Ontario, CA: Ontario International                    |
| 4176         | Long Beach, CA: Long Beach Airport                    |
Fetched 9 row(s) in 1.48s

Even faster. So that’s what we’ll be going with as our initial approach for the data loading and querying: load data into HBase tables as planned at the start, taking advantage of HBase’s CRUD capabilities, bulk-loading and initially reading the data using Hive tables over the HBase ones; then, before we make the data available for querying by OBIEE, we copy the current state of the HBase fact table into a Parquet-stored Impala table, using Impala’s ability to work with Hive tables and metadata to create joins across both Impala and Hive tables, even when one of the Hive tables uses HBase as its underlying storage.

Categories: BI & Warehousing