Feed aggregator

Collaborate 2016

Jim Marion - Tue, 2016-03-29 17:28

I just wanted to post a quick note with my schedule for Collaborate 2016. I will be presenting PeopleSoft Developer Tips and Techniques on April 13th at 9:15 AM in Reef C. My publisher assured me that I will have a box of books waiting for me at the conference and we plan to give them away at various sessions during the conference. Here are a couple of other sessions I recommend attending at Collaborate:

There is one session I want to specifically highlight: A Designer's Intro to the Oracle JET Framework. PeopleTools 8.55 includes Oracle JET, Oracle's brand new open source JavaScript Extension Toolkit. You may never directly interact with Oracle JET, but it is always good to understand the tools and frameworks used by PeopleTools. Oracle JET is based on common JavaScript libraries such as RequireJS and Knockout, and PeopleTools includes these related open source libraries (note: I have written about using RequireJS with PeopleSoft in prior posts).

10 Cloverfield Lane

Tim Hall - Tue, 2016-03-29 17:21

I just got back from watching 10 Cloverfield Lane. The trailers don’t really give much away, but don’t watch them anyway, just in case. Also, don’t read any reviews with spoilers before seeing it. I think it will still work if you know the story, but why ruin it?

Wow! What a movie! Wow!

The story is great. The acting is great. John Goodman is great. I didn’t really know what the story was before going in. I had just seen the trailer embedded above.

I’m so glad I went to see this film! I don’t want to say anything more for fear of ruining it for someone…

Cheers

Tim…


Why Now is the Time for the Oracle SCM Cloud

Linda Fishman Hoyle - Tue, 2016-03-29 16:03

A Guest Post by SVP Rick Jewell, Oracle Applications Development (pictured left)

Cloud is the solution of choice in CRM, HCM, and most recently, ERP. But what about supply chain? We know that more than 80 percent of all companies either are using, or plan to use, cloud applications for SCM―that percentage is even higher for supply chain leaders. And those percentage shares increase with every market survey.

We also know that innumerable positive cloud experiences have primed the pump for a broad-based transition of SCM to cloud. Those experiences have come from other functional areas, as well as from SCM cloud-leading areas such as Transportation Management.

But what has been missing is new, enterprise-class SCM solutions built for the cloud. Now all that has changed with Release 11―and our timing could not be better.

Broader, Better, Faster

Yet is it sufficient to take the same SCM solutions that were designed and developed decades ago, for a world that no longer exists, and just deploy them in the cloud? A lot of our competitors seem to think so, but we did not. Here’s why.

Modern supply chains have transformed dramatically, and continue to transform as the nature of markets, of products and services, and of trading party interactions constantly evolves. The rigidity, complexities, and integration challenges of a typical legacy SCM deployment make it a business inhibitor, not a business enabler.

To address these challenges we chose the hard road. That road included an all-new set of SCM capabilities, built from the ground up with architectural flexibility, a rich yet easy to use user experience, multi-enterprise support, and broad, cross-functional business flows. We built supply chain solutions for 21st century supply chains―all designed for and delivered in the cloud. And that value proposition only gets better when you consider all of the other leading cloud solutions from Oracle.

Transition to SCM Cloud

We also offer flexible paths to Oracle SCM Cloud. Every business has its own combination of business pressures and ability to adopt change; therefore, we designed the Oracle SCM Cloud to provide multiple ways to transition to the cloud. We encourage all of our customers and prospects to engage with Oracle to develop their own roadmaps―whether it’s innovation at the edge, shared-service deployment of a business function, wholesale replacement of a division, or incremental enterprise deployment. The options are there, supported by integrations and by implementation expertise.

For more information, go to cloud.oracle.com/scm.

7 Bonus Blog Posts to Usher in Oracle HCM World

Linda Fishman Hoyle - Tue, 2016-03-29 15:44
There will be legions of HCM professionals descending on Chicago, IL, next week for Oracle HCM World. They will come to hear from industry influencers, share best practices, and learn how to modernize their business practices. Transformation and digital disruption will be hot topics.

Here are seven articles published in the last few weeks to promote the content at the conference. They are meaty in and of themselves, so scan the titles and dig in.

Why Your People Are Leaving and Your Recruiters Are Busy: Research shows that people are leaving their jobs, not for more money, but for the opportunity to grow and advance. Therefore, it’s time to energize the activities around learning and development. This article, by Oracle’s Sunita Khatri, is a precursor to similar content in the How to Grow Your People track at the conference.

Are You Following These 3 Laws of Attraction to Get the People You Want?: Matt Charney of Recruiting Daily states that despite all the new and emerging techniques to find potential employees, businesses still struggle to maximize their acquisition efforts. As a forerunner to his session in the Finding People track at HCM World, he writes about three laws of attraction to find the right people.

HR Analytics Pitfalls to Avoid: There are so many advantages of business intelligence and data-based decision-making, but tapping into the right data sources and reporting that intelligence back to key decision-makers can pose challenges. This article is written by Oracle’s Lauren McKay. There is an entire track dedicated to workforce analytics at the conference.

No More Silos: How HR Can Take Its Seat at the Table: Matt Charney, executive director of Recruiting Daily, believes HR has a big branding problem. He suggests that HR professionals approach their processes, policies, and platforms with a marketing mentality to boost their department’s brand and become a business partner for the business. Charney will present a session entitled “Marketing for HR” at the conference.

From Hindrance to Hero: How HR Can Reclaim Its Rightful Role: Peter Cappelli is a management professor at the Wharton School and a keynote speaker at Oracle HCM World. In this article, Cappelli offers five actions that HR professionals can take to sustain their role as a strategic partner.

3 Trends That May Change the Way You Pay and Reward People: In this article, Oracle’s Chad Atwell offers one recommendation—stay flexible—to those managing total rewards in an organization. There will be a Paying People in 2016 and Beyond track at the conference. Atwell will present with customers Ansell and Compass Minerals on how to best implement and utilize compensation technology.

How to Approach Your HR Transformation: Stacey Harris, Vice President of Research and Analytics for Sierra-Cedar, starts the conversation about choosing one of many paths for a successful HR transformation based on a company’s specific needs. Harris is a speaker at the conference.

EBS 12.1.3 Data Masking Template Certified with EM 12.1.0.5

Steven Chan - Tue, 2016-03-29 15:37

(Contributing author:  Nirzari Raichura)

We're pleased to announce the certification of the E-Business Suite 12.1.3 Data Masking Template for the Data Masking Pack with Enterprise Manager Cloud Control 12.1.0.5.

You can use the Oracle Data Masking Pack with Oracle Enterprise Manager Cloud Control 12c to mask sensitive data in cloned E-Business Suite environments.  Due to data dependencies, scrambling E-Business Suite data is not a trivial task.  The data needs to be scrubbed in such a way that allows the application to continue to function. 

How data masking works

You may scramble data in E-Business Suite cloned environments with EM12c using the following template:

What does data masking do in E-Business Suite environments?

Application data masking does the following:

  • De-identify the data:  Scramble identifiers of individuals, also known as personally identifiable information or PII.  Examples include information such as name, account, address, location, and driver's license number.
  • Mask sensitive data:  Mask data that, if associated with personally identifiable information (PII), would cause privacy concerns.  Examples include compensation, health and employment information.  
  • Maintain data validity:  Provide a fully functional application.
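
To make the bullets above concrete, here is a purely illustrative SQL sketch of the kind of de-identification a masking definition performs. The table and column names are hypothetical, and in a real E-Business Suite environment the scrambling is driven by the certified masking template through the Data Masking Pack rather than by hand-written SQL:

-- Hypothetical example only: de-identify a name and a driver's license number
-- while keeping the values structurally valid for the application.
update hr_people_stage
   set full_name      = 'Person ' || to_char(person_id),
       license_number = lpad(to_char(trunc(dbms_random.value(0, 1e9))), 9, '0')
 where person_id is not null;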

Support Note:

Related Articles

Categories: APPS Blogs

Oracle WebCenter Content for Primavera

WebCenter Team - Tue, 2016-03-29 15:10

Author: Mitchell Palski – Oracle Fusion Middleware Specialist 

Having a content repository integrated with Unifier can turn cluttered, unstructured content into organized assets by making it easier to catalog, access, search, and reuse documentation. Oracle WebCenter can serve as an external content repository and has zero effect on the end-user’s experience interacting with Primavera.

When attaching a document or drawing to a project in Unifier, the interface does not present anything about the underlying content repository which stores the file.


With WebCenter Content (WCC) being used as a content repository for Unifier, the same mechanical HVAC drawing attached to the drawing package is stored in WCC as indicated below.

Within WCC, Unifier creates a unique folder structure to categorize and store documents. The actual file name is an encrypted file name generated when the file is stored in the repository.

Those documents can now be:

  • Indexed for search
  • Entered into external workflows
  • Assigned to categories for retention and disposition
  • Accessed via the WebCenter web browser UI, desktop tools, or mobile app
  • Integrated into external applications and tools via native APIs and web services 
The best part? WebCenter is easy to install and configure with Primavera!
  1. Install WebCenter
  2. Create six custom metadata fields for data that comes from Primavera
  3. Enter WebCenter’s connection information on the Primavera “Repository” tab
If you want to add enterprise-grade document management and retention to supplement your instance of Primavera, start with Oracle.

Forgot EBS 12.2 : “Invalid credentials passed” issue while starting Admin Server

Online Apps DBA - Tue, 2016-03-29 11:43

The architecture of Oracle EBS R12.2 changed when Oracle introduced WebLogic Server. This was covered by Atul in his previous post on changes in EBS 12.2. A WebLogic Server domain contains an Admin Server, and you use the adadminsrvctl script to start the Admin Server in the Oracle EBS domain. You need a username/password to start/boot the Admin Server and this […]

The post Forgot EBS 12.2 : “Invalid credentials passed” issue while starting Admin Server appeared first on Oracle Trainings for Apps & Fusion DBA.

Categories: APPS Blogs

In Depth: MySQL 5.6+ DDL

Pythian Group - Tue, 2016-03-29 10:07
Overview

DDL (Data Definition Language) statements create, alter, and remove database objects. These types of changes can be very dangerous to make on such a critical piece of your infrastructure. You want to make sure that the command you are executing has been given proper thought and testing.

In this post I go through multiple versions of MySQL and verify the best course of action to take when executing DDL statements.  There are many things that you have to consider when making these types of changes, such as disk space, load on the database server, slave replication, the type of DDL statement you are executing, and whether it will lock the table. 

Because of these risks, there are tools that can be used to help mitigate some of the dangers. But unless you have tested and verified their functionality, these tools can themselves cause trouble. Whenever in doubt, take the time to test and verify any changes that you will make.

In my testing I will be using pt-online-schema-change in particular, since it is a very popular tool and I have used it many times; it was also originally created because, at the time, MySQL did not offer online DDL changes. In some cases, depending on your environment, the best course of action may be removing the database server from being accessed, by failing over to a slave or taking a cluster node offline.

I will be focusing on the most common DDL statements, as I want to keep this post to a reasonable size. Many of the MySQL DDL statements use the INPLACE algorithm by default where they are able, which is only available in MySQL 5.6 or later. Earlier versions (5.5, and 5.1 with the InnoDB plugin) had fast index creation, but all other table alters were blocking.

Online DDL with the INPLACE algorithm allows MySQL to make a copy of the table in the background, copy the data to this table, make your table alters, and then swap the tables, all without locking the table. Some DDL statements can be done instantaneously, such as dropping an index or renaming a column. When MySQL isn't able to use the INPLACE algorithm it has to revert to the COPY algorithm, which will in turn lock the table. An example of this is changing a column definition from VARCHAR to BLOB.

Whenever you are doing an INPLACE alter you will want to specify the algorithm in your command. This will help protect you in the case that MySQL is unable to do an INPLACE alter: MySQL will return an error rather than running the command with the COPY algorithm.


ALTER TABLE employee_test ALGORITHM=INPLACE, CHANGE COLUMN first_name first_name BLOB NULL;
ERROR 1846 (0A000): ALGORITHM=INPLACE is not supported. Reason: Cannot change column type INPLACE. Try ALGORITHM=COPY.

All of my testing was done without specifying the algorithm, allowing MySQL to determine the best algorithm to use.  If there are any DDL statements that you want more information on, please refer to the documentation for the release of MySQL that you are using; in particular, I will not be going into foreign keys.

The Setup

All of my testing was done in virtual machines (VMs) on my laptop. I have a VM that runs mysqlslap to perform remote DML statements such as SELECT, UPDATE, DELETE, and INSERT, causing load on the database server. This allows me to see any potential table locks or performance impact. Here is the setup of the MySQL machine and its components. I created the table shown below and imported 10 million rows. While mysqlslap was running I performed each of the DDL statements and watched to confirm that the DML statements were being executed with no table locks. I then recorded the time as they completed.

MySQL Server Stats
  • CPU : 4x CPUs at 2.6 GHz Intel Core i7
  • Memory allocated to VM : 2 Gig
  • Memory allocated to MySQL Innodb buffer pool: 1 Gig
  • Flash Storage
  • Table has 10 Million Rows.
  • DML (Data Manipulation Language) statements such as select, insert, update, and delete, that will be executed against the table during DDL statements
Table Structure
CREATE TABLE `employee_test` (
`emp_no` int(11) NOT NULL AUTO_INCREMENT,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
PRIMARY KEY (`emp_no`),
KEY `ix_lastname` (`last_name`),
KEY `ix_firstname` (`first_name`)
) ENGINE=InnoDB AUTO_INCREMENT=10968502 DEFAULT CHARSET=latin1
MySQL DDL Commands
CREATE INDEX ix_hire_date ON employee_test (hire_date); --CREATE INDEX
CREATE FULLTEXT INDEX ix_lastname_fulltext ON employee_test(last_name); --CREATE FULLTEXT INDEX
DROP INDEX ix_hire_date ON employee_test; --DROP INDEX
OPTIMIZE TABLE employee_test; --OPTIMIZE TABLE
ALTER TABLE employee_test ADD COLUMN test_column INT NULL; --ADD COLUMN
ALTER TABLE employee_test DROP COLUMN f_name; --DROP COLUMN
ALTER TABLE employee_test CHANGE first_name f_name varchar(14) NOT NULL; --RENAME COLUMN
ALTER TABLE employee_test MODIFY COLUMN emp_no BIGINT AUTO_INCREMENT NOT NULL; --CHANGE COLUMN TYPE
pt-online-schema-change DDL Commands
pt-online-schema-change --execute --alter 'ADD FULLTEXT INDEX ix_lastname_fulltext (last_name)' D=employees,t=employee_test
pt-online-schema-change --execute --alter 'ENGINE=INNODB' D=employees,t=employee_test
pt-online-schema-change --execute --alter 'ADD COLUMN test_column3 INT NULL' D=employees,t=employee_test
pt-online-schema-change --execute --alter 'MODIFY COLUMN gender BLOB NULL' D=employees,t=employee_test
Results

This matrix is a representation of the testing that I performed and how long the commands took to execute. Be careful with Fulltext indexes on your tables, since they can potentially cause additional locking by creating the necessary infrastructure in the background. Please see MySQL InnoDB Fulltext Indexes for more details. This requirement causes a great deal of locking on the table.

DDL Matrix

pt-online-schema-change

For the DDL statements that cause locking of the table we wanted to look at incorporating pt-online-schema-change, to help us overcome this obstacle.

pt-online-schema-change results

pt-online-schema-change allowed us to perform the operations that previously locked the table, with no locking. pt-online-schema-change also has many other features, such as helping limit the impact on slave replication and handling foreign keys. But it also has its limitations, such as not being able to run on a table that already has triggers, and complications with foreign keys. There are also impacts on your environment if it is not properly tested and verified. One such example: every time that I ran pt-online-schema-change in my test it caused a deadlock, causing mysqlslap to die and no longer perform any further statements.

mysqlslap: Cannot run query UPDATE employee_test SET first_name = ‘BigPurpleDog’ WHERE last_name = ‘SmallGreenCat’; ERROR : Deadlock found when trying to get lock; try restarting transaction

This is why it is very important to try to determine the impact, if any, that pt-online-schema-change may have on your environment before starting to use it. I did not encounter this behavior with any of the MySQL DDL statements that I ran.

Performance Impact

While performing the changes there were consistent increases in CPU load, disk I/O, and disk usage as the new tables were being created for the table alters. We have to remember that when certain DDL statements are being executed, a full copy of the table is being made, so you will want to make sure you have enough disk space to complete the change.  This is why it is very important to take into consideration the size of the table you are altering and the load on the MySQL server while performing DDL statements. It is preferable to run any DDL statements that cause table copies during off hours, so as to avoid any delays or outages to the application that is using the data.
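
Since a copying ALTER can temporarily need roughly as much free space as the table itself, it is worth checking the table size before you start. A minimal sketch, using the schema and table from the test setup above:

-- Estimate how much space a full table copy would need (data plus indexes).
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM   information_schema.tables
WHERE  table_schema = 'employees'
  AND  table_name   = 'employee_test';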

Query Execution Impact

Query Execution Baseline

Server Performance Impact

MySQL Alter Load

Conclusion

As I have observed in performing these tests, there are many things to consider when executing DDL statements in order to avoid potential downfalls. Here is a summary of my recommendations for executing DDL statements or using pt-online-schema-change. Before considering any of this, determine whether the statement you are going to perform will copy the table, and if it does, make sure you have enough disk space.

Without Fulltext
With Fulltext

If you are going to make changes to your production servers, make sure that you run your DDL statements during off hours, when the server is at its lowest utilization for both CPU and disk.

For an added safety measure, when you are performing any of the MySQL DDL statements that you expect to be executed INPLACE without locking the table, make sure you specify ALGORITHM=INPLACE in your statement. If MySQL is unable to execute the command in place, it will just return an error instead of executing the statement with the COPY algorithm, which would lock the table. Here are samples of the DDL statements that you should be able to run INPLACE without causing any locking of your table.

ALTER TABLE employee_test ALGORITHM=INPLACE, ADD INDEX ix_hire_date (hire_date); --CREATE INDEX
ALTER TABLE employee_test ALGORITHM=INPLACE, DROP INDEX ix_firstname; --DROP INDEX
ALTER TABLE employee_test ALGORITHM=INPLACE, ENGINE=INNODB; --OPTIMIZE TABLE
ALTER TABLE employee_test ALGORITHM=INPLACE, ADD COLUMN test_column INT NULL; --ADD COLUMN
ALTER TABLE employee_test ALGORITHM=INPLACE, DROP COLUMN f_name; --DROP COLUMN
ALTER TABLE employee_test ALGORITHM=INPLACE, CHANGE first_name f_name varchar(14) NOT NULL; --RENAME COLUMN

 

 

References

 

 

Categories: DBA Blogs

DataStax OpsCenter upgrade (4.1 to 5.1) for Cassandra – issue and resolution

Pythian Group - Tue, 2016-03-29 09:42

For the Apache Cassandra cluster (version C* 1.2.16) that I’ve supported, the monitoring of the cluster is through DataStax OpsCenter, version 4.1.2. As part of the effort to improve the monitoring capability for this cluster, my team decided first to upgrade OpsCenter to version 5.1.4, the latest available version of OpsCenter that is compatible with Cassandra 1.2.16. The same OpsCenter is also used to monitor another cluster of DataStax Enterprise (DSE) 4.5.2 (it corresponds to Apache Cassandra version 2.0.10).

During the upgrade we ran into an issue, and we couldn't find a similar problem documented anywhere on Google. We'd like to use this post to document the problem we faced, as well as the solutions and findings from the upgrade.

 

Problem Overview

The OpsCenter upgrade procedure is as straightforward as described in the DataStax OpsCenter documentation. After the OpsCenter upgrade, the OpsCenter web portal detects mismatched versions of the datastax-agent on all nodes. We chose the "FixMe" option from the portal to upgrade the datastax-agent to version 5.1.4 on all monitored nodes. After the datastax-agent upgrade, we addressed some datastax-agent configuration issues in the "address.yaml" file to reflect the changes brought by the new version of OpsCenter/datastax-agent.

After all this was done, we double checked the log files for OpsCenter and datastax-agent. The OpsCenter log file was mostly clear, the datastax-agent log for DSE 4.5.2 cluster was also clear, but the datastax-agent log for Cassandra 1.2.16 was NOT. The corresponding OpsCenter web portal was not able to display Cassandra metrics for C* 1.2.16 cluster.

On each of the datastax-agent log files in the C* 1.2.16 cluster, we saw a lot of repeating  errors like the ones below:

          ERROR [async-dispatch-3] 2016-02-19 12:57:52,750 There was an error when attempting to load stored rollups.
          com.datastax.driver.core.exceptions.InvalidQueryException: Undefined name key in where clause (‘key EQ ‘<… …>”)
          at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
          at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:291)
          at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
          at clojurewerkz.cassaforte.client$execute.invoke(client.clj:289)
          … …
          ERROR [cassandra-processor-1] 2016-02-19 13:00:02,295 Error when proccessing cassandra callcom.datastax.driver.core.exceptions.InvalidQueryException: Unknown identifier key

 

Problem Analysis and Solution

The fact that the error showed up in the datastax-agent log file gave me a hint that the error might be related to the datastax-agent failing to write collected metrics into the OpsCenter tables. So as the first step of the analysis, I compared the schema of the "OpsCenter" keyspace between the two monitored clusters. Below is a comparison of two OpsCenter table definitions between the two clusters.

C* 1.2.16 Cluster:

CREATE TABLE events (
  "KEY" blob,
  column1 blob,
  value blob,
  PRIMARY KEY ("KEY", column1)
)

CREATE TABLE events_timeline (
  "KEY" blob,
  column1 bigint,
  value blob,
  PRIMARY KEY ("KEY", column1)
)

DSE 4.5.2 Cluster:

CREATE TABLE events (
  key text,
  action bigint,
  level bigint,
  success boolean,
  time bigint,
  PRIMARY KEY ((key))
)

CREATE TABLE events_timeline (
  key text,
  column1 bigint,
  value blob,
  PRIMARY KEY ((key), column1)
)

 

From this comparison, we can clearly see that the upgrade of OpsCenter and datastax-agent to version 5.1.4 somehow didn't migrate the OpsCenter schema properly for the C* 1.2.16 cluster. The theory for the error is that the upgraded datastax-agent in the C* 1.2.16 cluster is trying to query or update Cassandra metrics in the OpsCenter tables in a fashion that matches the OpsCenter schema of the DSE 4.5.2 cluster. But the actual OpsCenter schema in C* 1.2.16 still has the old definitions, thus causing the invalid query exceptions seen in the log file.

Once the problem is clear, the solution is straightforward. The steps are summarized below:

In C* 1.2.16 cluster,

  1. Take a snapshot of the OpsCenter keyspace on all nodes.
  2. Stop the DataStax agents on all nodes, so they won't try to write metrics into the OpsCenter tables.
  3. Use CQL to drop the OpsCenter tables and re-create them, matching the OpsCenter schema of the DSE 4.5.2 cluster (see the illustrative CQL sketch after this list). Make sure that all table properties are the same.
  4. Once the OpsCenter schema is recreated, start the DataStax agents on all nodes.
  5. Verify in the agent log files that the error message is gone.
  6. Restart the OpsCenter service.
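
As an illustration of step 3, here is a minimal CQL sketch for one of the mismatched tables, using the events definition from the DSE 4.5.2 cluster shown earlier. The quoted "OpsCenter" keyspace name is an assumption about the default keyspace name, and any table properties should be carried over from your own environment:

-- Drop the old-format table and re-create it with the newer definition.
DROP TABLE "OpsCenter".events;

CREATE TABLE "OpsCenter".events (
    key text,
    action bigint,
    level bigint,
    success boolean,
    time bigint,
    PRIMARY KEY ((key))
);

Repeat the same drop-and-recreate for each OpsCenter table whose definition differs between the two clusters.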

 

After these steps, we double checked the log files for all datastax-agents, and for OpsCenter, and we can confirm that there were no errors. The OpsCenter web portal was also able to display the Cassandra metrics properly.

Categories: DBA Blogs

The NoSQL- Relational Comparison Series Continues- Relational SQL vs MongoDB Methods

Chris Foot - Tue, 2016-03-29 09:01

When we compare relational and NoSQL systems, one of the critical analyses we have to perform is of data access mechanisms.  As we'll learn over the next few articles of this series on data access, the SQL language used by relational database management systems is much more feature-rich and powerful than its NoSQL counterpart.  This statement isn't intended to sway readers toward relational systems; it is just the author's evaluation of both systems' access languages.  

How to Use GoldenGate Token with COLMAP?

VitalSoftTech - Tue, 2016-03-29 07:00
Tokens are used to capture and store the environment variable values in the header of the GoldenGate trail record file. The trail file header contains a lot of information about the physical environment that produced the trail file and trail file contents. We can use this information to map token data to a target column […]
Categories: DBA Blogs

Index Usage

Jonathan Lewis - Tue, 2016-03-29 04:53

There are some questions about Oracle that are like the mythical Hydra – you think you’ve killed it, but for every head you cut off another two grow. The claim that “the optimizer will switch between using an index and doing a tablescan when you access more than X% of the data” re-appeared on the OTN database forum a little while ago – it doesn’t really matter what the specific value of X was – and it’s a statement that needs to be refuted very firmly because it’s more likely to cause problems than it is to help anyone understand what’s going on.

At a very informal level we may have an intuitive feeling that for a “precise” query accessing a “small” amount of data an indexed access path should make sense while for a “big” query accessing a “large” amount of data we might expect to see a tablescan, but any attempt to give a meaning to “small” and “large” that is both general purpose and strictly quantified will be wrong: there are too many variables involved.

Just as a quick demonstration of how badly we can be misled by a simple numeric claim, here's a quick test I created on a newly installed instance of 11.2.0.4, which I happened to set up with a locally defined tablespace using uniform extents of 1MB, the default 8KB blocksize, and manual (freelist) space management:


rem
rem     Script:   index_usage_pct.sql
rem     Dated:    March 2016
rem     Author:   J P Lewis
rem

drop table t1;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                level <= 1e4
)
select
        cast(rownum as number(8,0))                              id,
        cast(trunc(dbms_random.value(0,1e6)) as number(8,0))     n1,
        lpad(rownum,6,'0')              v1,
        rpad('x',10,'x')                small_vc,
        rpad('x',151,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index t1_i1 on t1(id);

spool index_usage_pct.lst

select  num_rows, blocks, avg_row_len, round(num_rows/blocks) rows_per_block
from    user_tables
where   table_name = 'T1'
;

set autotrace on explain
select count(v1) from t1 where id between 1 and 245000;
set autotrace off

spool off

I’ve created a table with 1 million rows; the rows are about 180 bytes long (you’ll see the sizes a few lines further down the page), so it’s not an unreasonable model for lots of tables in typical systems – if you want to experiment further you can adjust the rpad() in the padding column; and I’ve created an index on a sequentially (rownum) generated column. My call to autotrace will produce a truthful execution plan for the query supplied – there’s no risk of unexpected type conversion and no problems from bind variable peeking. As you can easily infer, my query will access 245,000 rows in the table of 1,000,000 – nearly a quarter of the table. Would you expect to see Oracle use the index?

Here’s the output from the script on MY brand new database, instance, and tablespace:


  NUM_ROWS     BLOCKS AVG_ROW_LEN ROWS_PER_BLOCK
---------- ---------- ----------- --------------
   1000000      25642         180             39

1 row selected.


 COUNT(N1)
----------
    245000

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 269862921

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    10 |  6843   (1)| 00:01:23 |
|   1 |  SORT AGGREGATE              |       |     1 |    10 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |   245K|  2392K|  6843   (1)| 00:01:23 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |   245K|       |   553   (1)| 00:00:07 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID">=1 AND "ID"<=245000)


There are no tricks involved here, no cunning fiddling with data structures or parameters – this is just a simple, straightforward, test.

Of course the result is probably a little counter-intuitive; 24.5% of the data seems a lot for the optimizer to pick an index. There are many reasons for this, the first being that the data is very well clustered relative to the index – the index’s clustering_factor is the smallest it could be for a B-tree indexing every row in this table.

Another important feature, though, is that I haven’t done anything with the system statistics so the optimizer was using various default values which tell it that a multiblock read will be quite small (eight blocks) and a lot slower than a single block read (26 ms vs. 12 ms). One simple change that many people might have made during or shortly after installation (though it shouldn’t really be done in any modern version of Oracle) is to set the db_file_multiblock_read_count parameter to 128 – with just this change the optimizer would assume that a multiblock read really would be 128 blocks, but that it would now take 266 ms. That means the optimizer will assume that the read will be ten times slower than it was, but will read 32 times as much data – a fairly significant relative improvement thanks to which the access path for my initial query will switch to a full tablescan and won’t switch back to an index range scan until I reduce the range from 245,000 to something like 160,000.

I can go further, of course. With a few background queries running to exercise the database I executed the dbms_stats.gather_system_stats() procedure with the ‘start’ and ‘stop’ options to collect some figures about the hardware and expected workload. This gave me the following results,  interpreted from the sys.aux_stats$ table:


MBRC       :126
MREADTIM   :0.902
SREADTIM   :0.386
CPUSPEED   :976
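
For reference, a minimal sketch of the workload statistics gathering described above (run from a suitably privileged account while representative background work is going on):

begin
        dbms_stats.gather_system_stats('start');
end;
/

-- let the representative workload run for a while, then:

begin
        dbms_stats.gather_system_stats('stop');
end;
/

select  pname, pval1
from    sys.aux_stats$
where   sname = 'SYSSTATS_MAIN'
;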

With the optimizer using these figures to compare the relative speed and size of single and multiblock reads, I had to reduce my selected range to roughly 51,000 before the optimizer would choose the index range scan.

I could go on to demonstrate the effects of the dbms_resource_manager.calibrate_io procedure and the effects of allowing different degrees of parallelism with different system stats, but I think I’ve probably made the point that there’s a lot of variation in the break point between index range scans and tablescans EVEN when you don’t change the data. With this very well ordered (perfect clustering_factor) data I’ve seen the break point vary between 51,000 rows and 245,000 rows (5% and 25%).

And finally …

Let’s just finish with a last (and probably the most important) variation:  changing the pattern in the data we want from perfectly clustered to extremely scattered. If you check the query that generated the data you’ll see that we can do this by creating an index on column n1 instead of column id, and changing the where clause in the test query to n1 between 1 and 4500 (which, in my case, returned slightly more than 4,500 rows thanks to a small amount of duplication generated by the call to dbms_random.value()). At slightly under 0.5% of the data (and with my most recent settings for the system statistics) the optimizer chose to use a tablescan.

Remember, there are many factors involved in the optimizer choosing between a tablescan and an index range scan and one of the most significant factors in the choice is the (apparent) clustering of the data so, if you haven’t come across it before, you should examine the “table_cached_blocks” option that appeared in 11.2.0.4 for the procedure dbms_stats.set_table_prefs() as this allows you to give the optimizer a better idea of how well your data really is clustered.
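
A minimal sketch of setting that preference for the demo table (the value 16 is an arbitrary illustration; pick something that reflects how many blocks of the table are likely to be cached on your system):

begin
        dbms_stats.set_table_prefs(
                ownname => user,
                tabname => 'T1',
                pname   => 'table_cached_blocks',
                pvalue  => '16'
        );
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'T1',
                method_opt => 'for all columns size 1',
                cascade    => true
        );
end;
/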

Addendum (April 2016)

Following on from the demonstration of how changes in parameters, patterns and statistics can make a difference in what we (or the optimizer) might consider a “small” amount of data and whether an indexed access path would be appropriate, it’s worth mentioning that the Exadata technologies of smart scans and hybrid columnar compression and Oracle’s latest In-Memory Column Store technology do not change the way you think about indexes – they only change the (unspecifiable) volume at which an index ceases to be the better option to use.

 


Deadpool

Tim Hall - Tue, 2016-03-29 04:11

Yesterday I went to a late showing of Deadpool. If you haven’t seen the trailers already, don’t watch them as they give too much away!

I’ve been wanting to see it for a while, but I can’t get my cinema mojo back. The last thing I went to see was the new Star Wars film. I used to really love going to the cinema, but these days I really don’t like it. Even when I enjoy the film, I find the process rather irritating.

The opening credits were funny and totally irreverent. They really set the mood for the whole film.

Probably the hardest thing about introducing a “new character” to the audience (let’s forget the X-Men Origins: Wolverine stuff, since this film has) is you have to go through the backstory, which is typically pretty boring. Deadpool also had to do the boring stuff, but at least it does it in a more interesting way. Switching between backstory and “current time” crazy action is probably the best way to get it done.

The Deadpool character is very different to every other super hero. He breaks the fourth wall (talks to the audience), which is especially interesting when the characters around him seem confused, like they don’t understand who he is talking to. Added to this he self-references, is edgier and funnier than most super heroes. Also, the content is significantly more adult than any of the X-Men films that came before it.

I guess the bit I liked the least was how they gave him his mutant powers. I preferred the idea they used in X-Men Origins: Wolverine. I thought the method they used in this film seemed kind-of lame.

Overall, I’m glad I went to see it. I just wish I hadn’t seen the trailers as they give away a lot of the good stuff, which I think would have been more epic and funny if I hadn’t already seen it.

Honourable mentions go out to the two female mutant characters (Negasonic Teenage Warhead and Angel Dust) who were awesome. I would have liked to see more of them in the film.

Cheers

Tim…


My Demo Application

Denes Kubicek - Tue, 2016-03-29 00:34
If you experience a problem accessing My Demo Application (old and new version) at apex.oracle.com, there is a reason for that. One of the users deleted all the applications in my workspace and installed some of his own. I don't think this was intended, but it is still a lot of work and I am trying to get all the important stuff back online. After that I will need to rethink the way of accessing the workspace. It happened for the second time within the last four months.

Categories: Development

Another SQL Profile to the rescue!

Bobby Durrett's DBA Blog - Mon, 2016-03-28 18:57

We have had problems with a set of databases over the past few weeks. Our team does not support these databases, but my director asked me to help. These are 11.2.0.1 Windows 64 bit Oracle databases running on Windows 2008. The incident reports said that the systems stop working and that the main symptom was that the oracle.exe process uses all the CPU. They were bouncing the database server when they saw this behavior and it took about 30 minutes after the bounce for the CPU to go back down to normal. A Windows server colleague told me that at some point in the past a new version of virus software had apparently caused high CPU from the oracle.exe process.

At first I looked for some known bugs related to high CPU and virus checkers without much success. Then I got the idea of just checking for query performance. After all, a poorly performing query can eat up a lot of CPU. These Windows boxes only have 2 cores so it would not take many concurrently running high CPU queries to max it out. So, I got an AWR report covering the last hour of a recent incident. This was the top SQL:

Top SQL

The top query, sql id 27d8x8p6139y6, stood out as very inefficient and all CPU. It seemed clear to me from this listing that the 2 core box had a heavy load and a lot of waiting for CPU queuing. %IO was zero but %CPU was only 31%. Most likely the rest was CPU queue time.

I also looked at my sqlstat report to see which plans 27d8x8p6139y6 had used over time.

PLAN_HASH_VALUE END_INTERVAL_TIME     EXECUTIONS Elapsed ms
--------------- --------------------- ---------- -----------
     3067874494 07-MAR-16 09.00.50 PM        287  948.102286
     3067874494 07-MAR-16 10.00.03 PM        292  1021.68191
     3067874494 07-MAR-16 11.00.18 PM        244  1214.96161
     3067874494 08-MAR-16 12.00.32 AM        276  1306.16222
     3067874494 08-MAR-16 01.00.45 AM        183  1491.31307
      467860697 08-MAR-16 01.00.45 AM        125      .31948
      467860697 08-MAR-16 02.00.59 AM        285  .234073684
      467860697 08-MAR-16 03.00.12 AM        279  .214354839
      467860697 08-MAR-16 04.00.25 AM        246   .17147561
      467860697 08-MAR-16 05.00.39 AM         18        .192
     2868766721 13-MAR-16 06.00.55 PM         89    159259.9
     3067874494 13-MAR-16 06.00.55 PM          8  854.384125
     2868766721 13-MAR-16 07.00.50 PM         70  1331837.56
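
For anyone who wants to produce a similar listing, here is a rough sketch of the kind of AWR query involved. It is not the author's actual sqlstat script, and it assumes the Diagnostic Pack license that the underlying DBA_HIST views require:

select ss.plan_hash_value,
       sn.end_interval_time,
       ss.executions_delta                                       executions,
       ss.elapsed_time_delta/1000/nullif(ss.executions_delta,0)  avg_elapsed_ms
from   dba_hist_sqlstat  ss,
       dba_hist_snapshot sn
where  ss.sql_id          = '27d8x8p6139y6'
and    sn.snap_id         = ss.snap_id
and    sn.dbid            = ss.dbid
and    sn.instance_number = ss.instance_number
order by sn.end_interval_time;
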

Plan 2868766721 seemed terrible but plan 467860697 seemed great.

Our group doesn’t support these databases so I am not going to dig into how the application gathers statistics, what indexes it uses, or how the vendor designed the application. But, it seems possible that forcing the good plan with a SQL Profile could resolve this issue without having any access to the application or understanding of its design.

But, before plunging headlong into the use of a SQL Profile I looked at the plan and the SQL text.  I have edited these to hide any proprietary details:

SELECT T.*
    FROM TAB_MYTABLE1 T,
         TAB_MYTABLELNG A,
         TAB_MYTABLE1 PIR_T,
         TAB_MYTABLELNG PIR_A
   WHERE     A.MYTABLELNG_ID = T.MYTABLELNG_ID
         AND A.ASSIGNED_TO = :B1
         AND A.ACTIVE_FL = 1
         AND T.COMPLETE_FL = 0
         AND T.SHORTED_FL = 0
         AND PIR_T.MYTABLE1_ID = T.PIR_MYTABLE1_ID
         AND ((PIR_A.FLOATING_PIR_FL = 1 
               AND PIR_T.COMPLETE_FL = 1)
              OR PIR_T.QTY_PICKED IS NOT NULL)
         AND PIR_A.MYTABLELNG_ID = PIR_T.MYTABLELNG_ID
         AND PIR_A.ASSIGNED_TO IS NULL
ORDER BY T.MYTABLE1_ID

The key thing I noticed is that there was only one bind variable. The innermost part of the good plan uses an index on the column that the query equates with the bind variable. The rest of the plan is a nice nested loops plan with range and unique index scans. I see plans in this format in OLTP queries where you are looking up small numbers of rows using an index and join to related tables.

-----------------------------------------------------------------
Id | Operation                        | Name                     
-----------------------------------------------------------------
 0 | SELECT STATEMENT                 |                          
 1 |  SORT ORDER BY                   |                          
 2 |   NESTED LOOPS                   |                          
 3 |    NESTED LOOPS                  |                          
 4 |     NESTED LOOPS                 |                          
 5 |      NESTED LOOPS                |                          
 6 |       TABLE ACCESS BY INDEX ROWID| TAB_MYTABLELNG           
 7 |        INDEX RANGE SCAN          | AK_MYTABLELNG_BY_USER    
 8 |       TABLE ACCESS BY INDEX ROWID| TAB_MYTABLE1             
 9 |        INDEX RANGE SCAN          | AK_MYTABLE1_BY_MYTABLELNG
10 |      TABLE ACCESS BY INDEX ROWID | TAB_MYTABLE1             
11 |       INDEX UNIQUE SCAN          | PK_MYTABLE1              
12 |     INDEX UNIQUE SCAN            | PK_MYTABLELNG            
13 |    TABLE ACCESS BY INDEX ROWID   | TAB_MYTABLELNG           
-----------------------------------------------------------------

The bad plan had a gross Cartesian merge join:

Plan hash value: 2868766721

----------------------------------------------------------------
Id | Operation                       | Name                     
----------------------------------------------------------------
 0 | SELECT STATEMENT                |                          
 1 |  NESTED LOOPS                   |                          
 2 |   NESTED LOOPS                  |                          
 3 |    MERGE JOIN CARTESIAN         |                          
 4 |     TABLE ACCESS BY INDEX ROWID | TAB_MYTABLE1             
 5 |      INDEX FULL SCAN            | PK_MYTABLE1              
 6 |     BUFFER SORT                 |                          
 7 |      TABLE ACCESS BY INDEX ROWID| TAB_MYTABLELNG           
 8 |       INDEX RANGE SCAN          | AK_MYTABLELNG_BY_USER    
 9 |    TABLE ACCESS BY INDEX ROWID  | TAB_MYTABLE1             
10 |     INDEX RANGE SCAN            | AK_MYTABLE1_BY_MYTABLELNG
11 |   TABLE ACCESS BY INDEX ROWID   | TAB_MYTABLELNG           
12 |    INDEX RANGE SCAN             | AK_MYTABLELNG_BY_USER    
----------------------------------------------------------------

Reviewing the SQL made me believe that there was a good chance that a SQL Profile forcing the good plan would resolve the issue. Sure, there could be some weird combination of data and bind variable values that make the bad plan the better one. But, given that this was a simple transactional application it seems most likely that the straightforward nested loops with index on the only bind variable plan would be best.

We used the SQL Profile to force these plans on four servers and so far the SQL Profile has resolved the issues. I’m not saying that forcing a plan using a SQL Profile is the only or even best way to resolve query performance issues. But, this was a good example of where a SQL Profile makes sense. If modifying the application, statistics, parameters, and schema is not possible then a SQL Profile can come to your rescue in a heartbeat.
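
For readers who want to try the same approach, one common way to pin a known plan onto a sql_id is the coe_xfr_sql_profile.sql script that ships with Oracle's SQLT/SQLTXPLAIN utilities. This is an assumption about tooling, not necessarily how the author created the profile, and the generated file name can vary slightly between versions:

-- Run as a DBA in SQL*Plus, passing the sql_id and the plan_hash_value to force.
@coe_xfr_sql_profile.sql 27d8x8p6139y6 467860697

-- The script writes a second script named after the sql_id and plan hash;
-- running that generated script is what actually creates the SQL Profile.
@coe_xfr_sql_profile_27d8x8p6139y6_467860697.sql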

Bobby

Categories: DBA Blogs

printing from apex using pl/sql

Pat Shuff - Mon, 2016-03-28 15:55
As part of our exercise to convert an excel spreadsheet to APEX, we ran into some simple logic that required more logic and fewer database select statements. The question comes up of how to do an if-then-else statement in APEX. In this example we are trying to advise between using compute on demand or dedicated compute resources. With dedicated compute resources, we can purchase 50 OCPUs as an aggregate. With compute on demand we can go as low as 1 OCPU. If we request 14 systems with 4 OCPUs it might be better to request 50 OCPUs.

A typical question form would look like the following image, allowing us to ask for the processor shape as well as the quantity. If the total amount exceeds 50 processors, we output a message suggesting dedicated compute rather than compute on demand.

To get this message on the screen, we first had to pull in the questions that we ask using variables. In this example, we read in the UNMETERED_COMPUTE_SHAPE which is a pick list that allows you to select (1, 2, 4, 8, or 16) OCPU shapes. You can also type in a quantity number into UNMETERED_COMPUTE_QUANTITY. The product of these two values allows us to suggest dedicated or compute on demand for economic reasons.

To execute pl/sql commands, we have to change the content type. To create this area we first create a sub-region. We change the name of the sub-region to represent the question that we are trying to answer. For this example we use the title "Compute on Demand vs Dedicated Compute" as the sub-region header. We then change the type to "PL/SQL Dynamic Content". Under this we can then enter our dynamic code.

If you click on the expand button it opens a full editor allowing you to edit the code. In our example we are going to read the variables :UNMETERED_COMPUTE_SHAPE and :UNMETERED_COMPUTE_QUANTITY. Notice the colon in front of these names. This is how we treat the values as variables read from APEX. The code is very simple. It starts with a begin statement followed by an if statement. The if statement looks to see if we are allocating more than 50 processors. We then output a statement suggesting dedicated or compute on demand using the htp.p function call. This call prints what is passed to it to the screen. The complete code is shown at the end of this post.

Overall, this is a simple way of producing output that requires control flow. In the previous example we used a select statement to output calculations. In this example we are outputting different sections and different recommendations based on our selections. We could also set variables that would expose or hide different sub-regions below this section. This is done by setting :OUTPUT_VARIABLE to the desired value. If we set the value inside the pl/sql code, we can hide or expose sections as we did in a previous blog by setting a value from a pull-down menu.
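
As a rough sketch of that idea, the block below writes a value back to session state that other regions could test in a server-side condition. The item name OUTPUT_VARIABLE and its values are hypothetical, and apex_util.set_session_state is just one way to do this:

BEGIN
if (:UNMETERED_COMPUTE_SHAPE * :UNMETERED_COMPUTE_QUANTITY > 50) THEN
    apex_util.set_session_state('OUTPUT_VARIABLE', 'DEDICATED');
else
    apex_util.set_session_state('OUTPUT_VARIABLE', 'METERED');
end if;
END;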

The code used to output the recommendation is as follows

BEGIN
if (:UNMETERED_COMPUTE_SHAPE * :UNMETERED_COMPUTE_QUANTITY > 50) THEN
    htp.p('You might consider dedicated compute since you have ' 
     || :UNMETERED_COMPUTE_SHAPE * :UNMETERED_COMPUTE_QUANTITY 
     || ' OCPUs which is greater than the smallest dedicated compute of 50 OCPUs');
else
    htp.p('Compute on Demand for a total of ' 
     || :UNMETERED_COMPUTE_SHAPE * :UNMETERED_COMPUTE_QUANTITY || ' OCPUs');
end if;
END;

JRE 1.8.0.77 Certified with Oracle EBS 12.x

Steven Chan - Mon, 2016-03-28 13:46

Java logo

Java Runtime Environment 1.8.0_77 (a.k.a. JRE 8u77-b3) and later updates on the JRE 8 codeline are now certified with Oracle E-Business Suite 12.1 and 12.2 for Windows desktop clients.

This certification addresses Oracle Security Alert for CVE-2016-0636.

All JRE 6, 7, and 8 releases are certified with EBS upon release

Our standard policy is that all E-Business Suite customers can apply all JRE updates to end-user desktops:

  • From JRE 1.6.0_03 and later updates on the JRE 6 codeline
  • From JRE 1.7.0_10 and later updates on the JRE 7 codeline 
  • From JRE 1.8.0_25 and later updates on the JRE 8 codeline
We test all new JRE releases in parallel with the JRE development process, so all new JRE releases are considered certified with the E-Business Suite on the same day that they're released by our Java team. 

You do not need to wait for a certification announcement before applying new JRE 6, 7, or 8 releases to your EBS users' desktops.

32-bit and 64-bit versions certified

This certification includes both the 32-bit and 64-bit JRE versions for various Windows operating systems. See the respective Recommended Browser documentation for your EBS release for details.

Where are the official patch requirements documented?

All patches required for ensuring full compatibility of the E-Business Suite with JRE 8 are documented in these Notes:

For EBS 12.1 & 12.2

EBS + Discoverer 11g Users

This JRE release is certified for Discoverer 11g in E-Business Suite environments with the following minimum requirements:

Implications of Java 6 and 7 End of Public Updates for EBS Users

The Oracle Java SE Support Roadmap and Oracle Lifetime Support Policy for Oracle Fusion Middleware documents explain the dates and policies governing Oracle's Java Support.  The client-side Java technology (Java Runtime Environment / JRE) is now referred to as Java SE Deployment Technology in these documents.

Starting with Java 7, Extended Support is not available for Java SE Deployment Technology.  It is more important than ever for you to stay current with new JRE versions.

If you are currently running JRE 6 on your EBS desktops:

  • You can continue to do so until the end of Java SE 6 Deployment Technology Extended Support in June 2017
  • You can obtain JRE 6 updates from My Oracle Support.  See:

If you are currently running JRE 7 on your EBS desktops:

  • You can continue to do so until the end of Java SE 7 Deployment Technology Premier Support in July 2016
  • You can obtain JRE 7 updates from My Oracle Support.  See:

If you are currently running JRE 8 on your EBS desktops:

Will EBS users be forced to upgrade to JRE 8 for Windows desktop clients?

No.

This upgrade is highly recommended but remains optional while Java 6 and 7 are covered by Extended Support. Updates will be delivered via My Oracle Support, where you can continue to receive critical bug fixes and security fixes as well as general maintenance for JRE 6 and 7 desktop clients. Note that there are different impacts of enabling JRE Auto-Update depending on your current JRE release installed, despite the availability of ongoing support for JRE 6 and 7 for EBS customers; see the next section below.

Impact of enabling JRE Auto-Update

Java Auto-Update is a feature that keeps desktops up-to-date with the latest Java release.  The Java Auto-Update feature connects to java.com at a scheduled time and checks to see if there is an update available.

Enabling the JRE Auto-Update feature on desktops with JRE 6 installed will have no effect.

With the release of the January Critical patch Updates, the Java Auto-Update Mechanism will automatically update JRE 7 plug-ins to JRE 8.

Enabling the JRE Auto-Update feature on desktops with JRE 8 installed will apply JRE 8 updates.

Coexistence of multiple JRE releases Windows desktops

The upgrade to JRE 8 is recommended for EBS users, but some users may need to run older versions of JRE 6 or 7 on their Windows desktops for reasons unrelated to the E-Business Suite.

Most EBS configurations with IE and Firefox use non-static versioning by default. JRE 8 will be invoked instead of earlier JRE releases if both are installed on a Windows desktop. For more details, see "Appendix B: Static vs. Non-static Versioning and Set Up Options" in Notes 290807.1 and 393931.1.

What do Mac users need?

JRE 8 is certified for Mac OS X 10.8 (Mountain Lion), 10.9 (Mavericks), and 10.10 (Yosemite) desktops.  For details, see:

Will EBS users be forced to upgrade to JDK 8 for EBS application tier servers?

No.

JRE is used for desktop clients.  JDK is used for application tier servers.

JRE 8 desktop clients can connect to EBS environments running JDK 6 or 7.

JDK 8 is not certified with the E-Business Suite.  EBS customers should continue to run EBS servers on JDK 6 or 7.

Known Issues

Internet Explorer Performance Issue

Launching JRE 1.8.0_73 through Internet Explorer results in a delay of around 20 seconds before the applet starts to load (the Java Console will come up if enabled).

This issue is fixed in JRE 1.8.0_74.  Internet Explorer users are recommended to move up to this version of JRE 8.

Form Focus Issue

Clicking outside the frame during forms launch may cause a loss of focus when running with JRE 8 and can occur in all Oracle E-Business Suite releases. To fix this issue, apply the following patch:

References

Related Articles
Categories: APPS Blogs

JRE 1.7.0_99 Certified with Oracle E-Business Suite 12.x

Steven Chan - Mon, 2016-03-28 13:40

Java logo

Java Runtime Environment 1.7.0_99 (a.k.a. JRE 7u99-b4) and later updates on the JRE 7 codeline are now certified with Oracle E-Business Suite Release 12.x for Windows-based desktop clients.

This certification addresses Oracle Security Alert for CVE-2016-0636.

All JRE 6, 7, and 8 releases are certified with EBS upon release

Our standard policy is that all E-Business Suite customers can apply all JRE updates to end-user desktops:

  • From JRE 1.6.0_03 and later updates on the JRE 6 codeline
  • From JRE 1.7.0_10 and later updates on the JRE 7 codeline 
  • From JRE 1.8.0_25 and later updates on the JRE 8 codeline
We test all new JRE releases in parallel with the JRE development process, so all new JRE releases are considered certified with the E-Business Suite on the same day that they're released by our Java team. 

You do not need to wait for a certification announcement before applying new JRE 6, 7, or 8 releases to your EBS users' desktops.

Effects of new support dates on Java upgrades for EBS environments

Support dates for the E-Business Suite and Java have changed.  Please review the sections below for more details:

  • What does this mean for Oracle E-Business Suite users?
  • Will EBS users be forced to upgrade to JRE 7 for Windows desktop clients?
  • Will EBS users be forced to upgrade to JDK 7 for EBS application tier servers?

32-bit and 64-bit versions certified

This certification includes both the 32-bit and 64-bit JRE versions for various Windows operating systems. See the respective Recommended Browser documentation for your EBS release for details.

Where are the official patch requirements documented?

EBS + Discoverer 11g Users

This JRE release is certified for Discoverer 11g in E-Business Suite environments with the following minimum requirements:

JRE 7 End of Public Updates

The JRE 7u79 release was the last JRE 7 update available to the general public.  Java is an integral part of the Oracle E-Business Suite technology stack, so EBS users will continue to receive Java SE 7 updates until the end of Java SE 7 Premier Support at the end of July 2016.

How can EBS customers obtain Java 7 updates after the public end-of-life?

EBS customers can download Java 7 patches from My Oracle Support.  For a complete list of all Java SE patch numbers, see:

Both JDK and JRE packages are now contained in a single combined download.  Download the "JDK" package for both the desktop client JRE and the server-side JDK package. 

Coexistence of multiple JRE releases on Windows desktops

The upgrade to JRE 8 is recommended for EBS users, but some users may need to run older versions of JRE 6 or 7 on their Windows desktops for reasons unrelated to the E-Business Suite.

Most EBS configurations with IE and Firefox use non-static versioning by default. JRE 8 will be invoked instead of earlier JRE releases if both are installed on a Windows desktop. For more details, see "Appendix B: Static vs. Non-static Versioning and Set Up Options" in Notes 290807.1 and 393931.1.

Java Auto-Update Mechanism

With the release of the January 2015 Critical Patch Updates, the Java Auto-Update Mechanism will automatically update JRE 7 plug-ins to JRE 8.

What do Mac users need?

Mac users running Mac OS X 10.7 (Lion), 10.8 (Mountain Lion), 10.9 (Mavericks), and 10.10 (Yosemite) can run JRE 7 or 8 plug-ins.  See:

Will EBS users be forced to upgrade to JDK 7 for EBS application tier servers?

JRE is used for desktop clients.  JDK is used for application tier servers.

JDK upgrades for E-Business Suite application tier servers are highly recommended but currently remain optional while Java 6 is covered by Extended Support. Updates will be delivered via My Oracle Support, where you can continue to receive critical bug fixes and security fixes as well as general maintenance for JDK 6 for application tier servers. 

Java SE 6 is covered by Extended Support until June 2017.  All EBS customers with application tier servers on Windows, Solaris, and Linux must upgrade to JDK 7 by June 2017. EBS customers running their application tier servers on other operating systems should check with their respective vendors for the support dates for those platforms.

JDK 7 is certified with E-Business Suite 12.  See:

Known Issues

When using Internet Explorer, JRE 1.7.0_01 had a delay of around 20 seconds before the applet started to load. This issue is fixed in JRE 1.7.0_95.

Categories: APPS Blogs

Adrift in a Data Lake – an Oracle Developer’s first steps in Hadoop

The Anti-Kyte - Mon, 2016-03-28 11:49

We live in interesting times. As I write, the political life of a great nation is currently in thrall to a wealthy right-wing populist with a rather distinctive hairstyle.
But enough about Boris Johnson.

For someone used to the way things are done in Oracle, Hadoop can be something of a culture shock.
My initial introduction to Hadoop and its “vibrant ecosystem” was some internally written documentation.
As with most technical documentation, it was written with the assumption of a certain level of prior knowledge on the part of the reader. For this particular reader, it proved to be an erroneous assumption.

After half an hour of struggling through this text, I was left wondering what you would use a Khafka Banana Oozie for.

Maybe it’s for killing zombie processes ? Let’s face it, with all that Java running on your system there are bound to be a few knocking around.
I’m a little hazy on my Zombie Lore, so I’m not entirely clear as to why a rapid application of fresh fruit would pose an existential threat to your average zombie. Maybe they’re potassium intolerant ?

There are a bewildering array of tools associated with Hadoop, many of which appear to provide almost identical functionality.
For example, a relational database traditionally requires only one SQL Engine…and I’ll be taking a look at two of them.

Moving from Oracle to Hadoop can feel rather like trading your car in for a box of Lego.
If the box does contain any instructions they seem to have been translated from Java into English…by someone who doesn’t speak either.
Note to reader : please substitute your own language as appropriate.

Fortunately, there are Hadoop distributions available which bundle the core tools required to get up and running. We'll gloss over the fact that these distributions – Cloudera and Hortonworks – don't contain the exact same tools.

In my search to find a coherent explanation of how to use Hadoop, I’ve found that the VM provided by Cloudera, together with the introductory tutorial, is a valuable aid to familiarising myself with the basics.

To start with then, I’ll cover getting hold of the Cloudera VM and setting it up in VirtualBox.
Then I’ll go through some of the tools available and what they do.
I’ll do this from the perspective of an Oracle developer (mainly because I don’t have much option) and will point out the driftwood of familiarity that you might be able to cling to in order to stay afloat in your Data Lake.

What I’ll cover is :

  • The core components of Hadoop
  • HDFS commands
  • Transferring data between a relational database and hadoop using SQOOP
  • Querying structured data using Hive and Impala
  • Uploading data using Kite

Ready to Dive in ?

Cloudera VM setup for Virtualbox

Before we head off to get the VM, there are a couple of points worth mentioning.

First of all, the VM is running on a 64-bit version of Centos 6.4. Therefore you need to make sure that your host operating system ( i.e. the one that you’ll be running Virtual Box on) is also 64-bit.

Secondly, the minimum recommended memory allocation for the VM is 4GB.

Finally, I'm using the Cloudera Quickstart 5.5 VM. The behaviour of some of the tools covered here differs between 5.4 and 5.5.
The Cloudera VM can be downloaded from here.

You’ll need to fill in some details about yourself before moving on.

Once you’re at the downloads page, select VirtualBox from the Platform drop-down list.
VMs for other platforms are also available.

Once you hit the “Download Now” button you’ll be asked for some further details. However, these do not appear to be validated.

The download is quite chunky, just over 5GB.

Once it’s completed you should have a zip file called :

cloudera-quickstart-vm-5.5.0-0-virtualbox.zip

You can use a standard unzip utility to extract this file ( e.g. Winzip on Windows, plain old zip on Linux).
Once extracted, you’ll see a directory called cloudera-quickstart-vm-5.5.0-0-virtualbox which contains two files :

cloudera-quickstart-vm-5.5.0-0-virtualbox-disk1.vmdk
cloudera-quickstart-vm-5.5.0-0-virtualbox.ovf

The .ovf file is the VirtualBox image and the .vmdk is the VM’s hard-drive image.

To set up the VM in Virtualbox, you’ll need to import the .ovf file as an Appliance.
The steps to do this are the same as those for importing an Oracle Developer Day 12c Image.
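
If you'd rather not click through the VirtualBox GUI, the import can also be scripted. This is a minimal sketch, assuming the zip has been downloaded to the current directory; importing via the GUI achieves exactly the same result :

# extract the appliance files
unzip cloudera-quickstart-vm-5.5.0-0-virtualbox.zip
cd cloudera-quickstart-vm-5.5.0-0-virtualbox

# import the .ovf into VirtualBox as a new appliance
VBoxManage import cloudera-quickstart-vm-5.5.0-0-virtualbox.ovf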

When you first start the VM it does take a while to come up. When it does, you should see

cloudera_vm_start

Tweaking the VM settings

If you don’t happen to live in Boston and/or you don’t have a US keyboard, you may want to make a couple of configuration changes to the VM.

To change the Location ( and consequently the Time Zone) :

  1. Click on the Date and Time on the Status Bar. It’s in the top right-hand corner of the screen.
  2. Hit the Edit button next to Locations
  3. Add your location by typing it in the Location Name dialog box (a drop-down list will appear as you start typing).
  4. Now highlight Boston and click the Remove button. Your new location should now show at the bottom of the Time window
  5. Move your mouse over your location and an Edit button should appear. Click this and the Date and Time should now reflect your new location

set_location

You will be prompted for a password to confirm this change. You need to enter cloudera

To change the keyboard layout :

  1. Go to the System Menu and select Preferences and then Keyboard
  2. Navigate to the Layouts tab and click Add
  3. Select your preferred layout from the drop-down list
  4. Once your new layout is shown, click the Default radio button next to it

set_keyboard

These changes should take effect without the need to re-start the VM.

OK, so let’s get started….

The Core components of Hadoop

Hadoop itself comprises three main components :

  • HDFS – a distributed file system
  • The MapReduce framework
  • Yarn – a Job scheduler / resource manager for the parallel execution of MapReduce jobs
MapReduce

The MapReduce framework consists of a Mapper and a Reducer.
In SQL terms, the Mapper program applies a query predicate against the data you are querying – essentially, it does the filtering.
The Reducer then does any aggregation on the result set produced from the Mapper process.
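
To make the filter/aggregate split a bit more concrete, here's a minimal sketch using Hadoop Streaming, which lets you plug ordinary shell commands in as the Mapper and Reducer. The jar location is an assumption based on the Cloudera VM layout, the input path points at the tools.csv file uploaded later in this post, and the output directory must not already exist :

# Mapper = the "predicate" : keep only the lines containing 'engine'
# Reducer = the "aggregation" : count the lines that survive
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -input /user/cloudera/test/tools.csv \
    -output /user/cloudera/test/engine_count \
    -mapper "sed -n '/engine/p'" \
    -reducer "wc -l"

# the count ends up in a part file under the output directory
hadoop fs -cat /user/cloudera/test/engine_count/part-*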

Yarn

Yarn is the default job scheduler and resource manager. It facilitates parallel execution of MapReduce jobs.

HDFS

HDFS – Hadoop File System – is a distributed file system. The idea is that datafiles are replicated across multiple nodes (physical servers) in a cluster. Essentially, any program can run on any node. By replicating the data to each node, network latency is minimised for these programs.
It comes with its own set of commands which you can use interactively.
These appear to be largely a subset of those you’d find on a Linux OS.

The format of these commands is :

hadoop fs -command [some arguments]

However, given the way that the VM is configured, we need to run these commands as the hdfs user.
Therefore, in the examples that follow, the commands will take the format :

sudo -u hdfs hadoop fs -command [some arguments]

To start with, let’s see what we’d need to do to upload a csv file to HDFS.

The file in question is called tools.csv, which I’ve created in the cloudera user’s home directory on the vm.
It contains the following :

tool,description
hue,Web-based UI for Hadoop
sqoop,Transfer structured data between an RDBMS and Hadoop
flume,stream a file into Hadoop
impala,a query engine
hive,another query engine
spark,a query engine that is not hive or impala
khafka,a scheduler
banana,a web UI framework
oozie,another scheduler

To start with, let’s see what’s currently in hdfs :

sudo -u hdfs hadoop fs -ls /

Found 5 items
drwxrwxrwx   - hdfs  supergroup          0 2015-11-18 10:57 /benchmarks
drwxr-xr-x   - hbase supergroup          0 2016-03-23 12:08 /hbase
drwxrwxrwt   - hdfs  supergroup          0 2016-03-22 12:23 /tmp
drwxr-xr-x   - hdfs  supergroup          0 2015-11-18 11:01 /user
drwxr-xr-x   - hdfs  supergroup          0 2015-11-18 11:00 /var

This shows us the top-level directories. Let’s take a look at what’s in /user :

sudo -u hdfs hadoop fs -ls /user

Found 8 items
drwxr-xr-x   - cloudera cloudera            0 2015-11-18 10:56 /user/cloudera
drwxr-xr-x   - mapred   hadoop              0 2015-11-18 10:57 /user/history
drwxrwxrwx   - hive     supergroup          0 2015-11-18 11:01 /user/hive
drwxrwxrwx   - hue      supergroup          0 2015-11-18 10:58 /user/hue
drwxrwxrwx   - jenkins  supergroup          0 2015-11-18 10:58 /user/jenkins
drwxrwxrwx   - oozie    supergroup          0 2015-11-18 10:59 /user/oozie
drwxrwxrwx   - root     supergroup          0 2015-11-18 10:58 /user/root
drwxr-xr-x   - hdfs     supergroup          0 2015-11-18 11:01 /user/spark

For the purposes of this test, I’ll create a directory under the /user/cloudera directory, and then check that it’s been created as expected :

sudo -u hdfs hadoop fs -mkdir /user/cloudera/test
sudo -u hdfs hadoop fs -ls /user/cloudera
Found 1 items
drwxr-xr-x   - hdfs cloudera          0 2016-03-23 14:25 /user/cloudera/test

Notice that only the directory owner has write permissions on the directory.
As I’m feeling reckless, I want to grant write permissions to everyone.
This can be done as follows :

sudo -u hdfs hadoop fs -chmod a+w /user/cloudera/test
sudo -u hdfs hadoop fs -ls /user/cloudera
Found 1 items
drwxrwxrwx   - hdfs cloudera          0 2016-03-23 14:25 /user/cloudera/test

In HDFS, the chmod command seems to accept the same arguments as its Linux counterpart.
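
The recursive variants also behave as you'd expect from Linux. For example (the directory and ownership values here are just illustrations) :

# recursively change permissions and ownership under a directory
sudo -u hdfs hadoop fs -chmod -R 775 /user/cloudera/test
sudo -u hdfs hadoop fs -chown -R cloudera:cloudera /user/cloudera/test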

To check that we can now see the directory :

sudo -u hdfs hadoop fs -ls /user/cloudera/test

The simplest way to load our csv is to use the put command :

sudo -u hdfs hadoop fs -put tools.csv /user/cloudera/test/put_tools.csv
sudo -u hdfs hadoop fs -ls /user/cloudera/test
Found 1 items
-rw-r--r--   1 hdfs cloudera        301 2016-03-23 14:49 /user/cloudera/test/put_tools.csv

There is another way to do this :

sudo -u hdfs hadoop fs -copyFromLocal tools.csv /user/cloudera/test/tools.csv
sudo -u hdfs hadoop fs -ls /user/cloudera/test
Found 2 items
-rw-r--r--   1 hdfs cloudera        301 2016-03-23 14:49 /user/cloudera/test/put_tools.csv
-rw-r--r--   1 hdfs cloudera        301 2016-03-23 17:04 /user/cloudera/test/tools.csv

If we want to delete a file then :

sudo -u hdfs hadoop fs -rm /user/cloudera/test/put_tools.csv
16/03/23 17:06:51 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/cloudera/test/put_tools.csv
sudo -u hdfs hadoop fs -ls /user/cloudera/test
Found 1 items
-rw-r--r--   1 hdfs cloudera        301 2016-03-23 17:04 /user/cloudera/test/tools.csv

You can display the contents of a file in hdfs by using the cat command :

sudo -u hdfs hadoop fs -cat /user/cloudera/test/tools.csv
tool,description
hue,Web-based UI for Hadoop
sqoop,Transfer structured data between an RDBMS and Hadoop
flume,stream a file into Hadoop
impala,a query engine
hive,another query engine
spark,a query engine that is not hive or impala
khafka,a scheduler
banana,a web UI framework
oozie,another scheduler

In order to demonstrate copying a file from hdfs to the local filesystem in the VM, we’ll need to create a directory that the hdfs user has access to :

mkdir test
chmod a+rw test
cd test
ls -ld
drwxrwxrwx 2 cloudera cloudera 4096 Mar 23 17:13 .

Now, as the hdfs user, we can retrieve our file from hdfs onto the local file system using copyToLocal :

sudo -u hdfs hadoop fs -copyToLocal /user/cloudera/test/tools.csv /home/cloudera/test/welcome_back.csv
ls -l welcome_back.csv
-rw-r--r-- 1 hdfs hdfs 301 Mar 23 17:18 welcome_back.csv

Another method of doing this is using get :

sudo -u hdfs hadoop fs -get /user/cloudera/test/tools.csv /home/cloudera/test/got_it.csv
ls -l got_it.csv
-rw-r--r-- 1 hdfs hdfs 301 Mar 23 17:21 got_it.csv

One final hdfs command that may come in useful is du, which shows the amount of space used by a directory or file on hdfs :

sudo -u hdfs hadoop fs -du /user/cloudera
301  301  /user/cloudera/test
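
As with its Linux namesake, du also accepts -s (summarise) and -h (human readable) flags :

# total space used under /user, in human readable units
sudo -u hdfs hadoop fs -du -s -h /user
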
SQOOP

The VM comes with a MySQL database from which data is loaded into Hadoop via SQOOP.
SQOOP is a tool for transferring structured data between an RDBMS and Hadoop.

The documentation does say that SQOOP is capable of loading data into Oracle using the command (from the Local File System) :

sqoop import --connect jdbc:oracle:thin:@//db_name --table table_name

However, said documentation says that it’s been tested with Oracle 10.2 Express Edition, so you may want to have a play around with it before using it in anger.
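
For what it's worth, a fuller invocation against an Oracle schema might look something like the sketch below. The host, service name, credentials and table are all placeholders, and you'd also need the Oracle JDBC driver (the ojdbc jar) on SQOOP's classpath :

sqoop import \
    --connect jdbc:oracle:thin:@//dbhost:1521/orcl \
    --username scott \
    --password tiger \
    --table EMP \
    --target-dir /user/cloudera/test/emp \
    -m 1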

The tutorial directs us to use SQOOP to ingest all of the data from the MySQL database by running the following command :

sqoop import-all-tables \
    -m 1 \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-parquetfile \
    --warehouse-dir=/user/hive/warehouse \
    --hive-import

There’s a fair amount going on here, we’re connecting to MySQL, then outputting the data as a compressed file onto hdfs in the /user/hive/warehouse directory.
The compression library being used is Snappy.

It’s instructive to see the output when we run this command as it shows both MapReduce and Yarn in action. You’ll probably see lines like :

...
16/03/23 17:36:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1458734644938_0001
16/03/23 17:36:08 INFO impl.YarnClientImpl: Submitted application application_1458734644938_0001
16/03/23 17:36:08 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1458734644938_0001/
16/03/23 17:36:08 INFO mapreduce.Job: Running job: job_1458734644938_0001
16/03/23 17:37:28 INFO mapreduce.Job: Job job_1458734644938_0001 running in uber mode : false
16/03/23 17:37:28 INFO mapreduce.Job:  map 0% reduce 0%
16/03/23 17:38:29 INFO mapreduce.Job:  map 100% reduce 0%
16/03/23 17:38:34 INFO mapreduce.Job: Job job_1458734644938_0001 completed successfully
...

After a fair amount of time, the command should end with :

...
16/03/23 17:53:53 INFO mapreduce.ImportJobBase: Transferred 46.1318 KB in 157.9222 seconds (299.1283 bytes/sec)
16/03/23 17:53:53 INFO mapreduce.ImportJobBase: Retrieved 1345 records.

If we now check, we can see that a directory has been created for each table :

hadoop fs -ls /user/hive/warehouse
Found 6 items
drwxrwxrwx   - cloudera supergroup          0 2016-03-23 17:38 /user/hive/warehouse/categories
drwxrwxrwx   - cloudera supergroup          0 2016-03-23 17:41 /user/hive/warehouse/customers
drwxrwxrwx   - cloudera supergroup          0 2016-03-23 17:44 /user/hive/warehouse/departments
drwxrwxrwx   - cloudera supergroup          0 2016-03-23 17:48 /user/hive/warehouse/order_items
drwxrwxrwx   - cloudera supergroup          0 2016-03-23 17:51 /user/hive/warehouse/orders
drwxrwxrwx   - cloudera supergroup          0 2016-03-23 17:53 /user/hive/warehouse/products

Digging further into the categories directory and its children, we find that the table structure has also been transferred across :

hadoop fs -cat /user/hive/warehouse/categories/.metadata/schemas/1.avsc
{
  "type" : "record",
  "name" : "categories",
  "doc" : "Sqoop import of categories",
  "fields" : [ {
    "name" : "category_id",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "category_id",
    "sqlType" : "4"
  }, {
    "name" : "category_department_id",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "category_department_id",
    "sqlType" : "4"
  }, {
    "name" : "category_name",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "category_name",
    "sqlType" : "12"
  } ],
  "tableName" : "categories"
}

The data is stored in a .parquet file :

hadoop fs -ls /user/hive/warehouse/categories
Found 3 items
drwxr-xr-x   - cloudera supergroup          0 2016-03-23 17:35 /user/hive/warehouse/categories/.metadata
drwxr-xr-x   - cloudera supergroup          0 2016-03-23 17:38 /user/hive/warehouse/categories/.signals
-rw-r--r--   1 cloudera supergroup       1956 2016-03-23 17:38 /user/hive/warehouse/categories/a3db2e78-2861-4906-a769-e8035c03d7d2.parquet

There are a number of file formats you can use with Hadoop, each with their own pros and cons.
You can see a discussion of some of these formats here.

NOTE – the first time I ran this, I shut down the VM after I got to here. When I started it again, Impala (see below) refused to see any databases. I had to trash and re-create the VM to get it to work. I'm not sure why this happened (I did have a hunt around) but just thought I'd mention it in case you're thinking of doing the same.

Anyway, now we have data in Hadoop, it would be good if we could interrogate it…

Not Quite SQL – Hive and Impala

Cloudera comes with a web-based UI for Hadoop in the form of Hue.
Note that Hortonworks seem to be standardizing on a different UI tool – Apache Ambari.

In the Cloudera VM, there is a link to Hue on the Bookmarks bar in Firefox.
Click on this link and then connect as cloudera (password cloudera).

The favoured query engine for interactive queries in Cloudera is called Impala.
I believe that, at the time of writing, Hortonworks are sticking with Hive.

As far as I can tell, it seems that Hive has been retained by Cloudera to handle what, in RDBMS terms, would be called the catalog.
In Oracle terms this would be the Data Dictionary.

This is essentially the metadata for the tables in the database.
This metadata seems to be read by all of the SQL Engines irrespective of where it is updated from.
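
You can see this shared catalog from the command line as well as from Hue. A quick sketch, run from a terminal on the VM – the same table list should come back from both engines once Impala has been told to re-read the metadata (more on that in a moment) :

# Hive reads the metastore directly
hive -e "show tables;"

# Impala shares the same metastore, but needs a nudge after DDL done elsewhere
impala-shell -q "invalidate metadata"
impala-shell -q "show tables"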

Once you’re connected via Hue you can select either of these tools from the Query Editors drop-down.

Initially, the tutorial directs you to Impala.

The Data Dictionary

The first thing to note about Impala is that it doesn't bother re-reading the table metadata unless you tell it to. Therefore, if you make any DML or DDL changes, you will probably need to tell Impala to refresh its view of the metadata before query results pick up those changes.
For example, we know that we’ve got six tables in our database which we’ve created via SQOOP. However, if you ask Impala about it :

show tables;

…you get the not entirely helpful :

The operation has no results

By contrast, if you try this in Hive (Query Editors/Hive), the tables are all present and correct.

To persuade Impala to see these changes you need to run :

invalidate metadata;

Note that you can also run this command for individual tables should the need arise, e.g. :

invalidate metadata categories;

Anyway, now we can see the tables in Impala, we can run some queries against them.
Whilst we’re at it, we can do a simple comparison between Impala and Hive in terms of how they process the same query.

Comparative performance

The query in question (taken from the Getting Started Tutorial provided with the VM) is :

-- Most popular product categories
select c.category_name, count(order_item_quantity) as count
from order_items oi
inner join products p on oi.order_item_product_id = p.product_id
inner join categories c on c.category_id = p.product_category_id
group by c.category_name
order by count desc
limit 10;

The syntax looks reassuringly familiar.

In Impala this ran in around 35 seconds.
I then ran the same query in Hive, which took about 7 minutes.
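
Incidentally, you can reproduce the comparison outside Hue, as both engines ship with a command-line client that will run a script file. A rough sketch, assuming the query above has been saved as top_categories.sql :

# wall-clock the same query in each engine
time impala-shell -f top_categories.sql
time hive -f top_categories.sql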

Once the query is run, Hue shows an Explain button. Click on this and you can see the execution plan for the query.

In Hive the plan looks like this :

STAGE DEPENDENCIES:
  Stage-9 is a root stage
  Stage-3 depends on stages: Stage-9
  Stage-4 depends on stages: Stage-3
  Stage-0 depends on stages: Stage-4

STAGE PLANS:
  Stage: Stage-9
    Map Reduce Local Work
      Alias -> Map Local Tables:
        c
          Fetch Operator
            limit: -1
        p
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        c
          TableScan
            alias: c
            Statistics: Num rows: 24 Data size: 2550 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: category_id is not null (type: boolean)
              Statistics: Num rows: 12 Data size: 1275 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col10 (type: int)
                  1 category_id (type: int)
        p
          TableScan
            alias: p
            Statistics: Num rows: 5737 Data size: 45896 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (product_id is not null and product_category_id is not null) (type: boolean)
              Statistics: Num rows: 1435 Data size: 11479 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 order_item_product_id (type: int)
                  1 product_id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: oi
            Statistics: Num rows: 206028 Data size: 1648231 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: order_item_product_id is not null (type: boolean)
              Statistics: Num rows: 103014 Data size: 824115 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 order_item_product_id (type: int)
                  1 product_id (type: int)
                outputColumnNames: _col3, _col10
                Statistics: Num rows: 113315 Data size: 906526 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col10 (type: int)
                    1 category_id (type: int)
                  outputColumnNames: _col3, _col20
                  Statistics: Num rows: 124646 Data size: 997178 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: _col20 (type: string), _col3 (type: int)
                    outputColumnNames: _col20, _col3
                    Statistics: Num rows: 124646 Data size: 997178 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: count(_col3)
                      keys: _col20 (type: string)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 124646 Data size: 997178 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 124646 Data size: 997178 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
      Local Work:
        Map Reduce Local Work
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 62323 Data size: 498589 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-4
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col1 (type: bigint)
              sort order: -
              Statistics: Num rows: 62323 Data size: 498589 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col0 (type: string)
      Reduce Operator Tree:
        Select Operator
          expressions: VALUE._col0 (type: string), KEY.reducesinkkey0 (type: bigint)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 62323 Data size: 498589 Basic stats: COMPLETE Column stats: NONE
          Limit
            Number of rows: 10
            Statistics: Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: 10
      Processor Tree:
        ListSink

The Impala plan looks a bit different :

Estimated Per-Host Requirements: Memory=4.16GB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
default.categories, default.order_items, default.products

11:MERGING-EXCHANGE [UNPARTITIONED]
|  order by: count(order_item_quantity) DESC
|  limit: 10
|
06:TOP-N [LIMIT=10]
|  order by: count(order_item_quantity) DESC
|
10:AGGREGATE [FINALIZE]
|  output: count:merge(order_item_quantity)
|  group by: c.category_name
|
09:EXCHANGE [HASH(c.category_name)]
|
05:AGGREGATE
|  output: count(order_item_quantity)
|  group by: c.category_name
|
04:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: p.product_category_id = c.category_id
|
|--08:EXCHANGE [BROADCAST]
|  |
|  02:SCAN HDFS [default.categories c]
|     partitions=1/1 files=1 size=1.91KB
|
03:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: oi.order_item_product_id = p.product_id
|
|--07:EXCHANGE [BROADCAST]
|  |
|  01:SCAN HDFS [default.products p]
|     partitions=1/1 files=1 size=43.80KB
|
00:SCAN HDFS [default.order_items oi]
   partitions=1/1 files=1 size=1.57MB

As well as being somewhat shorter, the Impala plan appears more familiar to someone looking at an Oracle Query Plan.

To check for the possible effect of caching, I then ran these queries again.
For Impala, the runtime dropped to around 13 seconds.
For Hive, the runtime was the same as for the first run.

The Impala plan was unchanged, despite the fact that the runtime was drastically reduced.
From this behaviour I would infer that there is some caching effect for Impala, although a cache similar to the Buffer Cache in Oracle is not shared between the two Query Engines.

The explanation offered in the tutorial is that Hive compiles SQL queries into MapReduce jobs whereas Impala was designed as a SQL engine.
It would appear that Impala is better for small-scale interactive queries whereas Hive is more suited to large-scale ETL.

There’s a benchmarking comparison between Hive, Impala, and Spark that may be of interest here.

Optimizer Statistics

You’ll notice that, at the start of the Impala query plan, there’s a warning about missing statistics.

If you roll the mouse over one of the tables on the left-hand side of the Hue Query Editor Window, you’ll see an icon appear which enables you to explore the table’s metadata.

Looking at one of these tables we can see that there are no stats present :

no_stats

We can fix this easily enough in Impala by gathering stats for each of our tables. For example :

compute stats categories;

If we now check the metadata for the CATEGORIES table, we can see that stats are present :

with_stats

If we now re-run the original query, the plan will no longer show the warning. However, the rest of the plan remains unchanged.
Given that these are quite small tables, this is probably not surprising.
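
If you want to inspect what has been gathered without mousing around in Hue, Impala will report it directly. For example, from a terminal :

impala-shell -q "show table stats categories"
impala-shell -q "show column stats categories"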

There’s more information about stats gathering in Impala here.

External Tables

One thing that Hive is good for apparently is creating External Tables.

In the tutorial, some unstructured data (a log file) is loaded and external tables are then created in Hive using the following code :


CREATE EXTERNAL TABLE intermediate_access_logs (
    ip STRING,
    date STRING,
    method STRING,
    url STRING,
    http_version STRING,
    code1 STRING,
    code2 STRING,
    dash STRING,
    user_agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'input.regex' = '([^ ]*) - - \\[([^\\]]*)\\] "([^\ ]*) ([^\ ]*) ([^\ ]*)" (\\d*) (\\d*) "([^"]*)" "([^"]*)"',
    'output.format.string' = '%1$$s %2$$s %3$$s %4$$s %5$$s %6$$s %7$$s %8$$s %9$$s')
LOCATION '/user/hive/warehouse/original_access_logs';

CREATE EXTERNAL TABLE tokenized_access_logs (
    ip STRING,
    date STRING,
    method STRING,
    url STRING,
    http_version STRING,
    code1 STRING,
    code2 STRING,
    dash STRING,
    user_agent STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/warehouse/tokenized_access_logs';

ADD JAR /usr/lib/hive/lib/hive-contrib.jar;

INSERT OVERWRITE TABLE tokenized_access_logs SELECT * FROM intermediate_access_logs;

Once again, it looks familiar for the most part.
One point of interest is the INSERT OVERWRITE command used to insert data into the table.
This has the effect of removing any pre-existing data in the table before inserting the new data.
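
To spell out the contrast with a plain insert, here's a minimal sketch run from the shell via the Hive CLI. The ADD JAR is needed again because intermediate_access_logs uses the RegexSerDe :

# INSERT INTO appends to whatever is already in the target table
hive -e "ADD JAR /usr/lib/hive/lib/hive-contrib.jar; INSERT INTO TABLE tokenized_access_logs SELECT * FROM intermediate_access_logs;"

# INSERT OVERWRITE replaces the target table's existing contents
hive -e "ADD JAR /usr/lib/hive/lib/hive-contrib.jar; INSERT OVERWRITE TABLE tokenized_access_logs SELECT * FROM intermediate_access_logs;"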

String functions and other SQL Stuff

We can query the new external table in Impala :

invalidate metadata tokenized_access_logs;
select * from tokenized_access_logs;

The values in the url column contain %20 characters instead of spaces.
Let’s see what Impala can do in terms of the standard SQL string manipulation functions.

Unlike Oracle, there’s no REPLACE function, there is however a REGEXP_REPLACE…

select regexp_replace(url, '%20', ' ')
from tokenized_access_logs;

When we run this, we can see that the REGEXP_REPLACE has done the job :

regexp_replace

The logs we’re really interested in are where a specific product has been viewed.
If we can get the name of the product from the url, then maybe that will help when relating this data back to the most popular items in terms of sales…

select substr(regexp_replace( url, '%20', ' '), instr( regexp_replace(url, '%20', ' '), '\/product\/') + 9)
from tokenized_access_logs
where url like '%\/product\/%';

When we run this, we can see that the INSTR function also works as expected…to a point. Note that the ‘/’ characters need to be escaped with a ‘\’.
The result looks like this :

products_url

Now, there are a couple of records with the product name and “/add_to_cart” appended. With INSTR in Oracle, you can search for a pattern starting at the end of a string – e.g. :

select instr('/product/Pelican Sunstream 100 Kayak/add_to_cart', '/',-1,1)
from dual;

INSTR('/PRODUCT/PELICANSUNSTREAM100KAYAK/ADD_TO_CART','/',-1,1)
---------------------------------------------------------------
							     37

There is no such option in the Impala equivalent. It simply searches the string from the start and reports the first occurrence.
Fortunately, we want to strip out these results for the stuff we’re going to do in a minute. Therefore, I’ve just amended the query to be :

select substr(regexp_replace( url, '%20', ' '), instr( regexp_replace(url, '%20', ' '), '\/product\/') + 9)
from tokenized_access_logs
where url like '%\/product\/%'
and url not like '%\/add_to_cart%';

We can use fairly standard SQL to get a listing of the products by number of views :

select count(*), substr(regexp_replace( url, '%20', ' '), instr( regexp_replace(url, '%20', ' '), '\/product\/') + 9)
from tokenized_access_logs
where url like '%\/product\/%'
and url not like '%\/add_to_cart%'
group by url
order by 1 desc;

In the tutorial, it mentions that there is one product that has a lot of views but no sales. I wonder if we can find this just using SQL ?

with viewed_products as
(
select count(*) as times_viewed,
substr(regexp_replace( url, '%20', ' '),
       instr( regexp_replace(url, '%20', ' '), '\/product\/') + 9) as product_name
from tokenized_access_logs
where url like '%\/product\/%'
and url not like '%\/add_to_cart%'
group by url
)
select v.times_viewed, v.product_name
from viewed_products v
where upper(v.product_name) not in
(
  select upper(p.product_name)
  from products p
  inner join
  (
    select oi.order_item_product_id as product_id,
        count(oi.order_item_id) as units_sold
    from order_items oi
    inner join orders o
        on oi.order_item_order_id = o.order_id
    where o.order_status not in ('CANCELED', 'SUSPECTED_FRAUD')
    group by order_item_product_id
  ) s
    on p.product_id = s.product_id
);

OK, it’s not the most elegant SQL I’ve ever written but it does demonstrate that :

  • You can define in-line views using the WITH clause
  • You can use a NOT IN subquery

…better than that, it gives the correct answer :

adidas Kids' RG III Mid Football Cleat

In the tutorial, the reason for the lack of sales is a coding error in the product page. You can ask your own snarky question about Unit Testing practices here.

Going back to the csv files we were playing around with on hdfs earlier, how would we get those into our database ?

Go fly a Kite

Yes, it's another tool. This one is called Kite. Its purpose, in this case at least, is to help us create a table based on the tools.csv file and allow us to then make use of it in Impala.

First of all, I’ve created the tools.csv file on the local file system :

ls -l tools.csv
-rw-rw-r-- 1 cloudera cloudera 303 Mar 26 17:21 tools.csv

Now to get kite to create a “table” definition based on the csv :

kite-dataset csv-schema tools.csv --class Tools -o tools.avsc
ls -l tools.*
-rw-rw-r-- 1 cloudera cloudera 373 Mar 26 17:22 tools.avsc
-rw-rw-r-- 1 cloudera cloudera 303 Mar 26 17:21 tools.csv

If we have a look at the new avro file that kite has generated, it looks similar in terms of format to the one that SQOOP generated earlier :

{
  "type" : "record",
  "name" : "Tools",
  "doc" : "Schema generated by Kite",
  "fields" : [ {
    "name" : "tool",
    "type" : [ "null", "string" ],
    "doc" : "Type inferred from 'hue'",
    "default" : null
  }, {
    "name" : "description",
    "type" : [ "null", "string" ],
    "doc" : "Type inferred from 'Web-based UI for Hadoop'",
    "default" : null
  } ]
}

The next step is to create the table metadata (in kite terminology this is a dataset)…

kite-dataset create tools -s tools.avsc

…and finally add the data itself to our table…

kite-dataset csv-import tools.csv tools
The url to track the job: http://localhost:8080/
Added 9 records to "tools"

To check in Impala, we can head over to Hue, select Impala from the Query Editors drop-down and run :

invalidate metadata tools;
select * from tools;

The result should look like this :

tools_query

Back on the command line, we can see that kite has added files to hdfs :

hadoop fs -ls /user/hive/warehouse/tools
Found 3 items
drwxr-xr-x   - cloudera supergroup          0 2016-03-26 17:32 /user/hive/warehouse/tools/.metadata
drwxr-xr-x   - cloudera supergroup          0 2016-03-26 17:35 /user/hive/warehouse/tools/.signals
-rw-r--r--   1 cloudera supergroup        622 2016-03-26 17:35 /user/hive/warehouse/tools/8baf8440-41b0-4f29-8750-16aeb3aec9b0.avro

The .avro file appears to have been compressed using the Snappy compression tool also used by SQOOP earlier.
You can however read the file by running :

hadoop fs -text /user/hive/warehouse/tools/8baf8440-41b0-4f29-8750-16aeb3aec9b0.avro
{"tool":{"string":"banana"},"description":{"string":"a web UI framework"}}
{"tool":{"string":"flume"},"description":{"string":"stream a file into Hadoop"}}
{"tool":{"string":"hive"},"description":{"string":"another query engine"}}
{"tool":{"string":"hue"},"description":{"string":"Web-based UI for Hadoop"}}
{"tool":{"string":"impala"},"description":{"string":"a query engine"}}
{"tool":{"string":"khafka"},"description":{"string":"a scheduler"}}
{"tool":{"string":"oozie"},"description":{"string":"another scheduler"}}
{"tool":{"string":"spark"},"description":{"string":"a query engine that is not hive or impala ?"}}
{"tool":{"string":"sqoop"},"description":{"string":"Transfer structured data between an RDBMS and Hadoop"}}
Summary

The Getting Started Tutorial goes on to cover various other tools available in the Cloudera distribution for doing data manipulation and analysis.
Additionally, you may find this presentation on Hadoop internals for Oracle Devs by Tanel Poder useful.

For now though, I have enough to keep my head above water.


Filed under: Oracle, SQL Tagged: Hadoop, hdfs, Hive, Impala, Kite, SQOOP

Customer Service Thinking

Floyd Teter - Mon, 2016-03-28 11:28
My definition of good customer service is pretty simple:  deliver what you promise.  Period.  You can over-deliver and still make me happy.  But other than that, it's pretty simple.  Tell me what you promise you'll do.  If I sign up for the service, then deliver that service:  on-time, at the price you promised, and make it as easy for me as you promised.  That's it.

Earlier this week, I encountered back-to-back customer service failures with my preferred airline.  The first failure took place when winter weather struck...we waited in the plane on the runway for two hours because the airline failed to have the wing de-icers ready to go.  The second failure occurred when the plane failed a brake inspection prior to boarding; I lost two hours and was rebooked on a later flight.

In both cases, the airline did quite well in providing details for the cause of the delays and expressed profound apologies.  But here is a tip for the airline:  that is not customer service.  You blew it when you failed to deliver your service on-time at the price you promised.  Providing details and status is about mitigating the damages from your failure to provide customer service...it's not good customer service.  In fact, the line defining a customer service failure has already been crossed.

One more customer experience failure this week; I went to my favorite hardware store to shop for a few tools and place a larger order for some building materials (more house remodeling).  The tool shopping went well, but it took two hours to place the order for materials.  The cause of the delay?  Not one of the five service representatives knew how to enter the order into the store's order entry system.  An utter failure of customer service due to a lack of effective training for the team that works with customers.

Now you may be thinking that I've just had a crummy week and I'm using this forum to vent.  On the contrary, it was a good week...because this experience got me to thinking.

You see, Software-as-a-Service is not just about software hosted on a public cloud.  It's about delivering a service.  Telling customers what you promise to do.  Then delivering on that promise.  Including provisioning, implementing, and support...all the activities and exchanges that go into the "Service".

We see too many service fails in the SaaS world.  All the time.  Every day.  Regardless of software vendor.  The industry is still working through the transition from thinking about providing software applications to providing a service...a much, much wider scope of responsibility to our customers.  We need to up the standard - quickly!

Thoughts?  Find the comments.
