Feed aggregator

Developing a training course or university curriculum for Oracle Application Express (APEX)? Start here!

Joel Kallman - Thu, 2017-04-20 13:37
While education in Oracle Application Express (APEX) is offered as a part of many university and secondary school courses around the globe, in most cases, the educators took it upon themselves to develop their own custom curriculum.  To lessen the burden on educators, we have developed and made available for public download a full course curriculum for Oracle Application Express.

This courseware, developed by our product manager Chaitanya Koratamaddi over the past year, includes 16 distinct lessons, complete with PowerPoint presentations, hands-on labs, and all necessary SQL scripts and application export files.  You can use all or a portion of these materials in your own curriculum.

This same courseware was provided to the Oracle Academy team, who now also offer an Oracle Application Express course.  The Oracle Academy course is offered in a hosted interactive form, complete with quizzes, and it also includes both educator and student curriculum.  There are many other benefits to joining Oracle Academy.  For more information about Oracle Academy, go here.

To access the publicly available Oracle Application Express curriculum, go to:

https://apex.oracle.com/education

How to make a dashboard for the Mimer database

Nilesh Jethwa - Thu, 2017-04-20 12:25

Mimer is a SQL-based relational database management system (RDBMS). It provides small-footprint, scalable, and robust relational database solutions.

Are you using Mimer for your data marts or data warehouse? If so, build your free Mimer business intelligence dashboard.

A Mimer dashboard visually summarizes all the important metrics you have selected to track, to give you a quick and easy overview of where everything stands. With real-time Mimer SQL reporting, it's a live view of exactly how your marketing campaign is performing.

  • Better decision making
  • Gain a competitive advantage
  • Enhance collaboration
  • Spot potential problems
  • Merge with data from Excel dashboards
  • Live SQL against the database
  • No need for a data warehouse or ETL
  • Leverage the speed and stability of your powerful database

Read more at http://www.infocaptor.com/ice-database-connect-dashboard-to-mimer-sql 

Impdp

Tom Kyte - Thu, 2017-04-20 08:46
Hi team, I have imported a database schema from one DB to another DB with the same schema details and it completed successfully, but when I compare the objects it shows differences. So, can you please let me know whether in impdp some specific object...
Categories: DBA Blogs

SYS indexes

Tom Kyte - Thu, 2017-04-20 08:46
I have some indexes that start with SYS. I am not able to see them in user_indexes, but the SQL Developer explorer shows them. Any idea?
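
As a quick diagnostic, a minimal sketch (the SYS_ naming pattern is an assumption; such names are typically system-generated indexes, e.g. for LOB segments or IOTs) comparing what the dictionary views report:

    -- Indexes on your own tables, including system-generated SYS_... names
    SELECT index_name, index_type, table_name
    FROM   user_indexes
    WHERE  index_name LIKE 'SYS\_%' ESCAPE '\';

    -- Indexes visible to you regardless of owner (closer to what a GUI tool shows)
    SELECT owner, index_name, table_owner, table_name
    FROM   all_indexes
    WHERE  index_name LIKE 'SYS\_%' ESCAPE '\';
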
Categories: DBA Blogs

plsql developer - delete memory record

Tom Kyte - Thu, 2017-04-20 08:46
Can we delete a record from cursor memory without affecting the main table in PL/SQL?
Categories: DBA Blogs

Automatic Memory Management

Tom Kyte - Thu, 2017-04-20 08:46
Hi Tom, I am trying to understand Automatic Memory Management in 11g. We have the MEMORY_TARGET parameter. 1) Can you please explain how MEMORY_TARGET considers SGA_MAX_SIZE if automatic shared memory management is enabled. If I keep sga_targ...
Categories: DBA Blogs

Is it possible for a single session to fully utilize IO?

Tom Kyte - Thu, 2017-04-20 08:46
Dear Tom, We have an Oracle 12c database used mainly as a data warehouse for scientific data. The database has a few very large tables (>1TB) and they are usually accessed via a full table scan. The database is mostly idle having no more than one ...
Categories: DBA Blogs

Guide to PeopleSoft Logging and Auditing - Revised Whitepaper

After discussions at Collaborate 2017 with several PeopleSoft architects, we have revised our Guide to PeopleSoft Auditing. The key change is the recommendation NOT to use PeopleSoft's native database auditing and to use Oracle Fine Grained Auditing (FGA) instead. FGA comes free with the Enterprise Edition of the Oracle RDBMS and, besides being easier to implement, it does not have the performance impact of PeopleSoft's native auditing.
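
For illustration only, a minimal FGA policy looks roughly like the sketch below (the schema, table, and column names are hypothetical and not taken from the whitepaper); audited statements then appear in DBA_FGA_AUDIT_TRAIL:

    BEGIN
      DBMS_FGA.ADD_POLICY(
        object_schema   => 'SYSADM',        -- hypothetical PeopleSoft owner
        object_name     => 'PS_EXAMPLE',    -- hypothetical table to audit
        policy_name     => 'FGA_PS_EXAMPLE',
        audit_condition => NULL,            -- NULL = audit every access
        audit_column    => 'SENSITIVE_COL', -- fire only when this column is touched
        statement_types => 'SELECT,INSERT,UPDATE,DELETE',
        audit_trail     => DBMS_FGA.DB_EXTENDED);
    END;
    /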

If you have questions, please contact us at info@integrigy.com

-Michael Miller, CISSP-ISSMP

Auditing, Oracle PeopleSoft
Categories: APPS Blogs, Security Blogs

Predicate evaluation order

Tom Kyte - Wed, 2017-04-19 14:26
I've known for a while that the order of predicate evaluation in a query is non-deterministic, as posted by Tom here (https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:696647495510). I even wrote my own blog about it a while back (...
Categories: DBA Blogs

"accessible by" clause in 12.2

Tom Kyte - Wed, 2017-04-19 14:26
Team, Reading through the "accessible by" clause enhancements in 12.2 (http://docs.oracle.com/database/122/LNPLS/release-changes.htm#GUID-85A17D6B-4E7A-49D9-B5AC-B0D69390B449): "Starting with Oracle Database 12c release 2 (12.2),...
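
For readers of the digest, the 12.2 enhancement being discussed lets ACCESSIBLE BY be placed on an individual subprogram inside a package, not just on the whole unit. A minimal sketch, with made-up names (not from the question):

    CREATE OR REPLACE PACKAGE internal_pkg AS
      -- Only code inside api_pkg may call internal_logic (12.2 and later).
      FUNCTION internal_logic (p_in IN NUMBER) RETURN NUMBER
        ACCESSIBLE BY (PACKAGE api_pkg);
    END internal_pkg;
    /

    CREATE OR REPLACE PACKAGE BODY internal_pkg AS
      FUNCTION internal_logic (p_in IN NUMBER) RETURN NUMBER IS
      BEGIN
        RETURN p_in * 2;  -- placeholder logic
      END;
    END internal_pkg;
    /
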
Categories: DBA Blogs

Change font color in SQL*Plus

Tom Kyte - Wed, 2017-04-19 14:26
Hi Tom, I'm pretty sure that recently I stumbled upon an article on some Oracle blog, demonstrating how to change the color of fonts in SQL*Plus. (No HTML etc., just the (server) output in SQL*Plus.) Unfortunately I didn't bookmark it, can't find ...
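
The article in question isn't identified, but the approach usually shown for this (an assumption, not necessarily the same article) is to embed ANSI escape sequences, which only render as color in a terminal that interprets them:

    -- CHR(27) is the ESC character; '[31m' switches to red, '[0m' resets.
    SET SERVEROUTPUT ON
    BEGIN
      DBMS_OUTPUT.PUT_LINE(CHR(27) || '[31m' || 'This line should print in red' || CHR(27) || '[0m');
    END;
    /
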
Categories: DBA Blogs

Huge Pages use for improvement of performance

Tom Kyte - Wed, 2017-04-19 14:26
Hello, I would like to know about the use of huge pages in 12c for performance improvement. We got a suggestion to implement huge pages for performance. Could you advise on this? Thanks.
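
For context, configuring huge pages on Linux involves both an OS setting and an instance parameter. A rough sketch (the sizing numbers are purely illustrative), keeping in mind that huge pages cannot be combined with Automatic Memory Management (MEMORY_TARGET):

    -- OS side (as root): reserve enough 2 MB pages to cover the SGA, e.g. for a ~16 GB SGA:
    --   echo "vm.nr_hugepages = 8300" >> /etc/sysctl.conf
    --   sysctl -p
    -- Database side: refuse to start the instance unless the SGA fits in huge pages.
    ALTER SYSTEM SET use_large_pages = ONLY SCOPE = SPFILE;
    -- Then restart the instance and check the alert log for the huge pages usage summary.
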
Categories: DBA Blogs

SYS_REF_CURSOR AND REF CURSOR

Tom Kyte - Wed, 2017-04-19 14:26
Hi, when I was learning about cursor variables (REF CURSOR) on a website I found this code: CREATE OR REPLACE FUNCTION f RETURN SYS_REFCURSOR AS c SYS_REFCURSOR; BEGIN OPEN c FOR select * from dual; RETURN c; END; / set se...
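
The quoted snippet is cut off; laid out in full, it appears to be along these lines (a sketch, with one assumed way of consuming the cursor from SQL*Plus added):

    CREATE OR REPLACE FUNCTION f RETURN SYS_REFCURSOR AS
      c SYS_REFCURSOR;
    BEGIN
      OPEN c FOR SELECT * FROM dual;
      RETURN c;
    END;
    /
    -- One way to fetch from the returned cursor variable in SQL*Plus:
    VARIABLE rc REFCURSOR
    EXEC :rc := f;
    PRINT rc
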
Categories: DBA Blogs

import metadata with segment attributes deferred

Tom Kyte - Wed, 2017-04-19 14:26
Hi Tom, while doing a metadata import of a schema using impdp, it is taking too much space creating empty partitions. The initialization parameter deferred_segment_creation is set to true, yet the partitions are still taking space. Is there any way to im...
Categories: DBA Blogs

How does PIVOT work?

Tom Kyte - Wed, 2017-04-19 14:26
Hi, I tried to learn the PIVOT concept to pivot rows into columns, but I could not understand how it works internally and how it identifies which rows should be turned into columns. So would you explain how it makes rows into columns?
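
For readers of the digest, a minimal illustration of the mechanics (using the classic SCOTT.EMP demo table as an assumption): every column not mentioned in the PIVOT clause becomes an implicit GROUP BY key, and the IN list explicitly names the values that become columns.

    SELECT *
    FROM  (SELECT deptno, job, sal FROM scott.emp)
    PIVOT (SUM(sal) FOR job IN ('CLERK' AS clerk, 'MANAGER' AS manager, 'ANALYST' AS analyst));
    -- Result: one row per DEPTNO, with CLERK/MANAGER/ANALYST salary totals as columns.
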
Categories: DBA Blogs

Process to add temp tablespaces in a Data Guard environment

Tom Kyte - Wed, 2017-04-19 14:26
We have a physical Data Guard environment. When checking the standby, a new temporary tablespace was not added to the standby (but is on the primary). I understand why it's not being automatically created on the standby. However, what's the best ...
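
Not the asker's final answer, but the approach commonly taken (a hedged sketch; the tablespace name and path are invented) is to add the tempfile by hand on the standby, since tempfile creation is not propagated through redo:

    -- On the standby, the tablespace definition arrives via redo,
    -- but the tempfile itself typically has to be added manually:
    ALTER TABLESPACE temp_new ADD TEMPFILE '/u01/oradata/stby/temp_new01.dbf' SIZE 10G;
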
Categories: DBA Blogs

How to create a dashboard with a Netezza database

Nilesh Jethwa - Wed, 2017-04-19 14:25

IBM Netezza is part of IBM PureSystems – expert integrated systems with built-in expertise, integration by design, and a simplified user experience.

Are you using Netezza for your data marts or data warehouse? If so, build your free Netezza data dashboard.

A Netezza dashboard visually summarizes all the important metrics you have selected to track, to give you a quick and easy overview of where everything stands. With real-time Netezza SQL reporting, it's a live view of exactly how your marketing campaign is performing.

  • Better decision making
  • Gain a competitive advantage
  • Enhance collaboration
  • Spot potential problems
  • Merge with data from Excel dashboards
  • Live SQL against the database
  • No need for a data warehouse or ETL
  • Leverage the speed and stability of your powerful database

Read more at http://www.infocaptor.com/ice-database-connect-dashboard-to-netezza-sql

Submitted two abstracts for Oracle OpenWorld 2017

Bobby Durrett's DBA Blog - Wed, 2017-04-19 14:07

I submitted two abstracts for Oracle OpenWorld 2017. I have two talks that I have thought of putting together:

  • Python for the Oracle DBA
  • Toastmasters for the Oracle DBA

I want to do these talks because they describe two things that I have spent time on and that have been valuable to me.

I have given several recent talks about Delphix. Kyle Hailey let me use his slot at Oaktable World in 2015 which was at the same time as Oracle OpenWorld 2015. Right after that I got to speak at Delphix Sync which was a Delphix user event. More recently I did a Delphix user panel webinar.

So, I’ve done a lot of Delphix lately and that is because I have done a lot with Delphix in my work. But, I have also done a lot with Python and Toastmasters so that is why I’m planning to put together presentations about these two topics.

I probably go to one conference every two years so I'm not a frequent speaker, but I have a list of conferences that I am thinking about submitting these two talks to, hoping to speak at one. These conferences are competitive and I've seen that better people than me have trouble getting speaking slots at them. But here is my rough idea of where I want to submit the talks:

I’ve never gone to RMOUG but I think it is in Denver so that is a short flight and I have heard good things.

Also, we have our own local AZORA group in Phoenix. Recently we have had some really good ACE Director/Oak Table type speakers, but I think they might like to have some local speakers as well, so we will see if that will work out.

If all else fails I can give the talks at work. I need to start working on the five speeches in my Toastmasters “Technical Presentations” manual which is part of the Advanced Communication Series. I haven’t even cracked the book open, so I don’t know if it applies but it seems likely that I can use these two talks for a couple of the speech projects.

Anyway, I’ve taken the first steps towards giving my Python and Toastmasters speeches. Time will tell when these will actually be presented, but I know the value that I have received from Python and Toastmasters and I’m happy to try to put this information out there for others.

Bobby

Categories: DBA Blogs

VirtualBox 5.1.20

Tim Hall - Wed, 2017-04-19 10:56

VirtualBox 5.1.20 has been released!

The downloads and changelog are in the usual places.

The installation on macOS Sierra and Oracle Linux 6 hosts went well and all seems to be working OK there.

I’m not 100% sure about the Windows 7 host though. The installation itself went fine, as did the installation of guest additions on a Linux VM running inside it. After a restart of the VM I attempted to do a yum update and the VM just seems to have died now. I can’t get more than a couple of commands out of it before it becomes unresponsive and has to be restarted. I think I’m going to have to remove and rebuild it. I don’t know if this was something I did, or something weird with VirtualBox.

That’s on my work PC and I’m not at work this week, so I was doing it over TeamViewer. I probably won’t have time to look at this again until next week. I’d be interested to know if anyone else has issues on Windows.

Happy upgrading.

Cheers

Tim…


SQL-on-Hadoop: Impala vs Drill

Rittman Mead Consulting - Wed, 2017-04-19 10:01

I recently wrote a blog post about Oracle's Analytic Views and how those can be used to provide a simple SQL interface to end users over data stored in a relational database. In today's post I'm expanding my horizons a little by looking at how to effectively query data in Hadoop using SQL. The SQL-on-Hadoop interface is key for many organizations - it allows querying the Big Data world using existing tools (like OBIEE, Tableau, DVD) and skills (SQL).

Analytic Views, together with Oracle's Big Data SQL, provide what we are looking for and have the benefit of unifying the data dictionary and the SQL dialect in use. It should be noted that Oracle Big Data SQL is licensed separately on top of the database and is available for Exadata machines only.

Nowadays there is a multitude of open-source projects covering the SQL-on-Hadoop problem. In this post I'll look in detail at two of the most relevant: Cloudera Impala and Apache Drill. We'll see details of each technology, define the similarities, and spot the differences. Finally we'll show that Drill is best suited for data exploration with tools like Oracle Data Visualization or Tableau, while Impala fits better in the reporting area with tools like OBIEE.

As we'll see later, both tools are inspired by Dremel, a paper published by Google in 2010 that defines a scalable, interactive ad-hoc query system for the analysis of read-only nested data, and which is the basis of Google's BigQuery. Dremel defines two aspects of big data analytics:

  • A columnar storage format representation for nested data
  • A query engine

The first point inspired Apache Parquet, the columnar storage format available in Hadoop. The second point provides the basis for both Impala and Drill.

Cloudera Impala

We started blogging about Impala a while ago, as soon as it was officially supported by OBIEE, testing it for reporting on top of big data Hadoop platforms. However, we never went into the details of the tool, which is the purpose of the current post.

Impala is an open source project inspired by Google's Dremel and is one of the massively parallel processing (MPP) SQL engines running natively on Hadoop. As per Cloudera's definition, it is a tool that:

provides high-performance, low-latency SQL queries on data stored in popular Apache Hadoop file formats.

Two important bits to notice:

  • High performance and low latency SQL queries: Impala was created to overcome the slowness of Hive, which relied on MapReduce jobs to execute the queries. Impala uses its own set of daemons running on each of the datanodes, saving time by:
    • Avoiding the MapReduce job startup latency
    • Compiling the query code for optimal performance
    • Streaming intermediate results in memory, whereas MapReduce always writes to disk
    • Starting the aggregation as soon as the first fragment starts returning results
    • Caching metadata definitions
    • Gathering tables and columns statistics
  • Data stored in popular Apache Hadoop file formats: Impala uses the Hive metastore database, and databases and tables are shared between both components. The list of supported file formats includes Parquet, Avro, simple Text and SequenceFile amongst others. Choosing the right file format and compression codec can have an enormous impact on performance. Since CDH 5.8 / Impala 2.6, Impala also supports the Amazon S3 filesystem for both reading and writing operations.

One of the performance improvements is related to "streaming intermediate results": Impala works in memory as much as possible, writing to disk only if the data size is too big to fit in memory; as we'll see later this is called optimistic and pipelined query execution. This has immediate benefits compared to standard MapReduce jobs, which for reliability reasons always write intermediate results to disk.
As per this Cloudera blog, the usage of Impala in combination with the Parquet data format achieves the performance benefits explained in the Dremel paper.
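
As a concrete illustration, here is a minimal impala-shell sketch of what the above translates to (the table name, columns and HDFS path are invented): a Parquet-backed table shared through the Hive metastore, statistics gathered for the planner, and a simple aggregate query.

    CREATE EXTERNAL TABLE sales_parquet (
      sale_id   BIGINT,
      product   STRING,
      amount    DECIMAL(10,2),
      sale_date TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION '/data/sales/parquet';

    -- Statistics help the planner choose join orders and fragment sizes.
    COMPUTE STATS sales_parquet;

    SELECT product, SUM(amount) AS total_amount
    FROM   sales_parquet
    GROUP  BY product
    ORDER  BY total_amount DESC
    LIMIT  10;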

Impala Query Process

Impala runs a daemon, called impalad, on each datanode (a node storing data in the Hadoop cluster). The query can be submitted to any daemon in the cluster, which will act as coordinator node for the query. Impala daemons are always connected to the statestore, a process that keeps a central inventory of all available daemons and their health, and pushes that information back to all daemons. A third component, called the catalog service, checks for metadata changes driven by Impala SQL in order to invalidate related cache entries. Metadata is cached in Impala for performance reasons: accessing metadata from the cache is much faster than checking against the Hive metastore. The catalog service process is in charge of keeping Impala's metadata cache in sync with the Hive metastore.

Once the query is received, the coordinator verifies that the query is valid against the Hive metastore, retrieves information about data location from the namenode (the node in charge of storing the list of blocks and their locations on the datanodes), fragments the query, and distributes the fragments to other impalad daemons to execute the query. All the daemons read the needed data blocks, process the query, and stream partial results to the coordinator (avoiding the write to disk), which collects all the results and delivers them back to the requester. The result is returned as soon as it's available: certain SQL operations like aggregations or ORDER BY require all the input to be available before Impala can return the end result, while others, like a select of pre-existing columns without an ORDER BY, can be returned with only partial results.


Apache Drill

Defining Apache Drill as SQL-on-Hadoop is limiting: also inspired by Google's Dremel, it is a distributed, datasource-agnostic query engine. The datasource-agnostic part is very relevant: Drill is not closely coupled with Hadoop; in fact it can query a variety of sources like MongoDB, Azure Blob Storage, or Google Cloud Storage amongst others.

One of the most important features is that data can be queried schema-free: there is no need to define the data structure or schema upfront - users can simply point the query at a file directory, MongoDB collection or Amazon S3 bucket and Drill will take care of the rest. For more details, check our overview of the tool. One of Apache Drill's objectives is cutting down the data modeling and transformation effort, providing zero-day analysis as explained in this MapR video.
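
To make the schema-free idea concrete, a minimal sketch of a Drill query (the file path and field names are invented): nothing is declared beforehand, and Drill infers the structure of the JSON documents at query time through the dfs storage plugin.

    SELECT status,
           COUNT(*)    AS orders,
           SUM(amount) AS total_amount
    FROM   dfs.`/data/orders/2017/orders.json`
    GROUP  BY status;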

Drill is designed for high performance on large datasets, with the following core components:

  • Distributed engine: Drill processes, called Drillbits, can be installed on many nodes and are the execution engine of the query. Nodes can be added or removed manually to adjust performance. Queries can be sent to any Drillbit in the cluster, which will act as Foreman for the query.
  • Columnar execution: Drill is optimized for columnar storage (e.g. Parquet) and execution using the hierarchical and columnar in-memory data model.
  • Vectorization: Drill takes advantage of modern CPU design by operating on record batches rather than iterating over single values.
  • Runtime compilation: Compiled code is faster than interpreted code and is generated ad-hoc for each query.
  • Optimistic and pipelined query execution: Drill assumes that none of the processes will fail and thus performs all pipeline operations in memory rather than writing to disk, spilling to disk only when memory isn't sufficient.

Drill Query Process

Like Impala's impalad, Drill's main component is the Drillbit: a process running on each active Drill node that is capable of coordinating, planning, executing and distributing queries. Installing a Drillbit on all of Hadoop's data nodes is not compulsory; however, doing so gives Drill the ability to achieve data locality: executing the queries where the data resides, without the need to move it across the network.

When a query is submitted to Drill, the client/application sends a SQL statement to a Drillbit in the cluster (any Drillbit can be chosen), which acts as Foreman (coordinator in Impala terminology) and parses the SQL, converting it into a logical plan composed of operators. The next step is the cost-based optimizer which, based on optimizations like rule/cost-based rewrites, data locality and storage engine options, rearranges operations to generate the optimal physical plan. The Foreman then divides the physical plan into phases, called fragments, which are organised in a tree and executed in parallel against the data sources. The results are then sent back to the client/application. The following image taken from drill.apache.org explains the full process:

[Image: Drill query execution process, from drill.apache.org]

Similarities and Differences

As we saw above, Drill and Impala have a similar structure - both take advantage of always-on daemons (faster than starting a MapReduce job) and assume an optimistic query execution, passing intermediate results in memory. Runtime code compilation and the distributed engine are also common to both, and both are optimized for columnar storage formats like Parquet.

There are, however, several differences. Impala works only on top of the Hive metastore, while Drill supports a larger variety of data sources and can link them together on the fly in the same query. For example, implicitly schema-defined files like JSON and XML, which are not supported natively by Impala, can be read immediately by Drill.
Drill usually doesn't require a metadata definition done upfront, while for Impala a view or external table has to be declared before querying. Following from this point, Drill has no concept of a central and persistent metastore, and there is no metadata repository to manage. In OBIEE's world, both Impala and Drill are supported data sources. The same applies to Data Visualization Desktop.

The aim of this article isn't a performance comparison, since performance depends on a huge number of factors including data types, file formats, configuration, and query types. A comparison dating back to 2015 can be found here. Please be aware that newer versions of both tools have been released since that comparison, bringing a lot of changes and performance improvements to both projects.

Conclusion

Impala and Drill share a similar structure - both inspired by Google's Dremel - relying on always-active daemons deployed on cluster nodes to provide the best query performance on top of Big Data data structures. So which one to choose, and when?

As described, the capability of Apache Drill to query a raw data source without requiring an upfront metadata definition makes the tool perfect for insight discovery on top of raw data. The ability to join data coming from one or more storage plugins in a single query makes the mash-up of disparate data sources easy and immediate. Data science and prototyping before the design of a reporting schema are perfect use cases for Drill. However, as part of the discovery phase a metadata definition layer is usually added on top of the data sources; this then makes Impala a good candidate for the subsequent reporting queries.

Summarizing: if all the data points are already modeled in the Hive metastore, then Impala is your perfect choice. If instead you need a mashup with external sources, or need to work directly with raw data formats (e.g. JSON), then Drill's auto-exploration and openness capabilities are what you're looking for.

Even though both tools are fully compatible with Oracle BIEE and Data Visualization (DV), due to Drill's data exploration nature it could be considered more in line with DV use cases, while Impala is more suitable for standard reporting with OBIEE. The decision on tooling depends highly on the specific use case - source data types, file formats and configurations have a deep impact on the agility of the business analytics process and on query performance.

If you want to know more about Apache Drill, Impala and the use cases we have experienced, don't hesitate to contact us!

Categories: BI & Warehousing
