Feed aggregator

Oracle things that piss me off (pt 1)

Gary Myers - 18 hours 11 min ago

This annoys me. The fact that Oracle thinks it is appropriate to sell me to 'Ask' (by bundling their toolbar) whenever I update my Oracle JRE.

On my home machines, I've ditched the Oracle route for the JRE. A Java runtime is a requirement for running Minecraft (now owned by Microsoft), and they've now incorporated keeping the JRE updated as part of their updates. No attempts to install some crappy piece of spyware on my machine. 
And it is at the stage where I trust Microsoft over Oracle any day of the week.


Using the ELK Stack to Analyse Donor’s Choose Data

Rittman Mead Consulting - Sat, 2015-04-25 13:35

Donor’s Choose is an online charity in America through which teachers can post details of projects that need funding and donors can give money towards them. The data from the charity since it began in 2000 is available to download freely here in several CSV datasets. In this article I’m going to show how to use the ELK stack of data discovery tools from Elastic to easily import some data (the donations dataset) and quickly start analysing it to produce results such as this one:

I’m assuming you’ve downloaded and unzipped Elasticsearch, Logstash and Kibana, and made Java available if it isn’t already. I did this on a Mac, but the tools are cross-platform and should work just the same on Windows and Linux. I’d also recommend installing Kopf, which is an excellent plugin for managing Elasticsearch.

CSV Data Ingest with Logstash

First off we’re going to get the data in to Elasticsearch using Logstash, after which we can do some analysis using Kibana.

Importing the data with Logstash requires a configuration file, which in this case is pretty straightforward. We’ll use the file input plugin, process the data with the csv filter, set the date of the event to the donation timestamp (rather than now), cast a few fields to numeric, and then output it using the elasticsearch plugin. See the inline comments for an explanation of each step:

input {  
    file {  
        # This is necessary to ensure that the file is  
        # processed in full. Without it logstash will default  
        # to only processing new entries to the file (as would  
        # be seen with a logfile for a live application, but  
        # not static data like we're working with here)  
        start_position  => beginning  
        # This is the full path to the file to process.  
        # Wildcards are valid.  
        path =>  ["/hdd/ELK/data/opendata/opendata_donations.csv"]  
    }
}

filter {  
        # Process the input using the csv filter.  
        # The list of column names I took manually from the  
        # file itself  
        csv {separator => ","  
                columns => ["_donationid","_projectid","_donor_acctid","_cartid","donor_city","donor_state","donor_zip","is_teacher_acct","donation_timestamp","donation_to_project","donation_optional_support","donation_total","dollar_amount","donation_included_optional_support","payment_method","payment_included_acct_credit","payment_included_campaign_gift_card","payment_included_web_purchased_gift_card","payment_was_promo_matched","via_giving_page","for_honoree","donation_message"]}

        # Store the date of the donation (rather than now) as the  
        # event's timestamp  
        # 
        # Note that the data in the file uses formats both with and  
        # without the milliseconds, so both formats are supplied  
        # here.  
        # Additional formats can be specified using the Joda syntax  
        # (http://joda-time.sourceforge.net/api-release/org/joda/time/format/DateTimeFormat.html)  
        date { match => ["donation_timestamp", "yyyy-MM-dd HH:mm:ss.SSS", "yyyy-MM-dd HH:mm:ss"]}  
        # ------------
        # Cast the numeric fields to float (not mandatory but makes for additional analysis potential)
        mutate {
        convert => ["donation_optional_support","float"]
        convert => ["donation_to_project","float"]
        convert => ["donation_total","float"]
        }
}

output {  
        # Now send it to Elasticsearch which here is running  
        # on the same machine.  
        elasticsearch { host => "localhost" index => "opendata" index_type => "donations"}  
        }

With the configuration file created, we can now run the import:

./logstash-1.5.0.rc2/bin/logstash agent -f ./logstash-opendata-donations.conf

This will take a few minutes, during which your machine’s CPU will rocket as Logstash processes all the records. Since Logstash was originally designed for ingesting logfiles as they’re created, it doesn’t actually exit after finishing processing the file, but you’ll notice your machine’s CPU return to normal, at which point you can hit Ctrl-C to kill Logstash.

If you’ve installed Kopf then you can see at a glance how much data has been loaded:

Or alternatively query the index using Elasticsearch’s API directly:

curl -XGET 'http://localhost:9200/opendata/_status?pretty=true'

[...]  
    "opendata" : {  
      "index" : {  
        "primary_size_in_bytes" : 3679712363,  
      },  
[...]  
      "docs" : {  
        "num_docs" : 2608803,

Note that Elasticsearch will take more space than the source data (in total the 1.2GB dataset ends up taking c. 5GB).
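
A quick way to check on this (and on the document count) is the _cat API, which gives a one-line, human-readable summary per index:

curl 'http://localhost:9200/_cat/indices?v'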

Data Exploration with Kibana

Now we can go to Kibana and start to analyse the data. From the Settings page of Kibana add the opendata index that we’ve just created:

Go to Discover and, if necessary, click the cog icon in the top right to set the index to opendata. The time filter defaults to the last 15 minutes only, and if Logstash has done its job right the events will have the timestamp of the actual donation, so you need to click on the time filter in the very top right of the screen to change the time period to, for example, Previous year. Now you should see a bunch of data:

Click the toggle on one of the events to see the full data for it, including things like the donation amount, the message with the donation, and geographical details of the donor. You can find details of all the fields on the Donor’s Choose website here.

Click on the fields on the left to see a summary of the data within, showing very easily that within that time frame and sample of 500 records:

  • two thirds of donations were in the 10-100 dollar range
  • four-fifths included the optional donation towards the running costs of Donor’s Choose.

You can add fields into the table itself (which by default just shows the complete row of data) by clicking on add for the fields you want:

Let’s save this view (known as a “Search”), since it can be used on a Dashboard later:

Data Visualisation with Kibana

One of my favourite features of Kibana is its ability to aggregate data at various dimensions and grains with ridiculous ease. Here’s an example: (click to open full size)

Now let’s amend that chart to show the method of donation, or the donation amount range, or both: (click to open full size)

You can also change the aggregation from the default “Count” (in this case, number of donations) to other aggregations including sum, median, min, max, etc. Here we can compare cheque (check) vs paypal as a payment method in terms of amount given:

Kibana Dashboards

Now let’s bring the visualisations together along with the data table we saw in the Discover tab. Click on Dashboard, and then the + icon:

Select the visualisations that you’ve created, and then switch to the Searches tab and add in the one that you saved earlier. You’ve now got a data table showing all currently selected data, along with various summaries on it.

You can rearrange the dashboard by dragging each box around to suit. Once you’ve got the elements of the dashboard in place you can start to drill into your data further. To zoom in on a time period click and drag a selection over it, and to filter on a particular data item (for example, state in the “Top ten states” visualisation) click on it and accept the prompt at the top of the screen. You can also use the freetext search at the top of the screen (this is valid on the Discover and Visualize pages too) to search across the dataset, or within a given field.

Example Analysis

Let’s look at some actual data analyses now. One of the most simple is the amount given in donations over time, split by amount given to project and also as the optional support amount:

One of the nice things about Kibana is the ability to quickly change resolution in a graph’s time frame. By default a bar chart will use an “Auto” granularity on the time axis, updating as you zoom in and out so that you always see an appropriate level of aggregation. This can be overridden to show, for example, year-on-year changes:

You can also easily switch the layout of the chart, for example to show the percentage of the two aggregations relative to each other. So whilst the above chart shows the optional support amount increasing year on year, it’s actually remaining pretty much the same when taken as a percentage of the donations overall – which, if you look at the definition of the field (“we encourage donors to dedicate 15% of each donation to support the work that we do”), makes a lot of sense.

Analysis based on text in the data is easy too. Using the Terms sub-aggregation, here we can see the top five states in terms of donation amount, with California consistently at the top of the table.

Since the Terms sub-aggregation shows the top x values only, you can’t necessarily judge the importance of those values in relation to the rest of the data. For more specific analysis you can use the Filters sub-aggregation, which lets you define free-form searches to create buckets, such as here to compare how much those from NY and CA donated versus all other states. The syntax is field:value to include a term, and -field:value to negate it, and you can string these expressions together using AND and OR.
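
For example, two such buckets for this dataset might be defined as follows (assuming the two-letter codes used in the donor_state field):

donor_state:NY OR donor_state:CA
-donor_state:NY AND -donor_state:CA

The first bucket captures donations from New York and California; the second captures everything else.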

A lot of the analysis generally sits well in the bar chart visualisation, but the line chart has a role to play too. Donations are grouped according to value range (<10, between 10 and 100, >100), and these plot out nicely when considering the number of donations made (rather than total value). Whilst the total donated in a time period is significant, so is engagement with the donors, hence the number of donations made is also important to analyse:

As well as splitting lines and bars, you can split charts themselves, which works well when you want to start comparing multiple dimensions without cluttering up a single chart. Here’s the same chart as previously, but split out into one chart per value range. Arguably it’s clearer to understand, and the relative values of the three items can be seen better here than in the clutter of the previous chart:

Following on from this previous graph, I’m interested in the spike in mid-value ($10-$100) donations at the end of 2011. Let’s pull the graph onto a dashboard and dig into it a bit. I’ve saved the visualisation and brought it in with the saved Search (from the Discover page earlier) and an additional visualisation showing payment methods for the donations:

Now I can click and drag the time frame to isolate the data of interest and we see that the number of donations jumps eight-fold at this point:

Clicking on one of the data points drills into it, and we eventually see that the spike was attributable to the use of campaign gift cards, presumably issued with a value > $10 and < $100.

Limitations

The simplicity described in this article comes at a cost, or rather, has its limits. You may well notice fields in the input data such as “_projectid”, and if you wanted to relate a donation to a given project you’d need to go and look that project code up manually. There’s no (easy) way of doing this in Elasticsearch – whilst you can easily bring in all the project data too and search on projectid, you can’t (easily) display the two (project and donation) alongside each other. That’s because Elasticsearch is a document store, not a relational database. There are some options discussed on the Elasticsearch blog for handling this, none of which to my mind are applicable to this kind of data discovery (but Elasticsearch is used in a variety of applications, not just as a data store for Kibana, so in other cases they are more relevant). Given that, if you did want to resolve this relationship you’d have to go about it a different way, maybe using the Linux join command to pre-process the files and denormalise them prior to ingest with Logstash, as sketched below. At this point you reach the “right tool/right job” decision – ELK is great, but not for everything :-)
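
As an illustration only, a rough shell sketch of that pre-processing approach (it assumes a projects file named opendata_projects.csv with _projectid as its first column – check the real header before relying on field positions – and note that plain sort/join will mangle quoted fields containing embedded commas, such as donation_message, so a CSV-aware tool may be needed in practice):

# Sort both files on _projectid: column 2 of the donations file, column 1 of the projects file
sort -t, -k2,2 opendata_donations.csv > donations.sorted
sort -t, -k1,1 opendata_projects.csv  > projects.sorted
# Join them into a single denormalised file for Logstash to ingest
join -t, -1 2 -2 1 donations.sorted projects.sorted > donations_with_projects.csv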

Reprocessing

If you need to reload the data (for example, when building this I reprocessed the file in order to define the numerics as such, rather than the default string), you need to:

  • Drop the Elasticsearch data:
    curl -XDELETE 'http://localhost:9200/opendata'
  • Remove the “sincedb” file that logstash uses to record where it last read from in a file (useful for tailing changing input files; not so for us with a static input file)
    rm ~/.sincedb*

    (better here would be to define a bespoke sincedb path in the file input parameters, so we could delete a specific sincedb file without impacting other logstash processing that may be using sincedb in the same path – see the sketch below this list)
  • Rerun the logstash as above
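
As a sketch, that bespoke sincedb path would just be one extra option in the file input from the configuration above (the path shown is an arbitrary example):

input {
    file {
        start_position => beginning
        path => ["/hdd/ELK/data/opendata/opendata_donations.csv"]
        # A dedicated sincedb for this input, so a reload only needs this one file removed
        sincedb_path => "/hdd/ELK/data/opendata/opendata_donations.sincedb"
    }
}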

 

Categories: BI & Warehousing

Getting a File Listing from a Directory in PL/SQL

The Anti-Kyte - Sat, 2015-04-25 12:32

It’s General Election time here in the UK.
Rather than the traditional two-way fight to form a government, this time around we seem to have a reasonably broad range of choice.
In addition to red and blue, we also have purple and – depending on where you live in the country, multiple shades of yellow and green.
The net effect is to leave the political landscape looking not so much like a rainbow as a nasty bruise.

The message coming across from the politicians is that everything that’s wrong in this country is down to foreigners – Eastern Europeans…or English (once again, depending on your location).
Strangely, the people who’ve been running our economy and public services for the last several years tend not to get much of a mention.
Whatever we end up choosing, our ancient electoral system is not set up to cater for so many parties attracting a significant share of support.

The resulting wrangling to cobble together a Coalition Government will be hampered somewhat by our – equally ancient – constitution.

That’s largely because, since Magna Carta, no-one’s bothered to write it down.

In olden times, if you wanted to find out what files were in a directory from inside the database, your options were pretty undocumented as well.
Fortunately, times have changed…

What I’m going to cover here is how to use an External Table pre-process to retrieve a file listing from a directory from inside the database.
Whilst this technique will work on any platform, I’m going to focus on Linux in the examples that follow…

An Overview of the Solution

First of all we’re going to write a shell script to give us a file listing on Linux.
This will include all of the details supplied by an ls -l listing, in comma-delimited format.
The directory we’re going to get a list of files for is the DATA_PUMP_DIR.
Then we’ll create an External Table to read the output of the script.
Here goes then….

Getting a useable file listing on Linux

The first thing we need here is a shell script for the external table to execute in its pre-processor directive.
The output needs to be in a format that can be easily loaded by the external table.

Let’s have a look then…

/u01/app/oracle/admin/XE/dpdump $ ls -l
total 322320
-rw-r--r-- 1 oracle dba     87816 Aug 22  2014 export.log
-rw-r--r-- 1 oracle dba         0 Apr 16 13:13 file_list_dummy.txt
-rw-r----- 1 oracle dba 327385088 Aug 22  2014 full220814.dmp
-rw-r--r-- 1 oracle dba      2889 Jan  9 15:29 hr_dev_imp.log
-rw-r----- 1 oracle dba    524288 Jan  9 15:16 hr_export.dmp
-rw-r--r-- 1 oracle dba      2171 Jan  9 15:16 hr_export.log
-rw-r----- 1 oracle dba   1560576 Feb 12 11:46 ossca_pre_050_export.dmp
-rw-r--r-- 1 oracle dba      3331 Feb 12 11:46 ossca_pre_050_export.log
-rw-r----- 1 oracle dba    462848 Mar  5 13:44 test_schemas.dmp
-rw-r--r-- 1 oracle dba      1759 Mar  5 13:44 test_schemas_exp.log

There are a few things to note from here.
Firstly, there are up to eight distinct columns in the output. You’ll notice that files over six months old are reported with a date and a year, whereas newer files have a date and a time.

Secondly, the widths of the fields themselves vary in length between files, most notably the file sizes.
We need to tidy up this output a bit if we’re going to be able to load it easily into the External Table.

After some trial and error, it looks like the following will meet our needs :

ls -l --time-style=long-iso | awk 'BEGIN {OFS = ",";} {print $1, $2, $3, $4, $5, $6" "$7, $8}'
total,322320,,,, ,
-rw-r--r--,1,oracle,dba,87816,2014-08-22 13:30,export.log
-rw-r--r--,1,oracle,dba,0,2015-04-16 13:13,file_list_dummy.txt
-rw-r-----,1,oracle,dba,327385088,2014-08-22 13:30,full220814.dmp
-rw-r--r--,1,oracle,dba,2889,2015-01-09 15:29,hr_dev_imp.log
-rw-r-----,1,oracle,dba,524288,2015-01-09 15:16,hr_export.dmp
-rw-r--r--,1,oracle,dba,2171,2015-01-09 15:16,hr_export.log
-rw-r-----,1,oracle,dba,1560576,2015-02-12 11:46,ossca_pre_050_export.dmp
-rw-r--r--,1,oracle,dba,3331,2015-02-12 11:46,ossca_pre_050_export.log
-rw-r-----,1,oracle,dba,462848,2015-03-05 13:44,test_schemas.dmp
-rw-r--r--,1,oracle,dba,1759,2015-03-05 13:44,test_schemas_exp.log

The Shell Script

There are a few things we need to bear in mind for this script.
As it’s going to be called from the external table, we cannot assume that any environment variables have been set (even $PATH).
Therefore, the commands need to be called using their absolute path, rather than just their name.

Additionally, the script will be passed the fully qualified path and filename of our external table’s location file as the first argument. This is going to be useful, as we’ll need to point the shell script at the appropriate directory to list.
We can strip the filename from this argument and keep just the directory by using dirname…

If you want to find out what the paths for these commands are (and they can differ between Linux distros), then you can do the following :

which ls
/bin/ls
which awk
/usr/bin/awk
which dirname
/usr/bin/dirname

The final script looks like this :

#!/bin/sh
#
# Generate a comma separated ls -l for the directory in which the external table's
# location file resides
#
targetDir=`/usr/bin/dirname $1`
/bin/ls -l --time-style=long-iso $targetDir | /usr/bin/awk 'BEGIN {OFS = ",";} {print $1, $2, $3, $4, $5, $6" "$7, $8}'
exit 0

I’ve saved this file as list_files.sh.

Now for…

The External Table

The output we’ll be reading starts with the Total number of blocks in the directory, so we’ll want to skip that.
As for the files themselves, each file record will have the following attributes :

  • File Permissions
  • Number of Hard Links
  • File Owner
  • Group of the file owner
  • File size in bytes
  • Date the file was last modified
  • The file name

As we’re using a pre-processor in our external table, the usual rules will apply.

The External Table needs to have a location file specified at runtime, even though it won’t actually be reading the contents of that file in this instance.

So, our table will look something like this :

create table list_files_xt
(
    permissions varchar2(15),
    hard_links number,
    file_owner varchar2(32),
    group_name varchar2(32),
    size_bytes number,
    last_modified date,
    file_name varchar2(255)
)
    organization external
    (
        type oracle_loader
        default directory data_pump_dir
        access parameters
        (
            records delimited by newline
            nologfile
            preprocessor pre_proc_dir : 'list_files.sh'            
            skip 1
            fields terminated by ','            
            ( 
                permissions,
                hard_links,
                file_owner,
                group_name,
                size_bytes,
                last_modified date 'YYYY-MM-DD HH24:MI',
                file_name
            )
        )
        location('list_files_dummy.txt')
    )
/

Deploying the Components

First, we need to put the script in place. To work out where, check the path of the directory object that we’re referencing in the External Table definition ( in this case, PRE_PROC_DIR).
Then we’ll need to create the external table’s dummy file in the directory specified as the Default Directory ( DATA_PUMP_DIR).

So, we need to know the OS location of these directories…

select directory_name, directory_path
from all_directories
where directory_name in ('PRE_PROC_DIR', 'DATA_PUMP_DIR')
/

DIRECTORY_NAME	DIRECTORY_PATH
--------------- ----------------------------------------
PRE_PROC_DIR	/u01/app/oracle/pre_proc
DATA_PUMP_DIR	/u01/app/oracle/admin/XE/dpdump/
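
If PRE_PROC_DIR doesn’t already exist in your environment, creating it is a one-off job (a sketch only – the OS path matches the one used in this post, and my_user is a placeholder for whichever schema owns the external table) :

create directory pre_proc_dir as '/u01/app/oracle/pre_proc'
/

grant read, execute on directory pre_proc_dir to my_user
/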

Now to make sure that the shell script is in the right place and has the appropriate permissions :

sudo cp list_files.sh /u01/app/oracle/pre_proc/.
sudo chmod o+x /u01/app/oracle/pre_proc/list_files.sh

If we now check the permissions on the file, we can see that all users have execute :

ls -l /u01/app/oracle/pre_proc/list_files.sh
-rw-r--r-x 1 root root 281 Apr 23 13:02 /u01/app/oracle/pre_proc/list_files.sh

Next we need to create the dummy file for the External table. The oracle os user needs to have read and write permissions on this file.
The simplest way to achieve this is to create the file as the oracle os user :

sudo su oracle
touch /u01/app/oracle/admin/XE/dpdump/list_files_dummy.txt

Now we can see that we’ve got an empty file ….

ls -l /u01/app/oracle/admin/XE/dpdump/list_files_dummy.txt
-rw-r--r-- 1 oracle dba 0 Apr 23 13:05 /u01/app/oracle/admin/XE/dpdump/list_files_dummy.txt

Finally, we just need to login to the database and create the table using the script above.
Now let’s see what export files I’ve got :

select *
from list_files_xt
where file_name like '%.dmp'
order by last_modified
/

PERMISSION HARD_LINKS FILE_OWNER GROUP_NAME  SIZE_BYTES LAST_MODIFIED      FILE_NAME
---------- ---------- ---------- ----------- ---------- ------------------ --------------------------------------------------
-rw-r-----          1 oracle     dba          327385088 22-AUG-14          full220814.dmp
-rw-r-----          1 oracle     dba             524288 09-JAN-15          hr_export.dmp
-rw-r-----          1 oracle     dba            1560576 12-FEB-15          ossca_pre_050_export.dmp
-rw-r-----          1 oracle     dba             462848 05-MAR-15          test_schemas.dmp

NOTE – no minorities were persecuted in the writing of this post.


Filed under: Linux, Oracle, SQL Tagged: awk, basename, chmod, external table, external table preprocessor, ls, ls --time-style = long-iso, OFS, touch, which

RFM Analysis in Oracle BI Apps

Dylan's BI Notes - Fri, 2015-04-24 19:16
I wrote the article RFM Analysis around 7 years ago.  We recently posted a much more detailed explanation of how Oracle BI Apps implements this concept in the product: Customer RFM Analysis. RFA related customer attributes are good examples of aggregated performance metrics as described in this design tip from the Kimball Group: Design Tip #53: Dimension Embellishments […]
Categories: BI & Warehousing

Advanced Oracle Troubleshooting Guide – Part 12: control file reads causing enq: SQ – contention waits?

Tanel Poder - Fri, 2015-04-24 17:23

Vishal Desai systematically troubleshooted an interesting case where the initial symptoms of the problem showed a spike of enq: SQ – contention waits, but he dug deeper – and found the root cause to be quite different. He followed the blockers of waiting sessions manually to reach the root cause – and also used my @ash/ash_wait_chains.sql and @ash/event_hist.sql scripts to extract the same information more conveniently (note that he had modified the scripts to take AWR snap_ids as time range parameters instead of the usual date/timestamp):

Definitely worth a read if you’re into troubleshooting non-trivial performance problems :)

Related Posts

List listeners and services from the instance

Yann Neuhaus - Fri, 2015-04-24 11:35

Want to know all your listeners – including SCAN listeners – and the services they listen for? It is possible from the instance, with the – undocumented – view V$LISTENER_NETWORK, which has been there since 11.2.
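
A minimal query sketch (on the versions I have looked at, the view exposes a TYPE/VALUE pair per entry):

select type, value
from v$listener_network
order by type
/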

Parallel Execution -- 4 Parsing PX Queries

Hemant K Chitale - Fri, 2015-04-24 09:20
Unlike "regular" Serial Execution queries, which undergo only 1 hard parse and multiple soft parses on repeated execution, Parallel Execution queries are actually hard parsed by each PX Server plus the co-ordinator at each execution.  [Correction, as noted by Yasin in his comment: not hard parsed, but separately parsed by each PX Server]

Here's a quick demo.

First, I start with a Serial Execution query.

[oracle@localhost ~]$ sqlplus hemant/hemant

SQL*Plus: Release 11.2.0.2.0 Production on Fri Apr 24 22:53:55 2015

Copyright (c) 1982, 2010, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

HEMANT>set serveroutput off
HEMANT>alter table large_table noparallel;

Table altered.

HEMANT>select count(*) from large_table;

COUNT(*)
----------
4802944

HEMANT>select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5ys3vrapmbx6w, child number 0
-------------------------------------
select count(*) from large_table

Plan hash value: 3874713751

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 18894 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| LARGE_TABLE | 4802K| 18894 (1)| 00:03:47 |
--------------------------------------------------------------------------


14 rows selected.

HEMANT>select
2 executions, parse_calls, invalidations, sql_fulltext
3 from v$sqlstats
4 where sql_id = '5ys3vrapmbx6w';

EXECUTIONS PARSE_CALLS INVALIDATIONS SQL_FULLTEXT
---------- ----------- ------------- --------------------------------------------------------------------------------
1 1 0 select count(*) from large_table

HEMANT>
HEMANT>select count(*) from large_table;

COUNT(*)
----------
4802944

HEMANT>select
2 executions, parse_calls, invalidations, sql_fulltext
3 from v$sqlstats
4 where sql_id = '5ys3vrapmbx6w';

EXECUTIONS PARSE_CALLS INVALIDATIONS SQL_FULLTEXT
---------- ----------- ------------- --------------------------------------------------------------------------------
2 2 0 select count(*) from large_table

HEMANT>
HEMANT>select count(*) from large_table;

COUNT(*)
----------
4802944

HEMANT>select
2 executions, parse_calls, invalidations, sql_fulltext
3 from v$sqlstats
4 where sql_id = '5ys3vrapmbx6w';

EXECUTIONS PARSE_CALLS INVALIDATIONS SQL_FULLTEXT
---------- ----------- ------------- --------------------------------------------------------------------------------
3 3 0 select count(*) from large_table

HEMANT>
HEMANT>select count(*) from large_table;

COUNT(*)
----------
4802944

HEMANT>select
2 executions, parse_calls, invalidations, sql_fulltext
3 from v$sqlstats
4 where sql_id = '5ys3vrapmbx6w';

EXECUTIONS PARSE_CALLS INVALIDATIONS SQL_FULLTEXT
---------- ----------- ------------- --------------------------------------------------------------------------------
4 4 0 select count(*) from large_table

HEMANT>
HEMANT>select count(*) from large_table;

COUNT(*)
----------
4802944

HEMANT>select
2 executions, parse_calls, invalidations, sql_fulltext
3 from v$sqlstats
4 where sql_id = '5ys3vrapmbx6w';

EXECUTIONS PARSE_CALLS INVALIDATIONS SQL_FULLTEXT
---------- ----------- ------------- --------------------------------------------------------------------------------
5 5 0 select count(*) from large_table

HEMANT>


5 executions with no additional parse overheads.

Next, I run Parallel Execution.

[oracle@localhost ~]$ sqlplus hemant/hemant

SQL*Plus: Release 11.2.0.2.0 Production on Fri Apr 24 23:04:45 2015

Copyright (c) 1982, 2010, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

HEMANT>set serveroutput off
HEMANT>alter table large_table parallel 4;

Table altered.

HEMANT>select /*+ PARALLEL */ count(*) from large_table;

COUNT(*)
----------
4802944

HEMANT>select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 4wd97vn0ytfmc, child number 0
-------------------------------------
select /*+ PARALLEL */ count(*) from large_table

Plan hash value: 2085386270

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 1311 (100)| | | | |
| 1 | SORT AGGREGATE | | 1 | | | | | |
| 2 | PX COORDINATOR | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | | | Q1,00 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 4802K| 1311 (1)| 00:00:16 | Q1,00 | PCWC | |
|* 6 | TABLE ACCESS FULL| LARGE_TABLE | 4802K| 1311 (1)| 00:00:16 | Q1,00 | PCWP | |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
- automatic DOP: skipped because of IO calibrate statistics are missing


27 rows selected.

HEMANT>select
2 executions, parse_calls, invalidations, sql_fulltext
3 from v$sqlstats
4 where sql_id = '4wd97vn0ytfmc';

EXECUTIONS PARSE_CALLS INVALIDATIONS SQL_FULLTEXT
---------- ----------- ------------- --------------------------------------------------------------------------------
1 5 0 select /*+ PARALLEL */ count(*) from large_table

HEMANT>
HEMANT>select /*+ PARALLEL */ count(*) from large_table;

COUNT(*)
----------
4802944

HEMANT>select
2 executions, parse_calls, invalidations, sql_fulltext
3 from v$sqlstats
4 where sql_id = '4wd97vn0ytfmc';

EXECUTIONS PARSE_CALLS INVALIDATIONS SQL_FULLTEXT
---------- ----------- ------------- --------------------------------------------------------------------------------
2 10 0 select /*+ PARALLEL */ count(*) from large_table

HEMANT>
HEMANT>select /*+ PARALLEL */ count(*) from large_table;

COUNT(*)
----------
4802944

HEMANT>select
2 executions, parse_calls, invalidations, sql_fulltext
3 from v$sqlstats
4 where sql_id = '4wd97vn0ytfmc';

EXECUTIONS PARSE_CALLS INVALIDATIONS SQL_FULLTEXT
---------- ----------- ------------- --------------------------------------------------------------------------------
3 15 0 select /*+ PARALLEL */ count(*) from large_table

HEMANT>
HEMANT>select /*+ PARALLEL */ count(*) from large_table;

COUNT(*)
----------
4802944

HEMANT>select
2 executions, parse_calls, invalidations, sql_fulltext
3 from v$sqlstats
4 where sql_id = '4wd97vn0ytfmc';

EXECUTIONS PARSE_CALLS INVALIDATIONS SQL_FULLTEXT
---------- ----------- ------------- --------------------------------------------------------------------------------
4 20 0 select /*+ PARALLEL */ count(*) from large_table

HEMANT>
HEMANT>select /*+ PARALLEL */ count(*) from large_table;

COUNT(*)
----------
4802944

HEMANT>select
2 executions, parse_calls, invalidations, sql_fulltext
3 from v$sqlstats
4 where sql_id = '4wd97vn0ytfmc';

EXECUTIONS PARSE_CALLS INVALIDATIONS SQL_FULLTEXT
---------- ----------- ------------- --------------------------------------------------------------------------------
5 25 0 select /*+ PARALLEL */ count(*) from large_table

HEMANT>


Each of the 5 executions had parse overheads for each PX server.
Note: The 5 "PARSE_CALLS" per execution are a result of the 4 PX servers.  You might see a different number in your tests.

.
.
.


Categories: DBA Blogs

Pillars of PowerShell: Profiling

Pythian Group - Fri, 2015-04-24 06:53
Introduction

This is the fourth blog post continuing the series on the Pillars of PowerShell. The previous posts in the series are:

  1. Interacting
  2. Commanding
  3. Debugging

Profiles

This is something I mentioned in the second post, and it can be a great way to keep up with those one-liners you use most often in your work. A profile in PowerShell is like using start-up scripts in an Active Directory environment: you can “pre-run” things on a domain computer at start up or when a user logs into the machine. In a PowerShell profile you can “pre-load” information, modules, custom functions, or any command you want to execute in the PowerShell console. There is a separate profile for the console and another for the PowerShell ISE. Your profile is basically a PowerShell script saved into a specific location under your Documents folder, and the path to this profile is kept in an automatic variable called $PROFILE.

Output of the $PROFILE variable

I am using a Windows Azure VM that I just built, so I have not created any profiles on this machine. The path is kept within this variable, but that does not mean the file actually exists. We will need to create it, and the easiest way to do that is with the New-Item cmdlet, which can create files or folders. You can execute this one-liner to generate the PowerShell script in the path shown above:

New-Item $PROFILE -ItemType File -Force

Now, from here you can use another cmdlet, Invoke-Item, to open the file in whatever default editor is set to open “.ps1” files on your machine. This might be Notepad, or you can set it to be the PowerShell ISE as well. Just execute the cmdlet followed by the $PROFILE variable (e.g. Invoke-Item $PROFILE).

One of the things I picked up on when I started using my profile more often was that you can actually format your console. More specifically, I like to shorten the “PS C:\Users\melton_admin” portion; if you start working in directories that are three or four folders deep, this can take up a good portion of your prompt. I came across a function for this whose original poster I truthfully cannot find, so sorry for not attributing it.

function prompt
{
    # On the first console row show the full path, otherwise just a short PS[row]> prompt
    if ($host.UI.RawUI.CursorPosition.Y -eq 0) { "<$pwd> `n`r" + "PS[" + $host.UI.RawUI.CursorPosition.Y + "]> " }
    else { "PS[" + $host.UI.RawUI.CursorPosition.Y + "]> " }
}

Any function you save in your profile that performs an action can be called at any time in the PowerShell console, once the profile is loaded. However, if I want that action to take effect as the profile loads, I simply need to call the function at the end of the profile script. I just add these two lines and ensure they are always the last two lines of my profile; anything else added will go between the function above and these two lines:

prompt;
clear;

I use the clear command (just like using cls in the DOS prompt) to get rid of any output that a function or command in my profile may cause; it just starts me out on a fresh, clean slate.

If you want to test your profile script you can force it to load into your current session by doing this: .$profile. That is, enter “period $profile” and hit enter. Take note that since I use the clear command in my profile, if any cmdlet or one-liner I add outputs an error you will not see it. So when I have issues like this I simply comment the offending line out of my profile. You can put comments into your script using the pound sign (#); putting that before a command will cause it to be ignored and not run.

Set-ExecutionPolicy

PowerShell is locked down by default for security, so in certain operating system environments when you try to run your profile script above you may have gotten an error like this:

ExecutionPolicyError

This means pretty much what it says: execution of scripts is disabled. To enable it you need to use the Set-ExecutionPolicy cmdlet with a few parameters. You can find the documentation for this by looking at “about_Execution_Policies” in PowerShell or following the link in the error; it explains the various options and policies you can set. The command below will allow you to execute scripts in your console and let it load your profile scripts:

Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned

Summary

In this post I pointed out the following cmdlets and concepts:

  • New-Item
  • Invoke-Item
  • Commenting in your script
  • Set-ExecutionPolicy

These are fairly basic areas of PowerShell, and putting each one into your favorite search engine should lead you to a plentiful list of reading material. This post by no means covered everything you can do with profiles, but was simply meant to get you started, and hopefully excited about what can be done.

Categories: DBA Blogs

BI Forum 2015 Preview — OBIEE Regression Testing, and Data Discovery with the ELK stack

Rittman Mead Consulting - Fri, 2015-04-24 06:18

I’m pleased to be presenting at both of the Rittman Mead BI Forums this year; in Brighton it’ll be my fourth time, whilst Atlanta will be my first, and my first trip to the city too. I’ve heard great things about the food, and I’m sure the forum content is going to be awesome too (Ed: get your priorities right).

OBIEE Regression Testing

In Atlanta I’ll be talking about Smarter Regression testing for OBIEE. The topic of Regression Testing in OBIEE is one that is – at last – starting to gain some real momentum. One of the drivers of this is the recognition in the industry that a more Agile approach to delivering BI projects is important, and to do this you need to have a good way of rapidly testing changes made. The other driver that I see is OBIEE 12c and the Baseline Validation Tool that Oracle announced at Oracle OpenWorld last year. Understanding how OBIEE works, and therefore how changes made can be tested most effectively, is key to a successful and efficient testing process.

In this presentation I’ll be diving into the OBIEE stack and explaining where it can be tested and how. I’ll discuss the common approaches and the relative strengths of each.

If you’ve not registered for the Atlanta BI Forum then do so now as places are limited and selling out fast. It runs May 14–15 with an optional masterclass on Wednesday 13th May from Mark Rittman and Jordan Meyer.

Data Discovery with the ELK Stack

My second presentation is at the Brighton forum the week before Atlanta, and I’ll be talking about Data Discovery and Systems Diagnostics with the ELK stack. The ELK stack is a set of tools from a company called Elastic, comprising Elasticsearch, Logstash and Kibana (E – L – K!). Data Discovery is a crucial part of the life cycle of acquiring, understanding, and exploiting data (one could even say, leverage the data). Before you can operationalise your reporting, you need to understand what data you have, how it relates, and what insights it can give you. This idea of a “Discovery Lab” is one of the key components of the Information Management and Big Data Reference Architecture that Oracle and Rittman Mead produced last year:

ELK gives you great flexibility to ingest data with loose data structures and rapidly visualise and analyse it. I wrote about it last year with an example of analysing data from our blog and associated tweets with data originating in Hadoop, and more recently have been analysing twitter activity using it. The great power of Kibana (the “K” of ELK) is the ability to rapidly filter and aggregate data, as well as see a summary of values within a data field:

The second aspect of my presentation is still on data discovery, but “discovering data” within the logfiles of an application stack such as OBIEE. ELK is perfectly suited to in-depth diagnostics against dense volumes of log data that you simply could not handle within simple log viewers or Enterprise Manager, such as the individual HTTP requests and types of value passed within the interactions of a single user session:

By its nature of log streaming and full text search, ELK also lends itself well to near real time system monitoring dashboards reporting the status of systems including OBIEE and ODI, and I’ll be discussing this in more detail during my talk.

The Brighton BI Forum is on 7–8 May, with an optional masterclass on Wednesday 6th May from Mark Rittman and Jordan Meyer. If you’ve not registered for the Brighton BI Forum then do so now as places are very limited!

Don’t forget, we’re running a Data Visualisation Challenge at each of the forums, and if you need to convince your boss to let you go you can find a pre-written ‘justification’ letter here.

Categories: BI & Warehousing

Database landscape 2014 visualization

Marco Gralike - Fri, 2015-04-24 04:18
I saw this database landscape 2014 overview from “451 Research” with a very nice visualization…

Using the Oracle Developer Cloud Service for Git version management for JDeveloper/ADF apps

Shay Shmeltzer - Thu, 2015-04-23 10:35

The Oracle Developer Cloud Service (DevCS for short) provides a complete cloud-hosted development platform for your team. This makes it very easy to start adopting development best practices for your team, and adopt a more agile development approach.

If you haven't tried it yet, you should!

It's simple to get a free trial instance - just sign up for a trial of the Java cloud service (which, by the way, will take you through an Oracle ADF based registration wizard) and an instance of the Developer Cloud Service will be provisioned for you as part of the trial. No need for any additional machines or installations on your side.

I'm going to write a couple of blogs about the various features that DevCS provides such as build and continuous integration, but let's start with the very basic feature we all should be using - source code management.

Every project needs to do version management - even if you are a single developer - and with DevCS there is no server and network setup required. Create a new DevCS project and 10 seconds later you have your git server accessible from any computer that has internet access.

The demo below is using JDeveloper 12.1.3 and the sample summit ADF application that you can get from OTN. 

In the demo I start from scratch and show how to (a rough git command-line equivalent is sketched after this list):

  • create a new DevCS project
  • check code into the git repository
  • branch my code to work on fixes
  • submit the changes back
  • how to do code review by team members
  • merge fixes to the master branch
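
For reference, a rough command-line equivalent of those git steps (the repository URL is a placeholder - use the one shown on your own DevCS project's code page):

git clone https://your-devcs-instance.oraclecloud.com/your-project.git summit
cd summit
git checkout -b my-fix            # branch the code to work on a fix
# ...edit, then commit the changes...
git add .
git commit -m "Fix for issue"
git push origin my-fix            # push the branch back for review
git checkout master               # once reviewed, merge to master
git merge my-fix
git push origin master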

 

Go ahead try it out with your project and your team.

If you are new to git (which has quickly become the new standard for source management) - check out the Oracle A-Team blog entry that explains a good workflow for team work with git that you can adopt. 

Have any further questions about using the Developer Cloud Service? Ask them on the DevCS community page.

Categories: Development

Oracle Priority Support Infogram for 23-APR-2015

Oracle Infogram - Thu, 2015-04-23 10:31

Time to Patch!
From the Oracle E-Business Suite Technology blog:
Critical Patch Update for April 2015 Now Available
RDBMS
From that JEFF SMITH: A Quick Hit on Database Auditing Support
CDBs with less options now supported in Oracle 12.1.0.2. from Update your Database – NOW!
PL/SQL
Optimizing the PL/SQL Challenge II: How to Figure Out the Root of the Problem, from All Things SQL.
BI
From the Oracle BI applications blog, Customer RFM Analysis.
Analytics
From Business Analytics - Proactive Support: New Whitepaper: OBIEE to Essbase Authentication Methods
Fusion
Managing Attachments Using Web Services, from Fusion Applications Developer Relations
WebLogic
The WebLogic Partner Community Newsletter April 2015, from WebLogic Partner Community EMEA.
SOA
Enable SOA Composer in SOA Suite, from SOA & BPM Partner Community Blog.
MAF
From Shay Shmeltzer's Weblog: Dynamically refresh a part of a page (PPR) in Oracle MAF
Ops Center
Disabling ASR for a specific asset, from the Ops Center blog.
OEM
April 2015 EM Recommended Patch List, from Enterprise Manager Best Practices.
EBS
From the Oracle E-Business Suite Support blog:
Webcast: Pick Release Move Order Related To OPM Production Batches
Whats New in the Procurement Approval Analyzer - Version 200.2
Webcast: Oracle Receivables Balance Forward Billing (BFB) Setup & Usage
SR Automation Explained
Are You Considering Item Web Services for Oracle Product Hub?
There's a New Tool in Oracle Payroll...it's the Payroll Dashboard!
Webcast: Enhancement Request to My Oracle Support Community
From the Oracle E-Business Suite Technology blog:
Quarterly EBS Upgrade Recommendations: April 2015 Edition
Best Practices for Testing EBS Endeca Applications
JRE 1.8.0_45 Certified with Oracle E-Business Suite
JRE 1.7.0_79 and 1.7.0_80 Certified with Oracle E-Business Suite

Java JRE 1.6.0_95 Certified with Oracle E-Business Suite

ALTER TABLE INMEMORY

Yann Neuhaus - Thu, 2015-04-23 06:52

In-Memory Column Store is amazing. It brings very good performance to full table scans. It's easy: just 'flip a switch' and you accelerate all reporting queries on your table, without thinking about what to index and how. But in this post, I would like to warn you about the consequences when you just flip that switch. The new full table scan plan will replace the old ones... even before the table is populated in memory...
I'm not sure that this is the expected behaviour. In my opinion the CBO should consider INMEMORY plans only once the population is done. But here is the example.

Test case

Here is the testcase. I have a table DEMO with bitmap indexes on its columns:

12:04:54 SQL> create table DEMO compress as
12:04:54   2  with M as (select substr(dbms_random.string('U',1),1,1) U from dual connect by 10>=level)
12:04:54   3  select M1.U U1, M2.U U2, M3.U U3, M4.U U4 from M M1,M M2, M M3, M M4, (select * from dual connect by 1000>=level)
12:04:54   4  /
Table created.

12:05:00 SQL> create bitmap index DEMO_U1 on DEMO(U1);
Index created.
12:05:01 SQL> create bitmap index DEMO_U2 on DEMO(U2);
Index created.
12:05:03 SQL> create bitmap index DEMO_U3 on DEMO(U3);
Index created.
12:05:04 SQL> create bitmap index DEMO_U4 on DEMO(U4);
Index created.
And my test query on those columns:
12:05:05 SQL> alter session set statistics_level=all;
Session altered.
12:05:05 SQL> select distinct * from DEMO where U1='A' and U2>'X' and U3 in ('A','E') and U4='B';
no rows selected
with its execution plan:
12:05:06 SQL> select * from table(dbms_xplan.display_cursor(format=>'iostats last'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID  64skw45ghn5a0, child number 0
-------------------------------------
select distinct * from DEMO where U1='A' and U2>'X' and U3 in ('A','E')
and U4='B'

Plan hash value: 3881032911

---------------------------------------------------------------------------------------
| Id  | Operation                      | Name    | Starts | E-Rows | A-Rows | Buffers |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |         |      1 |        |      0 |       2 |
|   1 |  HASH UNIQUE                   |         |      1 |      2 |      0 |       2 |
|   2 |   TABLE ACCESS BY INDEX ROWID  | DEMO    |      1 |   4070 |      0 |       2 |
|   3 |    BITMAP CONVERSION TO ROWIDS |         |      1 |        |      0 |       2 |
|   4 |     BITMAP AND                 |         |      1 |        |      0 |       2 |
|   5 |      BITMAP MERGE              |         |      1 |        |      0 |       2 |
|*  6 |       BITMAP INDEX RANGE SCAN  | DEMO_U2 |      1 |        |      0 |       2 |
|*  7 |      BITMAP INDEX SINGLE VALUE | DEMO_U1 |      1 |        |      0 |       0 |
|*  8 |      BITMAP INDEX SINGLE VALUE | DEMO_U4 |      1 |        |      0 |       0 |
|   9 |      BITMAP OR                 |         |      1 |        |      0 |       0 |
|* 10 |       BITMAP INDEX SINGLE VALUE| DEMO_U3 |      1 |        |      0 |       0 |
|* 11 |       BITMAP INDEX SINGLE VALUE| DEMO_U3 |      1 |        |      0 |       0 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("U2">'X')
       filter("U2">'X')
   7 - access("U1"='A')
   8 - access("U4"='B')
  10 - access("U3"='A')
  11 - access("U3"='E')


34 rows selected.
Good. I'm happy with that plan. But I've got the In-Memory option, so probably I can get rid of those bitmap indexes.

alter table INMEMORY

Let's put that table in memory:

12:05:06 SQL> alter table DEMO inmemory priority none memcompress for query high;
Table altered.
and run that query again
12:05:06 SQL> select distinct * from DEMO where U1='A' and U2>'X' and U3 in ('A','E') and U4='B';
no rows selected

12:05:07 SQL> select * from table(dbms_xplan.display_cursor(format=>'iostats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  64skw45ghn5a0, child number 0
-------------------------------------
select distinct * from DEMO where U1='A' and U2>'X' and U3 in ('A','E')
and U4='B'

Plan hash value: 51067428

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Starts | E-Rows | A-Rows | Buffers | Reads  |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |      1 |        |      0 |   13740 |  13736 |
|   1 |  HASH UNIQUE                |      |      1 |      2 |      0 |   13740 |  13736 |
|*  2 |   TABLE ACCESS INMEMORY FULL| DEMO |      1 |   4070 |      0 |   13740 |  13736 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - inmemory(("U2">'X' AND "U1"='A' AND "U4"='B' AND INTERNAL_FUNCTION("U3")))
       filter(("U2">'X' AND "U1"='A' AND "U4"='B' AND INTERNAL_FUNCTION("U3")))
Here is my problem. Now that I have defined the table to be populated into the In-Memory Column Store, the CBO chooses an In-Memory plan for my query.

This is a FULL TABLE SCAN because you can only do full table scans from the In-Memory Column Store. But I have a problem. The column store is not yet populated:

12:05:07 SQL> select segment_name,inmemory_size,bytes_not_populated from v$im_segments;
no rows selected
So the FULL TABLE SCAN occurred on the row store. Look at the statistics above: 13740 logical reads from the buffer cache, and 13736 physical reads because that table is not in the buffer cache. I always used index access for it before, so the table blocks are not in the buffer cache, and the full table scan has a good chance of being done in direct-path.
I still have very good access from the bitmap indexes - which are still there - but now I'm doing a very expensive full table scan.

Population

Look at the same query two seconds later:

12:05:09 SQL> select distinct * from DEMO where U1='A' and U2>'X' and U3 in ('A','E') and U4='B';
no rows selected

12:05:09 SQL> select * from table(dbms_xplan.display_cursor(format=>'iostats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  64skw45ghn5a0, child number 0
-------------------------------------
select distinct * from DEMO where U1='A' and U2>'X' and U3 in ('A','E')
and U4='B'

Plan hash value: 51067428

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Starts | E-Rows | A-Rows | Buffers | Reads  |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |      1 |        |      0 |   11120 |  11117 |
|   1 |  HASH UNIQUE                |      |      1 |      2 |      0 |   11120 |  11117 |
|*  2 |   TABLE ACCESS INMEMORY FULL| DEMO |      1 |   4070 |      0 |   11120 |  11117 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - inmemory(("U2">'X' AND "U1"='A' AND "U4"='B' AND INTERNAL_FUNCTION("U3")))
       filter(("U2">'X' AND "U1"='A' AND "U4"='B' AND INTERNAL_FUNCTION("U3")))
It is just a bit better: 11117 physical reads instead of 13736. This is because some In-Memory Compression Units are already there in the In-Memory Column Store:
12:05:10 SQL> select segment_name,inmemory_size,bytes_not_populated from v$im_segments;

SEGMENT_NA INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- ------------- ---------- -------------------
DEMO             6815744  117440512            88973312
Among the 117440512 bytes (which is 14336 8k blocks) only 88973312 are not yet populated (10861 8k blocks). This is why a bit earlier the query still had to read 11120 blocks from buffer cache.

Let's wait a minute for population. Remember that during that time, population uses a lot of CPU in order to read the row store blocks, convert them to columnar format, compress them and store them into the column store.

12:06:04 SQL> select distinct * from DEMO where U1='A' and U2>'X' and U3 in ('A','E') and U4='B';
no rows selected

12:06:04 SQL> select * from table(dbms_xplan.display_cursor(format=>'iostats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
SQL_ID  64skw45ghn5a0, child number 0
-------------------------------------
select distinct * from DEMO where U1='A' and U2>'X' and U3 in ('A','E')
and U4='B'

Plan hash value: 51067428

---------------------------------------------------------------------------------
| Id  | Operation                   | Name | Starts | E-Rows | A-Rows | Buffers |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |      1 |        |      0 |       3 |
|   1 |  HASH UNIQUE                |      |      1 |      2 |      0 |       3 |
|*  2 |   TABLE ACCESS INMEMORY FULL| DEMO |      1 |   2546 |      0 |       3 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - inmemory(("U1"='A' AND "U4"='B' AND "U2">'X' AND INTERNAL_FUNCTION("U3")))
       filter(("U1"='A' AND "U4"='B' AND "U2">'X' AND INTERNAL_FUNCTION("U3")))
OK, now only 3 blocks were read from the buffer cache. I now have good performance that I can compare with what I had with the bitmap indexes.

This is because population is completed:

12:06:15 SQL> select segment_name,inmemory_size,bytes,bytes_not_populated from v$im_segments;

SEGMENT_NA INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- ------------- ---------- -------------------
DEMO            31195136  117440512                   0

Conclusion

My conclusion is that altering a table to populate it into the In-memory Column Store looks like an easy operation. But it is not. When you do that:

  • You change the plans to FULL TABLE SCAN which will not be optimal until the table is fully populated.
  • You trigger the population which will increase a lot your server CPU usage
  • You risk getting tables that are only partially populated if you're in RAC, or if you don't have enough space in the inmemory_size
So this is something to plan and to monitor. And you will also need to think about what happens if your instance crashes and you have to restart it. How long will it take to get back to correct performance?
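
One thing that can help here (a sketch only, reusing the DEMO table from above): give the segment a population priority other than NONE, so that it is repopulated automatically after a restart instead of waiting for the first full scan to trigger population.

alter table DEMO inmemory priority high memcompress for query high;
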
And that's even without asking yourself yet whether you can drop those bitmap indexes that are now superseded by the In-Memory Column Store.

Of course, there are solutions for any problem. If you are on Exadata, then SmartScan will come to the rescue until the IMCS is populated: the full table scan is offloaded to the storage nodes, and database node CPU resources are available for quick population. In that way, they are complementary.

Singapore Maths Question Solution and Very Interesting Observation (The Trickster)

Richard Foote - Thu, 2015-04-23 02:22
OK, time to reveal the solution to the somewhat tricky Singapore maths exam question I blogged previously. Remember, there were 10 dates: May 13, May 15, May 19, June 13, June 14, July 16, July 18, August 14, August 15, August 16. Bowie only knew the month of my birthday, Ziggy only knew the day. Bowie […]
Categories: DBA Blogs

Golden Oldies

Jonathan Lewis - Thu, 2015-04-23 01:45

I’ve just been motivated to resurrect a couple of articles I wrote for DBAZine about 12 years ago on the topic of bitmap indexes. All three links point to Word 97 documents which I posted on my old website in September 2003. Despite their age they’re still surprisingly good.

Update: 26th April 2015

My reply to comment #2 below prompted me to look at what I said about bitmap indexes in Practical Oracle 8i (published 15 years ago), and I found this gem:

An interesting feature of bitmap indexes is that it is rather hard to predict how large the index segment will be. The size of a B-tree index is based very closely on the number of rows and the typical size of the entries in the index column. The size of a bitmap index is dictated by a fairly small number of bit-strings which may have been compressed to some degree depending upon the number of consecutive 1’s and 0’s.

To pick an extreme example, imagine a table of one million rows that has one column that may contain one of eight values ‘A’ to ‘H’ say, which has been generated in one of the two following extreme patterns:

  • All the rows for a given value appear together, so scanning down the table we get 125,000 rows with ‘A’ followed by 125,000 rows of ‘B’ and so on.
  • The rows cycle through the values in turn, so scanning down the table we get ‘A’,’B’. . . ‘H’ repeated 125,000 times.

What will the bitmap indexes look like in the two cases?

For the first example, the basic map for the ‘A’ value will be 125,000 one-bits, followed by 875,000 zero bits – which will be trimmed off. Splitting the 125,000 bits into bytes and adding the necessary overhead of about 12% we get an entry of the ‘A’ rows of 18K. A similar argument applies for each of the values ‘B’ to ‘H’, so we get a total index size of around 8 x 18K – giving 156K.

For the second example, the basic map for the ‘A’ value will be a one followed by 7 zeros, repeated 125,000 times. There is no chance of compression here, so the ‘A’ entry will start at 125,000 bytes. Adding the overhead this goes up to 140K, and repeating the argument for the values ‘B’ to ‘H’ we get a total index of 1.12 MB.

This wild variation in size looks like a threat, but to put this into perspective, a standard B-tree index on this column would run to about 12 Mb irrespective of the pattern of the data. It would probably take about ten times as long to build as well.

As we can see, the size of a bitmap index can be affected dramatically by the packing of the column it depends upon as well as the number of different possible values the column can hold and the number of rows in the table. The compression that is applied before the index is stored, and the amazing variation in the resulting index does mean that the number of different values allowed in the column can be much larger than you might first expect. In fact it is often better to think of bitmap indexes in terms of how many occurrences of each value there are, rather than in terms of how many different values exist. Viewing the issue from this direction, a bitmap is often better than a B-tree when each value occurs more than a few hundred times in the table (but see the note below following the description of bitmap index entries).
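
To see this effect for yourself, here is a minimal sketch along the lines of the book's example. The table and index names are made up for illustration, and the exact sizes will vary with version, block size and pctfree:

-- clustered pattern: all rows for a given value stored together
create table t_clustered as
select chr(65 + trunc((rownum - 1) / 125000)) col1, rpad('x',50) padding
from   dual connect by level <= 1000000;

-- scattered pattern: the eight values cycle row by row
create table t_scattered as
select chr(65 + mod(rownum - 1, 8)) col1, rpad('x',50) padding
from   dual connect by level <= 1000000;

create bitmap index t_clustered_bi on t_clustered(col1);
create bitmap index t_scattered_bi on t_scattered(col1);

-- compare the resulting index segment sizes
select segment_name, bytes/1024 as kb
from   user_segments
where  segment_name in ('T_CLUSTERED_BI','T_SCATTERED_BI');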

 


Is MERGE a bug?

Chet Justice - Wed, 2015-04-22 20:57
A few years back I pondered whether DISTINCT was a bug.

My premise was that if you are depending on DISTINCT to return a correct result set, something is seriously wrong with your table design. I was reminded of this again recently when I ran across Kent Graziano's post on Better Data Modeling: Are you making these 3 beginner mistakes in your data models?. Specifically:
Instead of that, you should be defining a natural, or business, key for every table in your system. A natural key is an attribute or set of attributes (that occur naturally in the data set) required to uniquely identify a row in that table. In addition you should define a Unique Key Constraint on those attributes in the database. Then you can be sure you will not get any duplicate data into the tables.

CLARIFICATION: This point has caused a lot of questions and comments. To be clear, the mistake here is to have ONLY defined a surrogate key. I believe that even if using surrogate keys is the best solution for your design, you should ALSO define an alternate unique natural key.

So why MERGE?

I learned about the MERGE statement in 2008. During an interview, Frank Davis asked me about when I would use it. I didn't even know what it was (and admitted that) but I went home that night and...wait...I think he asked me about multi table inserts. Whatever, credit is still going to Mr. Davis. Where was I? OK, so I had been working with Oracle for about 6 years at that point and I didn't know about it. My initial reaction was to use it everywhere (not really)! You know, shiny object and all. Look! Squirrel!

Why am I considering MERGE a bug? Let me be more specific. I was working with a couple of tables and had not written the API for them yet and a developer was writing some PL/SQL to update the records from APEX. In his loop he had a MERGE. I realized at that moment there was 1, no surrogate key and 2, no natural key defined (which ties in with Kent's comments up above). Upon realizing the developer was doing this, I knew immediately what the problem was (besides not using a PL/SQL API to nicely encapsulate the business logic). The table was poorly designed.

Easy fix. Update the table with a surrogate key and define a natural key. I was thankful for the reminder, I hadn't added the unique constraint yet. Of course had I written the API already I probably would have noticed the design error, either way, a win for design.

Now, there are perfectly good occasions to use the MERGE statement. Most of those, to me anyway, relate to legacy systems where you don't have the ability to change the underlying table structures (or it's just cost-prohibitive), or to ETL, where you want to load or update a dimension table in your data warehouse.
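
For that kind of dimension load, a minimal sketch of what such a MERGE might look like. The table, column and sequence names here are hypothetical, and the natural key (customer_nk) is assumed to be protected by a unique constraint as discussed above:

merge into dim_customer d
using stg_customer s
on    (d.customer_nk = s.customer_nk)   -- match on the natural key
when matched then
  update set d.customer_name = s.customer_name,
             d.city          = s.city,
             d.last_updated  = sysdate
when not matched then
  insert (customer_id, customer_nk, customer_name, city, last_updated)
  values (dim_customer_seq.nextval, s.customer_nk, s.customer_name, s.city, sysdate);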

Noons, how's that? First time out in 10 months. Thanks for the push.
Categories: BI & Warehousing

ASU, edX and The Black Knight: MOOCs are not dead yet

Michael Feldstein - Wed, 2015-04-22 18:24

By Phil Hill

In 2012 I wrote a post during the emergence of MOOC mania, pointing out some barriers that must be overcome for the new model to survive.

So what are the barriers that must be overcome for the MOOC concept (in future generations) to become self-sustaining? To me the most obvious barriers are:

  • Developing revenue models to make the concept self-sustaining;
  • Delivering valuable signifiers of completion such as credentials, badges or acceptance into accredited programs;
  • Providing an experience and perceived value that enables higher course completion rates (most today have less than 10% of registered students actually completing the course); and
  • Authenticating students in a manner to satisfy accrediting institutions or hiring companies that the student identity is actually known.

Since that time, of course, the MOOC hype has faded away, partially based on the above barriers not being overcome.

Today, Arizona State University (ASU) and edX announced a new program, Global Freshman Academy, that takes direct aim at all four barriers and could be the most significant MOOC program yet. From the New York Times story:

Arizona State University, one of the nation’s largest universities, is joining with edX, a nonprofit online venture founded by M.I.T. and Harvard, to offer an online freshman year that will be available worldwide with no admissions process and full university credit.

In the new Global Freshman Academy, each credit will cost $200, but students will not have to pay until they pass the courses, which will be offered on the edX platform as MOOCs, or Massive Open Online Courses.

Later in the article we find out more details on pricing and number of courses.

The new program will offer 12 courses — eight make up a freshman year — created by Arizona State professors. It will take an unlimited number of students. Neither Mr. Agarwal nor Mr. Crow would predict how many might enroll this year.

The only upfront cost will be $45 a course for an identity-verified certificate. Altogether, eight courses and a year of credit will cost less than $6,000.

ASU will pay for the course development and edX will pay for the platform. They eventually hope to get foundation funding, but ASU president Michael Crow promised that “we’re going ahead no matter what”.

This is a big commitment, and it will be interesting to see the results of a program that addresses revenue models, identity verification, completion rates and awarding actual credit. As Crow described:

“We were not big believers in MOOCs without credit, courses without a connection to degrees, so we focused our attention on building degree programs,” Mr. Crow said.

Pay attention to this one, whether you’re a MOOC fan or not.

Update:

The post ASU, edX and The Black Knight: MOOCs are not dead yet appeared first on e-Literate.

PLAN_HASH_VALUE calculation different HP-UX and Linux?

Bobby Durrett's DBA Blog - Wed, 2015-04-22 17:01

I’m trying to compare how a query runs on two different 11.2.0.3 systems.  One runs on HP-UX Itanium and one runs on 64 bit x86 Linux.  Same query, same plan, different hash value.

HP-UX:

SQL_ID 0kkhhb2w93cx0
--------------------
update seg$ set type#=:4,blocks=:5,extents=:6,minexts=:7,maxexts=:8,exts
ize=:9,extpct=:10,user#=:11,iniexts=:12,lists=decode(:13, 65535, NULL,
:13),groups=decode(:14, 65535, NULL, :14), cachehint=:15, hwmincr=:16,
spare1=DECODE(:17,0,NULL,:17),scanhint=:18, bitmapranges=:19 where
ts#=:1 and file#=:2 and block#=:3

Plan hash value: 1283625304

----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT      |                |       |       |     2 (100)|          |
|   1 |  UPDATE               | SEG$           |       |       |            |          |
|   2 |   TABLE ACCESS CLUSTER| SEG$           |     1 |    65 |     2   (0)| 00:00:01 |
|   3 |    INDEX UNIQUE SCAN  | I_FILE#_BLOCK# |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Linux:

SQL_ID 0kkhhb2w93cx0
--------------------
update seg$ set type#=:4,blocks=:5,extents=:6,minexts=:7,maxexts=:8,exts
ize=:9,extpct=:10,user#=:11,iniexts=:12,lists=decode(:13, 65535, NULL,
:13),groups=decode(:14, 65535, NULL, :14), cachehint=:15, hwmincr=:16,
spare1=DECODE(:17,0,NULL,:17),scanhint=:18, bitmapranges=:19 where
ts#=:1 and file#=:2 and block#=:3

Plan hash value: 2170058777

----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT      |                |       |       |     2 (100)|          |
|   1 |  UPDATE               | SEG$           |       |       |            |          |
|   2 |   TABLE ACCESS CLUSTER| SEG$           |     1 |    64 |     2   (0)| 00:00:01 |
|   3 |    INDEX UNIQUE SCAN  | I_FILE#_BLOCK# |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

I wonder if endianness plays into the plan hash value calculation, or if it is just a port-specific calculation?

Odd.
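
For anyone who wants to dig further, a rough sketch would be to pull the plan lines from both systems and diff them. The column list below is only a guess at the structural attributes that might feed the hash (cost, rows and bytes are generally understood not to), not a documented formula:

select id, parent_id, operation, options, object_owner, object_name
from   v$sql_plan
where  sql_id = '0kkhhb2w93cx0'
order  by id;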

– Bobby

Categories: DBA Blogs

Successful launch dbi services Zürich

Yann Neuhaus - Wed, 2015-04-22 12:50

Yesterday evening dbi services (headquartered in Delémont) officially launched its third branch, in Zürich (besides Basel and Lausanne). Five years after its take-off, the "Oracle Database Partner of Year 2014" employs more than 40 consultants. I would like to use this opportunity to thank all the customers and partners who trust dbi services. A particular thanks goes to the customers and partners who helped us enjoy a very pleasant inauguration party yesterday.

Thanks also to Mr Thomas Salzmann (KKG), who presented our successful collaboration, and to Massimo Castelli (Oracle), who presented the challenges of recruitment in the IT industry. I was pleased to see that large players like Oracle do, sometimes, face the same challenges as mid-sized companies :-) .

This adventure would not have been possible without our incredible teams, working hard every day to transform ideas and problems into projects and solutions. dbi services will continue to leverage the skills of its employees and to look for opportunities, in order to remain a top-level provider for the operating system, database and middleware layers.

A final thanks goes to Kurt Meier, who will lead the dbi services branch in Zürich, and for the very good organisation of this party. Having already won the first customers, Kurt has proved that dbi services will succeed in this new challenge.


Dynamically refresh a part of a page (PPR) in Oracle MAF

Shay Shmeltzer - Wed, 2015-04-22 10:23

A common question for developers who are just starting with Oracle MAF, especially if they have a background in Oracle ADF, is how do you do a partial page refresh in Oracle MAF.

Partial Page Refresh basically means that I want to change something in my UI based on another event in the UI - for example, hide or show a section of the page. (In ADF there is a partialTriggers property on components, which doesn't exist in MAF.)

In MAF the UI behaves differently - it is not based on JSF after all - the UI directly reflects changes in managed beans as long as it knows about those changes. How does it know? You need to enable firing property change notifications. This is actually quite easy to do - just turn on the checkbox when you generate the accessors in JDeveloper and it will do the job for you.

Here is a quick demo showing you how to achieve this:

Here is the code used.

in AMX page:

    <amx:tableLayout id="tl1">
      <amx:rowLayout id="rl1">
        <amx:cellFormat id="cf2">
          <amx:listView var="row" showMoreStrategy="autoScroll" bufferStrategy="viewport" id="lv1">
            <amx:listItem id="li1">
              <amx:outputText value="ListItem Text" id="ot2"/>
              <amx:setPropertyListener id="spl1" from="#{'true'}" to="#{viewScope.backingPPR.showIt}"
                                       type="swipeRight"/>
              <amx:setPropertyListener id="spl2" from="#{'false'}" to="#{viewScope.backingPPR.showIt}"
                                       type="swipeLeft"/>
            </amx:listItem>
          </amx:listView>
        </amx:cellFormat>
      </amx:rowLayout>
      <amx:rowLayout id="rl2" rendered="#{viewScope.backingPPR.showIt}">
        <amx:cellFormat id="cf1">
          <amx:commandButton text="commandButton1" id="cb3"/>
        </amx:cellFormat>
      </amx:rowLayout>
    </amx:tableLayout>


in managed bean:

    // Imports assumed (they were not shown in the original snippet):
    //   import oracle.adfmf.java.beans.PropertyChangeListener;
    //   import oracle.adfmf.java.beans.PropertyChangeSupport;

    private boolean showIt = false;

    // The PropertyChangeSupport instance that fires the notifications the AMX page listens to.
    // Declared here because the original snippet referenced it without showing the declaration.
    private final PropertyChangeSupport propertyChangeSupport = new PropertyChangeSupport(this);

    public void setShowIt(boolean showIt) {
        boolean oldShowIt = this.showIt;
        this.showIt = showIt;
        // This is what makes the UI refresh: notify listeners that the value has changed.
        propertyChangeSupport.firePropertyChange("showIt", oldShowIt, showIt);
    }

    public boolean isShowIt() {
        return showIt;
    }

    public void addPropertyChangeListener(PropertyChangeListener l) {
        propertyChangeSupport.addPropertyChangeListener(l);
    }

    public void removePropertyChangeListener(PropertyChangeListener l) {
        propertyChangeSupport.removePropertyChangeListener(l);
    }


Categories: Development