
Feed aggregator

Announcement: PostgreSQL plugin for EM12c

Jean-Philippe Pinte - Thu, 2014-04-03 11:20
Next May, Blue Medora will make a PostgreSQL plugin available for EM12c.

The plugin will be available free of charge, with support provided through the forum.

A Simple Way to Monitor Java in Linux

Pythian Group - Thu, 2014-04-03 07:49

A quick and easy way to find out what it is inside a Java process that is using your CPU, using just Linux command-line tools and the command-line utilities supplied with the JDK.

Introduction

Here are a few things you need to know before starting. Following the links is not necessary; they are provided for reference.

  • there are different vendors of the Java Virtual Machine. This post is about Oracle’s JVM, which is called HotSpot, with Linux x86-64 as the OS platform. Most of what is said about HotSpot applies to other vendors too, with slight changes, and OSes other than Linux may add some more complications
  • it’s called a Virtual Machine because it virtualizes the runtime environment for a Java application, so to know where to look you need to know a few things about how the specific VM is organized. For a detailed overview of HotSpot, please refer to this article
  • on Linux, a thread inside the HotSpot VM is mapped directly to an OS-level thread. This may not be true on all OSes, but it is for modern Linux kernels, so every OS-level thread of the process is a thread inside the Java application
  • there are generally two types of threads inside a HotSpot VM: native threads and application threads. Application threads are those that run Java code, and this is usually what applications use to run their own code. Native threads run code which is not written in Java, usually C/C++, and usually these are special utility threads started by the VM itself.
Identifying Threads

Since a Java program may start many threads, each executing some program code, it is necessary to understand which threads are using CPUs. On Linux, top -H will show you CPU usage on a per-thread basis. Here is an example. First, a process which consumes CPU:

top - 16:32:29 up 10:29,  3 users,  load average: 1.08, 0.64, 0.56
Tasks: 172 total,   1 running, 171 sleeping,   0 stopped,   0 zombie
Cpu(s): 48.7%us, 51.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   1500048k total,  1476708k used,    23340k free,    62012k buffers
Swap:  4128764k total,    75356k used,  4053408k free,   919836k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7599 oracle    20   0 1151m  28m 9100 S 85.5  1.9   0:12.67 java
 2575 oracle    -2   0  709m  10m 8336 S 10.6  0.7  47:34.05 oracle
 2151 root      20   0  207m  44m 6420 S  1.7  3.0   0:27.18 Xorg

If we check the details of CPU usage for PID=7599 with "top -H -p 7599", then we will see something like this:

top - 16:40:39 up 10:37,  3 users,  load average: 1.47, 1.25, 0.90
Tasks:  10 total,   1 running,   9 sleeping,   0 stopped,   0 zombie
Cpu(s): 49.3%us, 50.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   1500048k total,  1460468k used,    39580k free,    50140k buffers
Swap:  4128764k total,    76208k used,  4052556k free,   912644k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7601 oracle    20   0 1151m  28m 9100 R 85.9  1.9   7:19.98 java
 7602 oracle    20   0 1151m  28m 9100 S  1.0  1.9   0:02.95 java
 7599 oracle    20   0 1151m  28m 9100 S  0.0  1.9   0:00.01 java

So there is one execution thread inside the Java process which is constantly at the top and is utilizing around 85% of a single core.

Now the next thing to find out is what this thread is doing. To answer that question we need two things: the thread stacks from the Java process and a way to map an OS-level thread to a Java thread. As I mentioned previously, there is a one-to-one mapping between OS-level and Java-level threads in HotSpot running on Linux.

To get a thread dump we need to use a JDK utility called jstack:

[oracle@oel6u4-2 test]$ jstack 7599
2014-02-28 16:57:23
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.12-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f05a0001000 nid=0x1e66 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Low Memory Detector" daemon prio=10 tid=0x00007f05c4088000 nid=0x1db8 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f05c4085800 nid=0x1db7 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f05c4083000 nid=0x1db6 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f05c4081000 nid=0x1db5 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f05c4064800 nid=0x1db4 in Object.wait() [0x00007f05c0631000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000eb8a02e0> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
	- locked <0x00000000eb8a02e0> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x00007f05c4062800 nid=0x1db3 in Object.wait() [0x00007f05c0732000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000eb8a0380> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:485)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
	- locked <0x00000000eb8a0380> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]
   java.lang.Thread.State: RUNNABLE
	at java.security.SecureRandom.getInstance(SecureRandom.java:254)
	at java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:176)
	at java.security.SecureRandom.<init>(SecureRandom.java:133)
	at RandomUser.main(RandomUser.java:9)

"VM Thread" prio=10 tid=0x00007f05c405c000 nid=0x1db2 runnable

"VM Periodic Task Thread" prio=10 tid=0x00007f05c408b000 nid=0x1db9 waiting on condition

JNI global references: 975

To map an OS-level thread to a Java thread in a thread dump, we need to convert the native thread ID from Linux to base 16 and search for “nid=$ID” in the stack traces. In our case the thread ID is 7601, which is 0x1db1, and that Java thread had the following stack trace at the time jstack was run:

"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]
   java.lang.Thread.State: RUNNABLE
	at java.security.SecureRandom.getInstance(SecureRandom.java:254)
	at java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:176)
	at java.security.SecureRandom.<init>(SecureRandom.java:133)
	at RandomUser.main(RandomUser.java:9)
A Way to Monitor

Here is a way to get the stack trace of one thread inside a Java process with command-line tools (PID and TID are the process ID of the Java process and the OS-level ID of the thread of interest, respectively):

[oracle@oel6u4-2 test]$ jstack $PID | awk '/ nid='"$(printf '%#x' $TID)"' /,/^$/'
"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:220)
	at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:185)
	at sun.security.provider.NativePRNG$RandomIO.ensureBufferValid(NativePRNG.java:247)
	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:261)
	- locked <0x00000000eb8a3370> (a java.lang.Object)
	at sun.security.provider.NativePRNG$RandomIO.access$200(NativePRNG.java:108)
	at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:97)
	at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
	- locked <0x00000000e43adc90> (a java.security.SecureRandom)
	at java.security.SecureRandom.next(SecureRandom.java:455)
	at java.util.Random.nextInt(Random.java:189)
	at RandomUser.main(RandomUser.java:9)

As you can see, the thread is executing the main method of the RandomUser class – at least at the time the thread dump was taken. If you would like to see how this changes over time, then a simple watch command may help to get an idea of whether this thread’s stack changes frequently or not:

watch -n .5 "jstack $PID | awk '/ nid='"$(printf '%#x' $TID)"' /,/^$/'"

Every 0.5s: jstack 7599 | awk '/ nid='0x1db1' /,/^$/'                                                                                Fri Mar 14 16:29:37 2014

"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]
   java.lang.Thread.State: RUNNABLE
        at java.util.LinkedHashMap$LinkedHashIterator.<init>(LinkedHashMap.java:345)
        at java.util.LinkedHashMap$KeyIterator.<init>(LinkedHashMap.java:383)
        at java.util.LinkedHashMap$KeyIterator.<init>(LinkedHashMap.java:383)
        at java.util.LinkedHashMap.newKeyIterator(LinkedHashMap.java:396)
        at java.util.HashMap$KeySet.iterator(HashMap.java:874)
        at java.util.HashSet.iterator(HashSet.java:153)
        at java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1005)
        at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1004)
        at java.security.SecureRandom.getPrngAlgorithm(SecureRandom.java:523)
        at java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:165)
        at java.security.SecureRandom.<init>(SecureRandom.java:133)
        at RandomUser.main(RandomUser.java:9)

This way you can see what the application thread is doing right now. Since it could be doing quite a few different types of work, the next reasonable step is to add aggregation:

[oracle@oel6u4-2 test]$ ./prof.sh 7599 7601
Sampling PID=7599 every 0.5 seconds for 10 samples
      6  "main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]    java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:220)
	at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:185)
	at sun.security.provider.NativePRNG$RandomIO.ensureBufferValid(NativePRNG.java:247)
	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:261)
	- locked <address> (a java.lang.Object)
	at sun.security.provider.NativePRNG$RandomIO.access$200(NativePRNG.java:108)
	at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:97)
	at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
	- locked <address> (a java.security.SecureRandom)
	at java.security.SecureRandom.next(SecureRandom.java:455)
	at java.util.Random.nextInt(Random.java:189)
	at RandomUser.main(RandomUser.java:9)

      2  "main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]    java.lang.Thread.State: RUNNABLE
	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:268)
	- locked <address> (a java.lang.Object)
	at sun.security.provider.NativePRNG$RandomIO.access$200(NativePRNG.java:108)
	at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:97)
	at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
	- locked <address> (a java.security.SecureRandom)
	at java.security.SecureRandom.next(SecureRandom.java:455)
	at java.util.Random.nextInt(Random.java:189)
	at RandomUser.main(RandomUser.java:9)

      1  "main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]    java.lang.Thread.State: RUNNABLE
	at java.util.Random.nextInt(Random.java:189)
	at RandomUser.main(RandomUser.java:9)

      1  "main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]    java.lang.Thread.State: RUNNABLE
	at java.security.SecureRandom.next(SecureRandom.java:452)
	at java.util.Random.nextInt(Random.java:189)
	at RandomUser.main(RandomUser.java:9)

Here is what’s inside the prof.sh script:

#!/bin/bash

P_PID=$1
P_NID=$2

if [ "$P_SLEEP" == "" ]; then
  P_SLEEP=0.5
fi

if [ "$P_CNT" == "" ]; then
  P_CNT=10
fi

echo Sampling PID=$P_PID every $P_SLEEP seconds for $P_CNT samples

if [ "$P_NID" == "" ]; then
  CMD="awk '//'"
else
  CMD="awk '/ nid='"$(printf '%#x' $P_NID)"' /,/^$/'"
fi

for i in `seq $P_CNT`
do
  jstack $P_PID | eval $CMD
  sleep $P_SLEEP;
done |
  awk ' BEGIN { x = 0; s = "" }
    /nid=/ { x = 1; }
    // {if (x == 1) {s = s ", "gensub(/<\w*>/, "<address>", "g") } }
    /^$/ { if ( x == 1) { print s; x = 0; s = ""; } }' |
  sort | uniq -c | sort -n -r | head -10 |
  sed -e 's/$/\n/g' -e 's/\t/\n\t/g' -e 's/,//g'

The idea of the script is based on the method from the poor man’s profiler, adapted for HotSpot thread dumps. The script does the following (an example run with the defaults overridden is shown after the list):

  • Takes $P_CNT thread dumps of the Java process ID passed as $1 (10 by default)
  • If a native thread ID has been supplied as $2, then searches for the thread stack of this thread in the thread dump
  • Concatenates each thread stack trace into a comma-separated string
  • Aggregates strings and sorts them by the number of occurrences
  • Prettifies the output: removes tabs, commas, and adds new lines back to the thread stack
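For example, to take 30 samples two seconds apart for the same process and thread as above, you can override the defaults through the P_SLEEP and P_CNT environment variables that the script checks:

P_SLEEP=2 P_CNT=30 ./prof.sh 7599 7601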
Conclusion

With a few small tools it is possible to understand quite a lot in almost any situation related to Java: you can find the most frequent stack trace by sampling thread dumps.
With this knowledge it is then easy to understand why an issue is happening. In my test case, the application constantly generates random numbers without a pause, and one thread is occupying one CPU core.

Categories: DBA Blogs

Cache anomaly

Jonathan Lewis - Thu, 2014-04-03 06:27

Just a quick heads-up for anyone who likes to play around with the Keep and Recycle caches.

In 11g Oracle introduced the option for serial direct path reads for tablescans on tables that were sufficiently large – which meant larger than the small_table_threshold – provided the table wasn’t already sufficiently well cached.  (The rules mean that the choice of mechanism can appear to be a little random in the production environment for tables that are near the threshold size – but if you try testing by doing “alter system flush buffer_cache” you’ll find that you always get direct path reads in testing.)

I’ve just discovered a little oddity about this, though.  I have a table of about 50MB which is comfortably over the threshold for direct path reads. But if I create a KEEP cache (db_keep_cache_size) that is a little larger than the table and then assign the table to the KEEP cache (alter table xxx storage(buffer_pool keep)) then 11.2.0.4 stops doing direct path reads, and caches the table.
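For anyone who wants to try it, here is a minimal sketch of the kind of test described; the cache size is an illustrative assumption and t1 stands for the 50MB table:

alter system set db_keep_cache_size = 64M;
alter table t1 storage (buffer_pool keep);
alter system flush buffer_cache;

select count(*) from t1;

-- then check the session statistics (e.g. "physical reads direct")
-- to see whether the tablescan used direct path reads or went through the KEEP cache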

Now this doesn’t seem unreasonable – if you’ve assigned an object to the KEEP cache you probably want it (or once wanted it) to be kept in cache as much as possible; so using the KEEP cache if it’s defined and specified makes sense. The reason I mention this as an oddity, though, is that it doesn’t reproduce in 11.1.0.7.

I think I saw a bug note about this combination a few months ago – I was looking for something else at the time and, almost inevitably, I can’t find it when I want it – but I don’t remember whether it was the 11.1 or 11.2 behaviour that was deemed to be correct.

 Update

See comments 1 and 2 below.  I’ve written about this previously, and the caching behaviour is the correct behaviour. The patch is in 11.2.0.2 and backports are available for 11.1.0.7 and 11.2.0.1. The patch ensures that the table will be read into the cache if it is smaller than the db_keep_cache_size.  (Although we might want to check – see Tanel’s notes – whether this is based on the high water mark recorded in the segment header or on the optimizer stats for the table; and I suppose it might be worth checking that the same feature applies to index fast full scans). From the MoS document:

With the bug fix applied, any object in the KEEP buffer pool, whose size is less than DB_KEEP_CACHE_SIZE, is considered as a small or medium sized object. This will cache the read blocks and avoid subsequent direct read for these objects.

 

 


SoapUI: Property Based Assertions

Darwin IT - Thu, 2014-04-03 06:14
In many cases you need to, or are allowed to, send a messageId and a correlationId to a service, and in the response these IDs are mirrored back. In SoapUI it is pretty easy to generate those IDs using Groovy. The code for this is as follows:

def guidVal = "${java.util.UUID.randomUUID()}";
def testCase = testRunner.testCase;
testCase.setPropertyValue("MessageId", guidVal);
def msgDate = new Date() ;
def msgDateStr = msgDate.format("yyyy-MM-dd'T'HH:mm:ss");
testCase.setPropertyValue("MessageTimeStamp",msgDateStr);
As you can conclude from the code, this is a Groovy test step in a test case. In the succeeding SOAP request you can use something like the following to have the property values embedded in the message:

<ber:berichtHeader>
   <ber:messageData>
      <ber:messageId>${#TestCase#MessageId}</ber:messageId>
      <ber:messageTimestamp>${#TestCase#MessageTimeStamp}</ber:messageTimestamp>
   </ber:messageData>
   <ber:correlatieId>${#TestCase#MessageId}</ber:correlatieId>
</ber:berichtHeader>
Here you see that I use the same id for both the messageId and the correlationId. A correlationId might have a longer lifespan than a messageId, but in my (simple) case we have just one-to-one conversations. In the response to the message you might find something like the following:

<abct:berichtHeader>
   <abct:messageData>
      <abct:messageId>23c20898-9164-449d-87ef-3d9ed96ba946</abct:messageId>
      <abct:messageTimestamp>2014-04-03T13:40:25.147+02:00</abct:messageTimestamp>
      <abct:refToMessageId>23bf6b7a-61f9-4c91-b932-b288c5e358be</abct:refToMessageId>
   </abct:messageData>
   <abct:correlatieId>23bf6b7a-61f9-4c91-b932-b288c5e358be</abct:correlatieId>
</abct:berichtHeader>
Nice. Apparently SoapUI generated globally unique message ids and my service mirrored them back. But how do I test that automatically? The thing is that I don't know the expected value at design time, since the messageIds are generated at runtime. The nice thing in SoapUI, though, is that you can define properties at several levels (Global, Project, TestSuite, TestCase, etc.) and reference them almost everywhere. For instance, you can use project-level properties to build up a SOAP endpoint. So you could define an endpoint as follows:
http://${#Project#ServiceEndpoint}/${#Project#ServiceURI}
This can be used in assertions as well. So define an XPath Match assertion on your response and, instead of a fixed expected value, enter:
${#TestCase#messageId}
As you can see, you reference the TestCase property by prefixing it with #TestCase; this refers to the currently running TestCase. For the sharp readers: my property reference in the assertion is lower-init-case, whereas in the Groovy script and the message it has an initial capital. I found, though, that the property name is apparently not case-sensitive.
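For reference, such an XPath Match assertion could look roughly like the sketch below, with the expected content set to the property reference. The namespace URI here is an assumption; use the one actually declared in your response:

declare namespace abct='http://www.darwin-it.nl/abct';
//abct:berichtHeader/abct:correlatieId/text()

Expected content: ${#TestCase#messageId}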

BI Forum 2014 preview – No Silver Bullets : OBIEE Performance in the Real World

Rittman Mead Consulting - Thu, 2014-04-03 03:35

I’m honoured to have been accepted to speak at this year’s Rittman Mead BI Forum, the sixth year of this expert-level conference that draws some of the best Oracle BI/DW minds together from around the world. It’s running May 8th-9th in Brighton, and May 15-16th in Atlanta, with an optional masterclass from Cloudera’s Lars George the day before the conference itself at each venue.

My first visit to the BI Forum was in 2009 where I presented Performance Testing OBIEE, and now five years later (five years!) I’m back, like a stuck record, talking about the same thing – performance. That I’m still talking about it means that there’s still an audience for it, and this time I’m looking beyond just testing performance, but how it’s approached by people working with OBIEE. For an industry built around 1s and 0s, computers doing just what you tell them to and nothing else, there is a surprising amount of suspect folklore and “best practices” used when it comes to “fixing” performance problems.

[Image: OBIEE performance good luck charm]

Getting good performance with OBIEE is just a matter of being methodical. Understanding where to look for information is half the battle. By understanding where the time goes, improvements can be targeted where they will be most effective. Heavily influenced by Cary Millsap and his Method-R approach to performance, I will look at how to apply this practically to OBIEE. Most of the information needed to build up a full picture is readily available from OBIEE’s log files.

I’ll also dig a bit deeper into OBIEE, exploring how to determine how the system is behaving “under the covers”. The primary technique for this is through OBIEE’s DMS metrics, which I have written about recently in relation to the new Rittman Mead open-source tool, obi-metrics-agent, and which I am using day-to-day to rapidly examine and resolve performance problems that clients see.

I’m excited to be presenting again on this topic, and I hope to see you in Brighton next month. The conference always sells out, so don’t delay – register today!

Categories: BI & Warehousing

Corporations seek to find optimal database security

Chris Foot - Thu, 2014-04-03 01:46

Though it may sound counterintuitive, a number of database experts have claimed that a company may benefit from disclosing information regarding its IT infrastructure to competitors. This may seem like a network security nightmare in and of itself, but collaborating with other market participants may provide valuable insight as to how organizations can deter cybercriminals. Others prefer to stick with improvements issued by established professionals. 

Applying updates 
Possessing quality database protection is being seen more as a profit-driver than an expense, primarily due to the fact that if digital information is stolen from a corporate server, it could potentially result in millions of dollars in losses. It's no surprise that database administration services are being consulted now more than ever. In addition, the makers of the products these professionals interact with have assessed security concerns and sought to mitigate potential problems. 

Oracle NoSQL Database 3.0 was recently released, with improved performance, usability and safeguards. The upgrade utilizes cluster-wide, password-based user authentication and session-level SSL encryption techniques to deter cybercriminals from hacking into a company infrastructure. Andrew Mendelsohn, executive vice president of database server technologies for Oracle, claimed that it helps remote DBA personnel construct and deploy state-of-the-art applications in a secure environment. 

Walking around naked 
Corporations often misunderstand the advice of IT professionals to share security protocols with their competitors. It's not about exposing weaknesses to cybercriminals and providing them with a comprehensive framework of the database's infrastructure, it's about collaborating with like-minded executives attempting to find a solution to an issue that isn't going to disappear. 

Evan Schuman, a contributor to Computerworld, cited Full Disclosure, an online community through which database administration support, C-suite personnel and IT professionals could publicly report network breaches and discuss methods through which security problems could be resolved.

Due to the fact that gray hat hackers could access the forum, researchers would notify software companies at least 30 days prior to posting on the website so that the developers could apply the appropriate patches beforehand. This kind of initiative identified problems before cybercriminals could exploit them. Unfortunately, to the dismay of its participants, rumors have been circulating that Full Disclosure will shut down in the near future.

"By not having this place to expose them, the vulnerabilities will remain hidden longer, they will remain unpatched longer, yet the attacks will keep coming," said an anonymous security manager for a retailer. 

Ultimately, black hat hackers have extensive communities through which they can share the same kind of information that professionals post to Full Disclosure. If the website goes dark, cybercriminals will still have their networks of communication, while law-abiding IT industry participants will not. 

SQL Developer’s Interface for GIT: Cloning a GitHub Repository

Galo Balda's Blog - Wed, 2014-04-02 22:05

SQL Developer 4 provides an interface that allows us to interact with Git repositories. In this post, I’m going to show how to clone a GitHub repository (GitHub is a web-based hosting service for software development projects that uses the Git revision control system).

First you need to sign up for a GitHub account. You can skip this step if you already have one.

Your account will give you access to public repositories that could be cloned but I suggest you create your own repository so that you can play with SQL Developer and see what the different options can do.

Once you have an account, click on the green button that says “New Repository”. It will take you to a screen like this:

[Screenshot: github_create_repo]

Give your repository a name, decide if you want it to be public or private (you have to pay), click on the check box and then click on the green button. Now you should be taken to the main repository page.

[Screenshot: github_repo]

Pay attention to the red arrow on the previous image. It points to a text box that contains the HTTPS clone URL that we’re going to use in SQL Developer to connect to GitHub.

Let’s go to SQL Developer and click on Team –> Git –> Clone… to open the “Clone from Git Wizard”. Click on the next button and you should see the screen that lets you enter the repository details:

[Screenshot: remote_repo]

Enter the repository name, the HTTPS clone URL, your GitHub user name and your password. Click on next to connect to the repository and see the remote branches that are available.

[Screenshot: remote_branch]

The master branch gets created by default for every new repository. Take the defaults on this screen and click on next to get to the screen where you specify the destination for your local Git repository.

[Screenshot: destination]

Enter the path for the local repository and click on next. A summary screen is displayed showing the options you chose. Click on finish to complete the setup.

How do we know if it worked? Go to the path of your local repository and it should contain the same structure as in the online repository.
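As a side note, the wizard is doing the equivalent of a plain command-line clone, so you can double-check the result from a shell as well (the URL and path below are placeholders):

git clone https://github.com/your-account/your-repo.git /path/to/local/repository
ls /path/to/local/repository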

[Screenshot: local_repo]

In a future post I’ll show how to commit changes to the local repository and how to push them to GitHub.


Filed under: Source Control, SQL Developer Tagged: Source Control, SQL Developer
Categories: DBA Blogs

MaxPermSize Be Gone!

Steve Button - Wed, 2014-04-02 17:48

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
No further commentary required.
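Well, perhaps one footnote: in Java 8 the permanent generation has been replaced by Metaspace, so if you really do want to cap that area the closest equivalent flag is -XX:MaxMetaspaceSize, for example (myapp.jar is a placeholder):

java -XX:MaxMetaspaceSize=256m -jar myapp.jar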

Two Adaptive Plans Join Methods Examples

Bobby Durrett's DBA Blog - Wed, 2014-04-02 14:49

Here is a zip of two examples I built as I’m learning about the new adaptive plans features of Oracle 12c: zip

The first example has the optimizer underestimate the number of rows and the adaptive plans feature switches the plan on the fly from nested loops to hash join.

In the second example the optimizer overestimates the number of rows and the adaptive plans feature switches the plan from merge join to nested loops.

I ran the same scripts on 12c and 11.2.0.3 for comparison.

Example 1 11g:

Plan hash value: 2697562628

------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |      1 |        |      1 |00:00:00.01 |      18 |
|   1 |  SORT AGGREGATE               |      |      1 |      1 |      1 |00:00:00.01 |      18 |
|   2 |   NESTED LOOPS                |      |      1 |        |      8 |00:00:00.01 |      18 |
|   3 |    NESTED LOOPS               |      |      1 |      1 |      8 |00:00:00.01 |      17 |
|*  4 |     TABLE ACCESS FULL         | T1   |      1 |      1 |      8 |00:00:00.01 |      14 |
|*  5 |     INDEX RANGE SCAN          | T2I  |      8 |      1 |      8 |00:00:00.01 |       3 |
|   6 |    TABLE ACCESS BY INDEX ROWID| T2   |      8 |      1 |      8 |00:00:00.01 |       1 |
------------------------------------------------------------------------------------------------

Example 1 12c:

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |       6 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |      1 |      8 |00:00:00.01 |       6 |  2168K|  2168K|     1/0/0|
|*  3 |    TABLE ACCESS FULL| T1   |      1 |      1 |      8 |00:00:00.01 |       3 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |      1 |     16 |00:00:00.01 |       3 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Example 2 11g

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |      1 |        |      1 |00:00:00.01 |      16 |       |       |          |
|   1 |  SORT AGGREGATE               |      |      1 |      1 |      1 |00:00:00.01 |      16 |       |       |          |
|   2 |   MERGE JOIN                  |      |      1 |      4 |      1 |00:00:00.01 |      16 |       |       |          |
|   3 |    TABLE ACCESS BY INDEX ROWID| T2   |      1 |     16 |      2 |00:00:00.01 |       2 |       |       |          |
|   4 |     INDEX FULL SCAN           | T2I  |      1 |     16 |      2 |00:00:00.01 |       1 |       |       |          |
|*  5 |    SORT JOIN                  |      |      2 |      4 |      1 |00:00:00.01 |      14 | 73728 | 73728 |          |
|*  6 |     TABLE ACCESS FULL         | T1   |      1 |      4 |      1 |00:00:00.01 |      14 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

Example 2 12c

------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |      1 |        |      1 |00:00:00.01 |       5 |
|   1 |  SORT AGGREGATE               |      |      1 |      1 |      1 |00:00:00.01 |       5 |
|   2 |   NESTED LOOPS                |      |      1 |        |      1 |00:00:00.01 |       5 |
|   3 |    NESTED LOOPS               |      |      1 |      4 |      1 |00:00:00.01 |       4 |
|*  4 |     TABLE ACCESS FULL         | T1   |      1 |      4 |      1 |00:00:00.01 |       3 |
|*  5 |     INDEX RANGE SCAN          | T2I  |      1 |        |      1 |00:00:00.01 |       1 |
|   6 |    TABLE ACCESS BY INDEX ROWID| T2   |      1 |      1 |      1 |00:00:00.01 |       1 |
------------------------------------------------------------------------------------------------

The output of the plans for the 12c examples ends with this line:

Note
-----
   - this is an adaptive plan

So, that tells me it is the adaptive plan feature that is changing the plan despite the wrong estimate of the number of rows.
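As a side note (a sketch, not something taken from the zip): on 12c you can also ask dbms_xplan to display the full adaptive plan, including the inactive operations, by adding the ADAPTIVE keyword to the format string. The sql_id below is a placeholder:

select *
from   table(dbms_xplan.display_cursor('&sql_id', null, 'ALLSTATS LAST +ADAPTIVE'));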

- Bobby

 

Categories: DBA Blogs

Easy – Oops.

Jonathan Lewis - Wed, 2014-04-02 12:47

A question came up on OTN today asking for suggestions on how to enforce uniqueness on a pair of columns only when the second column was not null. There’s an easy and obvious solution – but I decided to clone the OP’s example and check that I’d typed my definition up before posting it; and the result came as a bit of a surprise. Here’s a demo script (not using the OP’s table):


create table t1  
(  
	col1	int not null,
	col2	varchar2(1)
);  

create unique index t1_i1 on t1( 
--	case col2 when null then cast(null as int) else col1 end,
--	case when col2 is null then cast(null as int) else col1 end,
	case when col2 is not null then col1 end,
	col2
)
;

insert into t1 values(1,null);
insert into t1 values(1,null);
insert into t1 values(1,'x');
insert into t1 values(1,'y');
insert into t1 values(1,'y');

commit;

column ind1_is   format a5
column ind1_when format 9999

set null N/A

select
	case when col2 is null then cast (null as int) else col1 end	ind1_is,
	case col2 when null then cast (null as int)  else col1 end	ind1_when
from 
	t1
;

The strategy is simple: you create a unique function-based index with two columns; the first column of the index is defined to show the first column of the table if the second column of the table is not null, and the second column of the index is simply the second column of the table. So if the second column of the table is null, both columns in the index are null and there is no entry in the index; but if the second column of the table is not null then the index copies both columns from the table and a uniqueness test applies.

Based on the requirement and definition you would expect the first 4 of my insert statements to succeed and the last one to fail. The index will then have two entries, corresponding to my 3rd and 4th insertions.

I’ve actually shown three ways to use the case statement to produce the first column of the index. The last version is the cleanest, but the first option is the one I first thought of – it’s virtually a literal translation of the original requirement. The trouble is, with my first definition the index acquired an entry it should not have got, and the second insert raised a “duplicate key” error; the error didn’t appear when I switched the syntax of the case statement to the second version.

That’s why the closing query of the demo is there – when you run it the two values reported should be the same as each other for all four rows in the table – but they’re not. This is what I got on 11.2.0.4:


IND1_IS IND1_WHEN
------- ---------
N/A             1
N/A             1
      1         1
      1         1


I’m glad I did a quick test before suggesting my original answer.

Anyone who has other versions of Oracle available is welcome to repeat the test and report back which versions they find working correctly (or not).
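When you repeat it, you can also check exactly which expression Oracle has stored for the index by querying the data dictionary – a quick sketch, run as the owner of the index:

select column_expression from user_ind_expressions where index_name = 'T1_I1';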

Update

It’s not a bug (see note 2 below from Jason Bucata), it’s expected behaviour.

 


Efficacy, Adaptive Learning, and the Flipped Classroom

Michael Feldstein - Wed, 2014-04-02 12:10

Speaking of efficacy and the complexity of measuring it, I had an interesting conversation the other day with Danae Hudson, a professor of Psychology at Missouri State University, about a course redesign effort that she participated in. The initial contact came from a P.R. firm hired by Pearson. Phil and I get a lot of these and turn down most of them. This one interested me for several reasons. First, it was an opportunity to talk directly to a faculty member who was involved in the project. (A tip to all you P.R. folks out there: You will have a better chance of getting our attention when the focus of the call is to put us in direct contact with your customers about what they are achieving.) Second, the project was facilitated by The National Center for Academic Transformation (NCAT). I am a big fan of NCAT’s work, despite the fact that they seem to have an almost pathological urge to sabotage efforts to bring their work the attention that it deserves. Pearson’s interest in the call was that MyPsychLab was an integral part of the course redesign. My interest was to see what I could learn about the interaction between educational technology products and educational practices in delivering educational efficacy.

What I heard tended to confirm my suspicions (and common sense): Educational technology products can produce significant learning gains, but they often do so by supporting changes in classroom practices.

The Goals

Like all NCAT redesign projects, this one has a complete write-up on the NCAT site. The document summarizes the redesign context and goals as follows:

Introductory Psychology is a semester-long, general education course at Missouri State University (MSU). The course falls within the self understanding/social behavioral perspective area of general education and is by far, the most popular choice for students within that area. Each academic year, at least 18 traditional face-to-face sections are offered with a total enrollment of 2,500-2,700 students. The course is lecture-based and typically taught by 65% full-time faculty and 35% adjunct instructors. While there are common general education goals across all sections, each instructor makes individual choices of content and delivery.

Despite being a popular choice among students, Introductory Psychology has traditionally experienced a high DFW rate (approximately 25%). The department wants to find ways to develop a more engaging course that will result in improved student learning outcomes and student satisfaction. Due to the large enrollment and numerous sections offered throughout the year, a significant number of adjunct instructors teach the course, which has contributed to some course drift and grade inflation. Currently, each section of 153 students is taught by one instructor, which significantly limits the type of activities that can be assigned and graded. The vast majority of the final course grade is derived from a series of multiple-choice exams. The goal is to redesign the course to be much more engaging and interactive, with an emphasis on true mastery of the course material.

To sum up: We have a popular Gen Ed course with a high failure and withdrawal rate. Danae also told me that the psychology department had long delivered a formative exam at the beginning of that class, and that they were unhappy with the level of improvement students were showing between the formative and summative exams. The faculty wanted to improve those numbers by making the course “more engaging and interactive, with an emphasis on the true mastery of the course material.”

This is typically where we start hearing that teaching effectively is expensive. But NCAT has a strong track record of proving that to be false. It turns out that ineffective teaching methods are often also inefficient. Let’s pause and think about the formulation of that last sentence for a moment. It’s not always the case that effective teaching measures are cost-efficient. Of course we know that good seminars with low teacher/student ratios can be very effective but, to adopt the current parlance, “don’t scale.”  In that situation, there is a tension between effectiveness and efficiency. But despite appearances, some traditional styles of teaching—most notably the classic ginormous lecture class—are both ineffective and inefficient. Why is that so? For several reasons. First, both the physical plant and the labor structure of the large lecture class limit its ability to scale. If you run out of lecture hall seats, or you run out of TAs, you have exhausted your ability to increase the number of students taught with the faculty that you have. The central innovation of video-based xMOOCs is that they remove this limitation without changing the underlying pedagogical model of the large lecture. But the central problem is that cost and effectiveness is a two-way street in education. In my last post, I discussed David Wiley’s argument that the cost of curricular materials impacts effectiveness insofar as cost limits student access to those materials. But it goes the other way too. There is a cost for every student who fails or withdraws from a class and therefore has to retake it. The direct cost is in the tuition paid for two classes rather than one—a cost paid by the financial aid providers in addition to the student—but indirect costs include increased chances that the student might have to stay an extra semester or drop out altogether, as well as the knock-on effect of the student blocking the seat for another student in an enrollment-capped but graduation-required course. NCAT typically doesn’t even look at these indirect costs and is often able to find significant direct cost savings by restructuring courses away from ineffective pedagogical approaches toward more effective pedagogical approaches that also happen to be more scalable. In MSU’s case, they projected that they would be able to lower the direct cost of the course by 17.8% while still achieving the primary goal of increasing effectiveness. The NCAT report notes,

The cost savings will remain in the psychology department and will be used to provide support for the redesigned course in the future, faculty wishing to take on additional course redesign projects and faculty travel to present at conferences related to the scholarship of teaching and learning.

But How?

MSU decided to redesign its course around what NCAT calls “the Replacement Model,” which can be thought of as a combination of flipped and hybrid. At this point most people have at least a basic idea of what “hybrid” means, but “flipped” is a lot less clear. The Chronicle of Higher Education recently published a column by Robert Talbert highlighting a group that is trying to establish definition and best practices around what they call “flipped learning,” which they describe as follows:

Flipped Learning is a pedagogical approach in which direct instruction moves from the group learning space to the individual learning space, and the resulting group space is transformed into a dynamic, interactive learning environment where the educator guides students as they apply concepts and engage creatively in the subject matter.

That’s it in a nutshell: Move direct instruction (i.e., lectures) out of class time so that there can be more direct student interaction time. Which sounds great, but it leads to a frequently asked question. If students have to do all the homework they were doing before plus watching all the lecture videos at home, isn’t that going to dramatically increase the amount of time they have to spend on the class? How can they do all of that work? NCAT’s answer is that you give them back some of that time by making the class “hybrid” in the sense that you reduce their in-class seat time by 50%. That’s why it’s called the “Replacement Model.”

While Danae never used the term “flipped learning”, she did talk about the flipped classroom and made it very clear that she meant using it to increase the amount of class time spent interacting with students and focusing on their particular needs. But the MSU plan called for decreasing class time by 50% while doubling the number of students per class from an average of 153 to 300. How was that supposed to work?

Part of the answer lies in using traditional techniques like group projects, but a lot of it is in using data to provide students with more feedback and fine tune the classroom experience. This is where Pearson comes in. I wrote a while back that the promise of adaptive learning programs is to transform the economics of tutoring:

The simplest way to think about adaptive learning products in their current state is as tutors. Tutors, in the American usage of the word, provide supplemental instruction and coaching to students on a one-on-one basis. They are not expected to know everything that the instructor knows, but they are good at helping to ensure that the students get the basics right. They might quiz students and give them tips to help them remember key concepts. They might help a student get unstuck on a particular step that he hasn’t quite understood.  And above all, they help each student to figure out exactly where she is doing well and where she still needs help.

Adaptive learning technologies are potentially transformative in that they may be able to change the economics of tutoring. Imagine if every student in your class could have a private tutor, available to them at any time for as long as they need. Imagine further that these tutors work together to give you a daily report of your whole class—who is doing well, who is struggling on which concepts, and what areas are most difficult for the class as a whole. How could such a capability change the way that you teach? What would it enable you to spend less of your class time doing, and what else would it enable you to spend more of your class time doing? How might it impact your students’ preparedness and change the kinds of conversations you could have with them? The answers to these questions are certainly different for every discipline and possibly even for every class. The point is that these technologies can open up a world of new possibilities.

This is exactly how MSU is using MyPsychLab. One of the biggest benefits that Danae cited was being able to walk into a class knowing what students were doing well with and what they were struggling with. This enables her and her colleagues to focus on the topics that those particular students need the most help with in class while simultaneously communicating to the students that their teacher is aware of how they are doing and what they need. Likewise, she said that the students are coming to class more engaged with better questions. MSU also uses clickers in class to augment the feedback loop that they are getting from the homework platform. This certainly was a critical enabler at a class size of 300 and would be useful in a significantly smaller lecture class as well.

Did it work? The results are overall very positive but mixed:

  • On the 30-item comprehensive exam, students in the redesigned sections performed significantly better (84% improvement) compared to the traditional comparison group (54% improvement).
  • Students in the redesigned course demonstrated significantly more improvement from pre to post on the 50-item comprehensive exam (62% improvement) compared to the traditional sections (37% improvement).
  • Attendance improved substantially in the redesigned section. (Fall 2011 traditional mean percent attendance = 75% versus fall 2012 redesign mean percent attendance = 83%)
  • They did not get a statistically significant improvement in the number of failures and withdrawals, which was one of the main goals of the redesign, although they note that “it does appear that the distribution of A’s, B’s, and C’s shifted such that in the redesign, there were more A’s and B’s and fewer C’s compared to the traditional course.”
  • In terms of cost reduction, while they fell short of their 17.8% goal, they did achieve a 10% drop in the cost of the course.
Intuitions and Open Questions

The study of the course redesign was intended to measure the overall impact of the effort rather than to research the components of efficacy, which means that we don’t have good data from which we can draw strong conclusions on the most interesting questions in this regard. But I’m not afraid to make some guesses and I asked Danae to do the same with me. To be clear, her first answer to any of the questions I’m going to bring up in this section of the post was consistently along the lines of, “I don’t have data that speaks to that question.” Which is the right answer. I want to be clear that wherever I reference her opinions here that it was in this context and that she was appropriately tentative.

First of all, what role did MyPsychLab have in the improvements? Here we have at least one hard number:

A significant portion of the redesigned course utilized publisher-customized digital learning technology. A correlation was calculated between the students’ online total score of assigned material and the total of five exam scores. This correlation was .68, p < .001 suggesting a strong relationship between the completion of online learning activities and exam performance.

But why? Obviously, practice is part of the equation. Students who do the homework tend to do better in classes in general. I asked Danae what she thought the big drivers were beyond that. She cited the feedback to faculty and student engagement. The product seemed to succeed in getting students engaged, from her perspective. When pressed about the degree to which the adaptive component of the product made a difference, she guessed that it wasn’t as big a factor. “My gut tells me that it is less about the personalization,” she said. But then she added that the personalization may have helped to drive student engagement by making the students feel like the content was tailored to their needs. “I think personalization is the part that appeals to the students.” This raises the question about the degree to which any gains that we see added by an adaptive product may be because of a digital…er…analog to teaching presence, as opposed to the software’s real ability to adapt to individual student needs and capabilities.

Second, I asked Danae to characterize how much she thinks adopting MyPsychLab would have driven improvements had it been added to the original class before the redesign. Her instinct was not nearly as much, which is my instinct too. We don’t have numbers to separate the impact of the practice from the impact of the tailored instruction that resulted from having the student data in the product. Nor do we know how much student engagement with the product was impacted by the fact that it was integrated into the whole course redesign. These would be important questions to answer before we can have a clear and nuanced answer to the question of the product’s “efficacy.” Efficacious under what circumstances?

Finally, I’d like to return to David Wiley’s question about cost as a proxy for access and its impact on efficacy. Danae was traveling and didn’t have access to the course materials cost information when I reached her by email today, but she was confident that the cost had not gone up significantly and thought it might have actually gone down post-redesign. (And no, Pearson did not subsidize the cost of MyPsychLab to the students.) So we have no before/after data from which we can make inferences regarding the impact of cost on student outcomes. But it’s possible that MSU could have had a more significant impact on its DFW rate had the total cost to the students been lower. It’s also worth noting that MSU expected to increase enrollment by 72 students annually but actually saw a decline of enrollment by 126 students, which impacted their ability to deliver decreased costs to the institution. Would they have seen different enrollments had the curricular materials been less expensive? Or free? We don’t know. But this raises the point that efficacy cannot be reduced to one aggregate number. Improving student aggregate test scores and reducing the number of students who fail or withdraw are two different goals which certainly need to be measured differently and probably need different sorts of interventions to achieve.

Postscript

After this post went live, Danae shared some data with me from the semesters after the NCAT report was published. As it turns out, the course did see significant reduction in its DFW rates and enrollments bounced back over the course of several semesters. You can read about the details, as well as possible explanations, here.

The post Efficacy, Adaptive Learning, and the Flipped Classroom appeared first on e-Literate.

Tweaking

Jonathan Lewis - Wed, 2014-04-02 11:24

The following question came up on OTN recently:

Which one gives better performance? Could please explain.

1) nvl( my_column, 'N') <> 'Y'

2) nvl( my_column, 'N') = 'N'

It’s a question that can lead to a good 20 minute discussion – if you were in some sort of development environment and had a fairly free hand to do whatever you wanted.

The most direct answer is that you could expect the performance to be the same whichever option you chose – but the results might be different, of course, unless you had a constraint on my_column that ensured that it would hold only null, 'Y', or 'N'. (Reminder: the constraint check (my_column in ('Y','N')) will allow nulls in the column.)
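
As a quick illustration of that reminder (the constraint name below is invented for the example), a check constraint like this accepts nulls because its condition merely evaluates to unknown for them; to exclude nulls as well you would need a not null constraint, or an explicit "my_column is not null" term in the check:

alter table t1 add constraint t1_yn_ck check (my_column in ('Y','N'));

-- only if you also want to forbid nulls (fails if nulls already exist)
alter table t1 modify my_column not null;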

On the other hand, if you create a function-based index on nvl(my_column,'N') then the second option would give the optimizer the opportunity to use the index – which could make it the more efficient option if a sufficiently small quantity of the data from a large table matched the predicate. Of course in this case you would need a histogram on the hidden column definition supporting the index so that the optimizer could detect the data skew.
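
As a minimal sketch (the index name is invented for the example), that approach might look like the following; the histogram has to go on the hidden virtual column that the function-based index creates, which is what the 'for all hidden columns' option picks up:

create index t1_i0 on t1(nvl(my_column,'N'));

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all hidden columns size 254');

select * from t1 where nvl(my_column,'N') = 'N';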

But if a function-based index is a step in the right direction it’s worth thinking about how to pick the best function-based index.


create index t1_i1 on t1(case my_column when 'Y' then null else 'N' end);

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1');

select * from t1 where case my_column when 'Y' then null else 'N' end = 'N';

Provided you can change the queries that use the original predicate, you can create a function-based index that is the smallest possible index to assist your query, and you won’t need a histogram to allow the optimizer to get the correct cardinality since there will be just one distinct value in the index (and the underlying column).

It’s possible to go a little further – what if the typical query that started this discussion was really something like:


select  *
from    t1
where   nvl(my_column, 'N') = 'N'
and     other_column = {some value}

If the nvl() predicate always returned a small fraction of the data we might engineer an index to handle the requirement of the entire where clause:


create index t1_i2 on t1(case my_column when 'Y' then null else other_column end);
execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all hidden columns size 254');

In this case you might need a histogram on the hidden column definition supporting the index, which is why I’ve changed my method_opt. We’ve constructed an index that is the smallest possible index that will satisfy the query as precisely as possible while giving the optimizer the best chance of producing accurate cardinality estimates. [Update: until the appearance of Kirill's comment (#1) I forgot to point out that once again you have to change the original SQL from using the nvl() expression to using a case expression that matches the index.]
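
For completeness, and purely as a sketch, the rewritten query has to use a predicate that matches the indexed expression exactly, along these lines:

select  *
from    t1
where   case my_column when 'Y' then null else other_column end = {some value}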

Footnote

It would be easy to fall into the trap of playing around with this type of tweaking every time you hit a performance problem. Bear in mind, though, that even indexes which are as small and accurately targeted as possible can introduce various types of overhead (contention, cost of gathering stats, instability of histogram collection); so always try to think through the potential costs as well as the benefits of such an approach: is it really worth the effort?

It’s also worth noting that things don’t always work the way you expect.

Finally, of course, you might choose to create “proper” virtual columns in 11g so that you can refer to them by name, rather than handling them indirectly as you have to when they are “undocumented” hidden columns generated by function-based indexes.
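
A minimal sketch of that alternative, assuming 11g or later (the column and index names are my own choice), might look like this; the optimizer then has a named column to carry the statistics and histogram rather than a system-generated hidden column:

alter table t1 add (
        my_column_flag generated always as (
                case my_column when 'Y' then null else 'N' end
        ) virtual
);

create index t1_i3 on t1(my_column_flag);

select * from t1 where my_column_flag = 'N';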

 


Organizations using data to track environmental change

Chris Foot - Wed, 2014-04-02 10:58

In light of a study recently released by the Intergovernmental Panel on Climate Change, the database administration needs of public agencies and organizations are expected to expand significantly. As it was industrialization and innovation that incited this worldwide issue, the Internet of Things will continue to be used to identify the detrimental effects climate change has on particular ecosystems and economies of the world. 

Patrick Thibodeau, a contributor to Computerworld, claimed that the IPCC's study acknowledged the importance of sensor networks to monitor the shifting global environment. Potentially, these devices could help government officials anticipate droughts, floods and natural disasters caused by rising temperatures. In addition, it is hoped that the mechanisms will identify ways to preserve water and food supplies as well as determine methods for reducing energy consumption. 

If public authorities choose to acknowledge the recommendations of the IPCC, the influx of new data derived from the IoT is sure to increase network traffic, requiring the expertise of remote database support to ensure that all analytics programs are running efficiently. As it's somewhat unclear how these sensors will be deployed, the kinds of avenues through which information flows into networks may pose a challenge to in-house IT departments.

An example of a new innovation 
The types of devices the government and non-profit environmental agencies use are quite variable. Some may track the shifting tides across the Atlantic and the Pacific while others will determine the acidity of farming soil. If all the data collected by these devices is assembled onto a single server, outsourced database experts may be consulted to manage it all. It looks as if scientists have already taken the first step. 

According to Space Daily, engineers from Europe developed the Sentinel-5 instrument, a mechanism which allows the continent's Copernicus program to monitor air quality around the globe. The article noted that atmospheric pollution is linked to millions of deaths around the world. 

"The readings will help to both monitor and differentiate between natural and human-produced emissions, providing new insight on the human impact on climate," noted the news source. 

Amassing and translating such an incredible amount of data will most likely necessitate the expertise of remote DBA services to ensure that networks don't crash or overload. It's hoped that Copernicus, the world's first operational environmental surveillance system, will provide scientists with specific insight into how the earth's population can reduce emissions. 

The Art of Easy: Make It Easy To Deliver Great Customer Service Experiences (Part 1 of 6)

Linda Fishman Hoyle - Wed, 2014-04-02 10:54

A Guest Post by JP Saunders, Product Strategist for Oracle CX Service offerings

Modern Customer Service is helping businesses to differentiate and grow in today’s competitive market. Why? Because your success gets accelerated when you take care of the people who make you successful. Modern Customer Service is about just that, taking care of your customers, and your people that support them. To do that, you need to make it EASY to deliver great experiences, time and time again.

Businesses that make it “look” easy stand out, because they consistently deliver experiences so memorable that their customers feel compelled to share them. But ask any of them how they do it, and you will quickly learn that the ART of EASY really isn’t EASY at all―or we would all be doing it!

The most common business challenge stems from underestimating the complexity of simplicity, which can be devastating to the customer’s experience. EASY is an ART. And it requires a deep understanding to get the right results―understanding of your business and of your customer, applied through your culture and skills. Your canvas is the combination of every engagement channel you have where the picture of your brand gets painted. Your palette of colors is the collection of your people, knowledge and data. Your brushes are the technologies that blend it all together. In order to make “EASY” beautiful and repeatable, you need solutions that provide a paint-by-numbers approach.

In this six-part blog series, we will help guide you in managing the complexity of your customer service business challenges, avoiding the common pitfalls, and delivering the type of great experiences that your customers will want to share with others. Experiences that are easily repeatable by you, and hard for your competitors to replicate. We will show you how Oracle Service Cloud empowers your business to care, take action and succeed in your Web, Contact Center, Knowledge, Policy, and Cloud customer service initiatives, to become a Modern Customer Service organization.

Dealing with technical questions about #Oracle

The Oracle Instructor - Wed, 2014-04-02 10:51

During the OCM Preparation Workshop today in Bucharest, we got into a discussion about parallel inserts vs. direct load with the append hint, where I said something like: “I think a parallel insert is always* also a direct load.” Most attendees were satisfied with that, but some still looked a bit sceptical. And I was also not 100% sure about it. While the others practiced, I did some brief research and came up with this answer from Tom Kyte and quoted him: “Parallel is always a direct path, if you go parallel, you will be appending.” Everybody in the room was immediately convinced – only I still felt a bit uncomfortable with the outcome myself. After all, I don’t want to teach people to blindly follow my or any other guy’s claims without checking them where possible. And as with so many other claims about Oracle, it is possible to prove (or falsify) this one with little effort:

 

SQL> grant dba to adam identified by adam;

Grant succeeded.

SQL> connect adam/adam
Connected.

SQL> create table t as select 'TEST' as col1 from dual connect by level<=1e6;

Table created.

SQL> select bytes/1024/1024 from user_segments;

BYTES/1024/1024
---------------
             13

SQL> delete from t;

1000000 rows deleted.

SQL> commit;

Commit complete.

SQL> insert into t select 'TEST' as col1 from dual connect by level<=1e6;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> select bytes/1024/1024 from user_segments;

BYTES/1024/1024
---------------
             13

The above shows that a conventional insert reuses the empty space before the high water mark of the table and does not allocate additional extents. Now I’m going to do a parallel insert:

SQL> delete from t;

1000000 rows deleted.

SQL> commit;

Commit complete.
SQL> alter table t parallel;

Table altered.

SQL> alter session enable parallel dml;

Session altered.

SQL> insert into t select 'TEST' as col1 from dual connect by level<=1e6;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> select bytes/1024/1024 from user_segments;

BYTES/1024/1024
---------------
         24.375

QED! Obviously, the parallel insert appended rows after the high water mark – it was a direct load, in other words. That is in itself not so spectacular, but that little anecdote illustrates what is, in my opinion, the proper way to deal with technical questions about Oracle: always try to back up your assumptions and claims with proof instead of relying only on authority (your own or anybody else’s).
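
Another quick cross-check, sketched here with the SQL*Plus output trimmed to its essentials: a direct-path insert leaves the modified segment unreadable within the same transaction, so selecting from the table before the commit should raise ORA-12838, whereas a conventional insert would not complain. Continuing in the same session (parallel DML still enabled, table still marked parallel):

SQL> insert into t select 'TEST' as col1 from dual connect by level<=1e6;

1000000 rows created.

SQL> select count(*) from t;
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

SQL> commit;

Commit complete.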

In other words: Don’t believe it, test it! :-)

*Addendum: Never say always, I always say: If you include the NOAPPEND hint in the parallel insert, it will not be done as a direct load. Thank you guys for pointing that out!
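
If you want to verify that addendum as well, here is a minimal sketch along the same lines; NOAPPEND is the documented hint that requests a conventional insert. Repeat the delete and commit, run the insert with the hint, commit, and check user_segments again: if the insert really is conventional it will reuse the free space below the high water mark, so the segment should not grow any further.

SQL> delete from t;

SQL> commit;

SQL> insert /*+ noappend */ into t select 'TEST' as col1 from dual connect by level<=1e6;

SQL> commit;

SQL> select bytes/1024/1024 from user_segments;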


Tagged: 3e
Categories: DBA Blogs