
Feed aggregator

OTN Tour of Latin America 2015 : UYOUG, Uruguay – Day 1

Tim Hall - Mon, 2015-08-03 15:53

After 15 hours of sleep I still managed to feel tired. :) I went for breakfast at 6:30, then started to feel a little weird, so I took some headache pills and headed back to bed for an hour before meeting up with Debra and Mike to head down to the venue for the first day of the UYOUG leg of the tour…

The order of events went like this:

  • Pablo Ciccarelo started with an introduction to OTN and the ACE program, which was in Spanish, so I ducked out of it. :)
  • Mike Dietrich speaking about “How Oracle Single/Multitenant will change a DBA’s life”.
  • Me with “Pluggable Databases : What they will break and why you should use them anyway!” There was some crossover with Mike’s session, but we both emphasised different things, which is interesting in itself. :)
  • Debra Lilley with “PaaS4SaaS”. The talk focused on a POC for using PaaS to extend the functionality of Fusion Apps, which is a SaaS product.
  • Me with “Oracle Database Consolidation : It’s not all about Oracle database 12c”. I think this is the least technical talk I’ve ever done and that makes me rather nervous. Technical content and demos are a reassuring safety blanket for me, so having them taken away feels a bit like being naked in public (why am I now thinking of Bjoern?). The session is a beginner session, so I hope people didn’t come expecting something more than I delivered. See, I’m paranoid already!
  • Mike Dietrich on “Simple Minimal Downtime Migration to Oracle 12c using Full Transportable Export/Import”. I think I’ve used every feature discussed in this session, but I’ve never used them all together in this manner. I think I may go back to the drawing board for one of the migrations I’ve got coming up in the next few months.
  • Debra Lilley with “Are cloud apps really ready?”. There was some similarity between the message Debra was putting out here and some of the stuff I spoke about in my final talk.
  • Me with “It’s raining data! Oracle Databases in the Cloud”. This was also not a heavy technical session, but because so few people have experience of running databases in the cloud at the moment, I think it has a wider appeal, so I’m not so paranoid about the limited technical content.

So that was the first day of the UYOUG conference done. Tomorrow is an easy day for me. I’ve got a panel session in the middle of the day, then I’m done. :)

Thanks to everyone who came to my sessions. I hope you found them useful.

Having slept through yesterday’s social event, I will be going out to get some food tonight. They eat really late here, so by the time we get some food I’ll probably be thinking about bed. :)

Cheers

Tim…


Using a Tomcat provided buildpack in Bluemix

Pas Apicella - Mon, 2015-08-03 15:48
By default, if you push a Java application into public Bluemix you will use the Liberty Java buildpack. If you want to use Tomcat, you can do so as follows.

1. Show the buildpacks available as follows

> cf buildpacks

2. The buildpack which uses Tomcat is as follows

java_buildpack

3. Specify that you would like to use the buildpack as shown below when pushing the application

cf push pas-props -d mybluemix.net -i 1 -m 256M -b java_buildpack -p props.war

Example:

pas@Pass-MacBook-Pro-2:~/bluemix-apps/simple-java$ cf push pas-props -d mybluemix.net -i 1 -m 256M -b java_buildpack -p props.war
Creating app pas-props in org pasapi@au1.ibm.com / space dev as pasapi@au1.ibm.com...
OK

Creating route pas-props.mybluemix.net...
OK

Binding pas-props.mybluemix.net to pas-props...
OK

Uploading pas-props...
Uploading app files from: props.war
Uploading 2.9K, 6 files
Done uploading
OK

Starting app pas-props in org pasapi@au1.ibm.com / space dev as pasapi@au1.ibm.com...
-----> Downloaded app package (4.0K)
-----> Java Buildpack Version: v3.0 | https://github.com/cloudfoundry/java-buildpack.git#3bd15e1
-----> Downloading Open Jdk JRE 1.8.0_51 from https://download.run.pivotal.io/openjdk/lucid/x86_64/openjdk-1.8.0_51.tar.gz (11.6s)
       Expanding Open Jdk JRE to .java-buildpack/open_jdk_jre (1.2s)
-----> Downloading Tomcat Instance 8.0.24 from https://download.run.pivotal.io/tomcat/tomcat-8.0.24.tar.gz (2.4s)
       Expanding Tomcat to .java-buildpack/tomcat (0.1s)
-----> Downloading Tomcat Lifecycle Support 2.4.0_RELEASE from https://download.run.pivotal.io/tomcat-lifecycle-support/tomcat-lifecycle-support-2.4.0_RELEASE.jar (0.1s)
-----> Downloading Tomcat Logging Support 2.4.0_RELEASE from https://download.run.pivotal.io/tomcat-logging-support/tomcat-logging-support-2.4.0_RELEASE.jar (0.4s)
-----> Downloading Tomcat Access Logging Support 2.4.0_RELEASE from https://download.run.pivotal.io/tomcat-access-logging-support/tomcat-access-logging-support-2.4.0_RELEASE.jar (0.4s)

-----> Uploading droplet (50M)

0 of 1 instances running, 1 starting
1 of 1 instances running

App started


OK

App pas-props was started using this command `JAVA_HOME=$PWD/.java-buildpack/open_jdk_jre JAVA_OPTS="-Djava.io.tmpdir=$TMPDIR -XX:OnOutOfMemoryError=$PWD/.java-buildpack/open_jdk_jre/bin/killjava.sh -Xmx160M -Xms160M -XX:MaxMetaspaceSize=64M -XX:MetaspaceSize=64M -Xss853K -Daccess.logging.enabled=false -Dhttp.port=$PORT" $PWD/.java-buildpack/tomcat/bin/catalina.sh run`

Showing health and status for app pas-props in org pasapi@au1.ibm.com / space dev as pasapi@au1.ibm.com...
OK

requested state: started
instances: 1/1
usage: 256M x 1 instances
urls: pas-props.mybluemix.net
last uploaded: Mon Aug 3 21:40:36 UTC 2015

     state     since                    cpu    memory           disk           details
#0   running   2015-08-03 02:41:47 PM   0.0%   136.7M of 256M   125.7M of 1G
Categories: Fusion Middleware

History Lesson

Floyd Teter - Mon, 2015-08-03 12:53
I'm a student of history.  There is so much to be learned from it.  Today's lesson comes from NASA and relates directly to enterprise software projects.

From 1992 to 1999, NASA launched 16 major missions under the umbrella of the "Faster, Better, Cheaper" (FBC) program.  These unmanned space exploration missions included five trips to Mars, one to the Moon, four Earth-orbiting satellites and an asteroid rendezvous.  10 of the 16 missions were great successes, including:
  • The Near Earth Asteroid Rendezvous (NEAR) mission
  • The Pathfinder mission to Mars
  • The Stardust mission that collected, analyzed and returned comet tail particles to Earth
The ten successful FBC missions started with tight budgets, tight scopes, and tight schedules. They all delivered early and under budget.

So long as NASA stuck to the core principles of FBC, the program was a great success:  10 missions successfully executed in seven years.  By comparison, the Cassini mission, while also very successful, took over 15 years to execute.  And all 10 successful missions were completed for a fraction of the cost of the Cassini mission.  The FBC program came to a grinding halt when NASA strayed from the core ideas that made the program work:  the failed Mars Polar Lander and the Mars Climate Orbiter came from the latter part of the FBC program.

Let's look at the core principles that made FBC successful:
  • Do it wrong:  Alexander Laufer and Edward Hoffman explained in a 1998 report entitled "Ninety-Nine Rules for Managing Faster, Better, Cheaper Projects" that in order to do things quickly and right, you have to be willing to do it wrong first.  Experience is the best teacher.
  • Reject good ideas:  NEAR project manager Thomas Coughlin had no shortage of well-meaning good ideas for features, parts and functions to add to the spacecraft.   Essentially all stayed on the cutting room floor.  Reject good ideas and stick to your original scope.
  • Simplify and accelerate everything:  the NEAR project used a 12-line project schedule for the entire mission.  That's right - 12 lines.  Progress reports were limited to three minutes.  If we can build spacecraft with a 12-line project schedule, tell me again why our enterprise project schedules run multiple pages?
  • Limit innovation by keeping it relevant.  While innovation is always cool, it's not relevant if it does not contribute meaningfully to your project's objectives.  Shipping something that actually does something well is much better than shipping something built on the newest technology that is complex to use or fails to perform reliably in a multitude of circumstances.
  • You can't inspect quality into the system.  NASA's failure to stick with this principle led to the poor ending for the FBC program.  To a great degree, the Mars Pathfinder was a success at JPL because the project was so small that it flew under the radar - no significant administrative oversight.  When FBC oversight increased after 1999 at all NASA centers, the successes stopped coming.  You can put the clues together here, can't you?
Do you recognize the themes here?  Simplicity.  Restraint.  Freedom to act and take risks within tight constraints.  The combination led to some elegant and highly successful projects.

And, by the way, the recent New Horizons mission sending us pictures and data from Pluto?  Lots of heritage from FBC.  So, yes, these ideas still work...for missions much more complex than enterprise software.

So, you want to #beat39 with your enterprise software projects?  This history lesson is a great place to start.

PeopleSoft Spotlight Series: Fluid

Jim Marion - Mon, 2015-08-03 11:03

Have you seen the new PeopleTools Spotlight series? I just finished the Developing Fluid Applications lesson. This 1-hour video walks you through creating a fluid component, including some of the special CSS style classes, such as psc_column-2. The example is very well done and should get new fluid developers headed down the right path.

Open last one-note page

Laurent Schneider - Mon, 2015-08-03 10:52

If you have a OneNote document, you may want to automatically go to the last page. This is possible with PowerShell.

First you create a COM object. There are incredibly many COM objects that can be manipulated in PowerShell.


$o = New-Object -ComObject OneNote.Application

Now it gets a bit confusing. First you open your document:


[ref]$x = ""
$o.OpenHierarchy("Z:\Reports.one", "", $x, "cftNone")

Now you get the XML


$o.GetHierarchy("", "hsPages", $x)

With the XML, you select the last page. For instance:


$p = (([xml]($x.value)).Notebooks.OpenSections.Section.Page | select -last 1).ID

And from the ID, you generate a URL with GetHyperlinkToObject.


[ref]$h = ""
$o.GetHyperlinkToObject($p,"",$h)

Now we can open the URL onenote:///Z:\Reports.one#W31&section-id={12345678-1234-1234-123456789ABC}&page-id={12345678-1234-1234-123456789ABC}&end


start-process $h.value
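Putting the snippets together, here is a minimal consolidated sketch of the whole flow, assuming the same Z:\Reports.one section file used above (the XML path may differ for other notebook layouts):

# Create the OneNote COM automation object
$o = New-Object -ComObject OneNote.Application

# Open the section file and capture its object ID
[ref]$x = ""
$o.OpenHierarchy("Z:\Reports.one", "", $x, "cftNone")

# Retrieve the page-level hierarchy as XML
$o.GetHierarchy("", "hsPages", $x)

# Pick the ID of the last page in the section
$p = (([xml]($x.value)).Notebooks.OpenSections.Section.Page | Select-Object -Last 1).ID

# Turn the page ID into a onenote:// hyperlink and open it
[ref]$h = ""
$o.GetHyperlinkToObject($p, "", $h)
Start-Process $h.value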

Provisioning Virtual ASM (vASM)

Steve Karam - Mon, 2015-08-03 06:30

In this post, we’re going to use Delphix to create a virtual ASM diskgroup, and provision a clone of the virtual ASM diskgroup to a target system. I call it vASM, which is pronounced “vawesome.” Let’s make it happen.

She’s always hungry. She always needs to feed. She must eat. – Gollum

Most viewers assume Gollum was talking about Shelob the giant spider here, but I have it on good authority that he was actually talking about Delphix. You see, Delphix (Data tamquam servitium in trinomial nomenclature) is the world’s most voracious datavore. Simply put, Delphix eats all the data.

Now friends of mine will tell you that I absolutely love to cook, and they actively make it a point to never talk to me about cooking because they know I’ll go on like Bubba in Forrest Gump and recite the million ways to make a slab of meat. But if there’s one thing I’ve learned from all my cooking, it’s that it’s fun to feed people who love to eat. With that in mind, I went searching for new recipes that Delphix might like and thought, “what better meal for a ravenous data muncher than an entire volume management system?”

vASM

In normal use, Delphix links to an Oracle database and ingests changes over time by using RMAN “incremental as of SCN” backups, archive logs, and online redo. This creates what we call a compressed, deduped timeline (called a Timeflow) that you can provision as one or more Virtual Databases (VDBs) from any points in time.

However, Delphix has another interesting feature known as AppData, which allows you to link to and provision copies of flat files like unstructured files, scripts, software binaries, code repositories, etc. It uses rsync to build a Timeflow, and allows you to provision one or more vFiles from any points in time. But on top of that (and even cooler in my opinion), you have the ability to create “empty vFiles” which amounts to an empty directory on a system; except that the storage for the directory is served straight from Delphix. And it is this area that serves as an excellent home for ASM.

We’re going to create an ASM diskgroup using Delphix storage, and connect to it with Oracle’s dNFS protocol. Because the ASM storage lives completely on Delphix, it takes advantage of Delphix’s deduplication, compression, snapshots, and provisioning capabilities.

Some of you particularly meticulous (read: cynical) readers may wonder about running ASM over NFS, even with dNFS. I’d direct your attention to this excellent test by Yury Velikanov. Of course, your own testing is always recommended.

I built this with:

  • A Virtual Private Cloud (VPC) in Amazon Web Services
  • Red Hat Enterprise Linux 6.5 Source and Target servers
    • Oracle 11.2.0.4 Grid Infrastructure
    • 11.2.0.4 Oracle Enterprise Edition
  • Delphix Engine 4.2.4.0
  • Alchemy
Making a vASM Diskgroup

Before we get started, let’s turn on dNFS while nothing is running. This is as simple as using the following commands on the GRID home:

[oracle@ip-172-31-0-61 lib]$ cd $ORACLE_HOME/rdbms/lib
[oracle@ip-172-31-0-61 lib]$ pwd
/u01/app/oracle/product/11.2.0/grid/rdbms/lib
[oracle@ip-172-31-0-61 lib]$ make -f ins_rdbms.mk dnfs_on
rm -f /u01/app/oracle/product/11.2.0/grid/lib/libodm11.so; cp /u01/app/oracle/product/11.2.0/grid/lib/libnfsodm11.so /u01/app/oracle/product/11.2.0/grid/lib/libodm11.so
[oracle@ip-172-31-0-61 lib]$

Now we can create the empty vFiles area in Delphix. This can be done through the Delphix command line interface, API, or through the GUI. It’s exceedingly simple to do, requiring only a server selection and a path.

[Gallery: Create vFiles → Choose Path for vASM disks → Choose Delphix Name → Set Hooks → Finalize]

Let’s check our Linux source environment and see the result:

[oracle@ip-172-31-0-61 lib]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvda1            9.8G  5.0G  4.3G  54% /
tmpfs                 7.8G   94M  7.7G   2% /dev/shm
/dev/xvdd              40G   15G   23G  39% /u01
172.31.7.233:/domain0/group-35/appdata_container-32/appdata_timeflow-44/datafile
                       76G     0   76G   0% /delphix/mnt

Now we’ll create a couple ASM disk files that we can add to an ASM diskgroup:

[oracle@ip-172-31-0-61 lib]$ cd /delphix/mnt
[oracle@ip-172-31-0-61 mnt]$ truncate --size 20G disk1
[oracle@ip-172-31-0-61 mnt]$ truncate --size 20G disk2
[oracle@ip-172-31-0-61 mnt]$ ls -ltr
total 1
-rw-r--r--. 1 oracle oinstall 21474836480 Aug  2 19:26 disk1
-rw-r--r--. 1 oracle oinstall 21474836480 Aug  2 19:26 disk2

Usually the “dd if=/dev/zero of=/path/to/file” command is used for this purpose, but I used the “truncate” command. This command quickly creates sparse files that are perfectly suitable.
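For comparison, the usual dd approach would look something like the following sketch (sizes are illustrative): it writes 20GB of zeros per file, which takes far longer and does not produce sparse files.

[oracle@ip-172-31-0-61 mnt]$ dd if=/dev/zero of=disk1 bs=1M count=20480
[oracle@ip-172-31-0-61 mnt]$ dd if=/dev/zero of=disk2 bs=1M count=20480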

And we’re ready! Time to create our first vASM diskgroup.

SQL> create diskgroup data
  2  disk '/delphix/mnt/disk*';

Diskgroup created.

SQL> select name, total_mb, free_mb from v$asm_diskgroup;

NAME				 TOTAL_MB    FREE_MB
------------------------------ ---------- ----------
DATA				    40960      40858

SQL> select filename from v$dnfs_files;

FILENAME
--------------------------------------------------------------------------------
/delphix/mnt/disk1
/delphix/mnt/disk2

The diskgroup has been created, and we verified that it is using dNFS. But creating a diskgroup is only 1/4th the battle. Let’s create a database in it. I’ll start with the simplest of pfiles, making use of OMF to get the database up quickly.

[oracle@ip-172-31-0-61 ~]$ cat init.ora
db_name=orcl
db_create_file_dest=+DATA
sga_target=4G
diagnostic_dest='/u01/app/oracle'

And create the database:

SQL> startup nomount pfile='init.ora';
ORACLE instance started.

Total System Global Area 4275781632 bytes
Fixed Size		    2260088 bytes
Variable Size		  838861704 bytes
Database Buffers	 3422552064 bytes
Redo Buffers		   12107776 bytes
SQL> create database;

Database created.

I’ve also run catalog.sql, catproc.sql, and pupbld.sql and created an SPFILE in ASM, but I’ll skip pasting those here for at least some semblance of brevity. You’re welcome. I also created a table called “TEST” that we’ll try to query after the next part.

Cloning our vASM Diskgroup

Let’s recap what we’ve done thus far:

  • Created an empty vFiles area from Delphix on our Source server
  • Created two 20GB “virtual” disk files with the truncate command
  • Created a +DATA ASM diskgroup with the disks
  • Created a database called “orcl” on the +DATA diskgroup

In sum, Delphix has eaten well. Now it’s time for Delphix to do what it does best, which is to provision virtual objects. In this case, we will snapshot the vFiles directory containing our vASM disks, and provision a clone of them to the target server. You can follow along with the gallery images below.

[Gallery: Snapshot vASM area → Provision vFiles → Choose the Target and vASM Path → Name the Target for vASM → Set Hooks → Finalize Target]

Here’s the vASM location on the target system:

[oracle@ip-172-31-2-237 ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvda1            9.8G  4.1G  5.2G  44% /
tmpfs                 7.8G   92M  7.7G   2% /dev/shm
/dev/xvdd              40G   15G   23G  39% /u01
172.31.7.233:/domain0/group-35/appdata_container-34/appdata_timeflow-46/datafile
                       76G  372M   75G   1% /delphix/mnt

Now we’re talking. Let’s bring up our vASM clone on the target system!

SQL> alter system set asm_diskstring = '/delphix/mnt/disk*';

System altered.

SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> select name, total_mb, free_mb from v$asm_diskgroup;

NAME				 TOTAL_MB    FREE_MB
------------------------------ ---------- ----------
DATA				    40960      39436

But of course, we can’t stop there. Let’s crack it open and access the tasty “orcl” database locked inside. I copied over the “initorcl.ora” file from my source so it knows where to find the SPFILE in ASM. Let’s start it up and verify.

SQL> startup;
ORACLE instance started.

Total System Global Area 4275781632 bytes
Fixed Size		    2260088 bytes
Variable Size		  838861704 bytes
Database Buffers	 3422552064 bytes
Redo Buffers		   12107776 bytes
Database mounted.
Database opened.
SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
+DATA/orcl/datafile/system.259.886707477
+DATA/orcl/datafile/sysaux.260.886707481
+DATA/orcl/datafile/sys_undots.261.886707485

SQL> select * from test;

MESSAGE
--------------------------------------------------------------------------------
WE DID IT!

As you can see, the database came online, the datafiles are located on our virtual ASM diskgroup, and the table I created prior to the clone operation came over with the database inside of ASM. I declare this recipe a resounding success.

Conclusion

A lot happened here. Such is the case with a good recipe. But in the end, my actions were deceptively simple:

  • Create a vFiles area
  • Create disk files and an ASM diskgroup inside of Delphix vFiles
  • Create an Oracle database inside the ASM diskgroup
  • Clone the Delphix vFiles to a target server
  • Bring up vASM and the Oracle database on the target server

With this capability, it’s possible to do some pretty incredible things. We can provision multiple copies of one or more vASM diskgroups to as many systems as we please. What’s more, we can use Delphix’s data controls to rewind vASM diskgroups, refresh them from their source diskgroups, and even create vASM diskgroups from cloned vASM diskgroups. Delphix can also replicate vASM to other Delphix engines so you can provision in other datacenters or cloud platforms. And did I mention it works with RAC? vFiles can be mounted on multiple systems, a feature we use for multi-tier EBS provisioning projects.

But perhaps the best feature is that you can use Delphix’s vASM disks as a failgroup to a production ASM diskgroup. That means that your physical ASM diskgroups (using normal or high redundancy) can be mirrored via Oracle’s built in rebalancing to a vASM failgroup comprised of virtual disks from Delphix. In the event of a disk loss on your source environment, vASM will protect the diskgroup. And you can still provision a copy of the vASM diskgroup to another system and force mount for the same effect we saw earlier.

There is plenty more to play with and discover here. But we’ll save that for dessert. Delphix is hungry again.


Demo data

Jonathan Lewis - Mon, 2015-08-03 06:26

One of the articles I wrote for redgate’s AllthingsOracle site some time ago included a listing of the data distribution for some client data which I had camouflaged. A recent comment on the article asked how I had generated the data – of course the answer was that I hadn’t generated it, but I had done something to take advantage of its existence without revealing the actual values.  This article is just a little note showing what I did; it’s not intended as an elegant and stylish display of perfectly optimised SQL, it’s an example of a quick and dirty one-off  hack that wasn’t (in my case) a disaster to run.

I’ve based the demonstration on the view all_objects. We start with a simple query showing the distribution of the values of column object_type:


break on report
compute sum of count(*) on report

select
        object_type, count(*)
from
        all_objects
group by object_type
order by
        count(*) desc
;

OBJECT_TYPE           COUNT(*)
------------------- ----------
SYNONYM                  30889
JAVA CLASS               26447
...
JAVA RESOURCE              865
TRIGGER                    509
JAVA DATA                  312
...
JAVA SOURCE                  2
DESTINATION                  2
LOB PARTITION                1
EDITION                      1
MATERIALIZED VIEW            1
RULE                         1
                    ----------
sum                      76085

44 rows selected.

Starting from this data set I want 44 randomly generated strings and an easy way to translate the actual object type into one of those strings. There are various ways to do this but the code I hacked out put the original query into an inline view, surrounded it with a query that added a rownum to the result set to give each row a unique id, then used the well-known and much-loved  “connect by level” query against  dual to generate a numbered list of randomly generated strings as an inline view that I could use in a join to do the translation.


execute dbms_random.seed(0)

column random_string format a6

select
        generator.id,
        dbms_random.string('U',6)       random_string,
        sum_view.specifier,
        sum_view.ct                     "COUNT(*)"
from
        (
        select
                rownum  id
                from    dual
                connect by
                        level <= 100
        )       generator,
        (
        select
                rownum          id,
                specifier,
                ct
        from
                (
                select
                        object_type specifier, count(*) ct
                from
                        all_objects
                group by
                        object_type
                order by
                        count(*) desc
                )
        )       sum_view
where
        sum_view.id = generator.id
order by
        ct desc
;

        ID RANDOM SPECIFIER             COUNT(*)
---------- ------ ------------------- ----------
         1 BVGFJB SYNONYM                  30889
         2 LYYVLH JAVA CLASS               26447
...
         9 DNRYKC JAVA RESOURCE              865
        10 BEWPEQ TRIGGER                    509
        11 UMVYVP JAVA DATA                  312
...
        39 EYYFUJ JAVA SOURCE                  2
        40 SVWKRC DESTINATION                  2
        41 CFKBRX LOB PARTITION                1
        42 ZWVEVH EDITION                      1
        43 DDAHZX MATERIALIZED VIEW            1
        44 HFWZBX RULE                         1
                                      ----------
sum                                        76085

44 rows selected.

I’ve selected the id and original value here to show the correspondence, but didn’t need to show them in the original posting. I’ve also left the original (now redundant) “order by” clause in the main inline view, and you’ll notice that even though I needed only 44 distinct strings for the instance I produced the results on, I generated 100 values as a safety margin for testing the code on a couple of other versions of Oracle.

A quick check for efficiency – a brief glance at the execution plan, which might have prompted me to add a couple of /*+ no_merge */ hints if they’d been necessary – showed that the work done was basically the work of the original query plus a tiny increment for adding the rownum and doing the “translation join”. Of course, if I’d then wanted to translate the full 76,000 row data set and save it as a table I’d have to join the result set above back to a second copy of all_objects – and it’s in translating full data sets, while trying to deal with problems of referential integrity and correlation, that the time disappears when masking data.

It is a minor detail of this code that it produced fixed length strings (which matched the structure of the original client data). Had I felt the urge I might have used something like: dbms_random.string(‘U’,trunc(dbms_random.value(4,21))) to give me a random distribution of string lengths between 4 and 20. Getting fussier I might have extracted the distinct values for object_type and then generated a random string that matched the length of the value it was due to replace. Fussier still I might have generated the right number of random strings matching the length of the longest value, sorted the original and random values into alphabetical order to align them, then trimmed each random value to the length of the corresponding original value.
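As a quick sketch of that first variation (purely illustrative), the generator inline view could produce variable-length strings, between 4 and 20 characters, like this:

select
        rownum  id,
        dbms_random.string('U', trunc(dbms_random.value(4,21))) random_string
from    dual
connect by
        level <= 100
;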

It’s extraordinary how complicated it can be to mask data realistically – even when you’re looking at just one column in one table. And here’s a related thought – if an important type of predicate in the original application with the original data is where object_type like ‘PACK%’, how do you ensure that your masked data is consistent with the data that would be returned by this query, and how do you determine the value to use instead of “PACK” as the critical input when you run the critical queries against the masked data? (Being privileged may give you part of the answer, but bear in mind that the people doing the testing with that data shouldn’t be able to see the unmasked data or any translation tables.)


We’d Appreciate Your Help

Duncan Davies - Mon, 2015-08-03 05:35

We’d really appreciate your help. But first, a bit of background:

The Partner of the Year awards are an annual awards ceremony held by the UK Oracle User Group. They allow customers to show appreciation for partners that have provided a good service to them over the previous 12 months. As you would imagine, being voted best by end-users is a wonderful accolade.

If you’re the recipient of any Cedar service – and this can be consultancy, advisory, or even the free PeopleSoft and Fusion Weekly newsletters that we send out – we’d be very, very grateful if you gave 3 minutes of your time to vote for us.

We’re up for 3 awards; scroll down to see why we think we deserve your vote. In case you’re already convinced, here’s the drill:

What we’d like you to do:

1) Click here (opens in new window).

2) Fill in your company name, first name and surname. Then click Next.

3) Enter your email address in both fields, then click Next.

4) Select any checkboxes if you want ‘follow-up communications’ from the UKOUG, or leave all blank, and click Next.

5) Select Cedar Consulting from the drop-down, and click Next.

6) On the PeopleSoft page, select the Gold radio button on the Cedar Consulting row (note, it’s the 3rd column!), then click Next.

7) Repeat by selecting the Gold radio button on the Cedar Consulting row of the Managed Services page, then click Next.

8) Repeat by selecting the Gold radio button on the Cedar Consulting row of the Fusion page, then click Next.

9) Click Submit.

And you’re done. Thank you very much. If you want some gratitude for your 3 minutes of effort drop me an email and I’ll thank you personally!

Why Vote For Us?

PeopleSoft Partner of the Year

This year we have worked with over 40 PeopleSoft clients. We have completed major global implementations, have 15 PS v9.2 upgrade projects either complete or in progress as well as delivering a busy programme of user-focused events.  Our events included the 16th PeopleSoft Executive Dinner, PeopleSoft Optimisation, the PeopleSoft & Oracle Cloud Day plus this year’s Executive Forums.  We also presented at the UKOUG conference and SIG.

Oracle Cloud (Fusion / Taleo) Partner of the Year

Over the last few years, we have developed from a PeopleSoft implementer to a specialist provider of Oracle HR Cloud services. We have a completed Fusion implementation under our belt and are currently implementing multiple Fusion and Taleo projects for different clients.

Managed Service / Outsourcing Partner of the Year

Throughout last year, 20 PeopleSoft customers have trusted us to provide market-leading Managed Services. This makes us the largest PeopleSoft Managed Services provider in the UK. Add to this the ability for us to outsource work to our Cedar India office to deliver for clients at a lower price point and we have a strong set of offerings.

We’ve had great success in this competition in the last couple of years and would value your vote. Thanks for your time.


Data messes

DBMS2 - Mon, 2015-08-03 03:58

A lot of what I hear and talk about boils down to “data is a mess”. Below is a very partial list of examples.

To a first approximation, one would expect operational data to be rather clean. After all, it drives and/or records business transactions. So if something goes awry, the result can be lost money, disappointed customers, or worse, and those are outcomes to be strenuously avoided. Up to a point, that’s indeed true, at least at businesses large enough to be properly automated. (Unlike, for example — :) — mine.)

Even so, operational data has some canonical problems. First, it could be inaccurate; somebody can just misspell or otherwise botch an entry. Further, there are multiple ways data can be unreachable, typically because it’s:

  • Inconsistent, in which case humans might not know how to look it up and database JOINs might fail.
  • Unintegrated, in which case one application might not be able to use data that another happily maintains. (This is the classic data silo problem.)

Inconsistency can take multiple forms, including: 

  • Variant names.
  • Variant spellings.
  • Variant data structures (not to mention datatypes, formats, etc.).

Addressing the first two is the province of master data management (MDM), and also of the same data cleaning technologies that might help with outright errors. Addressing the third is the province of other data integration technology, which also may be what’s needed to break down the barriers between data silos.

So far I’ve been assuming that data is neatly arranged in fields in some kind of database. But suppose it’s in documents or videos or something? Well, then there’s a needed step of data enhancement; even when that’s done, further data integration issues are likely to be present.

All of the above issues occur with analytic data too. In some cases it probably makes sense not to fix them until the data is shipped over for analysis. In other cases, it should be fixed earlier, but isn’t. And in hybrid cases, data is explicitly shipped to an operational data warehouse where the problems are presumably fixed.

Further, some problems are much greater in their analytic guise. Harmonization and integration among data silos are likely to be much more intense. (What is one table for analytic purposes might be many different ones operationally, for reasons that might span geography, time period, or application legacy.) Addressing those issues is the province of data integration technologies old and new. Also, data transformation and enhancement are likely to be much bigger deals in the analytic sphere, in part because of poly-structured internet data. Many Hadoop and now Spark use cases address exactly those needs.

Let’s now consider missing data. In operational cases, there are three main kinds of missing data:

  • Missing values, as a special case of inaccuracy.
  • Data that was only collected over certain time periods, as a special case of changing data structure.
  • Data that hasn’t been derived yet, as the main case of a need for data enhancement.

All of those cases can ripple through to cause analytic headaches. But for certain inherently analytic data sets — e.g. a weblog or similar stream — the problem can be even worse. The data source might stop functioning, or might change the format in which it transmits; but with no immediate operations compromised, it might take a while to even notice. I don’t know of any technology that does a good, simple job of addressing these problems, but I am advising one startup that plans to try.

Further analytics-mainly data messes can be found in three broad areas:

  • Problems caused by new or changing data sources hit much faster in analytics than in operations, because analytics draws on a greater variety of data.
  • Event recognition, in which most of a super-high-volume stream is discarded while the “good stuff” is kept, is more commonly a problem in analytics than in pure operations. (That said, it may arise on the boundary of operations and analytics, namely in “real-time” monitoring.)
  • Analytics has major problems with data scavenger hunts, in which business analysts and data scientists don’t know what data is available for them to examine.

That last area is the domain of a lot of analytics innovation. In particular:

  • It’s central to the dubious Gartner concept of a Logical Data Warehouse, and to the more modest logical data layers I advocate as an alternative.
  • It’s been part of BI since the introduction of Business Objects’ “semantic layer”. (See, for example, my recent post on Zoomdata.)
  • It’s a big part of the story of startups such as Alation or Tamr.
  • In a failed effort, it was part of Greenplum’s pitch some years back, as an aspect of the “enterprise data cloud”.
  • It led to some of the earliest differentiated features at Gooddata.
  • It’s implicit in some BI collaboration stories, in some BI/search integration, and in ClearStory’s “Data You May Like”.

Finally, suppose we return to the case of operational data, assumed to be accurately stored in fielded databases, with sufficient data integration technologies in place. There’s still a whole other kind of possible mess than those I cited above — applications may not be doing a good job of understanding and using it. I could write a whole series of posts on that subject alone … but it’s going slowly. :) So I’ll leave that subject area for another time.

Categories: Other

OTN Tour of Latin America 2015 : The Journey Begins – Arrival at Montevideo, Uruguay

Tim Hall - Mon, 2015-08-03 01:45

After the quick flight to Montevideo, I was met by Edelwisse and Nelson. A couple of minutes later Mike Dietrich arrived. You know, that guy that pretends to understand upgrades! We drove over to the hotel, arriving at about 11:00. Check-in was not until 15:00, so I had to wait a few minutes for them to prep my room. The others were going out to get some food, but I had a hot date with my bed. I got to my room, showered and hit the hay.

I was meant to meet up with the others at about 19:00 to get some food, but I slept through. In fact, I slept until about 04:00 the next day, which was about 15 hours. I think that may be a record… I’m feeling a bit punch-drunk now, but I’m sure once I start moving things will be fine…

Today is the first day of the tour proper. Fingers crossed…

Cheers

Tim…


User Hash Support

Anthony Shorten - Sun, 2015-08-02 21:45

In Oracle Utilities Application Framework V4.x, a new column was added to the user object to add an additional layer of security. This field is a user hash that is generated over the complete user object. The idea behind the hash is that when a user logs in, a hash is calculated for the session and checked against the user record registered in the system. If the generated user hash does not match the user hash recorded on the user object, then the user object may not be valid, so the user cannot log in.

This hash is there to detect any attempt to alter the user definition using an invalid method. If an alteration was not made using the provided interfaces (the online maintenance or a Web Service), then the record cannot be trusted, so the user cannot use that identity. The idea is that if someone "hacks" the user definition using an invalid method, the user object will become invalid and therefore effectively locked. It protects the integrity of the user definition.

This facility typically causes no issues but here are a few guidelines to use it appropriately:

  • The user object should only be modified using the online maintenance transaction, F1-LDAP job, user preferences maintenance or a Web Service against the user object. The user hash is regenerated correctly when a valid access method is used.
  • If you are loading new users from a repository, the user hash must be generated. It is recommended to use a Web Services based interface to the user object to load the users to avoid the hash becoming invalid.
  • If a user supplies a valid identity and the valid password but gets an Invalid Login message, then it is most likely that the user hash comparison has found an inconsistency. You might want to investigate this before resolving the user hash inconsistency.
  • The user hash is generated using the keystore key used by the product installation. If the keystore or values in the keystore are changed, you will need to regenerate ALL the hash keys.
  • There are two ways of addressing this issue:
    • A valid administrator can edit the individual user object within the product and make a simple change to force the hash key to be regenerated.
    • Regenerate the hash keys globally using the commands outlined in the Security Guide. This should be done if it is a global issue or at least an issue for more than one user.

For more information about this facility and other security facilities, refer to the Security Guide shipped with your product.

Two great speakers coming up in fall for NEOOUG quarterly meetings

Grumpy old DBA - Sun, 2015-08-02 18:03
Full details will be out soon over at the NEOOUG website, but we have John King from Colorado speaking in September and Daniel Morgan speaking in November for our quarterly meetings.

These guys are both Oracle ACE Directors and great, dynamic speakers; it should be a great time.

Complete information, including bios and abstracts for the sessions, should be out soon.

Thanks John
Categories: DBA Blogs

OTN Tour of Latin America 2015 : The Journey Begins – Buenos Aires Airport

Tim Hall - Sun, 2015-08-02 01:27

The flight from Paris to Buenos Aires was long, but relatively uneventful. One little patch of turbulence, then plain sailing.

For the main meal they didn’t have me down as vegetarian. I don’t know why I bother ordering special meals because the vast majority of the times I don’t get them. Interestingly, they did have a vegetarian breakfast for me, probably fixed one up after the dinner issue, but they gave it to the lady 2 seats away from me. She had seen the issue with the dinner and passed it across to me. In big letters on the tray it said 27J, which was my seat number, so I’m not quite sure why it was so difficult. I honestly think a lot of people look at me and think, “There is no way he is that fat and a vegetarian!”, so they give it to someone who looks suitably skinny… :)

I watched Insurgent, which was OK, then started to watch Fast & Furious 7, but couldn’t get into it on such a small screen. Amazingly, I did manage to catch small snatches of sleep, which was very welcome, interspersed with the obligatory periods of standing at the back of the plane pretending there aren’t loads of hours of sitting left.

So now I’m in Buenos Aires airport waiting to get back on to the same plane to fly the last 25 mins to Montevideo. I will be back in Buenos Aires in a couple of days, but I will be arriving by ferry next time! :)

Cheers

Tim…


OTN Tour of Latin America 2015 : The Journey Begins – CDG Airport

Tim Hall - Sat, 2015-08-01 11:59

I’ve been in Charles de Gaulle airport for about three hours now. Only another four to go… :)

I tried to record another technical video, but you can hear kids in the background. Now the timings are sorted, it should be pretty quick to re-record when I get to a hotel, so that’s good I guess. I’m not sure I can face doing another one today.

My YouTube channel is on 199 subscribers. About to ding to the magic 200. :)

Perhaps I should get the GoPro out and do some filming of the barren wasteland, which is the K gates in Terminal 2E.

Cheers

Tim…


Combining Spark Streaming and Data Frames for Near-Real Time Log Analysis & Enrichment

Rittman Mead Consulting - Sat, 2015-08-01 08:51

A few months ago I posted an article on the blog around using Apache Spark to analyse activity on our website, using Spark to join the site activity to some reference tables for some one-off analysis. In this article I’ll be taking an initial look at Spark Streaming, a component within the overall Spark platform that allows you to ingest and process data in near real-time whilst keeping the same overall code-base as your batch-style Spark programs.


Like regular batch-based Spark programs, Spark Streaming builds on the concept of RDDs (Resilient Distributed Datasets) and provides an additional high-level abstraction called a “discretized stream” or DStream, representing a continuous stream of RDDs over a defined time period. In the example I’m going to create I’ll use Spark Streaming’s DStream feature to hold in-memory the last 24hrs worth of website activity, and use it to update a “Top Ten Pages” Impala table that’ll get updated once a minute.


To create the example I started with the Log Analyzer example in the set of DataBricks Spark Reference Applications, and adapted the Spark Streaming / Spark SQL example to work with our CombinedLogFormat log format that contains two additional log elements. In addition, I’ll also join the incoming data stream with some reference data sitting in an Oracle database and then output a parquet-format file to the HDFS filesystem containing the top ten pages over that period.

The bits of the Log Analyzer reference application that we reused comprise two scripts that compile into a single JAR file: a script that creates a Scala object to parse the incoming CombinedLogFormat log files, and another with the main program in it. The log parsing object contains a single function that takes a set of log lines, then returns a Scala class that breaks the log entries down into the individual elements (IP address, endpoint (URL), referrer and so on). Compared to the DataBricks reference application I had to add two extra log file elements to the ApacheAccessLog class (referer and agent), and add some code to deal with “-“ values that could be in the log for the content size; I also added some extra code to ensure the URLs (endpoints) quoted in the log matched the format used in the data extracted from our WordPress install, which stores all URLs with a trailing forward-slash (“/“).

package com.databricks.apps.logs
case class ApacheAccessLog(ipAddress: String, clientIdentd: String,
 userId: String, dateTime: String, method: String,
 endpoint: String, protocol: String,
 responseCode: Int, contentSize: Long, 
 referer: String, agent: String) {
}
 
object ApacheAccessLog {
val PATTERN = """^(\S+) (\S+) (\S+) \[([\w\d:\/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) ([\d\-]+) "([^"]+)" "([^"]+)"""".r
def parseLogLine(log: String): ApacheAccessLog = {
 val res = PATTERN.findFirstMatchIn(log)
 if (res.isEmpty) {
 ApacheAccessLog("", "", "", "","", "", "", 0, 0, "", "")
 }
 else {
 val m = res.get
 val contentSizeSafe : Long = if (m.group(9) == "-") 0 else m.group(9).toLong
 val formattedEndpoint : String = (if (m.group(6).charAt(m.group(6).length-1).toString == "/") m.group(6) else m.group(6).concat("/"))
 
 ApacheAccessLog(m.group(1), m.group(2), m.group(3), m.group(4),
 m.group(5), formattedEndpoint, m.group(7), m.group(8).toInt, contentSizeSafe, m.group(10), m.group(11))
 }
 }
}

The body of the main application script looks like this – I’ll go through it step-by-step afterwards:

package com.databricks.apps.logs.chapter1

import com.databricks.apps.logs.ApacheAccessLog
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SaveMode
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.streaming.{StreamingContext, Duration}

object LogAnalyzerStreamingSQL {
  val WINDOW_LENGTH = new Duration(86400 * 1000)
  val SLIDE_INTERVAL = new Duration(60 * 1000)

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Log Analyzer Streaming in Scala")
    val sc = new SparkContext(sparkConf)

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    
    val postsDF = sqlContext.load("jdbc", Map(
                  "url" -> "jdbc:oracle:thin:blog_refdata/password@bigdatalite.rittmandev.com:1521:orcl",
                  "dbtable" -> "BLOG_REFDATA.POST_DETAILS"))
                  
    postsDF.registerTempTable("posts")

    val streamingContext = new StreamingContext(sc, SLIDE_INTERVAL)

    val logLinesDStream = streamingContext.textFileStream("/user/oracle/rm_logs_incoming")

    val accessLogsDStream = logLinesDStream.map(ApacheAccessLog.parseLogLine).cache()

    val windowDStream = accessLogsDStream.window(WINDOW_LENGTH, SLIDE_INTERVAL)

    windowDStream.foreachRDD(accessLogs => {
      if (accessLogs.count() == 0) {
        println("No logs received in this time interval")
      } else {
        accessLogs.toDF()

        // Filter out bots 
        val accessLogsFilteredDF = accessLogs
                                      .filter( r => ! r.agent.matches(".*(spider|robot|bot|slurp|bot|monitis|Baiduspider|AhrefsBot|EasouSpider|HTTrack|Uptime|FeedFetcher|dummy).*"))
                                      .filter( r => ! r.endpoint.matches(".*(wp-content|wp-admin|wp-includes|favicon.ico|xmlrpc.php|wp-comments-post.php).*")).toDF()
                                      .registerTempTable("accessLogsFiltered")
                                      
        val topTenPostsLast24Hour = sqlContext.sql("SELECT p.POST_TITLE, p.POST_AUTHOR, COUNT(*) as total FROM accessLogsFiltered a JOIN posts p ON a.endpoint = p.POST_SLUG GROUP BY p.POST_TITLE, p.POST_AUTHOR ORDER BY total DESC LIMIT 10 ")                 
        
        // Persist top ten table for this window to HDFS as parquet file
        
        topTenPostsLast24Hour.save("/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.parquet", "parquet", SaveMode.Overwrite)      
      }
    })

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}

The application code starts then by importing Scala classes for Spark, Spark SQL and Spark Streaming, and then defines two variables that determine the amount of log data the application will consider; WINDOW_LENGTH (86400 * 1000 milliseconds, or 24hrs), which determines the window of log activity that the application will consider, and SLIDE_INTERVAL, set to 60 * 1000 milliseconds or one minute, which determines how often the statistics are recalculated. Using these values means that our Spark Streaming application will recompute every minute the top ten most popular pages over the last 24 hours.

package com.databricks.apps.logs.chapter1
import com.databricks.apps.logs.ApacheAccessLog
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SaveMode
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.streaming.{StreamingContext, Duration}
object LogAnalyzerStreamingSQL {
 val WINDOW_LENGTH = new Duration(86400 * 1000)
 val SLIDE_INTERVAL = new Duration(60 * 1000)

In our Spark Streaming application, we’re also going to load-up reference data from our WordPress site, exported and stored in an Oracle database, to add post title and post author values to the raw log entries that come in via Spark Streaming. In the next part of the script then we define a new Spark context and then a Spark SQL context off-of the base Spark context, then create a Spark SQL data frame to hold the Oracle-sourced WordPress data to later-on join to the incoming DStream data – using Spark’s new Data Frame feature and the Oracle JDBC drivers that I separately download off-of the Oracle website, I can pull in reference data from Oracle or other database sources, or bring it in from a CSV file as I did in the previous Spark example, to supplement my raw incoming log data. 

def main(args: Array[String]) {
 val sparkConf = new SparkConf().setAppName("Log Analyzer Streaming in Scala")
 val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
 import sqlContext.implicits._
 
 val postsDF = sqlContext.load("jdbc", Map(
 "url" -> "jdbc:oracle:thin:blog_refdata/password@bigdatalite.rittmandev.com:1521:orcl",
 "dbtable" -> "BLOG_REFDATA.POST_DETAILS"))
 
 postsDF.registerTempTable("posts")

Note also how Spark SQL lets me declare a data frame (or indeed any RDD with an associated schema) as a Spark SQL table, so that I can later run SQL queries against it – I’ll come back to this at the end).

Now comes the first part of the Spark Streaming code. I start by defining a new Spark Streaming context off of the same base Spark context that I created the Spark SQL one off-of, then I use that Spark Streaming context to create a DStream that reads newly-arrived files landed in an HDFS directory – for this example I’ll manually copy the log files into an “incoming” HDFS directory, whereas in real-life I’d connect Spark Streaming to Flume using FlumeUtils for a more direct connection to activity on the webserver.

val streamingContext = new StreamingContext(sc, SLIDE_INTERVAL)
val logLinesDStream = streamingContext.textFileStream("/user/oracle/rm_logs_incoming")

Then I call the Scala “map” transformation to convert the incoming DStream into an ApacheAccessLog-formatted DStream, and cache this new DStream in-memory. Next and as the final part of this stage, I call the Spark Streaming “window” function which packages the input data into in this case a 24-hour window of data, and creates a new Spark RDD every SLIDE_INTERVAL – in this case 1 minute – of time.

val accessLogsDStream = logLinesDStream.map(ApacheAccessLog.parseLogLine).cache()
val windowDStream = accessLogsDStream.window(WINDOW_LENGTH, SLIDE_INTERVAL)

Now that Spark Streaming is creating RDDs for me to represent all the log activity over my 24 hour period, I can use the .foreachRDD control structure to turn that RDD into its own data frame (using the schema I’ve inherited from the ApacheAccessLog Scala class earlier on), and filter out bot activity and references to internal WordPress pages so that I’m left with actual page accesses to then calculate the top ten list from.

windowDStream.foreachRDD(accessLogs => {
 if (accessLogs.count() == 0) {
 println("No logs received in this time interval")
 } else {
 accessLogs.toDF().registerTempTable("accessLogs")
// Filter out bots 
 val accessLogsFilteredDF = accessLogs
 .filter( r => ! r.agent.matches(".*(spider|robot|bot|slurp|bot|monitis|Baiduspider|AhrefsBot|EasouSpider|HTTrack|Uptime|FeedFetcher|dummy).*"))
 .filter( r => ! r.endpoint.matches(".*(wp-content|wp-admin|wp-includes|favicon.ico|xmlrpc.php|wp-comments-post.php).*")).toDF()
 .registerTempTable("accessLogsFiltered")

Then, I use Spark SQL’s ability to join tables created against the windowed log data and the Oracle reference data I brought in earlier, to create a parquet-formatted file containing the top-ten most popular pages over the past 24 hours. Parquet is the default storage format used by Spark SQL and is suited best to BI-style columnar queries, but I could use Avro, CSV or another file format if I brought the correct library imports in.

val topTenPostsLast24Hour = sqlContext.sql("SELECT p.POST_TITLE, p.POST_AUTHOR, COUNT(*) as total FROM accessLogsFiltered a JOIN posts p ON a.endpoint = p.POST_SLUG GROUP BY p.POST_TITLE, P.POST_AUTHOR ORDER BY total DESC LIMIT 10 ") 
 
 // Persist top ten table for this window to HDFS as parquet file
 
 topTenPostsLast24Hour.save("/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.parquet", "parquet", SaveMode.Overwrite) 
 }
 })

Finally, the last piece of the code starts-off the data ingestion process and then continues until the process is interrupted or stopped.

streamingContext.start()
    streamingContext.awaitTermination()
  }
}

I can now go over to Hue and move some log files into the HDFS directory that the Spark application is running on, like this:

[Screenshot: file_upload]

Then, based on the SLIDE_INTERVAL I defined in the main Spark application earlier on (60 seconds, in my case) the Spark Streaming application picks up the new files and processes them, outputting the results as a Parquet file back on the HDFS filesystem (these two screenshots should display as animated GIFs)

[Screenshot: spark_processing]

So what to do with the top-ten pages parquet file that the Spark Streaming application creates? The most obvious thing to do would be to create an Impala table over it, using the schema metadata embedded into the parquet file, like this:

CREATE EXTERNAL TABLE rm_logs_24hr_top_ten
LIKE PARQUET '/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.parquet/part-r-00001.parquet'
STORED AS PARQUET
LOCATION '/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.parquet';

Then I can query the table using Hue again, or I can import the Impala table metadata into OBIEE and analyse it using Answers and Dashboards.
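For example, an illustrative ad-hoc query against the new table might look like this (column names follow the schema of the data frame saved earlier):

SELECT post_title, post_author, total
FROM   rm_logs_24hr_top_ten
ORDER  BY total DESC
LIMIT  10;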


So that’s a very basic example of Spark Streaming, and I’ll be building on this example over the new few weeks to add features such as persistent storing of all processed data, and classification and clustering the data using Spark MLlib. More importantly, copying files into HDFS for ingestion into Spark Streaming adds quite a lot of latency and it’d be better to connect Spark directly to the webserver using Flume or even better, Kafka – I’ll add examples showing these features in the next few posts in this series.

Categories: BI & Warehousing

RMAN -- 6 : RETENTION POLICY and CONTROL_FILE_RECORD_KEEP_TIME

Hemant K Chitale - Sat, 2015-08-01 08:12
Most people read the documentation on CONTROL_FILE_RECORD_KEEP_TIME and believe that this parameter *guarantees* that Oracle will retain backup records for that long.  (Some do understand that backup records may be retained longer, depending on the availability of slots (or "records") for the various types of metadata in the controlfile).
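For reference, viewing and changing the parameter is straightforward; a minimal sketch (the 14-day value is purely illustrative, and the ALTER SYSTEM assumes an spfile is in use):

-- current setting (the default is 7 days)
show parameter control_file_record_keep_time

-- keep backup metadata in the controlfile for at least 14 days
alter system set control_file_record_keep_time = 14 scope=both;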

However, .... as you should know from real-world experience ... there is always a "BUT".

Please read Oracle Support Note "How to ensure that backup metadata is retained in the controlfile when setting a retention policy and an RMAN catalog is NOT used. (Doc ID 461125.1)" and Bug 6458068.

Oracle may need to "grow" the controlfile when adding information about ArchiveLogs or BackupSets / BackupPieces.
An example is this set of entries that occurred when I had created very many archivelogs and backuppieces for them :
Trying to expand controlfile section 13 for Oracle Managed Files
Expanded controlfile section 13 from 200 to 400 records
Requested to grow by 200 records; added 9 blocks of records


To understand the contents of the controlfile, see how this listing shows that I have space for 400 Backup Piece records and am currently using 232 of them:

SQL> select * from v$controlfile_record_section where type like '%BACKUP%' order by 1;

TYPE                         RECORD_SIZE RECORDS_TOTAL RECORDS_USED FIRST_INDEX LAST_INDEX LAST_RECID
---------------------------- ----------- ------------- ------------ ----------- ---------- ----------
BACKUP CORRUPTION                     44           371            0           0          0          0
BACKUP DATAFILE                      200           245          159           1        159        159
BACKUP PIECE                         736           400          232         230         61        261
BACKUP REDOLOG                        76           215          173           1        173        173
BACKUP SET                            40           409          249           1        249        249
BACKUP SPFILE                        124           131           36           1         36         36

6 rows selected.

SQL>


However, if I keep creating new Backup Pieces without deleting older ones (and without Oracle auto-deleting older ones) and Oracle hits the allocation of 400 records, it may try to add new records, printing a message (as shown above) into the alert.log. Oracle may overwrite records older than control_file_record_keep_time. If necessary, it tries to expand the controlfile. If, however, there is not enough filesystem space (or space in the raw device or ASM DiskGroup) to expand the controlfile, it may have to overwrite some records in the controlfile. If it has to overwrite records that are older than control_file_record_keep_time, it provides no warning; however, if it has to overwrite records that are not older than control_file_record_keep_time, it *does* write a warning to the alert.log.

I don't want to violate the Oracle Support policy by quoting from the Note and the Bug, but I urge you to read both very carefully. The Note has a particular line about whether there is a relationship between the setting of control_file_record_keep_time and the Retention Policy. The Bug has one particular line about whether the algorithm to extend / reuse / purge records in the controlfile is or is not related to the Retention Policy. So it IS important to ensure that you have enough space for the controlfile to grow in case it needs to expand space for these records.
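A minimal sketch of how to check the current size of each controlfile copy, so you have an idea of how much room it would need if it had to grow:

-- size in MB of each controlfile copy (BLOCK_SIZE and FILE_SIZE_BLKS come from V$CONTROLFILE)
select name, block_size * file_size_blks / 1048576 as size_mb
from   v$controlfile;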

Also, remember that not all Retention Policies are defined in terms of days.  Some may be defined in terms of REDUNDANCY (the *number* of Full / L0 backups that are not to be obsoleted).  This does NOT relate to a number of days, because Oracle can't predict how many backups you run in a day / in a week / in a month.  Take an organisation with a small database that runs 3 Full / L0 backups per day versus another with a very large database that runs a Full / L0 backup only once a fortnight!  How many days of Full / L0 backups would each have to retain if the REDUNDANCY is set to, say, 3?
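For illustration, the two styles of Retention Policy are set like this in RMAN (a sketch; the numbers are arbitrary examples, not recommendations):

# REDUNDANCY counts backups, not days: keep the 3 most recent Full / L0 backups
CONFIGURE RETENTION POLICY TO REDUNDANCY 3;

# RECOVERY WINDOW is time-based: keep enough backups to recover to any point in the last 14 days
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 14 DAYS;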

Categories: DBA Blogs

OTN Tour of Latin America 2015 : The Journey Begins

Tim Hall - Sat, 2015-08-01 06:45

I’m about to board a flight to Paris, where I will wait for 7 hours before starting my 14 hour flight to Montevideo, Uruguay. I think you can probably guess how I’m feeling at this moment…

Why won’t someone hurry up and invent a teleport device?

I will probably put out little posts like this along the way, just so friends and family know what is going on. It’s wrong to wish your life away, but I’m really not looking forward to the next 20+ hours…

Hopefully I will get power in Paris, so I can do some stuff on my laptop…

Cheers

Tim…

OTN Tour of Latin America 2015 : The Journey Begins was first posted on August 1, 2015 at 1:45 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

FASTSYNC Redo Transport for Data Guard in #Oracle 12c

The Oracle Instructor - Sat, 2015-08-01 04:11

FASTSYNC is a new LogXptMode for Data Guard in 12c. It allows the Maximum Availability protection mode to be used across larger distances with less performance impact than the existing LogXptMode SYNC. The old SYNC behavior looks like this:

LogXptMode=SYNC

The point is that we need to wait for two acknowledgements by RFS (got it & wrote it) before we can write the redo entry locally and get the transaction committed. This may slow down the speed of transactions on the Primary, especially with long distances. Now to the new feature:

LogXptMode=FASTSYNC

Here, we wait only for the first acknowledgement (got it) by RFS before we can write locally. There is still a possible performance impact with large distances here, but it is less than before. This is how it looks implemented:

DGMGRL> show configuration;   

Configuration - myconf

  Protection Mode: MaxAvailability
  Members:
  prima - Primary database
    physt - (*) Physical standby database 

Fast-Start Failover: ENABLED

Configuration Status:
SUCCESS   (status updated 26 seconds ago)

DGMGRL> show database physt logxptmode
  LogXptMode = 'fastsync'
DGMGRL> exit
[oracle@uhesse ~]$ sqlplus sys/oracle@prima as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Sat Aug 1 10:41:27 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show parameter log_archive_dest_2

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_2                   string      service="physt", SYNC NOAFFIRM
                                                 delay=0 optional
                                                 compression=disable
                                                 max_failure=0 max_connections=1
                                                 reopen=300
                                                 db_unique_name="physt"
                                                 net_timeout=30,
                                                 valid_for=(online_logfile,all_roles)

My configuration uses Fast-Start Failover, just to show that this is no restriction. Possible but not required is the usage of FASTSYNC together with Far Sync Instances. You can’t have Maximum Protection with FASTSYNC, though:

DGMGRL> disable fast_start failover;
Disabled.
DGMGRL> edit configuration set protection mode as maxprotection;
Error: ORA-16627: operation disallowed since no standby databases would remain to support protection mode

Failed.
DGMGRL> edit database physt set property logxptmode=sync;
Property "logxptmode" updated
DGMGRL> edit configuration set protection mode as maxprotection;
Succeeded.
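And to go back to the FASTSYNC setup afterwards, I just reverse those steps; a sketch against the same configuration:

DGMGRL> edit configuration set protection mode as maxavailability;
Succeeded.
DGMGRL> edit database physt set property logxptmode=fastsync;
Property "logxptmode" updated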

Addendum: As my dear colleague Joel Goodman pointed out, the name of the process that does the Redo Transport from Primary to Standby has changed from LNS to NSS (for synchronous Redo Transport):

SQL> select name,description from v$bgprocess where paddr<>'00';

NAME  DESCRIPTION
----- ----------------------------------------------------------------
PMON  process cleanup
VKTM  Virtual Keeper of TiMe process
GEN0  generic0
DIAG  diagnosibility process
DBRM  DataBase Resource Manager
VKRM  Virtual sKeduler for Resource Manager
PSP0  process spawner 0
DIA0  diagnosibility process 0
MMAN  Memory Manager
DBW0  db writer process 0
MRP0  Managed Standby Recovery
TMON  Transport Monitor
ARC0  Archival Process 0
ARC1  Archival Process 1
ARC2  Archival Process 2
ARC3  Archival Process 3
ARC4  Archival Process 4
NSS2  Redo transport NSS2
LGWR  Redo etc.
CKPT  checkpoint
RVWR  Recovery Writer
SMON  System Monitor Process
SMCO  Space Manager Process
RECO  distributed recovery
LREG  Listener Registration
CJQ0  Job Queue Coordinator
PXMN  PX Monitor
AQPC  AQ Process Coord
DMON  DG Broker Monitor Process
RSM0  Data Guard Broker Resource Guard Process 0
NSV1  Data Guard Broker NetSlave Process 1
INSV  Data Guard Broker INstance SlaVe Process
FSFP  Data Guard Broker FSFO Pinger
MMON  Manageability Monitor Process
MMNL  Manageability Monitor Process 2

35 rows selected.

I’m not quite sure, but I think that was already the case in 11gR2. I just kept the old name in the sketches out of habit :-)


Tagged: 12c New Features, Data Guard
Categories: DBA Blogs

Internet of Things (IoT) - What is your plan?

Peeyush Tugnawat - Fri, 2015-07-31 21:42

The proliferation of connected devices and ever-growing data volumes are driving some very interesting Internet of Things (IoT) use cases and challenges.

Check out some very interesting facts. What is your plan?

Click on the interactive image below...

What is your Plan?