Feed aggregator

Oracle SQL Repeated words in the String

Tom Kyte - Thu, 2017-01-12 15:26
I need your suggestions/inputs on the following task. I have the following table ID ID_NAME 1 TOM HANKS TOM JR 2 PETER PATROL PETER JOHN PETER 3 SAM LIVING 4 JOHNSON & JOHNSON INC 5 DUHGT LLC 6 THE POST OF THE OFFICE 7 ...
Categories: DBA Blogs

NO_DATA_FOUND in Functions

Tom Kyte - Thu, 2017-01-12 15:26
Tom, We've just migrated from Oracle 9.0.1 to 9.2 and, coincidence or not, I'm facing a problem I had never faced before. The NO_DATA_FOUND exception is not being raised from my PL/SQL functions anymore!!! If I put an exception block to handle t...
Categories: DBA Blogs

NEXT vs UNIFORM SIZE

Tom Kyte - Thu, 2017-01-12 15:26
Hi All, Please clarify the below doubt: if I use NEXT 10M and UNIFORM SIZE 1M, which size will my next extent have, 1M or 10M? If we use this command: create TABLESPACE ts_cvliste11_20180815 datafile '/ora1/app/oracle/oradata/L11/2018/ts_c...
Categories: DBA Blogs

MIT Prof Claims RDBMS Model is All Wrong

Tom Kyte - Thu, 2017-01-12 15:26
Hi Team, My manager recently sent out this interesting talk by the founder of Postgres and VoltDB, Michael Stonebraker. https://blog.jooq.org/2013/08/24/mit-prof-michael-stonebraker-the-traditional-rdbms-wisdom-is-all-wrong/ I am curious of ...
Categories: DBA Blogs

Historical CPU and memory usage by service

Tom Kyte - Thu, 2017-01-12 15:26
Hello Tom, Is there a database view like DBA_HIST_SYSSTAT grouping historical session statistics by user or service? I was asked by my boss to create a report showing the CPU and memory usage by department but I could not find a view with that kin...
Categories: DBA Blogs

DYNAMIC INSERT SCRIPT WITH TIMESTAMP ISSUE

Tom Kyte - Thu, 2017-01-12 15:26
Hi, I am new to Oracle. I have used your create dynamic insert script for generating the insert script. But that query is taking care of only three datatypes like NUMBER, DATE and VARCHAR2(). It is not taking care of the TIMESTAMP data type si...
Categories: DBA Blogs

Use of Wrap utility for 12K LOC packages

Tom Kyte - Thu, 2017-01-12 15:26
I've created a SP to encrypt the source code of existing packages. My problem is that legacy packages contain more than 12K LoC (lines of code); with small packages I have no issue, but with the heavy ones I got: 06502. 00000 - "PL/SQL: nu...
Categories: DBA Blogs

Oracle 11g Exadata SQL performance Tuning books?

Tom Kyte - Thu, 2017-01-12 15:26
Hi Tom, Can you please suggest some books for Oracle 11g Exadata SQL performance tuning? Regards, Mayank Jain
Categories: DBA Blogs

Where clause with multiple arguments which can be null or populated

Tom Kyte - Thu, 2017-01-12 15:26
I have a search feature in my application where a user can filter on 3 different columns (can be more); each of these filters can be populated or null. What is the best way to select a ref cursor for the data? I can think of 2 different ways. 1st wa...
Categories: DBA Blogs

Two New Oracle Security Public Class Dates

Pete Finnigan - Thu, 2017-01-12 15:26

I will be teaching two of my Oracle Security classes with Oracle University soon. The first is my class "Securing and Locking Down Oracle Databases". This class will be taught on the 24th January on-line via the Oracle LVC platform....

Posted by Pete On 12/01/17 At 02:47 PM

Categories: Security Blogs

GoldenGate 12c Certified with E-Business Suite 12.2

Steven Chan - Thu, 2017-01-12 13:37

Oracle GoldenGate provides real-time capture, transformation, routing, and delivery of database transactions across heterogeneous systems.  The GoldenGate Administrator's Guide shows six supported topologies for generic databases.  Oracle GoldenGate 12c is now certified for a subset of three of those methods in Oracle E-Business Suite Release 12.2 environments:

  1. Unidirectional (Reporting Instance):  From the E-Business Suite to a reporting instance
  2. Broadcast (Data Distribution): From the E-Business Suite to multiple external databases
  3. Consolidation (Data Warehouse/Mart/Store): From multiple sources, including the E-Business Suite, into a single external database
Three supported GoldenGate topologies in E-Business Suite environments

Procedures for replicating information from E-Business Suite 12.2 using GoldenGate to a secondary system are documented in:

Unidirectional (reporting only)

Certified Combinations

  • EBS 12.2.5 and later 12.2.x releases
  • Database 12.1.0.2 or 11.2.0.4
  • GoldenGate 12.1.2 and later 12.x releases

Replication to a Non-Editioned Target Database

EBS 12.2's Online Patching architecture has significant implications for data replication solutions implemented using Oracle GoldenGate. The Note above describes how to set up a unidirectional configuration of Oracle GoldenGate replication from an Oracle E-Business Suite Release 12.2 source database to a non-editioned target database.

Using GoldenGate with EBS 12.2 environments is significantly different from previous EBS releases.  If you are upgrading your GoldenGate + EBS 12.1 environment to EBS 12.2, you must plan carefully for the new integration steps detailed in the documentation above.

Only Unidirectional replication for EBS

GoldenGate is an extremely powerful solution, and one of its most-interesting features is the ability to provide data replication in both directions.  In some generic scenarios, you can choose to make alterations to replicated data on the external instance and have those changes updated back in the source database.

Bidirectional replication is not permitted for E-Business Suite environments.  You can use GoldenGate to replicate EBS data to an external database, but you cannot move that altered data back into the EBS database.  GoldenGate's ability to read data in EBS databases is certified, but it must not be used to write to EBS databases.

"How" But Not "What"

GoldenGate is akin to a scalpel.  You're free to use it, but you must learn how to perform open heart surgery yourself.

The document above describes how to use GoldenGate to replicate data from the E-Business Suite to another system.  It doesn't provide any guidance about which database objects to replicate.

The E-Business Suite data model is vast, as you'd expect of something that supports the integrated functioning of over 200 functional product modules.  GoldenGate is a development tool, and like all development tools, it requires an expert-level understanding of the E-Business Suite data model for successful use.  Here's a good EBS data model resource:

Not Certified for EBS Migrations and Upgrades

GoldenGate is not certified for:

  • Upgrading E-Business Suite databases from one version to another
  • Migrating E-Business Suite databases from one operating system platform to another

Why not?  Oracle GoldenGate does not currently support the replication of all of the datatypes that are used by the E-Business Suite.  In addition, GoldenGate relies upon tables with primary keys for replication. EBS has over 10,000 tables without primary keys.  That means that attempting to use GoldenGate to migrate or upgrade an entire E-Business Suite database requires a combination of GoldenGate and other manual steps to preserve referential integrity.

Oracle does not currently have any plans to produce a certified or automated method of using GoldenGate for E-Business Suite upgrades or platform migrations.

EBS 12.2 customers should use our documented procedures for upgrades and migrations.  

Categories: APPS Blogs

Partner Webcast – Oracle Identity Cloud Service: Introducing Secure, On-Demand Identity Management

As corporate computing services continue to evolve and many aspects of the IT infrastructure are on the journey to the cloud, authorizing people to use enterprise information systems becomes more and...

We share our skills to maximize your revenue!
Categories: DBA Blogs

How to add an NCSA style Access Log to ORDS Standalone

Kris Rice - Thu, 2017-01-12 09:32
What ORDS Standalone is      The ORDS Standalone webserver is Eclipse Jetty, https://eclipse.org/jetty/ .  In standalone mode, ORDS runs on the command line and sends output to STDOUT.  That means there are no control commands such as startup, shutdown, or status, and no log files or access logs.  It's intentionally bare bones to get up and running fast.  Then it's recommended for anything with

Getting Started with Spark Streaming, Python, and Kafka

Rittman Mead Consulting - Thu, 2017-01-12 07:42

Last month I wrote a series of articles in which I looked at the use of Spark for performing data transformation and manipulation. This was in the context of replatforming an existing Oracle-based ETL and datawarehouse solution onto cheaper and more elastic alternatives. The processing that I wrote was very much batch-focussed; read a set of files from block storage ('disk'), process and enrich the data, and write it back to block storage.

In this article I am going to look at Spark Streaming. This is one of several libraries that the Spark platform provides (others include Spark SQL, Spark MLlib, and Spark GraphX). Spark Streaming provides a way of processing "unbounded" data - commonly referred to as "streaming" data. It does this by breaking it up into microbatches, and supporting windowing capabilities for processing across multiple batches. You can read more in the excellent Streaming Programming Guide.


Why Stream Processing?

Processing unbounded data sets, or "stream processing", is a new way of looking at what has always been done as batch in the past. Whilst intra-day ETL and frequent batch executions have brought latencies down, they are still independent executions with optional bespoke code in place to handle intra-batch accumulations. With a platform such as Spark Streaming we have a framework that natively supports processing both within-batch and across-batch (windowing).

By taking a stream processing approach we can benefit in several ways. The most obvious is reducing latency between an event occurring and taking an action driven by it, whether automatic or via analytics presented to a human. Other benefits include a more smoothed-out resource consumption profile. We can avoid the very 'spiky' demands on CPU/memory/etc every time a batch runs by instead processing the same volume of data in smaller intervals. Finally, given that most data we process is actually unbounded ("life doesn't happen in batches"), designing new systems to be batch driven - with streaming seen as an exception - is actually an anachronism with roots in technology limitations that are rapidly becoming moot. Stream processing doesn't have to imply, or require, "fast data" or "big data". It can just mean processing data continually as it arrives, and not artificially splitting it into batches.

For more details and discussion of streaming in depth and some of its challenges, I would recommend:

Use-Case and Development Environment

So with that case made above for stream processing, I'm actually going to go back to a very modest example. The use-case I'm going to put together is - almost inevitably for a generic unbounded data example - using Twitter, read from an Apache Kafka topic. We'll start simply, counting the number of tweets per user within each batch and doing some very simple string manipulations. After that we'll see how to do the same but over a period of time (windowing). In the next blog we'll extend this further into a more useful example, still based on Twitter but demonstrating how to satisfy some real-world requirements in the processing.

I developed all of this code using Jupyter Notebooks. I've written before about how awesome notebooks are (along with Jupyter, there's Apache Zeppelin). As well as providing a superb development environment in which both the code and the generated results can be seen, Jupyter gives the option to download a Notebook to Markdown. This blog runs on Ghost, which uses Markdown as its native syntax for composing posts - so in fact what you're reading here comes directly from the notebook in which I developed the code. Pretty cool.

If you want, you can view the notebook online here, and from there download it and run it live on your own Jupyter instance.

I used the docker image all-spark-notebook to provide both Jupyter and the Spark runtime environment. By using Docker I don't have to really worry about provisioning the platform on which I want to develop the code - I can just dive straight in and start coding. As and when I'm ready to deploy the code to a 'real' execution environment (for example EMR), then I can start to worry about that. The only external aspect was an Apache Kafka cluster that I had already, with tweets from the live Twitter feed on an Apache Kafka topic imaginatively called twitter.

To run the code in Jupyter, you can put the cursor in each cell and press Shift-Enter to run it one cell at a time -- or you can use the menu option Kernel -> Restart & Run All. When a cell is executing you'll see a [*] next to it, and once the execution is complete this changes to [y] where y is the execution step number. Any output from that step will be shown immediately below it.

To run the code standalone, you would download the .py from Jupyter, and execute it from the commandline using:

/usr/local/spark-2.0.2-bin-hadoop2.7/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 spark_code.py
Preparing the Environment

We need to make sure that the packages we're going to use are available to Spark. Instead of downloading jar files and worrying about paths, we can instead use the --packages option and specify the group/artifact/version based on what's available on Maven and Spark will handle the downloading. We specify PYSPARK_SUBMIT_ARGS for this to get passed correctly when executing from within Jupyter.

import os  
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 pyspark-shell'  
Import dependencies

We need to import the necessary pySpark modules for Spark, Spark Streaming, and Spark Streaming with Kafka. We also need the python json module for parsing the inbound twitter data

#    Spark
from pyspark import SparkContext  
#    Spark Streaming
from pyspark.streaming import StreamingContext  
#    Kafka
from pyspark.streaming.kafka import KafkaUtils  
#    json parsing
import json  
Create Spark context

The Spark context is the primary object under which everything else is called. The setLogLevel call is optional, but saves a lot of noise on stdout that otherwise can swamp the actual outputs from the job.

sc = SparkContext(appName="PythonSparkStreamingKafka_RM_01")  
sc.setLogLevel("WARN")  
Create Streaming Context

We pass the Spark context (from above) along with the batch duration which here is set to 60 seconds.

See the API reference and programming guide for more details.

ssc = StreamingContext(sc, 60)  
Connect to Kafka

Using the native Spark Streaming Kafka capabilities, we use the streaming context from above to connect to our Kafka cluster. The topic connected to is twitter, from consumer group spark-streaming. The latter is an arbitrary name that can be changed as required.

For more information see the documentation.

kafkaStream = KafkaUtils.createStream(ssc, 'cdh57-01-node-01.moffatt.me:2181', 'spark-streaming', {'twitter':1})  
Message Processing

Parse the inbound message as json

The inbound stream is a DStream, which supports various built-in transformations such as map which is used here to parse the inbound messages from their native JSON format.

Note that this will fail horribly if the inbound message isn't valid JSON.

parsed = kafkaStream.map(lambda v: json.loads(v[1]))  
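
If the topic could ever contain malformed messages, one way to guard against that is to do the parsing inside a flatMap and silently drop anything that doesn't decode. This is just a sketch of that idea, not part of the original example; parse_json_safely is a hypothetical helper:

def parse_json_safely(raw):
    # Return a one-element list with the parsed payload, or an empty list
    # if the message isn't valid JSON, so the record is simply dropped.
    try:
        return [json.loads(raw)]
    except ValueError:
        return []

# flatMap flattens away the empty lists, so malformed messages never
# reach the downstream transformations.
parsed_safe = kafkaStream.flatMap(lambda v: parse_json_safely(v[1]))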
Count number of tweets in the batch

The DStream object provides native functions to count the number of messages in the batch, and to print them to the output:

We use the map function to add in some text explaining the value printed.

Note that nothing gets written to output from the Spark Streaming context and descendent objects until the Spark Streaming Context is started, which happens later in the code. Also note that pprint by default only prints the first 10 values.

parsed.count().map(lambda x:'Tweets in this batch: %s' % x).pprint()  

If you jump ahead and try to use Windowing at this point, for example to count the number of tweets in the last hour using the countByWindow function, it'll fail. This is because we've not set up the streaming context with a checkpoint directory yet. You'll get the error: java.lang.IllegalArgumentException: requirement failed: The checkpoint directory has not been set. Please set it by StreamingContext.checkpoint().. See later on in the blog for details about how to do this.
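
For reference, configuring a checkpoint directory is a single call on the streaming context, made before it is started; the path here is only an illustrative example, and the full pattern is shown later in the post:

# Must be called before ssc.start(); windowed operations such as
# countByWindow need a checkpoint directory to be configured.
ssc.checkpoint('/tmp/checkpoint_example')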

Extract Author name from each tweet

Tweets come through in a JSON structure, of which you can see an example here. We're going to analyse tweets by author, which is accessible in the JSON structure at user.screen_name.

The lambda anonymous function is used to apply the map to each RDD within the DStream. The result is a DStream holding just the author's screenname for each tweet in the original DStream.

authors_dstream = parsed.map(lambda tweet: tweet['user']['screen_name'])  
Count the number of tweets per author

With our authors DStream, we can now count them using the countByValue function. This is conceptually the same as this quasi-SQL statement:

SELECT   AUTHOR, COUNT(*)
FROM     DSTREAM
GROUP BY AUTHOR

Using countByValue is a more legible way of doing the same thing that you'll see done in tutorials elsewhere with a map / reduceByKey.

author_counts = authors_dstream.countByValue()  
author_counts.pprint()  
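
For comparison, here is a rough sketch of that map / reduceByKey formulation (not from the original post); countByValue above yields the same (author, count) pairs with less ceremony:

# Emit an (author, 1) pair per tweet, then sum the counts per author.
author_counts_alt = authors_dstream\
    .map(lambda author: (author, 1))\
    .reduceByKey(lambda a, b: a + b)
author_counts_alt.pprint()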
Sort the author count

If you try and use the sortBy function directly against the DStream you get an error:

'TransformedDStream' object has no attribute 'sortBy'

This is because sort is not a built-in DStream function. Instead we use the transform function to access sortBy from pySpark.

To use sortBy you specify a lambda function to define the sort order. Here we're going to do it based on the number of tweets (index 1 of the RDD) per author. You'll note this index being referenced in the sortBy lambda function as x[1], negated to reverse the sort order.

Here I'm using \ as line continuation characters to make the code more legible.

author_counts_sorted_dstream = author_counts.transform(\  
  (lambda foo:foo\
   .sortBy(lambda x:( -x[1]))))
author_counts_sorted_dstream.pprint()  
Get top 5 authors by tweet count

To display just the top five authors, based on number of tweets in the batch period, we'll use the take function. My first attempt at this failed with:

AttributeError: 'list' object has no attribute '_jrdd'

Per my woes on StackOverflow a parallelize is necessary to return the values into a DStream form.

top_five_authors = author_counts_sorted_dstream.transform\  
  (lambda rdd:sc.parallelize(rdd.take(5)))
top_five_authors.pprint()  
Get authors with more than one tweet, or whose username starts with 'rm'

Let's get a bit more fancy now - filtering the resulting list of authors to only show the ones who have tweeted more than once in our batch window, or -arbitrarily- whose screenname begins with rm...

filtered_authors = author_counts.filter(lambda x:\  
                                                x[1]>1 \
                                                or \
                                                x[0].lower().startswith('rm'))

We'll print this list of authors matching the criteria, sorted by the number of tweets. Note how the sort is being done inline to the calling of the pprint function. Assigning variables and then pprinting them as I've done above is only done for clarity. It also makes sense if you're going to subsequently reuse the derived stream variable (such as with the author_counts in this code).

filtered_authors.transform\  
  (lambda rdd:rdd\
  .sortBy(lambda x:-x[1]))\
  .pprint()
List the most common words in the tweets

Every example has to have a version of wordcount, right? Here's an all-in-one with line continuations to make it clearer what's going on. Note that whilst it makes for tidier code, it also makes it harder to debug...

parsed.\  
    flatMap(lambda tweet:tweet['text'].split(" "))\
    .countByValue()\
    .transform\
      (lambda rdd:rdd.sortBy(lambda x:-x[1]))\
    .pprint()
Start the streaming context

Having defined the streaming context, we're now ready to actually start it! When you run this cell, the program will start, and you'll see the result of all the pprint functions above appear in the output below this cell. If you're running it outside of Jupyter (via spark-submit) then you'll see the output on stdout.

ssc.start()  
ssc.awaitTermination()  
-------------------------------------------
Time: 2017-01-11 15:34:00
-------------------------------------------
Tweets in this batch: 188

-------------------------------------------
Time: 2017-01-11 15:34:00
-------------------------------------------
(u'jenniekmz', 1)
(u'SpamNewton', 1)
(u'ShawtieMac', 1)
(u'niggajorge_2', 1)
(u'agathatochetti', 1)
(u'Tommyguns&#95;&#95;&#95;&#95;&#95;', 1)
(u'zwonderwomanzzz', 1)
(u'Blesschubstin', 1)
(u'Prikes5', 1)
(u'MayaParms', 1)
...

-------------------------------------------
Time: 2017-01-11 15:34:00
-------------------------------------------
(u'RitaBezerra12', 3)
(u'xKYLN', 2)
(u'yourmydw', 2)
(u'wintersheat', 2)
(u'biebercuzou', 2)
(u'pchrin_', 2)
(u'uslaybieber', 2)
(u'rowblanchsrd', 2)
(u'__Creammy__', 2)
(u'jenniekmz', 1)
...

-------------------------------------------
Time: 2017-01-11 15:34:00
-------------------------------------------
(u'RitaBezerra12', 3)
(u'xKYLN', 2)
(u'yourmydw', 2)
(u'wintersheat', 2)
(u'biebercuzou', 2)

-------------------------------------------
Time: 2017-01-11 15:34:00
-------------------------------------------
(u'RitaBezerra12', 3)
(u'xKYLN', 2)
(u'yourmydw', 2)
(u'wintersheat', 2)
(u'biebercuzou', 2)
(u'pchrin_', 2)
(u'uslaybieber', 2)
(u'rowblanchsrd', 2)
(u'__Creammy__', 2)

-------------------------------------------
Time: 2017-01-11 15:34:00
-------------------------------------------
(u'RT', 135)
(u'Justin', 61)
(u'Bieber', 59)
(u'on', 41)
(u'a', 32)
(u'&amp;', 32)
(u'Ros\xe9', 31)
(u'Drake', 31)
(u'the', 29)
(u'Love', 28)
...
[...]

You can see the full output from the job in the notebook here.

So there we have it, a very simple Spark Streaming application doing some basic processing against an inbound data stream from Kafka.

Windowed Stream Processing

Now let's have a look at how we can do windowed processing. This is where data is processed based on a 'window' which is a multiple of the batch duration that we worked with above. So instead of counting how many tweets there are every batch (say, 5 seconds), we could count how many there are per minute. Here, a minute (60 seconds) is the window interval. We can perform this count potentially every time the batch runs; how frequently we do the count is known as the slide interval.


Image credit, and more details about window processing, here.
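
As a minimal preview of what that looks like in code (assuming the streaming context has already been given a checkpoint directory, which is covered next), a one-minute window recomputed every five seconds would be something like:

# Window length of 60 seconds, slide interval of 5 seconds: every 5 seconds,
# count everything received over the last minute.
rolling_counts = kafkaStream.countByWindow(60, 5)
rolling_counts.pprint()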

The first thing to do to enable windowed processing in Spark Streaming is to launch the Streaming Context with a checkpoint directory configured. This is used to store information between batches if necessary, and also to recover from failures. You need to rework your code into the pattern shown here. All the code to be executed by the streaming context goes in a function - which makes it less easy to present in a step-by-step form in a notebook as I have above.

Reset the Environment

If you're running this code in the same session as above, first go to the Jupyter Kernel menu and select Restart.

Prepare the environment

These are the same steps as above.

import os  
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 pyspark-shell'  
from pyspark import SparkContext  
from pyspark.streaming import StreamingContext  
from pyspark.streaming.kafka import KafkaUtils  
import json  
Define the stream processing code
def createContext():  
    sc = SparkContext(appName="PythonSparkStreamingKafka_RM_02")
    sc.setLogLevel("WARN")
    ssc = StreamingContext(sc, 5)

    # Define Kafka Consumer
    kafkaStream = KafkaUtils.createStream(ssc, 'cdh57-01-node-01.moffatt.me:2181', 'spark-streaming2', {'twitter':1})

    ## --- Processing
    # Extract tweets
    parsed = kafkaStream.map(lambda v: json.loads(v[1]))

    # Count number of tweets in the batch
    count_this_batch = kafkaStream.count().map(lambda x:('Tweets this batch: %s' % x))

    # Count by windowed time period
    count_windowed = kafkaStream.countByWindow(60,5).map(lambda x:('Tweets total (One minute rolling count): %s' % x))

    # Get authors
    authors_dstream = parsed.map(lambda tweet: tweet['user']['screen_name'])

    # Count each value and number of occurences 
    count_values_this_batch = authors_dstream.countByValue()\
                                .transform(lambda rdd:rdd\
                                  .sortBy(lambda x:-x[1]))\
                              .map(lambda x:"Author counts this batch:\tValue %s\tCount %s" % (x[0],x[1]))

    # Count each value and number of occurences in the batch windowed
    count_values_windowed = authors_dstream.countByValueAndWindow(60,5)\
                                .transform(lambda rdd:rdd\
                                  .sortBy(lambda x:-x[1]))\
                            .map(lambda x:"Author counts (One minute rolling):\tValue %s\tCount %s" % (x[0],x[1]))

    # Write total tweet counts to stdout
    # Done with a union here instead of two separate pprint statements just to make it cleaner to display
    count_this_batch.union(count_windowed).pprint()

    # Write tweet author counts to stdout
    count_values_this_batch.pprint(5)
    count_values_windowed.pprint(5)

    return ssc
Launch the stream processing

This uses local disk to store the checkpoint data. In a Production deployment this would be on resilient storage such as HDFS.

Note that, by design, if you restart this code using the same checkpoint folder, it will execute the previous code - so if you need to amend the code being executed, specify a different checkpoint folder.

ssc = StreamingContext.getOrCreate('/tmp/checkpoint_v01',lambda: createContext())  
ssc.start()  
ssc.awaitTermination()  
-------------------------------------------
Time: 2017-01-11 17:08:55
-------------------------------------------
Tweets this batch: 782
Tweets total (One minute rolling count): 782

-------------------------------------------
Time: 2017-01-11 17:08:55
-------------------------------------------
Author counts this batch:    Value AnnaSabryan   Count 8
Author counts this batch:    Value KHALILSAFADO  Count 7
Author counts this batch:    Value socialvidpress    Count 6
Author counts this batch:    Value SabSad_   Count 5
Author counts this batch:    Value CooleeBravo   Count 5
...

-------------------------------------------
Time: 2017-01-11 17:08:55
-------------------------------------------
Author counts (One minute rolling):    Value AnnaSabryan   Count 8
Author counts (One minute rolling):    Value KHALILSAFADO  Count 7
Author counts (One minute rolling):    Value socialvidpress    Count 6
Author counts (One minute rolling):    Value SabSad_   Count 5
Author counts (One minute rolling):    Value CooleeBravo   Count 5
...

[...]

-------------------------------------------
Time: 2017-01-11 17:10:10
-------------------------------------------
Tweets this batch: 5
Tweets total (One minute rolling count): 245

-------------------------------------------
Time: 2017-01-11 17:10:10
-------------------------------------------
Author counts this batch:    Value NowOnFR   Count 1
Author counts this batch:    Value IKeepIt2000   Count 1
Author counts this batch:    Value PCH_Intl  Count 1
Author counts this batch:    Value ___GlBBS  Count 1
Author counts this batch:    Value lauracoutinho24   Count 1

-------------------------------------------
Time: 2017-01-11 17:10:10
-------------------------------------------
Author counts (One minute rolling):    Value OdaSethre Count 3
Author counts (One minute rolling):    Value CooleeBravo   Count 2
Author counts (One minute rolling):    Value ArrezinaR Count 2
Author counts (One minute rolling):    Value blackpinkkot4 Count 2
Author counts (One minute rolling):    Value mat_lucidream Count 1
...

You can see the full output from the job in the notebook here. Let's take some extracts and walk through them.

Total tweet counts

First, the total tweet counts. In the first slide window, they're the same, since we only have one batch of data so far:

-------------------------------------------
Time: 2017-01-11 17:08:55
-------------------------------------------
Tweets this batch: 782
Tweets total (One minute rolling count): 782 

Five seconds later, we have 25 tweets in the current batch - giving us a total of 807 (782 + 25):

-------------------------------------------
Time: 2017-01-11 17:09:00
-------------------------------------------
Tweets this batch: 25
Tweets total (One minute rolling count): 807 

Fast forward just over a minute and we see that the windowed count for a minute is not just going up - in some cases it goes down - since our window is now not simply the full duration of the inbound data stream, but is shifting along and giving a total count for the last 60 seconds only.

-------------------------------------------
Time: 2017-01-11 17:09:50
-------------------------------------------
Tweets this batch: 28
Tweets total (One minute rolling count): 1012

-------------------------------------------
Time: 2017-01-11 17:09:55
-------------------------------------------
Tweets this batch: 24
Tweets total (One minute rolling count): 254
Count by Author

In the first batch, as with the total tweets, the batch tally is the same as the windowed one:

-------------------------------------------
Time: 2017-01-11 17:08:55
-------------------------------------------
Author counts this batch:    Value AnnaSabryan   Count 8
Author counts this batch:    Value KHALILSAFADO  Count 7
Author counts this batch:    Value socialvidpress    Count 6
Author counts this batch:    Value SabSad_   Count 5
Author counts this batch:    Value CooleeBravo   Count 5
...

-------------------------------------------
Time: 2017-01-11 17:08:55
-------------------------------------------
Author counts (One minute rolling):    Value AnnaSabryan   Count 8
Author counts (One minute rolling):    Value KHALILSAFADO  Count 7
Author counts (One minute rolling):    Value socialvidpress    Count 6
Author counts (One minute rolling):    Value SabSad_   Count 5
Author counts (One minute rolling):    Value CooleeBravo   Count 5    

But notice in subsequent batches the rolling totals are accumulating for each author. Here we can see KHALILSAFADO (with a previous rolling total of 7, as above) has another tweet in this batch, giving a rolling total of 8:

-------------------------------------------
Time: 2017-01-11 17:09:00
-------------------------------------------
Author counts this batch:    Value DawnExperience    Count 1
Author counts this batch:    Value KHALILSAFADO  Count 1
Author counts this batch:    Value Alchemister5  Count 1
Author counts this batch:    Value uused2callme  Count 1
Author counts this batch:    Value comfyjongin   Count 1
...

-------------------------------------------
Time: 2017-01-11 17:09:00
-------------------------------------------
Author counts (One minute rolling):    Value AnnaSabryan   Count 9
Author counts (One minute rolling):    Value KHALILSAFADO  Count 8
Author counts (One minute rolling):    Value socialvidpress    Count 6
Author counts (One minute rolling):    Value SabSad_   Count 5
Author counts (One minute rolling):    Value CooleeBravo   Count 5
Summary

What I've put together is a very rudimentary example, simply to get started with the concepts. In the examples in this article I used Spark Streaming because of its native support for Python, and the previous work I'd done with Spark. Jupyter Notebooks are a fantastic environment in which to prototype code, and for a local environment providing both Jupyter and Spark, you can't beat the Docker image all-spark-notebook.

There are other stream processing frameworks and languages out there, including Apache Flink, Kafka Streams, and Apache Beam, to name but three. Apache Storm and Apache Samza are also relevant but, whilst they were early to the party, seem to crop up less frequently in stream processing discussions and literature nowadays.

In the next blog we'll see how to extend this Spark Streaming further with processing that includes:

  • Matching tweet contents to a predefined list of filter terms, and filtering out retweets
  • Including only tweets that include URLs, and comparing those URLs to a whitelist of domains
  • Sending tweets matching a given condition to a Kafka topic
  • Keeping a tally of tweet counts per batch and over a longer period of time, as well as counts for terms matched within the tweets
Categories: BI & Warehousing

Hamleys Deploys Oracle Retail Xstore Point-of-Service in 8 Months

Oracle Press Releases - Thu, 2017-01-12 07:00
Press Release
Hamleys Deploys Oracle Retail Xstore Point-of-Service in 8 Months International Specialty Retailer Empowers Store Associates with In-Store Innovation

Redwood Shores, Calif.—Jan 12, 2017

Today Oracle announced leading International Toy Retailer Hamleys has deployed Oracle Retail Xstore Point-of-Service across its UK, Republic of Ireland and Finnish store estate. Hamleys is a respected international brand operating in 23 markets across EMEA, Asia and the Americas. Hamleys differentiates itself through its in-store experience for children and families with a product assortment that includes key brands as well as quality private/own brand products.

Hamleys partnered with MICROS prior to the acquisition by Oracle. Hamleys is one of the first customers to successfully and quickly transition from an older version of MICROS RJ to Oracle Retail Xstore Point-of-Service, a more global and strategic solution that supports the organization’s global needs.

“Oracle were identified as an opportunity for Hamleys to introduce a new epos solution which gives us many enhancements to improve customer offering whilst also offering improved trading terms as well as being compliant to the current version of PCI,” said David Oakley, Head of Information Technology, Hamleys. “Oracle delivered on time to support our critical Christmas trading season.”

Hamleys deployed the new Point-of-Service solution in less than 8 months, on time and within budget, with Oracle Retail Consulting. The first store was live in six months, followed by a rollout across the UK and Ireland. By late October, Hamleys was able to upgrade the 50 tills at its Regent Street flagship store in London, England.

 “Oracle is dedicated to providing customers with a path to the latest and most appropriate version of mission-critical solutions like Point-of-Service. Our converged commerce solutions unify experiences, deliver common execution across touch points and provide consistent views of customer, order and inventory,” said Ray Carlin, Senior Vice President and General Manager, Oracle Retail. “We are empowering store associates to work the way they live, and use mobile devices to better participate in empowered commerce-retailing.”

Oracle Retail at NRF 2017

Oracle Retail will be showcasing the full suite of Oracle Retail solutions and cloud services at the National Retail Federation Big Show Jan. 15-17, 2017, in New York City at the Jacob K. Javits Convention Center. Oracle Retail will be located at booth #2521. For more information check out: www.oracle.com/nrf2017

Contact Info
Matthew Torres
Oracle
+415.595.1584
matt.torres@oracle.com
About Oracle

Oracle offers a comprehensive and fully integrated stack of cloud applications and platform services. For more information about Oracle (NYSE:ORCL), visit www.oracle.com.

Trademarks

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Safe Harbor

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle Corporation. 

Talk to a Press Contact

Matthew Torres

  • +415.595.1584

Oracle Cloud Applications Drive Carbon’s Quest to Lead in 3D Printing and Additive Manufacturing

Oracle Press Releases - Thu, 2017-01-12 07:00
Press Release
Oracle Cloud Applications Drive Carbon’s Quest to Lead in 3D Printing and Additive Manufacturing Carbon to streamline business processes across organizational development, manufacturing, finance, and services

Redwood Shores, Calif.—Jan 12, 2017

Carbon (Carbon, Inc.), whose CLIP (Continuous Liquid Interface Production) technology allows commercial customers to 3D print manufacturing grade components, has selected Oracle Cloud Applications to modernize its systems, transform business processes, and help ensure a secure, scalable, and connected cloud suite for the next phase of the company’s growth. Oracle’s Cloud Applications will enable Carbon to maintain a competitive edge in the dynamic additive manufacturing market by helping to increase business agility, lower costs, and reduce IT complexity.

Working at the intersection of hardware, software, and molecular science to deliver on the promise of 3D printing, Carbon sought a cloud solution that could be configured to its business needs and could keep pace with its anticipated rapid growth both domestically and globally. Carbon selected Oracle Enterprise Resource Planning (ERP) Cloud, Oracle Human Capital Management (HCM) Cloud, Oracle Supply Chain Management (SCM) Cloud, and Oracle Service Cloud over SAP because of the strength of Oracle’s integrated breadth and depth of offerings, embedded analytics, and the modern, easy-to-use interface of its applications.

“As we evaluated our business needs, we knew we needed a future-proof cloud-based solution that could scale as we grow. Not only did we need a sales-ready product, but we also needed a sales-ready business organization,” said Chris Hutton, director of business operations at Carbon. “Oracle’s Cloud Applications help ensure our business processes are seamless so we can focus on delivering the best experience to every customer.”

Carbon’s deployment of Oracle’s comprehensive and modern suite of Cloud Applications will streamline business processes, and optimize collaboration and efficiencies in the following ways:

  • Oracle HCM Cloud will allow Carbon to leverage talent and workforce management offerings to provide an engaging and collaborative HR experience, and enhance how they find and retain quality employees
  • Oracle SCM Cloud will help transform supply chain operations to optimize cost, service levels, and the supply chain network
    • Oracle Manufacturing Cloud will streamline manufacturing processes through visualizations and real-time views into shop floor operations and work orders, while controlling costs
  • Oracle ERP Cloud will allow finance to spend more time on higher-value activities, such as analyzing results and advising the business on strategy
  • Oracle Service Cloud will help differentiate Carbon’s brand with intuitive, engaging experiences through the cloud for customers and employees alike

“Carbon has a vision to enable every member of their company to have a comprehensive 360-degree view of each customer. We are pleased to work with them to ensure our Cloud Applications deliver this visibility to support their ongoing success,” said Gretchen Alarcon, group vice president of product strategy at Oracle. “Oracle’s enterprise-grade Cloud Applications work together seamlessly to empower everyone from a relationship manager to Carbon’s CEO to see how all parts of the business are performing.”

Oracle delivers the industry’s broadest suite of enterprise-grade Cloud services, including Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS), and Data as a Service (DaaS).

For additional information, visit Carbon, Oracle Cloud, and connect with Oracle Cloud on Facebook and Twitter.

Contact Info
Joann Wardrip
Oracle
+1.650.607.1343
joann.wardrip@oracle.com
About Carbon

Carbon is working at the intersection of hardware, software and molecular science to deliver on the promise of 3D printing, enabling creators to think beyond the limitations of convention to design the parts and products that will drive the businesses of the future. Carbon’s Continuous Liquid Interface Production technology (CLIP) was introduced simultaneously at TED 2015 and to the scientific community on the cover of Science Magazine (Science, March 2015). In April 2016, Carbon introduced the M1, its first commercial CLIP-based additive manufacturing machine, created as a comprehensive solution for the next era of design and manufacturing. To date Carbon has received $141 Million in funding. For more information please visit Carbon3D.com or follow Carbon on Twitter @Carbon.

About Oracle

Oracle offers a comprehensive and fully integrated stack of cloud applications and platform services. For more information about Oracle (NYSE:ORCL), visit www.oracle.com.

Trademarks

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Safe Harbor

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle Corporation. 

Talk to a Press Contact

Joann Wardrip

  • +1.650.607.1343

Identifying Redundant Indexes in PeopleSoft

David Kurtz - Thu, 2017-01-12 05:57
This is the first of a two-part series that discusses how to identify and remove redundant indexes from a PeopleSoft system.
I recently wrote a series of articles on my Oracle blog about identifying redundant indexes on an Oracle database. The same approach can be taken with PeopleSoft, but using the metadata in the PeopleTools tables rather than the database catalogue.
(This means you could take a similar approach for PeopleSoft on databases other than Oracle, but I can't say anything about the behaviour of other optimizers) 
The following query returns a report of superset and redundant subset indexes in a PeopleSoft system. It can be downloaded as a script from my website (psredundant.sql).
The generic Oracle version of this script will identify subset indexes that are unique as redundant, because the primary key and unique constraints can be altered to use the superset index.  However, this query does not mark subset unique indexes as redundant, because PeopleSoft does not explicitly create unique constraints, only unique indexes.  If I disabled a unique index, I would not be able to maintain the unique constraint via Application Designer.
Subquery IC returns a list of columns for each index. Subqueries UNI, AS2 and USI retrieve columns for unique/duplicate key, alternate search and user-defined indexes respectively. The absolute value of key position is not important for this exercise; it is merely used to sort the column list in the listagg() function in subquery I.
REM psredundant.sql
set lines 160 pages 50 trimspool on
column recname format a15
column superset_index format a60
column redundant_index format a60
break on recname skip 1 on indexid skip 1
spool psredundant

WITH uni AS (/*unique indexes*/
SELECT /*+MATERIALIZE*/ f.recname, i.indexid
, MIN(i.uniqueflag) OVER (PARTITION BY f.recname) uniqueflag
, CASE WHEN MAX(CASE WHEN f.recname != f.recname_parent THEN 1 ELSE 0 END) OVER (PARTITION BY f.recname)=1
THEN f.fieldnum ELSE k.keyposn END keyposn
, k.fieldname
FROM psrecfielddb f
, psindexdefn i
, pskeydefn k
WHERE i.recname IN(f.recname,f.recname_parent)
AND i.recname = k.recname
AND k.fieldname = f.fieldname
AND i.indexid = '_' /*unique index*/
AND k.indexid = i.indexid
AND bitand(f.useedit,3) > 0 /*unique or dup key*/
), as0 AS (/*leading column on alternate search indexes*/
SELECT f0.recname, k0.indexid, i0.uniqueflag, 0 keyposn, f0.fieldname
FROM psrecfielddb f0
, psindexdefn i0
, pskeydefn k0
WHERE bitand(f0.useedit,16) = 16 /*alternate search key*/
AND k0.recname = f0.recname_parent
AND k0.fieldname = f0.fieldname
AND i0.recname = k0.recname
AND i0.indexid = k0.indexid
AND i0.indexid BETWEEN '0' AND '9' /*alternate search index*/
), as1 AS ( /*now add unique columns*/
SELECT as0.recname, as0.indexid, as0.uniqueflag, as0.keyposn, as0.fieldname
FROM as0
UNION ALL /*append unique key index columns*/
SELECT as0.recname, as0.indexid, as0.uniqueflag, uni.keyposn, uni.fieldname
FROM as0, uni
WHERE as0.recname = uni.recname
), as2 AS (
SELECT as1.recname, as1.indexid, as1.uniqueflag, NVL(k.keyposn,as1.keyposn), as1.fieldname
FROM as1
LEFT OUTER JOIN pskeydefn k /*to handle custom key order*/
ON k.recname = as1.recname
AND k.indexid = as1.indexid
AND k.fieldname = as1.fieldname
), usi AS (/*user indexes*/
SELECT i.recname, i.indexid, i.uniqueflag, k.keyposn, k.fieldname
FROM psindexdefn i
, pskeydefn k
WHERE k.recname = i.recname
AND k.indexid = i.indexid
AND k.indexid BETWEEN 'A' AND 'Z'
AND i.platform_ora = 1
), m AS (/*merge three kinds of index here*/
SELECT uni.recname, uni.indexid, uni.uniqueflag, uni.keyposn, uni.fieldname
FROM uni
UNION ALL
SELECT as1.recname, as1.indexid, as1.uniqueflag, as1.keyposn, as1.fieldname
FROM as1
UNION ALL
SELECT usi.recname, usi.indexid, usi.uniqueflag, usi.keyposn, usi.fieldname
FROM usi
), ic AS ( /*list of columns, restrict to tables*/
SELECT r.recname, m.indexid, m.uniqueflag, m.keyposn, m.fieldname
FROM m
, psrecdefn r
WHERE r.rectype IN(0,7)
And r.recname = m.recname
), i AS ( --construct column list
SELECT /*+ MATERIALIZE*/
ic.recname, ic.indexid, ic.uniqueflag
, count(*) num_columns
, listagg(ic.fieldname,',') within group (order by ic.keyposn) AS fieldlist
FROM ic
GROUP BY ic.recname, ic.indexid, ic.uniqueflag
)
SELECT r.recname
, i.indexid||' ('||i.fieldlist||')' superset_index
, r.indexid||' ('||r.fieldlist||')' redundant_index
FROM i
, i r
WHERE i.recname = r.recname
AND i.indexid != r.indexid
AND r.uniqueflag = 0 /*non-unique redundant*/
AND i.fieldlist LIKE r.fieldlist||',%'
AND i.num_columns > r.num_columns
order by r.recname, r.indexid
/

spool off
This is part of the output for an HR 9.2 demo database. In all it identified 58 indexes on 41 records. However, some of those records are temporary records with several physical instances, so the number of database objects that will be removed is higher.
RECNAME         SUPERSET_INDEX                                               REDUNDANT_INDEX
--------------- ------------------------------------------------------------ ------------------------------------------------------------
HRS_APP_PROFILE C (HRS_AL_LOG_ID,HRS_PERSON_ID)                              B (HRS_AL_LOG_ID)

HRS_SJT_JO      B (HRS_JOB_OPENING_ID,SCRTY_TYPE_CD)                         A (HRS_JOB_OPENING_ID)

HR_PROMOTE_XREF C (EMPLID,EMPL_RCD)                                          A (EMPLID)
                B (EMPLID,EMPL_RCD,ACTION_DT_SS,EFFSEQ)                      A (EMPLID)
                B (EMPLID,EMPL_RCD,ACTION_DT_SS,EFFSEQ)                      C (EMPLID,EMPL_RCD)

HR_SSTEXT_EFFDT _ (OBJECTOWNERID,HR_SSTEXT_SUB_ID,TEXT_ID,EFFDT)             A (OBJECTOWNERID,HR_SSTEXT_SUB_ID,TEXT_ID)

HR_TRANSFR_XREF C (EMPLID,EMPL_RCD)                                          A (EMPLID)
                B (EMPLID,EMPL_RCD,ACTION_DT_SS,EFFSEQ)                      A (EMPLID)
                B (EMPLID,EMPL_RCD,ACTION_DT_SS,EFFSEQ)                      C (EMPLID,EMPL_RCD)

PA_ALIAS_USE    0 (ALIAS_TYPE,ALIAS_NAME)                                    A (ALIAS_TYPE)

WRK_XREF_CALC   _ (PROCESS_INSTANCE,SEQ_NBR,XREF_NUM)                        A (PROCESS_INSTANCE,SEQ_NBR)
The articles on my Oracle blog discuss how to identify redundant indexes on an Oracle database, and then how to go about removing them.  Of course, in PeopleSoft, you should only remove indexes by making the change within Application Designer, and that is what the next posting discusses.

Oracle COVAR_POP Function with Examples

Complete IT Professional - Thu, 2017-01-12 05:00
In this article, I’ll explain what the COVAR_POP function does and show you some examples. Purpose of the Oracle COVAR_POP Function The Oracle COVAR_POP function calculates the population covariance of a set of number pairs. What is a “population covariance”? Good question. Here’s an article that explains what the definition is. Just like the COVAR_SAMP […]
Categories: Development

Links for 2017-01-11 [del.icio.us]

Categories: DBA Blogs
