Feed aggregator

Managing Oracle Big Data Cloud – CE with REST API

Yann Neuhaus - 8 hours 19 min ago

In this blog post, we will see how to manage the Oracle Public Cloud Big Data Service Compute Edition with the REST API. Scheduling the start/stop/restart of a metered PaaS in the Oracle Cloud can be interesting for managing your cloud credits consumption.

We first need to consult the official documentation to understand how the API is composed: https://docs.oracle.com/en/cloud/paas/big-data-compute-cloud/csbdp/QuickStart.html

Use the following URL composition to access the REST endpoint:
https://region-prefix.oraclecloud.com/resource-path

According to Oracle documentation, the following information should be taken into account.

Connection Information:

  • Identity Domain: axxxxxx
  • RESTful URL: https://psm.europe.oraclecloud.com/
  • Username / password

Terminology:

  • {instanceName} = Name of the BDCS-CE service (= Cluster Name)
  • {identityDomainId} = “X-ID-TENANT-NAME: axxxxxx”
  • {function} = start, stop, restart
  • {allServiceHosts} = the entire set of cluster VMs (all instances that compose the cluster)
  • “Accept: <value>” = Media Type (default value = application/json)

Before writing an automation script to manage your Big Data cluster, execute single GET/POST requests to understand how the API works.

GET request: View all Service BDCS-CE instances

/paas/api/v1.1/instancemgmt/{identityDomainId}/services/BDCSCE/instances

curl -i -X GET \
        -u "username:password" \
        -H "X-ID-TENANT-NAME: axxxxxx" \
        -H "Accept: application/json" \
        "https://psm.europe.oraclecloud.com/paas/api/v1.1/instancemgmt/axxxxxx/services/BDCSCE/instances"

Result:

HTTP/1.1 200 OK
Server: Oracle-Application-Server-11g
Strict-Transport-Security: max-age=31536000;includeSubDomains
Content-Language: en
...

{"services":{"cluster-iot":{"...

According to the HTTP status code, the command was successful.
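
The same check can be scripted as well. Below is a minimal Python sketch (using the Requests module installed later in this post) that performs the same GET call and prints the HTTP status code; the identity domain, credentials and URL are the same placeholders as in the curl example above.

import requests

identityDomainId = "axxxxxx"
server = "https://psm.europe.oraclecloud.com"

# Same endpoint as the curl call: list all BDCS-CE service instances.
url = server + "/paas/api/v1.1/instancemgmt/%s/services/BDCSCE/instances" % identityDomainId

response = requests.get(
    url,
    auth=("username", "password"),
    headers={"X-ID-TENANT-NAME": identityDomainId, "Accept": "application/json"},
)

print(response.status_code)                 # 200 means the request was successful
print(response.json()["services"].keys())   # service names, e.g. "cluster-iot" in the sample output above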

GET request: View a specific BDCS-CE service instance

Add the instance name to get the status of a specific cluster. Note that a BDCS-CE instance is your Big Data cluster.

/paas/api/v1.1/instancemgmt/{identityDomainId}/services/BDCSCE/instances/{instanceName}

curl -i -X GET \
        -u "username:password" \
        -H "X-ID-TENANT-NAME: axxxxxx" \
        -H "Accept: application/json" \
        "https://psm.europe.oraclecloud.com/paas/api/v1.1/instancemgmt/axxxxxx/services/BDCSCE/instances/cluster-iot"

Then use the same request structure to start/stop/restart your Big Data cluster.

POST request: Start / Stop / Restart Service Instances BDCS-CE: cluster-iot

/paas/api/v1.1/instancemgmt/{identityDomainId}/services/BDCSCE/instances/{instanceName}/hosts/{function}

As specified in the documentation, you need to change the media type to application/vnd.com.oracle.oracloud.provisioning.Service+json and use a body parameter to specify which hosts you want to manage. In our case, we want to manage all cluster hosts.

curl -i -X POST -u "username:password" \
-H "X-ID-TENANT-NAME: axxxxxx" \
-H "Content-Type: application/vnd.com.oracle.oracloud.provisioning.Service+json" \
-d '{"allServiceHosts":"true"}' "https://psm.europe.oraclecloud.com/paas/api/v1.1/instancemgmt/axxxxxx/services/BDCSCE/instances/cluster-iot/hosts/stop"

You can now start developing an automation script to manage your Oracle Big Data Compute Edition cluster.

Python prerequisites:

First install python-pip:

dbi@host:~/$ sudo apt-get install python-pip

Install the Requests module with pip:

dbi@host:~/$ sudo pip install requests

Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
__author__ = "Mehdi Bada"
__company__ = "dbi services sa"
__version__ = "1.0"
__maintainer__ = "Mehdi Bada"
__email__ = ""
__status__ = "Dev"
"""

import sys
import getopt
import requests

# Variables definition.

identityDomainId = "axxxxxx"
instanceName = "cluster-iot"
server = "https://psm.europe.oraclecloud.com"
username = "username"    # same credentials as in the curl examples
password = "password"
commands = ['start', 'stop', 'restart']


def usage():
    print "\nScript Usage \n"
    print "Usage:", sys.argv[0], "-c [start|stop|restart] | -h \n"

if len(sys.argv) < 3:
    usage()
    sys.exit(2)

try:
    # "-c" expects a value, hence "c:" and "command=".
    opts, args = getopt.getopt(sys.argv[1:], "c:h", ["command=", "help"])
except getopt.GetoptError:
    usage()
    sys.exit(2)

icommand = None
for opt, arg in opts:
    if opt in ("-h", "--help"):
        usage()
        sys.exit()
    elif opt in ("-c", "--command"):
        if arg in commands:
            icommand = arg
        else:
            usage()
            sys.exit(2)

if icommand is None:
    usage()
    sys.exit(2)

url = server + "/paas/api/v1.1/instancemgmt/%s/services/BDCSCE/instances/%s/hosts/%s" \
      % (identityDomainId, instanceName, icommand)

payload = '{"allServiceHosts":"true"}'

# Same media type as in the curl example above.
headers = {
    'X-ID-TENANT-NAME': identityDomainId,
    'Content-Type': "application/vnd.com.oracle.oracloud.provisioning.Service+json",
    'Accept': "application/json",
}

# Basic authentication replaces the empty "authorization" header.
response = requests.post(url, data=payload, headers=headers,
                         auth=(username, password))

# Print the status code of the response.
print("\n")
print(response.status_code)

# JSON parsing of the response body.
j = response.json()
print(j['details']['message'])

Usage:

dbi@host:~/$ ./bdcsce_start_stop_test.py -h

Script Usage

Usage: ./bdcsce_start_stop_test.py -c [start|stop|restart] | -h

 

The Oracle REST API is not very well documented, which is why you may need several tests before fully understanding how it works.

 

The post Managing Oracle Big Data Cloud – CE with REST API appeared first on Blog dbi services.

How to expand Exadata Database Storage capacity on demand

Syed Jaffar - 14 hours 22 min ago


Exadata Storage expansion

Most of us know the capabilities that Exadata Database Machine delivers. It's a known fact that Exadata comes in fixed rack sizes: 1/8 rack (2 DB nodes, 3 cells), quarter rack (2 DB nodes, 3 cells), half rack (4 DB nodes, 7 cells) and full rack (8 DB nodes, 14 cells). When you want to expand the capacity, it must be in fixed sizes as well: 1/8 to quarter, quarter to half and half to full.

With the Exadata X5 elastic configuration, one can also have customized sizing by extending the capacity of the rack, adding any number of DB servers, storage servers or a combination of both, up to the maximum allowed capacity of the rack.

In this blog post, I will summarize and walk through the procedure for extending Exadata storage capacity, i.e. adding a new cell to an existing Exadata Database Machine.

Preparing to Extend Exadata Database Machine

  • Ensure the hardware is placed in the rack, and all necessary network and cabling requirements are completed (2 IPs from the management network are required for the new cell).
  • Re-image or upgrade the image:
    • Extract the imageinfo from one of the existing cell servers.
    • Log in to the new cell through ILOM, connect to the console as the root user and get the imageinfo.
    • If the image version on the new cell doesn't match the existing image version, either download the exact image version and re-image the new cell, or upgrade the image on the existing servers.

Review "Reimaging Exadata Cell Node Guidance (Doc ID 2151671.1)" if you want to reimage the new cell.
  • Add the IP addresses acquired for the new cell to the /etc/oracle/cell/network-config/cellip.ora file on each DB node. To do this, perform the steps below from the first DB server in the cluster:
    • cd /etc/oracle/cell/network-config
    • cp cellip.ora cellip.ora.orig
    • cp cellip.ora cellip.ora-bak
 
    • Add the new entries to /etc/oracle/cell/network-config/cellip.ora-bak.
    • /usr/local/bin/dcli -g database_nodes -l root -f cellip.ora-bak -d /etc/oracle/cell/network-config/cellip.ora

  • If ASR alerting was set up on the existing storage cells, configure cell ASR alerting for the cell being added.
    • List the cell attributes required for configuring cell ASR alerting. Run the following command from any existing storage grid cell:
      CellCLI> list cell attributes snmpsubscriber
    • Apply the same SNMP values to the new cell by running the command below as the celladmin user, as shown in the below example:
      CellCLI> alter cell snmpSubscriber=((host='10.20.14.21',port=162,community=public))
  • Configure cell alerting for the cell being added.
    • List the cell attributes required for configuring cell alerting. Run the following command from any existing storage grid cell:
      CellCLI> list cell attributes notificationMethod,notificationPolicy,smtpToAddr,smtpFrom,smtpFromAddr,smtpServer,smtpUseSSL,smtpPort
    • Apply the same values to the new cell by running the command below as the celladmin user, as shown in the example below:
      CellCLI> alter cell notificationmethod='mail,snmp',notificationpolicy='critical,warning,clear',smtptoaddr='dba@email.com',smtpfrom='Exadata',smtpfromaddr='dba@email.com',smtpserver='10.20.14.21',smtpusessl=FALSE,smtpport=25
  • Create cell disks on the cell being added.
    • Log in to the cell as celladmin and run the following command:
      CellCLI> create celldisk all
    • Check that the flash log was created by default:
      CellCLI> list flashlog
      You should see the name of the flash log. It should look like cellnodename_FLASHLOG, and its status should be "normal".
      If the flash log does not exist, create it using:
      CellCLI> create flashlog all
    • Check the current flash cache mode and compare it to the flash cache mode on existing cells:
      CellCLI> list cell attributes flashcachemode
      To change the flash cache mode to match the flash cache mode of existing cells, do the following:
      i. If the flash cache exists and the cell is in WriteBack flash cache mode, you must first flush the flash cache:
         CellCLI> alter flashcache all flush
         Wait for the command to return.
      ii. Drop the flash cache:
         CellCLI> drop flashcache all
      iii. Change the flash cache mode:
         CellCLI> alter cell flashCacheMode=writeback_or_writethrough
         The value of the flashCacheMode attribute is either writeback or writethrough. The value must match the flash cache mode of the other storage cells in the cluster.
      iv. Create the flash cache:
         cellcli -e create flashcache all
  • Create grid disks on the cell being added.
    • Query the size and cachingpolicy of the existing grid disks from an existing cell:
      CellCLI> list griddisk attributes name,asmDiskGroupName,cachingpolicy,size,offset
    • For each disk group found by the above command, create grid disks on the new cell that is being added to the cluster. Match the size and the cachingpolicy of the existing grid disks for the disk group reported by the command above. Grid disks should be created in the order of increasing offset to ensure similar layout and performance characteristics as the existing cells. For example, the "list griddisk" command could return something like this:
      DATAC1          default         5.6953125T         32M
      DBFS_DG         default         33.796875G         7.1192474365234375T
      RECOC1          none            1.42388916015625T  5.6953582763671875T
      When creating grid disks, begin with DATAC1, then RECOC1, and finally DBFS_DG, using the following command:
      CellCLI> create griddisk ALL HARDDISK PREFIX=DATAC1, size=5.6953125T, cachingpolicy='default', comment="Cluster cluster-clux6 DR diskgroup DATAC1"
      CellCLI> create griddisk ALL HARDDISK PREFIX=RECOC1, size=1.42388916015625T, cachingpolicy='none', comment="Cluster cluster-clux6 DR diskgroup RECOC1"
      CellCLI> create griddisk ALL HARDDISK PREFIX=DBFS_DG, size=33.796875G, cachingpolicy='default', comment="Cluster cluster-clux6 DR diskgroup DBFS_DG"
      CAUTION: Be sure to specify the EXACT size shown along with the unit (either T or G).
  • Verify the newly created grid disks are visible from the Oracle RAC nodes. Log in to each Oracle RAC node and run the following command:
      $GI_HOME/bin/kfod op=disks disks=all | grep cellName_being_added
      This should list all the grid disks created in the previous step.
  • Add the newly created grid disks to the respective existing ASM disk groups:
      SQL> alter diskgroup disk_group_name add disk 'comma_separated_disk_names';
      The command above kicks off an ASM rebalance at the default power level. Monitor the progress of the rebalance by querying gv$asm_operation:
      SQL> select * from gv$asm_operation;
      Once the rebalance completes, the addition of the cell to the Oracle RAC cluster is complete.
  • Download and run the latest exachk to ensure that the resulting configuration implements the latest best practices for Oracle Exadata.
References:

http://docs.oracle.com/cd/E80920_01/DBMMR/extending-exadata.htm#DBMMR21158
Reimaging Exadata Cell Node Guidance (Doc ID 2151671.1)

Dynamic SQL in CURSOR

Tom Kyte - 17 hours 3 min ago
Hi, I am trying to create a procedure that will display logs. It has an IN parameter which is the table nam, the cursor will SELECT data based on that parameter. I cannot compile my procedure. Hope you can help me. :) <code>CREATE OR REPLACE PROCE...
Categories: DBA Blogs

Instead of Triggers - where to use?

Tom Kyte - 17 hours 3 min ago
Dear Mr. Tom, 1) First of all when I tried creating an INSTEAD OF Trigger it gave me ORA-00439: feature not enabled: error Kindly explain why so....? 2) Where exactly would u advise me to use Instead Of trigger ...
Categories: DBA Blogs

How to enforce conditional unique on multiple columns

Tom Kyte - 17 hours 3 min ago
Hi, Tom, I have a table create table project (project_ID number primary key, teamid number, job varchar2(100), status number(1)); status=1 means it is an active project, otherwise it is archiv...
Categories: DBA Blogs

Merging 55 millions of records along with hash key

Tom Kyte - 17 hours 3 min ago
Hi Tom, I need to load my table with 55 millions of records using Merge statement 1. Those 55 millions records are being fetched from many different tables. For example T1,T2,T3 etc. 2. i have a date table from where i fetch business weeks ba...
Categories: DBA Blogs

Calling a linked procedure with out cursor parameter in another procedure

Tom Kyte - 17 hours 3 min ago
Hello, I need to have a procedure with out cursor parameter which I am calling it via a db link, inside a procedure, how can I implement that? procedure Myouterprocedure(..., my_out_cursor out cursor, ...
Categories: DBA Blogs

Your DBA Career in the Age of Oracle Cloud

Gerger Consulting - Wed, 2017-10-18 11:48
Attend the free webinar by Oracle ACE Director Craig Shallahamer and learn how the Oracle Cloud and the Oracle 18c autonomous database changes your role as an Oracle DBA.
About the Webinar
The cloud is a change that all Oracle DBAs must face. The cloud is here to stay, and that means Oracle DBAs need to adapt or get out of the game. It makes no difference if you are a new Oracle DBA or retiring in five years; before us is one of the most significant changes you will ever face.

In this webinar, you'll learn what has happened, what is happening, what you can expect, and what you can do today to ensure you are positioned to thrive in a cloud world full of surprising and exciting opportunities.

Register at this link.
Categories: Development

KSQL: Streaming SQL for Apache Kafka

Rittman Mead Consulting - Wed, 2017-10-18 10:18

A few weeks back, while I was enjoying my holidays in the south of Italy, I started receiving notifications about an imminent announcement by Confluent. Reading the highlights almost (...I said almost) made me want to go straight back to work and check all the details about it.
The announcement regarded KSQL: a streaming SQL engine for Apache Kafka!


My office today... not bad! #sea pic.twitter.com/A7skHIcplS

— Francesco Tisiot (@FTisiot) August 7, 2017

Before going into detail, let's try to clarify the basics: what is KSQL? Why was it introduced, and how does it complement Kafka?

What is KSQL?

We have written about Kafka several times, including my recent blogs where I was using it as a data hub to capture Game of Thrones tweets and store them in BigQuery in order to do sentiment analysis with Tableau. In all our examples Kafka has been used just for data transportation, with any necessary transformation happening in the target datastore like BigQuery, using languages like Python and engines like Spark Streaming, or directly in the querying tool like Presto.

KSQL enables something really effective: reading, writing and transforming data in real time and at scale, using a syntax already known by the majority of the community working in the data space: SQL!


KSQL is currently available as a developer preview, but basic operations like joins, aggregations and event-time windowing are already covered.

What Problem is KSQL Solving?

As anticipated above, KSQL solves the problem of providing a SQL interface over Kafka without the need for external languages like Python or Java.
However, one could argue that the same problem was already solved by ETL operations on target datastores like Oracle Database or BigQuery. What, then, is the difference in KSQL's approach? What are the benefits?

The main difference, in my opinion, is the concept of continuous queries: with KSQL, transformations are done continuously as new data arrives in the Kafka topic. On the other hand, transformations done in a database (or in big data platforms like BigQuery) are one-off: if new data arrives, the same transformation has to be executed again.


So what is KSQL good for? Confluent's KSQL introduction blog post provides some use cases like real-time analytics, security and anomaly detection, online data integration or general application development. From a generic point of view, KSQL is what you should use when transformations, integrations and analytics need to happen on the fly as the data streams in. KSQL provides a way of keeping Kafka as the unique data hub: no need to take data out, transform it and re-insert it in Kafka. Every transformation can be done in Kafka using SQL!

As mentioned before, KSQL is currently available as a developer preview and the feature/function list is somewhat limited compared to more mature SQL products. However, in cases where very complex transformations need to happen, those can still be solved either via another language like Java or via a dedicated ETL process (or view) once the data has landed in the destination datastore.

How does KSQL work?

So how does KSQL work under the hood? There are two concepts to keep in mind: streams and tables. A Stream is a sequence of structured data; once an event has been introduced into a stream it is immutable, meaning that it can't be updated or deleted. Think of the items pushed into or pulled from a storage facility: e.g. "200 pieces of ProductA were stocked today, while 100 pieces of ProductB were taken out".
A Table, on the other hand, represents the current situation based on the events coming from a stream. E.g. what's the overall stock quantity for ProductA? Facts in a table are mutable: the quantity of ProductA can be updated, or deleted if ProductA is no longer in stock.

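The stream/table duality can also be illustrated outside KSQL with a few lines of plain Python (this is only an illustration of the stock example above, not KSQL code; the extra ProductB stocking event is hypothetical).

# Stream: immutable, append-only sequence of events (stock movements).
stream = [
    ("ProductA", +200),   # 200 pieces of ProductA stocked today
    ("ProductB", +300),   # hypothetical earlier stocking of ProductB
    ("ProductB", -100),   # 100 pieces of ProductB taken out
]

# Table: mutable current state, derived by replaying the events of the stream.
table = {}
for product, quantity in stream:
    table[product] = table.get(product, 0) + quantity

print(table)   # {'ProductA': 200, 'ProductB': 200}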

KSQL enables the definition of streams and tables via a simple SQL dialect. Various streams and tables coming from different sources can be joined directly in KSQL enabling data combination and transformation on the fly.

Each stream or table created in KSQL will be stored in a separate topic, allowing the usage of the usual connectors or scripts to extract the information from it.

KSQL in Action

Starting KSQL

KSQL can work both in standalone and in client-server mode, with the first aimed at development and testing scenarios while the second supports production environments.
In standalone mode, the KSQL client and server are hosted on the same machine, in the same JVM. In client-server mode, on the other hand, a pool of KSQL servers runs on remote machines and the client connects to them over HTTP.

For my test purposes I decided to use the standalone mode; the procedure is well explained in the Confluent documentation and consists of three steps:

  • Clone the KSQL repository
  • Compile the code
  • Start KSQL using local parameter
./bin/ksql-cli local
Analysing OOW Tweets

For my example I'll use the same Twitter producer created for my Wimbledon post. You may notice I'm not using Kafka Connect; this is due to KSQL not supporting the AVRO format as of now (remember, it's still in the dev phase). I therefore had to rely on the old producer, which stores the tweets in JSON format.

For my tests I've been filtering the tweets containing OOW17 and OOW (Oracle Open World 2017); as mentioned before, those come in JSON format and are stored in a Kafka topic named rm.oow. The first step is then to create a Stream on top of the topic in order to structure the data before doing any transformation.
The guidelines for the stream definition can be found here; the following is a cut-down version of the code used:

CREATE STREAM twitter_raw ( \  
  Created_At VARCHAR, \
  Id BIGINT, \
  Text VARCHAR, \
  Source VARCHAR, \
  Truncated VARCHAR, \
  ... 
  User VARCHAR, \
  Retweet VARCHAR, \
  Contributors VARCHAR, \
  ...) \
WITH ( \  
  kafka_topic='rm.oow', \
  value_format='JSON' \
  );

A few things to notice:

  • Created_At VARCHAR: Created_At is a timestamp; however, in the first stream definition I can't apply any date/timestamp conversion. I keep it as VARCHAR, which is one of the allowed types (others are BOOLEAN, INTEGER, BIGINT, DOUBLE, VARCHAR, ARRAY<ArrayType> and MAP<VARCHAR, ValueType>).
  • User VARCHAR: the User field is a nested JSON structure; for the basic stream definition we'll leave it as VARCHAR, with further transformations happening later on.
  • kafka_topic='rm.oow': source declaration
  • value_format='JSON': data format

Once the first stream is created, we can query it in SQL like:

select Created_at, text from twitter_raw  

with the output being a continuous flow: as soon as a new tweet arrives, it's visualized in the console.


The first thing I want to fix is the Created_At field, which was declared as VARCHAR but needs to be converted into a timestamp. I can do it using the STRINGTOTIMESTAMP function with the mask EEE MMM dd HH:mm:ss ZZZZZ yyyy. This function converts the string to a BIGINT, which is the datatype used by Kafka to store timestamps.
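
For reference, the conversion performed by STRINGTOTIMESTAMP is roughly the following, sketched here in Python 3 with a hypothetical tweet timestamp, just to show what the mask and the resulting epoch milliseconds look like.

from datetime import datetime

# Twitter's Created_At format, matching the KSQL mask 'EEE MMM dd HH:mm:ss ZZZZZ yyyy'.
created_at = "Sun Oct 01 12:34:56 +0000 2017"   # hypothetical value

# %z parsing requires Python 3.
dt = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
epoch_millis = int(dt.timestamp() * 1000)       # what KSQL stores as a BIGINT

print(epoch_millis)   # 1506861296000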

Another section of the tweet that needs further parsing is the User field, which, as per the previous definition, returns the whole nested JSON object.

{
"id":575384370,
"id_str":"575384370",
"name":"Francesco Tisiot",
"screen_name":"FTisiot",
"location":"Verona, Italy","url":"http://it.linkedin.com/in/francescotisiot",
"description":"ABC"
...
}

Fortunately KSQL provides the EXTRACTJSONFIELD function, which we can use to parse the JSON and retrieve the required fields.

I can now define a new twitter_fixed stream with the following code:

create stream twitter_fixed as  
  select STRINGTOTIMESTAMP(Created_At, 'EEE MMM dd HH:mm:ss ZZZZZ yyyy') AS  Created_At, \
    Id, \
    Text, \
    Source, \
    ..., \
    EXTRACTJSONFIELD(User, '$.name') as User_name, \
    EXTRACTJSONFIELD(User, '$.screen_name') as User_screen_name, \
    EXTRACTJSONFIELD(User, '$.id') as User_id, \
    EXTRACTJSONFIELD(User, '$.location') as User_location, \
    EXTRACTJSONFIELD(User, '$.description') as description \
  from twitter_raw

An important thing to notice is that Created_At is now encoded as a BIGINT, thus if I execute select Created_At from twitter_fixed I get only the raw number. To translate it into a readable date I can use the TIMESTAMPTOSTRING function, passing the column and the date format.

The last part of the stream definition I wanted to fix is the setting of KEY and TIMESTAMP: a KEY is the unique identifier of a message and, if not declared, is auto-generated by Kafka. However, the tweet JSON contains the Id, which is Twitter's unique identifier, so we should use it. TIMESTAMP associates the message timestamp with a column in the stream: Created_At should be used. I can define the two above in the WITH clause of the stream declaration.

create stream twitter_with_key_and_timestamp \  
as \  
select * from twitter_fixed \  
with \  
(KEY='Id', TIMESTAMP='Created_At');

When doing a select * from twitter_with_key_and_timestamp, we can clearly see that KSQL adds two columns before the others, containing the TIMESTAMP and the KEY, and that the two are equal to Created_At and Id.


Now I have all the fields correctly parsed in a KSQL stream. Nice, but in my previous blog post I had almost the same for free using Kafka Connect. It's time to discover the next step of KSQL: tables!

Let's first create a simple table containing the number of tweets by User_name.

create table tweets_by_users as \  
select user_screen_name, count(Id) nr_of_tweets \  
from twitter_with_key_and_timestamp \  
group by user_screen_name  

When executing a simple select * from the table, we can see the expected result.


Two things to notice:

  • We see a new row in the console every time a new record is inserted in the rm.oow topic; the new row contains the updated count of tweets for the selected screen_name
  • The KEY is automatically generated by KSQL and contains the screen_name

I can retrieve the list of tables defined with the show tables command.


It's interesting to notice that the format is automatically set as JSON. The format property, configured via the VALUE_FORMAT parameter, defines how the message is stored in the topic and can either be JSON or DELIMITED.

Windowing

When grouping, KSQL provides three different windowing functions:

  • Tumbling: Fixed size, non overlapping. The SIZE of the window needs to be specified.
  • Hopping: Fixed size, possibly overlapping. The SIZE and ADVANCE parameters need to be specified.
  • Session: Dynamically sized, non-overlapping. A window starts with the first entry for a particular key and remains active as long as new messages with the same key arrive within the INACTIVITY_GAP, which is the parameter to be specified.


I can create a simple table definition, like the number of tweets by location for each tumbling window, with:

create table rm.tweets_by_location \  
as \  
select user_location, \  
count(Id) nr_of_tweets \  
from twitter_with_key_and_timestamp \  
WINDOW TUMBLING (SIZE 30 SECONDS) \  
group by user_location  

In the output you can see that the KEY of the table contains both the user_location and the window timestamp (e.g. Colombes : Window{start=1507016760000 end=-}).

An example of a hopping window can be created with a similar query:

create table rm.tweets_by_location_hopping \  
as \  
select user_location, \  
count(Id) nr_of_tweets \  
from twitter_with_key_and_timestamp \  
WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS) \  
group by user_location;  

In the output it's interesting to notice that each entry (e.g. Europe North, Switzerland) is listed at least three times. This is due to the fact that at any point in time there are three overlapping windows (SIZE is 30 seconds and ADVANCE is 10 seconds). The same example can be turned into session windows by just defining WINDOW SESSION (30 SECONDS).
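
The "three overlapping windows" behaviour is easy to check outside KSQL; here is a small Python sketch (my own illustration, not KSQL code) that computes which hopping windows a given event timestamp falls into, using the same SIZE and ADVANCE as the query above.

def hopping_windows(event_ts, size=30, advance=10):
    """Return the start times (in seconds) of all hopping windows containing event_ts."""
    windows = []
    start = (event_ts // advance) * advance   # latest window starting at or before the event
    while start > event_ts - size:
        windows.append(start)
        start -= advance
    return sorted(windows)

# An event at second 65 falls into the windows starting at 40, 50 and 60,
# i.e. three overlapping 30-second windows advancing every 10 seconds.
print(hopping_windows(65))   # [40, 50, 60]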

Windowing is a useful option, especially when combined with HAVING clauses, since it gives the option to define metrics for real-time analysis.
E.g. I may be interested only in items that have been ordered more than 100 times in the last hour, or, in my Twitter example, in user_locations having nr_of_tweets greater than 5 in the last 30 minutes.

Joining

So far so good: a nice set of SQL functions on top of data coming from a single source (in my case Twitter). In the real world, however, we'll need to mix information coming from disparate sources... what if I told you that you can achieve that in a single KSQL statement?


To show an integration example I created a simple topic known_twitters using the kafka-console-producer.

./bin/kafka-console-producer --topic known_twitters --broker-list myserver:9092

Once it has started, I can type in messages and those will be stored in the known_twitters topic. For this example I'll insert the Twitter handle and real name of known people who are talking about OOW. The format will be:

username,real_name  

like

FTisiot,Francesco Tisiot  
Nephentur,Christian Berg  
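
The same rows could also be pushed programmatically instead of being typed into the console producer; below is a minimal sketch using the kafka-python package (an assumption on my side, any Kafka client would do), with the same broker and topic as the console example.

from kafka import KafkaProducer

# Same broker and topic as the kafka-console-producer call above.
producer = KafkaProducer(bootstrap_servers="myserver:9092")

known_twitters = [
    ("FTisiot", "Francesco Tisiot"),
    ("Nephentur", "Christian Berg"),
]

for screen_name, real_name in known_twitters:
    # DELIMITED format expected by the KSQL stream: username,real_name
    producer.send("known_twitters", ("%s,%s" % (screen_name, real_name)).encode("utf-8"))

producer.flush()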

Once the rows are inserted with the producer, I'm able to create a KSQL stream on top of the topic with the following syntax (note the VALUE_FORMAT='DELIMITED'):

create stream people_known_stream (\  
screen_name VARCHAR, \  
real_name VARCHAR) \  
WITH (\  
KAFKA_TOPIC='known_twitters', \  
VALUE_FORMAT='DELIMITED');  

I can now join this stream with the other streams or tables built previously. However, when trying the following statement:

select user_screen_name from rm.tweets_by_users a join PEOPLE_KNOWN_STREAM b on a.user_screen_name=b.screen_name;  

I get a nice error

Unsupported join logical node: Left: io.confluent.ksql.planner.plan.StructuredDataSourceNode@6ceba9de , Right: io.confluent.ksql.planner.plan.StructuredDataSourceNode@69518572  

This is due to the fact that as of now KSQL supports only joins between a stream and a table, and the stream needs to be specified first in the KSQL query. If I then just swap the two sources in the select statement above:

select user_screen_name from PEOPLE_KNOWN_STREAM a join rm.tweets_by_users b on a.screen_name=b.user_screen_name;  

...I get another error

Join type is not supportd yet: INNER  

We have to remember that KSQL is still in the developer preview phase; a lot of new features will be included before the official release.

Adding a LEFT JOIN clause (see the related bug) solves the issue, and I should be able to see the combined data. However, when running:

select * from PEOPLE_KNOWN_STREAM left join TWEETS_BY_USERS on screen_name=user_screen_name;  

I didn't retrieve any rows. After adding a proper KEY to the stream definition:

create stream PEOPLE_KNOWN_STREAM_PARTITIONED \  
as select screen_name , \  
real_name from  people_known_stream \  
PARTITION BY screen_name;  

I was able to retrieve the correct rowset! Again, we are in the early stages of KSQL; these behaviours will be fixed or better documented in future releases!

Conclusion

As we saw in this small example, all transformations, summaries and data enrichments were done directly in Kafka, with a dialect that is very easy to learn for anyone already familiar with SQL. All the created streams/tables are stored as Kafka topics, so the standard connectors can be used for sink integration.

As mentioned above, KSQL is still in developer preview, but the overall idea is very simple and at the same time powerful. If you want to learn more, check out the Confluent page and the KSQL GitHub repository!

Categories: BI & Warehousing

sqlcl command line tool warning

Tom Kyte - Wed, 2017-10-18 07:26
Hi I am using sqlcl quite a while and constantly get this null pointer exception. Any Idea Apr 15, 2016 9:28:22 AM oracle.dbtools.plusplus.JDBCHelper getOH WARNING: oracle.dbtools.plusplus.JDBCHelper.getOH(JDBCHelper.java:83) java.lang.NullP...
Categories: DBA Blogs

Merge statement

Tom Kyte - Wed, 2017-10-18 07:26
Hi Tom, consider this: create table test ( id number(10), username varchar2(100), reason varchar2(100), timestamp_ number(10), cnt number(10), CONSTRAINT test_pk primary key (id) ); insert into test(id,username,reason,timestamp_,cn...
Categories: DBA Blogs

Optimize Query

Tom Kyte - Wed, 2017-10-18 07:26
I want to optimize the following query SELECT t.merchant_id, t.brand_id, t.transaction_type, t.trns_currency_code, <b>COUNT (*) AS total_no_tx, SUM (t.mv_is_trnx_approved) AS approved_no_tx, ...
Categories: DBA Blogs

DDL for tables and constraints

Tom Kyte - Wed, 2017-10-18 07:26
Team, I am using SQL Developer 17.3 and SQLCL 17.3 for this demo. while doing DDL from SQLCL we got this <code> demo@ORA11G> show ddl STORAGE : ON INHERIT : ON SQLTERMINATOR : ON OID : ON SPECIFICATION : ON TABLESPACE : ON SIZE_BYTE_K...
Categories: DBA Blogs

Is there a way I can press a button on home page just after logging automatically in apex application

Tom Kyte - Wed, 2017-10-18 07:26
Hi, I have an apex application https://apex.oraclecorp.com/pls/apex/f?p=32888:1:4894003287217::::: and I want to click Search button on home page automatically every time once a user login to this application. I need to do this to submit the val...
Categories: DBA Blogs

Select and duplicates

Tom Kyte - Wed, 2017-10-18 07:26
Hello, I'd like to remove duplicates from ma table let say V1 is the key column. the key should be max 5 digits as a length Example: V1 01254T 50300 50300W 45662 Here I should keep the first one (it has no duplicates) , ...
Categories: DBA Blogs

Different in costs for selecting sysdate from dual and using directly as sysdate

Tom Kyte - Wed, 2017-10-18 07:26
The below 2 queries returns the same results. The first one takes very long time and the cost of fetching the records from the custdata is around 82000. I cancelled the query since it takes longer. But the second query takes the cost of only 2 an...
Categories: DBA Blogs

Converting compressed securefile LOBS to basicfile

Tom Kyte - Wed, 2017-10-18 07:26
Hello Tom, We are migrating a database which is using Advance compression for LOBS. We do not have licenses for the new environment to replicate the as-is. What are the available options we have and how to convert compressed securefile LOBS to s...
Categories: DBA Blogs

Documentum: IndexAgent uninstalled continues to queue requests

Yann Neuhaus - Wed, 2017-10-18 03:15

We had a strange behavior at a customer regarding the indexing queue. We used to have two IndexAgents (IA) configured, and we uninstalled one.
I figured out that we still had indexing queue requests for the old index agent even though it was completely uninstalled.

I checked the following objects to see if the agent was still configured somewhere: dm_fulltext_index, dm_ftengine_config, dm_ftindex_agent_config. But the old IA was not declared anymore.

The main problem is that all changes continued to be queued in the indexing queue and nothing cleaned them up, so we ended up with about 2 million requests filling up the DB table.

I finally found out where the old IA was declared: in the registry events.
select * from dmi_registry where user_name = 'dm_fulltext_index_user_01';

r_object_id          user_name                     registered_id        event
2601b86480001d03     dm_fulltext_index_user_01     0301b86480000104     dm_save
2601b86480001d04     dm_fulltext_index_user_01     0301b86480000104     dm_destroy
2601b86480001d05     dm_fulltext_index_user_01     0301b86480000105     dm_save
2601b86480001d06     dm_fulltext_index_user_01     0301b86480000105     dm_readonlysave
2601b86480001d07     dm_fulltext_index_user_01     0301b86480000105     dm_checkin
...

In order to unregister the events, use the following:
unregister,c,<registered_id>,<event>,<queue_name>

So for me:

unregister,c,0301b86480000104,dm_save,dm_fulltext_index_user_01
unregister,c,0301b86480000104,dm_destroy,dm_fulltext_index_user_01
unregister,c,0301b86480000105,dm_save,dm_fulltext_index_user_01
unregister,c,0301b86480000105,dm_readonlysave,dm_fulltext_index_user_01
...
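
If many events are registered, the unregister lines can be generated instead of being typed by hand; here is a rough Python sketch (a hypothetical helper; the registered_id/event pairs come from the dmi_registry query above) that prints one unregister command per event, ready to paste into the API console.

# (registered_id, event) pairs taken from the dmi_registry query output.
events = [
    ("0301b86480000104", "dm_save"),
    ("0301b86480000104", "dm_destroy"),
    ("0301b86480000105", "dm_save"),
    ("0301b86480000105", "dm_readonlysave"),
    ("0301b86480000105", "dm_checkin"),
]

queue_name = "dm_fulltext_index_user_01"

# One unregister command per registered event.
for registered_id, event in events:
    print("unregister,c,%s,%s,%s" % (registered_id, event, queue_name))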

If you want to check if the old IA still queues requests, you can use:
select distinct name from dmi_queue_item where name like 'dm_fulltext%';

If you see the old queue name, that means you still have the registered events.

 

The post Documentum: IndexAgent uninstalled continues to queue requests appeared first on Blog dbi services.

Critical Patch Update for October 2017 Now Available

Steven Chan - Tue, 2017-10-17 22:26

The Critical Patch Update (CPU) for October 2017 was released on October 17, 2017. Oracle strongly recommends applying the patches as soon as possible.

The Critical Patch Update Advisory is the starting point for relevant information. It includes a list of products affected, pointers to obtain the patches, a summary of the security vulnerabilities, and links to other important documents. 

Supported products that are not listed in the "Supported Products and Components Affected" Section of the advisory do not require new patches to be applied.

The Critical Patch Update Advisory is available at the following location:

It is essential to review the Critical Patch Update supporting documentation referenced in the Advisory before applying patches.

The next four Critical Patch Update release dates are:

  • January 16, 2018
  • April 17, 2018
  • July 17, 2018
  • October 16, 2018
Categories: APPS Blogs

If you're in Panama, Colombia, Ecuador, Paraguay, Brazil or Argentina, Oracle APEX is coming to you!

Joel Kallman - Tue, 2017-10-17 21:31

In the first part of November, my colleague David Peake and I are taking the Oracle APEX & Oracle Database Cloud message to a number of Oracle user groups and communities who are graciously hosting us as part of the Oracle Developer Tour Latinoamérica.  These are countries for which there is growing interest in Oracle Application Express, and we wish to help support these groups and aid in fostering their growing APEX communities.

The dates and locations are:
  1. Panamá, November 1, 2017
  2. Colombia, November 2-3, 2017
  3. Ecuador, November 7, 2017
  4. Paraguay, November 8, 2017
  5. Brasil, November 9-10, 2017
  6. Argentina, November 13-14, 2017
You should consider attending one of these conferences if:

  • You're a CIO or manager, and you wish to understand what Oracle Application Express is and if it can help you and your business.
  • You're a PL/SQL developer, and you want to learn how easy or difficult it is to exploit your skills on the Web and in the Cloud.
  • You come from a client/server background and you want to understand what you can do with your skills but in Web development and Cloud development.
  • You're an Oracle DBA, and you want to understand if you can use Oracle Application Express in your daily responsibilities.
  • You know nothing about Oracle Application Express and you want to learn a bit more.
  • You're experienced with Oracle Application Express and you want to learn what's in the future for Oracle APEX.

If you have any interest or questions or concerns (or complaints!) about Oracle Application Express, and you are nearby, we would be very honored to meet you in person and assist in any way we can.  We hope you can make it!

P.S. I couldn't make it to Panamá, but David will be there, so please join him.
